There was a post on CF-Talk regarding specifics of locking, and I thought I'd create a summary (though I'm not Ben Forta, who was requested by name), along with some ideas for making the job simpler.
In CF5 and earlier, CFLOCK was required for all shared memory access, as well as for preventing race conditions. With CFMX, CFLOCK is only required for race conditions, because the underlying Java runtime takes care of the shared memory access issues. Great, but what is a race condition?
Race Condition: A situation where two different requests are manipulating a single resource, and it's possible for the two requests to step on each other's toes.
The easiest example is something like these two queries (which together deduct an item from the inventory of a given product):
<cfquery datasource="#request.dsn#" name="get">
SELECT inventory
FROM product
WHERE productID = 1
</cfquery>
<cfquery datasource="#request.dsn#">
UPDATE product SET
inventory = #get.inventory# - 1,
inventoryUpdate = now()
WHERE productID = 1
</cfquery>
If two separate requests start this process perfectly in parallel they'll both get the same result from the first query (say 4), and then when they run the second query, they'll both update the inventory to 3. This is clearly NOT the correct result (it should be 2).
We can solve this problem several ways, but we'll use a CFLOCK statement to do it:
<cflock name="inventoryupdate" type="exclusive" timeout="10">
<cfquery datasource="#request.dsn#" name="get">
SELECT inventory
FROM product
WHERE productID = 1
</cfquery>
<cfquery datasource="#request.dsn#">
UPDATE product SET
inventory = #get.inventory# - 1,
inventoryUpdate = now()
WHERE productID = 1
</cfquery>
</cflock>
What we've done is single-thread access to these two queries. Running through the example again, the first request would enter the CFLOCK, and the second would wait for the first to complete. The first would select 4, update to 3, and then exit the CFLOCK. Then the second request would enter the CFLOCK, select 3, update to 2, and exit. Problem solved.
There are better ways to solve this particular problem (relative updates and transactions spring to mind), but this entry is about CFLOCK. What's important is the general type of thing that's happening: a multi-step process concerning a resource that is shared across requests, where later steps depend on results of earlier steps, or the steps must happen all-or-nothing.
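As a rough sketch of the relative-update alternative mentioned above, the read and the write can be collapsed into a single statement so the database does the arithmetic itself. This assumes the same product table and request.dsn as the earlier queries, and that now() is a function your database understands.

```cfml
<!--- One statement instead of SELECT-then-UPDATE: the database computes
      the new value from the current row, so no stale CF-side copy of the
      inventory is ever written back. --->
<cfquery datasource="#request.dsn#">
UPDATE product SET
inventory = inventory - 1,
inventoryUpdate = now()
WHERE productID = 1
</cfquery>
```

Because there is no longer a multi-step process on the CF side, two parallel requests each subtract one from whatever value the row holds at the time, and no CFLOCK is needed for this particular case.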
Where else might we find these kinds of problems?
Well, all CF variables in shared scopes (server, application, session, client) are resources that are shared across requests. This includes instance variables within CFCs that are stored in one of these scopes. A corollary to this last item is the fact that local variables in CFC methods not declared with the var keyword are instance variables. This can result in mysterious bugs that only ever crop up under load, so it's VERY important to use the var keyword properly.
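To make the var issue concrete, here is a minimal sketch of a CFC with an unsafe and a safe version of the same method. The method names sumUnsafe and sumSafe are made up for illustration; they are not from any real component.

```cfml
<cfcomponent>

<cffunction name="sumUnsafe" returntype="numeric">
<cfargument name="items" type="array" required="true" />
<!--- "total" and "i" are NOT var-declared, so they live in the CFC's
      variables scope and are shared by every concurrent call on this
      instance. Two parallel requests can overwrite each other's values. --->
<cfset total = 0 />
<cfloop from="1" to="#arrayLen(arguments.items)#" index="i">
<cfset total = total + arguments.items[i] />
</cfloop>
<cfreturn total />
</cffunction>

<cffunction name="sumSafe" returntype="numeric">
<cfargument name="items" type="array" required="true" />
<!--- var declarations go at the top of the function, right after the
      cfargument tags. These variables are private to this one call. --->
<cfset var total = 0 />
<cfset var i = 0 />
<cfloop from="1" to="#arrayLen(arguments.items)#" index="i">
<cfset total = total + arguments.items[i] />
</cfloop>
<cfreturn total />
</cffunction>

</cfcomponent>
```

If an instance of this component sits in the application scope, sumUnsafe() is a race condition waiting for load, while sumSafe() needs no locking at all.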
We also find it in database access, as demonstrated above, though it's usually better to use CFTRANSACTION to solve those problems, since database-level transactions are very likely going to be a lot more efficient.
Finally, we see it with other external resources. The most common is files on the filesystem, though external objects (Java, COM) are another. Many objects are internally synchronized, so you needn't worry about locking, but not all are. Make sure you check the documentation of your specific object. Notably, most of the Java Collection Classes are NOT synchronized, though there are static methods in the Collections class that wrap them in synchronized versions.
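As a small sketch of that last point, this is roughly how a plain HashMap could be wrapped from CF code. The variable names rawMap, collections, and syncMap are only illustrative.

```cfml
<!--- A plain java.util.HashMap is not synchronized --->
<cfset rawMap = createObject("java", "java.util.HashMap").init() />

<!--- synchronizedMap() is a static method, so the Collections class
      itself is never instantiated (no init() call) --->
<cfset collections = createObject("java", "java.util.Collections") />
<cfset syncMap = collections.synchronizedMap(rawMap) />

<!--- Individual calls on the wrapper are now thread-safe --->
<cfset syncMap.put("inventory", 4) />
```

Note that the wrapper only protects each single call. A get() followed by a put() that depends on it is still a multi-step process and still needs a lock of its own.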
Well, we know we don't have to CFLOCK all access to shared scopes (that went the way of the dodo in CFMX), but when do we have to lock? We again look at the type of operation we need. Clearly reading and writing single variables doesn't qualify, but reading and writing multiple variables does.
So what does this really mean? If you ever write a variable in a shared scope, and any code that depends on it also depends on any other shared value, you must lock all access to the shared variable, both read and write. Ouch. That's a lot of locking, because every variable has to be written, or it wouldn't exist, so that means you have to lock everything except stand-alone variables.
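A minimal sketch of what that looks like for a pair of related values follows. The variables application.inventory and application.inventoryUpdate are hypothetical, and are assumed to have been created elsewhere.

```cfml
<!--- Writer: both values must change together, so take an exclusive lock --->
<cflock scope="application" type="exclusive" timeout="10">
<cfset application.inventory = application.inventory - 1 />
<cfset application.inventoryUpdate = now() />
</cflock>

<!--- Reader: a readonly lock lets many readers in at once, but never
      while a writer is halfway through updating the pair --->
<cflock scope="application" type="readonly" timeout="10">
<cfset currentInventory = application.inventory />
<cfset lastUpdate = application.inventoryUpdate />
</cflock>
```

Without the readonly lock, a reader could pick up the new inventory alongside the old timestamp.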
But fret not, because CFLOCK isn't the only way to lock variable access. You can use some tricks to avoid having to use CFLOCK all over the place. The best one is for application variables that get initialized once, and never change. Since there is only one write event, we can break their lifecycle in two: the write phase, and the read phase. All we need to do is ensure that no request will EVER get to the read phase before the write phase is complete, and that no request will EVER perform the write phase if it has already been performed. If we do that, then we never need to use CFLOCK on application variable reads. The question, of course, is how do we do that? Here's the way I prefer (in Application.cfm, or the root settings file):
1. <cflock scope="application" type="readonly" timeout="10">
2. <cfset isAppWritten = structKeyExists(application, "appWritten") />
3. </cflock>
4. <cfif NOT isAppWritten>
5. <cflock scope="application" type="exclusive" timeout="10">
6. <cfif NOT structKeyExists(application, "appWritten")>
7. <!--- set your app variables --->
8. <cfset application.appWritten = true />
9. </cfif>
10. </cflock>
11. </cfif>
Why does this work? First we test if we're through the write phase (lines 1-4). If we are, great; otherwise we have to attempt to perform it ourselves. Assuming it's not complete, we then get a lock on the initialization code (line 5). Once we get the lock (potentially waiting for other requests to release it), we again check if the write phase is complete (line 6). We need the second check, because it's possible that while we were waiting for the lock, another request finished the write phase. If it's still not done, then we perform the write phase and exit the lock (lines 7-11).
There is a slight fudge going on for efficiency. The outer CFIF is unneeded, because the inner one will work by itself (though the reverse is NOT true). However, getting exclusive locks is expensive (and kills scalability), so we want to avoid it where possible, especially since this code will be executed by EVERY request. The outer CFIF is ensuring that no request will have to get the lock unless it comes in before the first request finishes the write phase, which basically translates to never.
"But what about CFC instance variables?", you're probably saying. "They're application variables too, and they're definitely going to get manipulated, or they'd just be normal application-scope variables." Time for another 'trick', though this one is far less sneaky: we don't have to lock application variables only with a scope="application" CFLOCK.
Instead, inside our CFC, we'll lock all access to instance variables using a named lock. Then the non-CFC application code can still reference the application-scope instances without acquiring a lock, but we retain our ability to prevent race conditions. I prefer to use a UUID for my locking, which is set in the init() method of the CFC into an instance variable. That UUID is then used to lock all instance variable access using a named CFLOCK in exactly the same way as we'd used scoped CFLOCK for "normal" variables.
<cffunction name="init">
<cfset variables.my.uuid = createUUID() />
<!--- set inventory variables --->
</cffunction>
<cffunction name="getInventory">
<cfreturn variables.my.inventory />
</cffunction>
<cffunction name="setInventory">
<cfargument name="inventory" />
<cflock name="#variables.my.uuid#" type="exclusive" timeout="10">
<cfset variables.my.inventory = inventory />
<cfset variables.my.inventoryUpdate = now() />
</cflock>
</cffunction>
There are two caveats:
- CFCs don't have real constructors, meaning that it's possible to call the init() method multiple times (bad) or call other methods before calling init() (even worse). What does this mean? You need to take a couple of precautions. First, all methods should fail if init() hasn't been called. Most CFCs are like this anyway, because they depend on initialization parameters (like a DSN). Second, calls to the init() method must be externally locked. Fortunately, since we're creating and initializing all our application-scope CFCs within the locking framework discussed above, that's already taken care of as well. Just be careful with non-application-scope CFCs (like session-scope ones).
- This type of locking only keeps the CFC's internals in sync. It is still susceptible to the exact same problem we ran into with the first example using two queries (coincidentally, performing the exact same operation). So if in our application code (outside the CFC) we call getInventory() and follow with a setInventory() that uses the value, we still have to lock it on our end, just like the first example.
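The second caveat might look like this in calling code. This is only a sketch: application.inventoryService is a hypothetical application-scope instance of the CFC above, and the lock name "inventoryDecrement" is made up for illustration.

```cfml
<!--- The CFC's internal lock keeps each call consistent, but the
      read-then-write sequence spans two calls, so the caller has to
      single-thread it with a lock of its own. --->
<cflock name="inventoryDecrement" type="exclusive" timeout="10">
<cfset current = application.inventoryService.getInventory() />
<cfset application.inventoryService.setInventory(current - 1) />
</cflock>
```

Every piece of calling code that performs this kind of read-then-write on the same instance has to use the same lock name, or the lock protects nothing.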
For external resources, locking is a bit trickier. Files on the local file system are easy: always use a named lock on the canonical absolute pathname. Files on a remote filesystem shared between servers are problematic, because CFLOCK doesn't work across multiple servers. You'd have to use some kind of semaphore file, and then lock access to that, and it turns into a mess very quickly. External objects can usually be locked using their class name (much as files are locked by pathname), if they're local. Remote shared objects should have built-in synchronization.
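For the local file case, a minimal sketch might look like the following. The file name counter.txt and the variable names are hypothetical; the canonical path comes from java.io.File, so that different spellings of the same path all map to one lock name.

```cfml
<!--- Resolve the canonical absolute pathname to use as the lock name --->
<cfset counterPath = createObject("java", "java.io.File").init(expandPath("./counter.txt")).getCanonicalPath() />

<cflock name="#counterPath#" type="exclusive" timeout="10">
<!--- Read-modify-write on the file is a multi-step process, so the
      whole sequence sits inside one exclusive lock --->
<cfset counter = 0 />
<cfif fileExists(counterPath)>
<cffile action="read" file="#counterPath#" variable="counter" />
</cfif>
<cffile action="write" file="#counterPath#" output="#val(counter) + 1#" addnewline="no" />
</cflock>
```

This only coordinates requests within a single CF server; anything else touching the file, including another server, is outside the lock's reach.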