Locking in CFMX

There was a post on CF-Talk regarding specifics of locking, and I thought I'd create a summary (though I'm not Ben Forta, who was requested by name), along with some ideas for making the job simpler.

In CF5 and earlier, CFLOCK was required both to protect shared memory access and to prevent race conditions. With CFMX, CFLOCK is only required for race conditions, because the underlying Java runtime takes care of the shared memory access issues. Great, but what is a race condition?

Race Condition: A situation where two different requests are manipulating a single resource, and it's possible for the two requests to step on each other's toes.

The easiest example is something like these two queries (which deduct an item from the inventory of a given product):

<cfquery datasource="#request.dsn#" name="get">
  SELECT inventory
  FROM product
  WHERE productID = 1
</cfquery>
<cfquery datasource="#request.dsn#">
  UPDATE product SET
    inventory = #get.inventory# - 1,
    inventoryUpdate = now()
  WHERE productID = 1
</cfquery>

If two separate requests start this process perfectly in parallel they'll both get the same result from the first query (say 4), and then when they run the second query, they'll both update the inventory to 3. This is clearly NOT the correct result (it should be 2).

We can solve this problem several ways, but we'll use a CFLOCK statement to do it:

<cflock name="inventoryupdate" type="exclusive" timeout="10">
  <cfquery datasource="#request.dsn#" name="get">
    SELECT inventory
    FROM product
    WHERE productID = 1
  </cfquery>
  <cfquery datasource="#request.dsn#">
    UPDATE product SET
      inventory = #get.inventory# - 1,
      inventoryUpdate = now()
    WHERE productID = 1
  </cfquery>
</cflock>

What we've done is single-thread access to these two queries. Running through the example again, the first request would enter the CFLOCK, and the second would wait for the first to complete. The first would select 4, update to 3, and then exit the CFLOCK. Then the second request would enter the CFLOCK, select 3, update to 2, and exit. Problem solved.

There are better ways to solve this particular problem (relative updates and transactions spring to mind), but this entry is about CFLOCK. What's important is the general type of thing that's happening: a multi-step process concerning a resource that is shared across requests, where later steps depend on results of earlier steps, or the steps must happen all-or-nothing.
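
For comparison, here is a rough sketch of the relative update mentioned above, reusing the product table and request.dsn from the first example. The database does the arithmetic inside a single statement, so in most databases there is no gap between the read and the write for another request to slip into.

```cfml
<!--- Single-statement decrement: no separate SELECT, so nothing to race against --->
<cfquery datasource="#request.dsn#">
  UPDATE product SET
    inventory = inventory - 1,
    inventoryUpdate = now()
  WHERE productID = 1
</cfquery>
```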

Where else might we find these kinds of problems?

Well, all CF variables in shared scopes (server, application, session, client) are resources that are shared across requests. This includes instance variables within CFCs that are stored in one of these scopes. Corollary to this last item is the fact that local variables in CFC methods not declared with the var keyword are instance variables. This can result in mysterious bugs that only ever crop up under load, so it's VERY important to use the var keyword properly.
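
A minimal sketch of proper var usage follows. The component and the sumTo method are hypothetical names used only for illustration. Without the two var lines, i and total would land in the CFC's variables scope and be shared by every request that calls the method on an application-scope instance.

```cfml
<cfcomponent>
  <!--- sumTo is a hypothetical method, used only to illustrate var scoping --->
  <cffunction name="sumTo" returntype="numeric">
    <cfargument name="limit" type="numeric" required="true" />
    <!--- var declarations go at the top of the function, after any cfargument tags --->
    <cfset var i = 0 />
    <cfset var total = 0 />
    <cfloop from="1" to="#arguments.limit#" index="i">
      <cfset total = total + i />
    </cfloop>
    <cfreturn total />
  </cffunction>
</cfcomponent>
```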

We also find it in database access, as demonstrated above, though it's usually better to use CFTRANSACTION to solve those problems, since database-level transactions are very likely going to be a lot more efficient.
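
A rough sketch of the CFTRANSACTION version of the first example. The isolation="serializable" setting is an assumption here: how tightly the read and the write are tied together depends on the isolation level and on the database, and some databases resolve a conflict by failing one of the competing requests rather than making it wait, so check your database's documentation.

```cfml
<!--- Both queries run inside one database transaction --->
<!--- isolation="serializable" is an assumption; weaker levels may still allow the lost update --->
<cftransaction isolation="serializable">
  <cfquery datasource="#request.dsn#" name="get">
    SELECT inventory
    FROM product
    WHERE productID = 1
  </cfquery>
  <cfquery datasource="#request.dsn#">
    UPDATE product SET
      inventory = #get.inventory# - 1,
      inventoryUpdate = now()
    WHERE productID = 1
  </cfquery>
</cftransaction>
```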

Finally, we see it in other external resources. The most common is files on the filesystem, though external objects (Java, COM) are another. Many objects are internally synchronized, so you needn't worry about locking, but not all. Make sure you check the documentation of your specific object. Notably, most of the Java Collection Classes are NOT synchronized, though there are static methods in the Collections class to turn them into synchronized versions of themselves.
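
As an illustration, a synchronized wrapper can be obtained from CFML roughly like this. The variable names and the application.safeMap key are hypothetical. Note that the wrapper only synchronizes individual method calls; a sequence of calls that must happen together still needs a CFLOCK.

```cfml
<!--- A plain LinkedHashMap is NOT synchronized --->
<cfset rawMap = createObject("java", "java.util.LinkedHashMap").init() />
<!--- Collections.synchronizedMap() is a static method, so no init() call is needed --->
<cfset collections = createObject("java", "java.util.Collections") />
<!--- application.safeMap is a hypothetical name for the shared, synchronized wrapper --->
<cfset application.safeMap = collections.synchronizedMap(rawMap) />
<cfset application.safeMap.put("widget", 4) />
```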

Well, we know we don't have to CFLOCK all access to shared scopes (that went the way of the dodo in CFMX), but when do we have to lock? We again look at the type of operation we need. Clearly reading and writing single variables doesn't qualify, but reading and writing multiple variables does.

So what does this really mean? If you ever write a variable in a shared scope, and any code that depends on it also depends on any other shared value, you must lock all access to the shared variable, both read and write. Ouch. That's a lot of locking, because every variable has to be written, or it wouldn't exist, so that means you have to lock everything except stand-alone variables.
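
To make that concrete, here is a sketch with two application variables that only make sense as a pair. The names orderCount and lastOrderAt are hypothetical, and both are assumed to have been initialized elsewhere.

```cfml
<!--- Writer: both values change together, so take an exclusive lock --->
<cflock scope="application" type="exclusive" timeout="10">
  <cfset application.orderCount = application.orderCount + 1 />
  <cfset application.lastOrderAt = now() />
</cflock>

<!--- Reader: a readonly lock keeps a writer from changing the pair mid-read --->
<cflock scope="application" type="readonly" timeout="10">
  <cfset orderCount = application.orderCount />
  <cfset lastOrderAt = application.lastOrderAt />
</cflock>
```

Readonly locks can be held by many requests at once, so readers only wait when a writer holds the exclusive lock.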

But fret not, because CFLOCK isn't the only way to lock variable access. You can use some tricks to avoid having to use CFLOCK all over the place. The best one is for application variables that get initialized once, and never change. Since there is only one write event, we can break their lifecycle in two: the write phase, and the read phase. All we need to do is ensure that no request will EVER get to the read phase before the write phase is complete, and that no request will EVER perform the write phase if it has already been performed. If we do that, then we never need to use CFLOCK on application variable reads. The question, of course, is how do we do that? Here's the way I prefer (in Application.cfm, or the root settings file):

 1. <cflock scope="application" type="readonly" timeout="10">
 2.   <cfset isAppWritten = structKeyExists(application, "appWritten") />
 3. </cflock>
 4. <cfif NOT isAppWritten>
 5.   <cflock scope="application" type="exclusive" timeout="10">
 6.     <cfif NOT structKeyExists(application, "appWritten")>
 7.       <!--- set your app variables --->
 8.       <cfset application.appWritten = true />
 9.     </cfif>
10.   </cflock>
11. </cfif>

Why does this work? First we test if we're through the write phase (lines 1-4). If we are, great, otherwise we have to attempt to perform it ourselves. Assuming it's not complete, we then get a lock on the initialization code (line 5). Once we get the lock (potentially waiting for other requests to release it), then we again check if the write phase is complete (line 6). We need the second check, because it's possible that while we were waiting for the lock, another request might have finished. If it's still not done, then we perform the write phase and exit the lock (lines 7-11).

There is a slight fudge going on for efficiency. The outer CFIF is unneeded, because the inner one will work by itself (though the reverse is NOT true). However, getting exclusive locks is expensive (and kills scalability), so we want to avoid it where possible, especially since this code will be executed by EVERY request. The outer CFIF is ensuring that no request will have to get the lock unless it comes in before the first request finishes the write phase, which basically translates to never.

"But what about CFC instance variables?", you're probably saying. "They're application variables too, and they're definitely going to get manipulated, or they'd just be normal application-scope variables." Time for another 'trick', though this one is far less sneaky: we don't have to lock application variables only with a scope="application" CFLOCK.

Instead, inside our CFC, we'll lock all access to instance variables using a named lock. Then the non-CFC application code can still reference the application-scope instances without acquiring a lock, but we retain our ability to prevent race conditions. I prefer to use a UUID for my locking, which is set in the init() method of the CFC into an instance variable. That UUID is then used to lock all instance variable access using a named CFLOCK in exactly the same way as we'd used scoped CFLOCK for "normal" variables.

<cffunction name="init">
  <cfset variables.my.uuid = createUUID() />
  <!--- set inventory variables --->
</cffunction>

<cffunction name="getInventory">
  <cfreturn variables.my.inventory />
</cffunction>

<cffunction name="setInventory">
  <cfargument name="inventory" />
  <cflock name="#variables.my.uuid#" type="exclusive" timeout="10">
    <cfset variables.my.inventory = arguments.inventory />
    <cfset variables.my.inventoryUpdate = now() />
  </cflock>
</cffunction>
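
If a caller needs both values as a matched pair, a reader along these lines could take the same named lock in readonly mode. getInventoryStatus is a hypothetical method, not part of the component above.

```cfml
<!--- Hypothetical reader returning both values as one consistent snapshot --->
<cffunction name="getInventoryStatus" returntype="struct">
  <cfset var status = structNew() />
  <!--- Same lock name as setInventory, so a write can't land between the two reads --->
  <cflock name="#variables.my.uuid#" type="readonly" timeout="10">
    <cfset status.inventory = variables.my.inventory />
    <cfset status.inventoryUpdate = variables.my.inventoryUpdate />
  </cflock>
  <cfreturn status />
</cffunction>
```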

There are two caveats:

  1. CFCs don't have real constructors, meaning that it's possible to call the init() method multiple times (bad) or call other methods before calling init() (even worse). What does this mean? You need to take a couple of precautions. First, all methods should fail if init() hasn't been called. Most CFCs are like this anyway, because they depend on initialization parameters (like a DSN). Second, calls to the init() method must be externally locked. Fortunately, since we're creating and initializing all our application-scope CFCs within the locking framework discussed above, that's already taken care of as well. Just be careful of non-application-scope CFCs (like session-scope).
  2. This type of locking only keeps the CFC's internals in sync. It is still susceptible to the exact same problem we ran into with the first example using two queries (coincidentally, performing the exact same operation). So if in our application code (outside the CFC) we call getInventory() and follow with a setInventory() that uses the value, we still have to lock it on our end, just like the first example.
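
The second caveat might look like this in calling code. Both application.inventoryService (an application-scope instance of the CFC above) and the lock name inventoryDecrement are hypothetical. Every piece of code that performs this read-then-write sequence has to use the same lock name for it to help.

```cfml
<!--- application.inventoryService and the lock name are hypothetical --->
<cflock name="inventoryDecrement" type="exclusive" timeout="10">
  <!--- The read and the dependent write happen as one single-threaded unit --->
  <cfset current = application.inventoryService.getInventory() />
  <cfset application.inventoryService.setInventory(current - 1) />
</cflock>
```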

For external resources, locking is a bit trickier. Files on the local file system are easy: always use a named lock on the canonical absolute pathname. Files on a remote filesystem shared between servers are problematic, because there's no way to use CFLOCK across multiple servers. You'd have to use some kind of semaphore file, and then lock access to that, and it turns into a mess very quickly. External objects can usually be locked using their class name (like files), if they're local. Remote shared objects should have built-in synchronization.
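
For the local file case, a sketch might look like the following. The path logs/audit.log is hypothetical and its directory is assumed to exist. Using java.io.File to canonicalize the path is one way to make sure two different spellings of the same file end up with the same lock name.

```cfml
<!--- logs/audit.log is a hypothetical file; expandPath() makes the path absolute --->
<cfset logPath = expandPath("logs/audit.log") />
<!--- getCanonicalPath() removes "." and ".." segments so equivalent paths match --->
<cfset canonicalPath = createObject("java", "java.io.File").init(logPath).getCanonicalPath() />
<!--- The canonical path doubles as the lock name --->
<cflock name="#canonicalPath#" type="exclusive" timeout="10">
  <cffile action="append" file="#canonicalPath#" output="#now()#: inventory updated" />
</cflock>
```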

14 responses to “Locking in CFMX”

  1. barneyb

    It was pointed out after I wrote this that locking is not required if you don't care if there are slight errors in application logic that could result from race conditions. Memory corruption will never result from a lack of locking in CFMX (unlike previous versions), only logic errors.

  2. joshua frankamp

    thanks for your post, i realized a mistake of mine about synchronized objects… I am using a linkedhashmap in a shared cfc field. I needed to use the Collections static method to return a thread safe map.

  3. Barney

    Synchronized objects don't avoid race conditions if a condition can exist across multiple operations (i.e. a contains() and then a subsequent add()); they only protect against single-operation race conditions. So basically, getting a thread-safe object isn't going to help you solve the same problems that CF locks will solve. And more importantly, if you use CF locks properly, there will be no need for a synchronized object, because CF will be single-threading all access to the object. For non-single-threaded objects (i.e. ones you access outside CFLOCK blocks) that are accessible to multiple requests, you should always use a synchronized version as you describe. That's the reason that CF uses Vector over ArrayList for CF arrays, and Hashtable over HashMap for CF structs.

  4. Cynthia

    I only read the beginning of your article, can't continue because I am shaking my head so much. Your first example is odd to say the least. The normal CF way of ensuring that the queries run as one unit would be to wrap them with a cftransaction. There are far better examples of the use of cflock… why not start with the common example of locking write to Application or session variables??

    This statement is misleading as well: "a multi-step process concerning a resource that is shared across requests, where later steps depend on results of earlier steps, or the steps must happen all-or-nothing."

    Um, no, not really. I need to lock my writes to shared scope variables not because there are multiple *steps* involved, but because there could be multiple *threads*.

    You are really going to confuse newbies with this post.

  5. Cynthia

    >Regarding your second point, you're wrong. A race condition can ONLY
    >exist where there are multiple steps.

    You're missing the point. People conceive of writing to an Application or session variable as one "step". Their confusion will be reinforced by the first example you used, where clearly there are two steps. Do query one, then do query two.

    I remember having a hard time understanding why I needed to lock when I was first starting out. I corresponded with Ben Forta and he cleared everything up, and I can tell you that the way you are explaining things is most definitely going to leave newbies (and probably some others as well, sorry to say) with the wrong ideas.

  6. Acker

    The first example of course is ehh, but it does demo two processes that take time to execute, and could potentially be cross threaded. The 3rd example is where it's at for me.

    I think I've come to find you can't store a cfc in the application scope if it needs to be thread safe. I did find I could store a cfc that has not been init(), and then init it to the request scope. All you would gain here is that the cfc is loaded and waiting for use in a non-shared scope.

    I'm having the issue that this particular cfc at work requires a dsn, dbo, and was stored in the application scope. Problem was that if another request with a different dsn/dbo came through, and the application name was the same, they'd cross thread.

    All in all I wanted to post somewhere that I don't think you can safely store a cfc that counts on its variables remaining the same for a full request.

    TIP!!!! ColdFusion 8 has cfThread with attribute action="sleep", this made it very easy for me to loop from="1" to="100" and sleep for 200 milliseconds on every loop. So it made it much much much easier for me to run other machines/browsers to attempt a cross thread.

    Damn Decent post BarneyB. I've got a "Cynthia" type at my job too, mine is an impatient woman who wouldn't finish reading an article either, and quick to bash an article with comments such as "keep my hands from shaking".

  7. Acker

    BarneyB, you're correct. At my job they are trying to keep the CFC in the application scope, but then they mail-blasted that they needed to move the CFC into the request scope to make it thread safe.

    I was so focused on the thread safe part, that I didn't consider the simplistic fact that it's in a shared scope.

    To fix the problem I came up with the idea of separate application names, per dsn name.

    B, could you please elaborate what you meant by "but if you just made the dsn/dbo arguments that are passed into the CFC, the problem should go away" … I think something's missing in your statement.

  8. Acker

    I gotcha … Yeah we would have to find/replace all calls to the CFC methods to always supply dsn/dbo … for now the CFC lies in the request scope, and is re-initiated with every request =(

    I say my company needs to make an application name per project.
    -Acker

  9. Acker

    B,
    I thought and suggested that same idea. Mine was more like: Let's keep the CFC in the application scope for loading speed, but INIT() it to the request scope at each request. Basically saying the same thing you're saying, but your method is more logical (although you're suggesting to use a shared variable within a cfc, which is advised against, but doable).

    Cool Cool BarneyB, best single threaded blog about cross threading in shared scopes I've ever seen.
    -Acker