To Closure Or Not To Closure

Sean, programmer extraordinaire, released a closures library for CF, and it's generated a lot of interest. Unfortunately, a lot of that interest has been of the "what are closures and what are they good for" type. So I thought I'd contribute my thoughts.

Closures, at the most basic level, are functions defined inside a scope of some sort. You can then pass the function around and call it whenever you like, and the function continues to execute as if it were still inside the scope it was defined in. Closures (and their close derivatives) are often found in multithreaded environments as callback functions. Every JS Remoting library that I've played with (along with Flash Remoting) uses callbacks for responding to request completion. What you may not have been aware of is that the callback functions are closures, not just simple functions.
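
As a minimal sketch of that definition (makeCounter is a made-up name for illustration, not part of any library): the inner function is created inside a call to the outer function, gets handed back, and keeps working with the variable from the scope it was defined in.

```javascript
// makeCounter is a hypothetical helper, used only to illustrate the idea.
function makeCounter(start) {
  var count = start; // local to this particular call of makeCounter

  // The returned function closes over `count`.
  return function() {
    count = count + 1;
    return count;
  };
}

var next = makeCounter(10);
var other = makeCounter(0); // a separate call, so a separate `count`

var first = next();   // 11
var second = next();  // 12
var third = other();  // 1
```

Each call to makeCounter creates a fresh scope, so the two counters never interfere with each other.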

I'm going to avoid using Sean's library, because the syntax it entails makes the issue a bit more confusing than it needs to be. Instead, I'm going to use JavaScript, since that should be pretty familiar to everyone who reads this. Here's a JS Remoting invocation using a nonexistent Remote library:

function getSomethingById(id) {
  Remote.invoke("http://domain.com/neat_service/" + id, function(response) {
    recordSomethingById(id, response);
  });
}

What's going to happen is that when the request finishes, the server's response  is going to be passed as an argument to the callback function.  What's not quite so obvious is that the callback function is going to be invoked LONG after this bit of code actually runs to completion (because the remoting request is asynchronous).  What's amazingly useful is that the callback function can use variables that are defined outside the callback, but inside the outer function (e.g. the id parameter), even though the outer function has long since returned.

This is the essence of closures. Not only are they a discrete block of functionality that can be passed around (i.e. a function), but they carry with them the scope they were defined in, and can refer back to it any time they are executed. In this case, the use is pretty bland: the closure simply acts as an adapter for the recordSomethingById method, because the raw callback's signature doesn't match. They can be used for much more interesting things, however.
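
To watch the "long since returned" part happen, here's a rough sketch with a stand-in for the remoting library. Everything about this Remote object is an assumption made up for the example: invoke() just queues the callback, and flush() plays the part of the responses arriving later.

```javascript
// Hypothetical stand-in for a remoting library (not a real API).
// invoke() only queues the callback; flush() simulates responses arriving.
var Remote = {
  pending: [],
  invoke: function(url, callback) {
    this.pending.push({url: url, callback: callback});
  },
  flush: function() {
    while (this.pending.length > 0) {
      var request = this.pending.shift();
      request.callback("response for " + request.url);
    }
  }
};

// Hypothetical sink for the results, keyed by id.
var recorded = {};
function recordSomethingById(id, response) {
  recorded[id] = response;
}

function getSomethingById(id) {
  Remote.invoke("http://domain.com/neat_service/" + id, function(response) {
    // `id` belongs to a call of getSomethingById that has already returned.
    recordSomethingById(id, response);
  });
}

getSomethingById(7);
getSomethingById(42);

// Both outer calls have returned, and nothing has been recorded yet.
var recordedBeforeFlush = Object.keys(recorded).length; // 0

Remote.flush();
// recorded[7]  is "response for http://domain.com/neat_service/7"
// recorded[42] is "response for http://domain.com/neat_service/42"
```

Each callback remembers the id from its own call, even though both calls finished before either callback ran.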

Imagine, if you will, a collection of "things". You need to sort those things, but the code in charge of building the collection (e.g. hitting the DB) is not the same as the code in charge of specifying the sort criteria. One approach would be to force the sorting code to specify the sort order to the collection building code (for use in an ORDER BY). That is often nonviable because the sorting code doesn't call the building code; it just gets handed a collection. Another would be for the sorting code to get the unsorted collection and manually sort it with a sorting algorithm written for the specific type of "thing" the collection is made of. That means less coupling, which is good, but imagine having five (or 500) different types of things to sort: lots of sorting algorithms that are all basically the same and differ only in the comparison bits. Finally, the "right" solution: a generic sorting algorithm that doesn't understand how to compare objects, just how to sort them. You pass it a collection and a closure that can do a comparison between any two "things" in the collection. So now you can use the same sorting algorithm regardless of collection type, as long as you can provide a closure for comparison. Here it is in code (again JavaScript):

 


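// Generic bubble sort. comparator(x, y) returns true when x belongs before y.
// Works on a copy, so the original collection is left untouched.
function sort(collection, comparator) {
  var sorted = collection.slice();
  var n = sorted.length;
  for (var i = 0; i < n - 1; i++) {
    for (var j = 0; j < n - i - 1; j++) {
      if (comparator(sorted[j + 1], sorted[j])) {
        swap(sorted, j, j + 1);
      }
    }
  }
  return sorted;
}

// Exchange the elements at positions i and j.
function swap(array, i, j) {
  var temp = array[i];
  array[i] = array[j];
  array[j] = temp;
}

var c = [
  {name: "barney", height: 195, weight: 80},
  {name: "heather", height: 182, weight: undefined},
  {name: "jerry", height: 185, weight: 81}
];

var c_by_height = sort(c, function(a, b) {
  return a.height < b.height;
});
var c_by_name_reverse = sort(c, function(a, b) {
  return a.name > b.name;
});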

 

Notice that the sort function is totally generic, and that sorting an arbitrary collection of arbitrary elements is as simple as writing a comparison function.  In this case I've opted for a bubble sort, but imagine the freedom you have; at any point in the future you could reimplement the sort method with a mergesort and magically all your sort operations would be updated.
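
To make that concrete, here's a rough sketch of what a mergesort drop-in could look like. It assumes the same comparator convention as the bubble sort above (comparator(x, y) is true when x belongs before y) and returns a new array rather than sorting in place; that return behavior is my assumption, not something the original code spells out.

```javascript
// Mergesort with the same signature as the bubble sort version.
// comparator(x, y) is assumed to return true when x belongs before y.
function sort(collection, comparator) {
  if (collection.length <= 1) {
    return collection.slice();
  }
  var middle = Math.floor(collection.length / 2);
  var left = sort(collection.slice(0, middle), comparator);
  var right = sort(collection.slice(middle), comparator);

  var merged = [];
  var i = 0;
  var j = 0;
  while (i < left.length && j < right.length) {
    // Take from the right only when it must come before the left element,
    // which keeps equal elements in their original order.
    if (comparator(right[j], left[i])) {
      merged.push(right[j++]);
    } else {
      merged.push(left[i++]);
    }
  }
  // One side is exhausted; append whatever remains of the other.
  return merged.concat(left.slice(i), right.slice(j));
}

var c = [
  {name: "barney", height: 195, weight: 80},
  {name: "heather", height: 182, weight: undefined},
  {name: "jerry", height: 185, weight: 81}
];

// The calling code and its comparison closures are unchanged.
var c_by_height = sort(c, function(a, b) {
  return a.height < b.height;
});
var c_by_name_reverse = sort(c, function(a, b) {
  return a.name > b.name;
});
// c_by_height:       heather, jerry, barney
// c_by_name_reverse: jerry, heather, barney
```

The callers never find out that the algorithm changed, which is the whole point of handing the comparison in as a function.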

How about a more practical issue? I don't know about anyone else, but transaction management is one of the more frustrating aspects of CF to me. In particular, there's simply no good way to say "if I'm not in a transaction, start one; otherwise just use the one I'm already in". A solution (that does not work on BlueDragon, by the way, because of different connection pooling models) is to implement a transaction management CFC that manages transactions directly with the database instead of using CFTRANSACTION. There are definite downsides, like the horrific mess you can get into if you don't trap exceptions properly, but the upsides are very nice (not having to worry about transaction nesting). The problem is that you end up with boilerplate code all over the place:

...
<cfset variables.transactionManager.enterTransactionalBlock() />
<cfquery ...>
...
</cfquery>
<cfset variables.transactionManager.exitTransactionalBlock() />
...

Not a big deal, but if you leave it out, or only put in half of it, you can get some weird issues. Now if we were to have a "good" language-supported closure implementation, we could do this:

<cffunction name="callback">
  <cfquery ...>
  ...
  </cfquery>
</cffunction>
<cfset variables.transactionManager.doInTransaction(callback) />

Not hugely different, I'll admit, but notice that the doInTransaction method totally encapsulates all the transaction-related stuff, be it a CFTRANSACTION tag or some custom mumbo-jumbo. As such, you can transparently switch back and forth as needed, and because closures carry the scope they are defined in, that CFQUERY tag in there can reference variables local to where it's defined even though it's executing inside the doInTransaction method of some other CFC. This is a ludicrously powerful concept, though one that, if applied to the wrong tasks, can lead to very nasty code.
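
Since this needs real closures, here's a rough JavaScript sketch of the shape doInTransaction could take. The TransactionManager below is entirely hypothetical: it only logs "begin", "commit" and "rollback" rather than talking to a database, and it uses a simple depth counter to get the "join the transaction I'm already in" behavior.

```javascript
// Hypothetical transaction manager; the log entries stand in for real
// database calls.
function TransactionManager() {
  this.depth = 0;
  this.log = [];
}

TransactionManager.prototype.doInTransaction = function(callback) {
  // Only the outermost call actually begins and ends the transaction.
  var outermost = (this.depth === 0);
  if (outermost) {
    this.log.push("begin");
  }
  this.depth++;
  try {
    var result = callback();
    if (outermost) {
      this.log.push("commit");
    }
    return result;
  } catch (e) {
    if (outermost) {
      this.log.push("rollback");
    }
    throw e;
  } finally {
    this.depth--;
  }
};

var tm = new TransactionManager();

function saveUser(userId) {
  return tm.doInTransaction(function() {
    // userId is local to saveUser, yet this body runs inside doInTransaction.
    tm.log.push("update user " + userId);
    return tm.doInTransaction(function() {
      // The nested call joins the transaction already in progress.
      tm.log.push("audit user " + userId);
      return userId;
    });
  });
}

var saved = saveUser(17);
// tm.log is now: begin, update user 17, audit user 17, commit
```

The enter/exit boilerplate lives in exactly one place, so it can't be half-forgotten, and the exception path is handled once instead of at every call site.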

One last note, for those familiar with Java. Java doesn't have real closures because it doesn't have functions, but anonymous inner classes share most of the same traits, and usually the same purpose. They're not as syntactically elegant, but having multiple methods on an object rather than a single function can be quite useful in some scenarios. Java also forces you to mark as final all local variables (including method parameters) that your anonymous inner class will reference, which is an aid to performance and also helps demarcate which external references the closure depends on.

7 responses to “To Closure Or Not To Closure”

  1. Bruce

    Thanks for posting this. I read Sean's post on Closures and I was like WTF. Now I see their benefit. As a CF and Java developer I wasn't familiar with the concept, though as you explain so well Java has a closure-type feature.

    Concerning your last paragraph about Java's anonymous inner classes–I believe that only local variables must be marked as final, instance fields can be referenced by the anonymous inner class and don't have to be marked as final.

    See: http://www.developer.com/java/other/article.php/3300881

  2. Spike

    Hi Barney,
    You can use callbacks just fine in ColdFusion.

    The code below outputs the content of the query exactly as you would want it to (assuming the code shows up in the comments of course.).

    Obviously you'd want to replace the opening and closing cftransaction tags with your transactionManager.enterTransactionalBlock() and transactionManager.exitTransactionalBlock() calls, but apart from that it should work fine.

    <cffunction name="getPages">
    <cfset var q = "" />
    <cfquery datasource="mySite" name="q">
    SELECT *
    FROM pages
    </cfquery>
    <cfreturn q />
    </cffunction>

    <cffunction name="transact">
    <cfargument name="queryMethod" />
    <cfset var result = "" />
    <cftransaction>
    <cfset result = queryMethod() />
    </cftransaction>
    <cfreturn result />
    </cffunction>

    <cfset result = transact(getPages) />
    <cfdump var="#result#" />

  3. Sean Corfield

    I think Barney's point was that what you really want is for the query to be able to bind to local variables:

    ").bind(userId=userId)
    )>

    And this will bind the local userId variable into the query but still execute inside the transaction. The cf variable is a ClosureFactory instance and doInTransaction() would need to do callback.call() instead of just callback().

  4. Sean Corfield

    I think Barney's point was that what you really want is for the query to be able to bind to local variables:

    [cfset var userId = arguments.event.getValue("id")>

    [cfset variables.transactionManager.doInTransaction(
    cf.new(" [cfqueryparam … #userId# ..> [/cfquery> [cfreturn theQuery>").bind(userId=userId)
    )>

    And this will bind the local userId variable into the query but still execute inside the transaction. The cf variable is a ClosureFactory instance and doInTransaction() would need to do callback.call() instead of just callback().

  5. Sean Corfield

    Oh I give up… my code fragments just aren't going to show… I hope folks can see what I'm trying to suggest…

  6. Barney

    Bruce,

    Good catch on the Java anonymous inner classes. You are absolutely correct; only local variables (including method parameters) need to be marked as final. The reason for this is apparent with a little thought: those variables only last for the duration of the method invocation, while the outer class's instance variables persist until the instance dies. So the instance variables are inherently "final enough". I will amend the entry to that effect.

  7. Barney

    Spike,

    A callback is not a closure. Callbacks don't carry the scope they're defined in, so the callback mechanism you propose would correctly resolve external variable dependencies only if the variable scope that the callback is defined in is the same as the variable scope the transact() function executes in. Chances are slim that'll be the case. With a closure, the callback's variable scope would come along for free, so you don't have the issues.

    Sean hit the nail right on the head. A callback is only 20% of what a closure is, and without the full closure capability, the callback mechanism would only suffice for very simplistic operations (ones with no external variable dependencies).