On-The-Fly YUI Compressor

A couple years ago I wrote about using YUI Compressor to do build-time aggregation and compression of static assets.  That works well and good if you have a build environment, but that's not always the case.  So, still using YUI Compressor, I set up a simple script that'll do runtime aggregation and compression of assets using my favorite mod_rewrite-based file caching mechanism.

The basic idea is that your HTML includes a reference to "agg_standard.js", which is an alias for whatever JS files you need for your standard user (as opposed to a mobile user, for example).  That request comes to your server and if the file exists, gets served back like any other JS file.  If it doesn't exist, however, mod_rewrite will forward it to a CFM page to generate it:

RewriteCond     %{REQUEST_FILENAME}     !-s
RewriteRule     (/my_app)/static/(agg_.*)\.js$ $1/aggregator.cfm?name=$2.js
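If the rewrite logic is hard to picture, here is a rough Python sketch of the decision those two lines make.  It's only a simulation for illustration, not how Apache implements it: the `rewrite` function and its `doc_root` argument are made-up names, and it assumes REQUEST_FILENAME resolves to the URL path under the document root.

```python
import os
import re

# Same pattern as the RewriteRule above.
RULE = re.compile(r"(/my_app)/static/(agg_.*)\.js$")

def rewrite(url_path, doc_root):
    """Return the path Apache would end up serving for url_path."""
    fs_path = os.path.join(doc_root, url_path.lstrip("/"))
    # RewriteCond ... !-s : only rewrite when there is no non-empty file on disk.
    if os.path.isfile(fs_path) and os.path.getsize(fs_path) > 0:
        return url_path
    match = RULE.search(url_path)
    if match is None:
        return url_path  # not an aggregate request, leave it alone
    # $1/aggregator.cfm?name=$2.js
    return f"{match.group(1)}/aggregator.cfm?name={match.group(2)}.js"
```

So the first request for "/my_app/static/agg_standard.js" comes back as "/my_app/aggregator.cfm?name=agg_standard.js", and once the file has been written to disk the same call returns the path untouched.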

In our example, the request passed to aggregator.cfm would have "agg_standard.js" as the 'name' URL parameter, which is how we'll figure out what we need to aggregate together:

switch (url.name) {
case "agg_standard.js":
  files = [
    "jquery/jquery" & (request.isProduction ? ".min" : ""),
    "jquery/jquery.ui" & (request.isProduction ? ".min" : "")
    // ... plus whatever else the standard bundle needs
  ];
  break;
case "agg_iphone.js":
  files = [
    "jquery/jquery" & (request.isProduction ? ".min" : "")
    // ... plus whatever else the iPhone bundle needs
  ];
  break;
}

The important bit, of course, is the Groovy script that actually does the aggregation and compression.  It uses YUI Compressor, so you'll need to have the yuicompressor-x.y.z.jar file on your classpath (probably in /WEB-INF/lib).  Here it is:

import com.yahoo.platform.yui.compressor.*

sw = new StringWriter()
variables.files.each {
  def f = new File(variables.STATIC_DIR + it + '.js')
  // label each file so the aggregated output is easier to read
  sw.append('/* ').append(f.name).append(' */\n')
  if (! it.endsWith(".min") && ! it.endsWith("-min")) {
    // not minified yet: compress straight into the writer
    def compressor = new JavaScriptCompressor(f.newReader(), null)
    compressor.compress(sw, -1, false, false, false, false)
  } else {
    // already minified: append the file as-is
    sw.append(f.text)
  }
}
variables.buffer = sw.toString()

Pretty straightforward: it just loops over the files, using a StringWriter to build up the aggregated buffer.  Each file is either compressed into the Writer or simply appended, depending on whether it has already been minified (based on ".min" or "-min" in the filename).  Each file also gets a comment label in the Writer above its contents so that the aggregated file is a little easier to read (at the expense of a few extra bytes).  Once done, the Writer's contents are stored in the 'buffer' variable for caching on the filesystem:

<cfset fileWrite(STATIC_DIR & url.name, buffer) />
<cflocation url="#url.name#" addtoken="false" />

You'll notice that I'm not streaming the buffer back out to the user, but instead 302-ing back to the same URL.  This is important.  The reason is that Apache does a whole bunch of stuff to optimize static assets, and if I serve the content back with CFCONTENT, I'll miss out on all of that.  Yes, the 302 has a little bit of overhead on the initial page load, but it reduces the total transfer size by several hundred KB (because of the GZIPping), and avoids a re-request on the next page load (because of the cache headers).  So it's completely worth it, all the more so because this is an application likely to generate extended usage rather than a content-centric site that is likely to see single-page "bounce" visits from search engines.
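To tie the pieces together, here's a minimal Python sketch of the whole cache-miss path: aggregate, write the file, then redirect instead of streaming.  It is not the CFML/Groovy code from this post.  The function names are made up for illustration, and `compress` is a placeholder for YUI Compressor (it defaults to doing nothing), so treat it as a sketch of the flow rather than a working minifier.

```python
import os

def aggregate(static_dir, files, compress=lambda source: source):
    """Concatenate the named files, labeling each and skipping pre-minified ones."""
    parts = []
    for name in files:
        path = os.path.join(static_dir, name + ".js")
        with open(path, encoding="utf-8") as fh:
            source = fh.read()
        # Comment label above each file's contents.
        parts.append("/* " + os.path.basename(path) + " */\n")
        already_minified = name.endswith(".min") or name.endswith("-min")
        parts.append(source if already_minified else compress(source))
    return "".join(parts)

def handle_cache_miss(static_dir, name, files, compress=lambda source: source):
    """Write the aggregate to disk, then answer with a redirect rather than the body."""
    buffer = aggregate(static_dir, files, compress)
    with open(os.path.join(static_dir, name), "w", encoding="utf-8") as fh:
        fh.write(buffer)
    # Relative Location: the browser re-requests the same URL, and this time
    # the web server finds the file and serves it as a plain static asset.
    return 302, name
```

The point of returning only a status and a location is that the generated bytes never go through the application server's response; the second request is the one that picks up the compression and cache headers.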

The last piece of the puzzle is handling versioning of your assets.  When you change your JS file, you necessarily have to invalidate your cache (by deleting the files) so the aggregated version can be rebuilt with the new JS.  The easiest way to do that is to use pseudo-versioning of your assets.  You'll see that a lot of sites add a timestamp or a version number to their files (e.g., "arc/yui/reset_2.6.5.css" from Yahoo.com) so that when they update the file it gets a new filename, and is therefore redownloaded by everyone (because it doesn't exist in their cache).  That's great, but it means you have to rename your files all the time, which is kind of a pain.  But you can fake it:

<script type="text/javascript" src="static/agg_standard_#STATIC_ASSET_VERSION#.js"></script>

That'll generate a request to "agg_standard_15.js", as you might imagine, which isn't going to work so well.  But we can just change the 'switch' line from the earlier snippet to this:

switch (REReplace(url.name, "^(agg_.*?)(_[0-9]+)?\.js$", "\1.js")) {

Now it'll strip out that "fake" number string and switch on just "agg_standard.js", which is what we want.  But that 'fileWrite' call later will still use the full filename (with the number embedded).  That way subsequent requests will get the filesystem caching, the headers and GZIPping from Apache, and all the other love.  And when you rev your files, you need only increment the STATIC_ASSET_VERSION variable and you'll have a brand new set of virtual URLs for all your assets, no fuss, no muss.
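If you want to convince yourself the regex does the right thing, here's the same pattern in a small Python sketch.  The `bundle_key` and `cache_filename` helpers are names I've invented for illustration; the pattern itself is the one from the REReplace call.

```python
import re

# Same pattern as the REReplace call: an optional trailing "_<digits>" is dropped.
VERSIONED = re.compile(r"^(agg_.*?)(_[0-9]+)?\.js$")

def bundle_key(requested_name):
    """Name to switch on when choosing which files to aggregate."""
    return VERSIONED.sub(r"\1.js", requested_name)

def cache_filename(requested_name):
    """Name to write to disk: the full requested name, version number included."""
    return requested_name

print(bundle_key("agg_standard_15.js"))      # agg_standard.js
print(bundle_key("agg_standard.js"))         # agg_standard.js
print(cache_filename("agg_standard_15.js"))  # agg_standard_15.js
```

One thing to watch for: a bundle whose real name ends in an underscore and digits would lose that suffix too, so pick bundle names that don't look like version numbers.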

Oh, and just in case you're wondering, the aggregation and compression is fast.  If you've ever used the command line or Ant task, you might fear that it's slow, but most of the time you see there is from the JVM spinning up, not the actual compression.  Since this is all in-JVM, you don't pay any of that cost.  It's certainly not fast enough to have it run in production on every request (hence the file caching), but it's totally reasonable to do on your production box as part of deploying a new version of the app.  It's also probably fast enough to have running every request on your internal test/staging boxes, though that'll depend on how much you're aggregating/compressing among other things.
