Archive for the 'tools' Category

FB3lite as a Custom Tag

Last night at work Koen uncovered an issue with using FB3lite as a custom tag.  Inside the tag it calls "formUrl2Attributes" to merge the FORM and URL scopes into the attributes scope.  What I'd done incorrectly was omit the "don't override" parameter to the structAppend calls, so the URL and FORM scopes would supersede any existing attributes for the invocation.

During normal execution this is irrelevant.  It's similarly irrelevant if you don't have a collision between FORM/URL parameters and custom tag attributes.  However, if you do have a collision, the FORM/URL parameter wins, which is clearly incorrect.

To fix this, I added the missing third parameter to structAppend and reversed the order of the two calls (so FORM still overrides URL, as it always has).  You can grab the source directly, or pull from Subversion.
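
The merge order is easier to see in a small sketch.  This is Python rather than CFML, purely to show the logic; the function name mirrors the UDF, but the body is an illustration, not the FB3lite source.  It assumes plain dictionaries stand in for the three scopes, and it does not model CFML's case-insensitive struct keys.

```python
def form_url_to_attributes(attributes, url, form):
    """Merge FORM and URL into a copy of attributes without overwriting."""
    merged = dict(attributes)
    # With overwriting disabled, whichever scope is appended first wins,
    # so FORM has to go before URL for FORM to keep beating URL.
    for scope in (form, url):
        for key, value in scope.items():
            # setdefault only stores the value if the key is not present yet
            merged.setdefault(key, value)
    return merged
```

Called with attributes `{"id": 1}`, a URL scope of `{"id": 2, "page": "u"}` and a FORM scope of `{"page": "f"}`, the attribute keeps `id`, and FORM supplies `page`.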

Efficient Caching With mod_rewrite

Caching with mod_rewrite?  What?  I'll admit it's a slightly misleading title; the cache is actually a disk cache, but mod_rewrite is where the magic happens.  Bear with me for a moment…

Most content on the web is fairly static.  Some of it changes every few minutes, some changes every few hours, some changes a few times a month, and the vast majority of it changes approximately never.  However, a large percentage of it is generated dynamically, every request.  Maybe it's news articles, maybe it's thumbnails for images/pdfs/videos, maybe it's RSS feeds, but identical content is dynamically generated over and over again.  Huge waste of resources.

On the flip side, you can use pre-generation to build stuff ahead of time so you can serve everything statically.  However, that can be ridiculously expensive as well.  For example, my blog has several hundred (if not thousands of) distinct feeds available on it: the main one (listing posts), one per category (posts), one per author (posts), the main comment feed (listing comments), and one per post (comments).  Each of those is available in RSS 2.0, Atom 0.3, and RSS 0.92 formats.  Pregenerating those all the time is silly, because the vast majority of them will never be accessed, let alone frequently.

Ideally, we'd be able to generate these resources dynamically, on demand, but then keep the output around to serve back statically for subsequent requests.  This saves us the expense of pregenerating lots of stuff that will never be accessed, but gives us the speed of static access after the first request.

Duh, Barney, what's your point?

My point is that this isn't just the obvious solution conceptually; it's also ridiculously trivial to implement.  It'll take longer to read this post than to set it up.  As such, there's no excuse for being resource constrained on non-user-specific resources, even though this seems to be a really common complaint.

Here's a more concrete example.  Say I host photo galleries, allowing people to upload their full-size images, and I provide several views of the galleries with appropriate thumbnails.  Those pages are littered with things like this:

<img src="/gen_tn.cfm?id=12345&width=100&height=100" />

This is great, because I can create arbitrarily sized thumbnails without having to go back and regenerate them for all existing photos.  That's handy when I create a new layout and realize I want 125×125 thumbnails instead of 100×100, and then want to use 250×250 for the 'featured' section.  But I'm generating the thumbnails dynamically every request, which is a waste.  And adding caching in gen_tn.cfm is the wrong answer.  : )

First, let's change the URLs in the pages to look like this:

<img src="/tn/p12345-100x100.jpg" />

Same information as before, just packaged differently.  Then I'll use the following RewriteRule to (internally) turn it back into the original request to gen_tn.cfm (effectively a no-op):

RewriteRule  ^/tn/p([0-9]+)-([0-9]+)x([0-9]+)\.jpg$  /gen_tn.cfm?id=$1&width=$2&height=$3  [PT,L]

Lipstick on a pig, you might say, and you'd be almost right.  We now have normal-looking URLs for our thumbnails (lipstick), but they're still dynamically generated every request (on a pig).  This abstraction, however, is incredibly powerful.  Let's add a RewriteCond in front of that rule real quick:

RewriteCond  %{REQUEST_FILENAME}                     !-s
RewriteRule  ^/tn/p([0-9]+)-([0-9]+)x([0-9]+)\.jpg$  /gen_tn.cfm?id=$1&width=$2&height=$3  [PT,L]

That says to only do the RewriteRule if the requested file doesn't exist or is zero length ('-s' says a regular file with non-zero length, the '!' negates it).  Next step is to create the 'tn' directory in your web root and ensure it's writable by your application server.  You can probably see where I'm going with this…
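
If the rule and condition are hard to picture, here is the same decision as a small Python sketch.  It is only a model of what Apache does, not something you'd deploy: the `resolve` function and its return values are made up for illustration, and it assumes the requested URL maps straight onto a file under the web root.

```python
import os
import re

# Same pattern as the RewriteRule above.
THUMB_URL = re.compile(r"^/tn/p([0-9]+)-([0-9]+)x([0-9]+)\.jpg$")

def resolve(url_path, web_root):
    """Return ("static", path) or ("dynamic", rewritten_url) for a request."""
    match = THUMB_URL.match(url_path)
    if match is None:
        # The rule does not apply; the request is left alone.
        return ("static", url_path)
    on_disk = os.path.join(web_root, url_path.lstrip("/"))
    # '-s' is true for a regular file with non-zero length; '!' negates it.
    if os.path.isfile(on_disk) and os.path.getsize(on_disk) > 0:
        return ("static", url_path)
    photo_id, width, height = match.groups()
    return ("dynamic", f"/gen_tn.cfm?id={photo_id}&width={width}&height={height}")
```

A missing or zero-length `/tn/p12345-100x100.jpg` comes back as the `gen_tn.cfm` URL; once the file has content, the same request stays static.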

The final step is to tweak gen_tn.cfm slightly.  Currently, it creates the thumbnail and serves it back to the client.  We need to change it so that before serving it back, it writes it to disk in that new 'tn' directory, using the appropriate filename.  Once that's done, send it to the client as usual.  The next time the thumbnail is requested, Apache will hit the RewriteRule, but the RewriteCond will not match (because the file exists and has non-zero length).  As such, it won't be rewritten to gen_tn.cfm, and will instead be served statically, directly from disk, bypassing the application server completely.
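
Here's a rough sketch of that write-then-serve step, in Python rather than CFML.  The `serve_thumbnail` name and the `render` callback are hypothetical stand-ins for whatever gen_tn.cfm already does to build the image.  The temp-file-and-rename dance is my own precaution, not part of the technique as described: it keeps Apache from ever seeing a half-written file.

```python
import os
import tempfile

def serve_thumbnail(cache_dir, photo_id, width, height, render):
    """Generate a thumbnail, store it under its URL filename, return the bytes."""
    name = f"p{photo_id}-{width}x{height}.jpg"
    final_path = os.path.join(cache_dir, name)
    data = render(photo_id, width, height)  # the expensive part
    # Write to a temp file in the same directory, then rename into place,
    # so a concurrent request never finds a partially written thumbnail.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # mkstemp creates owner-only files; open it up so the web server can read it.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, final_path)
    return data
```

The bytes go back to the client exactly as before; the only new behavior is the file left behind in the cache directory.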

With those couple of simple changes, you suddenly have a ridiculously effective caching mechanism in place.

What about changes to the source, though?  You realize one of your photos (#12345) was miscropped, so you fix it and upload a new version, but you want your thumbnails to be regenerated too.  Fortunately, flushing the cache is as simple as deleting all files in 'tn' that match '*p12345-*.jpg' (the hyphen after the ID keeps photo #123456 from getting caught in the net).

Same thing goes for deletions.  If you decide you just want to remove photo #12345 completely and want to remove the thumbnails too, run the same deletion of '*p12345-*.jpg' from the 'tn' directory.  Or if you stop using 100 pixel thumbnails (like when I switched to 125×125 a few paragraphs ago), you can just delete '*-100x100.jpg'.

Because you're using the filenames as an index of sorts, it means you have to name your files carefully.  The filename needs to contain not only everything to uniquely specify the file (photo ID, width, and height in this example), but also everything that you might want to use for clearing the cache.  For example, if you need the ability to clear based on gallery ID you'd need to change the URL to '/tn/g123-p12345-125x125.jpg' or something.  In this case the gallery ID isn't needed for unique specification, only for flush selection.
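
Flushing by filename pattern is about as small as code gets.  This Python sketch assumes a hypothetical `flush` helper and the gallery-prefixed naming scheme from the paragraph above; note the hyphen after the photo ID in the pattern, which keeps photo 123456 from matching a flush of photo 12345.

```python
import glob
import os

def flush(cache_dir, pattern):
    """Delete cached files matching a glob pattern; return the names removed."""
    removed = []
    # glob.escape protects against special characters in the directory name.
    for path in glob.glob(os.path.join(glob.escape(cache_dir), pattern)):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)
```

So `flush("tn", "*p12345-*.jpg")` clears one photo, `flush("tn", "g123-*.jpg")` clears a gallery, and `flush("tn", "*-100x100.jpg")` retires a thumbnail size.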

The net of this is that you can hit that sweet spot: avoiding any extra work generating resources that aren't accessed, and never generating the same resource more than once.  Obviously the first request to a resource has to wait for generation, so this technique isn't suitable for all use cases, but it covers a huge swath of them.  It's especially well suited to situations where you have a large number of resources with relatively light usage across them, and/or you need the ability to change the derived resources' specifications (e.g. new thumbnail dimensions or new XML feed formats).

As you'd imagine, PotD (NSFW, OMM) uses this technique extensively for several classes of thumbnails as well as RSS feeds.  It also does some pre-generation where the first-request delay is unacceptable.  I also used this to great effect at my previous employer's for front-end caching of CMS-generated HTML pages.  We handled hundreds of millions of pages per day on a pair of single-P4 servers with 1GB of RAM each, with an average cache life of between two and four hours.

One significant gotcha is that you only get full-request caching with this technique.  I.e. you can't cache portions of a request's response, because it's either fully dynamic (the first request) or fully static (subsequent requests).  For example, most blogs have a "remember me" feature so you don't have to enter your information each time you want to comment, and that pre-filled, user-specific form can't live in a cached page.  In order to beat this, you need some sort of two-phase generation where the cache happens between the phases, and that means you have to have your application running "above" the cache.  Ajax can be used as the second phase, but that's a disaster waiting to happen, if you ask me.

Minor FB3lite Update (and a Weird CF Bug)

This evening while adding some reporting to PotD (NSFW, OMM) to help nail down some performance issues that I think are Apache's fault, I noticed a strange issue with FB3lite.  If you've used it, you know the core of the "framework" is the do() and include() UDFs.  Both contain a CFINCLUDE tag, and a weird situation arises with scoping.

CFML has implicit scope traversal, so if you have an unscoped variable, it will automatically traverse across a bunch of scopes until it finds a variable with the right name.  Further, you get an implicit local scope within UDFs, and you get a magic pseudo-scope within query-based CFOUTPUT and CFLOOP tags.

What I noticed was that looping over a query with a column named "template" always displayed the currently executing template, instead of the value from the query.  No worries, I thought, since prefixing it with the query name solved the issue.  "template" isn't a common column name for me, so while it surprised me I'd never noticed or heard about this magic "template" variable before, I didn't think too much of it.

Then a while later I realized it was my own fault, because of the include() UDF in FB3lite.  The argument to the UDF is named "template", and CFML places the implicit local scope at the top of the heap, including above the magic query-loop scope.  That argument was shadowing my query column.

I've fixed FB3lite to use a prefix on all its UDF arguments, so this problem cannot manifest itself anymore.  Well, I suppose it could, but you'd have to have a weird-ass variable name (e.g., "_fb3lite_template").  You can download the latest version, or pull it from Subversion.
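
The shadowing is easy to reproduce in miniature.  This Python sketch uses `ChainMap` as a stand-in for CFML's unscoped lookup; the three-scope ordering (function-local, then query row, then variables) is a simplification of the real traversal, and the template and column values are made up.

```python
from collections import ChainMap

def lookup(name, function_local, query_row, variables):
    """Resolve an unscoped name: first scope that has it wins."""
    scopes = ChainMap(function_local, query_row, variables)
    return scopes[name]

row = {"template": "invoice"}           # current row of the query loop
page = {"title": "Reports"}             # the variables scope

# Before the fix: the UDF argument is named "template" and hides the column.
before = lookup("template", {"template": "dsp_list.cfm"}, row, page)

# After the fix: the argument carries a prefix, so the column shows through.
after = lookup("template", {"_fb3lite_template": "dsp_list.cfm"}, row, page)
```

`before` ends up as the included file's name, while `after` is the value from the query row.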

CFYourFavoriteLanguage (Formerly CFGroovy)

CFGroovy grew some wings this afternoon.  It retains its core functionality of running Groovy code in a CFML environment, whether you have it installed on your classpath or it's transparently loaded from the local copy of the JAR.  However, it now supports any JSR 223 scripting language as well (assuming you're on a 1.6 or newer JVM).  Of the various choices, Groovy seems the best fit for CFML developers (hence the focus on this language), but I also tested Python (via Jython) and PHP (via Quercus).

Of the CFML engines, Railo 3.1 was the champ, running all three guest languages flawlessly.  ColdFusion 8.0.1 refused to run the Python example, not really sure why.  Open BlueDragon refused both Python and PHP.  All three run Groovy, of course, even with the conversion to use the JSR 223 interface (for consistency) instead of the "normal" GroovyClassLoader interface.

You can access any installed languages via the new 'lang' (or 'language') attribute; Groovy remains the default, of course.  Here's an example for PHP:

<g:script lang="php">
<?php
  $variables["myArray"][] = "it's some PHP";
?>
</g:script>

The empty brackets mean "create a new item at the end", so that line appends a string to the named array.

Latest mods are in Subversion, of course.

More CFGroovy2 Goodness

Last night at dinner I was talking with Mark Mandel and Luis Majano and realized I'd completely misunderstood the way JavaLoader worked based on my initial look-see.  So for the price of 21 additional lines (nine of which are purely for misbehaving CFML runtimes), CFGroovy will transparently load an internal copy of Groovy if it can't find one on the classpath.

I've created a branch in Subversion to house the new version at https://ssl.barneyb.com/svn/barneyb/cfgroovy/branches/cfgroovy2/engine/.  It's organized the same way as the trunk, so there is a ../demo/ directory that contains a trivial demo application.  Here's the demo template, so you can get a feel for how easy CFGroovy is to use:

<cfimport prefix="g" taglib="engine" />

<cfset variables.myArray = listToArray("barney is tall,CFML is taggy,Groovy is AWESOME!") />

<cfoutput>
<h1>Inline Groovy</h1>

<p>This demo creates an array of strings, CFDUMPs it, uses some inline
Groovy (via <code>&lt;g:script&gt;</code>) to add a few more, and
then CFDUMPs it again.
</p>

<h1>Only Three</h1>
<cfdump var="#variables.myArray#" />

<g:script>
// better add emery
variables.myArray.add("emery")
// and some other stuff, using some other syntaxes
variables.myArray += "CF Runtime: " + server.coldfusion.productname + " " + server.coldfusion.productversion
variables.myArray << "User Agent: " + cgi.http_user_agent
</g:script>

<h1>There we go!</h1>
<cfdump var="#variables.myArray#" />
</cfoutput>

CFGroovy in Forty Lines

It's been a couple months since I've done anything with CFGroovy.  I've been mulling how to get back to the essence, which is Groovy scriptlets in CFML.  Today at cf.objective() I put my fingers back on the keyboard for the first time.  Here's a full implementation of the <g:script> tag in 40 lines.  There are no bells and whistles, and no Hibernate support, just scriptlets.  But it's production worthy.

<cfsilent>
<cffunction name="getBinding" access="private" output="false" returntype="any"
  hint="I build and return the Binding for the GroovyScriptEngine.">
  <cfargument name="variablesScope" type="struct" default="#structNew()#" />
  <cfset var binding = createObject("java", "groovy.lang.Binding").init() />
  <cfset binding.setVariable("variables", variablesScope) />
  <cfif isDefined("url")>
    <cfset binding.setVariable("url", url) />
  </cfif>
  <cfif isDefined("form")>
    <cfset binding.setVariable("form", form) />
  </cfif>
  <cfset binding.setVariable("request", request) />
  <cfset binding.setVariable("cgi", cgi) />
  <cfset binding.setVariable("pageContext", getPageContext()) />
  <cfif isDefined("session")>
    <cfset binding.setVariable("session", session) />
  </cfif>
  <cfif isDefined("application")>
    <cfset binding.setVariable("application", application) />
  </cfif>
  <cfset binding.setVariable("server", server) />
  <cfreturn binding />
</cffunction>
<cfif thisTag.executionMode EQ "start">
  <cfif NOT structKeyExists(server, "cfgroovy.groovyLoader")>
    <cfset server["cfgroovy.scriptCache"] = createObject("java", "java.util.HashMap").init() />
    <cfset server["cfgroovy.groovyLoader"] = createObject("java", "groovy.lang.GroovyClassLoader").init() />
  </cfif>
<cfelse><!--- executionMode EQ "end" --->
  <cfset body = trim(thisTag.generatedContent) />
  <cfset thisTag.generatedContent = "" />
  <cfif NOT server["cfgroovy.scriptCache"].containsKey(body)>
    <cfset server["cfgroovy.scriptCache"].put(body, server["cfgroovy.groovyLoader"].parseClass(body)) />
  </cfif>
  <cfset script = server["cfgroovy.scriptCache"].get(body).newInstance() />
  <cfset script.setBinding(getBinding(caller)) />
  <cfset script.run() />
</cfif>
</cfsilent>

It does depend on having Groovy already available on your classpath (one of those missing bells and whistles is auto-loading Groovy).  This is the first version of the CFGroovy 2 line, which I've decided will be a ground-up rewrite.  Backwards compatibility is a goal, and I think it's a reasonable one, but where CFGroovy 1.0 was developed with an application focus, CFGroovy 2.0 will be developed with an architecture focus.  I'm willing to sacrifice a little backwards compatibility, particularly with Hibernate integration, to attain that.

I Love Apache

So after I got my new server online, I wanted to deal with my oversight on DNS TTLs.  Not surprisingly, mod_rewrite saved the day again.  First, since my old server is subject to the same TTLs, I added a record in my /etc/hosts file to point all of them at the new IP.  Then I changed my Apache config to run this rule set for all hosts:

RewriteEngine   On
RewriteCond     %{HTTP_HOST}   (.+)
RewriteRule     ^/(.*)$        http://%1/$1    [P,L]

Done.  That simply proxies any request with a host header through to the same exact URL, except that this time the DNS lookup is done by the old server, which has the /etc/hosts file to help it find the right place, instead of by the browser, which has a stale DNS cache.  Works like a champ.

Show Me Your Tool

If you read my blog regularly, chances are you write software and therefore can't, because your tools don't exist in the visual world.  They're just magic strings of minuscule magnets on a rapidly spinning chunk of plastic…

I took my chef's knife to the sharpener a few days ago.  Cost a whopping $4 to have him put a wicked edge on it, and I watched it happen.  I saw him carefully run the blade along a belt sander (for lack of a better term) a few times to give it the rough shape, then he used a bench grinder to finish the edge, a steel to hone it, and finally a jeweler's wheel to polish it.  Not five minutes elapsed before he handed it back to me, wrapped in butcher's paper for the journey home.

If he let me loose in his shop, there's no way I could have achieved the same result.  But given ten knives, I bet I could get a pretty good edge on the last few (after undoubtedly destroying the first couple).  Nothing like his result, to be sure, but significantly sharper than the initial state.

Sharpening a knife is a pretty simple task, because a knife is an inherently simple item, but it's just one example.  Consider a master furniture maker.  He can take the same wood you and I buy at Home Depot or Lowe's and with his tools and expertise turn it into a beautiful bureau or armoire.  Turn me loose in his shop and I'd probably be able to make a functional dresser in twice the time it'd take him to make an exquisite one.  With some more experience, both using the tools and in furniture construction overall, I've no doubt I could make something I'd be proud to have in my home.  It wouldn't be the same quality as something the master craftsman created, no question, but better than the prefab stuff you might otherwise buy.

So what's special about these tasks?  Nothing, really.  Most things are of a similar nature: cooking, playing music, grooming dogs, surfing, etc.  Attaining mastery of a given profession requires certain in-born characteristics, but attaining laudable proficiency is pretty much available to anyone willing to put in the time (barring physical disabilities and such).

Every chest of drawers provides a way to organize and store clothes.  People spend a lot of money on well made dressers that are made of pretty woods, appeal to their personal tastes (mission, contemporary, etc.), or are simply of a higher quality of manufacture.  None of which has the least to do with holding clothes.  Every dresser I've ever seen holds clothes with about equal proficiency, but even though the drawers are a bit sticky, I still use the one I had as a child.

Now consider software development.  The tools are invisible.  The process is invisible.  The result is intangible.  As far as a profession for a craftsman goes, programmers are fucked.  Sure, we get paid because people are willing to pay for the benefits of our software, but it's 100% functional.  No one buys software because it's well made or "pretty".  They might pick between two vendors because one is less error-prone, but that's still functional.

Every database application provides a way to organize and store data.  No one spends extra money on a database system because it was made with snazzy buttons, appeals to their sense of style, or was produced by a higher quality process.  Every database system I've used is inconvenient in one way or another (no OFFSET, no CTEs, etc.), and every one is built using some completely opaque process by unknown automatons in some office building somewhere.

It has occurred to me that the reason for this could simply be that programming is so damned hard it can't be automated.  As a result, there's no way to produce the gradations of craftsmanship that you see in dressers (from the mass-produced pressboard affairs to the hand-crafted hardwood masterpieces).  With software all you get are the hand-crafted versions.  Sure, some of them are simply horrible, but they're all hand-crafted.

Which brings me back to the point: programming is opaque for everyone that isn't also a programmer.  There's absolutely no way you can take your average Joe, sit him next to you while you write something, and then give him your workstation and have him do the same.  Unlike the furniture maker where a simple demo is enough to get the gist of what is happening, with software it's all abstract and divorced from anything tangible.  My mom (who is fairly technically adept) doesn't have any idea what the hell Subversion is, and even if I sufficiently explained it, there's no way she would understand how massively beneficial vendor branches are.  Heck, a lot of programmers don't understand vendor branches.  And yet a one-year-old can run her fingers over a piece of wood and tell you if you need to keep sanding (if not run it through the planer again).

Further, there's absolutely no way Joe (or my mom) can look at two pieces of software and compare their "quality" on any meaningful level.  He can make distinctions like "this one crashes more", or "that one has confusing icons", but that's it.   Even a competent programmer looking at a piece of software has the internals almost completely hidden from them.  Very careful observation can provide certain clues (the query optimizer must be making decision X based on inputs A, B and C), but by and large, everything is opaque.  Again, contrast this with a fine bureau where you can see the carefully planed wood, the perfectly matched dovetail joints on the drawers, and the complete lack of any visible fasteners.

I certainly consider myself a craftsman.  I hope to justify calling myself a master someday, but today is not that day.  And yet every evening, as I'm walking out the front door to head home, I think to myself how completely impossible my work is to appreciate.  My kids ask what I did today, and I have no meaningful answer to give.  The best I can do is "fixed some bugs", or "had an architecture meeting".  I can't explain to my non-programmer friends what I do, or why it has such appeal.  I live a dual life: a "normal" one and a programmer one.  They are as compatible as fire and ice.  I greatly enjoy the praise and criticism I receive from my peers regarding stuff I share, especially when it helps make others' lives easier, but I'd trade it all to bring something home from work one day, show it to Lindsay and Emery, and have them say "Wow, Daddy, that's amazing."

Firefox Spellcheck for Programmers

No, it doesn't exist, but shouldn't it?  I've found that when I write prose in Firefox, I almost invariably ignore the redlines.  Why?  Too many false positives.  Technical prose, which is usually what I'm writing, is littered with domain-specific terms that no spellchecker will ever consider valid.  Obviously you can help your specific spellchecker by adding words to the dictionary, but that can be a lot of work, and seems like it could easily be automated.  Why doesn't someone do this?

And no, this isn't a Firefox-specific issue.  But probably 85% of my prose is drafted in Firefox, and most of that remaining 15% is drafted somewhere that doesn't have redlining (iPhone, Outlook, etc.).

CFGroovy is Self Executing

Tonight I finished porting the internals of CFGroovy from CFML to Groovy.  Yes, the CFGroovy core is now implemented in Groovy.  The remaining CFML code is for managing the public API (which is a CFML API and therefore must remain CFML), and for bootstrapping the Groovy core.

This architecture provides a number of benefits, primarily a huge reduction in the amount of crazy CFML-based Java interactions.  If you ever get to thinking that doing reflection with CFML wouldn't be too bad, you're wrong.  It's like pulling teeth with scissors.  That is not a typo or an inadvertent mixed metaphor.  The internal code is now far shorter and more readable, though there is still some nasty CFML in there.  Fortunately, I was able to get bootstrapping done with only no-arg constructors, so no more need for type-based constructor selection in CFML, thank god.

Moving the core down into Groovy also moves one of my longer-term goals a bit closer to reasonableness.  I really want to create a persistence layer entirely in Groovy, manage it with an IoC container, and use it as a parent BeanFactory for a service layer (implemented with CFCs).  I tried a couple hacks to get this working with the 1.0 engine, and while both of them mostly worked, neither one worked all the way or was even remotely elegant.  Elegance isn't always possible, of course, but the lack of it is usually a red flag.  So I backed off until I had a better platform to approach it from.  But like I said, that goal is still a ways off.