Archive for the 'development' Category

CF Groovy Takes a Nap

After close to two weeks of struggling, I finally managed to deploy pure source to a CFML runtime (Railo, in this case), and get Groovy entities in and out of the database with Hibernate.  No compilation, no IDE, no development-mode server, just my Groovy source along with a hacked up CF Groovy.  This is very exciting for me, because while Groovy is cool and all, CF's dynamic nature has become a must-have.

I don't have source to download yet, just a horribly hacky proof of concept, but here's what the user of the "framework" does.  First, create your entity:

package com.barneyb
import javax.persistence.*

@Entity
class Person {

  @Id
  @GeneratedValue
  Long id
  Long version
  String name
  Date dob

  def getAgeInSeconds() {
    (new Date().getTime() - dob.getTime()) / 1000
  }

  def getAgeInDays() {
    ageInSeconds / 86400
  }

  def getAgeInYears() {
    ageInDays / 365.249
  }

  String toString() {
    "$name is $ageInYears years ($ageInDays days) old"
  }

}

That's the same class my initial CF Groovy demo used, except with JPA annotations added.  Because I'm binding to Hibernate, you can use any Hibernate-supported annotations, but I opted to stick with vanilla JPA.  Now we need our hibernate.cfg.xml, which is totally stock:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE hibernate-configuration PUBLIC
  "-//Hibernate/Hibernate Configuration DTD//EN"
  "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>

  <session-factory>

    <property name="current_session_context_class">thread</property>
    <property name="cache.provider_class">org.hibernate.cache.NoCacheProvider</property>
    <property name="show_sql">true</property>
    <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
    <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="hibernate.connection.url">jdbc:mysql://localhost:3306/dbname</property>
    <property name="hibernate.connection.username">username</property>
    <property name="hibernate.connection.password">password</property>
    <property name="hibernate.hbm2ddl.auto">update</property>

    <mapping class="com.barneyb.Person" />

  </session-factory>

</hibernate-configuration>

You could also map your classes manually with XML, if you wanted, but for the sake of simplicity, I've done it all with annotations.  You'll also notice the database connection information embedded directly in the file. This is far from ideal.  I'm hoping to extract it from a CFML DSN, but that might only be possible on Adobe ColdFusion (via the Admin API).  The last thing is the test script:

<g:script>
  import java.text.SimpleDateFormat

  import org.hibernate.*
  import org.hibernate.cfg.*

  import com.barneyb.Person

  sdf = new SimpleDateFormat("yyyy/MM/dd")
  sdf.lenient = true

  def sf = new AnnotationConfiguration()
      .configure().buildSessionFactory()
  def sess = sf.openSession()
  def tx  = sess.beginTransaction()
  sess.createQuery("delete Person").executeUpdate()
  sess.save(new Person(
    name: attributes.name ?: "Barney",
    dob: sdf.parse(attributes.dob ?: "1980/06/10")
  ))
  tx.commit()
  sess.close()
  sf.close()
</g:script>

No need to compile anything, no need to drop JARs all over your server, nothing.  Write, refresh, persist.  That includes adding new entities, new properties to existing entities, etc.

It's not ready for public consumption at the moment, but that's my top "free time" priority.  For the incredibly impatient, there's a branch in my SVN repository with the current source.  Don't expect anything pretty.  ;)

Query/Reporting DSL Bug Fix

Rob Pilic found a couple local variables that I (gasp!) forgot to var scope in the DSL implementations while he was troubleshooting some concurrency issues.  There was one in each file, though totally unrelated.  I've applied his patch to Subversion and updated the demo install (though it won't matter, because it doesn't share instances).

Licensing My Code

I occasionally get questions about licensing my code for inclusion in other projects.  I thought I'd make a blanket statement that all my code can be assumed to be under the Apache 2 license unless otherwise indicated.  Where it's not explicit, just drop me a line if you need formal licensing, and I'd be happy to set it up.  Most of what I do is rather informally released (i.e. put in public Subversion), so I usually don't take the time.

CFUNITED Day One

As is typically the case, CFUNITED has a pair of themes.  There's the conference theme, which, as always, is helping CF coders become more empowered by learning about new things (OO, using CFCs, learning frameworks, etc.), and then there's the "backtheme".  This year it's all "don't use only CF".  Adobe's integrating Hibernate into CF9, Railo is preaching the benefits of the JBoss platform (clustering, caching, Hibernate, etc.), Groovy has a lot of lovers, and Grails (which is Spring and Hibernate for Groovy) does too.

The integration of Hibernate all over the place is very exciting.  CF-based ORM tools suck, frankly.  That isn't to belittle Mark or Doug in any way; they've done a fantastic job.  The problems lie with CF itself.  With Railo's "CFC is a class" implementation, Hibernate is directly applicable.  With CF's crazy "a CFC is a bunch of classes in a Map" implementation, I'm not sure how Adobe's going to get it to work.  I'm very much hoping they fix the core issue (which would almost certainly give some nice performance benefits as well) instead of bastardizing Hibernate to get it to work, but we'll see.

Comparators in CF with Groovy

I don't know about anyone else, but wanting to create little ad hoc Java classes in my CF apps is a common occurrence for me.  With my CF Groovy project, it's both possible and very easy.  No Java, no compiling, no classloading hacks.

Here's a simple example.  I'm going to create an array of structs and then sort the array based on a specific field value (the 'date' field) of the structs.  Doing this in CF usually means making a lookup struct, sorting that, and then rebuilding the array (but only if you have unique values), or converting to a query and using QofQ to do it.  Both have serious drawbacks, as well as being very circuitous/obscuring.

First the array construction (six items, each with 'letter' and 'date' fields, holding what you'd expect).  Nothing very interesting here.

<cfscript>
a = [];
month = createDate(year(now()), month(now()), 1);
for (i = 1; i LTE 6; i = i + 1) {
    s = {
        letter = chr(65 + randRange(0, 25)),
        date = dateAdd("d", randRange(0, 30), month)
    };
    arrayAppend(a, s);
}
</cfscript>

Now, using the <g:script> tag from CF Groovy, we'll sort it by date using a java.util.Comparator:

<g:script>
Collections.sort(variables.a, {o1, o2 ->
    o1.date.compareTo(o2.date)
} as Comparator)
</g:script>

If you want to compare based on the letter, just change the two ".date" references to ".letter".  You can, of course, make the comparator as complicated as you'd like, including referencing other Groovy classes through CF Groovy's classloading, or other Java classes on your classpath.
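For readers who think in Java rather than Groovy, here is a rough plain-Java sketch of what that closure-as-Comparator boils down to.  The class name, the row() helper, and the use of LocalDate and HashMap in place of CF dates and structs are all assumptions made for illustration; this is not CF Groovy code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByField {

    // Hypothetical helper: builds a map that plays the role of a CF struct
    // with 'letter' and 'date' keys.
    static Map<String, Object> row(String letter, LocalDate date) {
        Map<String, Object> m = new HashMap<>();
        m.put("letter", letter);
        m.put("date", date);
        return m;
    }

    // Sort the list in place by the 'date' key, as the Groovy closure does.
    static void sortByDate(List<Map<String, Object>> rows) {
        rows.sort(Comparator.comparing(
            (Map<String, Object> r) -> (LocalDate) r.get("date")));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> a = new ArrayList<>();
        a.add(row("Q", LocalDate.of(2008, 6, 20)));
        a.add(row("B", LocalDate.of(2008, 6, 3)));
        a.add(row("M", LocalDate.of(2008, 6, 11)));
        sortByDate(a);
        for (Map<String, Object> r : a) {
            System.out.println(r.get("date") + " " + r.get("letter"));
        }
    }
}
```

The Groovy version gets the same result in three lines, which is rather the point.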

You can see it in action at the demo page, or get the full source from Subversion at https://ssl.barneyb.com/svn/barneyb/cfgroovy/trunk/demo/.

Use ColdFusion? Use Java.

If you use ColdFusion (or another Java-based CFML runtime), you should be using Java. There's a reason that CF uses Java under the hood: Java is incredibly powerful. Yes the interface to Java from the CF level is cumbersome and creating hybrid CF/Java applications pretty much costs you CF's RAD capabilities, but there are some real gems in the Java libraries.

On CF-Talk today, someone asked about reversing an array. There's not a built-in function for doing that, but if you remember that CF arrays are just Java Lists (java.util.Vector, specifically), you can suddenly leverage the full Java Collections framework. In this case, the solution was simple:

createObject("java", "java.util.Collections").reverse(myArray);

Want to get unique values from an array? Not a difficult problem, but how about this:

myArray = createObject("java", "java.util.ArrayList").init(
  createObject("java", "java.util.HashSet").init(myArray)
);

Want them sorted? Here you go:

createObject("java", "java.util.Collections").sort(myArray);

Yes, there is an arraySort() CF built-in, but it only sorts text and numbers. So if you want to sort an array of Dates, you're stuck. Collections.sort, on the other hand, will happily sort the dates.

This only scratches the surface of what you can do with Java. Obviously you can't leverage this if you have to support non-Java CFML runtimes, but if you're developing for a Java runtime (or runtimes), you owe it to yourself to learn a little bit about the Java tooling available to you.  I've blogged about a couple other Java tricks (fast directory filename listings and string builder tricks) in addition to myriad Java libraries that can be leveraged (Batik, Weka, Ant, etc.).

Here's a complete example of the above tricks:

a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5];
// reverse it
createObject("java", "java.util.Collections").reverse(a);
writeOutput(a.toString()); // [5, 4, 3, 2, 1, 5, 4, 3, 2, 1]

// sort it
createObject("java", "java.util.Collections").sort(a);
writeOutput(a.toString()); // [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

// pull unique elements
a = createObject("java", "java.util.ArrayList").init(
  createObject("java", "java.util.HashSet").init(a)
);
writeOutput(a.toString()); // [3, 2, 1, 5, 4]

// unique elements in sorted order
createObject("java", "java.util.Collections").sort(a);
writeOutput(a.toString()); // [1, 2, 3, 4, 5]
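The scrambled order after the "pull unique elements" step comes from java.util.HashSet, which makes no promises about iteration order.  If first-seen order matters, java.util.LinkedHashSet can be swapped in.  Here's a minimal plain-Java sketch of the same sequence with that swap; the class and method names are made up for illustration, and from CF the same thing should be reachable via createObject("java", "java.util.LinkedHashSet").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;

public class ArrayTricks {

    // Drop duplicates while keeping first-seen order.
    // LinkedHashSet, unlike HashSet, iterates in insertion order.
    static <T> List<T> unique(List<T> in) {
        return new ArrayList<>(new LinkedHashSet<>(in));
    }

    public static void main(String[] args) {
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 1, 2, 3, 4, 5));

        Collections.reverse(a);
        System.out.println(a); // [5, 4, 3, 2, 1, 5, 4, 3, 2, 1]

        List<Integer> u = unique(a);
        System.out.println(u); // [5, 4, 3, 2, 1]

        Collections.sort(u);
        System.out.println(u); // [1, 2, 3, 4, 5]
    }
}
```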

Prototype and jQuery

Since I discovered it a few years ago, I've been a big Prototype fan.  It's simple, and gets the job done with a minimum of fuss.  It's not without warts, of course.  I still occasionally forget to put 'new' in front of Ajax.Request, and some of the Ruby-like methods share their lineage's arcane naming.  When it was new, it was the best thing around, and while it now has competitors, it's certainly not lagging behind.

At work, however, jQuery has been adopted as the standard (and I've no power to change it).  The lack of the $() function is annoying; several times I've debated adding this function (or one of various similar ones) to our library:

function $(id) {
  return jQuery("#" + id)[0];
}

I haven't, of course, as it's not the jQuery way.  jQuery also lacks any sort of class assistance, so we still use the Prototype class framework for our class-based JS.  That seems to work fairly well, except for the fact that we have to use two frameworks where one could suffice.

jQuery is not without its benefits, of course.  The plugin architecture is a nice aspect that Prototype didn't really offer an equivalent of.  It means the core stays lighter (good), but if you want additional functionality you're stuck managing files from a bunch of different projects (annoying).  Event handling is a bit more straightforward, in some ways.  "Magically" acting on collections of elements with a single call (i.e. no .each(function(o){…}) garbage) definitely makes for more readable code as well.

Because of this shift at work, I've been porting some of my personal apps over to jQuery as well.  I've actually been using a couple jQuery plugins (both self-written and external) for specific tasks for a while now, but not the core framework.  What I've found, however, is that jQuery can be prone to slow code.  To avoid a huge amount of extra work on the part of the JS interpreter, using temporary variables for jQuery objects is essential.  If you do strictly id-based queries, the degradation isn't huge, but if you do CSS-based queries, it can be significant.  With Prototype's focus on id-based queries (at least until $$() came about in 1.5), that was less of an issue.

This need to query a minimum number of times can provide a fair amount of complexity when you have more than a handful of closures hanging about and/or a dynamic DOM.  You end up doing a lot of state management work because you're, in effect, caching DOM lookups and have to ensure you never have stale cache.

Other than that issue and the lack of an equivalent to document.viewport, porting has been relatively painless.  Still very id heavy, so not leveraging jQuery as much as could be, but most of what I'm doing wouldn't benefit from other selectors.

Which one is better?  Hard to say.  jQuery seems to make you work harder to type less code, while Prototype seems to cost you a few more characters for a bit less density.  With the exception of Prototype's class support, their feature sets are fairly equivalent, especially with jQuery UI now available to "compete" with Scriptaculous.  For the moment, I'm choosing to use jQuery on new stuff, but wishing for Prototype every few minutes.  Until I come up against some sort of significant wall, it'll probably stay that way, just to stick with the same tooling professionally and personally.  And over time it'll probably get better as the Prototype-ness fades from apps.

FlexChart Updates

The past month or so has seen quite a few improvements and bug fixes to FlexChart, though I haven't blogged about any of them.  Most notably, there was a weird NPE that manifested itself when loading a Pie chart via FlashVars.  For some unknown reason, Flex/Flash didn't give any indication the error was occurring; it just silently terminated the active call stack and continued on its merry way.  This left the app in a quasi-broken state that would prevent certain future calls from working, while allowing others to execute without issue.  I still have no explanation as to why the error silently terminated, but I've since seen the same behaviour inside FDS, so it's not charting specific.

The ability to style charts has been extended a bit, though it's still not as highly polished as I could wish.  For example, supplying a stroke weight for a line series causes the stroke color to default to black, instead of the automatically assigned color (orange, green, blue, …).  In the reverse case, if you supply custom colors on a Pie chart, they render correctly, but the legend (if one is used) uses the default colors (orange, green, blue, …).  Gradient fills are now available as well.

I've also improved handling of empty charts.  The stock custom tag requires a descriptor, but if you don't have data at page render time, you usually end up providing "<chart />" as the descriptor.  The engine now detects this case (whether on the initial load or later passed in), and is a bit more intelligent about ensuring it clears its stage.  Previously you could end up with an empty CartesianChart in some cases.

Finally, I made a number of improvements to performance and handling of data values.  This was mostly accomplished by explicitly converting the XML nodes into real objects for the chart to render, rather than using the XML directly.  There were some implicit type conversions that didn't happen consistently out of XML nodes, but work fine out of generic objects.

Data Mining With Weka

I've a large application that has as a major component rank-based prioritization of assets. Users rank the assets on a one-to-five scale, and then that rank data is used to select other assets of interest for the user. If you've seen Amazon's "Recommended for you" or Netflix's recommended titles, you get the general idea.

The app was originally built back in 2004, and used a complex (and cumbersome, and slow) metadata-based algorithm. Each asset has a set of metadata facets specified. At prioritization time, an overall rank is computed for each facet for the given user, based on the rank of assets with the different facets. Unranked assets that have the high-ranked facets and not the low-ranked facets are given a high prioritization. If you've used Pandora, it's the same general idea, though I used far fewer facets. Overall, this algorithm worked quite well. I've tuned it over the years, but it's architecturally unchanged from the initial version.

However, this approach has one huge problem (aside from the complexity): it requires metadata. That metadata has to be populated by someone, and it's a thankless job. I tried a few different ways to make it easier for users to contribute, but never really hit on anything that worked well, so I ended up spending a bit of time every once in a while tagging stuff. As the asset and user counts increase, the workload only goes up, so it's not a scalable solution.

Which brings me to the topic of the post: data mining with Weka.

Data mining is basically digging through a crapton of low-level data to find higher-level information. Weka is a piece of software, written in Java, that provides an array of machine learning tools, many of which can be used for data mining.

In my particular case, I wanted to remove the metadata dependency of the prioritization algorithm, and rely strictly on rank data. It took a while to really wrap my head around what I wanted to do and what the data path actually looked like, but once I figured it out it was incredibly simple to implement.

In a nutshell, I create a relation (i.e. table, spreadsheet, grid) with rows representing assets and columns representing users. The intersection of each row/column (i.e. a cell) is the rank from that user for that asset. Obviously not every user has ranked every asset, but Weka happily deals with missing data (expressed as a question mark). Here's a partial data set (12 each of assets and users), in Weka's ARFF format:

@relation 'asset-ranks'
@attribute assetId string
@attribute u2 numeric
@attribute u5 numeric
@attribute u6 numeric
@attribute u7 numeric
@attribute u8 numeric
@attribute u9 numeric
@attribute u10 numeric
@attribute u12 numeric
@attribute u13 numeric
@attribute u18 numeric
@attribute u20 numeric
@attribute u21 numeric
@data
48,1,?,?,2,?,?,?,?,?,?,?,?
50,?,?,?,3,?,?,?,2,?,?,?,?
52,1,3,4,2,?,?,4,?,?,?,?,?
70,4,3,5,5,?,2,3,3,1,4,?,?
73,2,3,1,5,?,2,?,5,1,5,?,?
91,3,?,5,2,?,?,?,?,?,?,?,?
165,1,2,4,5,1,?,?,3,1,4,1,?
196,4,2,4,3,5,3,?,?,?,?,?,?
234,3,5,4,2,4,4,4,4,3,5,?,5
235,?,5,5,1,?,2,?,?,?,?,?,?
259,?,?,5,4,?,?,?,?,1,?,?,?
261,3,4,5,4,5,4,?,3,?,?,?,?
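Producing a file like that from rank data is mostly string building.  Here's a minimal Java sketch of one way to do it; the class name, the method name, and the nested-map input shape are assumptions for illustration, not how my app actually stores ranks.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArffWriter {

    // ranks: assetId -> (userId -> rank).  A user who hasn't ranked an
    // asset is simply absent from that asset's inner map.
    static String toArff(List<Integer> userIds,
                         Map<Integer, Map<Integer, Integer>> ranks) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation 'asset-ranks'\n");
        sb.append("@attribute assetId string\n");
        for (int u : userIds) {
            sb.append("@attribute u").append(u).append(" numeric\n");
        }
        sb.append("@data\n");
        for (Map.Entry<Integer, Map<Integer, Integer>> e : ranks.entrySet()) {
            sb.append(e.getKey());
            for (int u : userIds) {
                Integer r = e.getValue().get(u);
                // a question mark marks a missing value
                sb.append(',').append(r == null ? "?" : r.toString());
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rows in insertion order
        Map<Integer, Map<Integer, Integer>> ranks = new LinkedHashMap<>();
        ranks.put(48, Map.of(2, 1, 7, 2));
        ranks.put(50, Map.of(7, 3, 12, 2));
        System.out.print(toArff(List.of(2, 5, 6, 7, 12), ranks));
    }
}
```

With those two sample assets, the data rows come out as 48,1,?,?,2,? and 50,?,?,?,3,2.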

Running that through Weka's clustering engine breaks all the assets into clusters averaging 50 assets (my choice) in size, and appends a cluster identifier to each row in the data file. Here's the command line I use:

# $srcFile:      the data above
# $clusterCount: ceil(rows / 50)
# $destFile:     the data above, with a 'cluster' attribute added
# -I 1:          leave attribute 1 (assetId) out of the clustering
java -classpath weka.jar \
  weka.filters.unsupervised.attribute.AddCluster \
  -i $srcFile \
  -I 1 \
  -W "weka.clusterers.SimpleKMeans -N $clusterCount" \
  >& $destFile

The clusters represent groups of assets that the ranks indicate are related. The assumption is that for a given user, all assets in a given cluster will be ranked similarly, and the data bears that out. How exactly Weka is doing that, I'm not sure - voodoo may be at play.

Anyway, I read the result into the database, setting up asset-cluster relationships, and then can prioritize the clusters based on their average rank by each user. Unranked assets from the highest-priority cluster should be the assets the user is most interested in.
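In my app that prioritization happens in the database, but the logic is simple enough to sketch in a few lines of Java.  Everything here (class name, method names, the map-based inputs) is hypothetical and only meant to show the shape of the computation: average one user's ranks per cluster, order the clusters by that average, then pull the unranked assets out of the top cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClusterPriority {

    // assetCluster: assetId -> clusterId; userRanks: assetId -> rank (one user).
    // Returns cluster ids ordered by the user's average rank, highest first.
    static List<Integer> rankClusters(Map<Integer, Integer> assetCluster,
                                      Map<Integer, Integer> userRanks) {
        Map<Integer, int[]> acc = new HashMap<>(); // clusterId -> {sum, count}
        for (Map.Entry<Integer, Integer> e : userRanks.entrySet()) {
            Integer cluster = assetCluster.get(e.getKey());
            if (cluster == null) continue; // asset was never clustered
            int[] sc = acc.computeIfAbsent(cluster, k -> new int[2]);
            sc[0] += e.getValue();
            sc[1]++;
        }
        List<Integer> ids = new ArrayList<>(acc.keySet());
        ids.sort((x, y) -> Double.compare(avg(acc.get(y)), avg(acc.get(x))));
        return ids;
    }

    static double avg(int[] sc) {
        return (double) sc[0] / sc[1];
    }

    // Assets in the given cluster that this user has not ranked yet.
    static List<Integer> unrankedIn(int cluster,
                                    Map<Integer, Integer> assetCluster,
                                    Map<Integer, Integer> userRanks) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : assetCluster.entrySet()) {
            if (e.getValue() == cluster && !userRanks.containsKey(e.getKey())) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> assetCluster = new HashMap<>();
        assetCluster.put(48, 0);
        assetCluster.put(50, 0);
        assetCluster.put(52, 0);
        assetCluster.put(70, 1);
        assetCluster.put(73, 1);
        assetCluster.put(234, 1);

        Map<Integer, Integer> userRanks = new HashMap<>();
        userRanks.put(48, 1);
        userRanks.put(70, 4);
        userRanks.put(73, 5);

        List<Integer> order = rankClusters(assetCluster, userRanks);
        System.out.println(order); // [1, 0]
        System.out.println(unrankedIn(order.get(0), assetCluster, userRanks)); // [234]
    }
}
```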

This approach is not only much simpler, it's enormously faster, and it uses someone else's code (which is always a good thing). However, it's not without a significant problem of its own: it can only prioritize ranked assets. I've addressed this by mixing in an occasional random unranked asset to seed the pool. Time will tell if that approach works well or not; it's hard to estimate without any data.

With my trials, the two algorithms generally gave similar results. Not identical, of course, but similar. What's interesting is that the old algorithm computes an estimated rank for each unranked asset, while the latter just finds a collection of similar assets that the user indicated an interest in (via ranking some members of the collection). I'll probably look at some predictive stuff to add on top of the clustering to do actual per-asset rank predictions, but for now, it seems unneeded.

I'll be using Weka on some other projects, no question there. Like so much else, the hard part is figuring out how to express the question you want answered. Not technically so much as conceptually. Once you have that, implementation is straightforward.

Build-Time Aggregation of JS/CSS Assets

Ben Nadel posted about compiling multiple linked files (JS/CSS) into a single file this morning, and he does it at runtime. I commented about doing it at build-time instead, and a couple people were wondering more, so here's a brief explanation.

The first part is a properties file (which can be read by both Ant and CF (or whatever)). Here's an example (named agg.js.properties):

# the type of file being aggregated (used to do minification)
type         = js
# the URL path the files are relative to.
urlBasePath  = /marketing/js/
# the list of filenames to aggregate.  The first line (with the equals
# sign) should be a filename and a backslash, all other lines should be a
# comma, a filename, and a backslash (except the last line, which has no
# backslash).  Indentation is irrelevant.
filenames    = date.js\
  ,jquery-latest.js\
  ,ui.datepicker.js\
  ,ui.mouse.js\
  ,ui.slider.js\
  ,ui.draggable.js\
  ,jquery.dimensions.js\
  ,jquery.easing.1.2.js\
  ,jquery-easing-compatibility.1.2.js\
  ,coda-slider.1.1.1.js\
  ,jquery.tooltip.min.js\
  ,jScrollPane.min.js\
  ,jquery.metadata.js\
  ,prototype.classes.js\
  ,reporting.js\
  ,jquery.ajaxQueue-min.js\
  ,script.js

This sets up everything needed for the aggregation. Within our project, we have this file as a peer of the properties file (named agg.js.cfm):

<cfscript>
filename = replace(getCurrentTemplatePath(), ".cfm", ".properties");
fis = createObject("java", "java.io.FileInputStream").init(filename);
bis = createObject("java", "java.io.BufferedInputStream").init(fis);
props = createObject("java", "java.util.Properties").init();
props.load(bis);
urlBasePath = props.getProperty("urlBasePath");
type = props.getProperty("type");
filenames = listToArray(props.getProperty('filenames'));
for (i = 1; i LTE arrayLen(filenames); i = i + 1) {
	if (type EQ "css") {
		writeOutput('<link rel="stylesheet" href="#urlBasePath##filenames[i]#" type="text/css" />');
	} else { // js
		writeOutput('<script src="#urlBasePath##filenames[i]#" type="text/javascript"></script>');
	}
	writeOutput(chr(10));
}
</cfscript>

It reads the properties file, and writes out either LINK or SCRIPT tags as appropriate to the individual assets. This facilitates easy debugging in development, because nothing is modified from its source. The file is included into the HEAD of our layout templates to get everything in page.
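The reason the multi-line filenames value works is that java.util.Properties treats a trailing backslash as a line continuation and throws away the leading whitespace on the next line.  Here's a small standalone Java sketch demonstrating that, using a trimmed-down copy of the properties text fed in from a string rather than a file; the class and method names are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class AggProps {

    // Parse properties text and split the 'filenames' value on commas.
    static List<String> filenames(String propsText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.asList(props.getProperty("filenames").split(","));
    }

    public static void main(String[] args) {
        // Each "\\\n" is a literal backslash followed by a newline.
        String text =
            "type         = js\n" +
            "urlBasePath  = /marketing/js/\n" +
            "filenames    = date.js\\\n" +
            "  ,jquery-latest.js\\\n" +
            "  ,script.js\n";
        System.out.println(filenames(text)); // [date.js, jquery-latest.js, script.js]
    }
}
```

That's also why the commas go at the start of the continuation lines: trailing whitespace before a backslash would end up in the value, while leading whitespace after one is dropped.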

The real magic happens with Ant, which we use for our deployments. Within the build file, we have a call to the aggregateAssets target for each properties file:

<antcall target="aggregateAssets">
  <param name="propfile" value="${output}/wwwroot/marketing/templates/agg.js.properties" />
  <param name="rootdir" value="${output}/wwwroot/marketing/js" />
</antcall>

The params specify the properties file and the root directory. Note that the rootdir param corresponds with the urlBasePath in the properties file. The target itself looks like this:

<target name="aggregateAssets">
  <!-- read the aggregation properties -->
  <property file="${propfile}" prefix="agg" />

  <!-- get the root -->
  <propertyregex property="agg.root"
    input="${propfile}"
    regexp="^(.*)\.properties$"
    select="\1" />

  <!-- split the root into file and path sections -->
  <propertyregex property="agg.fileroot"
    input="${agg.root}"
    regexp="^.*/([^/]+)$"
    select="\1" />
  <propertyregex property="agg.pathroot"
    input="${agg.root}"
    regexp="^(.*/)[^/]+$"
    select="\1" />

  <!-- set up the output file stuff -->
  <property name="agg.outfile" value="${rootdir}/${agg.fileroot}" />
  <property name="agg.cfmfile" value="${agg.root}.cfm" />
  <property name="minsuffix" value=".yuimin" />

  <!-- run everything through the YUI Compressor -->
  <for list="${agg.filenames}" param="filename">
    <sequential>
      <echo message="compressing @{filename} to @{filename}${minsuffix} (in ${rootdir})" />
      <java classname="com.yahoo.platform.yui.compressor.YUICompressor"
        failonerror="true"
        output="${rootdir}/@{filename}${minsuffix}"
        append="true"
        logError="true"
        fork="true">
        <arg value="--type"/>
        <arg value="${agg.type}"/>
        <arg value="--nomunge"/>
        <arg file="${rootdir}/@{filename}" />
        <classpath>
          <pathelement path="${java.class.path}"/>
        </classpath>
      </java>
    </sequential>
  </for>

  <!-- aggregate all the compressed files together -->
  <echo file="${agg.outfile}" message="// built by Ant using YUI Compressor" />
  <for list="${agg.filenames}" param="filename">
    <sequential>
      <concat destfile="${agg.outfile}" append="true">
        <header trimleading="true">
          // @{filename}
        </header>
        <filelist dir="${rootdir}" files="@{filename}${minsuffix}" />
      </concat>
    </sequential>
  </for>

  <!-- delete all the compressed files -->
  <delete>
    <fileset dir="${rootdir}" includes="*${minsuffix}" />
  </delete>

  <!-- write the CFM file to pull in the compressed and aggregated file -->
  <if>
    <equals arg1="${agg.type}" arg2="css" />
    <then>
      <echo file="${agg.cfmfile}"><![CDATA[<link rel="stylesheet" href="${agg.urlBasePath}${agg.fileroot}" type="text/css" />]]></echo>
    </then>
    <else>
      <echo file="${agg.cfmfile}"><![CDATA[<script src="${agg.urlBasePath}${agg.fileroot}" type="text/javascript"></script>]]></echo>
    </else>
  </if>
</target>

First, it reads the properties file, runs each listed asset through the YUI Compressor, and then aggregates the result. Finally, it overwrites agg.js.cfm (from above) with one that contains a single LINK/SCRIPT element to the aggregation result. End result is a single aggregated, compressed asset in production for speed, and separate uncompressed assets in development for easy debugging.

Edit: Do note that you'll need both the ant-contrib package and the YUI Compressor JARs to be installed into Ant for this to work.