REReplaceCallback UDF

If you've used pretty much any modern language, you know all about callback functions. CFML is capable of them, but unfortunately the language itself doesn't leverage the feature anywhere. A callback for the replace operation would be particularly valuable. Ben Nadel has blogged about such things a couple of times, and now I'm doing the same. First, here's how you use it:

<cfscript>
string = "The catapult bifurcated the bearcat.";
fancyString = REReplaceCallback(string, "(\w*)(cat)(\w*)", doit, "all");
function doit(match) {
  if (match[2] EQ "") {
    return '#match[2]#<b><i>#match[3]#</i></b>#match[4]#';
  } else {
    return '<u>#match[2]#<b><i>#match[3]#</i></b>#match[4]#</u>';
  }
}
</cfscript>

As you'd imagine, the 'doit' function is invoked for each match of the regular expression (in this case, a literal "cat" surrounded by any number of word characters). It checks whether match[2] (the leading word characters) is empty and forks on the result, underlining the whole word only when "cat" isn't at the start of it. The 'match' array, as you might surmise, contains the matched substrings: the first element is the entire match, and there is one additional element for each subexpression in the regular expression. In this case there are three subexpressions, so the 'match' array has length 3 + 1 = 4.
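To make the shape of the 'match' array concrete, here's a minimal sketch that builds the same array by hand for the first match in the sample string, using REFind with returnsubexpressions=true (the same call the UDF below relies on). The variable names are illustrative only, and treating a zero-length group as an empty string is an assumption that sidesteps how a particular CFML engine reports positions for empty groups.

```cfml
<cfscript>
// Illustration only: build the array a callback would receive for the
// first match of the post's pattern in the post's sample string.
sample = "The catapult bifurcated the bearcat.";
// returnsubexpressions=true returns parallel pos/len arrays:
// [1] is the whole match, [2..4] are the three parenthesized groups
m = REFind("(\w*)(cat)(\w*)", sample, 1, true);
parts = [];
for (i = 1; i LTE arrayLen(m.pos); i++) {
  // a group that didn't match, or matched nothing, becomes ""
  if (m.pos[i] EQ 0 OR m.len[i] EQ 0) {
    arrayAppend(parts, "");
  } else {
    arrayAppend(parts, mid(sample, m.pos[i], m.len[i]));
  }
}
// whole match, leading word chars (none), "cat", trailing word chars
writeOutput(arrayLen(parts) & ": " & arrayToList(parts, "|"));
// prints 4: catapult||cat|apult
</cfscript>
```

Note the empty second element: "catapult" starts with "cat", which is exactly the case 'doit' tests for.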

This particular conditional can be performed without a callback. Here's a pair of REReplace calls that does it:

<cfscript>
string = "The catapult bifurcated the bearcat.";
fancyString = REReplace(string, "(\W|^)(cat)", "\1<b><i>\2</i></b>", "all");
fancyString = REReplace(fancyString, "(\w+)(cat)(\w*)", "<u>\1<b><i>\2</i></b>\3</u>", "all");
</cfscript>

The first call handles words that start with 'cat'; the second handles words with 'cat' in the middle or at the end. Note that this only works because the first replace does NOT put word characters next to 'cat' in its replacement string. If it did, we'd be screwed: the two replaces happen sequentially, not in parallel, so the second pattern would also match (and rewrap) words the first one had already handled.
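To see the sequential-replace problem concretely, here's a hypothetical variation (not something you'd actually want) where the first pass surrounds 'cat' with underscores instead of tags. Underscore counts as a word character for \w, so the second pass now treats the leading "_" as part of the word and underlines 'catapult' too. The underscore replacement is made up purely for the demonstration.

```cfml
<cfscript>
sample = "The catapult bifurcated the bearcat.";
// Hypothetical first pass: "_" IS a word character, unlike "<" and ">"
pass1 = REReplace(sample, "(\W|^)(cat)", "\1_\2_", "all");
// pass1 is "The _cat_apult bifurcated the bearcat."
pass2 = REReplace(pass1, "(\w+)(cat)(\w*)", "<u>\1<b><i>\2</i></b>\3</u>", "all");
// The second pass matches "_cat_apult" (group 1 is "_"), so the word the
// first pass already handled gets wrapped in <u> as well.
writeOutput(pass2);
</cfscript>
```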

In this particular case, neither the callback version nor the REReplace pair is very readable. :) With a little cleanup and a well-named temp variable, I'd say the callback version has the potential to be more readable, but the pair of REReplaces is pretty much stuck as-is. As things get more complicated, however, the callback approach becomes dramatically clearer.

The big win, of course, has nothing to do with conditional replaces. Rather, it's the ability to execute arbitrary CFML code to generate the replacement string based on the matched string. Your callback can do anything you want: hit the database, call out to a web service, grab a dynamically selected bean from ColdSpring and get a value from it, etc. The sky's the limit.

Here's the REReplaceCallback UDF itself:

<cffunction name="REReplaceCallback" access="private" output="false" returntype="string">
  <cfargument name="string" type="string" required="true" />
  <cfargument name="pattern" type="string" required="true" />
  <cfargument name="callback" type="any" required="true" />
  <cfargument name="scope" type="string" default="one" />
  <cfscript>
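  var start = 1;
  var match = "";
  var parts = "";
  var replacement = "";
  var i = "";
  var l = "";
  // REFind positions are 1-based; stop once we run off the end of the string
  while (start LTE len(string)) {
    // returnsubexpressions=true yields parallel pos/len arrays: [1] is the
    // whole match, the rest are the parenthesized subexpressions
    match = REFind(pattern, string, start, true);
    if (match.pos[1] EQ 0) {
      break;
    }
    // turn each pos/len pair into a substring; unmatched groups become ""
    parts = [];
    l = arrayLen(match.pos);
    for (i = 1; i LTE l; i++) {
      if (match.pos[i] EQ 0) {
        arrayAppend(parts, "");
      } else {
        arrayAppend(parts, mid(string, match.pos[i], match.len[i]));
      }
    }
    replacement = callback(parts);
    // splice: text before the match, the replacement, text after the match
    string = mid(string, 1, match.pos[1] - 1) & replacement & removeChars(string, 1, match.pos[1] + match.len[1] - 1);
    // resume right after the inserted text so the replacement is never re-scanned
    start = match.pos[1] + len(replacement);
    // a zero-length match would otherwise be found at the same spot forever
    if (match.len[1] EQ 0) {
      start = start + 1;
    }
    if (scope EQ "one") {
      break;
    }
  }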
  return string;
  </cfscript>
</cffunction>

Lots of stuff going on in there, but it's basically just doing a REFind with returnsubexpressions=true, ripping the string apart to pass the pieces to the callback function, and then reassembling the string afterwards. It'd be trivially easy to make a REReplaceNoCaseCallback function, but I haven't done so. I've implemented the function with CFFUNCTION/CFARGUMENT tags so that I can have an optional fourth parameter on CF8, but the body as CFSCRIPT so that if you want to use the UDF in pure CFSCRIPT on CF9, you only have to rewrap the body (not reimplement it).
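Here's a sketch of what that might look like: a REReplaceNoCaseCallback written as a pure-CFSCRIPT function, which assumes CF9 or later (script functions with typed arguments and default values). It follows the same structure as REReplaceCallback, swapping REFind for REFindNoCase; it also resumes scanning just past each inserted replacement and steps over zero-length matches so it can't loop forever. Treat it as a sketch rather than a drop-in.

```cfml
<cfscript>
// Sketch: CF9+ script syntax; same structure as REReplaceCallback,
// but the search is case-insensitive via REFindNoCase.
string function REReplaceNoCaseCallback(required string string, required string pattern, required any callback, string scope = "one") {
  var start = 1;
  var match = "";
  var parts = "";
  var replacement = "";
  var i = 0;
  var l = 0;
  while (start LTE len(string)) {
    match = REFindNoCase(pattern, string, start, true);
    if (match.pos[1] EQ 0) {
      break;
    }
    // substrings for the whole match and each subexpression
    parts = [];
    l = arrayLen(match.pos);
    for (i = 1; i LTE l; i++) {
      if (match.pos[i] EQ 0) {
        arrayAppend(parts, "");
      } else {
        arrayAppend(parts, mid(string, match.pos[i], match.len[i]));
      }
    }
    replacement = callback(parts);
    string = mid(string, 1, match.pos[1] - 1) & replacement & removeChars(string, 1, match.pos[1] + match.len[1] - 1);
    // continue after the inserted text, never inside it
    start = match.pos[1] + len(replacement);
    if (match.len[1] EQ 0) {
      start = start + 1;
    }
    if (scope EQ "one") {
      break;
    }
  }
  return string;
}

// Usage: wrap every "cat", in any letter case, in square brackets.
function bracket(match) {
  return "[" & match[1] & "]";
}
result = REReplaceNoCaseCallback("Cat, CAT, cat", "cat", bracket, "all");
// result is "[Cat], [CAT], [cat]"
</cfscript>
```

Because scanning resumes after each inserted replacement, the copy of "cat" inside each pair of brackets is never matched a second time.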

REReplaceCallback differs from what you might expect in that the callback gets substrings instead of position/length tuples (i.e., the way REFind reports matches). I opted for this approach for two reasons: first, it removes the need for the callback to have access to the raw string; second, all you'd ever do with the pos/len values is rip the string apart to get the characters, so why make every callback do it?

Why did I write this? Just for fun? No, not at all. I needed a way of doing rich inline markup with tags that could be implemented via plugins for a project (you get one guess), and after playing with a couple of formats I concluded that porting WordPress's shortcodes was as close to an optimal solution as I was going to get. The shortcode implementation requires this sort of conditional replace operation, so I built this UDF. If you do PHP, it's basically equivalent to preg_replace_callback but with CFML argument ordering.

Yes, I'll be sharing the CFC that implements shortcodes (complete with a port of the WordPress unit tests from PHPUnit to MXUnit), but not right this second.

Here's the same UDF written entirely with tags:

<cffunction name="REReplaceCallback" output="false" returntype="string">
  <cfargument name="string" type="string" required="true" />
  <cfargument name="pattern" type="string" required="true" />
  <cfargument name="callback" type="any" required="true" />
  <cfargument name="scope" type="string" default="one" />
  <cfset var start = 1 />
  <cfset var match = "" />
  <cfset var parts = "" />
  <cfset var replacement = "" />
  <cfset var i = "" />
  <!--- REFind positions are 1-based; stop once we run off the end of the string --->
  <cfloop condition="start LTE len(string)">
    <cfset match = REFind(pattern, string, start, true) />
    <cfif match.pos[1] EQ 0>
      <cfbreak />
    </cfif>
    <!--- turn each pos/len pair into a substring; unmatched groups become "" --->
    <cfset parts = [] />
    <cfloop from="1" to="#arrayLen(match.pos)#" index="i">
      <cfif match.pos[i] EQ 0>
        <cfset arrayAppend(parts, "") />
      <cfelse>
        <cfset arrayAppend(parts, mid(string, match.pos[i], match.len[i])) />
      </cfif>
    </cfloop>
    <cfset replacement = callback(parts) />
    <cfset string = mid(string, 1, match.pos[1] - 1) & replacement & removeChars(string, 1, match.pos[1] + match.len[1] - 1) />
    <!--- resume right after the inserted text so the replacement is never re-scanned --->
    <cfset start = match.pos[1] + len(replacement) />
    <cfif match.len[1] EQ 0>
      <cfset start = start + 1 />
    </cfif>
    <cfif scope EQ "one">
      <cfbreak />
    </cfif>
  </cfloop>
  <cfreturn string />
</cffunction>

6 responses to “REReplaceCallback UDF”

  1. Raymond Camden

    Can we use this on CFLib?

  2. Ben Nadel

    Replace-style algorithms are really where I see this being the most powerful. Even if the CF team were only able to implement this in the CFScript-only aspect of ColdFusion, I think it would be awesome.

  3. Raymond Camden

    Coolio. Just so you know, we (we being the CFLib corporation, entity, army, etc) are perfectly ok with CF9 only UDFs – just got to make sure it's noted.

  4. WordPress Shortcodes in CFML

    [...] and the port (including unit tests) took perhaps an hour and a half.  I had to roll my own REReplaceCallback UDF to match one of the PHP builtins, as well as change the callback API slightly to deal with CFML [...]