IndentXml CF UDF

I had a need to fix indentation of some XML today, and a quick Googling didn't turn up much help. So I wrote a little UDF that will take an XML string and return it with all the tags nicely indented:

<cffunction name="indentXml" output="false" returntype="string">
  <cfargument name="xml" type="string" required="true" />
  <cfargument name="indent" type="string" default="  "
    hint="The string to use for indenting (default is two spaces)." />
  <cfset var lines = "" />
  <cfset var depth = "" />
  <cfset var line = "" />
  <cfset var isCDATAStart = "" />
  <cfset var isCDATAEnd = "" />
  <cfset var isEndTag = "" />
  <cfset var isSelfClose = "" />
  <!--- var-scope the loop index so it doesn't leak out of the function --->
  <cfset var i = "" />
  <cfset xml = trim(REReplace(xml, "(^|>)\s*(<|$)", "\1#chr(10)#\2", "all")) />
  <cfset lines = listToArray(xml, chr(10)) />
  <cfset depth = 0 />
  <cfloop from="1" to="#arrayLen(lines)#" index="i">
    <cfset line = trim(lines[i]) />
    <cfset isCDATAStart = left(line, 9) EQ "<![CDATA[" />
    <cfset isCDATAEnd = right(line, 3) EQ "]]>" />
    <cfif NOT isCDATAStart AND NOT isCDATAEnd AND left(line, 1) EQ "<" AND right(line, 1) EQ ">">
      <cfset isEndTag = left(line, 2) EQ "</" />
      <cfset isSelfClose = right(line, 2) EQ "/>"
        OR REFindNoCase("<([a-z][a-z0-9_-]*).*</\1>", line) />
      <cfif isEndTag>
        <!--- use max for safety against multi-line open tags --->
        <cfset depth = max(0, depth - 1) />
      </cfif>
      <cfset lines[i] = repeatString(indent, depth) & line />
      <cfif NOT isEndTag AND NOT isSelfClose>
        <cfset depth = depth + 1 />
      </cfif>
    <cfelseif isCDATAStart>
      <!---
      we don't indent CDATA ends, because that would change the
      content of the CDATA, which isn't desirable
      --->
      <cfset lines[i] = repeatString(indent, depth) & line />
    </cfif>
  </cfloop>
  <cfreturn arrayToList(lines, chr(10)) />
</cffunction>

There's nothing XML-ish about the implementation, as you can see, so you can happily feed it non-XML tag-based markup, as long as it uses '<' and '>' as tag delimiters in the XML fashion. Just don't expect good formatting if your tags don't follow XML's rules (e.g. CFELSE, which never gets a closing tag). Also, it doesn't account for opening (or closing) tags that are split across multiple lines. That wasn't a case I cared about, and I don't think you can solve it correctly without actually parsing at least the CDATA blocks out of the XML.
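
As a quick illustration of the behavior described above, here is a minimal usage sketch. It assumes indentXml is already defined (or included) on the page; the variable names raw and pretty and the sample markup are made up for the example.

```cfml
<!--- hypothetical sample input: everything crammed onto one line --->
<cfset raw = "<root><item>one</item><group><item/></group></root>" />

<!--- indent with the default two spaces --->
<cfset pretty = indentXml(raw) />

<!--- escape the markup so the browser displays the tags instead of parsing them --->
<cfoutput><pre>#htmlEditFormat(pretty)#</pre></cfoutput>
```

With that input, the result should look something like this:

```
<root>
  <item>one</item>
  <group>
    <item/>
  </group>
</root>
```

Passing a second argument changes the indent string, so indentXml(raw, chr(9)) should indent with tabs instead.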

Update (2010-03-10): I've made a slight change to isSelfClose to include tags that don't use an XML self-close but do have a closing tag on the same line, to avoid extra indentation on the following lines. This change is reflected in the code above.

Update (2010-07-30): Jean Moniatte, via cflib.org, noticed that my regex didn't handle element names that contain dashes or underscores (both of which are legal). I've addressed that issue above, and Ray has updated the UDF at cflib.org as well.
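
To see what the same-line check does with such names, here is a small sketch that runs the pattern from isSelfClose on its own. The variable name pattern and the sample element names are invented for illustration.

```cfml
<!--- the same pattern isSelfClose uses: an open tag with its matching close tag on one line --->
<cfset pattern = "<([a-z][a-z0-9_-]*).*</\1>" />

<cfoutput>
  <!--- dashed and underscored names should match (a position greater than zero) --->
  #REFindNoCase(pattern, "<line-item>widget</line-item>")#
  #REFindNoCase(pattern, "<unit_price>9.99</unit_price>")#
  <!--- an open tag with no close on the same line should not match (zero) --->
  #REFindNoCase(pattern, "<line-item>")#
</cfoutput>
```

Any non-zero result is treated as true in the OR expression, which is why the function can use the return value of REFindNoCase directly.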

Update (2012-09-28): Per a question that got to me through cflib.org, I'm explicitly licensing this UDF under the MIT license.

2 responses to “IndentXml CF UDF”

  1. Joshua

    I usually use XSL transforms, although that may not have been useful for what you wanted. There are several variants of this available on the web.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml"/>
      <xsl:param name="indent-increment" select="' '" />

      <xsl:template match="*">
        <xsl:param name="indent" select="'&#xA;'"/>

        <xsl:value-of select="$indent"/>
        <xsl:copy>
          <xsl:copy-of select="@*" />
          <xsl:apply-templates>
            <xsl:with-param name="indent"
                select="concat($indent, $indent-increment)"/>
          </xsl:apply-templates>
          <xsl:value-of select="$indent"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="comment()|processing-instruction()">
        <xsl:copy />
      </xsl:template>

      <!-- WARNING: this is dangerous. Handle with care -->
      <xsl:template match="text()[normalize-space(.)='']"/>

    </xsl:stylesheet>