readGZippedText() UDF

I needed to read in some gzipped text files from disk, so I wrote a little UDF to do it for me, and thought I'd share.  It uses Java to do the heavy lifting under the hood:

<cffunction name="readGZippedText" output="false" returntype="string">
  <cfargument name="filename" type="string" required="true" />
  <cfset var data = "" />
  <cfset var stream = "" />
  <cfset var text = "" />
  <cfset var s = structNew() />
  <cffile action="readbinary"
    file="#filename#"
    variable="data" />
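  <cfscript>
    // Wrap the raw bytes: byte array -> gunzip -> characters -> buffered lines.
    // InputStreamReader uses the platform default charset here.
    stream =
      createObject("java", "java.io.BufferedReader").init(
        createObject("java", "java.io.InputStreamReader").init(
          createObject("java", "java.util.zip.GZIPInputStream").init(
            createObject("java", "java.io.ByteArrayInputStream").init(data)
          )
        )
      )
    ;
    text = createObject("java", "java.lang.StringBuffer").init();
    while (true) {
      // readLine() returns a Java null at end of stream, which makes
      // s.line undefined, so test for the key rather than the value.
      s.line = stream.readLine();
      if (NOT structKeyExists(s, "line")) {
        break;
      }
      text.append(s.line).append(chr(10));
    }
    stream.close();
  </cfscript>
  <!--- Return a real string, not the StringBuffer object. --->
  <cfreturn text.toString() />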
</cffunction>

The code isn't very complex: it reads in the data with CFFILE, wraps those bytes in the native Java zip and IO classes to get a text stream, and then reads that stream a line at a time into a StringBuffer (for performance reasons, since it avoids repeated string concatenation) before returning the result.
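For anyone who wants to see the Java side on its own, here is a rough sketch of the same stream chain in plain Java. The class name GzipTextReader, the Path parameter, and the explicit UTF-8 charset are assumptions made for illustration; the UDF above uses the platform default charset and a StringBuffer rather than a StringBuilder. The null check on readLine() plays the role of the structKeyExists() test in the CFML loop.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name; a sketch of the UDF's logic, not a drop-in replacement.
class GzipTextReader {

    // Reads a gzipped text file fully into memory, then decompresses it
    // line by line, appending "\n" after every line (as the UDF does).
    static String readGZippedText(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file); // same role as CFFILE readbinary
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                        new GZIPInputStream(new ByteArrayInputStream(data)),
                        StandardCharsets.UTF_8))) { // assumption: the text is UTF-8
            String line;
            // readLine() returns null once the stream is exhausted.
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Small round trip: write a gzipped temp file, then read it back.
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt.gz");
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            out.write("first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.print(readGZippedText(tmp));
        Files.delete(tmp);
    }
}
```

Because every line gets a trailing newline appended, a file whose last line has no newline comes back with one added, which matches what the UDF does.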

4 Responses to “readGZippedText() UDF”


  1. Boyan

    This could come in handy. It's going in my Source Code library. Thanks.

  2. Dan G. Switzer, II

    @Barney:

    I wrote a similar function a couple years ago:
    http://blog.pengoworks.com/blogger/index.cfm?action=blog:501

    You're using a StringBuffer and I used a ByteArrayOutputStream, but I ran into some Buffer Overflow errors if the GZIP'ed content was too large.

    In my case I was reading in an XML packet that had been GZIP'ed and the XML files could get pretty large.

    Just something you might want to test for…

  3. barneyb

    Dan,

    I wonder if the exception was due to the wholesale conversion from the ByteArrayOutputStream buffer to a String. I haven't cracked 10MB yet, so I can't say if mine will have the same problem, but I'm doing that conversion a piece at a time (via the Reader/StringBuffer arrangement). More simply, I'm reading textual characters a line at a time, rather than bytes 1024 at a time, into my output "thing". I'll definitely post back if/when I cross that threshold though.

  4. Sami Hoda

    I'd submit this to cflib. Useful!
