readGZippedText() UDF

I needed to read in some gzipped text files from disk, so I wrote a little UDF to do it for me, and thought I'd share.  It uses Java to do the heavy lifting under the hood:

<cffunction name="readGZippedText" output="false" returntype="string">
  <cfargument name="filename" type="string" required="true" />
  <cfset var data = "" />
  <cfset var stream = "" />
  <cfset var text = "" />
  <cfset var s = structNew() />
  <cffile action="readbinary"
    file="#filename#"
    variable="data" />
  <cfscript>
    stream =
      createObject("java", "java.io.BufferedReader").init(
        createObject("java", "java.io.InputStreamReader").init(
          createObject("java", "java.util.zip.GZIPInputStream").init(
            createObject("java", "java.io.ByteArrayInputStream").init(data)
          )
        )
      )
    ;
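    // Accumulate lines in a StringBuffer rather than concatenating strings.
    text = createObject("java", "java.lang.StringBuffer").init();
    while (true) {
      // readLine() returns a Java null at end of stream; assigning it
      // leaves s.line undefined, which is what structKeyExists() detects.
      s.line = stream.readLine();
      if (NOT structKeyExists(s, "line")) {
        break;
      }
      text.append(s.line).append(chr(10));
    }
    // Closing the outermost reader closes the wrapped streams too.
    stream.close();
  </cfscript>
  <cfreturn text.toString() />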
</cffunction>

The code isn't very complex: it reads in the data with CFFILE, wraps it in the native Java zip and IO classes to get a text stream, and then reads that line by line into a StringBuffer (for performance reasons) before returning it.
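
For anyone who'd rather see the same stream chain outside of ColdFusion, here's a minimal sketch in plain Java. It isn't part of the UDF. The class name `GzipTextDemo` and the `gzip()` helper are made up for the demo, and the explicit UTF-8 charset is an assumption (the UDF above relies on the platform default).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipTextDemo {
    // Same chain as the UDF: bytes -> GZIPInputStream -> InputStreamReader -> BufferedReader.
    // Returns the decompressed text with a '\n' appended after every line.
    static String readGZippedText(byte[] data) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(
                    new GZIPInputStream(new ByteArrayInputStream(data)),
                    StandardCharsets.UTF_8))) { // assumption: the text is UTF-8
            String line;
            // readLine() returns null once the stream is exhausted.
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Hypothetical helper for the demo only: gzip a string in memory.
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("first line\nsecond line");
        System.out.print(readGZippedText(data));
    }
}
```

As in the UDF, every line gets a trailing newline, so the result ends with one even if the original file didn't.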

6 responses to “readGZippedText() UDF”

  1. Boyan

    This could come in handy. It's going in my Source Code library. Thanks.

  2. Dan G. Switzer, II

    @Barney:

    I wrote a similar function a couple years ago:
    http://blog.pengoworks.com/blogger/index.cfm?action=blog:501

    You're using a StringBuffer and I used a ByteArrayOutputStream, but I ran into some Buffer Overflow errors if the GZIP'ed content was too large.

    In my case I was reading in an XML packet that had been GZIP'ed and the XML files could get pretty large.

    Just something you might want to test for…

  3. Sami Hoda

    I'd submit this to cflib. Useful!

  4. Geoff

    Hi Barney

    I don't suppose you wrote the reverse of this function by any chance? I've been struggling with Output Streams etc, but not getting anywhere… (I'm actually trying to deflate, not gzip but it's pretty much the same process I believe)