I needed to read in some gzipped text files from disk, so I wrote a little UDF to do it for me, and thought I'd share. It uses Java to do the heavy lifting under the hood:
<cffunction name="readGZippedText" output="false" returntype="string">
  <cfargument name="filename" type="string" required="true" />
  <cfset var data = "" />
  <cfset var stream = "" />
  <cfset var text = "" />
  <cfset var s = structNew() />
  <!--- read the raw gzipped bytes from disk --->
  <cffile action="readbinary" file="#filename#" variable="data" />
  <cfscript>
    // bytes -> gunzip -> characters -> buffered lines
    stream = createObject("java", "java.io.BufferedReader").init(
      createObject("java", "java.io.InputStreamReader").init(
        createObject("java", "java.util.zip.GZIPInputStream").init(
          createObject("java", "java.io.ByteArrayInputStream").init(data)
        )
      )
    );
    text = createObject("java", "java.lang.StringBuffer").init();
    while (true) {
      // readLine() returns null at end of stream, which leaves s.line undefined
      s.line = stream.readLine();
      if (NOT structKeyExists(s, "line")) {
        break;
      }
      text.append(s.line).append(chr(10));
    }
  </cfscript>
  <!--- hand back a plain string rather than the StringBuffer itself --->
  <cfreturn text.toString() />
</cffunction>
The code isn't very complex: it reads in the data with CFFILE, uses the native Java zip and IO classes to get a text stream, and then reads that into a StringBuffer (for performance reasons) before returning it.
This could come in handy. It's going in my Source Code library. Thanks.
@Barney:
I wrote a similar function a couple years ago:
http://blog.pengoworks.com/blogger/index.cfm?action=blog:501
You're using a StringBuffer and I used a ByteArrayOutputStream, but I ran into some Buffer Overflow errors if the GZIP'ed content was too large.
In my case I was reading in an XML packet that had been GZIP'ed and the XML files could get pretty large.
Just something you might want to test for…
Dan,
I wonder if the exception was due to the wholesale conversion from the ByteArrayOutputStream buffer to a String. I haven't cracked 10MB yet, so I can't say if mine will have the same problem, but I'm doing that conversion a piece at a time (via the Reader/StringBuffer arrangement). More simply, I'm reading textual characters a line at a time, rather than bytes 1024 at a time, into my output "thing". I'll definitely post back if/when I cross that threshold though.
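To make the two approaches in this exchange concrete, here is a minimal sketch of both in plain Java, since the UDF hands all the real work to these classes anyway. The class and method names are made up for illustration, UTF-8 is an assumption (the UDF uses the platform default charset), and StringBuilder stands in for StringBuffer. It only shows where the bytes-to-text conversion happens in each version; it doesn't claim to reproduce or explain the error Dan ran into.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; none of these names come from the original posts.
final class GzipReadStrategies {

    // Byte-oriented: copy 1024 bytes at a time into a ByteArrayOutputStream,
    // then convert the whole buffer to a String in one step.
    static String readViaByteBuffer(byte[] gzipped) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Character-oriented: decode while reading, one line at a time.
    // Every line gets a "\n" terminator, so line endings are normalized.
    static String readViaLines(byte[] gzipped) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small helper that gzips a string so the readers have something to chew on.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

One visible difference: the byte version returns the content exactly as stored, while the line version rewrites every line ending as a single newline and always ends with one.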
I'd submit this to cflib. Useful!
Hi Barney
I don't suppose you wrote the reverse of this function by any chance? I've been struggling with Output Streams etc, but not getting anywhere… (I'm actually trying to deflate, not gzip but it's pretty much the same process I believe)
Geoff, I sure haven't. Did you check cflib.org? If not, you should be able to basically do the same thing as I did, just in reverse. Here's pseudocode:
I haven't tested that, but it should be close.
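A minimal sketch of that reverse direction, written in plain Java because the UDF above leans on the same java.util.zip classes. This is not Barney's pseudocode: the class and method names are hypothetical, and UTF-8 is an assumption. It covers both gzip and the deflate case Geoff asked about, with matching read-back helpers so the round trip can be checked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class; the names are illustrative only.
final class TextCompressor {

    // Text -> gzip bytes: the input chain from the UDF, turned around.
    static byte[] gzipText(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above writes the gzip trailer before we read the bytes.
        return sink.toByteArray();
    }

    // Text -> deflate bytes. With the default Deflater this is zlib-wrapped
    // deflate; raw deflate would need a Deflater created with nowrap = true.
    static byte[] deflateText(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static String gunzipText(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String inflateText(byte[] data) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain a stream into a String, 1024 bytes at a time.
    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

From CFML the same classes should be reachable through createObject("java", ...), the way the UDF already does for the input side, with the resulting byte array written out using CFFILE.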