Archive for the 'amazon' Category

AmazonS3.cfc Update

I've updated my AmazonS3 CFC to include local caching of files. The new source is available here: amazons3.cfc.txt, or visit the project page.  The only public API change from the first version is the addition of an optional third parameter to the init method for specifying the local directory to use as a cache. If you're doing repetitive read operations on S3-stored assets, using the local cache can speed things up significantly, though it is not without drawbacks.

In particular, the CFC assumes it is the only thing touching the S3 assets it manages. If you manipulate those assets through any other mechanism (including other CF applications), the cache can go stale and you'll run into issues. The cache folder itself is the canonical source for cache state, so if the cache does get out of sync, emptying that folder will always bring the CFC back in line with what's on S3.

If you cluster multiple CF instances together, you can still use the local cache, but you must use a single cache for all CF instances. That is, the cache must reside on a disk shared by all instances, rather than each instance having its own separate cache. This reduces the performance benefit slightly (since you must use a non-local disk), but it will still be faster than S3.

The CFC exposes a deleteCacheFor() method that accepts a bucket and objectKey pair, which you can use to manage the cache outside of actual S3 operations. If you have multiple CF instances that cannot share a single local cache, or for which the network overhead of a shared cache is still undesirable, you can use this method to synchronize the instances' caches via JMS or some similar messaging mechanism. Obviously that's far outside the scope of the CFC itself, but the hook is there to support it. Note that you must clear the cache entry whenever an asset is overwritten on S3: the local cache will not pick up the change, and will continue to return the old version until it's cleared.
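To make the staleness issue concrete, here is a minimal sketch of a read-through file cache with explicit eviction. This is not the CFC's actual code: the function names (cachePathFor, fetchThroughCache, evictFromCache), the hashed file naming, and the sourceUrl argument are all assumptions made for illustration.

```cfml
<!--- Hypothetical helper: where a given bucket/objectKey pair lives in the cache directory. --->
<cffunction name="cachePathFor" output="false" returntype="string">
  <cfargument name="cacheDir" type="string" required="true" />
  <cfargument name="bucket" type="string" required="true" />
  <cfargument name="objectKey" type="string" required="true" />
  <!--- hash() flattens the key, so slashes in objectKey never create subfolders --->
  <cfreturn cacheDir & "/" & hash(bucket & "/" & objectKey) />
</cffunction>

<!--- Read-through: return the cached copy's path, downloading it first on a miss. --->
<cffunction name="fetchThroughCache" output="false" returntype="string">
  <cfargument name="cacheDir" type="string" required="true" />
  <cfargument name="bucket" type="string" required="true" />
  <cfargument name="objectKey" type="string" required="true" />
  <cfargument name="sourceUrl" type="string" required="true"
    hint="A URL the object can be fetched from, e.g. one built by s3Url()." />
  <cfset var path = cachePathFor(cacheDir, bucket, objectKey) />
  <cfset var httpResult = "" />
  <!--- A hit is served without ever asking S3, which is why stale copies persist. --->
  <cfif NOT fileExists(path)>
    <cfhttp url="#sourceUrl#" method="get" getasbinary="yes" result="httpResult" />
    <cfif left(httpResult.statusCode, 3) NEQ "200">
      <cfthrow message="Fetch failed: #httpResult.statusCode#" />
    </cfif>
    <cffile action="write" file="#path#" output="#httpResult.fileContent#" />
  </cfif>
  <cfreturn path />
</cffunction>

<!--- Eviction: the next read for this pair will miss and go back to S3. --->
<cffunction name="evictFromCache" output="false" returntype="void">
  <cfargument name="cacheDir" type="string" required="true" />
  <cfargument name="bucket" type="string" required="true" />
  <cfargument name="objectKey" type="string" required="true" />
  <cfset var path = cachePathFor(cacheDir, bucket, objectKey) />
  <cfif fileExists(path)>
    <cffile action="delete" file="#path#" />
  </cfif>
</cffunction>
```

In a multi-instance setup without a shared disk, each instance would need to run the eviction step for the same bucket and objectKey whenever the asset changes, which is the role deleteCacheFor() plays in the real CFC.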

S3 is Sweet (One App Down)

This weekend I ported my big filesystem-based app to S3, and it went like a dream. It's an image-management application, with all the actual images stored on disk. In addition to the standard import/edit/delete, the app provides automatic on-the-fly thumbnail generation, along with primitive editing capabilities (crop, resize, rotate, etc.). With images on local disk, that's all really easy: read them in, do whatever, write them back out. I figured using S3 would make things both more cumbersome and less performant. Both suspicions turned out to be unwarranted.

Building on the 's3Url' UDF that I published last week, I whipped up a little CFC to manage file storage on S3 with a very simple API. It has s3Url, putFileOnS3, getFileFromS3, s3FileExists, and deleteS3File methods, which all do about what you'd expect. You can grab the code here: amazons3.cfc.txt (make sure you remove the ".txt" extension) or visit the project page. It uses the simple HTTP-based interface, so after the authentication is handled, it's all very simple and fast. I haven't looked at the SOAP interface - why bother complicating a simple task?

With that CFC (and an application-specific wrapper to take care of some path-related transforms), porting the whole app took about two hours. I also realized after I was mostly done that the CF image tools accept URLs as well as files, so I switched my image reads to just use URLs instead of pulling the file local and reading it from disk.
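Reading an image straight from a signed URL might look something like the sketch below. It assumes the s3Url UDF mentioned above is in scope; the credentials, bucket name, object key, thumbnail size, and output path are placeholders rather than anything from the real app.

```cfml
<cfscript>
  // Placeholders: substitute your own credentials, bucket, and key.
  awsKey = "YOUR_ACCESS_KEY";
  awsSecret = "YOUR_SECRET_KEY";

  // Build a time-limited URL for the original image (default vhost style).
  sourceUrl = s3Url(awsKey, awsSecret, "my-bucket", "photos/original.jpg");

  // imageRead() accepts a URL as well as a file path, so no local copy is needed.
  img = imageRead(sourceUrl);

  // Scale in place so the image fits inside a 150x150 box, keeping its proportions.
  imageScaleToFit(img, 150, 150);

  // Write the thumbnail somewhere local; from here it could be pushed back to S3.
  thumbPath = getTempDirectory() & "thumb.jpg";
  imageWrite(img, thumbPath);
</cfscript>
```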

As for moving all the actual content, S3Sync was a champ, moving about 4.5GB of data from my Cari server to S3 in a few hours, including gracefully handling a couple errors raised by S3 (which a retry - performed automatically - solved), and a stop/restart in the middle. Total cost: about 65 cents.

Next is porting the blogs, including all the Picasa-based galleries. Unfortunately, that means writing PHP, but with how easy the CF stuff was, I don't think it'll be too much effort.

My Amazon Toolkit (Thus Far)

I'm early in the move to Amazon, of course, but already some specific tools are indispensable.  I'm sure the list will grow, but here's where I'm at right now:

  • S3Sync - A simple rsync-like command line tool (called 's3sync') for syncing stuff from a computer to S3 or the reverse.  Also includes the 's3cmd' tool that roughly implements the web service API (list your buckets, put a file, etc.).  This is the cornerstone of the plan for moving all my data files from my current server and backups to S3.  Once the migration is complete, s3cmd will probably be the tool of choice for manipulating S3 programmatically.  Written in Ruby, and requires 1.8.4+; I couldn't find a new enough RPM for my CentOS 4 box, so I had to compile from source (which was totally painless).
  • S3 Firefox Organizer (S3Fox) - a client for S3 following the standard FTP client paradigms.  It has its own proprietary definition of folders, but they're unobtrusive.  Since I'm getting stuff into S3 mostly with s3sync, I'm mostly using this for read-only oversight.
  • EC2 UI - a client for managing your EC2 "stuff" from Firefox.  While not FTP-like at all, it shares a lot of the same UI as S3Fox for setting up accounts and the like.

Amazon S3 URL Builder for ColdFusion

First task for my Amazon move is getting data assets (non-code-managed files) over to S3. I have a variety of types of data assets that need to move and have references updated, most of which require authentication. To make that easier, I wrote a little UDF to take care of building URLs with the authentication credentials included.

<cffunction name="s3Url" output="false" returntype="string">
  <cfargument name="awsKey" type="string" required="true" />
  <cfargument name="awsSecret" type="string" required="true" />
  <cfargument name="bucket" type="string" required="true" />
  <cfargument name="objectKey" type="string" required="true" />
  <cfargument name="requestType" type="string" default="vhost"
    hint="Must be one of 'regular', 'ssl', 'vhost', or 'cname'.  'Vhost' and 'cname' are only valid if your bucket name conforms to the S3 virtual host conventions, and cname requires a CNAME record configured in your DNS." />
  <cfargument name="timeout" type="numeric" default="900"
    hint="The number of seconds the URL is good for.  Defaults to 900 (15 minutes)." />
  <cfscript>
    var expires = "";
    var stringToSign = "";
    var algo = "HmacSHA1";
    var signingKey = "";
    var mac = "";
    var signature = "";
    var destUrl = "";

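    // Expiry is an absolute Unix timestamp in seconds, not a duration.
    // This assumes getTickCount() returns milliseconds since the Unix epoch.
    expires = int(getTickCount() / 1000) + timeout;
    // String to sign: verb, Content-MD5 (empty), Content-Type (empty), expiry, resource.
    stringToSign = "GET" & chr(10)
      & chr(10)
      & chr(10)
      & expires & chr(10)
      & "/#bucket#/#objectKey#";
    // HMAC-SHA1 the string with the AWS secret, then Base64-encode the digest.
    signingKey = createObject("java", "javax.crypto.spec.SecretKeySpec").init(awsSecret.getBytes(), algo);
    mac = createObject("java", "javax.crypto.Mac").getInstance(algo);
    mac.init(signingKey);
    signature = toBase64(mac.doFinal(stringToSign.getBytes()));
    // The signature always covers /bucket/objectKey; only the URL layout differs by requestType.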
    if (requestType EQ "ssl" OR requestType EQ "regular") {
      destUrl = "http" & iif(requestType EQ "ssl", de("s"), de("")) & "://s3.amazonaws.com/#bucket#/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    } else if (requestType EQ "cname") {
      destUrl = "http://#bucket#/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    } else { // vhost
      destUrl = "http://#bucket#.s3.amazonaws.com/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    }

    return destUrl;
  </cfscript>
</cffunction>

To use it, do something like this:

s3Url(aws_key, aws_secret, "s3.barneyb.com", "test.txt", 'cname');

That will generate a URL for the file "test.txt" in the "s3.barneyb.com" bucket, using the CNAME style. Obviously you'd need my AWS key and secret for it to work, and I'm not telling, so substitute your own values. For the fifth parameter (requestType), you can pass regular (bucket name in the request path), vhost (bucket name as an S3 subdomain), cname (a vanity CNAME pointing at S3), or ssl (regular over HTTPS) to control the style of URL generated. It defaults to vhost.
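As a quick illustration of the four styles, the calls below show the URL shape each one produces, read straight off the UDF above. The bucket names (my-bucket, assets.example.com) and the credential values are placeholders, and the query string is elided in the comments.

```cfml
<cfscript>
  // Placeholders: substitute your own credentials.
  awsKey = "YOUR_ACCESS_KEY";
  awsSecret = "YOUR_SECRET_KEY";

  // regular: http://s3.amazonaws.com/my-bucket/test.txt?AWSAccessKeyId=...&Signature=...&Expires=...
  regularUrl = s3Url(awsKey, awsSecret, "my-bucket", "test.txt", "regular");

  // ssl: same layout as regular, but over https://
  sslUrl = s3Url(awsKey, awsSecret, "my-bucket", "test.txt", "ssl");

  // vhost (the default when requestType is omitted): http://my-bucket.s3.amazonaws.com/test.txt?...
  vhostUrl = s3Url(awsKey, awsSecret, "my-bucket", "test.txt");

  // cname: the bucket name doubles as the host name: http://assets.example.com/test.txt?...
  cnameUrl = s3Url(awsKey, awsSecret, "assets.example.com", "test.txt", "cname");

  // The sixth parameter is the lifetime in seconds; this link expires after one minute.
  shortLivedUrl = s3Url(awsKey, awsSecret, "my-bucket", "test.txt", "vhost", 60);
</cfscript>
```

Whichever style you pick, the signed resource is the same /bucket/objectKey path, so switching styles changes only the host and path layout of the resulting URL.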

Edit: here's a link to the project page.

Moving to the Amazon

I'm in the process of switching my hosting from a dedicated box at cari.net over to Amazon EC2 and S3. Based on my estimates, the costs will be slightly higher per month ($60/mo right now, $75-80/mo post move), but the benefits are significant:

  • Using S3 for all my backups and data storage will definitely give me some peace of mind that I've been lacking.
  • The virtualized nature of the servers means doing upgrades is totally safe: launch a new copy of the box, do the upgrade, and if everything's golden, switch the IP to the new box. Cost is $0.10/hr which is close enough to zero to not matter.
  • I get a processor "upgrade" from my Celeron at Cari to a similarly clocked Xeon equivalent. The latter is paravirtualized, of course, but it should still help since most of my apps are CPU-bound. I also get some more RAM, but that's less important.
  • Last, but not least, Cari has had a lot of network issues in the year I've hosted there while Amazon hasn't.

First task is to move storage over to S3, and update the applications that currently access stuff off the filesystem (like autogeneration of thumbnails).