I'm early in the move to Amazon, of course, but already some specific tools are indispensable. I'm sure the list will grow, but here's where I'm at right now:
- S3Sync - A simple rsync-like command line tool (called 's3sync') for syncing stuff from a computer to S3 or the reverse. Also includes the 's3cmd' tool that roughly implements the web service API (list your buckets, put a file, etc.). This is the cornerstone of the plan for moving all my data files from my current server and backups to S3. Once the migration is complete, s3cmd will probably be the tool of choice for manipulating S3 programmatically. Written in Ruby, and requires 1.8.4+; my CentOS 4 box couldn't find a new enough RPM, so I had to compile from source (which was totally painless).
- S3 Firefox Organizer (S3Fox) - a client for S3 following the standard FTP client paradigms. It has its own proprietary definition of folders, but they're unobtrusive. Since I'm getting stuff into S3 mostly with s3sync, I'm mostly using this for read-only oversight.
- EC2 UI - a client for managing your EC2 "stuff" from Firefox. While not FTP-like at all, it shares a lot of the same UI as S3Fox for setting up accounts and the like.
Brian Rinaldi posted on his blog about dummy queries in CF 8.0.1, and it struck me as a weird solution. So here's a drop-in replacement that I think works in a more reasonable fashion and doesn't have any dependency on an existing DSN.
<cffunction name="dummyQuery2" access="public" output="false" returntype="query">
  <cfargument name="queryData" type="struct" required="true" />
  <cfset var i = 0 />
  <cfset var columnName = "" />
  <cfset var myQuery = queryNew(structKeyList(queryData)) />
  <cfset var queryLength = arrayLen(arguments.queryData[listFirst(structKeyList(arguments.queryData))]) />
  <cfloop from="1" to="#queryLength#" index="i">
    <cfset queryAddRow(myQuery) />
    <cfloop collection="#arguments.queryData#" item="columnName">
      <cfset querySetCell(myQuery, columnName, queryData[columnName][i]) />
    </cfloop>
  </cfloop>
  <cfquery dbtype="query" name="myQuery">
    select *
    from [myQuery]
  </cfquery>
  <cfreturn myQuery />
</cffunction>
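As you can see, the structure is almost identical, but it doesn't use a database; it just builds the query in memory. The input is a struct keyed by column name, where each value is an array holding that column's values, one per row. The "no-op" QofQ (query of queries) at the end is to ensure there is actual query metadata, not just the raw records, which Brian listed as one of his prerequisites. If you don't care about that, it can be removed with no ill effects.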
One interesting benefit of this approach is that the rows come out in the same order as they go in - with Brian's DB-based one, that's not guaranteed because there is no ORDER BY clause on the query. Running his example on my box (using MSSQL 2005), I got rows sorted by first name. With the in-memory building, the rows are explicitly kept in order throughout.
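For reference, a call might look something like the following. This is only a sketch: the `people` struct, its column names, and its values are made up for illustration, and it assumes every key holds an array of the same length, since the function sizes the query from the first column it finds.

```cfml
<!--- Hypothetical sample data: one array per column, all the same length --->
<cfset people = structNew() />
<cfset people.firstName = listToArray("Zoe,Adam,Mary") />
<cfset people.lastName = listToArray("Young,Baker,Smith") />

<!--- Build the in-memory query; no DSN is involved --->
<cfset q = dummyQuery2(people) />

<!--- Print the rows in whatever order the query holds them --->
<cfoutput query="q">
  #q.currentRow#: #q.firstName# #q.lastName#<br />
</cfoutput>
```

If the ordering behavior described above holds, the rows should print as Zoe, Adam, Mary, which is the order they were added in rather than alphabetical order.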
First task for my Amazon move is getting data assets (non-code-managed files) over to S3. I have a variety of types of data assets that need to move and have references updated, most of which require authentication. To make that easier, I wrote a little UDF to take care of building URLs with the authentication credentials included in the query string.
<cffunction name="s3Url" output="false" returntype="string">
  <cfargument name="awsKey" type="string" required="true" />
  <cfargument name="awsSecret" type="string" required="true" />
  <cfargument name="bucket" type="string" required="true" />
  <cfargument name="objectKey" type="string" required="true" />
  <cfargument name="requestType" type="string" default="vhost"
    hint="Must be one of 'regular', 'ssl', 'vhost', or 'cname'. 'Vhost' and 'cname' are only valid if your bucket name conforms to the S3 virtual host conventions, and cname requires a CNAME record configured in your DNS." />
  <cfargument name="timeout" type="numeric" default="900"
    hint="The number of seconds the URL is good for. Defaults to 900 (15 minutes)." />
  <cfscript>
    var expires = "";
    var stringToSign = "";
    var algo = "HmacSHA1";
    var signingKey = "";
    var mac = "";
    var signature = "";
    var destUrl = "";
    // Expiry in seconds since the epoch; relies on getTickCount() returning epoch milliseconds
    expires = int(getTickCount() / 1000) + timeout;
    // String to sign for a GET: verb, empty Content-MD5, empty Content-Type, expiry, resource path
    stringToSign = "GET" & chr(10)
      & chr(10)
      & chr(10)
      & expires & chr(10)
      & "/#bucket#/#objectKey#";
    // HMAC-SHA1 the string with the secret key, then Base64-encode the result
    signingKey = createObject("java", "javax.crypto.spec.SecretKeySpec").init(awsSecret.getBytes(), algo);
    mac = createObject("java", "javax.crypto.Mac").getInstance(algo);
    mac.init(signingKey);
    signature = toBase64(mac.doFinal(stringToSign.getBytes()));
    // Build the URL in the requested style
    if (requestType EQ "ssl" OR requestType EQ "regular") {
      destUrl = "http" & iif(requestType EQ "ssl", de("s"), de("")) & "://s3.amazonaws.com/#bucket#/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    } else if (requestType EQ "cname") {
      destUrl = "http://#bucket#/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    } else { // vhost
      destUrl = "http://#bucket#.s3.amazonaws.com/#objectKey#?AWSAccessKeyId=#awsKey#&Signature=#urlEncodedFormat(signature)#&Expires=#expires#";
    }
    return destUrl;
  </cfscript>
</cffunction>
To use it, do something like this:
s3Url(aws_key, aws_secret, "s3.barneyb.com", "test.txt", 'cname');
That will generate a URL for the file "test.txt" in the "s3.barneyb.com" bucket, using the CNAME style. Obviously you'd have to know my AWS key and secret for it to work, and I'm not telling, so substitute your own values. For the fifth parameter (requestType) you can use regular (bucket name in the request path), vhost (bucket name in an S3 subdomain), cname (a vanity CNAME pointing at S3), or ssl (regular over HTTPS) to control the style of URL generated.
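Here's a slightly fuller sketch of how the function might be used in a page, passing the optional sixth parameter (timeout) as well. The credentials, bucket name, and object key below are placeholders I made up for illustration, not real values.

```cfml
<!--- Placeholder credentials; substitute your own --->
<cfset myAwsKey = "YOUR_ACCESS_KEY_ID" />
<cfset myAwsSecret = "YOUR_SECRET_ACCESS_KEY" />

<!--- An HTTPS link that is good for 5 minutes instead of the default 15 --->
<cfset downloadUrl = s3Url(myAwsKey, myAwsSecret, "my-bucket", "reports/summary.pdf", "ssl", 300) />

<!--- Escape the ampersands in the query string before putting the URL in markup --->
<cfoutput>
  <a href="#htmlEditFormat(downloadUrl)#">Download the summary</a>
</cfoutput>
```

Since the expiry is baked into the signature, the URL has to be generated on each page request rather than cached for longer than the timeout.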
Edit: here's a link to the project page.
I'm in the process of switching my hosting from a dedicated box at cari.net over to Amazon EC2 and S3. Based on my estimates, the costs will be slightly higher per month ($60/mo right now, $75-80/mo post move), but the benefits are significant:
- Using S3 for all my backups and data storage will definitely give me some peace of mind that I've been lacking.
- The virtualized nature of the servers means doing upgrades is totally safe: launch a new copy of the box, do the upgrade, and if everything's golden, switch the IP to the new box. Cost is $0.10/hr which is close enough to zero to not matter.
- I get a processor "upgrade" from my Celeron at Cari to a similarly clocked Xeon equivalent. The latter is paravirtualized, of course, but it should still help since most of my apps are CPU-bound. I also get some more RAM, but that's less important.
- Last, but not least, Cari has had a lot of network issues in the year I've hosted there while Amazon hasn't.
First task is to move storage over to S3, and update the applications that currently access stuff off the filesystem (like autogeneration of thumbnails).