Amazon S3 CFC Now Supports Paged Lists

The initial implementation of the listObjects method on my Amazon S3 CFC didn't include any means for paging through records.  The default behaviour of S3 when doing a listObjects request (a GET of the bucket) is to return the first 1000 keys in alphabetical order, and then truncate the result.

There are now two more parameters to the listObjects method on the CFC:

  • startAfter[""] – The object key to start the listing after (corresponding to the 'marker' parameter in the raw S3 API), defaulting to the empty string (which is before the first object key).
  • maxResults[1000] – The number of results to return (corresponding to the 'max-keys' parameter in the raw S3 API), defaulting to 1000.
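
To make the mapping above concrete, here is a minimal sketch of how those two arguments could end up in the query string of the raw bucket GET. It is written in Python rather than CFML so it runs without the CFC, and it is not the CFC's actual code: the function name and argument names are hypothetical, and only the S3 parameter names (prefix, delimiter, marker, max-keys) come from the S3 API.

```python
from urllib.parse import urlencode

def list_objects_query(prefix="", delimiter="", start_after="", max_results=1000):
    """Build the query string for a raw S3 bucket GET.

    Hypothetical helper for illustration only; the S3 parameter names
    are real, everything else is an assumption.
    """
    params = {"max-keys": max_results}
    # Leave out any parameter that was not supplied.
    if prefix:
        params["prefix"] = prefix
    if delimiter:
        params["delimiter"] = delimiter
    if start_after:
        params["marker"] = start_after  # startAfter travels as 'marker'
    return urlencode(params)

# Made-up key name, page size of 10
print(list_objects_query(delimiter="/", start_after="photos/010.jpg", max_results=10))
```

With both arguments left at their defaults, only max-keys=1000 is sent, which matches the stock S3 behaviour of returning the first 1000 keys.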

This is a simple but rather atypical way to do paging, so a brief bit of pseudo-code is in order:

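// the key and secret are placeholders for your own credentials
s3 = new S3("MY_AWS_KEY", "MY_AWS_SECRET");
pageSize = 10;
// first page: an empty startAfter begins before the first key
items1To10 = s3.listObjects('s3.test.com', '', '/', '', pageSize);
// second page: start after the last key of the first page
items11To20 = s3.listObjects('s3.test.com', '', '/',
  items1To10.objectKey[items1To10.recordCount], pageSize);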

As you can see, the second page of results is retrieved by passing the last object key from the first page as the 'startAfter' parameter.  Note that S3 always returns object keys in alphabetical order (strictly, UTF-8 byte order, so uppercase letters sort before lowercase ones), and all paging operations use that ordering.

In addition to these two new parameters to listObjects, both listObjects and listBuckets now return extended metadata, including both the standard fields (executionTime, cached, recordCount, and columnList) and extra S3-specific metadata.  In particular, listObjects returns an 'isTruncated' field indicating whether the listing was truncated (meaning there are more keys to retrieve).  This is very helpful in determining whether you need to look for another page of records.

Pseudocode for this might look like:

s3 = new S3("MY_AWS_KEY", "MY_AWS_SECRET");
pageSize = 10;
items1To10 = s3.listObjects('s3.test.com', '', '/', '', pageSize);
if (items1To10.getMetadata().getExtendedMetadata().isTruncated) {
  // ... fetch next page ...
}
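
Putting the two pieces together, a full listing can be collected by looping until 'isTruncated' comes back false, feeding the last key of each page back in as the next 'startAfter'. The sketch below is a Python simulation over an in-memory list of made-up keys, not the CFC or the real S3 API; list_objects and list_all are hypothetical stand-ins that only imitate the marker and max-keys behaviour described above.

```python
from bisect import bisect_right

def list_objects(keys, start_after="", max_results=1000):
    """Stand-in for a truncated S3 listing over a sorted list of keys.

    Returns (page, is_truncated). Hypothetical helper, not the CFC.
    """
    start = bisect_right(keys, start_after)  # first key strictly after the marker
    page = keys[start:start + max_results]
    is_truncated = start + max_results < len(keys)
    return page, is_truncated

def list_all(keys, page_size):
    """Collect every key by using each page's last key as the next marker."""
    collected, start_after = [], ""
    while True:
        page, is_truncated = list_objects(keys, start_after, page_size)
        collected.extend(page)
        if not is_truncated or not page:
            break
        start_after = page[-1]  # last key of this page becomes the next marker
    return collected

keys = sorted(f"photos/{n:03d}.jpg" for n in range(1, 26))  # 25 made-up keys
first_page, more = list_objects(keys, "", 10)
print(first_page[-1], more)        # last key of page one, and whether more remain
print(list_all(keys, 10) == keys)  # paging reassembles the full listing
```

Because the marker is a key rather than an offset, the loop never needs to know the total number of objects up front; it simply stops when a page reports that nothing was cut off.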

For reference, the extended metadata is the same data that is returned when you use the 'result' attribute on a normal CFQUERY tag.  That data is also available from the query result itself (the variable you name in the 'name' attribute of CFQUERY) via the syntax above, so you can always get the SQL, execution time, cache status, etc. from any query result, even if you don't have a reference to the 'result' attribute's variable.  Forgive the confusing nomenclature; the attribute names are really screwed up, so here's some pseudocode that will hopefully illustrate:

<cfquery name="myQuery" result="myMetadata">
  select ...
  from ...
</cfquery>
md = myQuery.getMetadata().getExtendedMetadata();
assert md.sql EQ myMetadata.sql;
assert md.executionTime EQ myMetadata.executionTime;
assert md.equals(myMetadata);

This is handy for query results which are returned from CFC methods, for example, where you can't use the 'result' attribute.

Development of these new features was sponsored by Gaurav Malik and CSSTC.

3 responses to “Amazon S3 CFC Now Supports Paged Lists”

  1. derek

    Hi Barney

    I posted a request on one of your old blogs so forgive me for repeating, just not sure if you still access it.

    I have a challenge in that I want to give users access to upload and download to amazon on the fly but I need to track the movement so I can monitor bandwidth usage per client. All clients login to the same website so I know who they are by userid.

    I want to use your CFC, just not sure of the code to actually call each method so that I can track the movement.

    Thanks in advance
    Derek

  2. derek

    Hi Barney

    Thanks a ton, will give it a whack and give the feedback.

    Cheers

    Derek