11 May 2007

summary of python interfaces to amazon s3

after i carelessly left an amazon ec2 instance running idle for a month and got charged US$40 by amazon for the privilege, i swore off playing with it for a while.

recently, i've been looking at mass online storage and the ways you can access it. i came across a few interesting "deals": $3/month for 20GB on amazon s3, 140GB for $10/month on dreamhost and $49/year for 25GB on bingodisk.

those deals seem pretty good. dreamhost offers all sorts of ways to get your data onto it, such as rsync over ssh, sftp or webdav; bingodisk offers only webdav; and amazon s3 uses its own buckets/keys system over an HTTP-based API.
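to make the buckets/keys idea a bit more concrete, here is a rough sketch of how a request to the REST API gets addressed and signed, as far as i understand the docs at the time of writing: an object lives at /bucket/key, and each request carries an HMAC-SHA1 signature of a short summary of itself. the function names and the credentials in the usage lines are made up for illustration, and the sketch ignores x-amz-* headers entirely.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def object_path(bucket, key):
    """Path-style resource for an object: /bucket/key (keys may contain slashes)."""
    return "/" + bucket + "/" + quote(key, safe="/")


def sign_request(secret_key, verb, resource, date, content_md5="", content_type=""):
    """Base64-encoded HMAC-SHA1 over verb, content-md5, content-type, date and resource.

    Assumption: no x-amz-* headers are sent, so none are included in the string.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, date]) + "\n" + resource
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, secret_key, verb, resource, date):
    """Value for the Authorization header: 'AWS <access key id>:<signature>'."""
    return "AWS " + access_key + ":" + sign_request(secret_key, verb, resource, date)


# made-up bucket, key and credentials, purely for illustration
path = object_path("my-backups", "photos/2007/cat pic.jpg")
header = authorization_header(
    "EXAMPLEKEYID", "example-secret", "GET", path, "Fri, 11 May 2007 12:00:00 GMT"
)
print(path)
print(header)
```

the date passed in has to match the Date header actually sent with the request, otherwise the signature won't verify. all the python libraries listed further down are essentially doing this bookkeeping for you.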

amazon s3 is the most attractive since you only pay for what you use and can quit any time. bingodisk comes second on price and unlimited bandwidth. dreamhost is good if you really do need 140GB, but right now, i can't think of why i would. plus, how long would it take for me to upload 140GB over my 1.5Mbit upstream? (ans: roughly 9 days with the link saturated around the clock.) and if i only use 20GB, then it doesn't make much sense to pay more than the other plans cost.
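a quick back-of-the-envelope check of that upload time, assuming decimal units (1GB = 10^9 bytes, 1Mbit = 10^6 bits), a link that stays fully saturated, and no protocol overhead. the function name is just something i made up for the sketch.

```python
def upload_days(size_gb, upstream_mbit):
    """Days needed to push size_gb gigabytes through an upstream_mbit Mbit/s link."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (upstream_mbit * 1e6)
    return seconds / 86400


# the three plan sizes mentioned above, over a 1.5Mbit upstream
for size in (20, 25, 140):
    print(f"{size}GB: {upload_days(size, 1.5):.1f} days")
```

for 140GB this comes out a little under 9 days of continuous uploading, and a bit over a day for 20GB. real uploads will be slower once overhead and everything else using the link are factored in.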

hence, i was thinking of giving amazon s3 a try. there is a growing number of cool tools available, such as s3 browser and jungle disk. s3 browser is just a very simple mac-only app for adding and removing objects from the s3 storage system. jungle disk is a local webdav proxy that exposes your s3 bucket as a webdav share. it works on linux, mac and windows. pretty neat in my opinion.

the other thing i wanted to know was what the python support is like. turns out there are a few libraries, so i thought it would be interesting to list them all:

Amazon S3 Library for REST in Python

this is amazon's official python implementation of the S3 protocol. pretty simple and straightforward, does the trick. link.

Python Amazon

this is an unofficial implementation of the protocol for both the S3 and SQS services. it has epydocs, although they're not much more useful than looking straight at the code. link.


this is a wrapper around the official amazon library to make it more "pythonic". however, at first glance i didn't find it much better than the official implementation: you just use plain python setattr and getattr instead of the explicit methods in the S3 module. maybe some prefer this over the amazon one. link.
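to show the difference in style, here is a minimal sketch of attribute-style access layered over an explicit-method client. this is not the wrapper's actual code: ExplicitStore and AttrBucket are hypothetical names, and the "store" is just an in-memory dict standing in for s3.

```python
class ExplicitStore:
    """Hypothetical stand-in for an explicit-method client; keeps objects in a dict."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class AttrBucket:
    """Hypothetical wrapper that exposes keys as plain attributes."""

    def __init__(self, store):
        # bypass our own __setattr__ so the store itself isn't "uploaded"
        object.__setattr__(self, "_store", store)

    def __setattr__(self, key, data):
        self._store.put(key, data)

    def __getattr__(self, key):
        # only called when normal attribute lookup fails
        if key.startswith("_"):
            raise AttributeError(key)
        try:
            return self._store.get(key)
        except KeyError:
            raise AttributeError(key)


store = ExplicitStore()
bucket = AttrBucket(store)

bucket.notes = "hello"                        # same effect as store.put("notes", "hello")
print(bucket.notes)                           # same effect as store.get("notes")
setattr(bucket, "photos/cat.jpg", b"bytes")   # keys that aren't valid identifiers need setattr
print(getattr(bucket, "photos/cat.jpg"))
```

the attribute form reads nicely for simple keys, but since most real keys contain slashes or dots you end up calling setattr and getattr by hand anyway, which is not obviously nicer than calling put and get.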


this is a fuse-python module that exposes the s3 buckets as a filesystem using the fuse module/library. that means it should in theory work on macfuse and vanilla-fuse on linux. link.

finally, worth a mention is s3sync, which is written in ruby and basically does file syncing for you, like rsync. it doesn't do incremental backups, but it keeps track of what has changed and what has not by setting metadata on the objects in your s3 buckets. pretty neat.
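the general idea of syncing by metadata can be sketched in a few lines of python. to be clear, this is not s3sync's actual algorithm (i haven't ported it): plan_sync, the "md5" metadata field and the in-memory dicts are all assumptions made up for the illustration.

```python
import hashlib


def file_md5(data):
    """Hex md5 digest of a file's contents (bytes)."""
    return hashlib.md5(data).hexdigest()


def plan_sync(local_files, remote_meta):
    """Decide what to upload and what to delete.

    local_files: {key: bytes} for the files on disk
    remote_meta: {key: {"md5": hex digest}} as recorded on the stored objects
    Returns (keys_to_upload, keys_to_delete), both sorted.
    """
    upload = sorted(
        key
        for key, data in local_files.items()
        if remote_meta.get(key, {}).get("md5") != file_md5(data)
    )
    delete = sorted(key for key in remote_meta if key not in local_files)
    return upload, delete


local = {"a.txt": b"unchanged", "b.txt": b"edited locally", "c.txt": b"brand new"}
remote = {
    "a.txt": {"md5": file_md5(b"unchanged")},
    "b.txt": {"md5": file_md5(b"old contents")},
    "old.txt": {"md5": file_md5(b"removed locally")},
}
print(plan_sync(local, remote))
```

only files whose recorded digest differs (or is missing) get uploaded again, so an unchanged file costs nothing beyond the metadata lookup. whole files are still re-sent when they change, which is the "no incremental backups" caveat.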

well, after all that poking around, i still haven't decided what to use for my offsite backups, but s3 seems to be winning out over struggling with webdav. mac's implementation of webdav (via bingodisk) does leave quite a bit to be desired.
