Amazon S3 (or, How I Solved My Image Hosting Woes)

June 12, 2007 at 12:19 PM | Python, Play

A couple of months ago, I read Jeff Atwood's article on Using Amazon S3 as an Image Hosting Service, but because my favicon does not generate 27 GB of traffic a month, I didn't think much of it.

(If you're scratching your head wondering what Amazon's S3 is, I'll do my best to explain. It stands for Simple Storage Service, and it is just that: a simple service that allows you to store data. That data (or "those files", if you wish) can be world-readable (from a web browser) or private (many people are using it for backup). It's targeted at developers (Amazon only provides a set of APIs -- all GUIs are third-party) and it is dirt cheap: only $0.15 a gigabyte.)

Then, a few weeks ago, I was taking some pictures of my sister's soccer practice, and realized that I needed some place to put them. After looking at a couple of cheap hosting plans, I decided that I didn't like them; $5 a month isn't much, but that's still $60 a year for a whole lot of space I won't be using. Then I remembered S3.

After a night of playing around, I figured that it should be pretty simple. Write a script that will go through an HTML file looking for images, then upload them one at a time, replacing the links as it goes. html2s3 was born.
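The idea can be sketched roughly like this. It is not the html2s3 source, just a minimal illustration of "find the images, upload each one, swap in the new link". The names rewrite_image_links and fake_upload, and the bucket URL, are made up for the example; the real upload step would call the S3 library instead.

```python
import re

# Matches the src attribute of an <img> tag. A regex is a simplification;
# a real HTML parser would cope better with unusual markup.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_image_links(html, upload):
    """Replace each local image path with the URL returned by upload(path)."""
    def replace(match):
        prefix, quote, path = match.groups()
        if path.startswith(("http://", "https://")):
            return match.group(0)  # already remote, leave it untouched
        return prefix + quote + upload(path) + quote
    return IMG_SRC.sub(replace, html)


def fake_upload(path):
    # Hypothetical stand-in for the real S3 upload; the bucket URL is invented.
    key = path[2:] if path.startswith("./") else path
    return "http://example-bucket.s3.amazonaws.com/" + key
```

Passing the upload step in as a function keeps the link rewriting testable without touching the network.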

It turns out that the S3 library provided by Amazon is very easy to use, with most of my time spent correcting file paths. A couple of neat bits:

parser = OptionParser(usage = "usage: %prog [options] [FILES]\nWill process ...")
parser.add_option("-b", "--bucket", action="store", dest="bucket", help=...)
parser.add_option("-k", "--key-prefix", dest="remote_base_path", help="Prefix key ...")
parser.add_option("-n", "--no-backup", dest="no_backup", action="store_true" ...)
(options, args) = parser.parse_args()

I had seen OptionParser before, but never used it. It's so simple, though, that I think I'm going to use it quite a bit more frequently.
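For anyone who wants to try it, here is a small self-contained version of the same pattern. The option names match the excerpt above, but the help strings and defaults are placeholders, not the ones from html2s3, and the argument list is passed in by hand so that it runs without a real command line.

```python
from optparse import OptionParser


def build_parser():
    parser = OptionParser(usage="usage: %prog [options] [FILES]")
    # Help texts and defaults below are assumed for illustration.
    parser.add_option("-b", "--bucket", action="store", dest="bucket",
                      help="bucket to upload to")
    parser.add_option("-k", "--key-prefix", dest="remote_base_path", default="",
                      help="prefix for uploaded keys")
    parser.add_option("-n", "--no-backup", dest="no_backup",
                      action="store_true", default=False,
                      help="do not keep a backup copy")
    return parser


# parse_args() reads sys.argv[1:] when called with no arguments;
# an explicit list is used here to keep the example reproducible.
options, args = build_parser().parse_args(["-b", "photos", "-n", "index.html"])
```

Each option lands on the options object under its dest name (options.bucket, options.no_backup), and anything left over ends up in args.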

  source = args or sys.stdin.xreadlines()
  for line in source:

One thing I loved about C was the ability to do the read right in the loop condition (while (fgets(buf, sizeof(buf), stdin) != NULL)), but stdin.xreadlines() does the same sort of thing.
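A sketch of that fallback as a standalone function, in case it's useful. It assumes stdin carries one file name per line, which is my guess at the intended use rather than something the excerpt shows, and iter_filenames is a name invented for the example. It iterates over the file object directly, since xreadlines() was later removed in Python 3.

```python
import io
import sys


def iter_filenames(args, stream=None):
    """Yield names from args, or one name per line of stream if args is empty."""
    if stream is None:
        stream = sys.stdin
    # An empty list is falsy, so "or" falls through to the stream.
    # File objects iterate lazily, one line at a time.
    source = args or stream
    for line in source:
        name = line.strip()  # drop the trailing newline from stream input
        if name:             # skip blank lines
            yield name


# Simulated stdin, used only when no arguments are given.
names = list(iter_filenames([], io.StringIO("a.html\n\nb.html\n")))
```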

Update: When I moved the script over to my server, it started telling me that I didn't have permission to list the bucket. So I started digging into it a bit, and found out that I was getting the error RequestTimeTooSkewed. It turns out that I forgot to turn my server's clock back at the last daylight saving time change... D'oh!
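If you hit the same error, one quick check is to compare your clock against the Date header that comes back on any HTTP response. Here is a minimal sketch of that comparison. The function name is made up, and the 15-minute limit is what I understand S3's tolerance to be, so treat it as an assumption. It takes the header value as a string, so fetching it is left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed S3 tolerance between request time and server time.
MAX_SKEW_SECONDS = 15 * 60


def clock_skew_seconds(date_header, now=None):
    """Return local time minus server time, in seconds."""
    server_time = parsedate_to_datetime(date_header)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - server_time).total_seconds()


# A clock left one hour ahead, as happens after a missed DST change.
local_now = datetime(2007, 6, 12, 17, 19, 0, tzinfo=timezone.utc)
skew = clock_skew_seconds("Tue, 12 Jun 2007 16:19:00 GMT", local_now)
too_skewed = abs(skew) > MAX_SKEW_SECONDS
```

A missed one-hour change is well past that limit, which would explain why every request was rejected.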