My first experience using Amazon Web Services for a production-quality project was a lot of fun and deeply interesting. I've played with AWS a bit on my own time, but I recently had a chance to really sink my teeth into it and write production-level code that uses AWS as the platform for an upcoming web and mobile application.
Perhaps the most interesting, and frustrating, part of this project involved storing hundreds of thousands of objects in an AWS S3 bucket. If you're not familiar with S3, it's Amazon's online storage web service. The concept is simple: you create an S3 "bucket", then shove "objects" into it, grouping them into "folders" (really just shared key prefixes) where necessary. Of course, you can also update and delete objects. If it helps, think of S3 as a pseudo online file system that's theoretically capable of storing an unlimited amount of data. Yes, I'm talking exabytes of data ... theoretically ... if you're willing to pay Amazon for that much storage.
In any event, I created a new S3 bucket and eventually placed hundreds of thousands of objects into it. S3 handled this with ease. The problem came when it was time to delete the bucket and all of the objects inside it. It turns out there is no native S3 API call that recursively deletes an S3 bucket, or renames one for that matter. I guess Amazon leaves it up to the developer to implement such functionality?
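Without a single call, a recursive delete boils down to a loop: list a page of keys, delete each one, repeat until the listing comes back empty, and only then delete the now-empty bucket (S3 refuses to delete a bucket that still holds objects). The sketch below shows that loop against a hypothetical `BucketStore` interface backed by an in-memory fake; `BucketStore`, `listKeys`, `deleteObject`, `deleteBucket`, and `InMemoryStore` are illustrative names, not the AWS SDK's API. The page size of 1000 mirrors the maximum number of keys an S3 listing returns per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical storage interface standing in for a real S3 client.
interface BucketStore {
    List<String> listKeys(String bucket, int maxKeys); // up to maxKeys remaining keys
    void deleteObject(String bucket, String key);
    void deleteBucket(String bucket);                  // fails if the bucket is not empty
}

// In-memory fake so the sketch runs without AWS credentials.
class InMemoryStore implements BucketStore {
    private final Map<String, TreeSet<String>> buckets = new TreeMap<>();

    void createBucket(String bucket) { buckets.put(bucket, new TreeSet<>()); }
    void putObject(String bucket, String key) { buckets.get(bucket).add(key); }
    boolean bucketExists(String bucket) { return buckets.containsKey(bucket); }

    public List<String> listKeys(String bucket, int maxKeys) {
        List<String> page = new ArrayList<>();
        for (String key : buckets.get(bucket)) {
            if (page.size() == maxKeys) break;
            page.add(key);
        }
        return page;
    }

    public void deleteObject(String bucket, String key) { buckets.get(bucket).remove(key); }

    public void deleteBucket(String bucket) {
        // Like S3, refuse to delete a bucket that still contains objects.
        if (!buckets.get(bucket).isEmpty()) throw new IllegalStateException("bucket not empty");
        buckets.remove(bucket);
    }
}

class RecursiveDelete {
    // Single-threaded recursive delete: one object at a time, then the bucket.
    static int deleteRecursively(BucketStore store, String bucket) {
        int deleted = 0;
        List<String> page;
        while (!(page = store.listKeys(bucket, 1000)).isEmpty()) {
            for (String key : page) {
                store.deleteObject(bucket, key);
                deleted++;
            }
        }
        store.deleteBucket(bucket); // safe now: the bucket is empty
        return deleted;
    }
}
```

Every delete here waits for the previous one to finish, which is exactly the bottleneck the next paragraph complains about once the bucket holds hundreds of thousands of objects.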
So, if you need to recursively delete a very large S3 bucket, you really have two options: use a tool like s3funnel, or write your own tool that efficiently deletes multiple objects concurrently. Note that I say concurrently: otherwise you'll waste a lot of time waiting for a single-threaded delete to remove objects one at a time, which is horribly inefficient. This sounds like a perfect problem for a thread pool and, wouldn't you know it, even a CountDownLatch!
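A minimal sketch of the thread-pool-plus-latch idea, assuming a hypothetical `ObjectDeleter` interface that wraps whatever single-object delete call your S3 client provides (`ObjectDeleter`, `ConcurrentDeleter`, and `deleteAll` are illustrative names, not the code from the rest of this post). Each delete is submitted to a fixed-size `ExecutorService`, and every task counts the latch down in a `finally` block, so `latch.await()` returns only once every delete has either succeeded or failed.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical single-object delete; a real version would wrap an S3 client call.
interface ObjectDeleter {
    void delete(String key) throws Exception;
}

class ConcurrentDeleter {
    // Deletes every key on a fixed-size thread pool and blocks until all
    // deletes have finished. Returns the number of deletes that failed.
    static int deleteAll(List<String> keys, ObjectDeleter deleter, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch latch = new CountDownLatch(keys.size()); // one count per key
        AtomicInteger failures = new AtomicInteger();
        try {
            for (String key : keys) {
                pool.execute(() -> {
                    try {
                        deleter.delete(key);
                    } catch (Exception e) {
                        failures.incrementAndGet(); // record the failure and keep going
                    } finally {
                        latch.countDown(); // always count down, even on failure
                    }
                });
            }
            latch.await(); // wait until every submitted delete has completed
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
        return failures.get();
    }
}
```

Counting down in `finally` is the important detail: if a failed delete skipped `countDown()`, `await()` would block forever. The thread count is a tuning knob you'd pick by experiment, and the failure count tells you whether another pass over the bucket is needed before deleting the bucket itself.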
Continue reading for the code ...