Using Amazon AWS to create an offline database
Problem Statement: I would like to create an offline database to look up prices/info on the n most useful books to sell in the United States (where n is probably 3 million or so).
Question: I am trying to figure out how to use Amazon AWS to download a list of the n books with the highest sales rank, along with some information about each book (title, prices, etc.). I am open to other approaches as well.
What I have done so far: First, something like this already exists (asellertool.com). However, I thought this would be an interesting project to work on and, quite frankly, we aren't serious enough to need to pay the $30/month subscription.
Now, AWS is great (and easy) if you have a few items you want to look up, but I can't figure out how to enumerate items by sales rank. Originally, I was hoping to enumerate all of the books Amazon has by ISBN, but that wasn't available either. Then I thought I could find a list of all ISBNs out there, but that was a dead end too. Finally, I thought I could generate my own list of ISBNs, but after some back-of-the-envelope calculations I thought better of it: my solution would take roughly a year to go through a third of the 10-digit space at 100 lookups/second (and it was overkill anyway).
So, I am back on sales rank, which currently seems like a dead end as well. If you have any thoughts, I would appreciate them.
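For anyone who wants to check the estimate, here is a minimal sketch of the arithmetic. The function name `days_to_scan` is just an illustrative helper, and the 100 lookups/second rate is the assumed figure from above, not a measured or documented limit.

```python
SECONDS_PER_DAY = 86_400


def days_to_scan(candidates, lookups_per_second):
    """Days needed to try `candidates` values at a fixed lookup rate."""
    return candidates / lookups_per_second / SECONDS_PER_DAY


# Every 10-digit string, ignoring the ISBN check digit.
ten_digit_space = 10 ** 10

# A third of that space at 100 lookups/second: a bit over a year.
print(round(days_to_scan(ten_digit_space / 3, 100)))  # 386

# The last character of an ISBN-10 is a check digit fixed by the first
# nine, so only 10**9 strings are well-formed ISBN-10s.
print(round(days_to_scan(10 ** 9, 100)))  # 116
```

Even with the check digit taken into account, brute force costs months of continuous lookups for values that are mostly unassigned, so it still looks like overkill.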
Amazon has a data feed service that provides gzipped XML files of all their products, split by top-level category. It's updated once a day and totals about 20 GB compressed (110 GB uncompressed). Since you only need books, it's more in the area of 4 GB compressed (31 GB uncompressed). The only thing is I'm not sure who is able to use this or what's involved in getting an account. They don't list anything about it on their website as far as I know, so you will most likely have to contact someone there to find out more. We use this at work for stuff we do with them, and it's some of the craziest XML processing I've had to do.
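If you do get access, files of that size are best streamed rather than loaded whole. Below is a rough sketch of one way to do that in Python, keeping only the n best-ranked items in memory. The feed's schema isn't documented publicly, so the element names (`Item`, `SalesRank`, `ISBN`, `Title`, `Price`) and the function `top_ranked_books` are hypothetical placeholders; you would need to adjust them (and handle XML namespaces, if any) to match the real files.

```python
import gzip
import heapq
import xml.etree.ElementTree as ET


def top_ranked_books(path, n, item_tag="Item", rank_tag="SalesRank",
                     fields=("ISBN", "Title", "Price")):
    """Return the n items with the lowest (best) sales rank.

    All tag names are assumptions about the feed layout, not its real schema.
    """
    best = []  # min-heap of (-rank, seq, record); the root is the worst rank kept
    seq = 0    # unique tie-breaker so records are never compared directly
    with gzip.open(path, "rb") as fh:
        # iterparse yields each element as its closing tag is read,
        # so the file is decompressed and parsed incrementally.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != item_tag:
                continue
            rank_text = elem.findtext(rank_tag)
            if rank_text and rank_text.strip().isdigit():
                record = {f: elem.findtext(f) for f in fields}
                record["SalesRank"] = int(rank_text)
                entry = (-record["SalesRank"], seq, record)
                seq += 1
                if len(best) < n:
                    heapq.heappush(best, entry)
                elif entry > best[0]:
                    # Better rank than the worst one kept: swap it in.
                    heapq.heapreplace(best, entry)
            # Drop the item's children and text; the emptied element
            # itself stays attached to the root.
            elem.clear()
    # Largest -rank first, i.e. best sales rank first.
    return [rec for _neg, _seq, rec in sorted(best, reverse=True)]
```

Calling something like `top_ranked_books("books.xml.gz", 3_000_000)` (the file name is made up) would then give you the list to load into your offline database. Items without a numeric rank are skipped in this sketch.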
Take a look at AWS Zone, in the Amazon E-Commerce Service section.