How to wget a file at a moving address?

I am trying to download a PDF file from a website. I know the name of the file, e.g. foo.pdf, but its location changes every few weeks:

e.g. www.server.com/media/123456/foo.pdf changes into www.server.com/media/245415/foo.pdf

The number is always six digits, so I tried using a bash script to go through all 1 million possibilities, but that obviously takes a lot of time:

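i=0
RC=1    # start as "not found" so the loop condition is well defined
# Try every six-digit ID until wget succeeds or the IDs run out.
until [ "$RC" -eq 0 ] || [ "$i" -eq 1000000 ]
do
  b=$(printf '%06d' "$i")    # zero-pad to six digits
  wget -q "http://www.server.com/media/${b}/foo.pdf" -O bar.pdf
  RC=$?
  i=$((i + 1))
done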

For wrong addresses I just get 404 errors. I tested it around the currently correct address and it works.

Does anyone know a faster way to solve this problem?

Answers


If that file is linked from anywhere else, then you can get the link from there and just fetch the file. If it's not, you are probably out of luck.
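
A minimal sketch of that idea, assuming some page on the site links to the PDF. The linking page's URL is not known from the question, so it is passed in as an argument; extract_pdf_path and fetch_pdf are hypothetical helper names, and the /media/<six digits>/foo.pdf pattern is taken from the example URLs in the question.

```shell
#!/usr/bin/env bash

# Read HTML on stdin and print the first /media/<six digits>/foo.pdf path found.
extract_pdf_path() {
  grep -oE '/media/[0-9]{6}/foo\.pdf' | head -n 1
}

# Download the linking page, pull the current PDF path out of it, fetch the PDF.
# $1: URL of a page that links to foo.pdf (an assumption, supplied by the caller)
fetch_pdf() {
  local page_url=$1 path
  path=$(wget -qO- "$page_url" | extract_pdf_path)
  if [ -z "$path" ]; then
    echo "no link to foo.pdf found on $page_url" >&2
    return 1
  fi
  wget -q "http://www.server.com${path}" -O bar.pdf
}
```

You would call it as fetch_pdf followed by the address of whatever page carries the link. That costs two requests per run instead of up to a million.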

Note that most server operators would treat 1,000,000 requests to the webserver as abuse, and would ban your IP for even trying.
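
If probing really is the only option, a gentler variant is to check whether each URL exists without downloading it, and to pause between requests. The sketch below is only an illustration: url_exists and probe_range are hypothetical names, the one-second default delay is an assumed value, and whether any request rate is acceptable depends on the site.

```shell
#!/usr/bin/env bash

# Succeed if the URL exists; --spider checks the resource without saving it.
url_exists() {
  wget -q --spider "$1"
}

# Print the first existing URL for IDs in [start, end], sleeping between tries.
# $1: first ID, $2: last ID (plain decimal, no leading zeros)
# $3: delay in seconds (defaults to 1, an assumed value)
probe_range() {
  local start=$1 end=$2 delay=${3:-1} i url
  for ((i = start; i <= end; i++)); do
    url=$(printf 'http://www.server.com/media/%06d/foo.pdf' "$i")
    if url_exists "$url"; then
      echo "$url"
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

For example, probe_range 245000 246000 1 checks about a thousand IDs near a known value. At one request per second the full range of 1,000,000 IDs would take roughly 11.6 days, so this only helps when the range can be narrowed first.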

