Scraping dynamic content in a website

I need to scrape news announcements from this website, Link. The announcements seem to be generated dynamically: they don't appear in the page source. I usually use mechanize, but I assume it wouldn't work here. What can I do? I'm OK with Python or Perl.

Answers


The polite option would be to ask the owners of the site if they have an API which allows you access to their news stories.

The less polite option would be to trace the HTTP transactions that take place while the page is loading and work out which one is the AJAX call which pulls in the data.

Looks like it's this one. But it looks like it might contain session data, so I don't know how long it will continue to work.
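Once you have found the request in the browser's network tab, you can replay it directly from a script. Here is a rough sketch in Python using only the standard library. The URL, the extra headers, and the JSON shape (a list of objects with `title` and `url` keys) are all placeholders I made up for illustration; substitute whatever the real call uses and returns.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL seen in the network tab.
AJAX_URL = "https://example.com/news/feed.json"


def fetch_raw(url, timeout=10):
    """Download the body of the AJAX response as text."""
    request = urllib.request.Request(
        url,
        headers={
            # Some servers only answer requests that look like they came
            # from page script; whether this site needs it is an assumption.
            "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def parse_announcements(raw):
    """Return (title, link) pairs from the response body.

    Assumes a JSON list of objects with 'title' and 'url' keys,
    which is an invented shape, not the real site's format.
    """
    items = json.loads(raw)
    return [(item["title"], item["url"]) for item in items]
```

Calling `parse_announcements(fetch_raw(AJAX_URL))` would then give you the list of announcements. If the URL really does carry session data, you would probably have to load the main page first and pull a fresh token out of it before making this request.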


If the content is generated dynamically, you can use Windmill or Selenium to drive the browser and get the data once it has been rendered.

You can find an example here.
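For the Selenium route, something along these lines might work as a starting point. It assumes the `selenium` package and Firefox with its driver are installed, and that each announcement sits in an element with the class `announcement`; that class name and the helper names are hypothetical, so inspect the rendered page to find the real selector.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def render_page(url, wait_css, timeout=15):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported lazily so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one matching element has been inserted.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # nesting depth inside a matching element
        self.parts = []     # text fragments of the current match
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Normalise whitespace and store the finished announcement.
            self.results.append(" ".join("".join(self.parts).split()))
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_announcements(html, class_name="announcement"):
    collector = ClassTextCollector(class_name)
    collector.feed(html)
    return collector.results
```

You would use it as `extract_announcements(render_page(url, ".announcement"))`. Driving a real browser is slower than replaying the AJAX call, but it does not depend on working out how the request is built.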

