How can I programmatically scrape a web page and “click” a javascript button?

I'm trying to scrape a web page for work that has hundreds of table rows, each with a check box. To submit the form I need to click a button that calls a JavaScript function. The button's HTML looks like this:

<a onclick="JavaScript: return verifyChecked('Resend the selected request for various approvals?');"
id="_ctl0_cphMain_lbtnReapprove"
title="Click a single request to send to relevant managers for reapproval."
class="lnkDBD" href="javascript:__doPostBack('_ctl0$cphMain$lbtnReapprove','')"
style="border-color:#0077D4;border-width:1px;border-style:Solid;text-decoration: overline;">&nbsp;Resend&nbsp;</a>

I know that with libraries like Beautiful Soup you can submit forms by adding POST data to the URL, but how could I check a checkbox and "click" this JavaScript button? The website is a help desk of sorts, and for this particular button we can only check one request at a time, which takes far too long when hundreds of requests need to be resubmitted.

When I check the checkbox, a message also pops up asking me to confirm that I want to do this. I don't know whether that will affect submitting it programmatically.

EDIT: I forgot to include the doPostBack method.

<script type="text/javascript"> 
<!--
var theForm = document.forms['aspnetForm'];
if (!theForm) {
    theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
// -->
</script>

Answers


Get Firefox and Firebug, open Firebug, load the page, and look in the console tab to see what it's actually sending to the server.

Then just repeat what it's sending using whatever tool you like.
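
As a rough sketch of what "repeating" the request could look like in Python: the `__doPostBack` function in the question only fills in two hidden fields and submits the form, so the same POST body can be built by hand. The sample page, the `__VIEWSTATE` value, the `Requests.aspx` action and the checkbox name `_ctl0:cphMain:chk1` below are made up for illustration; the real names and values have to be taken from what Firebug shows. Only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(page_html, event_target, checkbox_name, event_argument=""):
    """Build the form fields that __doPostBack would submit for one checked row."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    fields = dict(parser.fields)
    # The two values __doPostBack writes before calling theForm.submit().
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    # A checked checkbox with no value attribute is submitted as "on".
    fields[checkbox_name] = "on"
    return fields


# Made-up page fragment; the real page will carry more hidden fields.
SAMPLE_PAGE = """
<form name="aspnetForm" method="post" action="Requests.aspx">
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
<input type="checkbox" name="_ctl0:cphMain:chk1" />
</form>
"""

payload = build_postback(
    SAMPLE_PAGE,
    "_ctl0$cphMain$lbtnReapprove",  # first argument of __doPostBack in the link's href
    "_ctl0:cphMain:chk1",           # hypothetical checkbox name
)
body = urlencode(payload).encode("ascii")
# body can then be POSTed to the form's action URL (for example with
# urllib.request), together with the cookies of a logged-in session.
```

The confirmation message comes from the link's `onclick` handler, which runs in the browser, so a request replayed this way should never trigger it. The hidden fields should be read from a freshly fetched page before each submission, since the server may reject stale values.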


You're probably better off using a browser automation library like Selenium for something like this.
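
A minimal sketch of that approach with Selenium's Python bindings, assuming Selenium 4 and a browser driver are installed. The link id comes from the question; the checkbox id and the page URL are hypothetical, and the sketch assumes the confirmation dialog appears after the Resend link is clicked (if it appears when the box is ticked, the wait and accept would have to move up).

```python
LINK_ID = "_ctl0_cphMain_lbtnReapprove"  # id of the Resend link, from the question


def resend_one(driver, checkbox_id, timeout=10):
    """Tick one checkbox, click Resend, and accept the confirmation dialog."""
    # Imported here so the sketch can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.find_element(By.ID, checkbox_id).click()
    driver.find_element(By.ID, LINK_ID).click()
    # Wait for the JavaScript confirm() dialog, then press OK.
    WebDriverWait(driver, timeout).until(EC.alert_is_present())
    driver.switch_to.alert.accept()


# Usage (not run here; URL and checkbox id are placeholders):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://helpdesk.example/requests")
#   resend_one(driver, "hypothetical_checkbox_id")
#   driver.quit()
```

Because the page posts back after each click, the list of checkboxes would need to be looked up again before the next call rather than cached across submissions.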


Try iMacros. For simple browser automation it's excellent. You can record your sessions and it generates code based on them. If you need more logic, the documentation is simple enough to get you going fast with standard programming. You can call outside languages and scripts as well. A few projects, for example, that I've used this for:

1) Collecting business leads: a site had a list of all its retail stores but would not show them all, only those close to a zip code entered by the user. I put a ton of zip codes in a spreadsheet, and when run, the macro went through each one from the CSV and scraped the store info into a CSV file. Every 5 minutes it opened a VPN program on the PC and changed the IP. It took 20 minutes to make.

If you're set on programming it, then OK, but I find this the best way: it's easier to debug if the site changes, their "code" is very easy, and you can call other scripts and files with ease. The Firefox add-on is the most stable, and it is free.

