Downloader middleware to ignore all requests to a certain URL in Scrapy

I am trying to define a custom downloader middleware in Scrapy to ignore all requests to a particular URL (these requests are redirected from other URLs, so I can't filter them out when I generate the requests in the first place).

I have the following code. The idea is to catch this at the response-processing stage (as I'm not exactly sure how a request that redirects to another request is handled) and check the URL. If it matches the one I'm trying to filter out, return an IgnoreRequest exception; if not, return the response as usual so that it can continue to be processed.

from scrapy.exceptions import IgnoreRequest
from scrapy import log

class CustomDownloaderMiddleware:

    def process_response(request, response, spider):
        log.msg("In Middleware " + response.url, level=log.WARNING)
        if response.url == "http://www.achurchnearyou.com//":
            return IgnoreRequest()
        else:
            return response

and I add this to the dict of middlewares:

DOWNLOADER_MIDDLEWARES = {
    'acny.middlewares.CustomDownloaderMiddleware': 650
}

with a value of 650, which should - I think - make it run directly after the RedirectMiddleware.

However, when I run the crawler, I get an error saying:

ERROR: Error downloading <GET http://www.achurchnearyou.com/venue.php?V=00001>: process_response() got multiple values for keyword argument 'request'

This error is occurring on the very first page crawled, and I can't work out why it is occurring - I think I've followed what the manual said to do. What am I doing wrong?

Answers


I've found the solution to my own problem - it was a silly mistake with creating the class and method in Python. The code above needs to be:

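from scrapy.exceptions import IgnoreRequest
from scrapy import log

class CustomDownloaderMiddleware(object):

    # `self` must be the first parameter of an instance method.
    def process_response(self, request, response, spider):
        log.msg("In Middleware " + response.url, level=log.WARNING)
        if response.url == "http://www.achurchnearyou.com//":
            # Raise the exception (not return it) to drop this request.
            raise IgnoreRequest()
        else:
            # Any other response continues through the middleware chain.
            return response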

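That is, the method needs self as its first parameter, and the class needs to inherit from object. Without self, Python binds the middleware instance to the first parameter, request, which then collides with the request keyword argument that Scrapy passes; that is where the "got multiple values for keyword argument 'request'" error comes from. Note also that IgnoreRequest is now raised rather than returned.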
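
To see why the missing self produces that particular error, here is a minimal sketch in plain Python with no Scrapy involved. It assumes, based on the wording of the error message, that the framework calls the method with keyword arguments. The names BrokenMiddleware, FixedMiddleware, call_like_framework and describe are hypothetical, and the placeholder strings stand in for real request and response objects.

```python
class BrokenMiddleware:
    # Missing `self`: Python binds the instance to the first
    # parameter, which here happens to be named `request`.
    def process_response(request, response, spider):
        return response


class FixedMiddleware:
    # With `self` in place, the keyword arguments line up as intended.
    def process_response(self, request, response, spider):
        return response


def call_like_framework(middleware):
    # Assumption: the hook is invoked with keyword arguments,
    # as the "keyword argument 'request'" wording suggests.
    return middleware.process_response(
        request="req", response="resp", spider="spider"
    )


def describe(middleware):
    # Return the hook's result, or the TypeError text if the call fails.
    try:
        return call_like_framework(middleware)
    except TypeError as exc:
        return "TypeError: %s" % exc


broken_result = describe(BrokenMiddleware())
fixed_result = describe(FixedMiddleware())
print(broken_result)
print(fixed_result)
```

The broken class fails with a "multiple values" TypeError for request, because that parameter receives both the instance (positionally) and the keyword argument; the exact wording of the message varies between Python versions. The fixed class simply returns the response.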

