handling caching of images in render_to_response
The brief version:
I wish to strip image URLs out of the output of a render_to_response call and replace them with a locally cached copy (creating it if it doesn't exist). I have an idea of an implementation, but it seems like every man and his wife has done this before, so I don't want to waste my time when I could be using a version that is tried and tested.
I'm developing a syndication client using Django for my back-end (I intend to make a browser version, so this will make the transition easier). For the mobile and desktop versions of my client I want to be able to view feeds offline, such as when on a long tube journey. I guess it would effectively be like a feed version of Instapaper.
I have a table in my database holding the image URL, the last update time, and an ImageField. The intention is that every time I encounter an image URL I look it up in this table (the URL would be the primary index).
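    import hashlib
    import urllib.request

    from django.core.files import File
    from django.utils import timezone

    def get_cached_image(image_url):
        try:
            img = Image.objects.get(url=image_url)
        except Image.DoesNotExist:
            img = Image(url=image_url, image=standard_placeholder)
            img.save()
        if not img.last_updated or too_long_ago(img.last_updated):
            try:
                # urlretrieve downloads to a temporary file and returns (path, headers)
                path, _headers = urllib.request.urlretrieve(img.url)
                # assumption: "update" means stamping last_updated with the current time
                img.last_updated = timezone.now()
                with open(path, "rb") as fh:
                    # FieldFile.save() stores the file and saves the model instance
                    img.image.save(hashlib.md5(img.url.encode("utf-8")).hexdigest(), File(fh))
            except (OSError, ValueError):
                pass  # download failed; keep whatever image we already have
        return img.image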
Then all I have to do is replace the images from render_to_response with those from here.
Problems I've already spotted:
I'm not entirely sure how I was intending to find all image URLs. The src attribute of an image tag should be easy enough to find, and should cover the majority of cases, but are there any other methods that are a little more foolproof and a little less full of holes?
Since you are serving crawled HTML, you should probably filter it using a parser like lxml. While doing this, you can find the img tags, fetch the images, and change the URLs. I suggest doing it as an offline task using Celery.
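A rough sketch of the lxml approach, assuming lxml is installed. The function name rewrite_image_urls, the to_local_url callback, and the cache dict are made up for illustration; in the real client the callback would do the Image table lookup, and the whole function could be called from a Celery task.

```python
import lxml.html


def rewrite_image_urls(html, to_local_url):
    """Return html with every <img src> value passed through to_local_url."""
    doc = lxml.html.fromstring(html)
    for img in doc.iter("img"):
        src = img.get("src")
        if src:
            img.set("src", to_local_url(src))
    # encoding="unicode" makes tostring return a str rather than bytes
    return lxml.html.tostring(doc, encoding="unicode")


# Hypothetical mapping standing in for the Image table lookup.
cache = {"http://example.com/a.png": "/media/cache/a.png"}
page = '<div><p>Hello</p><img src="http://example.com/a.png"></div>'
print(rewrite_image_urls(page, lambda url: cache.get(url, url)))
```

URLs that are not in the cache are left untouched here, so an uncached image still loads when online.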
How about creating a custom template tag/filter that checks if the url is cached and decides which url to use?
Use something like BeautifulSoup or HTMLParser to parse the document and to pull out all of the <img> tags and grab the src attribute.
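For the HTMLParser route, something along these lines should work with only the standard library (Python 3's html.parser). The class and function names are made up for the example. It only looks at the src attribute of img tags, so srcset attributes and CSS background images are not covered.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs. Self-closing tags also end up here.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_image_urls(html):
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


sample = '<p><img src="http://example.com/a.png"><IMG SRC="/b.jpg" alt="b"/></p>'
print(find_image_urls(sample))
```

Each URL returned can then be looked up in the cache table before the response is sent.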