Django (or WSGI): chain stdout from a subprocess to the response
I am writing a web service in Django to handle image/video streams, but most of the work is done by an external program. For instance:
- The client requests /1.jpg?size=300x200
- Python code parses 300x200 in Django (or another WSGI app)
- Python calls convert (part of ImageMagick) using the subprocess module, with the parameter 300x200
- convert reads 1.jpg from local disk and resizes it accordingly
- convert writes the result to a temp file
- Django builds an HttpResponse() and reads the whole temp file content as the body
As you can see, the write-then-read round trip through the temp file is inefficient. I need a generic way to handle similar external programs, not only convert but also others like cjpeg, ffmpeg, etc., or even proprietary binaries.
I want to implement it this way:
- Python gets the stdout fd of the convert child process
- Python chains it to the WSGI socket fd for output
I've done my homework: Google says this kind of zero-copy can be done with the splice() system call, but it's not available in Python. So how can I maximize performance in Python for this kind of scenario?
- Call splice() using ctypes?
- Hack memoryview() or buffer()?
- subprocess has stdout, which has readinto(); could this be utilized somehow?
- How can we get the fd number in any WSGI app?
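To make the readinto() idea from the list concrete, here is a minimal sketch of what it could look like. It is not zero-copy like splice(); it only avoids allocating a new bytes object for every chunk by reusing one buffer. The function name `copy_stdout`, its parameters, and the `write` callback are hypothetical names made up for this illustration.

```python
import subprocess
import sys


def copy_stdout(cmd, write, bufsize=64 * 1024):
    """Run cmd and hand its stdout to write() chunk by chunk.

    One preallocated buffer is reused for every read, so write() must
    consume (or copy) each chunk before it returns.
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)
    total = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            # readinto() fills the existing buffer and returns the byte count;
            # 0 means the child closed its end of the pipe.
            n = proc.stdout.readinto(view)
            if not n:
                break
            write(view[:n])  # slicing a memoryview does not copy the data
            total += n
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return total
```

For example, `copy_stdout(cmd, lambda chunk: chunks.append(bytes(chunk)))` collects the output into a list; in a server, `write` would be whatever callable sends bytes to the client.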
I'm kind of a newbie at this, so any suggestion is appreciated. Thanks!
If the goal is to increase performance, you ought to examine the bottlenecks on a case-by-case basis, rather than taking a "one solution fits all" approach.
For the convert case, assuming the images aren't insanely large, the bottleneck there will most likely be spawning a subprocess for each request.
I'd suggest avoiding both the subprocess and the temporary file, and doing the whole thing in the Django process using PIL, with something like this...
```python
import os

from PIL import Image
from django.http import HttpResponse

IMAGE_ROOT = '/path/to/images'

# A Django view which returns a resized image
# Example parameters: image_filename='1.jpg', width=300, height=200
def resized_image_view(request, image_filename, width, height):
    full_path = os.path.join(IMAGE_ROOT, image_filename)
    source_image = Image.open(full_path)
    resized_image = source_image.resize((width, height))
    response = HttpResponse(content_type='image/jpeg')
    resized_image.save(response, 'JPEG')
    return response
```
You should be able to get results identical to ImageMagick by using the correct scaling algorithm, which, in general, is ANTIALIAS (called LANCZOS in current Pillow releases, where the ANTIALIAS name has been removed) for cases where the rescaled image is less than 50% of the size of the original, and BICUBIC in all other cases.
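As a rough sketch of that rule of thumb, a small helper could pick the filter from the source and target sizes. The helper name `pick_resample_filter` is made up for illustration, and reading "less than 50% of the size" as "both dimensions shrink to under half" is my assumption; adjust it if you measure size differently.

```python
def pick_resample_filter(src_size, dst_size):
    """Return the name of a PIL resampling filter for a resize.

    Assumption: "less than 50% of the original" means both width and
    height end up below half of the source dimensions.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    if dst_w < src_w * 0.5 and dst_h < src_h * 0.5:
        # LANCZOS is the current Pillow name for the old ANTIALIAS filter
        return "LANCZOS"
    return "BICUBIC"
```

In the view, this could be used as `source_image.resize(size, getattr(Image, pick_resample_filter(source_image.size, size)))`.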
For the case of videos, if you're returning a transcoded video stream, the bottleneck will likely be either CPU-time, or network bandwidth.
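If you do end up piping a transcoder's output to the client, a generator that reads the child's stdout in fixed-size chunks keeps memory use flat and lets the response start before the transcode finishes. This is only a sketch: `stream_command` and its parameters are hypothetical names, and the command you pass in is up to you.

```python
import subprocess
import sys


def stream_command(cmd, chunk_size=64 * 1024):
    """Yield the stdout of cmd in chunks of at most chunk_size bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    finished = False
    try:
        # read() returns b"" once the child closes its stdout
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            yield chunk
        finished = True
    finally:
        proc.stdout.close()
        if not finished:
            # The consumer stopped early (e.g. the client disconnected),
            # so stop the child instead of letting it keep working.
            proc.kill()
        proc.wait()  # reap the child so it does not linger as a zombie
```

In Django, a generator like this can be handed to `StreamingHttpResponse`, e.g. `StreamingHttpResponse(stream_command(cmd), content_type='video/mp4')`, where `cmd` is whatever transcoder command line you use.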
I found that WSGI can actually take a file object, such as the subprocess's stdout pipe, as an iterable response.
Example WSGI app:
```python
import subprocess

def image_app(environ, start_response):
    # 'Connection' is a hop-by-hop header, which PEP 3333 forbids apps from setting
    start_response('200 OK', [('Content-Type', 'image/jpeg')])
    # Only stdout is piped: an unread stderr pipe could fill up and block convert
    proc = subprocess.Popen([
        'convert', '1.jpg',
        '-thumbnail', '200x150',
        '-',  # write the result to stdout
    ], stdout=subprocess.PIPE)
    return proc.stdout
```
It wraps the child's stdout as the HTTP response body via a pipe.
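Two caveats with returning proc.stdout directly: iterating a file object yields "lines", so binary data gets split at arbitrary newline bytes, and nothing ever waits for the child process. A possible refinement, sketched below, wraps the pipe in a small iterable with a close() method, since PEP 3333 requires the server to call close() on the returned iterable if it has one. `ProcessOutput` and `make_app` are hypothetical names for this illustration.

```python
import subprocess
import sys


class ProcessOutput:
    """Iterable over a child's stdout that reaps the child on close()."""

    def __init__(self, proc, block_size=8192):
        self.proc = proc
        self.block_size = block_size

    def __iter__(self):
        # Fixed-size binary blocks instead of newline-delimited "lines"
        return iter(lambda: self.proc.stdout.read(self.block_size), b"")

    def close(self):
        # Called by the WSGI server when the response is finished or aborted
        self.proc.stdout.close()
        self.proc.wait()


def make_app(cmd, content_type):
    """Build a WSGI app that streams the stdout of cmd as the response body."""
    def app(environ, start_response):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        start_response("200 OK", [("Content-Type", content_type)])
        return ProcessOutput(proc)
    return app
```

With the convert example above, this would be `image_app = make_app(['convert', '1.jpg', '-thumbnail', '200x150', '-'], 'image/jpeg')`.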