Do form parameter names need to be encoded when doing a POST?

Quick version: Do the names of form parameters sent using the standard multipart/form-data encoding need to be encoded?

Longer version: The upload form on 1fichier.com (a service to upload large files) uses the following to specify the file parameter to upload:

<input type="file" name="file[]" size="50" title="Select the files to upload" />

The name of the parameter is file[] (notice the brackets).

Using LiveHTTPHeaders, I see that Firefox sends the parameter name as-is (i.e. with the brackets) when submitting the form. However, in a Python program I'm writing, I use the poster module to upload files using the standard multipart/form-data encoding. If I give it the parameter name with the brackets, it gets sent like this:

file%5B%5D

Internally, poster encodes the names of the parameters using this function:

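import urllib  # Python 2; in Python 3 this function is urllib.parse.quote_plus

def encode_and_quote(data):
    """If ``data`` is unicode, return urllib.quote_plus(data.encode("utf-8"))
    otherwise return urllib.quote_plus(data)"""
    if data is None:
        return None

    # Python 2 only: `unicode` is the text type; byte strings are passed through
    if isinstance(data, unicode):
        data = data.encode("utf-8")
    # Percent-encodes reserved characters ("file[]" -> "file%5B%5D"); spaces become "+"
    return urllib.quote_plus(data)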

The urllib.quote_plus documentation says that this is only "required for quoting HTML form values when building up a query string to go into a URL". But here we're doing a POST, so the form values don't go into the URL.
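For reference, here is that query-string convention in Python 3's standard library, where quote_plus moved to urllib.parse. This shows only the stdlib behaviour, not poster's code. Note that the same escaping is also used for application/x-www-form-urlencoded POST bodies, so what decides whether names get percent-encoded is the content type, not the HTTP method.

```python
from urllib.parse import quote_plus, urlencode

# quote_plus percent-encodes "[" and "]" and turns spaces into "+",
# the escaping used for application/x-www-form-urlencoded data
# (query strings, and also urlencoded POST bodies).
print(quote_plus("file[]"))

# urlencode applies quote_plus to both names and values when building
# a "name=value&..." string.
print(urlencode({"file[]": "a b"}))
```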

So, do parameter names still need to be encoded, or is poster wrong to do this?

Answers


RFC 2388 covers multipart/form-data submissions. Section 3 specifies that parameter names should be either ASCII or encoded as per RFC 2047.

So if your POST request is encoded as multipart/form-data (which is what poster does), then no, parameter names don't need to be encoded this way. I suggest filing a bug with the author (ahem...); he might be willing to fix it in a future release ;)

A workaround is to set your MultipartParam's name attribute directly, e.g.

   p.name = 'file[]'
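If you'd prefer not to depend on poster at all, the same kind of request body can be assembled with the Python 3 standard library. This is a minimal sketch, not poster's API: `build_multipart` is a hypothetical helper, it assumes field names and filenames contain no double quotes or CR/LF, and it labels every file as application/octet-stream.

```python
import uuid


def build_multipart(fields, files):
    """Hypothetical helper: build a multipart/form-data body.

    fields: dict of name -> str value
    files:  dict of name -> (filename, bytes content)
    Returns (Content-Type header value, body bytes).
    """
    boundary = uuid.uuid4().hex  # random hex, very unlikely to occur in the data
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode("ascii"))
        # The name goes into a quoted-string as-is: no URL encoding.
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"))
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, (filename, content) in files.items():
        lines.append(f"--{boundary}".encode("ascii"))
        lines.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode("utf-8")
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(f"--{boundary}--".encode("ascii"))  # closing delimiter
    lines.append(b"")  # so the body ends with CRLF
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart({}, {"file[]": ("hello.txt", b"hello\n")})
print(content_type)
print(body.decode("utf-8"))
```

The resulting `body` and `content_type` could then be passed to `urllib.request.Request` as its `data` argument and a `Content-Type` header.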

Although in essence this question has been answered, I'm including some more details on how to dig through those RFCs.

RFC 2388 section 3 states that a Content-Disposition header is required. Non-ASCII data should be encoded using RFC 2047, even though that seems to conflict with RFC 2047's own rule that an encoded-word must not appear inside a quoted-string. RFC 2183 section 2 describes the format of this Content-Disposition header. The name parameter fits the general parameter rule of that grammar, which defers to RFC 2045 for its definition. Section 5.1 of RFC 2045 says that the right-hand side of a parameter is either a token or a quoted-string. Neither production mentions any URL-encoded format for form names. But [ and ] are in tspecials, so they cannot be part of a token; the name has to be a quoted-string. So we get

Content-Disposition: form-data; name="file[]"      (correct)
Content-Disposition: form-data; name=file[]        (invalid)
Content-Disposition: form-data; name="file%5B%5D"  (wrong name)
Content-Disposition: form-data; name=file%5B%5D    (wrong name)
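The token-versus-quoted-string rule above can be sketched in a few lines of Python. `is_token` and `format_param` are hypothetical helpers written for illustration; the tspecials set is the one listed in RFC 2045 section 5.1, and quoting uses RFC 822 backslash quoted-pairs, assuming the value contains no CR/LF.

```python
# RFC 2045 section 5.1: tspecials may not appear in a token
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value: str) -> bool:
    """True if value is a non-empty run of US-ASCII characters
    excluding SPACE (32), CTLs (0-31, 127) and tspecials."""
    return bool(value) and all(
        32 < ord(ch) < 127 and ch not in TSPECIALS for ch in value
    )


def format_param(attribute: str, value: str) -> str:
    """Render attribute=value, using a quoted-string when value is not a token.
    Sketch only: assumes no CR/LF; escapes backslash and double quote."""
    if is_token(value):
        return f"{attribute}={value}"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute}="{escaped}"'


for name in ["file", "file[]", "file%5B%5D"]:
    print("Content-Disposition: form-data; " + format_param("name", name))
```

Note that `file%5B%5D` comes out unquoted: "%" is not a tspecial, so it is a syntactically valid token, just the wrong name.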

One more note for non-ASCII file names: the current HTML5 specification draft requires that they not be encoded in a 7-bit-safe manner (such as RFC 2047), but instead be transferred in the same character encoding used throughout the rest of the request. A question about non-ASCII field names is what brought me to look at this question of yours today.
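To make the contrast concrete, here is a sketch of both approaches for a hypothetical non-ASCII field name, using Python 3's `email.header.Header` for the RFC 2047 encoded-word. The HTML5-style line simply assumes the request's encoding is UTF-8; the exact encoded-word produced may vary (Base64 or quoted-printable), so only its general shape matters here.

```python
from email.header import Header

name = "fïle"  # hypothetical non-ASCII field name

# RFC 2388 / RFC 2047 style: an ASCII-only "encoded-word" such as =?utf-8?b?...?=
rfc2047 = Header(name, "utf-8").encode()
print(rfc2047)

# HTML5 style as described above: the raw name inside the quoted-string,
# encoded in the request's character set (assumed to be UTF-8 here)
html5 = f'Content-Disposition: form-data; name="{name}"'.encode("utf-8")
print(html5)
```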