Open Bug 136676 Opened 22 years ago Updated 2 years ago

no quoting at all in MIME-Headers (Content-Disposition header not escaped)

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

Future

People

(Reporter: bugzilla, Unassigned)

References

Details

(Keywords: good-first-bug, helpwanted, intl)

Attachments

(1 file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020310
BuildID:    2002031008

When uploading files, the filename doesn't get quoted at all, leading to failed
form submission or mangled filenames.

Reproducible: Always
Steps to Reproduce:
1. upload a file with a rfc2822-tspecial in the name, e.g
   x\x or x"x
2. watch the content-disposition-header


Actual Results:  bad:
Content-Disposition: form-data; name="ef00621"; filename="x\x"

worse:
Content-Disposition: form-data; name="ef00621"; filename="x"x"

Expected Results:  Content-Disposition: form-data; name="ef00621"; filename="x\\x"

or

Content-Disposition: form-data; name="ef00621"; filename="x\"x"

see rfc2045/rfc822/rfc2822
-> jkeiser
Assignee: darin → jkeiser
Component: Networking: HTTP → Form Submission
Yep.  Per RFC 822:

     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.
and
     quoted-pair =  "\" CHAR                     ; may quote any char

Per RFC 1521:

   value := token / quoted-string

Per RFC 1806 ("Communicating Presentation Information in Internet Messages: The
Content-Disposition Header"):

        filename-parm := "filename" "=" value;

   `Extension-token', `parameter' and `value' are defined according to
   [RFC 822] and [RFC 1521].
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
*** Bug 142581 has been marked as a duplicate of this bug. ***
A possible reason for lowering the priority of this bug is that IE5.50 has the 
same bug. Here are some captured headers:
Content-Disposition: form-data; name="thisfile"; 
filename="C:\windows\Desktop\sunzu10.txt"
Content-Disposition: form-data; name="x; "xx"

I wonder what the PHP/perl people do about this problem? It seems like it 
mostly affects CGI authors. In fact, I wonder if there is a secret de-facto 
standard of ignorance?
This is pretty important for CGI authors and pretty easy to write.  I will get
to it soon.  If IE has the same bug, we'll just beat them :)
Priority: -- → P3
Bulk moving P1-P5 un-milestoned bugs to future. 
Target Milestone: --- → Future
jkeiser?  It's been 5 months...  ;)
Ah.  This is what the nsACharFilter stuff is being written for.  I need to get
back to that.  Won't happen for many weeks.
Depends on: 166906
QA Contact: tever → ashishbajpai
QA Contact: ashishbajpai → ashishbhatt
The code involved is in nsFormSubmission.cpp --
nsFSMultipartFormData::AddNameFilePair.

Note also bug 66041 -- the two should probably be fixed together...

Frankly, I doubt efficiency is an issue here.  Correctness is, though.
Blocks: 66041
Keywords: helpwanted
Whiteboard: [good first bug](maybe?)
Depends on: 193439
*** Bug 270670 has been marked as a duplicate of this bug. ***
Summary: no quoting at all in MIME-Headers → no quoting at all in MIME-Headers (Content-Disposition header not escaped)
OS: Linux → All
Hardware: PC → All
This needs a method like nsStyleUtil::escapeCSSString(...) in
layout/style/nsStyleUtil.cpp, doesn't it? Because escaping is very similiar in
many cases, should we implement a generic method for backslash-escaping? Or just
copy?
This needs a method like nsStyleUtil::escapeCSSString(...) in
layout/style/nsStyleUtil.cpp, doesn't it? Because escaping is very similiar in
many cases, should we implement a generic method for backslash-escaping? Or just
copy?
For parameters like 'filename' in C-D header fields, we have to use RFC 2231.
However, there would be an interoperability problem if we did that because most
server-side programs don't understand that. Anyway, mailnews already has the
necessary code (that does escaping 'special characters' and RFC 2231-encoding)
and one way to solve this problem is moving that to necko (as I did with
decoding RFC 2231/2047 a couple of years ago)
Keywords: intl
CSS escaping and RFC 2231 escaping are pretty different beasts...

jshin is right; if we have code already doing this in mailnews we should just
move it into shared code.
Depends on: 295084
*** Bug 315528 has been marked as a duplicate of this bug. ***
*** Bug 312464 has been marked as a duplicate of this bug. ***
*** Bug 345776 has been marked as a duplicate of this bug. ***
QA Contact: ashshbhatt → form-submission
To confirm, Firefox 3.5.6 has the same problem - it incorrectly handles double quotes in filenames.

Theses sort of files do exist. Here's one in my iTunes directory

10 Mrs. Robinson (From The Motion Picture "The Graduate").mp3

In addition, if the filename contains a newline then the newline will be put into the headers, breaking the read of the part. No, filenames shouldn't have newlines, but yes, the special characters should be quoted or even removed.
(In reply to Andrew Dalke from comment #19)
> To confirm, Firefox 3.5.6 has the same problem - it incorrectly handles
> double quotes in filenames.

Fx 11 handles double quotes in quoted-text correctly but does not quote the backslash character.

> In addition, if the filename contains a newline then the newline will be put
> into the headers, breaking the read of the part.

Fx 3.6 does, Fx 11 removes the newline.

> No, filenames shouldn't have newlines, 

The permitted characters in a file's name are defined by the filesystem.
I believe this is actually a specification bug: the HTML5 spec doesn't actually state how multipart/form-data parameters should be encoded. At least that is the opinion of some Webkit developers: https://bugs.webkit.org/show_bug.cgi?id=62107

I've opened an HTML5 specification bug for this issue. I would love it if some Firefox developers would chime in on what the right answer is here: https://www.w3.org/Bugs/Public/show_bug.cgi?id=16909
RFC 2388 covers formatting of multipart/formdata headers.

Section 4.4 specifies that RFC 2231 should be used for the "filename" parameter.

Some experimentation with Firefox indicates that Firefox does not encode filename, and simply uses the character set of the page with the upload form to encode the filename. If the filename cannot be represented in the page's character set, Firefox uses "&#xxxxx;" in the filename parameter, which was surprising.
John, are you still working on this bug? I'm going through old "good first bugs" that have been inactive for a while. 

Is this still valid?
Flags: needinfo?(john)
The main still existing problem as today (Firefox 48) is that we're not escaping backslashes.
Since I'm likely about to touch that code anyway (bug 236084), I'll try to fix this as well.
Assignee: john → g.maone
Flags: needinfo?(john)
Comment on attachment 8743501 [details]
MozReview Request: Bug 136676 - Escape backslashes. r?smaug

baku, could you take a look at this. my review queue is getting long.
I assume html spec defines the right format here, or refers to some RFC
Attachment #8743501 - Flags: review?(bugs) → review?(amarchesini)
Giorgio, are we sure we want to do this? Here why I'm asking.
This is what the spec says:

https://tools.ietf.org/html/rfc7578#section-4.2

No particular escaping is required. I also tested a simple form:

<form method="POST" action="a.html" enctype="multipart/form-data">
<input type="hidden" name="aa&quot;bb\\cc" value="cc&quot;dd" />
<input type="submit" name="GO" />
</form>

We do:

-----------------------------13888476951432543466375544523
Content-Disposition: form-data; name="aa\"bb\\cc"

Blink:

------WebKitFormBoundaryPT456NDS2Q02j1ZJ
Content-Disposition: form-data; name="aa%22bb\\cc"

and I suspect that with your patch we end up doing:

Content-Disposition: form-data; name="aa\"bb\\\\cc"

So, tell me more why you think this patch is needed. Thanks.
Flags: needinfo?(g.maone)
Attachment #8743501 - Flags: review?(amarchesini)
(In reply to Andrea Marchesini (:baku) from comment #27)
> Giorgio, are we sure we want to do this? Here why I'm asking.
> This is what the spec says:
> 
> https://tools.ietf.org/html/rfc7578#section-4.2
> 
> No particular escaping is required.

I agree on the specs being a mess regarding escaping.

Notwithstanding, since we choose to go with backslash escaping of double quote, we should be at least self-consistent and escape the prefix escape character itself, which is a basic requirement for any prefix-based escaping.

Consider the following form field:

<input type="hidden" name="This name will be\&quot; truncated, but only if enctype is multipart/formdata" value="x">

Try to submit it with both the @enctype values we support for forms, and you'll see the most common server implementation to parse very different names depending on the enctype.


> I also tested a simple form:
> and I suspect that with your patch we end up doing:
> 
> Content-Disposition: form-data; name="aa\"bb\\\\cc"
> 
 
which is the correct representation of 'aa";bb\\cc' under self-consistent backslash prefix escaping rules, and is correctly parsed by PHP at least, producing the same result no matter the enctype. In facts, currently the name of your field is parsed by PHP scripts as 'aa"bb\cc' for enctype="multipart/formdata", but as 'aa"bb\\cc' otherwise. We might argue it's a PHP bug, but we're definitely not self-consistent about escaping anyway.

If you concern is interoperability or compatibility with other browsers, that's even messier, since they use a mix of percent encoding and ad-hoc rules: Edge for instance seems to escape the . (dot) character as _ (underscore) in multipart header subfields.
Flags: needinfo?(g.maone)
Anne, what do you think about this issue? Here a map of browsers and how the "aa"bb\cc" name is escaped:

Blink/Safari  => "aa%22bb\cc"
Firefox       => "aa\"bb\cc"
Edge          => I don't know yet. I need to test it.

With the Giorgio's patch we will escape that string as: "aa\"bb\\cc".
Another way to do it would be to escape it as: "aa%22bb%5ccc" or "aa\"bb%5ccc".
Flags: needinfo?(annevk)
I'm a little bit confused. Does a single backslash become a double backslash in any implementation? Using percent-encoding here seems more logical if other implementations do that (and matches the other form submission format).

(Evan, apologies the standards situation is still not any better. I had hopes for the new RFC, but it seems like HTML should just define this format itself at some point, once we have some spare cycles.)
Flags: needinfo?(annevk)
Component: HTML: Form Submission → DOM: Core & HTML
Keywords: good-first-bug
Whiteboard: [good first bug](maybe?)

This good-first-bug hasn't had any activity for 6 months, it is automatically unassigned.
For more information, please visit auto_nag documentation.

Assignee: g.maone → nobody
Status: ASSIGNED → NEW

The handling of escapes in multipart/form-data names and filenames was recently specified in https://github.com/whatwg/html/pull/6282 to match Chrome and Safari's behavior. See bug 1686765.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: