Mangled utf8 names prevent file upload [form sub]




19 years ago
3 months ago


(Reporter: lcollins, Assigned: Ehsan)


(Depends on 1 bug, {intl})

Firefox Tracking Flags

(Not tracked)


(Whiteboard: [good first bug])



19 years ago
This problem occurs on all versions of Netscape up to the latest Mozilla.

1. Create two asp files (This is the simplest test since we can't provide our 

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>File Upload Test</TITLE>


<form method="POST" action="upload.asp" enctype="multipart/form-data">
  <input name="test" type="file"><br>
  <input type="submit" name="submit">



And a text file with a non-ASCII name. I used αβγ.txt, with the contents "Alpha 
Beta Gamma". The contents don't matter, just the name.

2. Place the enclosed files (test.asp, upload.asp, αβγ.txt) in the same folder. 
3. Start -> Programs -> NT 4.0 Option Pack -> Microsoft Personal Webserver -> 
Internet Service Manager
To make the above a virtual folder

4. Load http://<YourPath>Test.asp file in Mozilla. Browse to select the αβγ.txt 
file (these are the greek letters "alpha", "beta", and "gamma"). The file name 
doesn't really matter as long as it is not ASCII. We have found this problem in 
all scripts. This will also happen if there are any non-ASCII text characters 
anywhere on the path. Notice that the file name is garbled in the field.

5. Click "Submit Query".

The next page will be the results of the submission. What you should see (at 
least on IE) is something like 

-----------------------------7d13bd2a390236 Content-Disposition: form-data; 
name="test"; filename="<YourPath>\αβγ.txt" Content-Type: text/plain Alpha 
Beta Gamma -----------------------------7d13bd2a390236 Content-Disposition: 
form-data; name="submit" Submit Query -----------------------------

Instead, you see:

-----------------------------236153271123998 Content-Disposition: form-data; 
name="test"; filename="<YourPath>\aß?.txt" -----------------------------
Content-Disposition: form-data; name="submit" Submit Query ---------------------

Note that the file name is mangled (not recognizable as any encoding) and the 
file content has not been found.
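
For reference, the expected request body can be sketched roughly as follows. This is only an illustration of the multipart/form-data layout shown above, with the file name sent as raw UTF-8 bytes; the helper name `build_multipart` and the boundary string are made up for the example and are not taken from Mozilla's code.

```python
def build_multipart(boundary: str, field: str, filename: str, content: bytes) -> bytes:
    """Assemble a single-file multipart/form-data body (hypothetical helper)."""
    delimiter = b"--" + boundary.encode("ascii")
    # Assumption: the file name is emitted as raw UTF-8 inside the quoted parameter.
    disposition = f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'
    lines = [
        delimiter,
        disposition.encode("utf-8"),
        b"Content-Type: text/plain",
        b"",                  # blank line separates part headers from part content
        content,
        delimiter + b"--",    # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines)


body = build_multipart("7d13bd2a390236", "test", "αβγ.txt", b"Alpha Beta Gamma")
```

In the failing case described above, both the correctly encoded name and the "Alpha Beta Gamma" content are missing from the part.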

Comment 1

19 years ago
setting bug status to New

Comment 2

19 years ago
Ever confirmed: true

Comment 3

19 years ago
Related to this is that filenames containing a " character are not properly escaped.

eg. the file name 'badfile".txt' returns a content disposition header of
Content-Disposition: form-data; name="uploadfield"; filename="badfile".txt"

when it should be:
Content-Disposition: form-data; name="uploadfield"; filename="badfile\".txt"
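
A minimal sketch of the escaping this comment asks for, assuming backslash quoting of the filename parameter. The function name `quote_filename` is hypothetical, and escaping the backslash itself is an extra assumption beyond what the comment states.

```python
def quote_filename(filename: str) -> str:
    """Return a filename="..." parameter with backslash escaping (hypothetical helper)."""
    # Escape backslashes first so the backslash added for the quote is not doubled.
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return f'filename="{escaped}"'


param = quote_filename('badfile".txt')
```

For the example name above, `param` is `filename="badfile\".txt"`.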


Comment 4

19 years ago
reassigning to guru of uploading
Assignee: rods → pollmann

Comment 5

19 years ago
I don't know whether RFC 2047, which covers the encoding of message headers, is applicable here.


Comment 6

19 years ago
I'm not sure that RFC 2047 is relevant here. If you run the enclosed test, 
which prints the header and body of the POST, you will see that the problem is 
that Netscape fails to find the file when it creates the content for the POST. 
It should show up in the body. The file contains ASCII, so the encoding of the 
body should not be an issue. It would be useful to look into the Netscape code 
that actually tries to find the file when it generates the body.

Comment 7

19 years ago
setting TFM to mozilla0.9.1 as part of the bug triage
Target Milestone: --- → mozilla0.9.1

Comment 8

19 years ago
QA Contact Update
QA Contact: bsharma → vladimire
Target Milestone: mozilla0.9.1 → Future
Bulk reassigning Eric Pollmann's remaining form submission bugs to Alex.
Assignee: pollmann → alexsavulov


18 years ago
Summary: Mangled utf8 names prevent file upload → Mangled utf8 names prevent file upload[from sub]


18 years ago
Summary: Mangled utf8 names prevent file upload[from sub] → Mangled utf8 names prevent file upload[form sub]


18 years ago
Priority: -- → P4
->Form Submission
Component: HTML Form Controls → Form Submission

Comment 11

17 years ago
On 2002072204 (1.1beta) on W2k(US):
In an <input type="file" enctype="multipart/form-data"> form element on a page
encoded with utf-8:

If I use the "Browse" button to locate a file with Japanese characters in the
filename, the Japanese characters are replaced with ? question mark characters.
This appears in the UI form field the user sees as well as in the
transmitted Content-disposition filename= field parameter that the server receives.
(If the file path is wider than the UI field, the ??'s may not be visible.)

If I correct the ? question marks by typing in the UI field to replace them with
the correct kanji characters, they are replaced with _ underscore characters.

In both cases the filename is mangled, and as a result no file with the mangled
filename is found when the form is submitted.

No error is reported when the upload file is not found; instead an empty file
with the mangled filename is transmitted (bug 82634).

But the file would be found if Mozilla didn't mangle the filename.

[IE successfully transmits the file, and encodes the name using UTF-8 charset
(native w2k encoding may be UTF-16).]

I think rfc2047 is appropriate, and comment #6 is mistaken.  Rfc2047 is
applicable to headers, and the filename appears in a MIME header
(Content-disposition:) in a part of the multipart/form-data body (RFC 1867, RFC
2388).  It is true that the file was not transmitted, but as noted above, it was not
transmitted because it was not found (bug 82634), and it was not found because
the name was mangled (this bug).

Comment 12

17 years ago
RFC 1867 (file upload) states:
"The client application should make best effort to supply the file name; if the
file name of the client's operating system is not in US-ASCII, the file name
might be approximated or encoded using the method of RFC 1522."
Since RFC 2047 is an update of RFC 1522, it seems to be appropriate.
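
To illustrate what RFC 2047-style encoding of the test file name would look like, here is a small sketch using Python's standard `email.header` module. It only shows the encoded-word form; it does not claim this is what Mozilla does or should do.

```python
from email.header import Header, decode_header

# Encode the file name from the report as an RFC 2047 encoded-word.
encoded = Header("αβγ.txt", charset="utf-8").encode()

# Decode it again to confirm that the name survives the round trip.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
```

The encoded value has the `=?charset?encoding?text?=` shape and decodes back to the original name.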

Comment 13

17 years ago
I'm afraid RFC 1867 is too outdated. It's not possible
to use RFC 2047-style (1522-style) encoding for
parameters of mail headers (and, by extension, HTTP headers)
while abiding by RFC 822 (STD 11). That's why they
came up with RFC 2231-style (2184-style) encoding
for parameters like 'filename' in the C-D header. So what
Mozilla should do in this case is use RFC 2231-style
encoding. BTW, Mozilla-mail uses RFC 2047-style
encoding for attachments instead of RFC 2231-style encoding, and this
has to be fixed, too. (I found this bug while checking
whether the mail attachment issue had already been filed.)
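
As a rough illustration of the RFC 2231-style form, the sketch below uses Python's standard `email.utils.encode_rfc2231` on the file name from the original report. The surrounding header line is only an example of where the value would go, with the field name `test` borrowed from the test form.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

# RFC 2231 extended value: charset'language'percent-encoded-bytes
value = encode_rfc2231("αβγ.txt", charset="utf-8")

# Extended parameters carry a trailing "*" on the parameter name.
header = f'Content-Disposition: form-data; name="test"; filename*={value}'

# Recover the name from the third apostrophe-separated field.
recovered = unquote(value.split("'", 2)[2], encoding="utf-8")
```

Here `value` is `utf-8''%CE%B1%CE%B2%CE%B3.txt`, which stays within US-ASCII.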

Comment 14

16 years ago
See bug 213628 for a similar problem.
not a blocker.
Severity: blocker → major
See nsFSMultipartFormData::AddNameFilePair.  Perhaps we need to do something
instead of (or in addition to?) ProcessAndEncode() there?
Assignee: alexsavulov → form-submission
Keywords: intl
OS: Windows NT → All
Priority: P4 → --
QA Contact: vladimire
Hardware: PC → All
Target Milestone: Future → ---
Keywords: helpwanted
Whiteboard: [good first bug]

Comment 17

15 years ago
(In reply to comment #3)

e.g. Bug 185863

QA Contact: ian
Summary: Mangled utf8 names prevent file upload[form sub] → Mangled utf8 names prevent file upload [form sub]

Comment 18

15 years ago
There are two problems here. One is Windows-specific (in comment #0, "non-ASCII"
should be replaced with "characters outside the repertoire of the current default
locale") and the other is not honoring RFC 2231. The latter part is a dupe of
136676 in a sense and depends on 193439 to some degree. The former part depends
on bug 162361. 
Depends on: 162361
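
The Windows-specific half can be sketched as a code page repertoire check. This is an illustration only: the helper `fits_code_page` is hypothetical, the code pages are chosen as examples, and Python's `errors="replace"` stands in for whatever conversion the platform actually performs (which may substitute look-alike characters rather than question marks).

```python
def fits_code_page(name: str, code_page: str) -> bool:
    """Return True if every character of name exists in the code page (hypothetical helper)."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


name = "αβγ.txt"
in_western = fits_code_page(name, "cp1252")  # Western European code page
in_greek = fits_code_page(name, "cp1253")    # Greek code page

# A lossy conversion replaces each unrepresentable character with "?".
lossy = name.encode("cp1252", errors="replace").decode("cp1252")
```

Under these assumptions the Greek name fits cp1253 but not cp1252, and the lossy result no longer names the file on disk.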

Comment 19

13 years ago
Is this the same as bug 273225?

Comment 20

13 years ago
*** Bug 273225 has been marked as a duplicate of this bug. ***

Comment 21

13 years ago
*** Bug 303852 has been marked as a duplicate of this bug. ***

Comment 22

13 years ago
The test cases seem to work fine in Firefox 2 beta 1 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1b1) Gecko/20060710 Firefox/2.0b1).  Perhaps this bug needs to be closed?


13 years ago
Assignee: form-submission → ehsan.akhgari
Duplicate of this bug: 377615

Comment 24

12 years ago
Based on comment 22, and since no one has been able to reproduce it since that time, I'm closing this as WORKSFORME.
Closed: 12 years ago
Resolution: --- → WORKSFORME


12 years ago
Keywords: verifyme

Comment 25

12 years ago
(In reply to comment #24)
> Based on comment 22, and since no one has been able to reproduce it since that
> time, I'm closing this as WORKSFORME.

Please try it on Mac OS X as cited in bug 377615. I'm able to reproduce this bug repeatedly in Firefox 2.0 on my Mac, but not in Firefox 1.5, using a filename with Czech national characters like "ěščřžýáíé".

Judging by the User-Agent cited in comment #22, I'm not sure this was tested on Mac before the bug was closed.


11 years ago
Keywords: helpwanted, verifyme
Component: HTML: Form Submission → DOM: Core & HTML