Closed Bug 66041 Opened 24 years ago Closed 17 years ago

Mangled utf8 names prevent file upload [form sub]

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
major

RESOLVED WORKSFORME

People

(Reporter: lcollins, Assigned: ehsan.akhgari)

References

(Depends on 1 open bug)

Details

(Keywords: intl, Whiteboard: [good first bug])

This problem occurs on all versions of Netscape up to the latest Mozilla.

1. Create two asp files (This is the simplest test since we can't provide our 
product):

"test.asp"
=========================
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>File Upload Test</TITLE>
</HEAD>

<BODY><BR><BR>

<form method="POST" action="upload.asp" enctype="multipart/form-data">
  <input name="test" type="file"><br>
  <input type="submit" name="submit">
</form>


</BODY></HTML>

"upload.asp"
=========================
<h2>content</h2>
<%
  Response.BinaryWrite(Request.BinaryRead(Request.TotalBytes))
%>
=========================

And a text file with a non-ASCII name. I used αβγ.txt, with the contents "Alpha 
Beta Gamma". The contents don't matter, just the name.


2. Place the enclosed files (test.asp, upload.asp, αβγ.txt) in the same folder. 
3. Make that folder a virtual folder using Start -> Programs -> NT 4.0 Option 
Pack -> Microsoft Personal Webserver -> Internet Service Manager.

4. Load http://<YourPath>Test.asp in Mozilla. Browse to select the αβγ.txt 
file (these are the Greek letters "alpha", "beta", and "gamma"). The file name 
doesn't really matter as long as it is not ASCII. We have found this problem in 
all scripts. This will also happen if there are any non-ASCII text characters 
anywhere on the path. Notice that the file name is garbled in the field.

5. Click "Submit Query".

The next page will be the results of the submission. What you should see (at 
least on IE) is something like 

-----------------------------7d13bd2a390236 Content-Disposition: form-data; 
name="test"; filename="<YourPath>\αβγ.txt" Content-Type: text/plain Alpha 
Beta Gamma -----------------------------7d13bd2a390236 Content-Disposition: 
form-data; name="submit" Submit Query -----------------------------
7d13bd2a390236-- 


Instead, you see:

-----------------------------236153271123998 Content-Disposition: form-data; 
name="test"; filename="<YourPath>\aß?.txt" -----------------------------
236153271123998
Content-Disposition: form-data; name="submit" Submit Query ---------------------
--------236153271123998-- 

Note that the file name is mangled (not recognizable as any encoding) and the 
file content has not been found.
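
For comparison, here is a minimal Python sketch of how a one-file multipart/form-data body like the expected (IE) output above could be assembled, with the file name sent as raw UTF-8 bytes. It is only an illustration of the wire format, not Mozilla or IE code: `build_multipart_body`, its parameters, and the boundary string are hypothetical names chosen for this example.

```python
# Illustrative sketch only; the function name, parameters, and boundary
# value are assumptions made for this example.

def build_multipart_body(boundary: str, field: str, filename: str,
                         content: bytes,
                         content_type: str = "text/plain") -> bytes:
    """Assemble a one-file multipart/form-data body with a UTF-8 file name."""
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    # The file name is written into the part header as raw UTF-8 bytes.
    disposition = (
        'Content-Disposition: form-data; name="' + field +
        '"; filename="' + filename + '"'
    ).encode("utf-8")
    lines = [
        delimiter,
        disposition,
        b"Content-Type: " + content_type.encode("ascii"),
        b"",                 # blank line separates part headers from content
        content,             # the file content that was missing in the report
        delimiter + b"--",   # closing delimiter
        b"",
    ]
    return crlf.join(lines)


body = build_multipart_body("example-boundary", "test", "αβγ.txt",
                            b"Alpha Beta Gamma")
print(body.decode("utf-8"))
```

The point of the sketch is that both the file name and the file content appear in the part; in the failing case above, the name is mangled and the content is absent.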
Setting bug status to New.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Related to this is that filenames containing a " character are not properly escaped.

e.g. the file name 'badfile".txt' produces a Content-Disposition header of
Content-Disposition: form-data; name="uploadfield"; filename="badfile".txt"

when it should be:
Content-Disposition: form-data; name="uploadfield"; filename="badfile\".txt"

hth
vittal
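
The escaping proposed in this comment can be sketched in Python as follows. `quote_filename` is a hypothetical helper written for this example; escaping backslashes as well as double quotes is an added assumption so that the quoted string can be unescaped unambiguously.

```python
# Hypothetical helper, not Mozilla code: backslash-escape a file name
# before placing it inside a quoted Content-Disposition parameter.

def quote_filename(name: str) -> str:
    """Return name as a quoted string with \\ and " backslash-escaped."""
    # Escape backslashes first so the escapes added for quotes stay intact.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


header = ('Content-Disposition: form-data; name="uploadfield"; filename='
          + quote_filename('badfile".txt'))
print(header)
```

With the example file name from the comment, this yields the header the comment says should be sent.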
reassigning to guru of uploading
Assignee: rods → pollmann
I don't know whether RFC 2047, which covers encoding of message headers, is applicable here.
http://rfc.fh-koeln.de/rfc/html/rfc2047.html

vittal
I'm not sure that RFC 2047 is relevant here. If you run the enclosed test, 
which prints the header and body of the POST, you will see that the problem is 
that Netscape fails to find the file when it creates the content for the POST. 
It should show up in the body. The file contains ASCII, so the encoding of the 
body should not be an issue. It would be useful to look into the Netscape code 
that actually tries to find the file when it generates the body.
setting TFM to mozilla0.9.1 as part of the bug triage
Target Milestone: --- → mozilla0.9.1
QA Contact Update
QA Contact: bsharma → vladimire
Target Milestone: mozilla0.9.1 → Future
Bulk reassigning Eric Pollmann's remaining form submission bugs to Alex.
Assignee: pollmann → alexsavulov
Summary: Mangled utf8 names prevent file upload → Mangled utf8 names prevent file upload[from sub]
Summary: Mangled utf8 names prevent file upload[from sub] → Mangled utf8 names prevent file upload[form sub]
Priority: -- → P4
->Form Submission
Component: HTML Form Controls → Form Submission
On 2002072204 (1.1beta) on W2k(US):
In an <input type="file" enctype="multipart/form-data"> form element on a page
encoded with utf-8:

If I use the "Browse" button to locate a file with Japanese characters in the
filename, the Japanese characters are replaced with ? question mark characters.
This appears in the UI form field the user sees as well as in the
transmitted Content-Disposition filename= parameter that the server receives.
(If the file path is wider than the UI field, the ??'s may not be visible.)

If I correct the ? question marks by typing in the UI field to replace them with
the correct kanji characters, they are replaced with _ underscore characters.

In both cases the filename is mangled, and as a result no file with the mangled
filename is found when the form is submitted.

No error is reported when the upload file is not found; instead an empty file
with the mangled filename is transmitted (bug 82634).

But the file would be found if Mozilla didn't mangle the filename.

[IE successfully transmits the file, and encodes the name using UTF-8 charset
(native w2k encoding may be UTF-16).]

I think rfc2047 is appropriate, and comment #6 is mistaken.  Rfc2047 is
applicable to headers, and the filename appears in a MIME header
(Content-disposition:) in a part of the multipart/form-data body (RFC 1867, RFC
2388).  It is true that the file was not transmitted, but as noted above, it was not
transmitted because it was not found (bug 82634), and it was not found because
the name was mangled (this bug).
RFC 1867 (file upload) states:
"The client application should make best effort to supply the file name; if the
file name of the client's operating system is not in US-ASCII, the file name
might be approximated or encoded using the method of RFC 1522."
Since RFC 2047 is an update of RFC 1522, it seems to be appropriate.
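
For reference, the RFC 2047 "encoded-word" form being discussed can be sketched with Python's standard `email.header` module. This only illustrates what such an encoding of the test file name looks like; it makes no claim about what any browser actually sends.

```python
# Sketch of RFC 2047 encoded-word encoding using the Python standard library.
from email.header import Header, decode_header

filename = "αβγ.txt"

# Encode the non-ASCII name as an ASCII-only encoded-word (=?charset?enc?...?=).
encoded = Header(filename, charset="utf-8").encode()
print(encoded)

# Decode it again to confirm that the original name is recoverable.
(raw, charset), = decode_header(encoded)
decoded = raw.decode(charset)
print(decoded)
```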
I'm afraid RFC 1867 is too outdated. It's not possible
to use RFC 2047-style (RFC 1522-style) encoding for
parameters of mail headers (and, by extension, HTTP headers)
while abiding by RFC 822 (STD 11). That's why they
came up with RFC 2231-style (RFC 2184-style) encoding
for parameters like 'filename' in the Content-Disposition header. So what
Mozilla should do in this case is use RFC 2231-style
encoding. BTW, Mozilla mail uses RFC 2047-style
encoding for attachments instead of RFC 2231-style encoding, and this
has to be fixed, too. (I found this bug while checking
whether the mail attachment problem had already been filed.)
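
As an illustration of the RFC 2231-style parameter encoding suggested here, the following Python sketch uses the standard `email.utils` helpers to encode the test file name. The header layout (a `filename*` parameter on a form-data part) is an assumption made for this example, not something confirmed in this bug.

```python
# Sketch of RFC 2231 extended parameter encoding with the standard library.
from email.utils import encode_rfc2231, decode_rfc2231
from urllib.parse import unquote

filename = "αβγ.txt"

# Produces charset'language'percent-encoded-value (language left empty).
ext_value = encode_rfc2231(filename, charset="utf-8")
header = ('Content-Disposition: form-data; name="test"; filename*='
          + ext_value)
print(header)

# Split the value back into its parts and undo the percent-encoding.
charset, language, value = decode_rfc2231(ext_value)
roundtrip = unquote(value, encoding=charset)
print(roundtrip)
```

The encoded value is pure ASCII, so it stays within what RFC 822 allows in a header parameter while still carrying the charset explicitly.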
See bug 213628 for a similar problem.
not a blocker.
Severity: blocker → major
See nsFSMultipartFormData::AddNameFilePair.  Perhaps we need to do something
instead of (or in addition to?) ProcessAndEncode() there?
Assignee: alexsavulov → form-submission
Keywords: intl
OS: Windows NT → All
Priority: P4 → --
QA Contact: vladimire
Hardware: PC → All
Target Milestone: Future → ---
Keywords: helpwanted
Whiteboard: [good first bug]
Depends on: 136676
(In reply to comment #3)

e.g. Bug 185863

QA Contact: ian
Summary: Mangled utf8 names prevent file upload[form sub] → Mangled utf8 names prevent file upload [form sub]
There are two problems here. One is Windows-specific (in comment #0, "non-ASCII"
should be replaced with "characters outside the repertoire of the current default
locale") and the other is not honoring RFC 2231. The latter part is a dupe of
bug 136676 in a sense and depends on bug 193439 to some degree. The former part
depends on bug 162361. 
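
The Windows-specific half can be illustrated with a small Python sketch: converting a file name to a code page whose repertoire lacks its characters loses information, while a code page that covers them does not. `to_locale_bytes` is a hypothetical helper, and cp1252 and cp1253 are chosen here only as examples of a Western and a Greek default code page. Python substitutes "?" for every unmappable character, whereas comment #0 shows "aß?", so the actual Windows conversion evidently behaves somewhat differently.

```python
# Hypothetical helper showing a lossy conversion to a legacy code page.

def to_locale_bytes(name: str, codepage: str) -> bytes:
    """Encode name in codepage, replacing unmappable characters with '?'."""
    return name.encode(codepage, errors="replace")


greek = "αβγ.txt"

# A Western code page has no Greek letters, so the name is destroyed.
western = to_locale_bytes(greek, "cp1252")
print(western)

# A Greek code page covers the same name and round-trips it intact.
greek_cp = to_locale_bytes(greek, "cp1253")
print(greek_cp.decode("cp1253"))
```

Once the name has been reduced to "?" characters, no file with that name exists on disk, which matches the "file not found" symptom described earlier in this bug.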
Depends on: 162361
Is this the same as bug 273225?
*** Bug 273225 has been marked as a duplicate of this bug. ***
*** Bug 303852 has been marked as a duplicate of this bug. ***
The test cases seem to work fine in Firefox 2 beta 1 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1b1) Gecko/20060710 Firefox/2.0b1).  Perhaps this bug needs to be closed?
Status: NEW → ASSIGNED
Assignee: form-submission → ehsan.akhgari
Status: ASSIGNED → NEW
Based on comment 22, and since no one has been able to reproduce it since that time, I'm closing this as WORKSFORME.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Keywords: verifyme
(In reply to comment #24)
> Based on comment 22, and since no one has been able to reproduce it since that
> time, I'm closing this as WORKSFORME.

Please try it on Mac OS X as cited in bug 377615. I'm able to reproduce this bug repeatedly in Firefox 2.0 on my Mac, but not in Firefox 1.5, using a filename with Czech national characters like "ěščřžýáíé".

Judging by the User-Agent cited in comment #22, I'm not sure it was tested on Mac before this bug was closed.
Keywords: helpwanted, verifyme
Component: HTML: Form Submission → DOM: Core & HTML