Open Bug 968142 Opened 7 years ago Updated 2 years ago

Unconventional normalization form for uploaded filename on OSX

Categories

(Core :: DOM: Core & HTML, defect)

27 Branch
x86
macOS
defect
Not set
normal

UNCONFIRMED

People

(Reporter: sheerun, Unassigned, NeedInfo)

Attachments

(1 file)

## Steps to reproduce:

- Create a file named żółć.pdf (Polish characters)
- Upload this file (for example on http://blueimp.github.io/jQuery-File-Upload/basic.html)
- Check the Network tab in developer tools, open the POST request view, click "Edit and Resend"
- View the uploaded file name in the request body sent by Firefox

## What happens:

Content-disposition header says:

Content-Disposition: form-data; name="files[]"; filename="zÌoÌÅcÌ.pdf"

## What should happen:

Content-disposition header should say:

Content-Disposition: form-data; name="files[]"; filename="żółć.pdf"

## What happened?

Firefox on OS X encodes the filename in UTF-8 using decomposed characters, i.e. a base letter followed by combining characters (https://en.wikipedia.org/wiki/Combining_character), instead of precomposed characters (https://en.wikipedia.org/wiki/Precomposed_character).

## What do other browsers do?

Almost all browsers I checked (IE9, Safari, Opera, Chrome, even Firefox 27 on Windows) send the filename using precomposed characters.

## Why does it hurt me?

My JS library for uploading files to S3 using CORS expects every browser to send the same filename to S3, so that it knows beforehand how the file will be named on the server.

Firefox sends the filename encoded differently, so the file is uploaded to a different location than with other browsers.

Here is the relevant issue: https://github.com/sheerun/s3_file_field/pull/15
This also happens at the JavaScript level.

Firefox 27 on Windows: 

encodeURIComponent(file.name)
# => "%C5%BC%C3%B3%C5%82%C4%87.pdf"

Firefox 27 on OSX:

encodeURIComponent(file.name)
# => "z%CC%87o%CC%81%C5%82c%CC%81.pdf"
Chrome & Safari expose decomposed characters in JS, but send them precomposed anyway:

https://code.google.com/p/chromium/issues/detail?id=341019&thanks=341019&ts=1391605530

The default encoding of string literals in all browsers is the precomposed one:

encodeURIComponent("żółć.pdf")
# => "%C5%BC%C3%B3%C5%82%C4%87.pdf"
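
In Unicode terms, the two results above differ only in normalization form: the OS X value is decomposed (a base letter followed by a combining mark), while the Windows value and the string literal are precomposed (NFC). A minimal sketch, assuming an engine that implements `String.prototype.normalize` (ES2015, so possibly missing from the browser versions discussed here); the escape sequences are written out by hand so the example does not depend on how this page is encoded:

```javascript
// Decomposed form, matching the file.name reported for Firefox 27 on OS X.
const decomposed = "z\u0307o\u0301\u0142c\u0301.pdf";
// Precomposed form, matching a typed string literal.
const precomposed = "\u017C\u00F3\u0142\u0107.pdf";

console.log(encodeURIComponent(decomposed));
console.log(encodeURIComponent(precomposed));

// The strings differ code unit by code unit...
console.log(decomposed === precomposed);
// ...but are canonically equivalent once both are in NFC.
console.log(decomposed.normalize("NFC") === precomposed);
```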

There are also unexpected behaviours when dealing with decomposed characters.

Observe:

file.name
# => "żółć.pdf"
file.name[0]
# =>"z"
file.name.length
# =>11

But:

"żółć.pdf"[0]
# =>"ż"
"żółć.pdf".length
# =>8
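
The length difference comes from each combining mark being a separate UTF-16 code unit. As a sketch, assuming `String.prototype.normalize` (ES2015) is available, normalizing to NFC makes the value behave like the typed literal. The `fileName` constant is a hand-written stand-in for `file.name`, not output captured from a browser:

```javascript
// Hand-written stand-in for the file.name value reported on OS X.
const fileName = "z\u0307o\u0301\u0142c\u0301.pdf";

console.log(fileName.length); // 11: three combining marks add three code units
console.log(fileName[0]);     // "z": the dot above is a separate code unit

// After NFC normalization each letter is a single code unit again.
const nfcName = fileName.normalize("NFC");
console.log(nfcName.length);          // 8
console.log(nfcName[0] === "\u017C"); // true
```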
HFS+ conventions strike again. :-(

We need to make sure to expose file names as NFC-normalized in JS and in form submission.
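
Until that happens in the browser, a page script could normalize the name itself before submitting. This is only a sketch of a possible workaround, not the fix proposed for Firefox: `toNfcName` and `appendWithNfcName` are hypothetical helpers, and it assumes `String.prototype.normalize` exists where available and that the upload goes through `FormData`, whose `append()` accepts a filename as an optional third argument.

```javascript
// Hypothetical helper: not part of any library mentioned in this bug.
function toNfcName(name) {
  // Fall back to the raw name where normalize() is unavailable.
  return typeof name.normalize === "function" ? name.normalize("NFC") : name;
}

// Hypothetical helper: appends a file under an NFC-normalized filename.
// The third argument of FormData.append() overrides the filename
// reported to the server for that part.
function appendWithNfcName(formData, fieldName, file) {
  formData.append(fieldName, file, toNfcName(file.name));
}
```

It might be called as `appendWithNfcName(new FormData(), "files[]", input.files[0])`, where `input` is assumed to be a file input element on the page.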
Summary: Wrong encoding for uploaded filename on OSX → Unconventional normalization form for uploaded filename on OSX
Attached image ScreenShot.png
With the latest Nightly 31.0a1 on Mac OS X 10.9 I'm getting what is shown in the attached screenshot.

Adam, is this what you are also seeing?
Flags: needinfo?(sheerun)
Component: File Handling → Networking: File
Product: Firefox → Core
From the information here and in the related bugs, it's not clear to me whether the problem is in the XPCOM file code or in the necko code, and where it should be fixed. Dragana, since you're one of the few necko people who have a Mac, could you please have a look at it?
Assignee: nobody → dd.mozilla
Whiteboard: [necko-active]
This can be fixed at line: https://dxr.mozilla.org/mozilla-central/source/dom/html/HTMLFormSubmission.cpp#537

But I think that is not the right place; the file name received from the blob is already wrongly encoded. I will move the bug to XPCOM.
Assignee: dd.mozilla → nobody
Component: Networking: File → XPCOM
XPCOM exposes exactly what comes from the file system and is not the right layer to solve this DOM-level problem. If you want to normalize the representation, it needs to be a layer up at the blob/DOM code.
Component: XPCOM → HTML: Form Submission
Whiteboard: [necko-active]
Component: HTML: Form Submission → DOM: Core & HTML