Open
Bug 588409
Opened 14 years ago
Updated 2 years ago
content-disposition filename parameter decoder apparently sniffs for UTF-8
Categories: Core :: Networking, defect, P5
Status: NEW
People: Reporter: julian.reschke, Unassigned
References: Blocks 1 open bug
Whiteboard: [necko-would-take]
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)
The code that decodes the filename parameter in the Content-Disposition header field apparently tries to sniff the character encoding. This isn't backed by any spec, and it disagrees with Internet Explorer, Opera, and Konqueror. (WebKit appears to behave the same way as Firefox.)
Reproducible: Always
Steps to Reproduce:
1. Run the test at http://greenbytes.de/tech/tc2231/#attwithutf8fnplain
Actual Results:
Detects filename "foo-ä.html".
Expected Results:
Should detect filename "foo-Ã¤.html" (the raw octets interpreted as ISO-8859-1).
See http://greenbytes.de/tech/tc2231/#attwithutf8fnplain
Comment 1•14 years ago
You have to either use 7-bit ASCII or specify a charset in the disposition header (RFC 2231). Isn't ä outside of 7-bit ASCII?
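For readers unfamiliar with the RFC 2231 form mentioned here: a parameter such as filename*=UTF-8''foo-%c3%a4.html carries its charset explicitly, followed by an optional language tag and percent-encoded octets. The sketch below splits such a value into its parts. It is only an illustration; ExtValue and ParseExtValue are hypothetical names, not Gecko APIs, and the sample value is chosen for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical container for the three parts of charset'language'value.
struct ExtValue {
  std::string charset;   // e.g. "UTF-8"
  std::string language;  // may be empty
  std::string bytes;     // percent-decoded octets, still in `charset`
};

// Illustrative parser (not Gecko code). Returns false on malformed input.
bool ParseExtValue(const std::string& v, ExtValue& out) {
  const std::size_t q1 = v.find('\'');
  if (q1 == std::string::npos) return false;
  const std::size_t q2 = v.find('\'', q1 + 1);
  if (q2 == std::string::npos) return false;
  assert(q2 > q1);  // the second quote always follows the first

  out.charset = v.substr(0, q1);
  out.language = v.substr(q1 + 1, q2 - q1 - 1);
  out.bytes.clear();

  for (std::size_t i = q2 + 1; i < v.size(); ++i) {
    if (v[i] != '%') {
      out.bytes.push_back(v[i]);
      continue;
    }
    // A '%' must be followed by exactly two hex digits.
    if (i + 2 >= v.size() ||
        !std::isxdigit(static_cast<unsigned char>(v[i + 1])) ||
        !std::isxdigit(static_cast<unsigned char>(v[i + 2]))) {
      return false;
    }
    out.bytes.push_back(
        static_cast<char>(std::stoi(v.substr(i + 1, 2), nullptr, 16)));
    i += 2;  // skip the two hex digits just consumed
  }
  return true;
}
```

With this form the receiver does not need to guess: the octets C3 A4 are labelled as UTF-8 by the sender.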
Component: General → File Handling
Product: Firefox → Core
QA Contact: general → file-handling
Version: unspecified → 1.9.2 Branch
Comment 2•14 years ago (Reporter)
(In reply to comment #1)
> You have to either use 7-bit ASCII or specify a charset in the disposition
> header.
Not really. HTTP allows ISO-8859-1 as well.
So, in the absence of RFC2231 encoding, the client should assume ISO-8859-1 (as seen in <http://greenbytes.de/tech/tc2231/#attwithisofnplain>), but not just switch to encoding sniffing.
> (rfc2231). isn't ä outside of 7bit ASCII ?
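To make the ISO-8859-1 fallback concrete, here is a minimal sketch of what "assume ISO-8859-1" means for the octets in the test case. Latin1ToUtf8 is a hypothetical helper written for this illustration, not a function from the Mozilla tree. Each input octet is treated as one ISO-8859-1 code point and re-encoded as UTF-8.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Gecko code): interpret every octet as an
// ISO-8859-1 code point (U+0000..U+00FF) and emit its UTF-8 encoding.
std::string Latin1ToUtf8(const std::string& raw) {
  std::string out;
  out.reserve(raw.size() * 2);
  for (unsigned char c : raw) {
    if (c < 0x80) {
      // ASCII maps to itself.
      out.push_back(static_cast<char>(c));
    } else {
      // U+0080..U+00FF needs two UTF-8 bytes: 110000xx 10xxxxxx.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  assert(out.size() >= raw.size());  // output never shrinks
  return out;
}
```

Under this rule the header octets 66 6F 6F 2D C3 A4 ... (the UTF-8 encoding of "foo-ä") come out as "foo-Ã¤.html", whereas the octets 66 6F 6F 2D E4 ... come out as "foo-ä.html". No inspection of the content is involved in either case.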
Comment 3•14 years ago
So the issue here is that nsMIMEHeaderParamImpl::GetParameter calls nsUTF8ConverterService::ConvertStringToUTF8 with aSkipCheck == false, which triggers this code:
// return if ASCII only or valid UTF-8 providing that the ASCII/UTF-8
// check is requested. It may not be asked for if a caller suspects
// that the input is in non-ASCII 7bit charset (ISO-2022-xx, HZ) or
// it's in a charset other than UTF-8 that can be mistaken for UTF-8.
if (!aSkipCheck && (IsASCII(aString) || IsUTF8(aString))) {
  aUTF8String = aString;
  return NS_OK;
}
This was done quite on purpose; see bug 162765 comment 3. Sadly, that bug doesn't say _why_ this was done, but I would expect site compat issues... except it doesn't match IE. Does it match Outlook or whatnot? This code is shared by mail and HTTP.
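The effect of that check can be reproduced outside Gecko with a small standalone sketch. Everything below is an assumption made for illustration: LooksLikeUtf8 is a simplified stand-in for IsUTF8 (ASCII-only input is trivially valid UTF-8, so it also covers the IsASCII case), and SniffLikeQuotedCheck only models the branch decision, not the real conversion service.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified UTF-8 well-formedness check (hypothetical stand-in for IsUTF8).
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// surrogates, and code points above U+10FFFF.
bool LooksLikeUtf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80) { ++i; continue; }

    std::size_t len;
    unsigned long cp, minCp;
    if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; minCp = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; minCp = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; minCp = 0x10000; }
    else return false;  // continuation byte or invalid lead byte
    assert(len >= 2 && len <= 4);

    if (i + len > s.size()) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;
    }
    i += len;
  }
  return true;
}

enum class Guess { Utf8PassThrough, NeedsConversion };

// Models only the branch in the quoted snippet: when the check is not
// skipped and the octets are valid UTF-8, they are passed through as UTF-8.
Guess SniffLikeQuotedCheck(const std::string& raw, bool skipCheck) {
  if (!skipCheck && LooksLikeUtf8(raw)) return Guess::Utf8PassThrough;
  return Guess::NeedsConversion;
}
```

In this model the octets "foo-\xC3\xA4.html" take the pass-through branch when skipCheck is false, which is the sniffing the bug describes, while "foo-\xE4.html" is not valid UTF-8 and falls through to charset conversion.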
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: 1.9.2 Branch → Trunk
Updated•14 years ago
Component: File Handling → Networking
QA Contact: file-handling → networking
Comment 4•14 years ago (Reporter)
(In reply to comment #3)
> This was done quite on purpose; see bug 162765 comment 3. Sadly, that bug
> doesn't say _why_ this was done, but I would expect site compat issues...
As you can tell from my test cases, I'm trying to find what today's UAs have in common. This is something IE and Opera do not do, so I'd be very surprised if it was needed in practice.
> except it doesn't match IE. Does it match Outlook or whatnot? This code is
> shared by mail and HTTP.
Oops. That would complicate things. My understanding was that this code was *inherited* from mail, but separately maintained.
Comment 5•14 years ago
It was moved from mail into shared network code; the old mail code was deleted and mail switched to use this code.
This code is also used for non-HTTP protocols that might happen to use parameters in headers (e.g. we plan to switch data: to it).
Comment 6•14 years ago
And multipart/mixed parts already use it, of course.
Comment 7•14 years ago (Reporter)
(In reply to comment #5)
> It was moved from mail into shared network code; the old mail code was deleted
> and mail switched to use this code.
>
> This code is also used for non-HTTP protocols that might happen to use
> parameters in headers (e.g. we plan to switch data: to it).
Aha, I was misled by the comments in the code then.
In theory this is good; all header fields that use this type of ABNF should go through the same parser (so, optimally, this would also apply to Content-Type, Link, etc).
Of course this is a change that's not appropriate at this point in time (pre-FF4).
Updated•13 years ago (Reporter)
OS: Windows 7 → All
Hardware: x86 → All
Updated•9 years ago
Whiteboard: [necko-would-take]
Comment 8•7 years ago
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
Comment 10•4 years ago
Are there test cases for this? What do Chrome and Safari do? If WebKit does this, chances are both have the same behavior, and in that case we cannot change this and the specification should change instead.
Comment 11•4 years ago (Reporter)
A test is over here: http://test.greenbytes.de/tech/tc2231/#attwithutf8fnplain, but the test results have not been updated in ages. It seems all browsers that passed this test are now dead.
Updated•2 years ago
Severity: normal → S3