content-disposition filename parameter decoder apparently sniffs for UTF-8

Status: NEW (Unassigned)
Priority: P5
Severity: normal
Created: 9 years ago
Updated: a month ago
Reporter: julian.reschke
Blocks: 1 bug
Version: Trunk
Firefox Tracking Flags: Not tracked
Whiteboard: [necko-would-take]

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)

The code that decodes the filename parameter in the Content-Disposition header field apparently tries to sniff the character encoding. This isn't backed by any spec, and it also disagrees with Internet Explorer, Opera, and Konqueror. (WebKit appears to behave like Firefox.)

Reproducible: Always

Steps to Reproduce:
1. Run the test at http://greenbytes.de/tech/tc2231/#attwithutf8fnplain

Actual Results:  
Detects filename "foo-ä.html".

Expected Results:  
Should detect filename "foo-Ã¤.html" (the UTF-8 octets read as ISO-8859-1).

See http://greenbytes.de/tech/tc2231/#attwithutf8fnplain

Comment 1

9 years ago
You have to either use 7-bit ASCII or specify a charset in the disposition header (RFC 2231). Isn't ä outside of 7-bit ASCII?
Component: General → File Handling
Product: Firefox → Core
QA Contact: general → file-handling
Version: unspecified → 1.9.2 Branch
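For comparison, the RFC 2231 route mentioned above puts the charset inside the parameter itself: filename*=charset'language'percent-encoded-octets, for example filename*=UTF-8''foo-%c3%a4.html. Below is a minimal C++ sketch of splitting such a value into charset, language and raw octets. ExtValue, HexValue and DecodeExtValue are hypothetical names, not Gecko APIs, and the sketch does not handle RFC 2231 parameter continuations (filename*0*=...).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type for an RFC 2231 extended parameter value such as
//   filename*=UTF-8''foo-%c3%a4.html
// The value is kept as raw octets; decoding them with `charset` is left to
// the caller.
struct ExtValue {
    std::string charset;   // e.g. "UTF-8"
    std::string language;  // may be blank, as in the example above
    std::string octets;    // percent-decoded bytes, still encoded in `charset`
};

// Value of one hexadecimal digit, or -1 if `c` is not a hex digit.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Splits charset'language'percent-encoded-octets and percent-decodes the
// last part. Returns std::nullopt if a quote delimiter is missing or a %XX
// escape is malformed. Continuations (filename*0*=...) are not handled.
std::optional<ExtValue> DecodeExtValue(const std::string& raw) {
    const std::size_t q1 = raw.find('\'');
    if (q1 == std::string::npos) return std::nullopt;
    const std::size_t q2 = raw.find('\'', q1 + 1);
    if (q2 == std::string::npos) return std::nullopt;

    ExtValue out;
    out.charset = raw.substr(0, q1);
    out.language = raw.substr(q1 + 1, q2 - q1 - 1);
    for (std::size_t i = q2 + 1; i < raw.size(); ++i) {
        if (raw[i] != '%') {
            out.octets.push_back(raw[i]);
            continue;
        }
        if (i + 2 >= raw.size()) return std::nullopt;  // truncated escape
        const int hi = HexValue(raw[i + 1]);
        const int lo = HexValue(raw[i + 2]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.octets.push_back(static_cast<char>(hi * 16 + lo));
        i += 2;  // skip the two hex digits
    }
    return out;
}
```

Applied to that example, DecodeExtValue returns charset "UTF-8", an empty language and the octets 66 6F 6F 2D C3 A4 2E 68 74 6D 6C; the caller then decodes them with the declared charset instead of guessing.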
(Reporter)

Comment 2

9 years ago
(In reply to comment #1)
> You have to either use 7-bit ASCII or specify a charset in the disposition
> header.

Not really. HTTP allows ISO-8859-1 as well.

So, in the absence of RFC2231 encoding, the client should assume ISO-8859-1 (as seen in <http://greenbytes.de/tech/tc2231/#attwithisofnplain>), but not just switch to encoding sniffing.

> (rfc2231). isn't ä outside of 7bit ASCII ?

Comment 3

9 years ago
So the issue here is that nsMIMEHeaderParamImpl::GetParameter calls nsUTF8ConverterService::ConvertStringToUTF8 with aSkipCheck == false, which triggers this code:

  // return if ASCII only or valid UTF-8 providing that the ASCII/UTF-8
  // check is requested. It may not be asked for if a caller suspects
  // that the input is in non-ASCII 7bit charset (ISO-2022-xx, HZ) or
  // it's in a charset other than UTF-8 that can be mistaken for UTF-8.
  if (!aSkipCheck && (IsASCII(aString) || IsUTF8(aString))) {
    aUTF8String = aString;
    return NS_OK;
  }
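The following stand-alone C++ sketch re-creates the effect of that branch; it is not the Gecko implementation. IsAscii, IsUtf8, Latin1ToUtf8 and ConvertToUtf8 are hypothetical names, and the fallback charset is assumed to be ISO-8859-1, the default argued for in comment 2. With the check enabled, the raw octets C3 A4 from the attwithutf8fnplain test pass through unchanged and display as "ä"; with the check skipped they are read as ISO-8859-1 and display as "Ã¤".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-alone illustration of the branch quoted above; NOT the Gecko code.
// All names here are hypothetical, and the fallback charset is assumed to be
// ISO-8859-1.

static bool IsAscii(const std::string& s) {
    for (unsigned char b : s)
        if (b >= 0x80) return false;
    return true;
}

// Well-formedness check for UTF-8: rejects stray continuation bytes,
// truncated sequences, overlong forms, surrogates and code points > U+10FFFF.
static bool IsUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (b < 0x80) { ++i; continue; }
        std::size_t len = 0;
        unsigned int cp = 0;
        if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1Fu; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0Fu; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07u; }
        else return false;  // continuation byte or invalid lead byte
        if (i + len > s.size()) return false;
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// ISO-8859-1 octet N is Unicode code point U+00NN, so each octet >= 0x80
// becomes a two-byte UTF-8 sequence.
static std::string Latin1ToUtf8(const std::string& octets) {
    std::string out;
    for (unsigned char b : octets) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

// Mirrors the quoted logic: unless the check is skipped, ASCII or valid UTF-8
// input is returned as-is (the "sniffing"); otherwise it is converted from
// the fallback charset.
std::string ConvertToUtf8(const std::string& octets, bool skipCheck) {
    if (!skipCheck && (IsAscii(octets) || IsUtf8(octets)))
        return octets;
    return Latin1ToUtf8(octets);
}
```

In this sketch, input that is not valid UTF-8, such as the lone ISO-8859-1 octet E4, is converted the same way whether or not the check runs, so the two paths only diverge for octet sequences that happen to be valid UTF-8.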

This was done quite on purpose; see bug 162765 comment 3.  Sadly, that bug doesn't say _why_ this was done, but I would expect site compat issues... except it doesn't match IE.  Does it match Outlook or whatnot?  This code is shared by mail and HTTP.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: 1.9.2 Branch → Trunk
Component: File Handling → Networking
QA Contact: file-handling → networking
(Reporter)

Comment 4

9 years ago
(In reply to comment #3)
> This was done quite on purpose; see bug 162765 comment 3.  Sadly, that bug
> doesn't say _why_ this was done, but I would expect site compat issues...

As you can tell from my test cases, I'm trying to find what today's UAs have in common. This is something IE and Opera do not do, so I'd be very surprised if it was needed in practice.

> except it doesn't match IE.  Does it match Outlook or whatnot?  This code is
> shared by mail and HTTP.

Oops. That would complicate things. My understanding was that this code was *inherited* from mail, but separately maintained.

Comment 5

9 years ago
It was moved from mail into shared network code; the old mail code was deleted and mail switched to use this code.

This code is also used for non-HTTP protocols that might happen to use parameters in headers (e.g. we plan to switch data: to it).

Comment 6

9 years ago
And multipart/mixed parts already use it, of course.
(Reporter)

Comment 7

9 years ago
(In reply to comment #5)
> It was moved from mail into shared network code; the old mail code was deleted
> and mail switched to use this code.
> 
> This code is also used for non-HTTP protocols that might happen to use
> parameters in headers (e.g. we plan to switch data: to it).

Aha, I was misled by the comments in the code then.

In theory this is good; all header fields that use this type of ABNF should go through the same parser (so, optimally, this would also apply to Content-Type, Link, etc).
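A minimal C++ sketch of such a shared parser, for the rough shape value *( ";" name "=" ( token / quoted-string ) ) that Content-Disposition and Content-Type have in common. ParsedHeader and ParseHeaderWithParams are hypothetical names, not the nsMIMEHeaderParamImpl interface; the sketch ignores RFC 2231 ext-values and continuations and returns parameter values as raw octets, so the charset decision discussed in this bug would be made in one place by the caller.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical shared parser for header fields shaped roughly like
//   value *( ";" name "=" ( token / quoted-string ) )
// e.g. Content-Disposition and Content-Type. Parameter values come back as
// raw octets; charset handling is deliberately left to the caller.
struct ParsedHeader {
    std::string value;                          // lower-cased, e.g. "attachment"
    std::map<std::string, std::string> params;  // lower-cased name -> raw value
};

static std::string Trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

static std::string Lower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

std::optional<ParsedHeader> ParseHeaderWithParams(const std::string& header) {
    ParsedHeader out;
    std::size_t pos = header.find(';');
    out.value = Lower(Trim(header.substr(0, pos)));
    while (pos != std::string::npos) {
        const std::size_t start = pos + 1;              // just past the ';'
        if (Trim(header.substr(start)).empty()) break;  // tolerate a trailing ';'
        const std::size_t eq = header.find('=', start);
        if (eq == std::string::npos) return std::nullopt;
        const std::string name = Lower(Trim(header.substr(start, eq - start)));
        // a ';' inside the name means a parameter without '=' was skipped
        if (name.empty() || name.find(';') != std::string::npos) return std::nullopt;

        std::size_t j = header.find_first_not_of(" \t", eq + 1);
        if (j == std::string::npos) j = header.size();
        std::string val;
        if (j < header.size() && header[j] == '"') {
            // quoted-string: a backslash escapes the following character
            bool closed = false;
            for (++j; j < header.size(); ++j) {
                if (header[j] == '\\' && j + 1 < header.size()) {
                    val.push_back(header[++j]);
                } else if (header[j] == '"') {
                    closed = true;
                    ++j;  // step past the closing quote
                    break;
                } else {
                    val.push_back(header[j]);
                }
            }
            if (!closed) return std::nullopt;
            pos = header.find(';', j);
            const std::size_t end = (pos == std::string::npos) ? header.size() : pos;
            // only whitespace may follow the closing quote
            if (!Trim(header.substr(j, end - j)).empty()) return std::nullopt;
        } else {
            // token: runs up to the next ';'
            pos = header.find(';', j);
            const std::size_t end = (pos == std::string::npos) ? header.size() : pos;
            val = Trim(header.substr(j, end - j));
        }
        out.params[name] = val;
    }
    return out;
}
```

With this shape, text/html; charset=ISO-8859-1 and attachment; filename="foo.html" go through identical token and quoted-string handling, and neither path guesses at the encoding of the octets it returns.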

Of course this is a change that's not appropriate at this point in time (pre-FF4).

Updated

8 years ago
Blocks: 609667
(Reporter)

Updated

8 years ago
OS: Windows 7 → All
Hardware: x86 → All
Whiteboard: [necko-would-take]

Comment 9

a month ago

Just a note to say this still occurs with FF 65.
