588409 - content-disposition filename parameter decoder apparently sniffs for UTF-8

Reporter

Description

•

14 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)

The code that decodes the filename parameter in the Content-Disposition header field apparently tries to sniff the character encoding. This isn't backed by any spec, and it disagrees with Internet Explorer, Opera, and Konqueror. (WebKit appears to behave the same way as Firefox.)

Reproducible: Always

Steps to Reproduce:
1. Run the test at http://greenbytes.de/tech/tc2231/#attwithutf8fnplain

Actual Results:  
Detects filename "foo-ä.html".

Expected Results:  
Should detect filename "foo-Ã¤.html".

See http://greenbytes.de/tech/tc2231/#attwithutf8fnplain
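
To make the difference between the actual and expected results concrete, here is a minimal C++ sketch. It assumes the test sends the filename as the raw bytes `foo-\xC3\xA4.html`, which is the UTF-8 encoding of "foo-ä.html". The helper name `Latin1ToUtf8` is made up for this illustration and is not a Gecko function. Reading the same bytes as ISO-8859-1 turns 0xC3 into "Ã" and 0xA4 into "¤", which gives the expected "foo-Ã¤.html".

```cpp
#include <string>

// Hypothetical helper (not part of Gecko): treat every byte of `raw` as an
// ISO-8859-1 character (code points U+0000..U+00FF) and return the same
// text re-encoded as UTF-8.
std::string Latin1ToUtf8(const std::string& raw) {
  std::string out;
  for (unsigned char c : raw) {
    if (c < 0x80) {
      // ASCII bytes are identical in ISO-8859-1 and UTF-8.
      out.push_back(static_cast<char>(c));
    } else {
      // U+0080..U+00FF always needs exactly two bytes in UTF-8.
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
  }
  return out;
}
```

With the assumed input, `Latin1ToUtf8("foo-\xC3\xA4.html")` yields the bytes `foo-\xC3\x83\xC2\xA4.html`, that is, "foo-Ã¤.html" when displayed as UTF-8.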

Matthias Versen [:Matti]

Comment 1

•

14 years ago

You have to either use 7-bit ASCII or specify a charset in the disposition header.
(RFC 2231). Isn't Ã¤ outside of 7-bit ASCII?
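
For reference, RFC 2231 lets a parameter carry its charset explicitly: the parameter name gets a trailing `*` and the value takes the form `charset'language'percent-encoded-bytes`, for example `filename*=UTF-8''foo-%C3%A4.html`. The following is a rough C++ sketch of splitting such a value. The names `ExtValue` and `ParseExtValue` are invented for this example, and the sketch does not validate the charset name or the set of characters allowed unescaped.

```cpp
#include <cctype>
#include <string>

// Hypothetical result type for this sketch (not a Gecko type).
struct ExtValue {
  std::string charset;   // e.g. "UTF-8"
  std::string language;  // often empty
  std::string bytes;     // raw bytes after percent-decoding
};

// Split charset'language'value and percent-decode the value part.
// Returns false if a quote is missing or a %XX escape is malformed.
bool ParseExtValue(const std::string& value, ExtValue* out) {
  const std::string::size_type q1 = value.find('\'');
  if (q1 == std::string::npos) return false;
  const std::string::size_type q2 = value.find('\'', q1 + 1);
  if (q2 == std::string::npos) return false;

  ExtValue result;
  result.charset = value.substr(0, q1);
  result.language = value.substr(q1 + 1, q2 - q1 - 1);

  for (std::string::size_type i = q2 + 1; i < value.size(); ++i) {
    if (value[i] != '%') {
      result.bytes.push_back(value[i]);
      continue;
    }
    // A percent sign must be followed by exactly two hex digits.
    if (i + 2 >= value.size() ||
        !std::isxdigit(static_cast<unsigned char>(value[i + 1])) ||
        !std::isxdigit(static_cast<unsigned char>(value[i + 2]))) {
      return false;
    }
    result.bytes.push_back(
        static_cast<char>(std::stoi(value.substr(i + 1, 2), nullptr, 16)));
    i += 2;  // skip the two hex digits just consumed
  }
  *out = result;
  return true;
}
```

Because the charset is stated in the value itself, a receiver of this form has no need to guess the encoding of the decoded bytes.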

Component: General → File Handling

Product: Firefox → Core

QA Contact: general → file-handling

Version: unspecified → 1.9.2 Branch

Julian Reschke

Reporter

Comment 2

•

14 years ago

(In reply to comment #1)
> You have to either use 7-bit ASCII or specify a charset in the disposition
> header.

Not really. HTTP allows ISO-8859-1 as well.

So, in the absence of RFC 2231 encoding, the client should assume ISO-8859-1 (as seen in <http://greenbytes.de/tech/tc2231/#attwithisofnplain>) rather than switch to encoding sniffing.

> (RFC 2231). Isn't Ã¤ outside of 7-bit ASCII?

Boris Zbarsky [:bzbarsky]

Comment 3

•

14 years ago

So the issue here is that nsMIMEHeaderParamImpl::GetParameter calls nsUTF8ConverterService::ConvertStringToUTF8 with aSkipCheck == false, which triggers this code:

  // return if ASCII only or valid UTF-8 providing that the ASCII/UTF-8
  // check is requested. It may not be asked for if a caller suspects
  // that the input is in non-ASCII 7bit charset (ISO-2022-xx, HZ) or
  // it's in a charset other than UTF-8 that can be mistaken for UTF-8.
  if (!aSkipCheck && (IsASCII(aString) || IsUTF8(aString))) {
    aUTF8String = aString;
    return NS_OK;
  }

This was done quite on purpose; see bug 162765 comment 3.  Sadly, that bug doesn't say _why_ this was done, but I would expect site compat issues... except it doesn't match IE.  Does it match Outlook or whatnot?  This code is shared by mail and HTTP.
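
The effect of that check can be sketched in isolation. The following C++ is a simplified stand-in, not the Gecko implementation: `Charset`, `IsValidUtf8`, and `ChooseCharset` are hypothetical names, and the ISO-8859-1 fallback is an assumption taken from comment 2 rather than from the real converter, which uses whatever charset the caller passes in. ASCII is a subset of UTF-8, so a single validity check covers both halves of the original condition.

```cpp
#include <cstddef>
#include <string>

enum class Charset { Utf8, Latin1 };

// Returns true if `s` is well-formed UTF-8 (ASCII-only input qualifies).
bool IsValidUtf8(const std::string& s) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const unsigned int kMin[] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  const std::size_t n = s.size();
  while (i < n) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len;
    unsigned int cp;
    if (c < 0x80) { len = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
    else return false;  // stray continuation byte or invalid lead byte
    if (i + len > n) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (cc & 0x3F);
    }
    if (len > 1 && cp < kMin[len]) return false;     // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += len;
  }
  return true;
}

// With skipCheck == false this mirrors the sniffing: bytes that happen to
// be valid UTF-8 are taken as UTF-8; anything else uses the fallback.
Charset ChooseCharset(const std::string& raw, bool skipCheck) {
  if (!skipCheck && IsValidUtf8(raw)) return Charset::Utf8;
  return Charset::Latin1;  // assumed fallback, see lead-in
}
```

For the bytes `foo-\xC3\xA4.html`, `ChooseCharset(raw, false)` picks UTF-8 (the behavior reported here), while `ChooseCharset(raw, true)` picks the fallback, which is what the reporter expects.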

Status: UNCONFIRMED → NEW

Ever confirmed: true

Version: 1.9.2 Branch → Trunk

Boris Zbarsky [:bzbarsky]

Updated

•

14 years ago

Component: File Handling → Networking

QA Contact: file-handling → networking

Julian Reschke

Reporter

Comment 4

•

14 years ago

(In reply to comment #3)
> This was done quite on purpose; see bug 162765 comment 3.  Sadly, that bug
> doesn't say _why_ this was done, but I would expect site compat issues...

As you can tell from my test cases, I'm trying to find what today's UAs have in common. This is something IE and Opera do not do, so I'd be very surprised if it was needed in practice.

> except it doesn't match IE.  Does it match Outlook or whatnot?  This code is
> shared by mail and HTTP.

Oops. That would complicate things. My understanding was that this code was *inherited* from mail, but separately maintained.

Boris Zbarsky [:bzbarsky]

Comment 5

•

14 years ago

It was moved from mail into shared network code; the old mail code was deleted and mail switched to use this code.

This code is also used for non-HTTP protocols that might happen to use parameters in headers (e.g. we plan to switch data: to it).

Boris Zbarsky [:bzbarsky]

Comment 6

•

14 years ago

And multipart/mixed parts already use it, of course.

Julian Reschke

Reporter

Comment 7

•

14 years ago

(In reply to comment #5)
> It was moved from mail into shared network code; the old mail code was deleted
> and mail switched to use this code.
> 
> This code is also used for non-HTTP protocols that might happen to use
> parameters in headers (e.g. we plan to switch data: to it).

Aha, I was misled by the comments in the code, then.

In theory this is good; all header fields that use this type of ABNF should go through the same parser (so, optimally, this would also apply to Content-Type, Link, etc.).

Of course, this is a change that's not appropriate at this point in time (pre-FF4).

Jason Duell

Updated

•

13 years ago

Blocks: 609667

Julian Reschke

Reporter

Updated

•

13 years ago

OS: Windows 7 → All

Hardware: x86 → All

Chris Peterson [:cpeterson]

Updated

•

9 years ago

Blocks: 1200643

Patrick McManus [:mcmanus]

Updated

•

8 years ago

Whiteboard: [necko-would-take]

Firefox Bug Husbandry Bot

Comment 8

•

7 years ago

Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258

Priority: -- → P5

Sean

Comment 9

•

5 years ago

Just a note to say this still occurs with FF 65.

Anne (:annevk)

Comment 10

•

4 years ago

Are there test cases for this? What do Chrome and Safari do? If WebKit does this, chances are both have the same behavior, and in that case we cannot change this and the specification should change instead.

Julian Reschke

Reporter

Comment 11

•

4 years ago

A test is over here: http://test.greenbytes.de/tech/tc2231/#attwithutf8fnplain, but the test results have not been updated in ages. It seems all browsers that passed this test are now dead.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Categories

(Core :: Networking, defect, P5)

Tracking

()

People

(Reporter: julian.reschke, Unassigned)

References

(Blocks 1 open bug,
URL
)

Details

(Whiteboard: [necko-would-take])

Crash Data

Security

(public)

User Story
