The default bug view has changed. See this FAQ.

RFC2231/5987 encoding: charset information should be treated as authoritative

RESOLVED FIXED in mozilla15

Status

()

Core
Networking
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: Julian Reschke, Assigned: Julian Reschke)

Tracking

(Blocks: 1 bug)

unspecified
mozilla15
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 3 obsolete attachments)

(Assignee)

Description

6 years ago
Test case <http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad>:

  Content-Disposition: attachment; filename*=iso-8859-1''foo-%c3%a4-%e2%82%ac.html

After percent-unescaping, Firefox apparently decides that this is more likely to be UTF-8. Other UAs either ignore the parameter (good, as it contains a non-ISO-8859-1 octet) or treat it as in a superset-encoding of ISO-8859-1 (where %82 represents a valid code point).

Optimally, FF would align with IE9's and Konqueror's strict behavior.
(Assignee)

Updated

6 years ago
Assignee: nobody → julian.reschke
(Assignee)

Comment 1

6 years ago
Created attachment 567559 [details] [diff] [review]
Proposed patch, incl test cases

This patch is big considering the change of behavior, so below a few explanations.

Functional changes:

- DecodeParameter() always skips the check, we trust the charset as obtained from the 2231/5987 format

- Before returning data from GetParameterInternal(), if we do have a charset we always check that the octet sequence decodes according to the character encoding, otherwise we ignore the parameter

To get there, I made a few more changes:

- rather than short-circuiting, we always collect all information and only decide at the (only) exit point which param to use

- to do that, we collect param values for caseA, caseB and caseCorD separately (and for the two latter cases, their charset values)

- renamed "needUnquote" to "needPercentDecoding" (to avoid confusion with quoted-string unquoting)

- made the code that copies fragments into new strings more readable by calculating the length only once

I hope I haven't added a memory leak, and I also think I fixed one what would occur when both caseCorD and caseB are present (in this order), and both specify the charset
Attachment #567559 - Flags: review?(bzbarsky)
(Assignee)

Comment 2

6 years ago
Comment on attachment 567559 [details] [diff] [review]
Proposed patch, incl test cases

Refactoring the code, and trying to fix bug 384571 first.
Attachment #567559 - Flags: review?(bzbarsky)
(Assignee)

Comment 3

6 years ago
Created attachment 568455 [details] [diff] [review]
Proposed patch, incl test case

To be applied on top of the patch for bug 384571.
Attachment #567559 - Attachment is obsolete: true
Attachment #568455 - Flags: review?(bzbarsky)
(Assignee)

Comment 4

5 years ago
Created attachment 575733 [details] [diff] [review]
Proposed patch, incl test case

Updated patch (applies on top of bug 384571)
Attachment #568455 - Attachment is obsolete: true
Attachment #568455 - Flags: review?(bzbarsky)
Attachment #575733 - Flags: review?(bzbarsky)
(Assignee)

Comment 5

5 years ago
Comment on attachment 575733 [details] [diff] [review]
Proposed patch, incl test case

(patch needs to be updated)
Attachment #575733 - Flags: review?(bzbarsky)
(Assignee)

Comment 6

5 years ago
Created attachment 607983 [details] [diff] [review]
Proposed patch, incl test case
Attachment #575733 - Attachment is obsolete: true
Attachment #607983 - Flags: review?(jduell.mcbugs)
Comment on attachment 607983 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 607983 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Only one question:

::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
@@ -721,2 @@
>        return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset,
> -          IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult);

Any reason why you've scrapped the 7bit optimization?
(Assignee)

Comment 8

5 years ago
(In reply to Jason Duell (:jduell) from comment #7)
> Comment on attachment 607983 [details] [diff] [review]
> Proposed patch, incl test case
> 
> Review of attachment 607983 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good.  Only one question:
> 
> ::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
> @@ -721,2 @@
> >        return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset,
> > -          IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult);
> 
> Any reason why you've scrapped the 7bit optimization?

It's not an optimization; it's a heuristic that will cause the wrong thing to happen when the name encoding is declared to be ISO-8859-1, but it *looks* as if it was UTF-8.

(see http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad)

This bug is about removing this heuristic (that is, to treat the declared character encoding to be authoritative, not just advisory)
Comment on attachment 607983 [details] [diff] [review]
Proposed patch, incl test case

OK.  I'm going to hold off on landing this until the next code fork (April 24th), since we've closed mozilla-central to all but approved patched till then.
Attachment #607983 - Flags: review?(jduell.mcbugs) → review+
(Assignee)

Comment 10

5 years ago
(In reply to Jason Duell (:jduell) from comment #9)
> OK.  I'm going to hold off on landing this until the next code fork (April
> 24th), since we've closed mozilla-central to all but approved patched till
> then.

Makes perfect sense. Thank you!
https://hg.mozilla.org/integration/mozilla-inbound/rev/7902fd4ca51d
https://hg.mozilla.org/mozilla-central/rev/7902fd4ca51d
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
You need to log in before you can comment on or make changes to this bug.