Last Comment Bug 693806 - RFC2231/5987 encoding: charset information should be treated as authoritative
: RFC2231/5987 encoding: charset information should be treated as authoritative
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla15
Assigned To: Julian Reschke
:
Mentors:
http://greenbytes.de/tech/tc2231/#att...
Depends on:
Blocks: 609667
  Show dependency treegraph
 
Reported: 2011-10-11 14:11 PDT by Julian Reschke
Modified: 2012-05-03 10:05 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Proposed patch, incl test cases (12.61 KB, patch)
2011-10-17 13:39 PDT, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (3.92 KB, patch)
2011-10-20 11:39 PDT, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (3.92 KB, patch)
2011-11-20 03:16 PST, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (4.26 KB, patch)
2012-03-21 09:08 PDT, Julian Reschke
jduell.mcbugs: review+
Details | Diff | Splinter Review

Description Julian Reschke 2011-10-11 14:11:42 PDT
Test case <http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad>:

  Content-Disposition: attachment; filename*=iso-8859-1''foo-%c3%a4-%e2%82%ac.html

After percent-unescaping, Firefox apparently decides that this is more likely to be UTF-8. Other UAs either ignore the parameter (good, as it contains a non-ISO-8859-1 octet) or treat it as in a superset-encoding of ISO-8859-1 (where %82 represents a valid code point).

Optimally, FF would align with IE9's and Konqueror's strict behavior.
Comment 1 Julian Reschke 2011-10-17 13:39:47 PDT
Created attachment 567559 [details] [diff] [review]
Proposed patch, incl test cases

This patch is big considering the change of behavior, so below a few explanations.

Functional changes:

- DecodeParameter() always skips the check, we trust the charset as obtained from the 2231/5987 format

- Before returning data from GetParameterInternal(), if we do have a charset we always check that the octet sequence decodes according to the character encoding, otherwise we ignore the parameter

To get there, I made a few more changes:

- rather than short-circuiting, we always collect all information and only decide at the (only) exit point which param to use

- to do that, we collect param values for caseA, caseB and caseCorD separately (and for the two latter cases, their charset values)

- renamed "needUnquote" to "needPercentDecoding" (to avoid confusion with quoted-string unquoting)

- made the code that copies fragments into new strings more readable by calculating the length only once

I hope I haven't added a memory leak, and I also think I fixed one what would occur when both caseCorD and caseB are present (in this order), and both specify the charset
Comment 2 Julian Reschke 2011-10-19 03:00:33 PDT
Comment on attachment 567559 [details] [diff] [review]
Proposed patch, incl test cases

Refactoring the code, and trying to fix bug 384571 first.
Comment 3 Julian Reschke 2011-10-20 11:39:10 PDT
Created attachment 568455 [details] [diff] [review]
Proposed patch, incl test case

To be applied on top of the patch for bug 384571.
Comment 4 Julian Reschke 2011-11-20 03:16:48 PST
Created attachment 575733 [details] [diff] [review]
Proposed patch, incl test case

Updated patch (applies on top of bug 384571)
Comment 5 Julian Reschke 2011-11-28 05:35:06 PST
Comment on attachment 575733 [details] [diff] [review]
Proposed patch, incl test case

(patch needs to be updated)
Comment 6 Julian Reschke 2012-03-21 09:08:52 PDT
Created attachment 607983 [details] [diff] [review]
Proposed patch, incl test case
Comment 7 Jason Duell [:jduell] (needinfo me) 2012-04-16 21:36:23 PDT
Comment on attachment 607983 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 607983 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Only one question:

::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
@@ -721,2 @@
>        return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset,
> -          IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult);

Any reason why you've scrapped the 7bit optimization?
Comment 8 Julian Reschke 2012-04-17 06:06:55 PDT
(In reply to Jason Duell (:jduell) from comment #7)
> Comment on attachment 607983 [details] [diff] [review]
> Proposed patch, incl test case
> 
> Review of attachment 607983 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good.  Only one question:
> 
> ::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
> @@ -721,2 @@
> >        return cvtUTF8->ConvertStringToUTF8(aParamValue, aCharset,
> > -          IS_7BIT_NON_ASCII_CHARSET(aCharset), aResult);
> 
> Any reason why you've scrapped the 7bit optimization?

It's not an optimization; it's a heuristic that will cause the wrong thing to happen when the name encoding is declared to be ISO-8859-1, but it *looks* as if it was UTF-8.

(see http://greenbytes.de/tech/tc2231/#attwithfn2231utf8-bad)

This bug is about removing this heuristic (that is, to treat the declared character encoding to be authoritative, not just advisory)
Comment 9 Jason Duell [:jduell] (needinfo me) 2012-04-17 17:57:30 PDT
Comment on attachment 607983 [details] [diff] [review]
Proposed patch, incl test case

OK.  I'm going to hold off on landing this until the next code fork (April 24th), since we've closed mozilla-central to all but approved patched till then.
Comment 10 Julian Reschke 2012-04-18 00:25:43 PDT
(In reply to Jason Duell (:jduell) from comment #9)
> OK.  I'm going to hold off on landing this until the next code fork (April
> 24th), since we've closed mozilla-central to all but approved patched till
> then.

Makes perfect sense. Thank you!
Comment 11 Jason Duell [:jduell] (needinfo me) 2012-04-29 13:49:40 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/7902fd4ca51d

Note You need to log in before you can comment on or make changes to this bug.