Last Comment Bug 700589 - HTTP content type charset parameter accepts single quotes
: HTTP content type charset parameter accepts single quotes
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla11
Assigned To: Julian Reschke
:
Mentors:
http://greenbytes.de/tech/tc/httpcont...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-08 00:42 PST by Julian Reschke
Modified: 2011-11-30 03:57 PST (History)
4 users (show)
dholbert: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Proposed patch, incl updated test case results, plus one new test (5.12 KB, patch)
2011-11-08 11:14 PST, Julian Reschke
no flags Details | Diff | Review
Proposed patch, incl updated test case results, plus new tests (6.77 KB, patch)
2011-11-11 05:23 PST, Julian Reschke
jduell.mcbugs: review+
Details | Diff | Review
Proposed patch, incl updated test case results, plus new tests (7.80 KB, patch)
2011-11-16 13:42 PST, Julian Reschke
no flags Details | Diff | Review
Proposed patch, incl test case (7.85 KB, patch)
2011-11-21 13:38 PST, Julian Reschke
bzbarsky: review-
Details | Diff | Review
Proposed patch, incl test case (7.94 KB, patch)
2011-11-29 13:14 PST, Julian Reschke
bzbarsky: review+
Details | Diff | Review

Description Julian Reschke 2011-11-08 00:42:26 PST
The parser for the HTTP content-type header field apparently accepts single quotes as delimiter. This is a bug according to RFC 2616, and also not the case in IE9 and Firefox.

See test case at:

  http://greenbytes.de/tech/tc/httpcontenttype/#textplainutf8qsa
Comment 1 Jason Duell [:jduell] (needinfo? me) 2011-11-08 01:07:35 PST
> also not the case in IE9 and Firefox.

s/Firefox/some other browser/  ?   Otherwise we're already fixed :)
Comment 2 Jason Duell [:jduell] (needinfo? me) 2011-11-08 01:08:33 PST
Ah, looking at the link I see it was IE9 and Opera that don't accept.  I suppose that's probably good enough to get strict about this.
Comment 3 Julian Reschke 2011-11-08 01:49:05 PST
(In reply to Jason Duell (:jduell) from comment #2)
> Ah, looking at the link I see it was IE9 and Opera that don't accept.  I
> suppose that's probably good enough to get strict about this.

Yep, sorry for that.

Note that the single quote is a legal character in param values, and thus any kind of special handling will cause problems with headers like:

  Content-Type: foo/bar; ext='; charset=qux

(will add a test case)
Comment 4 Julian Reschke 2011-11-08 07:50:23 PST
(In reply to Julian Reschke from comment #3)
> Note that the single quote is a legal character in param values, and thus
> any kind of special handling will cause problems with headers like:
> 
>   Content-Type: foo/bar; ext='; charset=qux
> 
> (will add a test case)

Test case: http://greenbytes.de/tech/tc/httpcontenttype/#textplainutf8extparamsq

And not surprisingly, Firefox fails that, too.
Comment 5 Julian Reschke 2011-11-08 09:56:12 PST
Seems to be quite easy to fix; now looking at the test cases which appear to check for this behavior.
Comment 6 Julian Reschke 2011-11-08 11:14:59 PST
Created attachment 572910 [details] [diff] [review]
Proposed patch, incl updated test case results, plus one new test
Comment 7 Jason Duell [:jduell] (needinfo? me) 2011-11-10 16:30:22 PST
Comment on attachment 572910 [details] [diff] [review]
Proposed patch, incl updated test case results, plus one new test

Review of attachment 572910 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch.  A couple minor questions.

::: netwerk/test/unit/test_parse_content_type.js
@@ +112,3 @@
>  
>    type = netutil.parseContentType('text/plain, TEXT/HTML; param=charset=UTF8; charset=\'ISO-8859-1\'; param2=charset=UTF16, text/html, TEXT/HTML', charset, hadCharset);
> +  check("text/html", "'ISO-8859-1'", true);  

I'm not clear on why you're adding all the single quoting to the ISO-8859-1 args.  Is that just to test that single quotes make it through the parser and are returned as part of the value, or is this what should actually show up on the web?  If we suspect that most servers are sending the value in double quotes, why not change most of the tests to pass in a dbl-quoted value instead?

@@ +135,5 @@
>    check("text/html", "", false);
>  
> +  // Bug 700589
> +  type = netutil.parseContentType("text/plain; x='; charset=\"UTF-8\"", charset, hadCharset);
> +  check("text/plain", "UTF-8", true);

Do you want to check for x="'" too?
Comment 8 Julian Reschke 2011-11-10 23:15:38 PST
(In reply to Jason Duell (:jduell) from comment #7)
> Comment on attachment 572910 [details] [diff] [review] [diff] [details] [review]
> Proposed patch, incl updated test case results, plus one new test
> 
> Review of attachment 572910 [details] [diff] [review] [diff] [details] [review]:
> -----------------------------------------------------------------
> 
> Thanks for the patch.  A couple minor questions.
> 
> ::: netwerk/test/unit/test_parse_content_type.js
> @@ +112,3 @@
> >  
> >    type = netutil.parseContentType('text/plain, TEXT/HTML; param=charset=UTF8; charset=\'ISO-8859-1\'; param2=charset=UTF16, text/html, TEXT/HTML', charset, hadCharset);
> > +  check("text/html", "'ISO-8859-1'", true);  
> 
> I'm not clear on why you're adding all the single quoting to the ISO-8859-1
> args.  Is that just to test that single quotes make it through the parser
> and are returned as part of the value, or is this what should actually show
> up on the web?  If we suspect that most servers are sending the value in
> double quotes, why not change most of the tests to pass in a dbl-quoted
> value instead?

Well, I just adjusted test results. Apparently the person working on this before felt it was important to test all these cases; I will review those, and also make sure that " has sufficient coverage.

> @@ +135,5 @@
> >    check("text/html", "", false);
> >  
> > +  // Bug 700589
> > +  type = netutil.parseContentType("text/plain; x='; charset=\"UTF-8\"", charset, hadCharset);
> > +  check("text/plain", "UTF-8", true);
> 
> Do you want to check for x="'" too?

Yes, I'll that for clarity.

Thanks so far.
Comment 9 Julian Reschke 2011-11-11 05:23:12 PST
Created attachment 573779 [details] [diff] [review]
Proposed patch, incl updated test case results, plus new tests

Updated tests; I modified those that didn't seem to test ' vs " to use " instead (so the test changes, but the test result does not). Also added a specific one to the end of the test suite covering the bug for which this ticket was opened.
Comment 10 Jason Duell [:jduell] (needinfo? me) 2011-11-11 11:14:43 PST
Comment on attachment 573779 [details] [diff] [review]
Proposed patch, incl updated test case results, plus new tests

Review of attachment 573779 [details] [diff] [review]:
-----------------------------------------------------------------

OK, looks good.  Thanks Julian!
Comment 11 Daniel Holbert [:dholbert] (largely AFK until June 28) 2011-11-13 17:23:31 PST
Comment on attachment 573779 [details] [diff] [review]
Proposed patch, incl updated test case results, plus new tests

># HG changeset patch
># Parent 1374294a61191a19ba6f52be7ee980b1f5186532
># User julian.reschke@gmx.de
>Bug 700589 - HTTP content type charset parameter accepts single quotes

Julian: It looks like this is the description of the bug rather than the description of the change.  Could you suggest a checkin message that describes the change?
Comment 12 Julian Reschke 2011-11-14 00:50:17 PST
"Fix Content-Type parser to treat single quote as regular character instead of a delimiter"
Comment 13 Daniel Holbert [:dholbert] (largely AFK until June 28) 2011-11-15 10:42:05 PST
Pushed to try; assuming that goes well, I can push this to mozilla-inbound later today.
  https://tbpl.mozilla.org/?tree=Try&rev=143a2ba7ad7c
Comment 14 Daniel Holbert [:dholbert] (largely AFK until June 28) 2011-11-15 12:43:18 PST
This makes us fail a mochitest:
{
10593 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_bug397234.html | Headers should match - got text/plain; charset=UTF-8, expected text/plain; charset='uTf-8'
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=7409173&tree=Try

It may be that the test is broken; if so, the test-fixup needs to be included as part of the patch, so that our test boxes will stay green when this is pushed.
Comment 15 Daniel Holbert [:dholbert] (largely AFK until June 28) 2011-11-15 12:44:12 PST
For reference, the test that failed is:
 http://mxr.mozilla.org/mozilla-central/source/content/base/test/test_bug397234.html?force=1
Comment 16 Julian Reschke 2011-11-15 13:35:57 PST
So what happens is that the test case checks that the case of the charset param is preserved in the request being sent *if* it does match the charset that is actually in use.

The header set by the test case uses

  charset='uTf-8'

from which we now extract "'uTf-8'" as charset, previously "UTf-8" (single quotes now preserved).

This is an illegal charset, so XMLHttpRequest defaults to UTF-8 (my theory), and thus the code that tries to re-insert the funny-cased original value doesn't get triggered anymore.

Essentially this works as intended, and we can change the test to use double quotes rather than single quotes. It would be good to know though whether it's a pure accident that the test uses single quotes, or whether this was really seen in the wild.
Comment 17 Julian Reschke 2011-11-15 13:53:00 PST
The conservative fix seems to add a workaround to the code in XmlHTTPRequest so that the charset *with* single quotes matches as well, such as (pseudo-code):

  if (specifiedCharset.startsWith("'") && specifiedCharset.ensWith("'")) {
    ...remove the quoting
  }

That way, the observable behavior would stay the same.

(It would be better if this hack could be removed; it might be good to check what other UAs do here, and whether the XHR spec says something about it)
Comment 18 Julian Reschke 2011-11-16 13:42:15 PST
Created attachment 574984 [details] [diff] [review]
Proposed patch, incl updated test case results, plus new tests

Same as previous patch; except that it adds a workaround on top of another workaround, so that the logic in XmlHttpRequest that rewrites the content-type header continues to do the right thing even for charset names in single quotes.
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-11-21 11:08:05 PST
Comment on attachment 574984 [details] [diff] [review]
Proposed patch, incl updated test case results, plus new tests

specifiedCharset could be empty there, no?  You need to check specifiedCharset.Length() >= 2.
Comment 20 Julian Reschke 2011-11-21 13:38:59 PST
Created attachment 575957 [details] [diff] [review]
Proposed patch, incl test case

Patch updated as suggested, mochitests not run yet; will try try server now.
Comment 21 Mozilla RelEng Bot 2011-11-22 00:20:22 PST
Try run for d967b35c9d35 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=d967b35c9d35
Results (out of 13 total builds):
    exception: 12
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/julian.reschke@gmx.de-d967b35c9d35
Comment 22 Julian Reschke 2011-11-22 04:55:38 PST
Comment on attachment 575957 [details] [diff] [review]
Proposed patch, incl test case

Try results: https://tbpl.mozilla.org/?tree=Try&rev=2002a0e4a1f2 (I don't believe that any problems related here have anything to do with the patch; in particular the test failure for test_bug397234.html is gone)
Comment 23 Mozilla RelEng Bot 2011-11-22 08:40:23 PST
Try run for 2002a0e4a1f2 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=2002a0e4a1f2
Results (out of 247 total builds):
    exception: 2
    success: 206
    warnings: 35
    failure: 4
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/julian.reschke@gmx.de-2002a0e4a1f2
Comment 24 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-11-29 11:17:37 PST
Comment on attachment 575957 [details] [diff] [review]
Proposed patch, incl test case

This doesn't seem quite right either for the case when specifiedCharset is, say, "'ut'f-8'".
Comment 25 Julian Reschke 2011-11-29 13:14:52 PST
Created attachment 577733 [details] [diff] [review]
Proposed patch, incl test case

(addressed review comment, try results: https://tbpl.mozilla.org/?tree=Try&rev=38eaa7d2fb60)
Comment 26 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-11-29 13:18:21 PST
Comment on attachment 577733 [details] [diff] [review]
Proposed patch, incl test case

r=me on the XHR bits

Now we get to watch out for this breaking sites....
Comment 27 Julian Reschke 2011-11-29 13:24:48 PST
(In reply to Boris Zbarsky (:bz) from comment #26)
> Now we get to watch out for this breaking sites....

Maybe. We'll see.
Comment 28 Daniel Holbert [:dholbert] (largely AFK until June 28) 2011-11-29 14:08:50 PST
I removed 2 instances of end-of-line whitespace (one in nsXMLHttpRequest.cpp, one in test_parse_content_type.js) and pushed:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/4b085d906272
Comment 29 Marco Bonardo [::mak] 2011-11-30 03:57:14 PST
https://hg.mozilla.org/mozilla-central/rev/4b085d906272

Note You need to log in before you can comment on or make changes to this bug.