Last Comment Bug 796882 - Make BOM take precedence over HTTP for CSS
: Make BOM take precedence over HTTP for CSS
Status: RESOLVED FIXED
: dev-doc-complete
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla19
Assigned To: Henri Sivonen (:hsivonen)
:
: Jet Villegas (:jet)
Mentors:
Depends on: 816849 801402
Blocks: encoding 859706
  Show dependency treegraph
 
Reported: 2012-10-02 04:56 PDT by Henri Sivonen (:hsivonen)
Modified: 2013-07-14 00:31 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP (9.36 KB, patch)
2012-10-24 07:14 PDT, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Adjust old tests (3.10 KB, patch)
2012-11-08 07:36 PST, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Implement CSS3 Syntax charset handling (10.00 KB, patch)
2012-11-08 07:37 PST, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Import CSSWG tests (29.38 KB, patch)
2012-11-09 09:47 PST, :Ms2ger (⌚ UTC+1/+2)
no flags Details | Diff | Splinter Review
Patch on top of Ms2ger’s patch (3.30 KB, patch)
2012-11-12 07:33 PST, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Adjust old tests by flipping red and green (3.10 KB, patch)
2012-11-12 07:36 PST, Henri Sivonen (:hsivonen)
bzbarsky: review+
Details | Diff | Splinter Review
Implement CSS3 Syntax charset handling, with updated comment (10.00 KB, patch)
2012-11-12 07:39 PST, Henri Sivonen (:hsivonen)
bzbarsky: review+
Details | Diff | Splinter Review
Implement CSS3 Syntax charset handling, with review comments addressed (13.79 KB, patch)
2012-11-13 05:46 PST, Henri Sivonen (:hsivonen)
no flags Details | Diff | Splinter Review
Implement CSS3 Syntax charset handling, with review comments addressed, without off-by-one error (13.79 KB, patch)
2012-11-14 23:12 PST, Henri Sivonen (:hsivonen)
bzbarsky: review+
Details | Diff | Splinter Review

Description Henri Sivonen (:hsivonen) 2012-10-02 04:56:12 PDT
Determining the character encoding when CSS loading should be consistent with bug 716579 and bug 687859 for the sake of the overall consistency of the platform. That is, a BOM should take precedence over HTTP-level charset in accordance with http://encoding.spec.whatwg.org/#decode .
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2012-10-02 08:47:03 PDT
So BOM should imply UTF-16, then?  Right now we actually look for @charset after a BOM and allow @charset to set the encoding if it's present....

In any case, this needs to be done in the working group, since it involves changing http://www.w3.org/TR/CSS21/syndata.html#charset
Comment 2 Henri Sivonen (:hsivonen) 2012-10-08 01:31:48 PDT
(In reply to Boris Zbarsky (:bz) from comment #1)
> So BOM should imply UTF-16, then?

Or UTF-8 in the case of the UTF-8 BOM.
Comment 3 Henri Sivonen (:hsivonen) 2012-10-19 05:35:04 PDT
Chrome and Opera already do what this bug asks for. We are not currently matching IE, either. (IE strips the BOM from CSS and then proceeds to decode according to the encoding specified on the HTTP level, which is really bizarre.) I did not test Safari.
Comment 4 Henri Sivonen (:hsivonen) 2012-10-19 07:43:00 PDT
(In reply to Boris Zbarsky (:bz) from comment #1)
> In any case, this needs to be done in the working group, since it involves
> changing http://www.w3.org/TR/CSS21/syndata.html#charset

http://lists.w3.org/Archives/Public/www-style/2012Oct/0516.html
Comment 5 Henri Sivonen (:hsivonen) 2012-10-24 06:52:43 PDT
I don’t see mCharsetHint on SheetLoadData ever getting set to a non-empty value. What am I missing?
Comment 6 Henri Sivonen (:hsivonen) 2012-10-24 07:14:48 PDT
Created attachment 674651 [details] [diff] [review]
WIP
Comment 7 Henri Sivonen (:hsivonen) 2012-10-24 07:15:25 PDT
The patch needs bug 801402 to land first.
Comment 8 Boris Zbarsky [:bz] (still a bit busy) 2012-10-24 11:37:46 PDT
> I don’t see mCharsetHint on SheetLoadData ever getting set to a non-empty value.

It might not be.  It's sort of part of the public LoadSheet() API, but consumers may always be passing an empty string.

We could just nuke it.
Comment 9 Henri Sivonen (:hsivonen) 2012-11-08 07:34:25 PST
(In reply to Boris Zbarsky (:bz) from comment #8)
> > I don’t see mCharsetHint on SheetLoadData ever getting set to a non-empty value.
> 
> It might not be.

Turns out it’s used for preloads.

- -

The tests for this bug are at https://test.csswg.org/source/contributors/opera/submitted/css3-syntax/charset/

That is, the tests are testharness.js tests but they are part of CSS WG test suite. We don’t have any precedent for importing tests with this combination. Where should I put them? Does the tooling at http://mxr.mozilla.org/mozilla-central/source/dom/imptests/ support importing a test suite under layout/ ?
Comment 10 Henri Sivonen (:hsivonen) 2012-11-08 07:36:26 PST
Created attachment 679679 [details] [diff] [review]
Adjust old tests
Comment 11 Henri Sivonen (:hsivonen) 2012-11-08 07:37:59 PST
Created attachment 679680 [details] [diff] [review]
Implement CSS3 Syntax charset handling
Comment 12 :Ms2ger (⌚ UTC+1/+2) 2012-11-09 09:47:58 PST
Created attachment 680108 [details] [diff] [review]
Import CSSWG tests

This puts them under dom/imptests/css. I guess it would be a bit nicer to put them somewhere under layout; file a bug on me if you'd like me to extend the script to allow that.
Comment 13 Henri Sivonen (:hsivonen) 2012-11-12 07:33:55 PST
Created attachment 680627 [details] [diff] [review]
Patch on top of Ms2ger’s patch

This patch is needed on top of the one Ms2ger uploaded. However, this doesn’t mark dom/imptests/css/contributors/opera/submitted/css3-syntax/charset/test_page-utf16-css-bomless-utf16be.html as a known failure, because I couldn’t get parseFailures.py to do anything.
Comment 14 Henri Sivonen (:hsivonen) 2012-11-12 07:35:06 PST
And it’s a known failure, because it ends up testing whether our UTF-16 decoders try to sniff endianness in the BOMless case. (Our decoder sniffs. The Encoding Standard says not to, at the moment.)

Outside the scope of this bug.
Comment 15 Henri Sivonen (:hsivonen) 2012-11-12 07:36:03 PST
Created attachment 680628 [details] [diff] [review]
Adjust old tests by flipping red and green
Comment 16 Henri Sivonen (:hsivonen) 2012-11-12 07:39:30 PST
Created attachment 680630 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with updated comment
Comment 17 :Ms2ger (⌚ UTC+1/+2) 2012-11-12 12:48:04 PST
Comment on attachment 680627 [details] [diff] [review]
Patch on top of Ms2ger’s patch

Review of attachment 680627 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/imptests/Makefile.in
@@ +22,5 @@
>  
>  include $(srcdir)/editing.mk
>  include $(srcdir)/html.mk
>  include $(srcdir)/webapps.mk
> +include $(srcdir)/css.mk

Gah, sorry :/ I fixed the instructions in the readme to mention this:

https://hg.mozilla.org/mozilla-central/rev/96e7580fb756
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2012-11-13 00:39:39 PST
Comment on attachment 680628 [details] [diff] [review]
Adjust old tests by flipping red and green

This is landing in the same changeset as the code changes, right?

r=me
Comment 19 Boris Zbarsky [:bz] (still a bit busy) 2012-11-13 00:50:36 PST
Comment on attachment 680630 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with updated comment

r=me if you leave the logging about where the charset info is coming from instead of removing it.
Comment 20 Boris Zbarsky [:bz] (still a bit busy) 2012-11-13 00:55:39 PST
Comment on attachment 680630 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with updated comment

Actually, two more comments:

>+++ b/layout/style/Loader.cpp
>+  for (; i < sizeof(kCharsetSym) - 1; ++i) {

Why not just strncmp?

>+        aCharset.Trim(" \t\n\f\r");

Is this specced?  Is it needed?  Seems pretty weird... especially since having an unescaped linebreak inside a quoted-string is not OK in CSS.
Comment 21 Henri Sivonen (:hsivonen) 2012-11-13 04:23:54 PST
(In reply to Boris Zbarsky (:bz) from comment #20)
> >+++ b/layout/style/Loader.cpp
> >+  for (; i < sizeof(kCharsetSym) - 1; ++i) {
> 
> Why not just strncmp?

Because the old code did not have strncmp and I don’t tend to think in terms of the C standard library.

> >+        aCharset.Trim(" \t\n\f\r");
> 
> Is this specced?

Yes. See http://dev.w3.org/csswg/css3-syntax/#the-input-byte-stream

Step 3 says, in part: “then get an encoding for the sequence of XX bytes, decoded per windows-1252, and let temp be the return value.”

The sequence of XX bytes can contain anything except the double quote.

“Get an encoding” is defined in http://encoding.spec.whatwg.org/#concept-encoding-get which says “Remove any leading and trailing ASCII whitespace from label.” And ASCII whitespace is defined to be " ", "\t", "\n", "\f" and "\r".

I suppose it would be slightly more proper to call EncodingUtils::FindEncodingForLabel() instead of inlining “get an encoding” like this.

> Is it needed? 

It’s definitely needed in other parts of the platform that call “get an encoding”. Dunno if it is needed here. It’s a side effect of reusing the “get an encoding” concept.
Comment 22 Henri Sivonen (:hsivonen) 2012-11-13 05:46:57 PST
Created attachment 681011 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with review comments addressed
Comment 23 Boris Zbarsky [:bz] (still a bit busy) 2012-11-13 09:47:19 PST
Comment on attachment 681011 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with review comments addressed

> +  for (uint32_t i = sizeof(kCharsetSym); i < aDataLength; ++i) {

Should that not be sizeof(kCharsetSym) - 1?
Comment 24 Henri Sivonen (:hsivonen) 2012-11-14 23:12:06 PST
Created attachment 681867 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with review comments addressed, without off-by-one error

(In reply to Boris Zbarsky (:bz) from comment #23)
> Comment on attachment 681011 [details] [diff] [review]
> Implement CSS3 Syntax charset handling, with review comments addressed
> 
> > +  for (uint32_t i = sizeof(kCharsetSym); i < aDataLength; ++i) {
> 
> Should that not be sizeof(kCharsetSym) - 1?

Oops. Yes.

This patch uses strncmp and does not attempt to inline parts of "get an encoding".
Comment 25 Boris Zbarsky [:bz] (still a bit busy) 2012-11-14 23:31:51 PST
Comment on attachment 681867 [details] [diff] [review]
Implement CSS3 Syntax charset handling, with review comments addressed, without off-by-one error

r=me
Comment 26 Henri Sivonen (:hsivonen) 2012-11-15 00:50:27 PST
(In reply to Boris Zbarsky (:bz) from comment #25)
> r=me

Thanks.

Landed the code now in order to get all the BOM precedence stuff in the same train:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b301f9b2e956

Leaving open pending a decision and help from Ms2ger on how to deal with the test import. (Need to sort out the failure metadata and the HTTP headers.)
Comment 27 Ed Morley [:emorley] 2012-11-15 08:22:16 PST
https://hg.mozilla.org/mozilla-central/rev/b301f9b2e956
Comment 28 Boris Zbarsky [:bz] (still a bit busy) 2013-06-03 06:49:36 PDT
Having landed and shipped code changes that change behavior without resolving the bug and setting a target milestone indicating when it got is _really_ confusing for regression-hunting.

Please resolve this bug appropriately and file a followup on whatever work remains.
Comment 29 Henri Sivonen (:hsivonen) 2013-06-06 05:18:56 PDT
(In reply to Boris Zbarsky (:bz) from comment #28)
> Having landed and shipped code changes that change behavior without
> resolving the bug and setting a target milestone indicating when it got is
> _really_ confusing for regression-hunting.

Yeah. This remained open longer than I expected it to remain open.

> Please resolve this bug appropriately and file a followup on whatever work
> remains.

Filed bug 880201.
Comment 30 Jeremie Patonnier :Jeremie 2013-06-20 03:17:42 PDT
Updated https://developer.mozilla.org/en-US/docs/Web/CSS/@charset
to reflect that change.

Teoli, can you check if we have other places we document that topic ?
Comment 31 Jean-Yves Perrier [:teoli] 2013-07-14 00:31:07 PDT
I have not found other places. It looks ok.

Note You need to log in before you can comment on or make changes to this bug.