Last Comment Bug 462458 - [FIX]Update @charset detection to spec changes
: [FIX]Update @charset detection to spec changes
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: x86 All
: -- normal (vote)
: ---
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-10-30 16:50 PDT by Boris Zbarsky [:bz]
Modified: 2012-11-09 00:58 PST (History)
5 users (show)
bzbarsky: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Like so (2.03 KB, patch)
2008-10-30 17:18 PDT, Boris Zbarsky [:bz]
dbaron: review+
dbaron: superreview+
Details | Diff | Splinter Review
Patch with tests (12.40 KB, patch)
2008-11-04 19:24 PST, Boris Zbarsky [:bz]
mbeltzner: approval1.9.1b2+
Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] 2008-10-30 16:50:54 PDT
The spec has gotten a good bit stricter about what's allowed in @charset.  In particular, runs of whitespace are not allowed before the opening quote of the charset name, and only double-quotes are allowed to delimit the charset name.
Comment 1 Boris Zbarsky [:bz] 2008-10-30 17:18:32 PDT
Created attachment 345629 [details] [diff] [review]
Like so

This just removes the whitespace-skipping and disallows backslash-escapes and the use of single-quotes around the charset name.
Comment 2 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2008-11-03 15:49:47 PST
Warning: potential conflicts with bug 335531, depending on who lands first.  If you land first, you might want to review the merging there.
Comment 3 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2008-11-03 16:50:40 PST
The spec in question is at http://www.w3.org/TR/CSS21/syndata.html#charset
Comment 4 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2008-11-03 16:54:37 PST
Comment on attachment 345629 [details] [diff] [review]
Like so

r+sr=dbaron

Having some tests for this at some point would be nice...
Comment 5 Boris Zbarsky [:bz] 2008-11-04 19:24:41 PST
Created attachment 346387 [details] [diff] [review]
Patch with tests
Comment 6 Masatoshi Kimura [:emk] 2008-11-06 04:38:16 PST
Comment on attachment 346387 [details] [diff] [review]
Patch with tests

test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail because UTF-32BE and UTF-32LE don't allow the BOM.
Comment 7 Masatoshi Kimura [:emk] 2008-11-06 04:39:28 PST
(In reply to comment #6)
> UTF-32BE and UTF-32LE don't allow the BOM.
Oops, UTF-16BE and UTF-16LE don't allow the BOM.
Comment 8 Boris Zbarsky [:bz] 2008-11-06 07:29:35 PST
Interesting.  They certainly pass in our code as it stands.  I assume your patch for bug 335531 will change that?

I'm happy to remove those tests (or change them into tests testing for failure, which might be preferable; just need to watch out for landing order in that case).
Comment 9 Masatoshi Kimura [:emk] 2008-11-06 12:07:17 PST
(In reply to comment #8)
> Interesting.  They certainly pass in our code as it stands.  I assume your
> patch for bug 335531 will change that?
I meant to say, "test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail, but actually don't."
At present, I made no changes the behavior of UTF-16BE and UTF-16LE decoders due to concern about compatibility.
> I'm happy to remove those tests (or change them into tests testing for failure,
> which might be preferable; just need to watch out for landing order in that
> case).
Plaese remove tests from this patch. I'll made a new patch with modified test if needed. Do you agree to change decoders?
Comment 10 Boris Zbarsky [:bz] 2008-11-06 12:13:53 PST
I'll keep the tests but reverse the sense and mark them todo.

No opinion on decoders; that's not my area of expertise.

I should also note that given the current style system code I'd expect those tests to pass no matter what the decoders do here; they'll only start to fail if we change the decoders _and_ we make some CSS parser changes to toss the sheet if the @charset that's parsed doesn't match what's sniffed.
Comment 11 Boris Zbarsky [:bz] 2008-11-06 13:11:41 PST
Comment on attachment 346387 [details] [diff] [review]
Patch with tests

It'd be good to get real-world testing here in case sites use single-quotes on charset...
Comment 12 Mike Beltzner [:beltzner, not reading bugmail] 2008-11-10 17:02:14 PST
Comment on attachment 346387 [details] [diff] [review]
Patch with tests

a=beltzner, mmm, tests
Comment 13 Boris Zbarsky [:bz] 2008-11-11 21:20:47 PST
Pushed changeset b765de190a11.
Comment 14 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-10-13 17:48:03 PDT
(In reply to comment #10)
> I'll keep the tests but reverse the sense and mark them todo.

Do we have a bug on this?
Comment 15 Boris Zbarsky [:bz] 2010-10-13 17:58:12 PDT
I don't think so.  Note that the tests are currently marked as "fails", not "todo".  And I believe that failing as they're currently written is correct behavior; we should reverse the red and green in the test and stylesheet and mark them as just ==, perhaps.
Comment 16 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-10-13 22:24:39 PDT
The tests as currently written are also part of the CSS 2.1 test suite, where we also fail them.
Comment 17 Boris Zbarsky [:bz] 2010-10-13 22:37:20 PDT
I thought they'd been brought up as incorrect for the suite and I fixed them there....  Is that not the case?
Comment 18 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-10-14 00:27:52 PDT
http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-be-002.htm
http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-le-002.htm
http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-be-002.xht
http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-le-002.xht

I think fantasai may have re-imported the tests from our tree rather than using the contribution I collected.  Could that be the issue?  Or were your fixes newer than the beginning of this month?
Comment 19 Boris Zbarsky [:bz] 2010-10-14 05:05:38 PDT
No, apparently I just never fixed these.  :(  And I can't find the relevant public-css-testsuite emails, so maybe I imagined those too....

In any case, iirc the right behavior here is to ignore @charset "UTF-16LE" and likewise for BE, because it's not a valid charset name, and just use the BOM to determine endianness.  At least as I recall this stuff at this point.
Comment 20 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-10-15 11:05:56 PDT
Any chance you could send email to public-css-testsuite explaining what should happen and why?
Comment 21 Boris Zbarsky [:bz] 2010-10-15 11:33:42 PDT
OK, I just read through this bug and http://www.w3.org/TR/CSS21/syndata.html#charset again really carefully.  Given those rules, these tests are incorrect.  In particular, the rules in that section tell us to decode the stylesheet "as specified".  When doing that, if we followed the Unicode standard (which we don't; see comment 9) we have a garbage U+FEFF character at the beginning of the character stream, but that doesn't affect anything other than parsing of the @charset rule... which should get discarded rather than parsed.  We can't detect that with a reftest; it would require a CSSOM test to check that.

So I think we should reverse the sense of these tests.  I'll mail the list.

It might be worth adding a test that has a BE BOM, then has an @charset encoded in BE which says "UTF-16LE" and have the actual rules encoded as LE.... but maybe that's out of scope at this point.
Comment 22 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-10-15 17:10:53 PDT
bzbarsky sent said email:
http://lists.w3.org/Archives/Public/public-css-testsuite/2010Oct/0166.html
Comment 23 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2012-11-08 06:21:35 PST
bz, do you recall if layout/reftests/css-charset/test-charset-utf-16-be-no-bom.css and its le variant are motivated by real Web content or if they test stuff just because the spec said so?

I am guessing that they exist because of the spec’s say-so and not to address any real-world Web-compat issue.

Those two tests are invalidated by the current CSS3 Syntax draft.
Comment 24 Masatoshi Kimura [:emk] 2012-11-08 10:14:40 PST
I proposed the change assuming UTF-16, UTF-16LE, and UTF-16BE decoders follow the Unicode spec and IANA registry. But I'm going to remove BOM-sniffing UTF-16 decoder to comply the Encoding Standard now. Bug 335531 should also be reverted.
Comment 25 Boris Zbarsky [:bz] 2012-11-08 12:20:20 PST
I think I just wrote the test to make sure to test all the possibilities.  I don't recall actual web content depending on this stuff; to a first approximation web content never uses @charset.
Comment 26 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2012-11-09 00:58:54 PST
Thanks. I’ll edit the test cases.

Note You need to log in before you can comment on or make changes to this bug.