The spec has gotten a good bit stricter about what's allowed in @charset. In particular, runs of whitespace are not allowed before the opening quote of the charset name, and only double-quotes are allowed to delimit the charset name.
Created attachment 345629 [details] [diff] [review] Like so This just removes the whitespace-skipping and disallows backslash-escapes and the use of single-quotes around the charset name.
Summary: Update @charset detection to spec changes → [FIX]Update @charset detection to spec changes
Warning: potential conflicts with bug 335531, depending on who lands first. If you land first, you might want to review the merging there.
The spec in question is at http://www.w3.org/TR/CSS21/syndata.html#charset
Comment on attachment 345629 [details] [diff] [review] Like so r+sr=dbaron Having some tests for this at some point would be nice...
Created attachment 346387 [details] [diff] [review] Patch with tests
Attachment #345629 - Attachment is obsolete: true
Comment on attachment 346387 [details] [diff] [review] Patch with tests test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail because UTF-32BE and UTF-32LE don't allow the BOM.
(In reply to comment #6) > UTF-32BE and UTF-32LE don't allow the BOM. Oops, UTF-16BE and UTF-16LE don't allow the BOM.
Interesting. They certainly pass in our code as it stands. I assume your patch for bug 335531 will change that? I'm happy to remove those tests (or change them into tests testing for failure, which might be preferable; just need to watch out for landing order in that case).
(In reply to comment #8) > Interesting. They certainly pass in our code as it stands. I assume your > patch for bug 335531 will change that? I meant to say, "test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail, but actually don't." At present, I made no changes the behavior of UTF-16BE and UTF-16LE decoders due to concern about compatibility. > I'm happy to remove those tests (or change them into tests testing for failure, > which might be preferable; just need to watch out for landing order in that > case). Plaese remove tests from this patch. I'll made a new patch with modified test if needed. Do you agree to change decoders?
I'll keep the tests but reverse the sense and mark them todo. No opinion on decoders; that's not my area of expertise. I should also note that given the current style system code I'd expect those tests to pass no matter what the decoders do here; they'll only start to fail if we change the decoders _and_ we make some CSS parser changes to toss the sheet if the @charset that's parsed doesn't match what's sniffed.
Comment on attachment 346387 [details] [diff] [review] Patch with tests It'd be good to get real-world testing here in case sites use single-quotes on charset...
Attachment #346387 - Flags: approval1.9.1b2?
Comment on attachment 346387 [details] [diff] [review] Patch with tests a=beltzner, mmm, tests
Attachment #346387 - Flags: approval1.9.1b2? → approval1.9.1b2+
Pushed changeset b765de190a11.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(In reply to comment #10) > I'll keep the tests but reverse the sense and mark them todo. Do we have a bug on this?
I don't think so. Note that the tests are currently marked as "fails", not "todo". And I believe that failing as they're currently written is correct behavior; we should reverse the red and green in the test and stylesheet and mark them as just ==, perhaps.
The tests as currently written are also part of the CSS 2.1 test suite, where we also fail them.
I thought they'd been brought up as incorrect for the suite and I fixed them there.... Is that not the case?
http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-be-002.htm http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-le-002.htm http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-be-002.xht http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-le-002.xht I think fantasai may have re-imported the tests from our tree rather than using the contribution I collected. Could that be the issue? Or were your fixes newer than the beginning of this month?
No, apparently I just never fixed these. :( And I can't find the relevant public-css-testsuite emails, so maybe I imagined those too.... In any case, iirc the right behavior here is to ignore @charset "UTF-16LE" and likewise for BE, because it's not a valid charset name, and just use the BOM to determine endianness. At least as I recall this stuff at this point.
Any chance you could send email to public-css-testsuite explaining what should happen and why?
OK, I just read through this bug and http://www.w3.org/TR/CSS21/syndata.html#charset again really carefully. Given those rules, these tests are incorrect. In particular, the rules in that section tell us to decode the stylesheet "as specified". When doing that, if we followed the Unicode standard (which we don't; see comment 9) we have a garbage U+FEFF character at the beginning of the character stream, but that doesn't affect anything other than parsing of the @charset rule... which should get discarded rather than parsed. We can't detect that with a reftest; it would require a CSSOM test to check that. So I think we should reverse the sense of these tests. I'll mail the list. It might be worth adding a test that has a BE BOM, then has an @charset encoded in BE which says "UTF-16LE" and have the actual rules encoded as LE.... but maybe that's out of scope at this point.
bzbarsky sent said email: http://lists.w3.org/Archives/Public/public-css-testsuite/2010Oct/0166.html
bz, do you recall if layout/reftests/css-charset/test-charset-utf-16-be-no-bom.css and its le variant are motivated by real Web content or if they test stuff just because the spec said so? I am guessing that they exist because of the spec’s say-so and not to address any real-world Web-compat issue. Those two tests are invalidated by the current CSS3 Syntax draft.
I proposed the change assuming UTF-16, UTF-16LE, and UTF-16BE decoders follow the Unicode spec and IANA registry. But I'm going to remove BOM-sniffing UTF-16 decoder to comply the Encoding Standard now. Bug 335531 should also be reverted.
I think I just wrote the test to make sure to test all the possibilities. I don't recall actual web content depending on this stuff; to a first approximation web content never uses @charset.
Thanks. I’ll edit the test cases.
You need to log in before you can comment on or make changes to this bug.