Closed
Bug 462458
Opened 16 years ago
Closed 16 years ago
[FIX]Update @charset detection to spec changes
Categories
(Core :: CSS Parsing and Computation, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: bzbarsky)
Details
Attachments
(1 file, 1 obsolete file)
12.40 KB,
patch
|
beltzner
:
approval1.9.1b2+
|
Details | Diff | Splinter Review |
The spec has gotten a good bit stricter about what's allowed in @charset. In particular, runs of whitespace are not allowed before the opening quote of the charset name, and only double-quotes are allowed to delimit the charset name.
Assignee | ||
Comment 1•16 years ago
|
||
This just removes the whitespace-skipping and disallows backslash-escapes and the use of single-quotes around the charset name.
Attachment #345629 -
Flags: superreview?(dbaron)
Attachment #345629 -
Flags: review?(dbaron)
Assignee | ||
Updated•16 years ago
|
Summary: Update @charset detection to spec changes → [FIX]Update @charset detection to spec changes
Warning: potential conflicts with bug 335531, depending on who lands first. If you land first, you might want to review the merging there.
The spec in question is at http://www.w3.org/TR/CSS21/syndata.html#charset
Comment on attachment 345629 [details] [diff] [review]
Like so
r+sr=dbaron
Having some tests for this at some point would be nice...
Attachment #345629 -
Flags: superreview?(dbaron)
Attachment #345629 -
Flags: superreview+
Attachment #345629 -
Flags: review?(dbaron)
Attachment #345629 -
Flags: review+
Assignee | ||
Comment 5•16 years ago
|
||
Attachment #345629 -
Attachment is obsolete: true
Comment 6•16 years ago
|
||
Comment on attachment 346387 [details] [diff] [review]
Patch with tests
test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail because UTF-32BE and UTF-32LE don't allow the BOM.
Comment 7•16 years ago
|
||
(In reply to comment #6)
> UTF-32BE and UTF-32LE don't allow the BOM.
Oops, UTF-16BE and UTF-16LE don't allow the BOM.
Assignee | ||
Comment 8•16 years ago
|
||
Interesting. They certainly pass in our code as it stands. I assume your patch for bug 335531 will change that?
I'm happy to remove those tests (or change them into tests testing for failure, which might be preferable; just need to watch out for landing order in that case).
Comment 9•16 years ago
|
||
(In reply to comment #8)
> Interesting. They certainly pass in our code as it stands. I assume your
> patch for bug 335531 will change that?
I meant to say, "test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail, but actually don't."
At present, I made no changes the behavior of UTF-16BE and UTF-16LE decoders due to concern about compatibility.
> I'm happy to remove those tests (or change them into tests testing for failure,
> which might be preferable; just need to watch out for landing order in that
> case).
Plaese remove tests from this patch. I'll made a new patch with modified test if needed. Do you agree to change decoders?
Assignee | ||
Comment 10•16 years ago
|
||
I'll keep the tests but reverse the sense and mark them todo.
No opinion on decoders; that's not my area of expertise.
I should also note that given the current style system code I'd expect those tests to pass no matter what the decoders do here; they'll only start to fail if we change the decoders _and_ we make some CSS parser changes to toss the sheet if the @charset that's parsed doesn't match what's sniffed.
Assignee | ||
Comment 11•16 years ago
|
||
Comment on attachment 346387 [details] [diff] [review]
Patch with tests
It'd be good to get real-world testing here in case sites use single-quotes on charset...
Attachment #346387 -
Flags: approval1.9.1b2?
Comment 12•16 years ago
|
||
Comment on attachment 346387 [details] [diff] [review]
Patch with tests
a=beltzner, mmm, tests
Attachment #346387 -
Flags: approval1.9.1b2? → approval1.9.1b2+
Assignee | ||
Updated•16 years ago
|
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
(In reply to comment #10)
> I'll keep the tests but reverse the sense and mark them todo.
Do we have a bug on this?
Assignee | ||
Comment 15•14 years ago
|
||
I don't think so. Note that the tests are currently marked as "fails", not "todo". And I believe that failing as they're currently written is correct behavior; we should reverse the red and green in the test and stylesheet and mark them as just ==, perhaps.
The tests as currently written are also part of the CSS 2.1 test suite, where we also fail them.
Assignee | ||
Comment 17•14 years ago
|
||
I thought they'd been brought up as incorrect for the suite and I fixed them there.... Is that not the case?
http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-be-002.htm
http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-le-002.htm
http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-be-002.xht
http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-le-002.xht
I think fantasai may have re-imported the tests from our tree rather than using the contribution I collected. Could that be the issue? Or were your fixes newer than the beginning of this month?
Assignee | ||
Comment 19•14 years ago
|
||
No, apparently I just never fixed these. :( And I can't find the relevant public-css-testsuite emails, so maybe I imagined those too....
In any case, iirc the right behavior here is to ignore @charset "UTF-16LE" and likewise for BE, because it's not a valid charset name, and just use the BOM to determine endianness. At least as I recall this stuff at this point.
Any chance you could send email to public-css-testsuite explaining what should happen and why?
Assignee | ||
Comment 21•14 years ago
|
||
OK, I just read through this bug and http://www.w3.org/TR/CSS21/syndata.html#charset again really carefully. Given those rules, these tests are incorrect. In particular, the rules in that section tell us to decode the stylesheet "as specified". When doing that, if we followed the Unicode standard (which we don't; see comment 9) we have a garbage U+FEFF character at the beginning of the character stream, but that doesn't affect anything other than parsing of the @charset rule... which should get discarded rather than parsed. We can't detect that with a reftest; it would require a CSSOM test to check that.
So I think we should reverse the sense of these tests. I'll mail the list.
It might be worth adding a test that has a BE BOM, then has an @charset encoded in BE which says "UTF-16LE" and have the actual rules encoded as LE.... but maybe that's out of scope at this point.
bzbarsky sent said email:
http://lists.w3.org/Archives/Public/public-css-testsuite/2010Oct/0166.html
Comment 23•12 years ago
|
||
bz, do you recall if layout/reftests/css-charset/test-charset-utf-16-be-no-bom.css and its le variant are motivated by real Web content or if they test stuff just because the spec said so?
I am guessing that they exist because of the spec’s say-so and not to address any real-world Web-compat issue.
Those two tests are invalidated by the current CSS3 Syntax draft.
Comment 24•12 years ago
|
||
I proposed the change assuming UTF-16, UTF-16LE, and UTF-16BE decoders follow the Unicode spec and IANA registry. But I'm going to remove BOM-sniffing UTF-16 decoder to comply the Encoding Standard now. Bug 335531 should also be reverted.
Assignee | ||
Comment 25•12 years ago
|
||
I think I just wrote the test to make sure to test all the possibilities. I don't recall actual web content depending on this stuff; to a first approximation web content never uses @charset.
Comment 26•12 years ago
|
||
Thanks. I’ll edit the test cases.
You need to log in
before you can comment on or make changes to this bug.
Description
•