Closed Bug 462458 Opened 16 years ago Closed 16 years ago

[FIX]Update @charset detection to spec changes

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: bzbarsky)

Details

Attachments

(1 file, 1 obsolete file)

Like so 16 years ago Boris Zbarsky [:bzbarsky] 2.03 KB, patch	dbaron : review+ dbaron : superreview+	Details \| Diff \| Splinter Review
Patch with tests 16 years ago Boris Zbarsky [:bzbarsky] 12.40 KB, patch	beltzner : approval1.9.1b2+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Description

•

16 years ago

The spec has gotten a good bit stricter about what's allowed in @charset. In particular, runs of whitespace are not allowed before the opening quote of the charset name, and only double-quotes are allowed to delimit the charset name.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 1

•

16 years ago

Attached patch Like so (obsolete) — Details — Splinter Review

This just removes the whitespace-skipping and disallows backslash-escapes and the use of single-quotes around the charset name.

Attachment #345629 - Flags: superreview?(dbaron)

Attachment #345629 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

16 years ago

Summary: Update @charset detection to spec changes → [FIX]Update @charset detection to spec changes

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 2

•

16 years ago

Warning: potential conflicts with bug 335531, depending on who lands first. If you land first, you might want to review the merging there.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 3

•

16 years ago

The spec in question is at http://www.w3.org/TR/CSS21/syndata.html#charset

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

16 years ago

Comment on attachment 345629 [details] [diff] [review] Like so r+sr=dbaron Having some tests for this at some point would be nice...

Attachment #345629 - Flags: superreview?(dbaron)

Attachment #345629 - Flags: superreview+

Attachment #345629 - Flags: review?(dbaron)

Attachment #345629 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

16 years ago

Attached patch Patch with tests — Details — Splinter Review

Attachment #345629 - Attachment is obsolete: true

Masatoshi Kimura [:emk]

Comment 6

•

16 years ago

Comment on attachment 346387 [details] [diff] [review] Patch with tests test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail because UTF-32BE and UTF-32LE don't allow the BOM.

Masatoshi Kimura [:emk]

Comment 7

•

16 years ago

(In reply to comment #6) > UTF-32BE and UTF-32LE don't allow the BOM. Oops, UTF-16BE and UTF-16LE don't allow the BOM.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 8

•

16 years ago

Interesting. They certainly pass in our code as it stands. I assume your patch for bug 335531 will change that? I'm happy to remove those tests (or change them into tests testing for failure, which might be preferable; just need to watch out for landing order in that case).

Masatoshi Kimura [:emk]

Comment 9

•

16 years ago

(In reply to comment #8) > Interesting. They certainly pass in our code as it stands. I assume your > patch for bug 335531 will change that? I meant to say, "test-charset-utf-16-be-bom and test-charset-utf-16-le-bom should fail, but actually don't." At present, I made no changes the behavior of UTF-16BE and UTF-16LE decoders due to concern about compatibility. > I'm happy to remove those tests (or change them into tests testing for failure, > which might be preferable; just need to watch out for landing order in that > case). Plaese remove tests from this patch. I'll made a new patch with modified test if needed. Do you agree to change decoders?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 10

•

16 years ago

I'll keep the tests but reverse the sense and mark them todo. No opinion on decoders; that's not my area of expertise. I should also note that given the current style system code I'd expect those tests to pass no matter what the decoders do here; they'll only start to fail if we change the decoders _and_ we make some CSS parser changes to toss the sheet if the @charset that's parsed doesn't match what's sniffed.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 11

•

16 years ago

Comment on attachment 346387 [details] [diff] [review] Patch with tests It'd be good to get real-world testing here in case sites use single-quotes on charset...

Attachment #346387 - Flags: approval1.9.1b2?

Mike Beltzner [:beltzner, not reading bugmail]

Comment 12

•

16 years ago

Comment on attachment 346387 [details] [diff] [review] Patch with tests a=beltzner, mmm, tests

Attachment #346387 - Flags: approval1.9.1b2? → approval1.9.1b2+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

16 years ago

Pushed changeset b765de190a11.

Flags: in-testsuite+

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

16 years ago

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 14

•

14 years ago

(In reply to comment #10) > I'll keep the tests but reverse the sense and mark them todo. Do we have a bug on this?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 15

•

14 years ago

I don't think so. Note that the tests are currently marked as "fails", not "todo". And I believe that failing as they're currently written is correct behavior; we should reverse the red and green in the test and stylesheet and mark them as just ==, perhaps.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 16

•

14 years ago

The tests as currently written are also part of the CSS 2.1 test suite, where we also fail them.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 17

•

14 years ago

I thought they'd been brought up as incorrect for the suite and I fixed them there.... Is that not the case?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 18

•

14 years ago

http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-be-002.htm http://test.csswg.org/suites/css2.1/20101001/html4/at-charset-utf16-le-002.htm http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-be-002.xht http://test.csswg.org/suites/css2.1/20101001/xhtml1/at-charset-utf16-le-002.xht I think fantasai may have re-imported the tests from our tree rather than using the contribution I collected. Could that be the issue? Or were your fixes newer than the beginning of this month?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 19

•

14 years ago

No, apparently I just never fixed these. :( And I can't find the relevant public-css-testsuite emails, so maybe I imagined those too.... In any case, iirc the right behavior here is to ignore @charset "UTF-16LE" and likewise for BE, because it's not a valid charset name, and just use the BOM to determine endianness. At least as I recall this stuff at this point.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

•

14 years ago

Any chance you could send email to public-css-testsuite explaining what should happen and why?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 21

•

14 years ago

OK, I just read through this bug and http://www.w3.org/TR/CSS21/syndata.html#charset again really carefully. Given those rules, these tests are incorrect. In particular, the rules in that section tell us to decode the stylesheet "as specified". When doing that, if we followed the Unicode standard (which we don't; see comment 9) we have a garbage U+FEFF character at the beginning of the character stream, but that doesn't affect anything other than parsing of the @charset rule... which should get discarded rather than parsed. We can't detect that with a reftest; it would require a CSSOM test to check that. So I think we should reverse the sense of these tests. I'll mail the list. It might be worth adding a test that has a BE BOM, then has an @charset encoded in BE which says "UTF-16LE" and have the actual rules encoded as LE.... but maybe that's out of scope at this point.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 22

•

14 years ago

bzbarsky sent said email: http://lists.w3.org/Archives/Public/public-css-testsuite/2010Oct/0166.html

Henri Sivonen (:hsivonen)

Comment 23

•

12 years ago

bz, do you recall if layout/reftests/css-charset/test-charset-utf-16-be-no-bom.css and its le variant are motivated by real Web content or if they test stuff just because the spec said so? I am guessing that they exist because of the spec’s say-so and not to address any real-world Web-compat issue. Those two tests are invalidated by the current CSS3 Syntax draft.

Masatoshi Kimura [:emk]

Comment 24

•

12 years ago

I proposed the change assuming UTF-16, UTF-16LE, and UTF-16BE decoders follow the Unicode spec and IANA registry. But I'm going to remove BOM-sniffing UTF-16 decoder to comply the Encoding Standard now. Bug 335531 should also be reverted.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 25

•

12 years ago

I think I just wrote the test to make sure to test all the possibilities. I don't recall actual web content depending on this stuff; to a first approximation web content never uses @charset.

Henri Sivonen (:hsivonen)

Comment 26

•

12 years ago

Thanks. I’ll edit the test cases.

You need to log in before you can comment on or make changes to this bug.