Closed Bug 299148 Opened 20 years ago Closed 20 years ago

[FIXr]Some CSS unicode escape characters not allowed in id selector

Categories

(Core :: CSS Parsing and Computation, defect, P1)

x86
All
defect

Tracking

()

RESOLVED FIXED
mozilla1.8beta4

People

(Reporter: jbw22285, Assigned: bzbarsky)

Details

(Keywords: css2)

Attachments

(2 files)

Using build: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050629 Firefox/1.0+ Certain unicode escape characters don't seem to be accepted by Mozilla as id selectors in CSS. For example, the line: #\0032, .foo {color:green} is being treated as invalid, and anything in the "foo" class will not be green. Unicode escape characters that represent control characters, numeric digits, and various punctuation characters are being treated as invalid. The unicode characters that are being invalidated even though they are escaped are: U+0000 through U+0040, U+005B, U+005D, U +005E, U+0060, and U+007B through U+00A0 Please see the testcase (soon to be attached) for a test which shows this bug. We did test all unicode characters above the ASCII range, and all of them were fine except for those listed above. We understood the following specification as indicating that all unicode escape characters are valid. From: http://www.w3.org/TR/CSS21/syndata.html#q6 -------------------------------------------- "Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation, even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not)." -------------------------------------------- So according to these specs, something like #\0032 as an id selector should be valid, correct? Our understanding of the tokenization section also seems to indicate that all unicode escapes should be valid. From: http://www.w3.org/TR/CSS21/syndata.html#tokenization -------------------------------------------- ident [-]?{nmstart}{nmchar}* nmstart [_a-zA-Z]|{nonascii}|{escape} escape {unicode}|\\[^\n\r\f0-9a-f] unicode \\[0-9a-f]{1,6}(\r\n|[ \n\r\t\f])? -------------------------------------------- Thus an id selector such as #\0032 should be valid according to these CSS2.1 specs, right?
OS: Windows XP → All
I think this is INVALID. The problem with an selector like #\0000 is not with the escape character per se, but that the id itself is invalid per http://www.w3.org/TR/REC-html40/types.html#type-name | ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by | any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons | (":"), and periods ("."). If you change all the selectors to #A\0000, #A\0001, etc., every line goes green.
For what it's worth, the IDs are also invalid in both XML 1.0 (and thereby XHTML). http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Name However, whether or not it would be valid HTML or XHTML is not material to whether the CSS is valid. The test case does not have any invalid IDs in the XHTML -- the IDs in question are only in the CSS. It validates on both the W3C HTML validator (as valid XHTML 1.1) and the CSS validator (with the exception of the intentionally invalid control statement). The CSS 2.1 spec is explicit about allowing all unicode escapes in identifiers. Since it is allowed in CSS, the CSS parser should not be ignoring the whole statement as if there was an error in the selector. Also note that a similar test (which tests an identifier starting with '#\6' is included in the working copy of the CSS 2.1 Test Suite, and Mozilla is currently failing this test because of this bug.
The problem is that the CSS will never match as those elements are treated as if no ID was defined.
What makes this hard to fix is that the CSS tokenization rules use the same token for #rrggbb and #rgb colors and for #id selectors, which means that for the ID selector case we need to do additional checking to ensure that it's a valid identifier. This additional identifier check happens after character escape processing, which was a mistake when bug 239079 was fixed.
Fixing this probably requires splitting up the token into two tokens -- one for those that are valid identifiers and one for those that aren't.
Probably three tokens; one for colours, one for identifiers that can have the leading "-", and one for those who can't.
Attached patch Proposed patchSplinter Review
This fixes the testcase in this bug (modulo the \0000 thing which is not defined in CSS2.1), doesn't regress bug 239079, and makes the tests at http://web.mit.edu/bzbarsky/www/testcases/css2.1-conformance/t0509-id-sel-syntax-01-f.xml and http://web.mit.edu/bzbarsky/www/testcases/css2.1-conformance/t0509-id-sel-syntax-02-b.xml both happy.
Assignee: dbaron → bzbarsky
Status: NEW → ASSIGNED
Attachment #188876 - Flags: superreview?(dbaron)
Attachment #188876 - Flags: review?(dbaron)
This does make our error reporting here less informative, but I don't see an easy way around that.
Priority: -- → P1
Summary: Some CSS unicode escape characters not allowed in id selector → [FIX]Some CSS unicode escape characters not allowed in id selector
Target Milestone: --- → mozilla1.8beta4
Attachment #188876 - Flags: superreview?(dbaron)
Attachment #188876 - Flags: superreview+
Attachment #188876 - Flags: review?(dbaron)
Attachment #188876 - Flags: review+
Comment on attachment 188876 [details] [diff] [review] Proposed patch Requesting 1.8b4 approval for this CSS parsing regression fix
Attachment #188876 - Flags: approval1.8b4?
Summary: [FIX]Some CSS unicode escape characters not allowed in id selector → [FIXr]Some CSS unicode escape characters not allowed in id selector
Attachment #188876 - Flags: approval1.8b4? → approval1.8b4+
Fixed.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: