Closed Bug 717529 Opened 13 years ago Closed 13 years ago

Support CSS escape sequences like `\d834\df06 ` (broken up in UTF-16 code units)

Status:

RESOLVED INVALID

People

(Reporter: mathias, Unassigned)


Mathias Bynens

Reporter

Description

•

13 years ago

Gecko is the only engine that doesn’t support CSS escape sequences of the form `\d834\df06 ` (broken up in UTF-16 code units). Even though I cannot find any mention of these in the spec (http://www.w3.org/TR/CSS21/syndata.html#characters), it would be better for interoperability if Gecko added support for them.

The example I’m using, `\d834\df06 `, should be identical to `\1d306 ` or `\01d306`, both of which are escape sequences for the “tetragram for centre” symbol (U+1D306).

Here’s a simple test case: http://jsfiddle.net/mathias/jY7ra/

Opera and IE8+ support both types of escape sequences. Note that WebKit doesn’t support the standard CSS escape sequences for symbols outside the BMP; see https://bugs.webkit.org/show_bug.cgi?id=76152.
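For reference, the equivalence claimed above follows from standard UTF-16 surrogate arithmetic. The snippet below is only an illustrative sketch; the helper name `combine_surrogates` is made up for this example and is not part of any browser code.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#x}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD834, 0xDF06)
print(f"U+{code_point:X}")  # U+1D306
```

So a parser that accepted `\d834\df06 ` would have to pair up the two escapes and produce the same code point as `\1d306 `.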

Mathias Bynens

Reporter

Updated

•

13 years ago

URL: http://jsfiddle.net/mathias/jY7ra/

See Also: → https://bugs.webkit.org/show_bug.cgi?id=76152

Jonathan Kew [:jfkthame]

Comment 1

•

13 years ago

Given that a backslash-hexadecimal CSS escape sequence is defined as providing the code number of _an ISO 10646 character_, not the code number of a UTF16 code unit, I don't think this should be supported. The correct way to represent a non-BMP character is to use a 5- or 6-hexdigit sequence. Perhaps this could be clarified in http://www.w3.org/TR/css3-syntax; IMO, the range \d000 to \dfff should, if anything, be explicitly made invalid. cc'ing dbaron for any thoughts.
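As a rough illustration of the "5- or 6-hexdigit sequence" form, the sketch below builds a CSS escape directly from a code point. The function name `css_escape` and its `pad` parameter are hypothetical, chosen for this example only; it assumes the CSS 2.1 convention that an escape shorter than six hex digits is followed by a terminating space.

```python
def css_escape(code_point: int, pad: bool = False) -> str:
    """Return a backslash-hex CSS escape for one code point (illustrative only)."""
    if not (0 < code_point <= 0x10FFFF) or (0xD800 <= code_point <= 0xDFFF):
        # Surrogate code units are not characters, so they get no escape here.
        raise ValueError(f"not an escapable character: {code_point:#x}")
    if pad:
        # Six hex digits: no terminating space is needed.
        return f"\\{code_point:06x}"
    # Fewer than six digits: add a space so following text is not swallowed.
    return f"\\{code_point:x} "


print(repr(css_escape(0x1D306)))            # '\\1d306 '
print(repr(css_escape(0x1D306, pad=True)))  # '\\01d306'
```

Under this reading there is one escape per character, and a value such as `0xD834` is rejected rather than treated as half of a pair.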

Mathias Bynens

Reporter

Comment 2

•

13 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #1)
> Given that a backslash-hexadecimal CSS escape sequence is defined as
> providing the code number of _an ISO 10646 character_, not the code number
> of a UTF16 code unit, I don't think this should be supported. The correct
> way to represent a non-BMP character is to use a 5- or 6-hexdigit sequence.

Don’t get me wrong, I definitely agree that is the only correct way according to the spec; I filed this “bug” purely because of interoperability concerns. Since Opera, WebKit and IE8+ support this non-standard syntax, it would be nice if Firefox could support it as well.

Either way the spec could use some tweaks, be it by just clarifying what you suggested, or by changing it to reflect reality (i.e. defining the “broken-up UTF-16 code units” syntax that almost all browsers have implemented).

Mathias Bynens

Reporter

Comment 3

•

13 years ago

Taking this to www-style: http://lists.w3.org/Archives/Public/www-style/2012Jan/0536.html

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

13 years ago

If we were to support this, we'd need a way to prevent unpaired surrogates from ending up in internal data structures -- that could be a security risk. The current code that does this is the use of ENSURE_VALID_CHAR() inside of nsCSSScanner::ParseAndAppendEscape (in layout/style/nsCSSScanner.cpp). ENSURE_VALID_CHAR is defined in xpcom/string/public/nsCharTraits.h and converts surrogates (U+D800 to U+DFFF) and codepoints greater than U+10FFFF to U+FFFD.
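The replacement behaviour described here can be sketched roughly as follows. This is a Python approximation written for illustration, not the actual Gecko macro or scanner code; `ensure_valid_char` and `parse_hex_escape` are hypothetical names, and the input is assumed to be the already-extracted hex digits of one escape.

```python
REPLACEMENT_CHARACTER = 0xFFFD


def ensure_valid_char(code_point: int) -> int:
    """Map surrogates and out-of-range values to U+FFFD, as described above."""
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        return REPLACEMENT_CHARACTER
    return code_point


def parse_hex_escape(hex_digits: str) -> str:
    """Turn the hex digits of one escape (without the backslash) into a character."""
    return chr(ensure_valid_char(int(hex_digits, 16)))


print(parse_hex_escape("1d306") == "\U0001D306")  # True
print(parse_hex_escape("d834") == "\ufffd")       # True: lone surrogate replaced
```

With this kind of check in place, `\d834\df06 ` yields two U+FFFD characters rather than a surrogate pair, so no unpaired surrogate can reach the internal string.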

Mathias Bynens

Reporter

Comment 5

•

13 years ago

This bug is invalid as per http://lists.w3.org/Archives/Public/www-style/2012Feb/0006.html.

Status: UNCONFIRMED → RESOLVED

Closed: 13 years ago

Resolution: --- → INVALID

Categories

(Core :: CSS Parsing and Computation, defect)
