Closed Bug 1032034 Opened 11 years ago Closed 7 years ago

u+foo is not parsed as a valid selector due to the "u+f" being treated as a unicode-range-token

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla62

Tracking Flags:

Tracking

Status

firefox62

---

fixed

People

(Reporter: proger.xp, Assigned: emilio, Mentored)

Details

Attachments

(2 files)

demo.html 11 years ago proger.xp 207 bytes, text/html		Details
Bug 1032034: Test that unicode range syntax isn't a primitive token. 7 years ago Emilio Cobos Álvarez (:emilio) 59 bytes, text/x-review-board-request	bzbarsky : review+	Details

proger.xp

Reporter

Description

•

11 years ago

Attached file demo.html — Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release) Steps to reproduce: demo.html: <!DOCTYPE html> <html> <head> <style> #nav u+a { background: yellow; } </style> </head> <body> <nav id="nav"> <u>THE U</u> <a href="#">THE A</a> </nav> </body> </html> Actual results: The <a> element is not styled (not yellow). Rule defined in <style> is dropped; console says: Dangling combinator. Ruleset ignored due to bad selector. This can be fixed by adding a space either after u+ or before +a or both: #nav u+ a { background: yellow; } #nav u +a { background: yellow; } #nav u + a { background: yellow; } Either of these 3 works. Tested on: 33 (latest), 28, 26. All broken. Expected results: u+a should work the same way as u + a with or without spaces.

proger.xp

Reporter

Updated

•

11 years ago

OS: Linux → All

Petruta Horea [:phorea], Desktop Test Engineering

Comment 1

•

11 years ago

Reproduced on latest Nightly 33.0a1 2014-06-29 under Win 7, 64-bit. Please notice that the yellow background is not displayed on Chrome too.

Component: Untriaged → CSS Parsing and Computation

Product: Firefox → Core

proger.xp

Reporter

Comment 2

•

11 years ago

> Please notice that the yellow background is not displayed on Chrome too. It is not indeed but this behaviour is a mystery to me. Obviously it also doesn't work on Safari. But it *does* work on IE 10+ - probably one of those lucky cases when Microsoft got it right.

Boris Zbarsky [:bzbarsky]

Comment 3

•

11 years ago

Consider how this gets tokenized by the CSS tokenizer. I'm basing the analysis on http://dev.w3.org/csswg/css-syntax/ which, unlike CSS2, has a clearly defined tokenizer. First we see the '#nav'. This produces a <hash-token>. Then we see a space. This produces a <whitespace-token>. Then we see a 'u'. Looking at http://dev.w3.org/csswg/css-syntax/#consume-a-token we see: U+0055 LATIN CAPITAL LETTER U (U) U+0075 LATIN SMALL LETTER U (u) If the next 2 input code points are U+002B PLUS SIGN (+) followed by a hex digit or U+003F QUESTION MARK (?), consume the next input code point. Note: don’t consume both of them. Consume a unicode-range token and return it. In this case, the next two input code points are in fact U+002B followed by a hex digit, so this returns a <unicode-range-token>. This token is not valid in selectors, so we seem to treat this as end of selector and claim that we had a combinator (space) followed by no more selector, hence dangling combinator. So the error reporting is suboptimal, for sure. It would be nice if it actually said it saw an unexpected token, and what kind. IE does support unicode-range, so I wonder what exactly they are doing in their tokenizer, and whether their tokenizer is based on the spec at all... That said, this is a pretty weird footgun, and I wonder whether we can adjust the spec here somehow. Right now "#nav u+i" would work, but "#nav u+b" would not, nor would "#nav u+code" and various others! I posted http://lists.w3.org/Archives/Public/www-style/2014Jun/0486.html about that.

Flags: needinfo?(jackalmage)

Flags: needinfo?(dbaron)

Summary: Dangling combinator. Ruleset ignored due to bad selector. → u+foo is not parsed as a valid selector due to the "u+f" being treated as a unicode-range-token

Boris Zbarsky [:bzbarsky]

Comment 4

•

11 years ago

And thank you for the excellent bug report!

Status: UNCONFIRMED → NEW

Ever confirmed: true

proger.xp

Reporter

Comment 5

•

11 years ago

Thanks Boris, I would have never guessed that it expects to see a Unicode sequence. Shouldn't it be U+... (case-sensitive)? At any rate this is quite a gotcha, I would expect the parser to ignore possible Unicode reference as soon as it fails to recognize it (it doesn't fail on u+zzz so why it does on u+a?). That's probably what IE does, reasonably. Curious to see W3C's reaction.

proger.xp

Reporter

Comment 6

•

11 years ago

And by the way, forgot to mention that I ran into this after minifying some stylesheets so I guess many if not all existing CSS minifiers are broken due to this behaviour as they always remove spaces around selector symbols.

Boris Zbarsky [:bzbarsky]

Comment 7

•

11 years ago

> I would have never guessed that it expects to see a Unicode sequence. It actually expects a sequence or range. Something like u+123-678, but the part after '-' is optional. > Shouldn't it be U+... (case-sensitive)? CSS syntax is uniformly case-insensitive, so both U+ and u+ are allowed. > it doesn't fail on u+zzz so why it does on u+a Because 'a' is a valid hex digit and 'z' is not.

proger.xp

Reporter

Comment 8

•

11 years ago

> Because 'a' is a valid hex digit and 'z' is not. That was my point though now I have checked the spec and found that it starts unicode-range-token after u+<hex digit> and not just 'u+'. Well, Firefox and Chrome do all right by the spec but it's still lousy IMO.

Boris Zbarsky [:bzbarsky]

Comment 9

•

11 years ago

Right, we agree the current spec situation is not OK. ;)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 10

•

11 years ago

The working group discussed this in today's teleconference and decided to change things so unicode-range is not a special token, but instead gets parsed from the appropriate set of tokens. As a shorter-term fix, I think we could probably just switch to context-sensitive tokenization (i.e., calling a different scanner method when parsing the unicode-range descriptor) for now, at least until the spec for the new way stabilizes.

Flags: needinfo?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

•

11 years ago

http://dev.w3.org/csswg/css-syntax/#urange-syntax has draft rules, though likely to move to css-fonts. I think we should implement this by: * changing nsCSSToken to have an nsDependentString for its textual representation (I think we can do this, though we'd need to check) * as followup to this, getting rid of the complexity of nsCSSToken::AppendToString, which I think is only currently used in error messages (need to check that too -- if not it might be a problem) * then using nsCSSToken::AppendToString to implement what is described in the spec, and simultaneously remove the eCSSToken_URange token, and the mInteger2 from nsCSSToken.

Boris Zbarsky [:bzbarsky]

Updated

•

10 years ago

Mentor: bzbarsky

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

•

7 years ago

Does stylo do this correctly? (i.e., can we close this?)

Flags: needinfo?(simon.sapin)

Simon Sapin (:SimonSapin)

Comment 13

•

7 years ago

Yes. rust-cssparser’s tokenizer does not have a unicode-range token (anymore), and instead provides a type and method to parse <unicode-range> based on other tokens. This matches what is not specified in CSS Syntax Level 3: * https://drafts.csswg.org/css-syntax/#tokenizer-algorithms * https://drafts.csswg.org/css-syntax/#urange I’ve verified that the yellow background does show up in Firefox 60. (It presumably does since 57.)

Status: NEW → RESOLVED

Closed: 7 years ago

Flags: needinfo?(simon.sapin)

Flags: needinfo?(jackalmage)

Resolution: --- → FIXED

Simon Sapin (:SimonSapin)

Comment 14

•

7 years ago

> what is not specified Typo: what is *now* specified

Boris Zbarsky [:bzbarsky]

Comment 15

•

7 years ago

We should really make sure we have a testcase for this...

Emilio Cobos Álvarez (:emilio)

Assignee

Comment 16

•

7 years ago

Agreed.

Assignee: nobody → emilio

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment hidden (mozreview-request)

Boris Zbarsky [:bzbarsky]

Comment 18

•

7 years ago

mozreview-review

Comment on attachment 8974782 [details] Bug 1032034: Test that unicode range syntax isn't a primitive token. https://reviewboard.mozilla.org/r/243184/#review249026 Thank you!

Attachment #8974782 - Flags: review?(bzbarsky) → review+

Pulsebot

Comment 19

•

7 years ago

Pushed by ecoal95@gmail.com: https://hg.mozilla.org/integration/autoland/rev/85d3e5f02273 Test that unicode range syntax isn't a primitive token. r=bz

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 20

•

7 years ago

Created web-platform-tests PR https://github.com/w3c/web-platform-tests/pull/10960 for changes under testing/web-platform/tests

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 21

•

7 years ago

Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Eliza Balazs [:ebalazs_]

Comment 22

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/85d3e5f02273

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

status-firefox62: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

Web Platform Test Sync Bot [:wpt-sync] (Matrix: #interop:mozilla.org)

Comment 23

•

7 years ago

Upstream PR merged

You need to log in before you can comment on or make changes to this bug.