u+foo is not parsed as a valid selector due to the "u+f" being treated as a unicode-range-token

RESOLVED FIXED in Firefox 62



CSS Parsing and Computation
4 years ago
a month ago


(Reporter: proger.xp, Assigned: emilio, Mentored)


33 Branch

Firefox Tracking Flags

(firefox62 fixed)


MozReview Requests


Submitter Diff Changes Open Issues Last Updated
Error loading review requests:


(2 attachments)



4 years ago
Created attachment 8447820 [details]

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release)

Steps to reproduce:


<!DOCTYPE html>
    #nav u+a { background: yellow; }
    <nav id="nav">
      <u>THE U</u>
      <a href="#">THE A</a>

Actual results:

The <a> element is not styled (not yellow). Rule defined in <style> is dropped; console says:

Dangling combinator.  Ruleset ignored due to bad selector.

This can be fixed by adding a space either after u+ or before +a or both:

    #nav u+ a { background: yellow; }
    #nav u +a { background: yellow; }
    #nav u + a { background: yellow; }

Either of these 3 works.

Tested on: 33 (latest), 28, 26. All broken.

Expected results:

u+a should work the same way as u + a with or without spaces.


4 years ago
OS: Linux → All
Reproduced on latest Nightly 33.0a1 2014-06-29 under Win 7, 64-bit. 
Please notice that the yellow background is not displayed on Chrome too.
Component: Untriaged → CSS Parsing and Computation
Product: Firefox → Core

Comment 2

4 years ago
> Please notice that the yellow background is not displayed on Chrome too.
It is not indeed but this behaviour is a mystery to me. Obviously it also doesn't work on Safari. 

But it *does* work on IE 10+ - probably one of those lucky cases when Microsoft got it right.
Consider how this gets tokenized by the CSS tokenizer.  I'm basing the analysis on http://dev.w3.org/csswg/css-syntax/ which, unlike CSS2, has a clearly defined tokenizer.

First we see the '#nav'.  This produces a <hash-token>.

Then we see a space.  This produces a <whitespace-token>.

Then we see a 'u'.  Looking at http://dev.w3.org/csswg/css-syntax/#consume-a-token we see:

    If the next 2 input code points are U+002B PLUS SIGN (+) followed by a hex digit or
    U+003F QUESTION MARK (?), consume the next input code point. Note: don’t consume both
    of them. Consume a unicode-range token and return it. 

In this case, the next two input code points are in fact U+002B followed by a hex digit, so this returns a <unicode-range-token>.  This token is not valid in selectors, so we seem to treat this as end of selector and claim that we had a combinator (space) followed by no more selector, hence dangling combinator.

So the error reporting is suboptimal, for sure.  It would be nice if it actually said it saw an unexpected token, and what kind.

IE does support unicode-range, so I wonder what exactly they are doing in their tokenizer, and whether their tokenizer is based on the spec at all...

That said, this is a pretty weird footgun, and I wonder whether we can adjust the spec here somehow.  Right now "#nav u+i" would work, but "#nav u+b" would not, nor would "#nav u+code" and various others!

I posted http://lists.w3.org/Archives/Public/www-style/2014Jun/0486.html about that.
Flags: needinfo?(jackalmage)
Flags: needinfo?(dbaron)
Summary: Dangling combinator. Ruleset ignored due to bad selector. → u+foo is not parsed as a valid selector due to the "u+f" being treated as a unicode-range-token
And thank you for the excellent bug report!
Ever confirmed: true

Comment 5

4 years ago
Thanks Boris, I would have never guessed that it expects to see a Unicode sequence. Shouldn't it be U+... (case-sensitive)? At any rate this is quite a gotcha, I would expect the parser to ignore possible Unicode reference as soon as it fails to recognize it (it doesn't fail on u+zzz so why it does on u+a?). That's probably what IE does, reasonably.

Curious to see W3C's reaction.

Comment 6

4 years ago
And by the way, forgot to mention that I ran into this after minifying some stylesheets so I guess many if not all existing CSS minifiers are broken due to this behaviour as they always remove spaces around selector symbols.
> I would have never guessed that it expects to see a Unicode sequence.

It actually expects a sequence or range.  Something like u+123-678, but the part after '-' is optional.

> Shouldn't it be U+... (case-sensitive)? 

CSS syntax is uniformly case-insensitive, so both U+ and u+ are allowed.

> it doesn't fail on u+zzz so why it does on u+a

Because 'a' is a valid hex digit and 'z' is not.

Comment 8

4 years ago
> Because 'a' is a valid hex digit and 'z' is not.
That was my point though now I have checked the spec and found that it starts unicode-range-token after u+<hex digit> and not just 'u+'. Well, Firefox and Chrome do all right by the spec but it's still lousy IMO.
Right, we agree the current spec situation is not OK.  ;)
The working group discussed this in today's teleconference and decided to change things so unicode-range is not a special token, but instead gets parsed from the appropriate set of tokens.

As a shorter-term fix, I think we could probably just switch to context-sensitive tokenization (i.e., calling a different scanner method when parsing the unicode-range descriptor) for now, at least until the spec for the new way stabilizes.
Flags: needinfo?(dbaron)
http://dev.w3.org/csswg/css-syntax/#urange-syntax has draft rules, though likely to move to css-fonts.

I think we should implement this by:

 * changing nsCSSToken to have an nsDependentString for its textual representation (I think we can do this, though we'd need to check)
   * as followup to this, getting rid of the complexity of nsCSSToken::AppendToString, which I think is only currently used in error messages (need to check that too -- if not it might be a problem)
 * then using nsCSSToken::AppendToString to implement what is described in the spec, and simultaneously remove the eCSSToken_URange token, and the mInteger2 from nsCSSToken.
Does stylo do this correctly?  (i.e., can we close this?)
Flags: needinfo?(simon.sapin)
Yes. rust-cssparser’s tokenizer does not have a unicode-range token (anymore), and instead provides a type and method to parse <unicode-range> based on other tokens. This matches what is not specified in CSS Syntax Level 3:

* https://drafts.csswg.org/css-syntax/#tokenizer-algorithms
* https://drafts.csswg.org/css-syntax/#urange

I’ve verified that the yellow background does show up in Firefox 60. (It presumably does since 57.)
Last Resolved: a month ago
Flags: needinfo?(simon.sapin)
Flags: needinfo?(jackalmage)
Resolution: --- → FIXED
> what is not specified

Typo: what is *now* specified
We should really make sure we have a testcase for this...

Comment 16

a month ago
Assignee: nobody → emilio
Resolution: FIXED → ---
Comment hidden (mozreview-request)
Comment on attachment 8974782 [details]
Bug 1032034: Test that unicode range syntax isn't a primitive token.


Thank you!
Attachment #8974782 - Flags: review?(bzbarsky) → review+

Comment 19

a month ago
Pushed by ecoal95@gmail.com:
Test that unicode range syntax isn't a primitive token. r=bz
Created web-platform-tests PR https://github.com/w3c/web-platform-tests/pull/10960 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.

Comment 22

a month ago
Last Resolved: a month agoa month ago
status-firefox62: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Upstream PR merged
You need to log in before you can comment on or make changes to this bug.