Closed
Bug 1032034
Opened 11 years ago
Closed 7 years ago
u+foo is not parsed as a valid selector due to the "u+f" being treated as a unicode-range-token
Categories
(Core :: CSS Parsing and Computation, defect)
Tracking
()
RESOLVED
FIXED
mozilla62
Tracking | Status | |
---|---|---|
firefox62 | --- | fixed |
People
(Reporter: proger.xp, Assigned: emilio, Mentored)
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0 (Beta/Release)
Steps to reproduce:
demo.html:
<!DOCTYPE html>
<html>
<head>
<style>
#nav u+a { background: yellow; }
</style>
</head>
<body>
<nav id="nav">
<u>THE U</u>
<a href="#">THE A</a>
</nav>
</body>
</html>
Actual results:
The <a> element is not styled (not yellow). Rule defined in <style> is dropped; console says:
Dangling combinator. Ruleset ignored due to bad selector.
This can be fixed by adding a space either after u+ or before +a or both:
#nav u+ a { background: yellow; }
#nav u +a { background: yellow; }
#nav u + a { background: yellow; }
Either of these 3 works.
Tested on: 33 (latest), 28, 26. All broken.
Expected results:
u+a should work the same way as u + a with or without spaces.
Comment 1•11 years ago
|
||
Reproduced on latest Nightly 33.0a1 2014-06-29 under Win 7, 64-bit.
Please notice that the yellow background is not displayed on Chrome too.
Component: Untriaged → CSS Parsing and Computation
Product: Firefox → Core
> Please notice that the yellow background is not displayed on Chrome too.
It is not indeed but this behaviour is a mystery to me. Obviously it also doesn't work on Safari.
But it *does* work on IE 10+ - probably one of those lucky cases when Microsoft got it right.
![]() |
||
Comment 3•11 years ago
|
||
Consider how this gets tokenized by the CSS tokenizer. I'm basing the analysis on http://dev.w3.org/csswg/css-syntax/ which, unlike CSS2, has a clearly defined tokenizer.
First we see the '#nav'. This produces a <hash-token>.
Then we see a space. This produces a <whitespace-token>.
Then we see a 'u'. Looking at http://dev.w3.org/csswg/css-syntax/#consume-a-token we see:
U+0055 LATIN CAPITAL LETTER U (U)
U+0075 LATIN SMALL LETTER U (u)
If the next 2 input code points are U+002B PLUS SIGN (+) followed by a hex digit or
U+003F QUESTION MARK (?), consume the next input code point. Note: don’t consume both
of them. Consume a unicode-range token and return it.
In this case, the next two input code points are in fact U+002B followed by a hex digit, so this returns a <unicode-range-token>. This token is not valid in selectors, so we seem to treat this as end of selector and claim that we had a combinator (space) followed by no more selector, hence dangling combinator.
So the error reporting is suboptimal, for sure. It would be nice if it actually said it saw an unexpected token, and what kind.
IE does support unicode-range, so I wonder what exactly they are doing in their tokenizer, and whether their tokenizer is based on the spec at all...
That said, this is a pretty weird footgun, and I wonder whether we can adjust the spec here somehow. Right now "#nav u+i" would work, but "#nav u+b" would not, nor would "#nav u+code" and various others!
I posted http://lists.w3.org/Archives/Public/www-style/2014Jun/0486.html about that.
Flags: needinfo?(jackalmage)
Flags: needinfo?(dbaron)
Summary: Dangling combinator. Ruleset ignored due to bad selector. → u+foo is not parsed as a valid selector due to the "u+f" being treated as a unicode-range-token
![]() |
||
Comment 4•11 years ago
|
||
And thank you for the excellent bug report!
Status: UNCONFIRMED → NEW
Ever confirmed: true
Thanks Boris, I would have never guessed that it expects to see a Unicode sequence. Shouldn't it be U+... (case-sensitive)? At any rate this is quite a gotcha, I would expect the parser to ignore possible Unicode reference as soon as it fails to recognize it (it doesn't fail on u+zzz so why it does on u+a?). That's probably what IE does, reasonably.
Curious to see W3C's reaction.
And by the way, forgot to mention that I ran into this after minifying some stylesheets so I guess many if not all existing CSS minifiers are broken due to this behaviour as they always remove spaces around selector symbols.
![]() |
||
Comment 7•11 years ago
|
||
> I would have never guessed that it expects to see a Unicode sequence.
It actually expects a sequence or range. Something like u+123-678, but the part after '-' is optional.
> Shouldn't it be U+... (case-sensitive)?
CSS syntax is uniformly case-insensitive, so both U+ and u+ are allowed.
> it doesn't fail on u+zzz so why it does on u+a
Because 'a' is a valid hex digit and 'z' is not.
> Because 'a' is a valid hex digit and 'z' is not.
That was my point though now I have checked the spec and found that it starts unicode-range-token after u+<hex digit> and not just 'u+'. Well, Firefox and Chrome do all right by the spec but it's still lousy IMO.
![]() |
||
Comment 9•11 years ago
|
||
Right, we agree the current spec situation is not OK. ;)
The working group discussed this in today's teleconference and decided to change things so unicode-range is not a special token, but instead gets parsed from the appropriate set of tokens.
As a shorter-term fix, I think we could probably just switch to context-sensitive tokenization (i.e., calling a different scanner method when parsing the unicode-range descriptor) for now, at least until the spec for the new way stabilizes.
Flags: needinfo?(dbaron)
http://dev.w3.org/csswg/css-syntax/#urange-syntax has draft rules, though likely to move to css-fonts.
I think we should implement this by:
* changing nsCSSToken to have an nsDependentString for its textual representation (I think we can do this, though we'd need to check)
* as followup to this, getting rid of the complexity of nsCSSToken::AppendToString, which I think is only currently used in error messages (need to check that too -- if not it might be a problem)
* then using nsCSSToken::AppendToString to implement what is described in the spec, and simultaneously remove the eCSSToken_URange token, and the mInteger2 from nsCSSToken.
![]() |
||
Updated•10 years ago
|
Mentor: bzbarsky
Does stylo do this correctly? (i.e., can we close this?)
Flags: needinfo?(simon.sapin)
Comment 13•7 years ago
|
||
Yes. rust-cssparser’s tokenizer does not have a unicode-range token (anymore), and instead provides a type and method to parse <unicode-range> based on other tokens. This matches what is not specified in CSS Syntax Level 3:
* https://drafts.csswg.org/css-syntax/#tokenizer-algorithms
* https://drafts.csswg.org/css-syntax/#urange
I’ve verified that the yellow background does show up in Firefox 60. (It presumably does since 57.)
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(simon.sapin)
Flags: needinfo?(jackalmage)
Resolution: --- → FIXED
Comment 14•7 years ago
|
||
> what is not specified
Typo: what is *now* specified
![]() |
||
Comment 15•7 years ago
|
||
We should really make sure we have a testcase for this...
Assignee | ||
Comment 16•7 years ago
|
||
Agreed.
Assignee: nobody → emilio
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment hidden (mozreview-request) |
![]() |
||
Comment 18•7 years ago
|
||
mozreview-review |
Comment on attachment 8974782 [details]
Bug 1032034: Test that unicode range syntax isn't a primitive token.
https://reviewboard.mozilla.org/r/243184/#review249026
Thank you!
Attachment #8974782 -
Flags: review?(bzbarsky) → review+
Comment 19•7 years ago
|
||
Pushed by ecoal95@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/85d3e5f02273
Test that unicode range syntax isn't a primitive token. r=bz
Created web-platform-tests PR https://github.com/w3c/web-platform-tests/pull/10960 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Comment 22•7 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox62:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Upstream PR merged
You need to log in
before you can comment on or make changes to this bug.
Description
•