Should \url(xxx) be parsed as a URI token? There never seems to have been much interest in resolving this issue.
CSS 2.1 core grammar has
instead of (u|\\u|...) and so on. WebKit follows this (see attached URL), but no other browser does.
I think we should do this simply because it seems better performance-wise. I'll attach my working patch, which includes a test, as soon as I get my bug number here.
This will make Gecko fail , but I think that test is just invalid. WebKit and my patch still pass. That's a bit mysterious to me and I'll look into this and submit another patch.
This shouldn't cause a backwards-compatibility issue, because WebKit has behaved this way for a while, and I heard that a few years ago IE didn't really parse escape sequences (its behavior is still quite buggy).
If we go the other way, we should ask the CSSWG to change the grammar. In that case, we should probably support \u+???? too.
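To make the two options concrete, here is a minimal Python sketch of the two policies being compared. It is not the nsCSSScanner code or WebKit's lexer; the helper names (`unescape`, `starts_uri_literal`, `starts_uri_escape_aware`) are hypothetical, and the escape handling is simplified.

```python
import re

# Simplified CSS escape: a backslash followed by 1-6 hex digits and one
# optional whitespace character, or a backslash followed by any other
# character that is not a newline or a hex digit.
_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?|\\([^\r\n\f0-9a-fA-F])")


def unescape(text):
    """Decode CSS escapes in text (hypothetical helper, simplified)."""
    def repl(match):
        if match.group(1) is not None:
            code_point = int(match.group(1), 16)
            if 0 < code_point <= 0x10FFFF:
                return chr(code_point)
            return "\ufffd"  # out-of-range or zero code point
        return match.group(2)
    return _ESCAPE.sub(repl, text)


def starts_uri_literal(src):
    """Policy A: only the literal letters u, r, l before '(' start a URI token."""
    return src[:4].lower() == "url("


def starts_uri_escape_aware(src):
    """Policy B: decode escapes in the leading name, then compare with 'url'."""
    paren = src.find("(")
    if paren < 0:
        return False
    return unescape(src[:paren]).lower() == "url"
```

Under this sketch, `\url(xxx)` starts a URI token only with policy B, while plain `url(xxx)` starts one under both.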
Created attachment 620953
Part1v1 tweaks to nsCSSScanner
I am uneasy about the redundancy of calling Read() and then Pushback(), but I don't know this code very well anyway.
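The redundancy in question can be illustrated with a toy scanner. This is only a sketch: `ToyScanner` and its methods are hypothetical stand-ins written in Python, not the real nsCSSScanner interface, and whether the real scanner offers a peek-style call is not something this sketch assumes.

```python
class ToyScanner:
    """Hypothetical character scanner, used only for illustration."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        """Consume and return the next character, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def pushback(self, ch):
        """Undo the most recent read(); a no-op if it returned None."""
        if ch is not None:
            self.pos -= 1

    def peek(self):
        """Return the next character without consuming it."""
        return self.text[self.pos] if self.pos < len(self.text) else None


def next_is_paren_read_pushback(scanner):
    # Two calls: consume the character, then put it back.
    ch = scanner.read()
    scanner.pushback(ch)
    return ch == "("


def next_is_paren_peek(scanner):
    # One call: look ahead without changing the position.
    return scanner.peek() == "("
```

Both functions leave the scanner position unchanged; the first simply does more work to get there.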
(In reply to Kang-Hao (Kenny) Lu from comment #0)
> This will make Gecko fail , but I think that test is just invalid. WebKit
> and my patch still pass. That's a bit mysterious to me and I'll look into
> this and submit another patch.
>  http://www.w3.org/Style/CSS/Test/CSS3/Namespace/current/syntax-004.xml
It turns out that this test case is just wrong. I'm going to just ignore it and I actually think there's nothing else that needs to be done here.
If time permits, I might send some cleanup patch to the scanner.
I think that, based on the principles described in the section on character escapes, we should fix the formal grammar here. I think it was an oversight not to fix url( when all the other similar tokens in the formal grammar were fixed (which happened at some point during the 2.1 process).
For what it's worth, Appendix G was changed to match the formal grammar in 4.1.1 after a mail from Bjoern Hoehrmann.
Principles written in prose are bound to be incomplete. For example:
1. It doesn't say an isolated hyphen isn't an identifier.
2. It doesn't say you can't have a comment between the sign and the digit, and this is being discussed.
3. It doesn't say control characters in url() need to be quoted.
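As an illustration of the first point, the CSS 2.1 grammar defines ident as `[-]?{nmstart}{nmchar}*`, which rules out an isolated hyphen mechanically. The sketch below is a simplified Python transcription of that production (the variable and function names are my own, and the macros are approximated), not code from any browser.

```python
import re

# Simplified transcriptions of the CSS 2.1 lexical macros.
NONASCII = r"[^\x00-\x9f]"
ESCAPE = r"(?:\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\r\n\f])?|\\[^\r\n\f0-9a-fA-F])"
NMSTART = "(?:[_a-zA-Z]|" + NONASCII + "|" + ESCAPE + ")"
NMCHAR = "(?:[_a-zA-Z0-9-]|" + NONASCII + "|" + ESCAPE + ")"

# ident: [-]?{nmstart}{nmchar}*
IDENT = re.compile("-?" + NMSTART + NMCHAR + "*")


def is_ident(text):
    """Return True if the whole string matches the ident production."""
    return IDENT.fullmatch(text) is not None
```

With this pattern, `is_ident("-")` is false because nothing follows the hyphen to satisfy `{nmstart}`, while `is_ident("-moz-foo")` is true.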
In terms of browser convergence, I recall WebKit using a machine-generated lexer built from the formal grammar. IE9 still pretty much handles CSS escape sequences as a preprocessing step, so it is far from fully compliant. I know nothing about Opera, but my guess is that my approach has a better chance of being what browsers converge on.
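A rough sketch of why decoding escapes as a preprocessing step differs from decoding them inside the tokenizer. This is a toy model in Python with hypothetical function names and a deliberately tiny identifier grammar; it does not reproduce any browser's actual implementation.

```python
import re

# Hex escapes only, with one optional trailing whitespace character.
ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?")
# Toy identifier: runs of name characters and hex escapes.
IDENT = re.compile(r"(?:[_a-zA-Z0-9-]|\\[0-9a-fA-F]{1,6}[ \t\r\n\f]?)+")


def decode(text):
    """Replace each hex escape with the character it names."""
    def repl(match):
        code_point = int(match.group(1), 16)
        return chr(code_point) if 0 < code_point <= 0x10FFFF else "\ufffd"
    return ESCAPE.sub(repl, text)


def tokenize_then_decode(src):
    """Find token boundaries first, then decode escapes within each token."""
    return [decode(token) for token in IDENT.findall(src)]


def decode_then_tokenize(src):
    """Decode escapes over the raw text first, then find token boundaries."""
    return IDENT.findall(decode(src))
```

For the input `a\20 b`, tokenizing first yields a single identifier containing a space, whereas decoding first turns the escape into a real space and splits the input into two identifiers.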
(In reply to David Baron [:dbaron] from comment #3)
> I think that, based on the principles described in the section on character
> escapes, we should fix the formal grammar here.
What about the "u" as in "u+1234" ? Heck, CSS2.1 doesn't even talk about this besides what's in the formal grammar.
However, the CSS grammar has in general not been very accurate. For example, originally, the rules on character escapes weren't expressed in the grammar at all in terms of what goes in fixed "words" that appear in the grammar. That was fixed at some point during work on 2.1.
If there's an explicit working group decision that this is the way it's intended, then I'm ok with it. But I think it should be the other way around.
The working group resolved quite a while ago that our current behavior is correct:
but CSS 2.1 edits were lost, for some reason.