Last Comment Bug 751805 - CSS escape in "url(" (e.g. "\url(" ) should not be parsed as a URI token
: CSS escape in "url(" (e.g. "\url(" ) should not be parsed as a URI token
Status: RESOLVED WONTFIX
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Kang-Hao (Kenny) Lu [:kennyluck]
:
Mentors:
data:text/html,<!DOCTYPE html><style>...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-03 21:14 PDT by Kang-Hao (Kenny) Lu [:kennyluck]
Modified: 2013-01-07 09:00 PST (History)
1 user (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Part1v1 tweaks to nsCSSScanner (3.47 KB, patch)
2012-05-03 21:26 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
dbaron: review-
Details | Diff | Splinter Review

Description Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-03 21:14:29 PDT
Should \url(xxx) be parsed as a URI token? There seems to never be a clear interest about resolving this issue.

CSS 2.1 core grammar has

URI: url\({w}{string}{w}\)
     |url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)

instead of (u|\\u|...) and whatsoever. WebKit follows this (see attached URL) but all other browsers don't.

I think we should do this simply because it seems better performance-wise. I'll attach my working patch which includes a test, as long as I get my bug number here.

This will make Gecko fail [1], but I think that test is just invalid. WebKit and my patch still pass[2]. That's a bit mysterious to me and I'll look into this and submit another patch.

This shouldn't have backwards compatibility issue because WebKit has been like this for a while and I heard that IE didn't really parse escape sequence a few years ago (it's behavior is still quite buggy).

If we are going another way, we should ask CSSWG to change the grammar. In that case, I we probably should support \u+???? too.

[1] http://test.csswg.org/suites/css2.1/nightly-unstable/html4/uri-015.htm 
[2] http://www.w3.org/Style/CSS/Test/CSS3/Namespace/current/syntax-004.xml
Comment 1 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-03 21:26:15 PDT
Created attachment 620953 [details] [diff] [review]
Part1v1 tweaks to nsCSSScanner

I am uneasy the redundancy about calling Read() and then Pushback(), but I don't know this code very well anyway.
Comment 2 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-04 01:26:31 PDT
(In reply to Kang-Hao (Kenny) Lu from comment #0)
> This will make Gecko fail [1], but I think that test is just invalid. WebKit
> and my patch still pass[2]. That's a bit mysterious to me and I'll look into
> look into this and submit another patch.
>
> [2] http://www.w3.org/Style/CSS/Test/CSS3/Namespace/current/syntax-004.xml

It turns out that this test case is just wrong[3]. I'm going to just ignore it and I actually think there's nothing else that needs to be done here.

If time permits, I might send some cleanup patch to the scanner.

[3] http://lists.w3.org/Archives/Public/public-css-testsuite/2012May/0005
Comment 3 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2012-05-08 00:25:31 PDT
I think that, based on the principles described in the section on character escapes, we should fix the formal grammar here.  I think it was an oversight not to fix url( when all the other similar tokens in the formal grammar were fixed (which happened at some point during the 2.1 process).
Comment 4 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-08 01:21:21 PDT
For what it's worth, Appendix G was changed to match the formal grammar in 4.1.1[1] after a mail from Bjoern Hoehrmann[2].

The principles written in human sentences are really just bound for incompleteness. For example:

1. It doesn't say an isolated hyphen isn't an identifier.
2. It doesn't say you can't have a comment between the sign and the digit, and this is being discussed.
3. It doesn't say control characters in url() need to be quoted.

In terms of browser convergence, I recall WebKit using a machine generated lexer generated with the formal grammar. IE9 pretty much still handles CSS escape sequences as a preprocessing step so it is far from fully compliant. I know nothing about Opera, but my guess is that my approach has a better chance to be what browsers converge to.

(In reply to David Baron [:dbaron] from comment #3)
> I think that, based on the principles described in the section on character
> escapes, we should fix the formal grammar here.

What about the "u" as in "u+1234" ? Heck, CSS2.1 doesn't even talk about this besides what's in the formal grammar.


[1] http://www.w3.org/TR/CSS21/changes.html#q546
[2] http://lists.w3.org/Archives/Public/www-style/2010Jul/0499
Comment 5 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2012-05-08 01:34:25 PDT
However, the CSS grammar has in general not been very accurate.  For example, originally, the rules on character escapes weren't expressed in the grammar at all in terms of what goes in fixed "words" that appear in the grammar.  That was fixed at some point during work on 2.1.

If there's an explicit working group decision that this is the way it's intended, then I'm ok with it.  But I think it should be the other way around.
Comment 6 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2013-01-03 07:48:08 PST
http://lists.w3.org/Archives/Public/www-style/2013Jan/thread.html#msg8
Comment 7 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2013-01-07 09:00:03 PST
The working group resolved quite a while ago that our curent behavior is correct:
http://lists.w3.org/Archives/Public/www-style/2012May/0329.html
but CSS 2.1 edits were lost, for some reason.

Note You need to log in before you can comment on or make changes to this bug.