Last Comment Bug 604179 - url() tokenization should not be contextual
: url() tokenization should not be contextual
Status: RESOLVED FIXED
: css2
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: P3 normal (vote)
: mozilla5
Assigned To: David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
:
Mentors:
Depends on:
Blocks: css2.1-tests
  Show dependency treegraph
 
Reported: 2010-10-13 14:27 PDT by David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
Modified: 2011-03-23 17:56 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 1: use GatherURL more, and rename it (7.36 KB, patch)
2011-03-10 14:28 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review
patch 2: rename tokens to match spec (4.40 KB, patch)
2011-03-10 14:29 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review
patch 3: move consumption of leading whitespace (for all forms) and trailing whitespace (for string forms) inside url() into scanner (3.29 KB, patch)
2011-03-10 14:30 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review
patch 4: fix indent (1020 bytes, patch)
2011-03-10 14:31 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review
patch 5: disallow control characters in unquoted url() (2.91 KB, patch)
2011-03-10 14:31 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review
patch 6: include the close paren in the URL token, and make quoted URLs produce the URL/Bad_URL tokens (6.59 KB, patch)
2011-03-10 14:38 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review
patch 7: include the opening "url(" in the URL token, and make tokenization of url() (but not url-prefix() or domain()) noncontextual (15.47 KB, patch)
2011-03-10 14:38 PST, David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch)
bzbarsky: review+
Details | Diff | Splinter Review

Description David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2010-10-13 14:27:42 PDT
Right now the CSS scanner has a separate entry point for tokenizing URIs.  This is incorrect according to the spec; it means that our tokenization is incorrect when we encounter a url() at an unexpected location.

This makes us fail:
http://test.csswg.org/suites/css2.1/20100917/xhtml1/uri-016.xht
http://test.csswg.org/suites/css2.1/20100917/html4/uri-016.htm
Comment 1 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 12:34:51 PST
I have a seven patch series in my patch queue for this:

share-gather-url
rename-tokens
move-eating-whitespace
fix-indent-next-url
disallow-control-in-url
url-include-close-paren
url-noncontextual

in http://hg.mozilla.org/users/dbaron_mozilla.com/patches/file/fdc0e62eada4

though it still needs a little more work, and more tests.
Comment 3 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:28:47 PST
Created attachment 518522 [details] [diff] [review]
patch 1: use GatherURL more, and rename it
Comment 4 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:29:12 PST
Created attachment 518523 [details] [diff] [review]
patch 2: rename tokens to match spec
Comment 5 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:30:42 PST
Created attachment 518524 [details] [diff] [review]
patch 3: move consumption of leading whitespace (for all forms) and trailing whitespace (for string forms) inside url() into scanner
Comment 6 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:31:07 PST
Created attachment 518525 [details] [diff] [review]
patch 4: fix indent
Comment 7 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:31:55 PST
Created attachment 518526 [details] [diff] [review]
patch 5: disallow control characters in unquoted url()

This fixes another bug I noticed while looking at the spec (basically independent of this bug, really).
Comment 8 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:38:01 PST
Created attachment 518529 [details] [diff] [review]
patch 6: include the close paren in the URL token, and make quoted URLs produce the URL/Bad_URL tokens
Comment 9 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 14:38:46 PST
Created attachment 518530 [details] [diff] [review]
patch 7: include the opening "url(" in the URL token, and make tokenization of url() (but not url-prefix() or domain()) noncontextual
Comment 10 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 15:05:08 PST
Also, I'm aware that these changes might cause an incorrect leading url( on error messages while parsing domain() or url-prefix(), but I don't think it's a serious problem.
Comment 11 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2011-03-10 19:31:55 PST
Passed try on both debug and opt mac-64:
http://tbpl.mozilla.org/?tree=MozillaTry&rev=e31cd6d5e01c
Comment 12 Boris Zbarsky [:bz] 2011-03-10 20:28:34 PST
Comment on attachment 518522 [details] [diff] [review]
patch 1: use GatherURL more, and rename it

>+++ b/layout/style/nsCSSParser.cpp
>+CSSParserImpl::ParseURLOrString(nsString& aURL)
>+  UngetToken();

This will change the behavior of this testcase:

  <style>div { color: red }</style>
  <style>@import ; div { color: green; }</style>
  <div>This should be green</div>

to make the div green, not red.  I think that's good, and will make us match Opera and Webkit here, but we should add a test for this.  This looks like the @import version of bug 589672.

(Incidentally, if I use "media" there, not "import", then we match Webkit but not Opera; I'm not sure which is correct.)

r=me with that.
Comment 13 Boris Zbarsky [:bz] 2011-03-10 20:30:01 PST
Comment on attachment 518523 [details] [diff] [review]
patch 2: rename tokens to match spec

r=me
Comment 14 Boris Zbarsky [:bz] 2011-03-10 20:33:55 PST
Comment on attachment 518524 [details] [diff] [review]
patch 3: move consumption of leading whitespace (for all forms) and trailing whitespace (for string forms) inside url() into scanner

r=me
Comment 15 Boris Zbarsky [:bz] 2011-03-10 20:35:21 PST
Comment on attachment 518525 [details] [diff] [review]
patch 4: fix indent

r=me fwiw
Comment 16 Boris Zbarsky [:bz] 2011-03-10 20:39:05 PST
Comment on attachment 518526 [details] [diff] [review]
patch 5: disallow control characters in unquoted url()

r=me
Comment 17 Boris Zbarsky [:bz] 2011-03-10 20:54:11 PST
Comment on attachment 518529 [details] [diff] [review]
patch 6: include the close paren in the URL token, and make quoted URLs produce the URL/Bad_URL tokens

>+++ b/layout/style/nsCSSParser.cpp
>+  if (eCSSToken_URL != mToken.mType) {
>     // in the failure case, we do not have to match parentheses, since
>     // this is now an invalid URL token.

This comment doesn't really make sense, imo.

>+++ b/layout/style/nsCSSScanner.cpp
>+    case eCSSToken_URL:
>+    case eCSSToken_Bad_URL:
>+      if (mSymbol != PRUnichar(0)) {
>+        aBuffer.Append(mSymbol);
>+      }
>+      aBuffer.Append(mIdent);
>+      if (mSymbol != PRUnichar(0)) {
>+        aBuffer.Append(mSymbol);
>+      }
>+      if (mType == eCSSToken_URL) {
>+        aBuffer.Append(PRUnichar('('));
>+      }
>+      break;

I don't follow this code.  There are four cases here, I think:

1) eCSSToken_URL with quotes; eg: url('foo').  Then mIdent is the thing between 
   the quotes and mSymbol is the quote mark.  So we'd output: 'foo'(
2) eCSSToken_Bad_URL with quotes.  Then we'd output something like 'foo' even
   though the string was not terminated?
3) eCSSTokenURL without quotes; eg: url(foo).  Then we output: foo(
4) eCSSToken_Bad_URL without quotes.  Then we output: foo

Am I just missing something?  It looks like the old code didn't do a very good job here either...
Comment 18 Boris Zbarsky [:bz] 2011-03-10 21:00:18 PST
Comment on attachment 518529 [details] [diff] [review]
patch 6: include the close paren in the URL token, and make quoted URLs produce the URL/Bad_URL tokens

Oh, I see.  You just typed '(' when you meant ')' in ToString.

r=me with that fixed and the comment about failure case made sane.
Comment 19 Boris Zbarsky [:bz] 2011-03-10 21:02:05 PST
Comment on attachment 518530 [details] [diff] [review]
patch 7: include the opening "url(" in the URL token, and make tokenization of url() (but not url-prefix() or domain()) noncontextual

r=me.  Very nice!

Note You need to log in before you can comment on or make changes to this bug.