Closed Bug 337287 Opened 18 years ago Closed 15 years ago

wacky parsing of comments in URLs

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED DUPLICATE of bug 476856

People

(Reporter: dbaron, Assigned: alfredkayser)

References

(Depends on 1 open bug)

Details

(Whiteboard: [patch])

Attachments

(1 file, 6 obsolete files)

Our parsing of comments inside CSS URL tokens (which is actually not currently described by the spec) is quite wacky, because of a small typo.  I can't actually explain quite what the current behavior is, but the fix is pretty simple.  With this fix, we'll match MacIE and WinIE on the following testcases:

http://www.hixie.ch/tests/adhoc/css/parsing/uri/001.html
http://www.hixie.ch/tests/adhoc/css/parsing/uri/002.html
http://www.hixie.ch/tests/adhoc/css/parsing/uri/003.html
Attached patch patch (obsolete) — Splinter Review
The basic principle here is that the Next and NextURL functions (which are analogous; NextURL is used inside url(), which uses different tokenization) handle comments by recursively calling themselves and returning the thing after the comment; but here, NextURL was calling Next instead of recursively calling itself.
Attachment #221450 - Flags: superreview?(bzbarsky)
Attachment #221450 - Flags: review?(bzbarsky)
Status: NEW → ASSIGNED
Priority: -- → P2
Whiteboard: [patch]
Target Milestone: --- → mozilla1.8.1beta1
Attachment #221450 - Flags: approval-branch-1.8.1?(bzbarsky)
(Then again, maybe the whole principle of the thing is wrong and we should be returning a whitespace token...)
Another thing here where we have the same bug as IE:  we only treat them as comments at the start of URLs.
Actually, that's not quite true:  the spec (latest revisions only) currently says to accept comments at the beginning and end; WinIE accepts them at the beginning and, when preceded by a space, in the middle.  Mozilla accepts them only at the start.  Neither accept them at the end.
So we reverted the change to the spec that allows comments inside URLs.
This doesn't do the pushback, so it doesn't pass the testcases where braces or brackets were part of the part that could have been accepted if it weren't followed by ).
Attachment #221450 - Attachment is obsolete: true
Attachment #221450 - Flags: superreview?(bzbarsky)
Attachment #221450 - Flags: review?(bzbarsky)
Attachment #221450 - Flags: approval-branch-1.8.1?(bzbarsky)
So I assume none of those patches need review yet?
Probably not, although the last patch does get us to a significantly better state than the current code...
QA Contact: ian → style-system
This patch makes NextURL parse everything (whitespace, comments, strings) within the url() correctly, and only return String, URL or InvalidURL tokens (and PR_FALSE on EOF).

The parser within NextURL is a simple character by character parser, maintaining status in URLState, so that we can detect any illegal constructs.
In essence only one URL or String is allowed between the (), and all whitespace and comments are skipped.

I will create meanwhile a more extensive test than the one of comment 1.
Assignee: dbaron → alfredkayser
Attachment #221586 - Attachment is obsolete: true
Attachment #221587 - Attachment is obsolete: true
Attachment #284609 - Flags: review?
Attachment #284609 - Flags: review? → review?(dbaron)
Allowed:
url("index.jpg")
url(/**/"index.jpg")
url(/**/"index.jpg")
url("index.jpg"/**/)
url(/**/"index.jpg"/**/)

url(/*aap*/index.jpg)
url(index.jpg/*noot*/)
url(/*aap*/index.jpg/*noot*/)

url(/**/  "index.jpg")
url("index.jpg"  /**/)
url(/**/  "index.jpg"  /**/)

url(/*aap*/  index.jpg)
url(index.jpg  /*noot*/)
url(/*aap*/  index.jpg  /*noot*/)

url(/*aap*/  index.jpg  )
url(index.jpg  /*noot*/  )
url(/*aap*/  index.jpg  /*noot*/  )

url(  /*aap*/  index.jpg  )
url(  index.jpg  /*noot*/  )
url(  /*aap*/  index.jpg  /*noot*/  )

Not allowed:
url("index.jpg" more)
url("index.jpg" "red.jpg")
url(more "index.jpg")
url(more "index.jpg" more)
url(/*aap*/index.jpg/*noot*/"index.jpg")
url(/*aap*/"green.jpg"/*noot*/red.jpg) -> URL = [green.jpg] (as the string is identied as URL), so the red.jpg is then invalid

Dubious cases:
Looks invalid, but is considered valid, with the comment as part of the URL itself. W3c says comments are allowed between tokens (not within them), so how to interpret?
url(more/*aap*/index.jpg)  -> URL = [more/*aap*/index.jpg]
url(index.jpg/*noot*/more) -> URL = [index.jpg/*noot*/more]
url(/*aap*/index.jpg/*noot*/"index.jpg") -> URL = [index.jpg/*noot*/"index.jpg"]
Could you include the test as part of the patch?
Alfred, think you can supply some tests with the patch?
Target Milestone: mozilla1.8.1beta1 → ---
This makes URL much more robust, handling all cases of comments and whitespace before and after URL's. By also eating the closing ')', nsCSSParser no longer needs to check for that one. 
By making NextURL accepting any nsString, it allows nSCCSParser to directly scan it into a local buffer, instead first scan it into mToken.mIdent, and then copying it into a local ns(Auto)String.
So, this saves a string alloc and copy for every URL parsed, and makes appending much faster (whole blocks instead of per character).
Attachment #284609 - Attachment is obsolete: true
Attachment #339206 - Flags: review?(zweinberg)
Attachment #284609 - Flags: review?(dbaron)
Comment on attachment 339206 [details] [diff] [review]
V5: Updated to trunk.Parse whole of URL and append as much as possible in go

Reading the spec at http://www.w3.org/TR/CSS21/syndata.html#uri, I see now that comments are not allowed at all. That makes the parsing much easier, so I will need to submit a new patch...
Attachment #339206 - Attachment is obsolete: true
Attachment #339206 - Flags: review?(zweinberg)
With this patch, the testcases of comment #1 now work (except for two, but I believe those two to be wrong testcases). 
Next action is to convert these testcases into Mochitests...
Attachment #339272 - Flags: review?(zweinberg)
I think I'm correct, when linking this to http://lists.w3.org/Archives/Public/www-style/2008Sep/0217.html

FWIW, the testcase in that mail shows the same results in Opera 9.52, Safari 3.1.2 and IE8b2. So we should match those. v6 already does it, doesn't it?
In bug 473914 I removed parsing of comments in URLs, per current spec.  I think there may be some things here that belong in other bugs, though.
(In reply to comment #18)
> In bug 473914 I removed parsing of comments in URLs, per current spec.  I think
> there may be some things here that belong in other bugs, though.

url('allowed.gif' /*x*/)
url('allowed.gif' /*x*//*x*/)
url('allowed.gif'/*x*/)
url('allowed.gif'/*x*//*x*/)

These notations still work, although they shouldn't. Otherwise our implementation is much closer to other implementations now :)
Updated to trunk, but more importantly:
Added specific tests: test_parse_url_337287.html.
With this patch also test_css_eof_handling.html now shows 0 Failed.
So, fully tested with mochitest and the testcases of comment #1.
Attachment #339272 - Attachment is obsolete: true
Attachment #370261 - Flags: review?(dbaron)
Attachment #339272 - Flags: review?(zweinberg)
Comment on attachment 370261 [details] [diff] [review]
V7: Correctly parse URL including fastcopy of long unquoted URL

Please call the parameter to nsCSSScanner::NextURL aURL rather than
anURL.  Same for CSSParserImpl::GetURLToken.

In nsCSSScanner.h, "Get the URL" isn't a useful comment; either remove
it or add something useful.

In nsCSSScanner.h, you should comment that ParseString's return value is
a boolean indicating success, rather than not-eof.  (That's unusual for
the scanner.)


+  // Search for the end marker, skipping comments
+  while (ch != ')') {
+    if (ch < 0) {
+      return ok;                  // EOF during URL parsing
+    }
+    if ((ch == '/') && LookAhead('*')) {
+      SkipCComment();             // Comments are allowed after URL
+    } else if (!IsWhitespace(ch)) {
+      ok = PR_FALSE;              // Additional tokens are invalid
+    }
+    ch = Read();
+  }
+  Pushback(ch);

Isn't part of the point of this bug that comments *aren't* allowed?
Can't you just replace this with a call to EatWhiteSpace() and then a
check that the next character is ')'?  (And also don't read the extra ch
in the ParseString path, and push it back or not read it in the
non-string path?)  Could you also add tests for this (see comment 19)?

+          else if ((nc == '"') || (nc == '\'') || (nc == '(')) {
+            // This is an invalid URL spec, but keep scanning till the end
+            ok = PR_FALSE;
+          }

We need a followup bug on fixing this if we don't have one already; if
it doesn't match the url token production in the spec, it matches other
productions, which means that we need to use SkipUntil appropriately.
DO NOT attempt to fix this in this patch.

Please make CSSParserImpl::GetURLToken assert (with NS_ABORT_IF_FALSE)
that !mHavePushBack, since the code doesn't make sense if there is
pushback.


Please add your new tests to test_parse_url.html instead of making a new
file?

These tests are testing for the opposite result of what is expected:
+// Comments after URL and whitespace are allowed
+div.style.listStyleImage = 'url(bad)';
+div.style.listStyleImage = 'url(good /*valid comment*/)';
+is(div.style.listStyleImage, 'url("good")',
+   "comment not skipped");
+
+div.style.listStyleImage = 'url(bad)';
+div.style.listStyleImage = 'url(good /*valid comments*/ /*Hello*/)';
+is(div.style.listStyleImage, 'url("good")',
+   "multiple comments not skipped");

...

+div.style.listStyleImage = 'url(bad)';
+div.style.listStyleImage = 'url(good/**/ /*secondcommentcanbeskipped*/ )';
+is(div.style.listStyleImage, 'url("good/**/")',
+   "second comment not skipped?");


+div.style.listStyleImage = 'url(bad)';
+div.style.listStyleImage = 'url(good /*)*/';
+is(div.style.listStyleImage, 'url("good")',
+   "comment with closing bracket not skipped");

... this one is wrong, but you *should* test that 'url(good/*)' is accepted.


You should also add tests that the following are not accepted:
  url(good bad)
  url(\g b)
  url( \g b)
  url(c\g b)
  url(cc\g b)
  url(\f  b)
  url( \f  b)
  url(c\f  b)
  url(cc\f  b)
but that the following are accepted:
  url(\f good)
  url( \f good)
  url(f\f good)
  url(go\od)
  url(goo\d)
  url(go\o)


r=dbaron with those issues addressed (and no other changes, please)

Sorry for the long delay getting to this.
Attachment #370261 - Flags: review?(dbaron) → review+
Though, actually, I think I prefer the patch I've had in my tree for ages (see above), which also fixes the parenthesis matching issues.  I'll attach it to bug 476856.
Though I think some of the scanner performance work here could be added to that if it makes a measurable improvement.
I will restart my performance patches on the new base (from bug 476856),
focusing on the appending multiple characters thing, whilst keeping the overall logic in tact.
Alfred, you might want to keep an eye on bug 513149.
The parsing issues here should be fixed now.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Depends on: 623196
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: