Closed Bug 337287 Opened 19 years ago Closed 15 years ago

wacky parsing of comments in URLs

Tracking

()

Status:

RESOLVED DUPLICATE of bug 476856

People

(Reporter: dbaron, Assigned: alfredkayser)

References

(Depends on 1 open bug)

Details

(Whiteboard: [patch])

Attachments

(1 file, 6 obsolete files)

patch 19 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 1.57 KB, patch		Details \| Diff \| Splinter Review
patch that removes comments and mostly fixes bugs (diff -u) 19 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 15.22 KB, patch		Details \| Diff \| Splinter Review
patch that removes comments and mostly fixes bugs (diff -ub) 19 years ago David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla) 13.73 KB, patch		Details \| Diff \| Splinter Review
Let NextURL fully parse everything within url() correctly 17 years ago Alfred Kayser 9.21 KB, patch		Details \| Diff \| Splinter Review
V5: Updated to trunk.Parse whole of URL and append as much as possible in go 16 years ago Alfred Kayser 15.76 KB, patch		Details \| Diff \| Splinter Review
V6: Don't scan for comments and as much as possible for scan 16 years ago Alfred Kayser 15.99 KB, patch		Details \| Diff \| Splinter Review
V7: Correctly parse URL including fastcopy of long unquoted URL 16 years ago Alfred Kayser 19.29 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

•

19 years ago

Our parsing of comments inside CSS URL tokens (which is actually not currently described by the spec) is quite wacky, because of a small typo. I can't actually explain quite what the current behavior is, but the fix is pretty simple. With this fix, we'll match MacIE and WinIE on the following testcases: http://www.hixie.ch/tests/adhoc/css/parsing/uri/001.html http://www.hixie.ch/tests/adhoc/css/parsing/uri/002.html http://www.hixie.ch/tests/adhoc/css/parsing/uri/003.html

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 1

•

19 years ago

Attached patch patch (obsolete) — Details — Splinter Review

The basic principle here is that the Next and NextURL functions (which are analogous; NextURL is used inside url(), which uses different tokenization) handle comments by recursively calling themselves and returning the thing after the comment; but here, NextURL was calling Next instead of recursively calling itself.

Attachment #221450 - Flags: superreview?(bzbarsky)

Attachment #221450 - Flags: review?(bzbarsky)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

19 years ago

Status: NEW → ASSIGNED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

19 years ago

Priority: -- → P2

Whiteboard: [patch]

Target Milestone: --- → mozilla1.8.1beta1

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

19 years ago

Attachment #221450 - Flags: approval-branch-1.8.1?(bzbarsky)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 2

•

19 years ago

(Then again, maybe the whole principle of the thing is wrong and we should be returning a whitespace token...)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 3

•

19 years ago

Another thing here where we have the same bug as IE: we only treat them as comments at the start of URLs.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 4

•

19 years ago

Actually, that's not quite true: the spec (latest revisions only) currently says to accept comments at the beginning and end; WinIE accepts them at the beginning and, when preceded by a space, in the middle. Mozilla accepts them only at the start. Neither accept them at the end.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 5

•

19 years ago

So we reverted the change to the spec that allows comments inside URLs.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 6

•

19 years ago

Attached patch patch that removes comments and mostly fixes bugs (diff -u) (obsolete) — Details — Splinter Review

This doesn't do the pushback, so it doesn't pass the testcases where braces or brackets were part of the part that could have been accepted if it weren't followed by ).

Attachment #221450 - Attachment is obsolete: true

Attachment #221450 - Flags: superreview?(bzbarsky)

Attachment #221450 - Flags: review?(bzbarsky)

Attachment #221450 - Flags: approval-branch-1.8.1?(bzbarsky)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 7

•

19 years ago

Attached patch patch that removes comments and mostly fixes bugs (diff -ub) (obsolete) — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Comment 8

•

19 years ago

So I assume none of those patches need review yet?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 9

•

19 years ago

Probably not, although the last patch does get us to a significantly better state than the current code...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

18 years ago

QA Contact: ian → style-system

Alfred Kayser

Assignee

Comment 10

•

17 years ago

Attached patch Let NextURL fully parse everything within url() correctly (obsolete) — Details — Splinter Review

This patch makes NextURL parse everything (whitespace, comments, strings) within the url() correctly, and only return String, URL or InvalidURL tokens (and PR_FALSE on EOF). The parser within NextURL is a simple character by character parser, maintaining status in URLState, so that we can detect any illegal constructs. In essence only one URL or String is allowed between the (), and all whitespace and comments are skipped. I will create meanwhile a more extensive test than the one of comment 1.

Assignee: dbaron → alfredkayser

Attachment #221586 - Attachment is obsolete: true

Attachment #221587 - Attachment is obsolete: true

Attachment #284609 - Flags: review?

Alfred Kayser

Assignee

Updated

•

17 years ago

Attachment #284609 - Flags: review? → review?(dbaron)

Alfred Kayser

Assignee

Comment 11

•

17 years ago

Allowed: url("index.jpg") url(/**/"index.jpg") url(/**/"index.jpg") url("index.jpg"/**/) url(/**/"index.jpg"/**/) url(/*aap*/index.jpg) url(index.jpg/*noot*/) url(/*aap*/index.jpg/*noot*/) url(/**/ "index.jpg") url("index.jpg" /**/) url(/**/ "index.jpg" /**/) url(/*aap*/ index.jpg) url(index.jpg /*noot*/) url(/*aap*/ index.jpg /*noot*/) url(/*aap*/ index.jpg ) url(index.jpg /*noot*/ ) url(/*aap*/ index.jpg /*noot*/ ) url( /*aap*/ index.jpg ) url( index.jpg /*noot*/ ) url( /*aap*/ index.jpg /*noot*/ ) Not allowed: url("index.jpg" more) url("index.jpg" "red.jpg") url(more "index.jpg") url(more "index.jpg" more) url(/*aap*/index.jpg/*noot*/"index.jpg") url(/*aap*/"green.jpg"/*noot*/red.jpg) -> URL = [green.jpg] (as the string is identied as URL), so the red.jpg is then invalid Dubious cases: Looks invalid, but is considered valid, with the comment as part of the URL itself. W3c says comments are allowed between tokens (not within them), so how to interpret? url(more/*aap*/index.jpg) -> URL = [more/*aap*/index.jpg] url(index.jpg/*noot*/more) -> URL = [index.jpg/*noot*/more] url(/*aap*/index.jpg/*noot*/"index.jpg") -> URL = [index.jpg/*noot*/"index.jpg"]

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 12

•

17 years ago

Could you include the test as part of the patch?

Reed Loden [:reed]

Comment 13

•

17 years ago

Alfred, think you can supply some tests with the patch?

Target Milestone: mozilla1.8.1beta1 → ---

Alfred Kayser

Assignee

Comment 14

•

16 years ago

Attached patch V5: Updated to trunk.Parse whole of URL and append as much as possible in go (obsolete) — Details — Splinter Review

This makes URL much more robust, handling all cases of comments and whitespace before and after URL's. By also eating the closing ')', nsCSSParser no longer needs to check for that one. By making NextURL accepting any nsString, it allows nSCCSParser to directly scan it into a local buffer, instead first scan it into mToken.mIdent, and then copying it into a local ns(Auto)String. So, this saves a string alloc and copy for every URL parsed, and makes appending much faster (whole blocks instead of per character).

Attachment #284609 - Attachment is obsolete: true

Attachment #339206 - Flags: review?(zweinberg)

Attachment #284609 - Flags: review?(dbaron)

Alfred Kayser

Assignee

Comment 15

•

16 years ago

Comment on attachment 339206 [details] [diff] [review] V5: Updated to trunk.Parse whole of URL and append as much as possible in go Reading the spec at http://www.w3.org/TR/CSS21/syndata.html#uri, I see now that comments are not allowed at all. That makes the parsing much easier, so I will need to submit a new patch...

Attachment #339206 - Attachment is obsolete: true

Attachment #339206 - Flags: review?(zweinberg)

Alfred Kayser

Assignee

Comment 16

•

16 years ago

Attached patch V6: Don't scan for comments and as much as possible for scan (obsolete) — Details — Splinter Review

With this patch, the testcases of comment #1 now work (except for two, but I believe those two to be wrong testcases). Next action is to convert these testcases into Mochitests...

Attachment #339272 - Flags: review?(zweinberg)

Daniel.S

Comment 17

•

16 years ago

I think I'm correct, when linking this to http://lists.w3.org/Archives/Public/www-style/2008Sep/0217.html FWIW, the testcase in that mail shows the same results in Opera 9.52, Safari 3.1.2 and IE8b2. So we should match those. v6 already does it, doesn't it?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 18

•

16 years ago

In bug 473914 I removed parsing of comments in URLs, per current spec. I think there may be some things here that belong in other bugs, though.

Daniel.S

Comment 19

•

16 years ago

(In reply to comment #18) > In bug 473914 I removed parsing of comments in URLs, per current spec. I think > there may be some things here that belong in other bugs, though. url('allowed.gif' /*x*/) url('allowed.gif' /*x*//*x*/) url('allowed.gif'/*x*/) url('allowed.gif'/*x*//*x*/) These notations still work, although they shouldn't. Otherwise our implementation is much closer to other implementations now :)

Alfred Kayser

Assignee

Comment 20

•

16 years ago

Attached patch V7: Correctly parse URL including fastcopy of long unquoted URL — Details — Splinter Review

Updated to trunk, but more importantly: Added specific tests: test_parse_url_337287.html. With this patch also test_css_eof_handling.html now shows 0 Failed. So, fully tested with mochitest and the testcases of comment #1.

Attachment #339272 - Attachment is obsolete: true

Attachment #370261 - Flags: review?(dbaron)

Attachment #339272 - Flags: review?(zweinberg)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 21

•

16 years ago

Comment on attachment 370261 [details] [diff] [review] V7: Correctly parse URL including fastcopy of long unquoted URL Please call the parameter to nsCSSScanner::NextURL aURL rather than anURL. Same for CSSParserImpl::GetURLToken. In nsCSSScanner.h, "Get the URL" isn't a useful comment; either remove it or add something useful. In nsCSSScanner.h, you should comment that ParseString's return value is a boolean indicating success, rather than not-eof. (That's unusual for the scanner.) + // Search for the end marker, skipping comments + while (ch != ')') { + if (ch < 0) { + return ok; // EOF during URL parsing + } + if ((ch == '/') && LookAhead('*')) { + SkipCComment(); // Comments are allowed after URL + } else if (!IsWhitespace(ch)) { + ok = PR_FALSE; // Additional tokens are invalid + } + ch = Read(); + } + Pushback(ch); Isn't part of the point of this bug that comments *aren't* allowed? Can't you just replace this with a call to EatWhiteSpace() and then a check that the next character is ')'? (And also don't read the extra ch in the ParseString path, and push it back or not read it in the non-string path?) Could you also add tests for this (see comment 19)? + else if ((nc == '"') || (nc == '\'') || (nc == '(')) { + // This is an invalid URL spec, but keep scanning till the end + ok = PR_FALSE; + } We need a followup bug on fixing this if we don't have one already; if it doesn't match the url token production in the spec, it matches other productions, which means that we need to use SkipUntil appropriately. DO NOT attempt to fix this in this patch. Please make CSSParserImpl::GetURLToken assert (with NS_ABORT_IF_FALSE) that !mHavePushBack, since the code doesn't make sense if there is pushback. Please add your new tests to test_parse_url.html instead of making a new file? These tests are testing for the opposite result of what is expected: +// Comments after URL and whitespace are allowed +div.style.listStyleImage = 'url(bad)'; +div.style.listStyleImage = 'url(good /*valid comment*/)'; +is(div.style.listStyleImage, 'url("good")', + "comment not skipped"); + +div.style.listStyleImage = 'url(bad)'; +div.style.listStyleImage = 'url(good /*valid comments*/ /*Hello*/)'; +is(div.style.listStyleImage, 'url("good")', + "multiple comments not skipped"); ... +div.style.listStyleImage = 'url(bad)'; +div.style.listStyleImage = 'url(good/**/ /*secondcommentcanbeskipped*/ )'; +is(div.style.listStyleImage, 'url("good/**/")', + "second comment not skipped?"); +div.style.listStyleImage = 'url(bad)'; +div.style.listStyleImage = 'url(good /*)*/'; +is(div.style.listStyleImage, 'url("good")', + "comment with closing bracket not skipped"); ... this one is wrong, but you *should* test that 'url(good/*)' is accepted. You should also add tests that the following are not accepted: url(good bad) url(\g b) url( \g b) url(c\g b) url(cc\g b) url(\f b) url( \f b) url(c\f b) url(cc\f b) but that the following are accepted: url(\f good) url( \f good) url(f\f good) url(go\od) url(goo\d) url(go\o) r=dbaron with those issues addressed (and no other changes, please) Sorry for the long delay getting to this.

Attachment #370261 - Flags: review?(dbaron) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 22

•

16 years ago

Though, actually, I think I prefer the patch I've had in my tree for ages (see above), which also fixes the parenthesis matching issues. I'll attach it to bug 476856.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 23

•

16 years ago

Though I think some of the scanner performance work here could be added to that if it makes a measurable improvement.

Alfred Kayser

Assignee

Comment 24

•

15 years ago

I will restart my performance patches on the new base (from bug 476856), focusing on the appending multiple characters thing, whilst keeping the overall logic in tact.

Boris Zbarsky [:bzbarsky]

Comment 25

•

15 years ago

Alfred, you might want to keep an eye on bug 513149.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 26

•

15 years ago

The parsing issues here should be fixed now.

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → DUPLICATE

Serge Gautherie (:sgautherie)

Updated

•

14 years ago

Depends on: 623196

You need to log in before you can comment on or make changes to this bug.