Closed Bug 1190027 Opened 9 years ago Closed 8 years ago

URL parser should be consistent in handling of \r\n\t\0 characters

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: valentin, Assigned: CuveeHsu)

References

Details

(Whiteboard: [necko-active])

Attachments

(1 file, 1 obsolete file)

The URL spec suggests that generally parsing \n\r\t\0 should result in a parsing error.
However... this is not the case in most browsers. See table below.

We should return an error when trying to set url.search and url.pathname to an string containing \0. This means we should either be consistent with our current behaviour and fail silently, or make sure we throw for every setter that contains \0. We could also try to close the behaviour gap with Chrome, and strip \0s in setters.

BROWSER     | \0 in URL | \0 in setters | \r\n\t in URL | \r\n\t in setters
------------+-----------+---------------+---------------+------------------
Chrome      | throws    | strips        | strips        | encodes
Safari      | cutoff    | cutoff        | throws        | strips
IE location | cutoff    | cutoff        | strips        | depends*
Firefox     | throws    | depends**     | strips        | encodes

depends* : IE window.location strips characters for .search, encodes in .pathname, does not encode or strip in .hash setter

depends** : Firefox handles \0 by cutting off .search and .pathname, and by silently failing for .hash and .host

A cutoff means that a "hello\0test" will be handled as "hello"
Actually, per https://url.spec.whatwg.org/#concept-basic-url-parser it strips unless it's part of a setter. You should probably file a bug on https://github.com/whatwg/url/issues/new to specify something new for the setters and perhaps something new for \0 although it seems somewhat weird to special case it.
(In reply to Anne (:annevk) from comment #1)
> Actually, per https://url.spec.whatwg.org/#concept-basic-url-parser it
> strips unless it's part of a setter.

Do you mean this line?
> "1.2. Remove any leading and trailing C0 controls and space from input." 
Because it seems to me that all browsers strip special characters even when they are in the middle of the string.

> You should probably file a bug on
> https://github.com/whatwg/url/issues/new to specify something new for the
> setters and perhaps something new for \0 although it seems somewhat weird to
> special case it.

I have filed https://github.com/whatwg/url/issues/57
By the way, note that when the URL specification says "parse error", this is not fatal.
So I don't think the specification is wrong here. I think implementations are just buggy. There's no reason why

  var x = document.createElement("a"); x.href = "http://x/\x00x"; x.href

and

  var x = new URL("http://x/\x00x"); x.href

need to give different results.

Stripping \n\r\t consistently across the normal parser and setters also seems preferable to having different behavior for each.

It's also not clear to me why special casing U+0000 is justified.
Clarifying the "parse error" wording is https://github.com/whatwg/url/issues/60 by the way.
Thanks Anne. I'll start working on a patch to filter out these characters in parsing/setters.
Whiteboard: [necko-backlog]
I tested it in nightly.
We encode \0 to "%00" for both URL constructors and setters. (bug 1272284)
We strip \r\n\t for both URL constructors and setters, which satisfies the spec (third point in [1])
I believe it just happens to be fixed in [2].

And we lack for the test of \r\n\t in middle URL.

[1] https://url.spec.whatwg.org/#concept-basic-url-parser
[2] http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/netwerk/base/nsStandardURL.cpp#1435
Assignee: valentin.gosu → juhsu
Whiteboard: [necko-backlog] → [necko-active]
Attached patch test_strip_tab_and_newline (obsolete) — Splinter Review
Per comment 8, add a test to improve the test coverage.
Attachment #8781437 - Flags: review?(mcmanus)
Comment on attachment 8781437 [details] [diff] [review]
test_strip_tab_and_newline

Review of attachment 8781437 [details] [diff] [review]:
-----------------------------------------------------------------

if any change is made here (and this is just a test I realize) chrome-compat should be the #1 goal. If we don't like chrome's behavior we should try and coordinate a change with them.
Attachment #8781437 - Flags: review?(mcmanus) → review?(valentin.gosu)
> if any change is made here (and this is just a test I realize) chrome-compat
> should be the #1 goal. If we don't like chrome's behavior we should try and
> coordinate a change with them.
The reason I did no behavior change is in comment 8.

To clarify, Chrome's behavior is almost the same as Firefox Nightly.
The only exception is that |new URL("http://x/\x00x")| would throw.
It disobeys comment 4 so I don't think it's worth to compat with it.
Comment on attachment 8781437 [details] [diff] [review]
test_strip_tab_and_newline

Review of attachment 8781437 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know if we really need _this_ test.
test_standardurl.js::test_filterWhitespace already checks that we strip tab and newlines inside the URL as well.
I think what's missing is a test that setters are behaving properly.
Such as url.filePath = "PA\r\n\tTH"; do_check_eq(url.path, "PA%0D%0A%09TH");
Attachment #8781437 - Flags: review?(valentin.gosu)
(In reply to Valentin Gosu [:valentin] (vacation until 1 Sept) from comment #11)
> Comment on attachment 8781437 [details] [diff] [review]
> test_strip_tab_and_newline
> 
> Review of attachment 8781437 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't know if we really need _this_ test.
> test_standardurl.js::test_filterWhitespace already checks that we strip tab
> and newlines inside the URL as well.
> I think what's missing is a test that setters are behaving properly.
> Such as url.filePath = "PA\r\n\tTH"; do_check_eq(url.path, "PA%0D%0A%09TH");

Sorry for missing test_filterWhitespace.
Let me check compat first in order to see what we should do next.
Attachment #8781437 - Attachment is obsolete: true
Attachment #8782385 - Flags: review?(valentin.gosu)
Comment on attachment 8782385 [details] [diff] [review]
test_setters_for_tab_and_newline

Looks good. Thanks!
Attachment #8782385 - Flags: review?(valentin.gosu) → review+
Keywords: checkin-needed
Attachment #8782385 - Attachment is patch: true
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f9094e8d2743
Test setters for tab and newline. r=valentin
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f9094e8d2743
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: