Closed Bug 241788 Opened 20 years ago Closed 8 years ago

URL: link with href attribute javascript - new line - : is wrongly parsed

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla47

Tracking Flags:

Tracking

Status

firefox47

---

fixed

People

(Reporter: martijn.martijn, Assigned: valentin)

References

(
URL
)

Details

(Keywords: testcase, Whiteboard: [necko-backlog])

Attachments

(3 files)

Testcase to show the bug 20 years ago Martijn Wargers (dead) 458 bytes, text/html		Details
MozReview Request: Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab 8 years ago Valentin Gosu [:valentin] (he/him) 58 bytes, text/x-review-board-request	mayhemer : review+	Details
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab 8 years ago Valentin Gosu [:valentin] (he/him) 58 bytes, text/x-review-board-request	mayhemer : review+	Details

Martijn Wargers (dead)

Reporter

Description

•

20 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040418 Firefox/0.8.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040418 Firefox/0.8.0+

Originally mentioned here:
http://forums.mozillazine.org/viewtopic.php?t=71959&sid=0da1a41302e6fd6b4521f71e4eb83893

Probably related to bug 47078 (or depending on it).

This piece of code:
<a href="javascript
:alert('test')" style="white-space:pre;">a href="
javascript:alert('test')"></a> (watch the new-line)

When hovering over this link, you get to see: "javascript:///alert('test')"
This is incorrect.
When clicking on the link I get a javascript error:
Error: unterminated regular expression literal
Source File: javascript:///alert('test')
Line: 1
Source Code:
/alert('test')

I will attach a testcase.
Notice the difference in behavior between the different versions of Mozilla:
Mozilla1.01: Gives 'file not found'
Mozilla1.4.1 Displays alert box 'test'
Mozilla1.5a and further gives the behavior I described above.
Internet Explorer does the same as Mozilla1.4.1, displays alert box 'test'.

I think the behavior from Mozilla1.01 is the most correct one. 
Well, whatever you decide what the correct behavior is, current behavior is
certainly not correct :)




Reproducible: Always
Steps to Reproduce:
1. See testcase
2.
3.

Actual Results:  
Gives js error

Expected Results:  
file not found error

Martijn Wargers (dead)

Reporter

Comment 1

•

20 years ago

Attached file Testcase to show the bug — Details

Boris Zbarsky [:bzbarsky]

Comment 2

•

20 years ago

This has nothing to do with the HTML parser -- this is the URI parser all the way.

Assignee: parser → darin

Component: HTML: Parser → Networking

QA Contact: benc

Christian :Biesinger (don't email me, ping me on IRC)

Comment 3

•

20 years ago

bz: well, it's certainly not the IO service that causes this:
js> ios=Components.classes["@mozilla.org/network/io-service;1"].getService();
[xpconnect wrapped nsISupports]
js> ios.nsIIOService.newURI("javascript\n:alert('foo');", null, null);
uncaught exception: [Exception... "Component returned failure code: 0x804b000a
[nsIIOService.newURI]"  nsresult: "0x804b000a (<unknown>)"  location: "JS frame
:: typein :: <TOP_LEVEL> :: line 4"  data: no]

it refuses to create such a uri.

Boris Zbarsky [:bzbarsky]

Comment 4

•

20 years ago

> bz: well, it's certainly not the IO service that causes this:

Wrong test, biesi.  The non-URI char before the '\n' means that this URI has
nothing that the IO service considers a scheme, and of course a relative URI
with no base will fail.

>js> ios=Components.classes["@mozilla.org/network/io-service;1"].getService();
>[xpconnect wrapped nsISupports]
>js> ios.nsIIOService.newURI("javascript\n:alert('foo');", null, null);

Try:

ios = Components.classes["@mozilla.org/network/io-service;1"].getService();
base = ios.nsIIOService.newURI("http://mozilla.org/", null, null);
uri = ios.nsIIOService.newURI("javascript\n:alert('foo');", null, base);

which is what has to happen in a real document, of course.

The real problem, as I see it, is that the ExtractScheme() thing that IOService
does in NewURI doesn't find a scheme in that string, but
nsStandardURL::Resolve() does (see code at
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsStandardURL.cpp#1512).

Perhaps ExtractScheme() should do something similar?

Boris Zbarsky [:bzbarsky]

Comment 5

•

20 years ago

> The non-URI char before the '\n'

I meant "before the ':'", of course.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 6

•

20 years ago

(In reply to comment #4)
> Perhaps ExtractScheme() should do something similar?

note that it would not match the expected results from comment 0 :)

why is \r and \n stripped out?

Boris Zbarsky [:bzbarsky]

Comment 7

•

20 years ago

> note that it would not match the expected results from comment 0 :)

I don't know where those expected results came from, exactly, but it'd make more
sense to do what IE does, no?

> why is \r and \n stripped out?

See bug 102312.  This is basically the same bug; the fix for that one doesn't
work if the base and target URI have different protocols.

The URI RFC has nothing to say on the matter past what it says at
<http://www.gbiv.com/protocols/uri/rfc/rfc2396.html#rfc.section.E> and one could
argue that that doesn't apply here, but realistically we want to be stripping
out newlines in URI processing, imo.

Status: UNCONFIRMED → NEW

Ever confirmed: true

benc

Comment 8

•

20 years ago

From the RFC:

"In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may need to be
added to break long URI across lines. The whitespace should be ignored when
extracting the URI."

Shouldn't ioService be removing the whitespace (spaces, new lines, tabs)?

Summary: link with href attribute javascript - new line - : is wrongly parsed → URL: link with href attribute javascript - new line - : is wrongly parsed

Boris Zbarsky [:bzbarsky]

Comment 9

•

20 years ago

Ben, that's talking about cases when we're extracting a URI from text or
something.  This is not quite the case here.  Here, the contents of the src
attribute are unambiguously a URI; this attribute value just happens to contain
chars that are illegal in a URI, so we have to do some sort of error-handling.

But I do agree that the sane error-handling is to strip out whitespace consistently.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 10

•

20 years ago

(In reply to comment #7)
> I don't know where those expected results came from, exactly, but it'd make more
> sense to do what IE does, no?

hm, yeah, I suppose. either way, we should be consistent, wrt how ExtractScheme
and StandardURL handle schemes.

Jason Barnabe (np)

Updated

•

20 years ago

Keywords: testcase

Darin Fisher

Updated

•

18 years ago

Assignee: darin → nobody

QA Contact: benc → networking

Patrick McManus [:mcmanus]

Comment 11

•

8 years ago

valentin, what should we do with this bug?

Flags: needinfo?(valentin.gosu)

Whiteboard: [necko-backlog]

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 12

•

8 years ago

This is still an issue - Chrome and Edge seem to be dealing with the last use case (stripping \n but not spaces)
Keeping the need-info flag so I can take a shot at a patch.

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 13

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c12d35e57a8b

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 14

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1a4f4b68895d

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 15

•

8 years ago

Attached file MozReview Request: Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab — Details

Review commit: https://reviewboard.mozilla.org/r/32147/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/32147/

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 16

•

8 years ago

Attached file MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab — Details

* net_ExtractURLScheme now uses mozilla::Tokenizer
* net_FilterURIString also filters characters in the scheme now
* removed startPos and endPos parameters for net_FilterURIString and introduced net_IsAbsoluteURL

Review commit: https://reviewboard.mozilla.org/r/32149/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/32149/

Valentin Gosu [:valentin] (he/him)

Assignee

Updated

•

8 years ago

Attachment #8711391 - Flags: review?(honzab.moz)

Valentin Gosu [:valentin] (he/him)

Assignee

Updated

•

8 years ago

Attachment #8711392 - Flags: review?(honzab.moz)

Valentin Gosu [:valentin] (he/him)

Assignee

Updated

•

8 years ago

Assignee: nobody → valentin.gosu

Flags: needinfo?(valentin.gosu)

Honza Bambas (:mayhemer)

Updated

•

8 years ago

Attachment #8711391 - Flags: review?(honzab.moz) → review+

Honza Bambas (:mayhemer)

Comment 17

•

8 years ago

Comment on attachment 8711391 [details]
MozReview Request: Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab

https://reviewboard.mozilla.org/r/32147/#review29395

Honza Bambas (:mayhemer)

Updated

•

8 years ago

Attachment #8711392 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 18

•

8 years ago

Comment on attachment 8711392 [details]
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

https://reviewboard.mozilla.org/r/32149/#review29421

::: netwerk/base/nsURLHelper.cpp:493
(Diff revision 1)
> +bool

static?

::: netwerk/base/nsURLHelper.cpp:521
(Diff revision 1)
> -            if (scheme)
> +    while (p.CheckChar(net_IsValidSchemeChar) || p.CheckWhite()) {

so you now accept whitespaces AFTER a schema has started?  means you accept URLs like:

"  \r\nht\r\n tp://example.com"

I thought the bug was just about accepting leading white chars, OTOH, I'm not that deeply familiar with the latest URI specs.

::: netwerk/base/nsURLHelper.cpp:524
(Diff revision 1)
> -        else 
> +    if (p.CheckChar(':')) {

nit: to stick with "early return failure" pattern, rather do if (!p.CheckChar(':')) { return MARLF_URI; }

::: netwerk/base/nsURLHelper.cpp:526
(Diff revision 1)
> +        scheme.StripChars("\r\n\t");

tab is removed but space is not?  odd.

::: netwerk/base/nsURLHelper.cpp:547
(Diff revision 1)
>              return false;

seems like this method will return false while your new code will success.

::: netwerk/base/nsURLHelper.cpp:574
(Diff revision 1)
> -                break;
> +    p.SkipWhites();

we accept ws here?

::: netwerk/test/unit/test_standardurl.js:278
(Diff revision 1)
> +  var url = stringToURL(" \r\n\th\nt\rt\tp://ex\r\n\tample.com/path\r\n\t/\r\n\tto/fil\r\n\te.e\r\n\txt?que\r\n\try#ha\r\n\tsh \r\n\t ");

we really want to support this?  OK, so when I type something like this to a URL bar, the path portion gets escaped BEFORE we create the URL object of it?  Because any spaces in the path/query will otherwise be removed with this.

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 19

•

8 years ago

https://reviewboard.mozilla.org/r/32149/#review29421

> so you now accept whitespaces AFTER a schema has started?  means you accept URLs like:
> 
> "  \r\nht\r\n tp://example.com"
> 
> I thought the bug was just about accepting leading white chars, OTOH, I'm not that deeply familiar with the latest URI specs.

Actually no. A white space ' ' is only allowed at the begining or end.
In between we only allow \n\r\t - but not ' '. This is what Blink does as well.
In this context p.CheckWhite() only matches the characters I gave in the Tokenizer's constructor \r\n\t

> tab is removed but space is not?  odd.

Yes! Space is not allowed inside the URL, only in leading or trailing characters.

> seems like this method will return false while your new code will success.

Right. The fact is that net_IsValidScheme("ht\ntp") should still return false. That's not a valid scheme.
It's only that net_ExtractURLScheme will extract the valid scheme even though it has \r\t\n embedded in it, and net_FilterURIString will strip these characters from all of the URI.

> we accept ws here?

Yes. WS as in \r\n\t, but not ' '

> we really want to support this?  OK, so when I type something like this to a URL bar, the path portion gets escaped BEFORE we create the URL object of it?  Because any spaces in the path/query will otherwise be removed with this.

That is still valid. Spaces inside the URL are not stripped so they get escaped eventually.
I'll add a unit test for that case too, to make sure we don't mess it up.

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 20

•

8 years ago

Comment on attachment 8711391 [details]
MozReview Request: Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32147/diff/1-2/

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 21

•

8 years ago

Comment on attachment 8711392 [details]
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32149/diff/1-2/

Attachment #8711392 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 22

•

8 years ago

Comment on attachment 8711392 [details]
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

https://reviewboard.mozilla.org/r/32149/#review30307

::: netwerk/base/nsURLHelper.cpp:532
(Diff revisions 1 - 2)
> -    return NS_ERROR_MALFORMED_URI;
> +    return NS_OK;

when MOZILLA_XPCOMRT_API is defined, this should return a failure?

Attachment #8711392 - Flags: review?(honzab.moz) → review+

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 23

•

8 years ago

(In reply to Honza Bambas (:mayhemer) from comment #22)
> > -    return NS_ERROR_MALFORMED_URI;
> > +    return NS_OK;
> when MOZILLA_XPCOMRT_API is defined, this should return a failure?

Yeah, I guess that would be best.

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 24

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5342641913be9b90f5309a42ae226a1d340f30c6
Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab

https://hg.mozilla.org/integration/mozilla-inbound/rev/77f8588dad8c0349f88df7798a2e612ada487ffe
Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

Valentin Gosu [:valentin] (he/him)

Assignee

Comment 25

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/9072e402ee233d2f37835e2a5e9f5c4d1b9728bf
Bug 241788 - Add missing run_next_test() to xpcshell-test a=testonly

Phil Ringnalda (:philor)

Comment 26

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5342641913be
https://hg.mozilla.org/mozilla-central/rev/77f8588dad8c
https://hg.mozilla.org/mozilla-central/rev/9072e402ee23

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox47: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla47

You need to log in before you can comment on or make changes to this bug.