Closed Bug 241788 Opened 16 years ago Closed 4 years ago

URL: link with href attribute javascript - new line - : is wrongly parsed

Categories

(Core :: Networking, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: martijn.martijn, Assigned: valentin)

References

()

Details

(Keywords: testcase, Whiteboard: [necko-backlog])

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040418 Firefox/0.8.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a) Gecko/20040418 Firefox/0.8.0+

Originally mentioned here:
http://forums.mozillazine.org/viewtopic.php?t=71959&sid=0da1a41302e6fd6b4521f71e4eb83893

Probably related to bug 47078 (or depending on it).

This piece of code:
<a href="javascript
:alert('test')" style="white-space:pre;">a href="
javascript:alert('test')"></a> (watch the new-line)

When hovering over this link, you get to see: "javascript:///alert('test')"
This is incorrect.
When clicking on the link I get a javascript error:
Error: unterminated regular expression literal
Source File: javascript:///alert('test')
Line: 1
Source Code:
/alert('test')

I will attach a testcase.
Notice the difference in behavior between the different versions of Mozilla:
Mozilla1.01: Gives 'file not found'
Mozilla1.4.1 Displays alert box 'test'
Mozilla1.5a and further gives the behavior I described above.
Internet Explorer does the same as Mozilla1.4.1, displays alert box 'test'.

I think the behavior from Mozilla1.01 is the most correct one. 
Well, whatever you decide what the correct behavior is, current behavior is
certainly not correct :)




Reproducible: Always
Steps to Reproduce:
1. See testcase
2.
3.

Actual Results:  
Gives js error

Expected Results:  
file not found error
This has nothing to do with the HTML parser -- this is the URI parser all the way.
Assignee: parser → darin
Component: HTML: Parser → Networking
QA Contact: benc
bz: well, it's certainly not the IO service that causes this:
js> ios=Components.classes["@mozilla.org/network/io-service;1"].getService();
[xpconnect wrapped nsISupports]
js> ios.nsIIOService.newURI("javascript\n:alert('foo');", null, null);
uncaught exception: [Exception... "Component returned failure code: 0x804b000a
[nsIIOService.newURI]"  nsresult: "0x804b000a (<unknown>)"  location: "JS frame
:: typein :: <TOP_LEVEL> :: line 4"  data: no]

it refuses to create such a uri.
> bz: well, it's certainly not the IO service that causes this:

Wrong test, biesi.  The non-URI char before the '\n' means that this URI has
nothing that the IO service considers a scheme, and of course a relative URI
with no base will fail.

>js> ios=Components.classes["@mozilla.org/network/io-service;1"].getService();
>[xpconnect wrapped nsISupports]
>js> ios.nsIIOService.newURI("javascript\n:alert('foo');", null, null);

Try:

ios = Components.classes["@mozilla.org/network/io-service;1"].getService();
base = ios.nsIIOService.newURI("http://mozilla.org/", null, null);
uri = ios.nsIIOService.newURI("javascript\n:alert('foo');", null, base);

which is what has to happen in a real document, of course.

The real problem, as I see it, is that the ExtractScheme() thing that IOService
does in NewURI doesn't find a scheme in that string, but
nsStandardURL::Resolve() does (see code at
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsStandardURL.cpp#1512).

Perhaps ExtractScheme() should do something similar?
> The non-URI char before the '\n'

I meant "before the ':'", of course.
(In reply to comment #4)
> Perhaps ExtractScheme() should do something similar?

note that it would not match the expected results from comment 0 :)

why is \r and \n stripped out?
> note that it would not match the expected results from comment 0 :)

I don't know where those expected results came from, exactly, but it'd make more
sense to do what IE does, no?

> why is \r and \n stripped out?

See bug 102312.  This is basically the same bug; the fix for that one doesn't
work if the base and target URI have different protocols.

The URI RFC has nothing to say on the matter past what it says at
<http://www.gbiv.com/protocols/uri/rfc/rfc2396.html#rfc.section.E> and one could
argue that that doesn't apply here, but realistically we want to be stripping
out newlines in URI processing, imo.
Status: UNCONFIRMED → NEW
Ever confirmed: true
From the RFC:

"In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may need to be
added to break long URI across lines. The whitespace should be ignored when
extracting the URI."

Shouldn't ioService be removing the whitespace (spaces, new lines, tabs)?

Summary: link with href attribute javascript - new line - : is wrongly parsed → URL: link with href attribute javascript - new line - : is wrongly parsed
Ben, that's talking about cases when we're extracting a URI from text or
something.  This is not quite the case here.  Here, the contents of the src
attribute are unambiguously a URI; this attribute value just happens to contain
chars that are illegal in a URI, so we have to do some sort of error-handling.

But I do agree that the sane error-handling is to strip out whitespace consistently.
(In reply to comment #7)
> I don't know where those expected results came from, exactly, but it'd make more
> sense to do what IE does, no?

hm, yeah, I suppose. either way, we should be consistent, wrt how ExtractScheme
and StandardURL handle schemes.
Keywords: testcase
Assignee: darin → nobody
QA Contact: benc → networking
valentin, what should we do with this bug?
Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-backlog]
This is still an issue - Chrome and Edge seem to be dealing with the last use case (stripping \n but not spaces)
Keeping the need-info flag so I can take a shot at a patch.
* net_ExtractURLScheme now uses mozilla::Tokenizer
* net_FilterURIString also filters characters in the scheme now
* removed startPos and endPos parameters for net_FilterURIString and introduced net_IsAbsoluteURL

Review commit: https://reviewboard.mozilla.org/r/32149/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/32149/
Attachment #8711391 - Flags: review?(honzab.moz)
Attachment #8711392 - Flags: review?(honzab.moz)
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Attachment #8711391 - Flags: review?(honzab.moz) → review+
Comment on attachment 8711391 [details]
MozReview Request: Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab

https://reviewboard.mozilla.org/r/32147/#review29395
Attachment #8711392 - Flags: review?(honzab.moz)
Comment on attachment 8711392 [details]
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

https://reviewboard.mozilla.org/r/32149/#review29421

::: netwerk/base/nsURLHelper.cpp:493
(Diff revision 1)
> +bool

static?

::: netwerk/base/nsURLHelper.cpp:521
(Diff revision 1)
> -            if (scheme)
> +    while (p.CheckChar(net_IsValidSchemeChar) || p.CheckWhite()) {

so you now accept whitespaces AFTER a schema has started?  means you accept URLs like:

"  \r\nht\r\n tp://example.com"

I thought the bug was just about accepting leading white chars, OTOH, I'm not that deeply familiar with the latest URI specs.

::: netwerk/base/nsURLHelper.cpp:524
(Diff revision 1)
> -        else 
> +    if (p.CheckChar(':')) {

nit: to stick with "early return failure" pattern, rather do if (!p.CheckChar(':')) { return MARLF_URI; }

::: netwerk/base/nsURLHelper.cpp:526
(Diff revision 1)
> +        scheme.StripChars("\r\n\t");

tab is removed but space is not?  odd.

::: netwerk/base/nsURLHelper.cpp:547
(Diff revision 1)
>              return false;

seems like this method will return false while your new code will success.

::: netwerk/base/nsURLHelper.cpp:574
(Diff revision 1)
> -                break;
> +    p.SkipWhites();

we accept ws here?

::: netwerk/test/unit/test_standardurl.js:278
(Diff revision 1)
> +  var url = stringToURL(" \r\n\th\nt\rt\tp://ex\r\n\tample.com/path\r\n\t/\r\n\tto/fil\r\n\te.e\r\n\txt?que\r\n\try#ha\r\n\tsh \r\n\t ");

we really want to support this?  OK, so when I type something like this to a URL bar, the path portion gets escaped BEFORE we create the URL object of it?  Because any spaces in the path/query will otherwise be removed with this.
https://reviewboard.mozilla.org/r/32149/#review29421

> so you now accept whitespaces AFTER a schema has started?  means you accept URLs like:
> 
> "  \r\nht\r\n tp://example.com"
> 
> I thought the bug was just about accepting leading white chars, OTOH, I'm not that deeply familiar with the latest URI specs.

Actually no. A white space ' ' is only allowed at the begining or end.
In between we only allow \n\r\t - but not ' '. This is what Blink does as well.
In this context p.CheckWhite() only matches the characters I gave in the Tokenizer's constructor \r\n\t

> tab is removed but space is not?  odd.

Yes! Space is not allowed inside the URL, only in leading or trailing characters.

> seems like this method will return false while your new code will success.

Right. The fact is that net_IsValidScheme("ht\ntp") should still return false. That's not a valid scheme.
It's only that net_ExtractURLScheme will extract the valid scheme even though it has \r\t\n embedded in it, and net_FilterURIString will strip these characters from all of the URI.

> we accept ws here?

Yes. WS as in \r\n\t, but not ' '

> we really want to support this?  OK, so when I type something like this to a URL bar, the path portion gets escaped BEFORE we create the URL object of it?  Because any spaces in the path/query will otherwise be removed with this.

That is still valid. Spaces inside the URL are not stripped so they get escaped eventually.
I'll add a unit test for that case too, to make sure we don't mess it up.
Comment on attachment 8711391 [details]
MozReview Request: Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32147/diff/1-2/
Comment on attachment 8711392 [details]
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32149/diff/1-2/
Attachment #8711392 - Flags: review?(honzab.moz)
Comment on attachment 8711392 [details]
MozReview Request: Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab

https://reviewboard.mozilla.org/r/32149/#review30307

::: netwerk/base/nsURLHelper.cpp:532
(Diff revisions 1 - 2)
> -    return NS_ERROR_MALFORMED_URI;
> +    return NS_OK;

when MOZILLA_XPCOMRT_API is defined, this should return a failure?
Attachment #8711392 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #22)
> > -    return NS_ERROR_MALFORMED_URI;
> > +    return NS_OK;
> when MOZILLA_XPCOMRT_API is defined, this should return a failure?

Yeah, I guess that would be best.
https://hg.mozilla.org/integration/mozilla-inbound/rev/5342641913be9b90f5309a42ae226a1d340f30c6
Bug 241788 - mozilla::Tokenizer - token type for \n should whitespace if given in constructor r=honzab

https://hg.mozilla.org/integration/mozilla-inbound/rev/77f8588dad8c0349f88df7798a2e612ada487ffe
Bug 241788 - net_FilterURIString should filter \r\n\t from the entire URL r=honzab
You need to log in before you can comment on or make changes to this bug.