Closed Bug 662215 Opened 13 years ago Closed 13 years ago

"ASSERTION: not a file URL"

Categories

(Core :: Networking, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla7
Tracking Status
firefox7 --- fixed
blocking1.9.2 --- .23+
status1.9.2 --- .23-fixed

People

(Reporter: jruderman, Assigned: jesup)

References

Details

(Keywords: assertion, testcase, Whiteboard: [sg:low][qa-])

Attachments

(7 files, 4 obsolete files)

###!!! ASSERTION: not a file URL: 'Error', file netwerk/base/src/nsStandardURL.cpp

Firefox seems to be quite confused about what kind of URL 
   d\nata:text/plain,2
is.  Some parts are treating it as a data: URL and others... aren't.
Attached file stack trace
It ends up downloaded as a file called "plain,2" containing just "2".
So we're creating an nsStandardURL with the spec "data:///text/plain,2".

The issue is that in newURI we call net_ExtractURLScheme which will stop at the '\n' and return, claiming no scheme.  Then we proceed to treat this as a relative URI (relative to a file:// URI) and call nsStandardURL::Init, which once again does not find a scheme and hence calls into nsStandardURL::Resolve.  

This is where things break.  The first thing Resolve does is filter out "\r\n\t".  Then it calls ParseURL(), which extracts a scheme from the resulting url string.

I think we should extract the scheme before we net_FilterURIString, except that can cause issues with leading whitespace....
Randell, do you want to take a look at this?
I'll take a look.  Taking bug.
Assignee: nobody → rjesup
I have a patch that results in this when you try the testcase:

Not Found

The requested URL /data:text/plain,2 was not found on this server.

No assertion, just:
WARNING: malformed uri: file /home/jesup/src/mozilla/head/netwerk/base/src/nsURLParsers.cpp, line 153

in the (debug) output.

The patch causes net_FilterURIString to not remove \n\r\t from the scheme portion of a URL.  It later will be flagged as a malformed URI by net_IsValidScheme().  (I'll need to verify all possible cases are covered, but I think they are.  There are only 3 calls to net_FilterURIString.)
Attached patch Proposed patch (obsolete) — Splinter Review
Attachment #537943 - Flags: review?(bzbarsky)
Comment on attachment 537943 [details] [diff] [review]
Proposed patch

I believe this does the wrong thing when |str| is a relative URI which doesn't contain ':' at all but does contain \n.
Attachment #537943 - Flags: review?(bzbarsky) → review-
You're right - I rewrote it to avoid any perf losses and lost the "act the way it would have if there's no ':' anywhere" aspect.  That's a little tricky without reading the string twice.

Updated patch tonight.
Attached patch Updated patch (obsolete) — Splinter Review
Updated.  A little tricky to avoid duplicating code, but well documented and fast for the normal case.

I'll need to make a testcase for the no-scheme thing before releasing, and also double-check that the : checking doesn't get broken by something like IPV6 literals (check it against the other scheme-parsing code for consistency).  So, at this point I'd like it to be looked at, but until I test/check those cases it's not ready for checkin.
Attachment #537943 - Attachment is obsolete: true
Attachment #538102 - Flags: review?(bzbarsky)
This relative URL appears to work the same in both the new patch and in 4.0.1, so that's good (the \n gets filtered)
What's your best guess for security severity here?

Would it be possible to hide a second scheme after the one delimited by \n? There'd be value in evading site filters with something that starts with some whitelisted scheme (http:) followed by what might be interpreted as javascript:. Or is it only possible to get relative URLs out of this?
I'd say low, but it's worth some more thought.

A filter might see java\nscript: and assume it's a relative URL (or an unknown scheme), but it might Resolve() to javascript: (as you mention).

However, I'm not sure I can see a real attack vector, where one part might be confused by the \n but we'd remove it and hit a different scheme.  It would have to be an external filter or one that works before (during?) DOM parsing/creation.

It is a little hard to wrap my head around the possibilities here without looking a the callers.  Also, I'd need to know where (internal) filters might be, how they work, etc.  However, I *think* the risk is pretty minor depending on what filters might be affected.  I bow to others knowledge on this.
There are multiple ways to extract the scheme:

net_ExtractURLScheme(): validates the format of a scheme (alpha[alpha|digit|+|.|-]*): as it extracts, returns MALFORMED_URI for anything that doesn't match.  Format matching is the same as net_IsValidScheme().  Result used in nsIOService::NewURI() to decide if it's relative or not, and if so Resolve() it (which caused this bug).

nsBaseURLParser::ParseURL(): searches for first colon (excluding ones after a [ or @) and the first slash (or #/?/;).  If the colon came first, that's end of scheme, otherwise there is no scheme.  Then it calls net_IsValidScheme()

nsSimpleURI::SetSpec(): uses FindChar(':'), then it calls net_IsValidScheme()

While they disagree, all use net_IsValidScheme or equivalent, so invalid schemes shouldn't get through - the problem in this bug is that they got caught until the spec was stripped before parsing, and stripping could modify the scheme portion.

Thus while unifying these would be good, fixing net_FilterURIString() should handle all the cases that cause problems.  I'm updating the patch one last time to be comparable to nsBaseURLParser::ParseURL() in deciding what to strip of \t\r\n (basically stop looking for a scheme on '/', '@' or '[' and assume it's all relative URL).
Tested against both testcases here, plus general browsing.  I'll hit a try server before checkin probably just for paranoia.
Attachment #538102 - Attachment is obsolete: true
Attachment #538102 - Flags: review?(bzbarsky)
Attachment #538193 - Flags: review?(bzbarsky)
Comment on attachment 538193 [details] [diff] [review]
Patch ready for review/checkin

I think this is still wrong... it'll find a scheme in situations where net_ExtractURLScheme won't.

I really think we should look into doing the last paragraph of comment 3....
Attachment #538193 - Flags: review?(bzbarsky) → review-
net_ExtractURLScheme() wants "(alpha[alpha|digit|+|.|-]*):" for a valid scheme (and it will skip whitespace).  Obviously any \t\r\n's will cause it to return MALFORMED_URI.

net_FilterURIString() will remote \t\r\n, but only after what it feels is a valid scheme.  It's then followed by ParseURL(), which extracts the scheme and call IsValidScheme(), which is the equivalent to netExtractURLScheme().

So the only question is does net_FilterURIString remove \t\r\n from what ParseURL will pull out as the scheme?


1) Does net_FilterURIString remove \t\r\n from what ParseURL will pull out as the scheme?

No.  net_FilterURIString uses the same logic (including relative positions of / and @) to decide where the scheme ends as nsBaseURLParser::ParseURL().  \t\r\n will not be removed if ParseURL will think they're part of the scheme.

2) net_IsValidScheme() will reject anything with \t\r\n, so Filtering will never cause something to pass as a scheme where it wouldn't without filtering (assuming leading whitespace removal).

ParseURL will identify a wider selection as a scheme than net_ExtractURLScheme(), but it takes that selection and runs it through net_IsValidScheme(), which guarantees nothing will pass ParseURL that would kick out of net_ExtractURLScheme().

Note that if anything, it's erring to the conservative side now in net_FilterURIString(), because it's not removing \t\r\n from the wider selection that might be a valid scheme.

Can you think of a case this code wouldn't handle?  It's easy to test.

The current patch has little or no perf impact, as it only backs up to change the string in the very very odd case that it finds a \t\r\n, and it doesn't add an extra parse of the scheme (which can mean walking the whole string in a relative URL case).  The simple solution from comment 3 would have some negative perf implications (such as in relative URLs).
> So the only question is does net_FilterURIString remove \t\r\n from what
> ParseURL will pull out as the scheme?

No, the second question is does net_FilterURIString fail to remove \t\r\n from stuff that wouldn't be considered as a scheme anyway?  The answer seems to be "yes" as far as I can tell.
It may leave in \t\r\n in sections that net_ExtractURLScheme() wouldn't look at (since it exits early on failure) - but ParseURL will exit with a MALFORMED_URI error after calling net_IsValidScheme(), and so it doesn't matter.  SetSpec() will return failure.  It would only be a problem if net_FilterURIString() didn't match what ParseURL is doing.

I can add comments to that effect, warning people these two parsers must remain in sync.
If randall's right that you can't hide a second scheme then this is probably low severity. Still could lead to wrong behavior though
Whiteboard: [sg:low]
So with this patch if I have the base URL http://foo.com/ and I resolve the string

  "foo\nbar:andsomething"

(where the \n is a newline) relative to it, we will leave the \n in place in net_FilterURIString, then call ParseURL(), which will throw NS_ERROR_INVALID_URI, right?  Then we treat that url, with the newline still in it, as a relative URL, correct?  That seems suboptimal.
Well, in some ways it *is* a relative URL - there is no valid scheme.  A malformed relative URL.

nsSimpleURI::SetSpec() will call ParseURL which uses Validate and thus will return MALFORMED_URI, as will nsStandardURI::SetSpec().

nsStandardURL::Resolve() will decide it's relative (because ParseURL() will fail), and build a malformed URL from the base plus the bad input, since Resolve() doesn't validate the result (or normalize it).  However, the base's scheme will still be the scheme, so there shouldn't be a security risk here.  The URL will be bad, and should at most lead to a 404 page.

Note that we don't validate any constructed relative URL, so this is no different in that way.  You can argue that that is a (different) bug, though fixing it would result in somewhat slower Resolve() performance.
> Well, in some ways it *is* a relative URL - there is no valid scheme

Sure.  That's not the problem.  Not removing the newline concerns me more.

> The URL will be bad, and should at most lead to a 404 page.

Yes, that's my concern.  ;)  Seems like the right behavior in that situation is to create the URI "http://foo.com/foobar:andsomething" (with the newline stripped out).

What do other browsers do?

> Note that we don't validate any constructed relative URL

Except we go through Resolve() on NS_NewURI calls for relative URIs, so this is about dealing with web input, not about dealing with stuff people stick in via URI APIs directly.
(In reply to comment #23)
> > Well, in some ways it *is* a relative URL - there is no valid scheme
> 
> Sure.  That's not the problem.  Not removing the newline concerns me more.
> 
> > The URL will be bad, and should at most lead to a 404 page.
> 
> Yes, that's my concern.  ;)  Seems like the right behavior in that situation
> is to create the URI "http://foo.com/foobar:andsomething" (with the newline
> stripped out).

I'm not sure there is a "right" behavior in the case of a malformed relative URL like this.  If anything, the right thing is more likely to be to terminate it at the \r or \n than our current "remove them".  I suspect "remove them" is to help users who paste in URLs that were split (by email/etc) across lines.

This case, however, is different, because it's in the scheme and that changes how we see the URL (and could make filters that look a scheme miss it) - unless we want to make it repair the scheme by removing the \n and treat it the same as normal - i.e. *let* the scheme be 'hidden'.  I don't advise that.

So if we do as you indicate, it still produces a 'bad' URL, just in a different way - the : isn't valid there - so I don't see why you have a preference for that version.

> 
> What do other browsers do?

Opened from a file: URL:

IE8: 
  absolute: data:text/plain,2 - can't load
  relative: didn't seem to try to load the iframe

(when loaded from http, it appears to create the iframe and try to load http://site/data/text/plain,2 and gets a 403 Forbidden)

Chrome:
  absolute: iframe with "2" in it
  relative: No webpage was found for the web address: file:///D:/Temp/data/text/plain,2

(When loaded from http, it appears to load http://site/data/text/plain,2 and get 403)

Chrome seemed to treat it as relative and strip both the \n and the ':'(!).  IE seemed to treat it as malformed and not even try to load it.  If I wasn't about to fall over I'd copy them to an HTTP server where I can check what is actually requested.

> > Note that we don't validate any constructed relative URL
> 
> Except we go through Resolve() on NS_NewURI calls for relative URIs, so this
> is about dealing with web input, not about dealing with stuff people stick
> in via URI APIs directly.

Right - but why does that matter?  Perhaps I'm missing something.
IE 8 and chrome both (for the relative case) fetch "GET /data/text/plain,2".  They do different things for the absolute case.
I have to say converting d\nata:text/plain,2 to http://site/data/text/plain,2 has it's own issues with it possibly avoiding filters, etc.  I do't think that in "bad" malformed URLs it should be making such large transformations - especially removing the ':'.  If they left it there, it's more likely that the website would (properly) reject the GET as malformed.  (Though perhaps leaving the : could cause problems with other schemes, like file:.) 

The best solution IMHO would be to on a malformed URI return an error for Resolve() - though we'd want to check the calls to be sure.  Again I assume the point of the \n\r\t removal is to "fix" when URLs get broken in email/etc, and are then cut-and-pasted.  This would be noticeably different than either IE or Chrome.

However, I think this patch resolves the bug as presented and remove any significant (and perhaps any) security risk.
> I suspect "remove them" is to help users who paste in URLs that were split (by
> email/etc) across lines

Yes, exactly.

> This case, however, is different, because it's in the scheme

Hmm...  The argument being that had the url not gotten split this would be seen as a scheme.  Yeah, ok.

> Right - but why does that matter?

Because that means the codepath gets hit for <a href> on web pages.  That is, that codepath is the #1 most common case of URI creation.  Hence giving it behavior web pages don't expect is bad, while giving such behavior to the URI setters may not be.

It looks like for now the other browsers don't so much agree, so probably no compat worries, which is what I was worried about.
Comment on attachment 538193 [details] [diff] [review]
Patch ready for review/checkin

r=me, but please add a test in this bug (but not checked in, until we open up the bug).
Attachment #538193 - Flags: review- → review+
Test is largely coded; one more main test to add (I'm also adding tests for other aspects of URI objects while I'm at it (IPV6 addresses, check user/password/port values, etc), and testing \n\r\t in other parts of URIs).

I'm going to commit this patch and add the tests here in the next day or so once I'm done cleaning them up.  Thanks.
Comment on attachment 538193 [details] [diff] [review]
Patch ready for review/checkin

Checked in as http://hg.mozilla.org/mozilla-central/rev/4df47729b74d
Attachment #538193 - Flags: checkin+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attached patch First cut at testcode (obsolete) — Splinter Review
I'll likely break the relevant testcase or two out of this patch and leave them as a separate patch on this bug until it's public, and commit the rest of the test_URIs.js patch in the meantime.

I plan to add a few more tests, do some cleanup, and if I feel ambitious rework the test structure to allow for a base or relative URL with ref's to be tested for adding (replacing) refs.  Right now the testcode assumes it can just concatenate a spec and #blah, which isn't the case if the spec has a #foo.

Also, I found one very minor 'bug' vs RFC 3986; the test for that is disabled.
Attached patch Small cleanup of tests patch (obsolete) — Splinter Review
Attachment #544662 - Attachment is obsolete: true
Target Milestone: --- → mozilla7
The main test patch landed already on trunk; this is the portion that applies to this bug.
Attachment #544744 - Attachment is obsolete: true
blocking1.9.2: --- → .21+
This patch applies cleanly against 1.9.2; however my build environment is gcc 4.6+ (Fedora 15), which does not like 1.9.2.

I see no reason to believe that this patch would be incorrect to land for 1.9.2.  I'm updating an older VM which has gcc 4.5.2 to see if that works better.  I have a build now (crashed my VM once building...); I'll give it a spin
Attached patch Patch for 3.6Splinter Review
Patch for 3.6; verified to produce the 'right' error; no changes from the trunk version
Attachment #560983 - Flags: review?(bzbarsky)
Comment on attachment 560983 [details] [diff] [review]
Patch for 3.6

backport r=dveditz
Approved for the 1.9.2 branch, a=dveditz
Attachment #560983 - Flags: review?(bzbarsky)
Attachment #560983 - Flags: review+
Attachment #560983 - Flags: approval1.9.2.23+
qa+ for QA fix verification on Firefox 7
Whiteboard: [sg:low] → [sg:low][qa+]
Changing to qa- as this is an assertion.
Whiteboard: [sg:low][qa+] → [sg:low][qa-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: