Closed Opened 4 years ago Closed 1 year ago

# ContentPrincipal makes assumptions about escaping of '^' in URIs that don't seem to be true

Not set
normal

## Tracking

### ()

VERIFIED FIXED
mozilla77
Tracking Status
firefox-esr68 --- wontfix
firefox55 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- verified

## Attachments

### (1 file)

 47 bytes, text/x-phabricator-request Details | Review
Testcase

<a href="http://test^test/foo^bar#x^y">Test</a>
<script>
</script>

Note that the '^' in the hostname is NOT escaped.  Same thing for .host.

As far as I can tell, ContentPrincipal::GenerateOriginNoSuffixFromURI assumes that GetAsciiHostPort() on such a URI _will_ escape '^' or something.  I suspect that this assumption is false.

(Note that Gecko does escape ^ in the path, but not in the anchor or the hostname... The good news for the former is that we strip off the anchor and query in GenerateOriginNoSuffixFromURI in the "full spec" case, so don't need to worry about those bits.)
Flags: needinfo?(bobbyholley)
On our call, bz determined that Chrome and edge seem to reject ^, and so we can probably change this spec. NI bz to fill in the details and follow up with Anne.
Flags: needinfo?(bobbyholley) → needinfo?(bzbarsky)
Note that in the standard we've been through this various times already: https://github.com/whatwg/url/issues/159 and https://github.com/whatwg/url/issues/214. Putting arbitrary restrictions on hosts is arguably not in line with the design of generic URLs and might break non-DNS systems. Chrome rejecting this particular code point makes that less of a concern (though I believe that doesn't mean they'll reject it in all code paths that involve their URL parser, so maybe it is), but what do we tell Safari about the rationale for blocking this?

I'd rather we not depend on such risky things to begin with.
So in terms of browser behavior, here is a testcase:

<pre><script>
var strs = [
"http://test^test/foo\\bar",
"http://test:2^3/foo\\bar",
"http://test/foo^bar",
];
for (var str of strs) {
var a = document.createElement("a");
a.setAttribute("href", str);
var href;
try {
href = a.href;
} catch(e) {
href = "href getter threw";
}
var url;
try {
url = (new URL(str).href);
} catch(e) {
url = "constructor threw";
}
document.writeln(str, " -- ", href, " -- ", url);
}
</script>

with a live version at https://jsfiddle.net/yb02ng3r/2/ if someone wants to see it.  Browser behavior seems to be as follows:

* Safari: Allows '^' in host or path.  Does not allow it in port; fails URL parsing in
that case (as you can tell by the URL ctor throwing on the second output line
and the '\\' not getting changed to '/' in the href).
* Firefox: Allows '^' in host.  Escapes it in path.  As Safari for port.
* Edge: Escapes '^' in host and path.  Fails parsing for '^' in port, and also
throws from href getter.
* Chrome: Escapes '^' in host via href, but throws from new URL().  Allows '^' in port
via href, but throws from new URL() (!).  Escapes '^' in path.

I believe the current spec text aligns with Safari, for what it's worth.

Anne, what we need in practice is a faithful string representation of an origin, in a slightly broader meaning than the HTML spec one.  That means a URL (but without the ref and query bits) and some other additional information.  We need a way to clearly separate the additional information from the URL.  At the time we set this up, it was pretty clear from the URL RFCs that '^' was not allowed in URLs in unescaped form.  For example, it's not in either the "reserved" or the "unreserved" set in RFC 2396, same for RFC 3986.  I _think_ the whatwg URL spec at the time also forbid it in unescaped form, though I don't have a good way to check that.

If we can't use '^' (e.g. by aligning on the Edge behavior instead of the Safari one), what can we use?  I guess we can try '>'.  This is in the userinfo and path percent-encode sets in the URL spec, is not allowed in URLs by the URI RFCs (because <> is used to delimit URLs in text), and I _think_ shouldn't be allowed in hosts, for that very same reason.  That said, I think the URL spec allows it in hosts afaict, and Safari follows the URL spec, nicely breaking every single URL linkifier since 1994 that tries to be reasonable.  :(

> what do we tell Safari about the rationale for blocking this

It aligns with the URI RFCs and the Chrome/Edge behavior (modulo Chrome's "yeah, let's have two different URL parser behaviors" mess), and is thus the quickest way to interop across a broad range of things, including but not limited to web browsers?
Flags: needinfo?(bzbarsky)
I filed https://github.com/whatwg/url/issues/291 on the problem I see with '>' in the URL spec as it stands today.  :(
Also, it would be a PITA to change from ^ to something else, because it's already embedded in the filenames of various persistent storage bits in the file system and whatnot.
If we _were_ to change the format, mystor was proposing that we just always append the ^, rather than only in the case of having originAttributes. This has the same backwards-compat issues, but at least would remove the need to find a perfect character for this.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #6)
> If we _were_ to change the format, mystor was proposing that we just always
> append the ^, rather than only in the case of having originAttributes. This
> has the same backwards-compat issues, but at least would remove the need to
> find a perfect character for this.

If we do have to make this change, it would be very nice to be able to unconditionally add a suffix to an origin URI. This would help us with:
a) Always being able to rfind the ^ or whatever other character and split on it,
b) Avoid errors where code assumes that an origin is a valid URI, and tries to parse it as such, which succeeds in most cases but fails when container tabs or private browsing is present (there is an example of this happening, for example, in SessionStorage.jsm which I am fixing in bug 1353844),
c) Easily identify origin strings vs URIs when seeing them in debug output or similar

This is only worthwhile to do if we already have to change origins due to security problems with the current implementation. Changing the format of origins will be a monumental effort so it would be nice to avoid changing it if at all possible (it is used as a literal string for comparison in a bunch of different persistent storage forms, such as IIRC indexeddb, sessionstore, and the permission manager)
> but at least would remove the need to find a perfect character for this.

Would it?  Can the originAttributes not contain '^'?

> Avoid errors where code assumes that an origin is a valid URI

Or vice versa.  I have recently had to tell someone repeatedly not to call the "from origin" principal stuff while passing it a URL instead of an origin...
U+0009 TAB, U+000A LF, or U+000C CR would work. U+0023 (#) too probably. Anything non-ASCII if that's an option. From your test results changing whether ^ escapes might be reasonable though. Hopefully Safari is okay with it.
Sec bug triage group: What is the security impact of our current implementation? Should this bug remain private?
(In reply to Anne (:annevk) from comment #9)
> U+0009 TAB, U+000A LF, or U+000C CR would work. U+0023 (#) too probably.
> Anything non-ASCII if that's an option. From your test results changing
> whether ^ escapes might be reasonable though. Hopefully Safari is okay with
> it.

I think the problem with these symbols is that we had to make sure that whatever character we used was legal to use in a filename on all of our supported platforms, as we use origins as part of paths for indexeddb IIRC.

I'd be inclined to keep it a readable character for that reason as well. # seems like a bad idea as it would be possible for a URI to look like that which could mess things up due to anchor references.

(In reply to Matt Wobensmith [:mwobensmith][:matt:] from comment #10)
> Sec bug triage group: What is the security impact of our current
> implementation? Should this bug remain private?

I'm not in said triage group but I think this bug should probably remain private at least until we're confident that we can mitigate the issues. If it was possible to get an unescaped ^ character in a website's domain it would be possible for that website to, for example, escape private browsing mode or read cookies from a different container tab.

(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #8)
> > but at least would remove the need to find a perfect character for this.
>
> Would it?  Can the originAttributes not contain '^'?

When we were originally making this decision I don't think we had a string origin attribute, but I think that we do (firstOriginDomain). I have no idea if that can contain ^ characters, but theoretically we could define a more strict encoding system for origin attributes then we do for URIs as they don't need to be web compatible.
Keywords: sec-other
Assignee: nobody → ckerschb
Status: NEW → ASSIGNED
Flags: needinfo?(ckerschb)

(In reply to Sebastian Hengst [:aryx] [limited availability until end of March](needinfo on intermittent or backout) from comment #13)

Backed out for eslint failures:

Arrrh - on it!

Flags: needinfo?(ckerschb)
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Flags: in-testsuite+
Status: RESOLVED → REOPENED
Flags: needinfo?(sstreich)
Resolution: FIXED → ---
Target Milestone: mozilla76 → ---
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77

Since the status are different for nightly and release, what's the status for beta?