ContentPrincipal makes assumptions about escaping of '^' in URIs that don't seem to be true
Categories
(Core :: Security: CAPS, enhancement)
Tracking
()
People
(Reporter: bzbarsky, Assigned: ckerschb)
Details
(Keywords: sec-other, Whiteboard: [post-critsmash-triage][adv-main77-])
Attachments
(1 file)
Testcase <a href="http://test^test/foo^bar#x^y">Test</a> <script> alert(document.querySelector("a").href); </script> Note that the '^' in the hostname is NOT escaped. Same thing for .host. As far as I can tell, ContentPrincipal::GenerateOriginNoSuffixFromURI assumes that GetAsciiHostPort() on such a URI _will_ escape '^' or something. I suspect that this assumption is false. (Note that Gecko does escape ^ in the path, but not in the anchor or the hostname... The good news for the former is that we strip off the anchor and query in GenerateOriginNoSuffixFromURI in the "full spec" case, so don't need to worry about those bits.)
Comment 1•7 years ago
|
||
On our call, bz determined that Chrome and edge seem to reject ^, and so we can probably change this spec. NI bz to fill in the details and follow up with Anne.
Comment 2•7 years ago
|
||
Note that in the standard we've been through this various times already: https://github.com/whatwg/url/issues/159 and https://github.com/whatwg/url/issues/214. Putting arbitrary restrictions on hosts is arguably not in line with the design of generic URLs and might break non-DNS systems. Chrome rejecting this particular code point makes that less of a concern (though I believe that doesn't mean they'll reject it in all code paths that involve their URL parser, so maybe it is), but what do we tell Safari about the rationale for blocking this? I'd rather we not depend on such risky things to begin with.
Reporter | ||
Comment 3•7 years ago
|
||
So in terms of browser behavior, here is a testcase: <pre><script> var strs = [ "http://test^test/foo\\bar", "http://test:2^3/foo\\bar", "http://test/foo^bar", ]; for (var str of strs) { var a = document.createElement("a"); a.setAttribute("href", str); var href; try { href = a.href; } catch(e) { href = "href getter threw"; } var url; try { url = (new URL(str).href); } catch(e) { url = "constructor threw"; } document.writeln(str, " -- ", href, " -- ", url); } </script> with a live version at https://jsfiddle.net/yb02ng3r/2/ if someone wants to see it. Browser behavior seems to be as follows: * Safari: Allows '^' in host or path. Does not allow it in port; fails URL parsing in that case (as you can tell by the URL ctor throwing on the second output line and the '\\' not getting changed to '/' in the href). * Firefox: Allows '^' in host. Escapes it in path. As Safari for port. * Edge: Escapes '^' in host and path. Fails parsing for '^' in port, and also throws from href getter. * Chrome: Escapes '^' in host via href, but throws from new URL(). Allows '^' in port via href, but throws from new URL() (!). Escapes '^' in path. I believe the current spec text aligns with Safari, for what it's worth. Anne, what we need in practice is a faithful string representation of an origin, in a slightly broader meaning than the HTML spec one. That means a URL (but without the ref and query bits) and some other additional information. We need a way to clearly separate the additional information from the URL. At the time we set this up, it was pretty clear from the URL RFCs that '^' was not allowed in URLs in unescaped form. For example, it's not in either the "reserved" or the "unreserved" set in RFC 2396, same for RFC 3986. I _think_ the whatwg URL spec at the time also forbid it in unescaped form, though I don't have a good way to check that. If we can't use '^' (e.g. by aligning on the Edge behavior instead of the Safari one), what can we use? I guess we can try '>'. This is in the userinfo and path percent-encode sets in the URL spec, is not allowed in URLs by the URI RFCs (because <> is used to delimit URLs in text), and I _think_ shouldn't be allowed in hosts, for that very same reason. That said, I think the URL spec allows it in hosts afaict, and Safari follows the URL spec, nicely breaking every single URL linkifier since 1994 that tries to be reasonable. :( > what do we tell Safari about the rationale for blocking this It aligns with the URI RFCs and the Chrome/Edge behavior (modulo Chrome's "yeah, let's have two different URL parser behaviors" mess), and is thus the quickest way to interop across a broad range of things, including but not limited to web browsers?
Reporter | ||
Comment 4•7 years ago
|
||
I filed https://github.com/whatwg/url/issues/291 on the problem I see with '>' in the URL spec as it stands today. :(
Comment 5•7 years ago
|
||
Also, it would be a PITA to change from ^ to something else, because it's already embedded in the filenames of various persistent storage bits in the file system and whatnot.
Comment 6•7 years ago
|
||
If we _were_ to change the format, mystor was proposing that we just always append the ^, rather than only in the case of having originAttributes. This has the same backwards-compat issues, but at least would remove the need to find a perfect character for this.
Comment 7•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #6) > If we _were_ to change the format, mystor was proposing that we just always > append the ^, rather than only in the case of having originAttributes. This > has the same backwards-compat issues, but at least would remove the need to > find a perfect character for this. If we do have to make this change, it would be very nice to be able to unconditionally add a suffix to an origin URI. This would help us with: a) Always being able to rfind the `^` or whatever other character and split on it, b) Avoid errors where code assumes that an origin is a valid URI, and tries to parse it as such, which succeeds in most cases but fails when container tabs or private browsing is present (there is an example of this happening, for example, in SessionStorage.jsm which I am fixing in bug 1353844), c) Easily identify origin strings vs URIs when seeing them in debug output or similar This is only worthwhile to do if we already have to change origins due to security problems with the current implementation. Changing the format of origins will be a monumental effort so it would be nice to avoid changing it if at all possible (it is used as a literal string for comparison in a bunch of different persistent storage forms, such as IIRC indexeddb, sessionstore, and the permission manager)
Reporter | ||
Comment 8•7 years ago
|
||
> but at least would remove the need to find a perfect character for this. Would it? Can the originAttributes not contain '^'? > Avoid errors where code assumes that an origin is a valid URI Or vice versa. I have recently had to tell someone repeatedly not to call the "from origin" principal stuff while passing it a URL instead of an origin...
Comment 9•7 years ago
|
||
U+0009 TAB, U+000A LF, or U+000C CR would work. U+0023 (#) too probably. Anything non-ASCII if that's an option. From your test results changing whether ^ escapes might be reasonable though. Hopefully Safari is okay with it.
Comment 10•7 years ago
|
||
Sec bug triage group: What is the security impact of our current implementation? Should this bug remain private?
Comment 11•7 years ago
|
||
(In reply to Anne (:annevk) from comment #9) > U+0009 TAB, U+000A LF, or U+000C CR would work. U+0023 (#) too probably. > Anything non-ASCII if that's an option. From your test results changing > whether ^ escapes might be reasonable though. Hopefully Safari is okay with > it. I think the problem with these symbols is that we had to make sure that whatever character we used was legal to use in a filename on all of our supported platforms, as we use origins as part of paths for indexeddb IIRC. I'd be inclined to keep it a readable character for that reason as well. # seems like a bad idea as it would be possible for a URI to look like that which could mess things up due to anchor references. (In reply to Matt Wobensmith [:mwobensmith][:matt:] from comment #10) > Sec bug triage group: What is the security impact of our current > implementation? Should this bug remain private? I'm not in said triage group but I think this bug should probably remain private at least until we're confident that we can mitigate the issues. If it was possible to get an unescaped ^ character in a website's domain it would be possible for that website to, for example, escape private browsing mode or read cookies from a different container tab. (In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #8) > > but at least would remove the need to find a perfect character for this. > > Would it? Can the originAttributes not contain '^'? When we were originally making this decision I don't think we had a string origin attribute, but I think that we do (firstOriginDomain). I have no idea if that can contain ^ characters, but theoretically we could define a more strict encoding system for origin attributes then we do for URIs as they don't need to be web compatible.
Assignee | ||
Comment 12•4 years ago
|
||
Updated•4 years ago
|
Comment 13•4 years ago
|
||
Landed: https://hg.mozilla.org/integration/autoland/rev/21aedccf1abbfabba2989e00fb3856bcbb12b871
Backed out for eslint failures:
https://hg.mozilla.org/integration/autoland/rev/d4a9dcd3f6e4e90c4acbfa2dd017d0ddc0158ab4
Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&selectedJob=293479753&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=21aedccf1abbfabba2989e00fb3856bcbb12b871
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=293482540&repo=autoland
Assignee | ||
Comment 14•4 years ago
|
||
(In reply to Sebastian Hengst [:aryx] [limited availability until end of March](needinfo on intermittent or backout) from comment #13)
Backed out for eslint failures:
Arrrh - on it!
Comment 15•4 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/1ed0cfd336ddda0d6ba49818a36f6847569cd9bf
https://hg.mozilla.org/mozilla-central/rev/1ed0cfd336dd
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Backed out for conflict with backed out bug 1620624:
https://hg.mozilla.org/integration/autoland/rev/e72622db09a1ee360b89b7f54b8e4c38a86e2d5e
Comment 17•4 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/f2c995169b5c3337d3d73a664fce95eb0fe6b829
https://hg.mozilla.org/mozilla-central/rev/f2c995169b5c
Comment 18•4 years ago
|
||
Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 19•4 years ago
|
||
Reproduced the behavior before the fix following the test case from the description, with an old Nightly build 56.0a1 (build id: 20170731100325) using Windows 10.
Verified - Fixed in Nightly 77.0a1 (build id: 20200423095248) and latest Nightly 78.0a1 (build id: 20200505094621).
Updated•4 years ago
|
Updated•3 years ago
|
Description
•