Closed Bug 1389251 Opened 7 years ago Closed 7 years ago

data: URIs with spaces around MIME type are rejected in Firefox but not in other browsers

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: dholbert, Assigned: valentin)

References

Details

(Whiteboard: [necko-triaged][webcompat:p1])

User Story

Business driver: Achieve tier-1 Google Search experience for Gecko on Android

Note: Selena would like a security review here. bz guesses safety from a compat standpoint.

Attachments

(7 files, 1 obsolete file)

STR: 1. Load attached testcase -- or just load this data URI: data: text/html,<h1>Large text (Edge won't accept data URIs in the URL bar -- hence, including a testcase with an iframe as well, for Edge testability.) EXPECTED RESULTS: Browser should render the HTML, showing "Large text". ACTUAL RESULTS: Browser doesn't render the HTML -- it treats it as having an unrecognized media type, I think, which prompts a file download. This is a result of the space character between "data:" and "text/html". I think we might be correct according to https://tools.ietf.org/html/rfc2397#section-3 but it seems other browsers disagree with us, so from a practical perspective we probably need to align with the other browsers' behavior.
Attached file testcase 1
We really need a sane spec for data: parsing.
Flags: needinfo?(annevk)
(Note that we also have related bug 1104311 and bug 1388594, but it looks to me like those are about whitespace in the payload -- i.e. the content after the "," -- rather than whitespace in the media type.)
Summary: data: URIs with space after "data: " are rejected in Firefox but not in other browsers → data: URIs with space between "data:" and the media type are rejected in Firefox but not in other browsers
These browsers all give "EXPECTED RESULTS" (rendering the "Large text" html in the attached testcase): Chrome 60 (Mac) Safari 10.1 (Mac) Edge 15 (Win10) Opera 46 (Mac) Opera 12.15 (Mac) Only Firefox gives "ACTUAL RESULTS".
(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #3) > We really need a sane spec for data: parsing. True -- but for this proximal issue at least, it seems like we should strip whitespace between the colon and the media type. Just testing in Chrome, it seems like that's what they do, at least. (That's the only place they're permissive -- spaces that occur *within* the media type are preserved and break the rendering -- e.g. these are all treated as non-HTML, and match what happens if I provide a bogus media type: data:t ext/html,<h1>Large text data:text/ html,<h1>Large text data:text/h tml,<h1>Large text
I'm going to assign this to kahs to work on when he's done with the bug he's on right now. I'm not 100% sure, but I think the work here will be in: https://dxr.mozilla.org/mozilla-central/rev/4c5fbf49376351679dcc49f4cff26c3c2e055ccc/netwerk/protocol/data/nsDataHandler.cpp#157
Assignee: nobody → kevin.hsieh
Actually I guess we already have parsed |contentType| at that point, so it'd be a bit earlier.
What about: data:text/html ,<h1>Large text ? If the handle that as HTML, I guess the point is that we'd just want to trim whitespace around our tentative media type.
Attached file testcase 2
(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #9) > What about: > > data:text/html ,<h1>Large text Good question. Chrome and Edge do treat that as HTML, yes. (I've encoded that in "testcase 2" for testing in Edge.) > If the handle that as HTML, I guess the point is that we'd just want to > trim whitespace around our tentative media type. Yeah, that makes sense to me.
FWIW, I started writing some tests to have something to back an eventual specification and it doesn't seem like Chrome strips all whitespace. U+0009, U+000A, and U+000D are already gone due to the URL parser. U+000C is left in by Chrome and U+0020 is stripped. So e.g. data:%0Ctext/html,x ends up with a Content-Type of "%0ctext/html" (which results in a download when you navigate to it). Same happens (including the Content-Type value, seems they percent encode stuff) for data:\x0Ctext/html,x assuming you replace \x0C with a U+000C literal. If you're interested in my tests I posted a copy at https://gist.github.com/annevk/4287452653921b2b7de35e4208b4a985 and they can be run using web-platform-tests (at same URL with .js replaced with .html). Anything worth standardizing can be noted here: https://github.com/whatwg/fetch/issues/234.
Flags: needinfo?(annevk)
(By the way, one thing that would be nice to fix is to use "text/plain;charset=US-ASCII" as fallback for unrecognized MIME types rather than "text/plain". We already use that as default MIME type and there's no real reason for them to be different I think. US-ASCII is a little dated and actually ends up meaning windows-1252, but whatever, it's interoperable.)
(In reply to Daniel Holbert [:dholbert] from comment #7) > I'm going to assign this to kahs to work on when he's done with the bug he's > on right now. (Cancel that - he's gonna work on some ruby bugs instead. Unassigning -- feel free to take, though maybe it would be wise to wait to see how the standardizing shakes out.)
Assignee: kevin.hsieh → nobody
Sticking this in next for now (to keep it on our radar), but based on comment 14 we may want to hold off on doing any real work until the spec solidifies?
Whiteboard: [necko-next]
https://github.com/whatwg/fetch/pull/579 has my work in progress on a better specification. Given the tests I've written thus far I think that fixing this by trimming leading and trailing whitespace from the MIME type portion is fine. There's more work to be done on data: URL processing, but we can do it incrementally (unless there's a good reason not to that I don't know about).
Summary: data: URIs with space between "data:" and the media type are rejected in Firefox but not in other browsers → data: URIs with spaces around MIME type are rejected in Firefox but not in other browsers
Blocks: 1392241
Priority: -- → P2
BTW, it seems like we get this right IF the data URI has ";base64" after the mimetype, as shown at e.g. https://software.hixie.ch/utilities/js/live-dom-viewer/?saved=5761 (I believe that means "data: image/svg+xml;base64,..." would work, even though "data: image/svg+xml,..." is broken per my most recent attachment. I can't test the ;base64 version with a useful URL at the moment, since data URI kitchen (my preferred base64-data-URI-encoder) seems to not be working for me right now.)
Note: https://software.hixie.ch/utilities/cgi/data/data is broken by design. We no longer support redirects to data: URLs.
Flags: webcompat+
Whiteboard: [necko-next] → [necko-next][webcompat:p1]
This is silly. We already have code that tries to strip whitespace from the content type. It's just not working right because it runs prior to unescaping, so at that point the whitespace is %20, not ' '. This was probably missed in the patches for bug 391951. That said, it seems to me that https://fetch.spec.whatwg.org/#data-urls step 6 would have the same problem, unless the URL spec allows unescaped literal ' ' in URLs? Which maybe it does in this case because https://url.spec.whatwg.org/#cannot-be-a-base-url-path-state uses https://url.spec.whatwg.org/#c0-control-percent-encode-set which does not include U+0020. This is unlike https://url.spec.whatwg.org/#path-state which uses https://url.spec.whatwg.org/#path-percent-encode-set which _does_ include U+0020. In our case, I think we don't really have this concept of "cannot be a base URL path state". We could try to hack around this in the data handler, but do we have an ETA on having our URL parsing just match the URL spec?
Flags: needinfo?(valentin.gosu)
User Story: (updated)
User Story: (updated)
So, the main problem here is that while nsDataHandler::ParsePathWithoutRef does strip the whitespace from the content type, it doesn't change the original spec, so the URI we end up creating will still percent encode the space, without stripping it. (In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #21) > We could try to hack around this in the data handler, but do we have an ETA > on having our URL parsing just match the URL spec? No. I'll try and dedicate more time to this, but it'll still be quite a long process. The assumptions our internal code makes about URIs usually create roadblocks in this undertaking.
Assignee: nobody → valentin.gosu
Flags: needinfo?(valentin.gosu)
Whiteboard: [necko-next][webcompat:p1] → [necko-triaged][webcompat:p1]
So, I tried to make ParsePathWithoutRef strip whitespace from the initial URI MIME type. Apart from the issue that it actually changed the URI, it turned out to be much more complicated than I expected. So I looked more carefully into :bz's suggestion, and _obviously_ it was a much better fix to the problem. (/me makes mental note for the future: always try Bz's solution first) The only issue I've got now is that browser_webconsole_context_menu_copy_entire_message.js is failing, but I will ask jdescottes to help me with it. Now I just need to audit a few other calls to NS_EscapeURL(esc_OnlyNonASCII), to make sure spaces aren't an issue.
This change makes nsEscape::T_EscapeURL not escape spaces when passed esc_OnlyNonASCII. This fixes a web-compat issue for URLs such as "javascript: alert('hello');" and the fact that data: URIs with spaces around MIME type are rejected.
Attachment #8983407 - Flags: review?(bzbarsky)
Hi Julian, I was wondering if you could help me with this failure in browser_webconsole_context_menu_copy_entire_message.js: https://treeherder.mozilla.org/logviewer.html#?job_id=181731693&repo=try&lineNumber=10729 The patch above makes data: URIs no longer encode spaces, which I guess affects this test because it loads a data URI. I can get the test to pass if I remove all the "The last line is an empty new line" checks. Can you help me figure out if the new behaviour is correct, or otherwise what changes need to be made? I used this patch to figure out what is different: https://pastebin.mozilla.org/9087038 And got these outputs: before: https://pastebin.mozilla.org/9087039 after: https://pastebin.mozilla.org/9087040 So whereas before the clipboard text was ```17:26:32.299 simple text message data:<script> window.logStuff = function () { console.log("simple text m:1:38 ``` It is now: ```17:32:04.213 console.trace() data:<script> window.logStuff = function () { console.log("simple text message"); function wr:1:103 wrapper data:text/html;charset=utf-8,<script> window.logStuff = function () { console.log("simple text message"); function wrapper() { console.trace(); } wrapper(); };</script>:1:103 window.logStuff data:text/html;charset=utf-8,<script> window.logStuff = function () { console.log("simple text message"); function wrapper() { console.trace(); } wrapper(); };</script>:1:128 ``` Thanks!
Flags: needinfo?(jdescottes)
I don't think your patch breaks the behavior, it's just that this test is badly designed. So the first issue is that a lot of asserts in this test are wrong: ok(lines.length, 2, "There are 2 lines in the copied text"); The intent was to check if lines.length was equal to 2. We should have used is(lines.length, 2, ...). Sorry about that. Now the real issue is that with your patch, the test clicks on the second message rather than on the first. If you look at the "location" of the messages, before your patch we have data:<script> window.logStuff = function () { console.log("simple text m:1:38 And after your patch data:<script> window.logStuff = function () { console.log("simple text message"); function wr:1:38 (logic to shorten source locations is at https://searchfox.org/mozilla-central/rev/3737701cfab93ccea04c0e9cab211ad10f931d87/devtools/client/shared/source-utils.js#117-184 , now that spaces are no longer escaped, more "text" is displayed) And in our test, we find messages (via findMessage()) by matching textContent. For the first message we look for "simple text message", which is now part of the origin for both the first and the second message. To fix this, several options: 1/ pass a selector to findMessage() which will force it to look only in the text of the messages: let message = await waitFor(() => findMessage(hud, "simple text message", ".message-body")); message = message.closest(".message"); ( we still need to go back to the .message to copy the expected string) https://pastebin.mozilla.org/9087101 2/ modify the test source to avoid having "simple text message" in the location, eg console.log("simple " + "text message"); Let me know if this fixes the issue.
Flags: needinfo?(jdescottes)
(In reply to Julian Descottes [:jdescottes][:julian] from comment #29) > 2/ modify the test source to avoid having "simple text message" in the > location, eg console.log("simple " + "text message"); > > Let me know if this fixes the issue. Thanks a lot! This last solution fixed it. I also updated the test to use is(lines.length...) instead of ok()
See Also: → 1467407
(In reply to Valentin Gosu [:valentin] from comment #30) > (In reply to Julian Descottes [:jdescottes][:julian] from comment #29) > > 2/ modify the test source to avoid having "simple text message" in the > > location, eg console.log("simple " + "text message"); > > > > Let me know if this fixes the issue. > > Thanks a lot! This last solution fixed it. > I also updated the test to use is(lines.length...) instead of ok() Thanks for cleaning this up! FYI, I am fixing other similar issues in Bug 1467407.
Comment on attachment 8984204 [details] Bug 1389251 - Add esc_Spaces that may be used to force escaping of spaces https://reviewboard.mozilla.org/r/250006/#review256316 Looks good to me, thanks!
Attachment #8984204 - Flags: review?(jdescottes) → review+
Attachment #8983407 - Attachment is obsolete: true
Attachment #8983407 - Flags: review?(bzbarsky)
Comment on attachment 8984203 [details] Bug 1389251 - Do not escape spaces in nsSimpleURI https://reviewboard.mozilla.org/r/250004/#review256808 Sorry for the lag here.... I have a few questions/concerns: 1) Do all the current consumers of esc_OnlyNonASCII (including the HTTP ones) want the new behavior? 2) Does the change the behavior of data: URIs that _do_ have %20 in them? I'd assume not, but wanted ot check. I'm a little surprised there aren't other URL constructor tests in wpt for this "space allowed in some URIs but not others" thing....
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #35) > Comment on attachment 8984203 [details] > Bug 1389251 - Do not escape spaces in nsSimpleURI > > https://reviewboard.mozilla.org/r/250004/#review256808 > > Sorry for the lag here.... > > I have a few questions/concerns: > > 1) Do all the current consumers of esc_OnlyNonASCII (including the HTTP > ones) want the new behavior? I went through all of them - the HTTP redirect ones for example then build a HTTP URL with the path, so it gets properly escaped afterwards. It might be better to escape it anyway, just in case we change stuff in the future, or if there are code paths I'm missing. I added an esc_Spaces flag that forces space escape. I use it for all calls where we _might_ want to escape spaces. Patch incoming. > 2) Does the change the behavior of data: URIs that _do_ have %20 in them? > I'd assume not, but wanted ot check. It doesn't. I added some tests. > I'm a little surprised there aren't other URL constructor tests in wpt for > this "space allowed in some URIs but not others" thing.... It seems one more was recently added. :) I agree that there are quite few.
Comment on attachment 8984920 [details] Bug 1389251 - Fix browser_webconsole_context_menu_copy_entire_message.js so findMessage doesn't match on the wrong thing https://reviewboard.mozilla.org/r/250716/#review257204
Attachment #8984920 - Flags: review?(jdescottes) → review+
Attachment #8984203 - Flags: review?(bzbarsky) → review+
Comment on attachment 8984204 [details] Bug 1389251 - Add esc_Spaces that may be used to force escaping of spaces https://reviewboard.mozilla.org/r/250006/#review257328 ::: netwerk/base/nsSimpleURI.cpp:515 (Diff revision 2) > // string (and will result in .spec and .path having a terminal #). > nsresult > nsSimpleURI::SetRef(const nsACString &aRef) > { > nsAutoCString ref; > - nsresult rv = NS_EscapeURL(aRef, esc_OnlyNonASCII, ref, fallible); > + nsresult rv = NS_EscapeURL(aRef, esc_OnlyNonASCII | esc_Spaces, ref, fallible); Are there web platform tests for this? If not, should there be? Does the new behavior match the other browsers?
Attachment #8984204 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] (Vacation Jun 16-24) (no decent commit message means r-) from comment #42) > Comment on attachment 8984204 [details] > Bug 1389251 - Add esc_Spaces that may be used to force escaping of spaces > > https://reviewboard.mozilla.org/r/250006/#review257328 > > ::: netwerk/base/nsSimpleURI.cpp:515 > (Diff revision 2) > > // string (and will result in .spec and .path having a terminal #). > > nsresult > > nsSimpleURI::SetRef(const nsACString &aRef) > > { > > nsAutoCString ref; > > - nsresult rv = NS_EscapeURL(aRef, esc_OnlyNonASCII, ref, fallible); > > + nsresult rv = NS_EscapeURL(aRef, esc_OnlyNonASCII | esc_Spaces, ref, fallible); > > Are there web platform tests for this? If not, should there be? Does the > new behavior match the other browsers? Just a couple, but it's good enough for now. We match Chrome - I haven't tested with the others. https://searchfox.org/mozilla-central/rev/42930ab9634ebf3f62aed60f7d1c1bf25c0bf00c/testing/web-platform/tests/url/resources/urltestdata.json#153,168
> Just a couple, but it's good enough for now. OK. So those tests fail with just the first patch?
(In reply to Boris Zbarsky [:bz] (Vacation Jun 16-24) (no decent commit message means r-) from comment #44) > > Just a couple, but it's good enough for now. > > OK. So those tests fail with just the first patch? Yes.
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/6e89a11ae28e Do not escape spaces in nsSimpleURI r=bz https://hg.mozilla.org/integration/autoland/rev/c6baebf7b34c Add esc_Spaces that may be used to force escaping of spaces r=bz,jdescottes https://hg.mozilla.org/integration/autoland/rev/a76a3251a9d2 Fix browser_webconsole_context_menu_copy_entire_message.js so findMessage doesn't match on the wrong thing r=jdescottes
Backed out 3 changesets (bug 1389251) for browser-chrome failure on browser/base/content/test/pageinfo/browser_pageinfo_image_info.js. CLOSED TREE Log: https://treeherder.mozilla.org/logviewer.html#?job_id=183288028&repo=autoland&lineNumber=1082 03:03:50 INFO - 6 INFO TEST-OK | browser/base/content/test/pageinfo/browser_pageinfo_firstPartyIsolation.js | took 592ms 03:03:50 INFO - 7 INFO checking window state 03:03:50 INFO - 8 INFO TEST-START | browser/base/content/test/pageinfo/browser_pageinfo_image_info.js 03:04:35 INFO - TEST-INFO | started process screenshot 03:04:35 INFO - TEST-INFO | screenshot: exit 0 03:04:35 INFO - Buffered messages logged at 03:03:50 03:04:35 INFO - 9 INFO Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "data:text/html,<style type='text/css'>%23test-image,%23not-test-image {background-image: url('about:logo?c');}</style><img src='about:logo?b' height=300 width=350 alt=2 id='not-test-image'><img src='about:logo?b' height=300 width=350 alt=2><img src='about:logo?a' height=200 width=250><img src='about:logo?b' height=200 width=250 alt=1><img src='about:logo?b' height=100 width=150 alt=2 id='test-image'>" line: 0}] 03:04:35 INFO - Buffered messages finished 03:04:35 ERROR - 10 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/pageinfo/browser_pageinfo_image_info.js | Test timed out - 03:04:35 INFO - GECKO(1016) | MEMORY STAT | vsize 701MB | vsizeMaxContiguous 626MB | residentFast 196MB | heapAllocated 75MB 03:04:35 INFO - 11 INFO TEST-OK | browser/base/content/test/pageinfo/browser_pageinfo_image_info.js | took 45027ms 03:04:35 INFO - Not taking screenshot here: see the one that was previously logged 03:04:35 ERROR - 12 INFO TEST-UNEXPECTED-FAIL | browser/base/content/test/pageinfo/browser_pageinfo_image_info.js | Found a tab after previous test timed out: data:text/html,<style type='text/css'>%23test-image,%23not-test-image {background-image: url('about:logo?c');}</style><img src='about:logo?b' height=300 width=350 alt=2 id='not-test-image'><img src='about:logo?b' height=300 width=350 alt=2><img src='about:logo?a' height=200 width=250><img src='about:logo?b' height=200 width=250 alt=1><img src='about:logo?b' height=100 width=150 alt=2 id='test-image'> - 03:04:35 INFO - 13 INFO checking window state 03:04:35 INFO - 14 INFO TEST-START | browser/base/content/test/pageinfo/browser_pageinfo_images.js 03:04:35 INFO - GECKO(1016) | MEMORY STAT | vsize 704MB | vsizeMaxContiguous 626MB | residentFast 203MB | heapAllocated 78MB 03:04:35 INFO - 15 INFO TEST-OK | browser/base/content/test/pageinfo/browser_pageinfo_images.js | took 223ms 03:04:35 INFO - 16 INFO checking window state 03:04:35 INFO - 17 INFO TEST-START | browser/base/content/test/pageinfo/browser_pageinfo_permissions.js 03:04:37 INFO - GECKO(1016) | MEMORY STAT | vsize 715MB | vsizeMaxContiguous 626MB | residentFast 214MB | heapAllocated 84MB 03:04:37 INFO - 18 INFO TEST-OK | browser/base/content/test/pageinfo/browser_pageinfo_permissions.js | took 1530ms 03:04:37 INFO - 19 INFO checking window state Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=a76a3251a9d242186b5de495e4388e56390b8fc8&filter-searchStr=browser%20chrome&selectedJob=183288091 Backout: https://hg.mozilla.org/integration/autoland/rev/9ca74dd454400190188c6646cf94441f198bbc24
Flags: needinfo?(valentin.gosu)
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/0c2c45a2a22d Do not escape spaces in nsSimpleURI r=bz https://hg.mozilla.org/integration/autoland/rev/180bf1c0385b Add esc_Spaces that may be used to force escaping of spaces r=bz,jdescottes https://hg.mozilla.org/integration/autoland/rev/58d566f57fb0 Fix browser_webconsole_context_menu_copy_entire_message.js so findMessage doesn't match on the wrong thing r=jdescottes
Flags: needinfo?(valentin.gosu)
Could this be tested from a QA perspective using any testcase? If not could it be marked as verified and fixed? Thanks.
Well, comment 0 it a very easy way of verifying it :) It didn't use to work before.
Status: RESOLVED → VERIFIED
Regressions: CVE-2021-43541
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: