1388594 - Should we strip whitespace for non-text and non-base64 data: URI?

Yoshi Cheng-Hao Huang [:allstars.chh][:allstarschh][:yoshi]

Reporter

Description

•

8 years ago

For data: URIs, we strip whitespace unless the type is text: http://searchfox.org/mozilla-central/rev/0f16d437cce97733c6678d29982a6bcad49f817b/netwerk/protocol/data/nsDataHandler.cpp#92

This was done in bug 391951 and bug 390126.

However, I've already found some failures in wpt where the tests expect whitespace to be kept, like http://searchfox.org/mozilla-central/rev/0f16d437cce97733c6678d29982a6bcad49f817b/testing/web-platform/tests/XMLHttpRequest/data-uri.htm#35

In this case the mimeType is image/png, so we strip whitespace. We fail the test because the result is 'Hello,World!' instead of the expected 'Hello, World!'.

There's another wpt I ran into: https://github.com/w3c/web-platform-tests/blob/af610fabf05f1761321e41b031cc71ae9840bdc0/workers/data-url.html#L53

The line 'else postMessage(...)' gets parsed as 'elsepostMessage(...)', and we threw a ReferenceError (I've fixed that in our bug 1340974).

As we're fixing bug 1324406, I'd like to know in which cases we should strip whitespace.

Bz, I found you reviewed bug 391951 and bug 390126; could you suggest what we should do here?

Thanks
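The two failures above can be reproduced with a small model of the behavior described. This is only a sketch, not the nsDataHandler code: the function name `legacy_body`, the "contains text" check, the whitespace set, and the `application/javascript` type used for the worker case are all assumptions made for illustration.

```python
from urllib.parse import unquote_to_bytes

# Assumption: the set of characters treated as whitespace in this sketch.
ASCII_WHITESPACE = "\t\n\f\r "


def legacy_body(media_type: str, data: str) -> bytes:
    """Hypothetical model: strip whitespace unless the type looks like text,
    then percent-decode what is left."""
    if "text" not in media_type:
        data = "".join(ch for ch in data if ch not in ASCII_WHITESPACE)
    return unquote_to_bytes(data)


# image/png is not a text type, so the space is lost.
print(legacy_body("image/png", "Hello, World!"))
# A text type keeps the space.
print(legacy_body("text/plain", "Hello, World!"))
# A script body loses the space between 'else' and the call.
print(legacy_body("application/javascript", "else postMessage(1)"))
```

Under this model the decoded bytes depend on the MIME type, which is the coupling the wpt expectations run into.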

Robert Longson [:longsonr]

Comment 1

•

8 years ago

FWIW The data from bug 390126 was taken from the w3c SVG testsuite i.e. https://www.w3.org/Graphics/SVG/Test/20061213/svggen/filters-blend-01-b.svg

Boris Zbarsky [:bzbarsky]

Comment 2

•

8 years ago

So... In spec terms, the syntax for data: URIs is originally given in https://tools.ietf.org/html/rfc2397#section-3 as follows:

    dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
    mediatype := [ type "/" subtype ] *( ";" parameter )
    data := *urlchar

and "urlchar" is claimed to come from https://tools.ietf.org/html/rfc2396 but that RFC doesn't actually define that production. Looks like there's an erratum at https://www.rfc-editor.org/errata/eid2045 that says this should actually be:

    data := *uric

which in RFC2396 is defined as:

    uric = reserved | unreserved | escaped
    reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
    unreserved = alphanum | mark
    alphanum = alpha | digit
    mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
    escaped = "%" hex hex
    hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"

I left out alpha and digit, which are [a-zA-Z0-9].

The upshot of all this is that whitespace is not valid in a data: URI, so if it's in there at all we're now in error-recovery mode.

There's an attempt at https://simonsapin.github.io/data-urls/ to address some of the ambiguities of RFC 2397, but it doesn't seem to include this bit. I'd guess that either the fetch spec or https://simonsapin.github.io/data-urls/ should describe what should happen here.
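To make the "whitespace is not valid" conclusion concrete, the `uric` production quoted above can be transcribed into a regular expression. This is a rough sketch; the names `URIC_RUN` and `is_uric_only` are invented here, and the pattern covers only the characters listed in the grammar above.

```python
import re

# reserved | unreserved (alphanum | mark) | escaped ("%" hex hex)
URIC_RUN = re.compile(r"(?:[;/?:@&=+$,A-Za-z0-9\-_.!~*'()]|%[0-9A-Fa-f]{2})*")


def is_uric_only(data: str) -> bool:
    """Return True if the data part consists solely of uric characters."""
    return URIC_RUN.fullmatch(data) is not None


print(is_uric_only("Hello,%20World!"))  # escaped space
print(is_uric_only("Hello, World!"))    # literal space
```

A literal space fails the match while `%20` passes, so any data: URI containing raw whitespace is already outside the grammar.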

Flags: needinfo?(simon.sapin)

Flags: needinfo?(annevk)

Anne (:annevk)

Comment 3

•

8 years ago

So the question is why Chrome and Firefox both pass the SVG test, but only Chrome passes the data:image/png,Hello World! test? Does Chrome only strip spaces for base64? (And yeah, at some point someone should write the definitive algorithm for data URLs. I can probably have another go at it if we think it's important, but I suspect it might take a while given all the tests that would need to be written.)

Flags: needinfo?(annevk)

Simon Sapin (:SimonSapin)

Comment 4

•

8 years ago

Whitespace is invalid per RFC 2396, but that RFC doesn’t say what to do with invalid inputs. (Non-ASCII characters for example are UTF-8-encoded then percent-encoded, if I remember correctly.) Base 64 decoding can be thought of as an additional step after URL parsing (whether or not it’s implemented that way). I think it makes sense to ignore whitespace in the former, but not the latter. Then again, interop is often not about what makes sense…

Flags: needinfo?(simon.sapin)

Anne (:annevk)

Comment 5

•

8 years ago

When exactly does the whitespace stripping happen inside image/png? I tried to reproduce with data:image/png,X%20X, but the byte output of that is always 0x58 0x20 0x58 as far as I can tell. Is it specific to XMLHttpRequest somehow?

Boris Zbarsky [:bzbarsky]

Comment 6

•

8 years ago

> I tried to reproduce with data:image/png,X%20X

The whitespace stripping happens before URI unescaping. You'd need a data: URI string with an actual whitespace in it.
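The ordering matters for the test case in question, and a short sketch shows why. Both helper names below are hypothetical, and only U+0020 is stripped to keep the example small.

```python
from urllib.parse import unquote_to_bytes


def strip_then_unescape(data: str) -> bytes:
    """Order described in this comment: drop literal spaces, then percent-decode."""
    return unquote_to_bytes(data.replace(" ", ""))


def unescape_then_strip(data: str) -> bytes:
    """The opposite order, shown only for contrast."""
    return unquote_to_bytes(data).replace(b" ", b"")


print(strip_then_unescape("X%20X").hex(" "))  # %20 is decoded after stripping
print(strip_then_unescape("X X").hex(" "))    # the literal space is removed
print(unescape_then_strip("X%20X").hex(" "))  # contrast: decoded space removed
```

With stripping first, `X%20X` still yields 0x58 0x20 0x58, which matches what was observed; only a literal space in the URI string is affected.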

Anne (:annevk)

Comment 7

•

8 years ago

Chrome also strips whitespace. It seems that Edge and Safari do not strip whitespace. I think the Edge/Safari behavior is better as it doesn't rely on interpreting the MIME type to produce a byte sequence for the body.

I also tested stripping for base64. It seems only Edge and Firefox strip U+000C (FF; \f). Everyone strips U+0020. (Note that U+0009, U+000A, and U+000D are already stripped by the URL parser for all URLs.) I think what Edge and Firefox do is reasonable for base64. If we're going to strip we might as well strip all known ASCII whitespace.

https://github.com/whatwg/fetch/issues/234 is the tracking issue for a more proper specification.
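As a quick illustration of which whitespace characters are still in play once the URL parser has run, here is a sketch using only the code points named in this comment. The names `URL_PARSER_REMOVES`, `after_url_parser`, and `left_for_data_url_layer` are made up for the example.

```python
# ASCII whitespace: U+0009, U+000A, U+000C, U+000D, U+0020.
ASCII_WHITESPACE = {"\t", "\n", "\f", "\r", " "}
# Per the note above, the URL parser already drops these for all URLs.
URL_PARSER_REMOVES = {"\t", "\n", "\r"}


def after_url_parser(text: str) -> str:
    """Model only the tab/newline removal done by the URL parser."""
    return "".join(ch for ch in text if ch not in URL_PARSER_REMOVES)


# What a data: URL handler can still encounter and must decide about.
left_for_data_url_layer = ASCII_WHITESPACE - URL_PARSER_REMOVES

print(sorted(hex(ord(ch)) for ch in left_for_data_url_layer))
print(repr(after_url_parser("a\tb\nc\rd\fe f")))
```

Only U+000C and U+0020 remain, which is why those are the two characters where the tested browsers can differ.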

Anne (:annevk)

Comment 8

•

8 years ago

(And as far as I can tell data URL base64 reuses the window.atob() algorithm which already discards ASCII whitespace so we wouldn't have to do anything special there.)
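A minimal sketch of a whitespace-discarding base64 decode in the spirit of window.atob(), as I understand its steps: remove ASCII whitespace, tolerate missing padding, and reject anything outside the base64 alphabet. The function name `forgiving_base64_decode` is chosen for this example and the code is not taken from any browser.

```python
import base64
import re
from typing import Optional


def forgiving_base64_decode(data: str) -> Optional[bytes]:
    """Decode base64 after discarding ASCII whitespace; None means failure."""
    # Discard all ASCII whitespace, including U+000C.
    data = re.sub(r"[\t\n\f\r ]", "", data)
    # Drop one or two trailing '=' when the length is a multiple of four.
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    # A remainder of one can never be valid base64.
    if len(data) % 4 == 1:
        return None
    # Anything outside the base64 alphabet is an error at this point.
    if re.fullmatch(r"[A-Za-z0-9+/]*", data) is None:
        return None
    # Restore padding so the standard-library decoder accepts the input.
    data += "=" * (-len(data) % 4)
    return base64.b64decode(data, validate=True)


print(forgiving_base64_decode("SGVs bG8="))
print(forgiving_base64_decode("SGVs\fbG8"))
print(forgiving_base64_decode("SGVs*bG8="))
```

Because the whitespace removal is part of the decode step itself, the data: URL code would not need its own stripping pass for the base64 case.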

Anne (:annevk)

Updated

•

7 years ago

Blocks: 1392241

Selena Deckelmann :selenamarie :selena

Updated

•

7 years ago

Component: General → DOM: Security

Christoph Kerschbaumer [:ckerschb]

Updated

•

7 years ago

Component: DOM: Security → Networking

Dragana Damjanovic [:dragana]

Updated

•

7 years ago

Priority: -- → P3

Whiteboard: [necko-triaged]

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Thomas Wisniewski [:twisniewski]

Comment 10

•

1 year ago

We now basically align with the current spec as of bug 1845006 landing (with further caveats being considered in bug 1845005). I think it's safe to close this bug unless the spec needs changes.

Status: NEW → RESOLVED

Closed: 1 year ago

Resolution: --- → FIXED

Categories

(Core :: Networking, defect, P3)

People

(Reporter: allstars.chh, Unassigned)

Security

(public)
