Open Bug 1362824 Opened 7 years ago Updated 2 years ago

A blob created from a fetch response does not correctly get type set

Categories

(Core :: DOM: Core & HTML, defect, P3)

UNCONFIRMED

People

(Reporter: jhabdas, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

User Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1

Steps to reproduce:

Fetch a file with header "Content-Type: text/css; charset=utf-8"


Actual results:

blob type is set to "text/css; charset=utf-8"


Expected results:

blob type is set to just "text/css" (Current behavior in Chrome, Yandex. Safari currently being patched with a bug of same title, so can't reproduce there)
Component: Untriaged → DOM
Product: Firefox → Core
Is this a regression?
Flags: needinfo?(amarchesini)
(In reply to Olli Pettay [:smaug] from comment #1)
> Is this a regression?

I don't believe so, though it's hard for me to tell as I'm not up to speed with the fetch development history for this browser. In reviewing the tests earlier I did not see anything attempting to cover this particular scenario. Looking at the spec there seems to be some ambiguity in this area, though given the `Content-Type` can be pulled from response headers it seems there is value in keeping this particular field standard across browsers to help avoid necessitating string parsing in libs wrapping Fetch.
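
For illustration, the string parsing that a library wrapping Fetch would otherwise need might look like the minimal sketch below. `mediaTypeOf` is a hypothetical helper name, not a platform API, and the sketch assumes that parameters are separated from the media type by the first `;`.

```javascript
// Hypothetical helper: reduce a blob.type or Content-Type value to the bare
// media type by dropping everything from the first ";" onward.
function mediaTypeOf(type) {
  const semicolon = type.indexOf(";");
  const bare = semicolon === -1 ? type : type.slice(0, semicolon);
  // Trim stray whitespace and lowercase so both browser behaviors compare equal.
  return bare.trim().toLowerCase();
}

console.log(mediaTypeOf("text/css; charset=utf-8")); // "text/css"
console.log(mediaTypeOf("text/css"));                // "text/css"
```

With a helper like this, code comparing `blob.type` against `text/css` would behave the same whether or not the browser keeps the `charset` parameter.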
I think there is some inconsistency and might need a spec issue somewhere.  For example, the WPT tests actually assert content-type parameters like this *should* be included:

https://github.com/w3c/web-platform-tests/blob/master/fetch/api/response/response-consume.html#L155

Chrome and Firefox pass this test, but webkit normalizes all blob content types by stripping parameters.

But the cases in that WPT are only checking params on URLSearchParams sources.  It seems behavior differs when the source is the network.

It seems like we should spec consistent behavior for these content type parameters regardless of the source.
Adding reduced test case exhibiting issue: https://bl.ocks.org/jhabdas/87f8bc3573cf91b4c35e0c799addd762
I suspect that our implementation is correct. The spec says:

Objects implementing the Body mixin also have an associated consume body algorithm, given a type, runs these steps:
...
Return the result of transforming promise by a fulfillment handler that returns the result of the package data algorithm with its first argument, type and this object’s MIME type.

Blob - Return a Blob whose contents are bytes and type attribute is mimeType.

To extract a MIME type from a header list (headers), run these steps:
    Let mimeType be the result of extracting header list values given `Content-Type` and headers.
    If mimeType is null or failure, then return the empty byte sequence.
    Return mimeType, byte-lowercased.
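
The quoted steps can be read as the rough sketch below. `extractMimeType` is a hypothetical name, the `headers` argument is assumed to be anything with a `get(name)` method (such as `response.headers`), and only the "header missing" case of "null or failure" is modelled.

```javascript
// Simplified sketch of the quoted "extract a MIME type" steps.
// `headers` is assumed to expose get(name), as a Headers object does.
function extractMimeType(headers) {
  const value = headers.get("Content-Type");
  // The quoted steps return the empty byte sequence for null or failure;
  // this sketch only handles the case where the header is absent.
  if (value === null || value === undefined) return "";
  // Only lowercasing is applied, so parameters such as charset survive.
  // toLowerCase() stands in for byte-lowercasing on ASCII input.
  return value.toLowerCase();
}
```

Read this way, nothing in the quoted steps removes `; charset=utf-8` before the value reaches the Blob's type attribute.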
Flags: needinfo?(amarchesini)
It's not so much about correctness as it is about consistency. The specs can be ambiguous (and they are in cases like this). WebKit just fixed their related issue in the Nightly and they're also returning `text/css` for the reduced test case.

That makes the following return values for `blob.type` when `charset` prop is present across browsers:

- Chrome: `text/css`
- Safari: `text/css`
- Yandex: `text/css`
- Firefox: `text/css; charset=utf-8`

As a pragmatist myself, simply looking at a field called `type`, unless it's returned as part of the `headers`, would immediately lead me to believe only the media type (and not the "parsable MIME type") would be returned. And it seems I'm not alone.
(In reply to jhabdas from comment #6)
> That makes the following return values for `blob.type` when `charset` prop
> is present across browsers:
> 
> - Chrome: `text/css`
> - Safari: `text/css`
> - Yandex: `text/css`
> - Firefox: `text/css; charset=utf-8`

I also ran your reduced test case through edge and got:

* edge: text/css; charset=utf-8

Unfortunately, though, it's even more confused than this.  If you try this snippet:

  new Response(new URLSearchParams('name=value')).blob().then(b => console.log(b.type))

You get:

* chrome: application/x-www-form-urlencoded;charset=utf-8
* safari: application/x-www-form-urlencoded
* firefox: application/x-www-form-urlencoded;charset=utf-8

Firefox consistently includes params like charset.  Safari is consistent in not including params like charset.  Chrome does different things depending on the source.

Edge does not support URLSearchParams, so that test can't run there.

I'll open a spec issue.
Priority: -- → P3
If I do

  const mimeType = "text/css;charset=utf-8"
  new Blob([1], { type: mimeType }).type

it returns mimeType, and Fetch is pretty clear about just forwarding the Content-Type value to the internal Blob constructor (it could be slightly better, once the File API gets better maintenance). That includes specifying what the Content-Type value for URLSearchParams should be in https://fetch.spec.whatwg.org/#body-mixin and such.

So... INVALID and we should file bugs on the other browsers?
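
A small sketch that turns the constructor check above into a reusable probe, with Fetch left out entirely. `blobKeepsType` is a hypothetical helper name, and it assumes the input is already lowercase printable ASCII so that any difference comes from the engine dropping something.

```javascript
// Hypothetical probe: does the Blob constructor hand back the type it was given?
// Assumes `type` is lowercase printable ASCII, so lowercasing cannot alter it.
function blobKeepsType(type) {
  return new Blob(["x"], { type }).type === type;
}

console.log(blobKeepsType("text/css;charset=utf-8"));
console.log(blobKeepsType("application/x-www-form-urlencoded;charset=utf-8"));
```

An engine that strips parameters inside the Blob constructor itself would log `false` here, which separates that case from stripping that only happens on the Fetch path.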
So, the bugs appear to be:

1. Webkit scrubs the mime type parameters passed to a blob regardless of where they come from.
2. Blink scrubs the mime type parameters when extracting from a network header.

Otherwise browsers seem consistent in passing mime type parameters through.

Does that sound right?
In the example I gave that does not involve Fetch I don't see WebKit (using Safari Technology Preview to test) removing the parameters. Otherwise it sounds right.
Dropping some relevant spec and interpretation here for posterity and in hopes of facilitating discussion:

https://w3c.github.io/FileAPI/#dfn-BPtype

> type, the ASCII-encoded string in lower case representing the media type of the Blob. Normative conditions for this member are provided in the §3.1 Constructors.

https://w3c.github.io/FileAPI/#constructorBlob

> If the type member of the optional options argument is provided and is not the empty string, run the following sub-steps: 1. Let t be the type dictionary member. If t contains any characters outside the range U+0020 to U+007E, then set t to the empty string and return from these substeps. 2. Convert every character in t to ASCII lowercase.

Notice the part which states "media type of the Blob". Then look for the difference in terminology in the spec when differentiating between "media type" and "Parsable MIME type". They are different things.
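
For reference, the two constructor sub-steps quoted above can be sketched as follows. `normalizeBlobType` is a hypothetical name, and this is one reading of the quoted text rather than any browser's actual implementation.

```javascript
// Sketch of the quoted sub-steps for the Blob constructor's type option.
function normalizeBlobType(t) {
  // Sub-step 1: any character outside U+0020..U+007E resets the type to "".
  if (/[^\u0020-\u007E]/.test(t)) return "";
  // Sub-step 2: convert to ASCII lowercase. Every remaining character is
  // printable ASCII, so toLowerCase() is equivalent here.
  return t.toLowerCase();
}

console.log(normalizeBlobType("Text/CSS; Charset=UTF-8")); // "text/css; charset=utf-8"
console.log(normalizeBlobType("text/css\n"));              // ""
```

Note that neither sub-step parses the string or removes parameters; the "media type" wording appears only in the description of the attribute.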
(In reply to Anne (:annevk) from comment #11)
> In the example I gave that does not involve Fetch I don't see WebKit (using
> Safari Technology Preview to test) removing the parameters. Otherwise it
> sounds right.

Ugh, you are right.
Results from Opera captured, here's an updated list of actual results for given issue:

- Chrome: `text/css`
- Safari: `text/css`
- Firefox: `text/css; charset=utf-8`
- Opera: `text/css`
- Yandex: `text/css`
- Edge: `text/css; charset=utf-8`

As mentioned above, when I copied over the spec, "media types" are not analogous with "parsable MIME type".

Can this be a confirmed bug now without the scope broadening?
I feel the other things should be spun off into separate issues.
This should also help alleviate some of the confusion:

RFC 2045 - Multipurpose Internet Mail Extensions (MIME)
https://tools.ietf.org/html/rfc2045#section-5.1.

> type := discrete-type / composite-type
I believe chrome, opera, and Yandex all use the blink engine.  So their results should always be the same for something like this.
jhabdas, File API is not really interoperable as you can tell from the results here and excluding parameters would actually make certain formats not parseable (application/form-data for instance) so I'm not really sure we should be doing that. It's probably a good idea to file an issue against that standard though to get things sorted out.
I'm not seeing "application/form-data" on the list of registered IANA media types (last updated 2 days ago): https://www.iana.org/assignments/media-types/media-types.xhtml. What about it specifically do you feel would make it not possible to determine the MIME type of a file blob?

Also, here are some definitions I found (with links to RFCs):
https://mimesniff.spec.whatwg.org/#understanding-mime-types.

- parsable MIME type (https://mimesniff.spec.whatwg.org/#parsable-mime-type)
- parsed MIME type (https://mimesniff.spec.whatwg.org/#parsed-mime-type)
- valid MIME type (https://mimesniff.spec.whatwg.org/#valid-mime-type)
- valid MIME type with no parameters (https://mimesniff.spec.whatwg.org/#valid-mime-type-with-no-parameters)
- serialized MIME type (https://mimesniff.spec.whatwg.org/#serialized-mime-type)

I've opened an issue against Edge and am including it here in hopes of facilitating cross-browser communication: https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/12009681/

Thanks for working to get this corrected. Please let me know how I can help.
(In reply to jhabdas from comment #18)
> I've opened an issue against Edge and am including it here in hopes of
> facilitating cross-browser communication:
> https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/
> 12009681/

I don't understand why you think the parameters should be removed.  Clearly webkit and blink both expose the parameters in some cases.  The bug seems to be that they sometimes strip the charset parameter.  Gecko and Edge are consistent in always exposing the parameters.

The shortest path to cross-browser compat would be for webkit and blink to avoid stripping those parameters.

I commented to that effect on the edge issue.
> I don't understand why you think the parameters should be removed.

Quoting myself from 5 days ago:

> It's not so much about correctness as it is about consistency. The specs can be ambiguous (and they are in cases like this).

That said, I appreciate your diligence in seeking to achieve consistency and help remove ambiguity from the related specifications, some of which are still in draft.

> The bug seems to be that they sometimes strip the charset parameter.

In the specific use case provided when opening this issue I have never seen a change in behavior and therefore understand when you use the word "sometimes" you may be referring to a different bug.
What I'm saying is that your proposal requires 4 browser engines to change to handle the blob content type consistently.  If we always include parameters like charset then only two browser engines need to change and only for certain inputs.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #21)
> What I'm saying is that your proposal requires 4 browser engines to change
> to handle the blob content type consistently.

If I made a proposal of any kind it was a mistake. What I'm ultimately after is the same goal. Consistency. And with that I will step out of this conversation as I approach it with bias and limited knowledge, and do not want to affect the outcome as a result.
Tests for Blob and File have now been written: https://github.com/w3c/web-platform-tests/pull/7764. At this point this basically needs someone to work on this.
Depends on: 1423877
Component: DOM → DOM: Core & HTML
Severity: normal → S3