Open Bug 1895075 Opened 7 months ago Updated 1 month ago

Improve parsing of response content-type type headers to match spec

Tracking

()

Status:

ASSIGNED

People

(Reporter: twisniewski, Assigned: twisniewski)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [necko-triaged])

Attachments

(3 files)

Bug 1895075 - part 1: implement the Fetch standard's Extract a MimeType and related algorithms; r?hsivonen 7 months ago Thomas Wisniewski [:twisniewski] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1895075 - part 2: parse content-type response headers using the Extract a Mime Type algorithm from the Fetch spec; r?valentin,kershaw 7 months ago Thomas Wisniewski [:twisniewski] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1895075 - part 3: continue to use net_ParseContentType for EnsureMIMEOfScript for now; r?dveditz,tschuster 6 months ago Thomas Wisniewski [:twisniewski] 48 bytes, text/x-phabricator-request		Details \| Review

Thomas Wisniewski [:twisniewski]

Assignee

Description

•

7 months ago

Right now, we parse individual Content-Type headers separately as see them, in ParseHeaderLine_locked. This is incorrectly per spec. We are meant to collect them all, and then use Extract a MIME Type on them all. This will let us parse quotes and such as the spec expects.

Since bug 1847659 has shipped without known regressions for several releases already, I think it's safe to continue nudging us along here to match the spec. I propose we fall back to our current behavior if network.standard_content_type_parsing.response_headers is false, but collect and extract the content type headers per spec if it's true.

This will let us confirm whether the spec text is web-compatible, and if so, we can continue to iterate and re-use our implementation of Extract a MIME type elsewhere as the spec text expects. And we can later store the full, properly-parsed content-type object so that other code may access it (rather than our current approach of only saving the content type's essence as mContentType, and its charset parameter as mContentCharset, which leads to other spec-compatibility bugs and complications).

Thomas Wisniewski [:twisniewski]

Assignee

Comment 1

•

7 months ago

https://treeherder.mozilla.org/jobs?repo=try&revision=2b1fda09bf4205d8147457be33b9892c794cb81a

Sunil Mayya

Updated

•

7 months ago

Blocks: fetch

Severity: -- → S3

Priority: -- → P2

Whiteboard: [necko-triaged][necko-priority-review]

Thomas Wisniewski [:twisniewski]

Assignee

Comment 2

•

7 months ago

•

Edited

https://treeherder.mozilla.org/jobs?repo=try&revision=8587d5956a03660128e1610aa4fcef17b3dce899

Thomas Wisniewski [:twisniewski]

Assignee

Updated

•

7 months ago

Duplicate of this bug: 560392

Thomas Wisniewski [:twisniewski]

Assignee

Updated

•

7 months ago

Blocks: mimesniff

Thomas Wisniewski [:twisniewski]

Assignee

Comment 4

•

7 months ago

Attached file Bug 1895075 - part 1: implement the Fetch standard's Extract a MimeType and related algorithms; r?hsivonen — Details

Phabricator Automation

Updated

•

7 months ago

Assignee: nobody → twisniewski

Status: NEW → ASSIGNED

Thomas Wisniewski [:twisniewski]

Assignee

Comment 5

•

7 months ago

Attached file Bug 1895075 - part 2: parse content-type response headers using the Extract a Mime Type algorithm from the Fetch spec; r?valentin,kershaw — Details

Pulsebot

Comment 6

•

7 months ago

Pushed by twisniewski@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d5fef245a430 part 1: implement the Fetch standard's Extract a MimeType and related algorithms; r=hsivonen https://hg.mozilla.org/integration/autoland/rev/a1cad969287d part 2: parse content-type response headers using the Extract a Mime Type algorithm from the Fetch spec; r=valentin,kershaw,necko-reviewers

Aron Cseh

Comment 7

•

7 months ago

Backed out for causing mochitest failures on test_block_script_wrong_mime.html

Flags: needinfo?(twisniewski)

Thomas Wisniewski [:twisniewski]

Assignee

Comment 8

•

6 months ago

I just chatted with dveditz about the test-case failure here, and I think we're stuck for now. The failure is because the test in question expects scripts with certain invalid mime-types to be blocked, which clashes with the spec text. For instance, image/ is currently blocked, but if we follow the spec text, it would be allowed, because that's an invalid mime-type and the spec says parsing failures should be allowed.

Dan suggested that we should add telemetry to measure if this is a problem before landing these patches, and I'd like to give him a chance to think about this some more before we try landing this again.

Flags: needinfo?(twisniewski)

Frederik Braun [:freddy]

Comment 9

•

6 months ago

I think it would be nice if we could better distinguish between empty & invalid, where AFAIU we have to allow empty.
However, I know that we've been bitten by making script mime checking harder many, many times. It would be nice if we could play with this before rolling it out completely (using an experiment, testing in Nightly or such...

As desirable as those security checks are, I wouldn't want us to take the full load of webcompat issues if other browsers are less strict.

Adding Tom Schuster because he has a painful history with stricter mime type parsing (Sorry Tom!) :)

Thomas Wisniewski [:twisniewski]

Assignee

Comment 10

•

6 months ago

Agreed, :freddy. My goal is to eventually move over to storing the parsed MimeType as the channel's/responsehead's ContentType, rather than two strings for ContentType and ContentCharset. But I will need to touch a lot of code to add that, since lots of necko classes and boilerplate code is involved, plus I will need to make the MimeType class XPCOM-compatible (from what I can tell).

In the meantime, I was hoping that maybe we could make a simpler change, where the existing ContentType members can be made void vs empty, where void means "not even set" and empty meaning "failed to parse". But a lot of code checks this, so it too will be a bit of a project to vet everything.

And no worries if more eyes are on this; I really would like to continue pushing this forward, and if others can help me devise more piece-meal ways to accomplish it, so much the better.

Valentin Gosu [:valentin] (he/him)

Updated

•

6 months ago

Blocks: 1883037

Tom Schuster

Comment 11

•

6 months ago

We should figure out if we can block scripts with an invalid MIME type. Otherwise I am hesitant to further weak our already weak measures again scripts with a wrong MIME.

Thomas Wisniewski [:twisniewski]

Assignee

Comment 12

•

6 months ago

Out of curiosity, is there a standard for how scripts should be blocked where MIME Types are considered? It would be nice to confirm whether we're aligning with other browsers on this.

Flags: needinfo?(tschuster)

Tom Schuster

Comment 13

•

6 months ago

(In reply to Thomas Wisniewski [:twisniewski] from comment #12)

Out of curiosity, is there a standard for how scripts should be blocked where MIME Types are considered? It would be nice to confirm whether we're aligning with other browsers on this.

Yes, this is part of the fetch standard: 2.10. Should response to request be blocked due to its MIME type?

Flags: needinfo?(tschuster)

Thomas Wisniewski [:twisniewski]

Assignee

Comment 14

•

6 months ago

Ah, that's what I thought. Then if I'm not mistaken, the spec doesn't really match what we're doing now, which is to block if MimeType parsing fails (that is, we're more strict). For instance, these three tests should allow, not block, the load (because they don't parse so step 2 allows them): https://searchfox.org/mozilla-central/source/dom/security/test/general/test_block_script_wrong_mime.html#18,20,22

Would we want to change that to align with the spec? (Perhaps the blocking of active content can/should be made more strict, assuming it's web-compatible?)

Flags: needinfo?(tschuster)

Tom Schuster

Comment 15

•

6 months ago

I don't think we can weaken our handling here, because we seem to allow executing scripts with invalid mime types. I am extremely hesitant to undo any small win we got here. Please open an issue with spec instead.

PS: You don't need to needinfo me for every comment.

Flags: needinfo?(tschuster)

Thomas Wisniewski [:twisniewski]

Assignee

Comment 16

•

6 months ago

Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

Thomas Wisniewski [:twisniewski]

Assignee

Comment 17

•

6 months ago

I've found that just adjusting the script-blocking code to account for the content type being void (no header passed in at all, so allow), versus it being an empty-string (parsing the MimeType failed, so block), passes the script-blocking test, and doesn't seem to be triggering any obvious test failures:

I think that for now that's a fair compromise to move us forward here (as in, to keep blocking scripts as we are right now, while still fixing the handling of multiple-content-type-headers to match the spec).

As such I'll update my patch with those changes and request review from Dan and Tom.

Thomas Wisniewski [:twisniewski]

Assignee

Comment 18

•

6 months ago

Attached file Bug 1895075 - part 3: continue to use net_ParseContentType for EnsureMIMEOfScript for now; r?dveditz,tschuster — Details

Tom Schuster

Comment 19

•

6 months ago

But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

I imagine this would also be under the fetch spec, because that contains the blocking of image/, video/ etc. currently.

I think that for now that's a fair compromise to move us forward here (as in, to keep blocking scripts as we are right now, while still fixing the handling of multiple-content-type-headers to match the spec).

We have no idea if that is web compatible though? We first have to land telemetry to figure out the impact before blocking this. I imagine invalid MIME types are a much bigger set than the probably support uncommon Content-Type: image/ case.

Phabricator Automation

Updated

•

6 months ago

Attachment #9404412 - Attachment description: Bug 1895075 - part 3: distinguish between no content-type headers being given versus them failing to parse in ResponseHead, and have EnsureMIMEOfScript explicitly check for those cases; r?dveditz,tschuster → Bug 1895075 - part 3: continue to use net_ParseContentType for EnsureMIMEOfScript for now; r?dveditz,tschuster

Daniel Veditz [:dveditz]

Comment 20

•

6 months ago

(In reply to Thomas Wisniewski [:twisniewski] from comment #16)

Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

That assumption is the catch, though! We don't know if it's web compatible until we have telemetry on it. We'd very much LIKE to block invalid Mime Types, but so far the only ones we have been confident in blocking are the invalid "image/" etc. ones, because those were the only ones we measured telemetry on. The data gathering rules did not let us send back all the mime types we saw: we had to enumerate the ones we knew were wrong and likely blockable and measure those explicitly.

We know we can't block "missing" because that was too common. I'm not sure our previous telemetry distinguished between "missing" and "empty string". I assume it didn't, actually, so we might not even be safe blocking empty string (or formerly parsing as empty string) ones. Again, we'd like to, but we CANNOT afford to break sites that "work in Chrome". You know well that a lot of sites aren't going to change things to fix their site for Firefox, and every Firefox user who uses that site then has to start using a different browser for that site, which leads to using the other browser for all their sites.

We can get away with breaking sites only when we offer users something they value, like private browsing or tracking protection (and even there we have to offer a way to turn it off site-by-site). Users don't care if we're spec-correct unless it means "not broken".

Thomas Wisniewski [:twisniewski]

Assignee

Comment 21

•

6 months ago

Right, that's why my patches here will not change the behavior for script-blocking, and we can figure out what to do about it (telemetry or otherwise) in a follow-up.

Valentin Gosu [:valentin] (he/him)

Comment 22

•

6 months ago

We can get away with breaking sites only when we offer users something they value, like private browsing or tracking protection (and even there we have to offer a way to turn it off site-by-site). Users don't care if we're spec-correct unless it means "not broken".

I wouldn't say it's just about being spec-correct. There are current differences in how we parse duplicate content-type headers:
https://wpt.fyi/results/fetch/content-type/response.window.html?label=master&label=experimental&aligned&q=content-type%2Fresponse

The Firefox only failures are a potential web-compat risk, similar to how the content-length parsing differences caused bug 1829925 to become an incident. These are usually corner cases that don't matter much to the web.
In any case, the patches in their current form don't actually change the blocking behaviour, right?

Thomas Wisniewski [:twisniewski]

Assignee

Comment 23

•

6 months ago

the patches in their current form don't actually change the blocking behaviour, right?

Yes, the third patch in the set makes sure we still use net_parseContentType for the blocker, so the behavior remains the same for now.

Simon Pieters [:zcorpan]

Comment 24

•

6 months ago

(In reply to Thomas Wisniewski [:twisniewski] from comment #16)

Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

No, https://fetch.spec.whatwg.org/#should-response-to-request-be-blocked-due-to-mime-type? step 2 says "If mimeType is failure, then return allowed."

The caller is here: https://fetch.spec.whatwg.org/#ref-for-should-response-to-request-be-blocked-due-to-mime-type?%E2%91%A0

So invalid MIME type (i.e., where https://fetch.spec.whatwg.org/#concept-header-extract-mime-type returns failure) is not blocked/treated as a network error.

However, ORB also checks MIME types, see https://github.com/whatwg/fetch/pull/1755 (https://whatpr.org/fetch/1755/8d0d207...f937b58.html#orb), but I think it also doesn't block invalid MIME types.

Thomas Wisniewski [:twisniewski]

Assignee

Comment 25

•

6 months ago

Agreed, but that's also the problem which made me start asking questions, zcorpan. If we allow all unparsables, we'll be less strict than we currently are for script-blocking (because we currently block cases like image/, which has no sub-type and so doesn't parse). But we don't really want to be less strict until we've gathered telemetry which implies that we need to be for web-compatibility's sake (at least, that's how I've understood the conversation so far). So for now my patches will leave script blocking alone, and keep it as-is until we decide what to do.

Simon Pieters [:zcorpan]

Comment 26

•

6 months ago

Findings from httparchive (the first query result is saved to httparchive-sandbox.test.resp_content_type_missing_or_empty_or_invalid):

Total number of pages in the 2024_05_01_desktop dataset: 12,694,911
Total number of requests in the 2024_05_01_desktop dataset: 1,293,123,062
httparchive response content-type missing, by http status (23.5% of pages for status 200)
httparchive response content-type empty, by http status (0.10% of pages for status 200)
httparchive response content-type invalid, by http status (0.36% of pages for status 200)
httparchive response content-type multiple headers, by http status (0.00072% of pages for status 200)
httparchive response content-type invalid, values (most common are "image", "-", "none", "png"...)

Kershaw Chang [:kershaw]

Updated

•

5 months ago

Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged]

BugBot [:suhaib / :marco/ :calixte]

Comment 27

•

4 months ago

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:twisniewski, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(twisniewski)

Flags: needinfo?(tschuster)

Thomas Wisniewski [:twisniewski]

Assignee

Comment 28

•

4 months ago

Ok, I've marked the patches as "changes planned" for now, since we're not sure if/when we want to land them. We'll have to revisit this when we have more time.

Flags: needinfo?(twisniewski)

Flags: needinfo?(tschuster)

Valentin Gosu [:valentin] (he/him)

Updated

•

2 months ago

Updated

•

2 months ago

Blocks: 1911942

Valentin Gosu [:valentin] (he/him)

Updated

•

2 months ago

Duplicate of this bug: 1911942

Sunil Mayya

Updated

•

1 month ago