Open Bug 1895075 Opened 7 months ago Updated 1 month ago

Improve parsing of response content-type headers to match spec

Categories

(Core :: Networking, defect, P2)



ASSIGNED

People

(Reporter: twisniewski, Assigned: twisniewski)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [necko-triaged])

Attachments

(3 files)

Right now, we parse individual Content-Type headers separately as we see them, in ParseHeaderLine_locked. This is incorrect per spec. We are meant to collect them all, and then run Extract a MIME Type on the combined values. This will let us parse quotes and such as the spec expects.

Since bug 1847659 has shipped without known regressions for several releases already, I think it's safe to continue nudging us along here to match the spec. I propose we fall back to our current behavior if network.standard_content_type_parsing.response_headers is false, but collect and extract the content type headers per spec if it's true.

This will let us confirm whether the spec text is web-compatible, and if so, we can continue to iterate and re-use our implementation of Extract a MIME type elsewhere as the spec text expects. And we can later store the full, properly-parsed content-type object so that other code may access it (rather than our current approach of only saving the content type's essence as mContentType, and its charset parameter as mContentCharset, which leads to other spec-compatibility bugs and complications).
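For illustration, here is a rough Python sketch of what "collect, then Extract a MIME Type" means. The helper names and the simplified parser are mine, not the Fetch spec's exact algorithm or Gecko code; the real spec additionally handles quoting and charset carry-over that this sketch omits:

```python
def parse_mime_type(value):
    """Very rough stand-in for the spec's "parse a MIME type":
    require a non-empty type and subtype; return the essence or None."""
    essence = value.strip().split(";", 1)[0].strip()
    mime_type, slash, subtype = essence.partition("/")
    if not slash or not mime_type or not subtype:
        return None  # failure: e.g. "image/" has an empty subtype
    return essence.lower()

def extract_mime_type(content_type_values):
    """Combine every Content-Type value seen; per the Fetch spec's
    "extract a MIME type", the last successfully parsed value wins."""
    mime_type = None
    for header_value in content_type_values:
        for value in header_value.split(","):  # values may be comma-joined
            parsed = parse_mime_type(value)
            if parsed is not None and parsed != "*/*":
                mime_type = parsed
    return mime_type  # None plays the role of the spec's "failure"
```

So extract_mime_type(["text/html", "text/plain"]) yields "text/plain", while extract_mime_type(["image/"]) yields None, i.e. the spec's "failure".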

Blocks: fetch
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-review]
Duplicate of this bug: 560392
Blocks: mimesniff
Assignee: nobody → twisniewski
Status: NEW → ASSIGNED
Pushed by twisniewski@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d5fef245a430
part 1: implement the Fetch standard's Extract a MimeType and related algorithms; r=hsivonen
https://hg.mozilla.org/integration/autoland/rev/a1cad969287d
part 2: parse content-type response headers using the Extract a Mime Type algorithm from the Fetch spec; r=valentin,kershaw,necko-reviewers

Backed out for causing mochitest failures on test_block_script_wrong_mime.html

Flags: needinfo?(twisniewski)

I just chatted with dveditz about the test-case failure here, and I think we're stuck for now. The failure is because the test in question expects scripts with certain invalid mime-types to be blocked, which clashes with the spec text. For instance, image/ is currently blocked, but if we follow the spec text, it would be allowed, because that's an invalid mime-type and the spec says parsing failures should be allowed.

Dan suggested that we should add telemetry to measure if this is a problem before landing these patches, and I'd like to give him a chance to think about this some more before we try landing this again.

Flags: needinfo?(twisniewski)

I think it would be nice if we could better distinguish between empty & invalid, where AFAIU we have to allow empty.
However, I know that we've been bitten many, many times by making script MIME checking stricter. It would be nice if we could play with this before rolling it out completely (using an experiment, testing in Nightly, or such).

As desirable as those security checks are, I wouldn't want us to take the full load of webcompat issues if other browsers are less strict.

Adding Tom Schuster because he has a painful history with stricter mime type parsing (Sorry Tom!) :)

Agreed, :freddy. My goal is to eventually move over to storing the parsed MimeType as the channel's/responsehead's ContentType, rather than two strings for ContentType and ContentCharset. But I will need to touch a lot of code to add that, since lots of necko classes and boilerplate code are involved, plus I will need to make the MimeType class XPCOM-compatible (from what I can tell).

In the meantime, I was hoping that maybe we could make a simpler change, where the existing ContentType members can be made void vs empty, where void means "not even set" and empty means "failed to parse". But a lot of code checks this, so it too will be a bit of a project to vet everything.

And no worries if more eyes are on this; I really would like to continue pushing this forward, and if others can help me devise more piece-meal ways to accomplish it, so much the better.

We should figure out if we can block scripts with an invalid MIME type. Otherwise I am hesitant to further weaken our already weak measures against scripts with a wrong MIME type.

Out of curiosity, is there a standard for how scripts should be blocked where MIME Types are considered? It would be nice to confirm whether we're aligning with other browsers on this.

Flags: needinfo?(tschuster)

(In reply to Thomas Wisniewski [:twisniewski] from comment #12)

Out of curiosity, is there a standard for how scripts should be blocked where MIME Types are considered? It would be nice to confirm whether we're aligning with other browsers on this.

Yes, this is part of the fetch standard: 2.10. Should response to request be blocked due to its MIME type?

Flags: needinfo?(tschuster)

Ah, that's what I thought. Then if I'm not mistaken, the spec doesn't really match what we're doing now, which is to block if MimeType parsing fails (that is, we're more strict). For instance, these three tests should allow, not block, the load (because they don't parse so step 2 allows them): https://searchfox.org/mozilla-central/source/dom/security/test/general/test_block_script_wrong_mime.html#18,20,22

Would we want to change that to align with the spec? (Perhaps the blocking of active content can/should be made more strict, assuming it's web-compatible?)

Flags: needinfo?(tschuster)

I don't think we can weaken our handling here, because we seem to allow executing scripts with invalid MIME types. I am extremely hesitant to undo any small win we got here. Please open an issue with the spec instead.

PS: You don't need to needinfo me for every comment.

Flags: needinfo?(tschuster)

Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

I've found that just adjusting the script-blocking code to account for the content type being void (no header passed in at all, so allow), versus it being an empty-string (parsing the MimeType failed, so block), passes the script-blocking test, and doesn't seem to be triggering any obvious test failures:

I think that for now that's a fair compromise to move us forward here (as in, to keep blocking scripts as we are right now, while still fixing the handling of multiple-content-type-headers to match the spec).

As such I'll update my patch with those changes and request review from Dan and Tom.
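To make the void-vs-empty idea above concrete, here is a hedged Python sketch. The names are hypothetical, not Gecko's actual ResponseHead API: None stands for a "void" ContentType (no header ever set), while the empty string stands for headers that were present but failed to parse.

```python
def stored_content_type(extracted_essence, header_was_present):
    """Map the extraction result onto the proposed void/empty scheme."""
    if not header_was_present:
        return None   # "void": no Content-Type header at all
    if extracted_essence is None:
        return ""     # present but unparseable: the spec's "failure"
    return extracted_essence

def script_load_allowed(content_type):
    """Keep today's blocking behavior: allow missing headers, block
    parse failures, and let later checks judge parsed types."""
    if content_type is None:
        return True   # a missing header is too common to block
    return content_type != ""  # block only definite parse failures
```

Under this scheme, a response with no Content-Type at all still runs, while one whose headers all fail to parse (like a bare "image/") stays blocked, matching the existing test expectations.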

But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

I imagine this would also be under the fetch spec, because that contains the blocking of image/, video/ etc. currently.

I think that for now that's a fair compromise to move us forward here (as in, to keep blocking scripts as we are right now, while still fixing the handling of multiple-content-type-headers to match the spec).

We have no idea if that is web compatible though? We first have to land telemetry to figure out the impact before blocking this. I imagine invalid MIME types are a much bigger set than the presumably uncommon Content-Type: image/ case.

Attachment #9404412 - Attachment description: Bug 1895075 - part 3: distinguish between no content-type headers being given versus them failing to parse in ResponseHead, and have EnsureMIMEOfScript explicitly check for those cases; r?dveditz,tschuster → Bug 1895075 - part 3: continue to use net_ParseContentType for EnsureMIMEOfScript for now; r?dveditz,tschuster

(In reply to Thomas Wisniewski [:twisniewski] from comment #16)

Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

That assumption is the catch, though! We don't know if it's web compatible until we have telemetry on it. We'd very much LIKE to block invalid Mime Types, but so far the only ones we have been confident in blocking are the invalid "image/" etc. ones, because those were the only ones we measured telemetry on. The data gathering rules did not let us send back all the mime types we saw: we had to enumerate the ones we knew were wrong and likely blockable and measure those explicitly.

We know we can't block "missing" because that was too common. I'm not sure our previous telemetry distinguished between "missing" and "empty string". I assume it didn't, actually, so we might not even be safe blocking empty string (or formerly parsing as empty string) ones. Again, we'd like to, but we CANNOT afford to break sites that "work in Chrome". You know well that a lot of sites aren't going to change things to fix their site for Firefox, and every Firefox user who uses that site then has to start using a different browser for that site, which leads to using the other browser for all their sites.

We can get away with breaking sites only when we offer users something they value, like private browsing or tracking protection (and even there we have to offer a way to turn it off site-by-site). Users don't care if we're spec-correct unless it means "not broken".

Right, that's why my patches here will not change the behavior for script-blocking, and we can figure out what to do about it (telemetry or otherwise) in a follow-up.

We can get away with breaking sites only when we offer users something they value, like private browsing or tracking protection (and even there we have to offer a way to turn it off site-by-site). Users don't care if we're spec-correct unless it means "not broken".

I wouldn't say it's just about being spec-correct. There are current differences in how we parse duplicate content-type headers:
https://wpt.fyi/results/fetch/content-type/response.window.html?label=master&label=experimental&aligned&q=content-type%2Fresponse

The Firefox-only failures are a potential web-compat risk, similar to how the content-length parsing differences caused bug 1829925 to become an incident. These are usually corner cases that don't matter much to the web.
In any case, the patches in their current form don't actually change the blocking behaviour, right?

the patches in their current form don't actually change the blocking behaviour, right?

Yes, the third patch in the set makes sure we still use net_ParseContentType for the blocker, so the behavior remains the same for now.

(In reply to Thomas Wisniewski [:twisniewski] from comment #16)

Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.

No, https://fetch.spec.whatwg.org/#should-response-to-request-be-blocked-due-to-mime-type? step 2 says "If mimeType is failure, then return allowed."

The caller is here: https://fetch.spec.whatwg.org/#ref-for-should-response-to-request-be-blocked-due-to-mime-type?%E2%91%A0

So invalid MIME type (i.e., where https://fetch.spec.whatwg.org/#concept-header-extract-mime-type returns failure) is not blocked/treated as a network error.

However, ORB also checks MIME types, see https://github.com/whatwg/fetch/pull/1755 (https://whatpr.org/fetch/1755/8d0d207...f937b58.html#orb), but I think it also doesn't block invalid MIME types.
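As a sketch of the spec steps being cited, here is the shape of Fetch §2.10 in Python. The function name is mine, and the destination set and "essence" checks are paraphrased from the spec text rather than copied from Gecko:

```python
SCRIPT_LIKE_DESTINATIONS = {
    "audioworklet", "paintworklet", "script",
    "serviceworker", "sharedworker", "worker",
}

def should_block_due_to_mime_type(mime_type, destination):
    """mime_type is the result of "extract a MIME type" (None = failure)."""
    if mime_type is None:
        return "allowed"  # step 2: a parse failure is allowed, not blocked
    if destination in SCRIPT_LIKE_DESTINATIONS:
        if mime_type.startswith(("audio/", "image/", "video/")):
            return "blocked"
        if mime_type == "text/csv":
            return "blocked"
    return "allowed"
```

This is exactly the tension in this bug: "image/" fails to parse, so per the spec it falls out at step 2 as allowed, whereas today's Gecko check blocks it.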

Agreed, but that's also the problem which made me start asking questions, zcorpan. If we allow all unparsables, we'll be less strict than we currently are for script-blocking (because we currently block cases like image/, which has no sub-type and so doesn't parse). But we don't really want to be less strict until we've gathered telemetry which implies that we need to be for web-compatibility's sake (at least, that's how I've understood the conversation so far). So for now my patches will leave script blocking alone, and keep it as-is until we decide what to do.

Findings from httparchive (the first query result is saved to httparchive-sandbox.test.resp_content_type_missing_or_empty_or_invalid):

Whiteboard: [necko-triaged][necko-priority-review] → [necko-triaged]

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:twisniewski, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(twisniewski)
Flags: needinfo?(tschuster)

Ok, I've marked the patches as "changes planned" for now, since we're not sure if/when we want to land them. We'll have to revisit this when we have more time.

Flags: needinfo?(twisniewski)
Flags: needinfo?(tschuster)
See Also: → 1911942
Blocks: 1911942
Duplicate of this bug: 1911942
See Also: → 1924398
