Improve parsing of response content-type type headers to match spec
Categories
(Core :: Networking, defect, P2)
Tracking
()
People
(Reporter: twisniewski, Assigned: twisniewski)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [necko-triaged])
Attachments
(3 files)
Right now, we parse individual Content-Type headers separately as see them, in ParseHeaderLine_locked. This is incorrectly per spec. We are meant to collect them all, and then use Extract a MIME Type on them all. This will let us parse quotes and such as the spec expects.
Since bug 1847659 has shipped without known regressions for several releases already, I think it's safe to continue nudging us along here to match the spec. I propose we fall back to our current behavior if network.standard_content_type_parsing.response_headers
is false
, but collect and extract the content type headers per spec if it's true
.
This will let us confirm whether the spec text is web-compatible, and if so, we can continue to iterate and re-use our implementation of Extract a MIME type elsewhere as the spec text expects. And we can later store the full, properly-parsed content-type object so that other code may access it (rather than our current approach of only saving the content type's essence as mContentType, and its charset parameter as mContentCharset, which leads to other spec-compatibility bugs and complications).
Assignee | ||
Comment 1•7 months ago
|
||
Updated•7 months ago
|
Assignee | ||
Comment 2•7 months ago
•
|
||
Assignee | ||
Comment 4•7 months ago
|
||
Updated•7 months ago
|
Assignee | ||
Comment 5•7 months ago
|
||
Backed out for causing mochitest failures on test_block_script_wrong_mime.html
Assignee | ||
Comment 8•6 months ago
|
||
I just chatted with dveditz about the test-case failure here, and I think we're stuck for now. The failure is because the test in question expects scripts with certain invalid mime-types to be blocked, which clashes with the spec text. For instance, image/
is currently blocked, but if we follow the spec text, it would be allowed, because that's an invalid mime-type and the spec says parsing failures should be allowed.
Dan suggested that we should add telemetry to measure if this is a problem before landing these patches, and I'd like to give him a chance to think about this some more before we try landing this again.
Comment 9•6 months ago
|
||
I think it would be nice if we could better distinguish between empty & invalid, where AFAIU we have to allow empty.
However, I know that we've been bitten by making script mime checking harder many, many times. It would be nice if we could play with this before rolling it out completely (using an experiment, testing in Nightly or such...
As desirable as those security checks are, I wouldn't want us to take the full load of webcompat issues if other browsers are less strict.
Adding Tom Schuster because he has a painful history with stricter mime type parsing (Sorry Tom!) :)
Assignee | ||
Comment 10•6 months ago
|
||
Agreed, :freddy. My goal is to eventually move over to storing the parsed MimeType as the channel's/responsehead's ContentType, rather than two strings for ContentType and ContentCharset. But I will need to touch a lot of code to add that, since lots of necko classes and boilerplate code is involved, plus I will need to make the MimeType class XPCOM-compatible (from what I can tell).
In the meantime, I was hoping that maybe we could make a simpler change, where the existing ContentType members can be made void vs empty, where void means "not even set" and empty meaning "failed to parse". But a lot of code checks this, so it too will be a bit of a project to vet everything.
And no worries if more eyes are on this; I really would like to continue pushing this forward, and if others can help me devise more piece-meal ways to accomplish it, so much the better.
Comment 11•6 months ago
|
||
We should figure out if we can block scripts with an invalid MIME type. Otherwise I am hesitant to further weak our already weak measures again scripts with a wrong MIME.
Assignee | ||
Comment 12•6 months ago
|
||
Out of curiosity, is there a standard for how scripts should be blocked where MIME Types are considered? It would be nice to confirm whether we're aligning with other browsers on this.
Comment 13•6 months ago
|
||
(In reply to Thomas Wisniewski [:twisniewski] from comment #12)
Out of curiosity, is there a standard for how scripts should be blocked where MIME Types are considered? It would be nice to confirm whether we're aligning with other browsers on this.
Yes, this is part of the fetch standard: 2.10. Should response to request be blocked due to its MIME type?
Assignee | ||
Comment 14•6 months ago
|
||
Ah, that's what I thought. Then if I'm not mistaken, the spec doesn't really match what we're doing now, which is to block if MimeType parsing fails (that is, we're more strict). For instance, these three tests should allow, not block, the load (because they don't parse so step 2 allows them): https://searchfox.org/mozilla-central/source/dom/security/test/general/test_block_script_wrong_mime.html#18,20,22
Would we want to change that to align with the spec? (Perhaps the blocking of active content can/should be made more strict, assuming it's web-compatible?)
Comment 15•6 months ago
|
||
I don't think we can weaken our handling here, because we seem to allow executing scripts with invalid mime types. I am extremely hesitant to undo any small win we got here. Please open an issue with spec instead.
PS: You don't need to needinfo me for every comment.
Assignee | ||
Comment 16•6 months ago
|
||
Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.
Assignee | ||
Comment 17•6 months ago
|
||
I've found that just adjusting the script-blocking code to account for the content type being void (no header passed in at all, so allow), versus it being an empty-string (parsing the MimeType failed, so block), passes the script-blocking test, and doesn't seem to be triggering any obvious test failures:
- https://treeherder.mozilla.org/jobs?repo=try&revision=0ce587539ac40826ac7cd6a965fec4904fec1a9f
- https://treeherder.mozilla.org/jobs?repo=try&revision=b54e72dbd0fa85441ce0fa51d626a01a8b37a280
I think that for now that's a fair compromise to move us forward here (as in, to keep blocking scripts as we are right now, while still fixing the handling of multiple-content-type-headers to match the spec).
As such I'll update my patch with those changes and request review from Dan and Tom.
Assignee | ||
Comment 18•6 months ago
|
||
Comment 19•6 months ago
|
||
But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.
I imagine this would also be under the fetch spec, because that contains the blocking of image/, video/ etc. currently.
I think that for now that's a fair compromise to move us forward here (as in, to keep blocking scripts as we are right now, while still fixing the handling of multiple-content-type-headers to match the spec).
We have no idea if that is web compatible though? We first have to land telemetry to figure out the impact before blocking this. I imagine invalid MIME types are a much bigger set than the probably support uncommon Content-Type: image/
case.
Updated•6 months ago
|
Comment 20•6 months ago
|
||
(In reply to Thomas Wisniewski [:twisniewski] from comment #16)
Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.
That assumption is the catch, though! We don't know if it's web compatible until we have telemetry on it. We'd very much LIKE to block invalid Mime Types, but so far the only ones we have been confident in blocking are the invalid "image/" etc. ones, because those were the only ones we measured telemetry on. The data gathering rules did not let us send back all the mime types we saw: we had to enumerate the ones we knew were wrong and likely blockable and measure those explicitly.
We know we can't block "missing" because that was too common. I'm not sure our previous telemetry distinguished between "missing" and "empty string". I assume it didn't, actually, so we might not even be safe blocking empty string (or formerly parsing as empty string) ones. Again, we'd like to, but we CANNOT afford to break sites that "work in Chrome". You know well that a lot of sites aren't going to change things to fix their site for Firefox, and every Firefox user who uses that site then has to start using a different browser for that site, which leads to using the other browser for all their sites.
We can get away with breaking sites only when we offer users something they value, like private browsing or tracking protection (and even there we have to offer a way to turn it off site-by-site). Users don't care if we're spec-correct unless it means "not broken".
Assignee | ||
Comment 21•6 months ago
|
||
Right, that's why my patches here will not change the behavior for script-blocking, and we can figure out what to do about it (telemetry or otherwise) in a follow-up.
Comment 22•6 months ago
|
||
We can get away with breaking sites only when we offer users something they value, like private browsing or tracking protection (and even there we have to offer a way to turn it off site-by-site). Users don't care if we're spec-correct unless it means "not broken".
I wouldn't say it's just about being spec-correct. There are current differences in how we parse duplicate content-type headers:
https://wpt.fyi/results/fetch/content-type/response.window.html?label=master&label=experimental&aligned&q=content-type%2Fresponse
The Firefox only failures are a potential web-compat risk, similar to how the content-length parsing differences caused bug 1829925 to become an incident. These are usually corner cases that don't matter much to the web.
In any case, the patches in their current form don't actually change the blocking behaviour, right?
Assignee | ||
Comment 23•6 months ago
|
||
the patches in their current form don't actually change the blocking behaviour, right?
Yes, the third patch in the set makes sure we still use net_parseContentType for the blocker, so the behavior remains the same for now.
Comment 24•6 months ago
|
||
(In reply to Thomas Wisniewski [:twisniewski] from comment #16)
Hmm. Then to follow the spec here, it does sound like we'd need to disallow scripts with invalid mime types from being run (assuming that's web-compatible). But if there isn't already spec-text for that, then I'm not sure which spec to file an issue under.
No, https://fetch.spec.whatwg.org/#should-response-to-request-be-blocked-due-to-mime-type? step 2 says "If mimeType is failure, then return allowed."
The caller is here: https://fetch.spec.whatwg.org/#ref-for-should-response-to-request-be-blocked-due-to-mime-type?%E2%91%A0
So invalid MIME type (i.e., where https://fetch.spec.whatwg.org/#concept-header-extract-mime-type returns failure) is not blocked/treated as a network error.
However, ORB also checks MIME types, see https://github.com/whatwg/fetch/pull/1755 (https://whatpr.org/fetch/1755/8d0d207...f937b58.html#orb), but I think it also doesn't block invalid MIME types.
Assignee | ||
Comment 25•6 months ago
|
||
Agreed, but that's also the problem which made me start asking questions, zcorpan. If we allow all unparsables, we'll be less strict than we currently are for script-blocking (because we currently block cases like image/
, which has no sub-type and so doesn't parse). But we don't really want to be less strict until we've gathered telemetry which implies that we need to be for web-compatibility's sake (at least, that's how I've understood the conversation so far). So for now my patches will leave script blocking alone, and keep it as-is until we decide what to do.
Comment 26•6 months ago
|
||
Findings from httparchive (the first query result is saved to httparchive-sandbox.test.resp_content_type_missing_or_empty_or_invalid
):
- Total number of pages in the 2024_05_01_desktop dataset: 12,694,911
- Total number of requests in the 2024_05_01_desktop dataset: 1,293,123,062
- httparchive response content-type missing, by http status (23.5% of pages for status 200)
- httparchive response content-type empty, by http status (0.10% of pages for status 200)
- httparchive response content-type invalid, by http status (0.36% of pages for status 200)
- httparchive response content-type multiple headers, by http status (0.00072% of pages for status 200)
- httparchive response content-type invalid, values (most common are "image", "-", "none", "png"...)
Updated•5 months ago
|
Comment 27•4 months ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:twisniewski, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 28•4 months ago
|
||
Ok, I've marked the patches as "changes planned" for now, since we're not sure if/when we want to land them. We'll have to revisit this when we have more time.
Description
•