Is ClearBogusContentEncodingIfNeeded() needed?
Categories
(Core :: Networking, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox94 | --- | fixed |
People
(Reporter: annevk, Assigned: emilio)
References
Details
(Whiteboard: [necko-triaged])
Attachments
(1 file)
Newer Apache configurations reportedly do not require this hack. Due to the broken patches of bug 542222 I cannot find the history of this method.
Comment 1•10 years ago
|
||
Introduced here http://hg.mozilla.org/mozilla-central/diff/1238046c4cce/netwerk/protocol/http/src/nsHttpChannel.cpp. Why do you believe we need to remove it?
Comment 2•10 years ago
|
||
Luckily, this is actually CVS-era code. ;) This was added in bug 35956. In particular, see bug 35956 comment 64.
Reporter | ||
Comment 3•10 years ago
•
|
||
It seems bug 35956 comment 120 goes to the heart of the issue. Honza, we should either get our non-standard behavior in the network layer standardized, or remove it. Otherwise Servo will have a hard time competing with us.
Comment 4•10 years ago
|
||
Note that specifically for the save-to-file behavior what we actually do is completely ignore the Content-Type when deciding whether to decode. We just compare the URI filename extension, if any, to the Content-Encoding. If the Content-Encoding matches the filename extension, we don't undo the content encoding; otherwise we undo it. So if you send: Content-Type: text/html Content-Encoding: gzip and the filename is "foo.html" then we'll gunzip when saving, but if it's "foo.html.gz" then we won't.
Comment 5•9 years ago
|
||
Any chance of getting telemetry on how often this actually happens?
Updated•8 years ago
|
Comment 6•7 years ago
|
||
I came across this bug from the Fetch spec. Chrome hasn't had this behavior for about two years now, neither the Content-Type one in the spec nor the filename one in comment #4. See https://chromium.googlesource.com/chromium/src.git/+/87b5f2ece37da37ef0b0a04da9aee538ef0f9c4b. I also did a quick test and neither Safari nor IE11 (I don't currently have Edge on hand, but I assume the behavior is the same) have this quirk either. Since Firefox is the only browser that does this, probably it shouldn't be in the Fetch spec. Should I file a spec bug? I've seen one Apache deployment with the issue, and the culprit was the AddEncoding sample here. As far as I can tell, that directive is pointless. What you actually want is for foo.html.gz to be served with Content-Encoding gzip when fetching foo.html. I think that was a different module altogether, though I forget the details. https://httpd.apache.org/docs/2.4/mod/mod_mime.html#addencoding
Comment 7•7 years ago
|
||
Just to check... Is it no longer the case that Apache serves "foo.gz" as Content-Type: application/x-gzip Content-encoding: gzip by default? As far as what browsers do, I'd want to see some explicit tests, not just code pointers. There is various logic in Firefox around this in higher levels than necko that prevents decompression in some cases even if the Content-Encoding header _is_ present; I can't speak to other browsers.
Comment 8•7 years ago
|
||
I don't believe we have much in the way of special-cases here. That code link is it. For a test, here are two URLs for the same tarball. The first does not have Content-Encoding gzip, the second does. https://davidben.scripts.mit.edu/cuttlefish.tar.gz https://davidben.scripts.mit.edu/cuttlefish-2.tar.gz In every browser I tested except Firefox, the second link is decompressed and the first link is not. Firefox leaves both compressed.
Comment 9•7 years ago
|
||
Just for the record, if I remove ClearBogusEncodingIfNeeded completely, that doesn't change Firefox behavior on the testcases linked in comment 8. That's because even if the "Content-Encoding" header is present, for the specific case of saving Firefox will ignore it if that would produce garbage in the resulting local file (e.g. a non-gzipped tar archive with a .tar.gz extension). Specifically, we map encodings to MIME types per http://searchfox.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#1251-1306 and then disable decompression if the extension and encoding-derived type match one of the pairs in http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/uriloader/exthandler/nsExternalHelperAppService.cpp#600-606 The fact that other UAs just happily save garbage in that situation is rather user-hostile. :(
Comment 10•7 years ago
|
||
*shrug* I think this is just this usual debate: https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00
Comment 11•7 years ago
|
||
Sort of. Consider the case of a ".foo.gz" file on the server that the server sends with: Content-Type: application/foo Content-Encoding: gzip with the browser not knowing what extension "application/foo" should map to. This is perfectly correct server behavior, so Postel's Law doesn't even apply in this situation. When the browser goes to save this, there are several options: 1) Save as a ".foo.gz" file, save compressed data. 2) Save as a ".foo.gz" file, save decompressed data. 3) Save as a ".foo" file, save compressed data. 4) Save as a ".foo" file, save decompressed data. I will claim that options 2 and 3 are just totally broken behavior for a browser. In practice browsers aim for #4 in cases when they know what extension corresponds to "application/foo", by appending the relevant extension to the filename. But in cases in which they don't know, they can't very well do that. They could try stripping the .gz extension, but that carries a risk of user confusion and dataloss too. So in Firefox, after the "try to append an extension to make the filename match the actual type we are saving" bit, there is a check for whether our resulting extension still looks like something that triggered a content-encoding and a fallback to #1 to avoid ending up in #2. It just happens that when "application/foo" is "application/x-gzip", that fallback path ends up triggering as well, because even through we know the corresponding extension it's already on the filename and hence the filename is unchanged.
Comment 12•7 years ago
|
||
I think the server should send what it means. I believe that example means (2). If the server wanted something else, it shouldn't send something that means (2). And indeed our metrics suggested servers generally don't, which is why we removed that special-case. It also matches other browsers. If the server wants (1), it should: Host at .foo.gz (or use Content-Disposition) Content-Type: application/foo (No Content-Encoding) If the server wants (4), it should: Host at .foo (or use Content-Disposition) Content-Type: application/foo Content-Encoding: gzip (2) and (3) are indeed silly but, for all the browser knows, the server was configured to blindly wrap everything in a layer of gzip with Content-Encoding gzip, even a download that legitimately was meant to save as .foo.gz. Perhaps the decompressed data is itself a gzipped application/foo. Working around server misbehaviors means we end up in the cycle of problems in that document. Given that all only one UA has this behavior and another UA used to and managed to remove it, we should take advantage of this rare opportunity to simplify the platform. Certainly I don't think behavior like this belongs in the Fetch spec.
Comment 13•7 years ago
|
||
> I think the server should send what it means. It does, though! For the case when application/foo should be rendered inline in the browser, the server is doing _exactly_ the right thing in my comment 11 example. Consider the case when "application/foo" is "text/html", say. The filename on the server should not affect whether a browser is or is not able to handle the content inline. For what it's worth, I agree that the specifics of the behavior probably do not need to be in fetch, except insofar as higher layers should be able to instruct a fetch to not decompress based on examining the headers.
Comment 14•7 years ago
|
||
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Reporter | ||
Comment 15•6 years ago
|
||
I created https://github.com/whatwg/fetch/pull/816 to remove the quirk from Fetch, sorry for the delay.
Comment 16•5 years ago
|
||
(In reply to Anne (:annevk) from comment #15)
I created https://github.com/whatwg/fetch/pull/816 to remove the quirk from
Fetch, sorry for the delay.
I guess this means the consensus is to remove ClearBogusContentEncodingIfNeeded()
in our code.
Anne, do you know if there is any concern about removing this?
Reporter | ||
Comment 17•5 years ago
|
||
I would recommend unshipping this in EARLY_BETA_OR_EARLIER and then unshipping it completely in a follow-up release (or perhaps two releases later now that releases are more frequent). Please also communicate this in an Intent to unship. I would not expect much fallout as other browsers do not have this, but better to be on the safe side here.
Comment 18•5 years ago
|
||
So just to be clear, the proposal here is:
- Not to change the higher-level "what filename do we pick when saving, and do we decompress when saving?" behavior.
- Remove the encoding-stripping in the necko code.
Is that correct?
Reporter | ||
Comment 19•5 years ago
|
||
Yes, this is only about 2, though I do think doing something different from other browsers is risky for 1, even if we're arguably more correct, as folks often stop trying the moment the thing they are deploying works in whatever browser they test in (which is likely a majority market share browser).
Comment 20•5 years ago
|
||
Sure, but are there cases where our behavior for 1 would make things worse than the other browsers in practice?
Updated•5 years ago
|
Comment 22•4 years ago
•
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #18)
So just to be clear, the proposal here is:
- Not to change the higher-level "what filename do we pick when saving, and do we decompress when saving?" behavior.
- Remove the encoding-stripping in the necko code.
Is that correct?
As said in comment 9, only doing 2 doesn't change firefox's behavior. Removing ClearBogusContentEncodingIfNeeded()
only helps bug 1597602, but it's apparently that the issue in bug 1597602 is at server side.
After some thinking, I'm inclined to not change firefox's behavior.
Updated•4 years ago
|
Comment 23•4 years ago
|
||
I've came across a case when this heuristic does things wrong.
My Mattermost installation compresses everything it sends out, even .tar.gz attachments. The response headers look like this:
< content-type: application/gzip
< content-disposition: attachment;filename="shifty.tar.gz"; filename*=UTF-8''shifty.tar.gz
< content-encoding: gzip
This heuristic strips the content-encoding, and Firefox saves an archive compressed with gzip twice.
Comment 25•4 years ago
|
||
Content-Type shouldn't affect Content-Encoding. All other browsers decode from transport first using Content-Encoding, then decide if how to download it (Open it with application X or just unzip or just store) according to Content-Type. Content-Encoding should never be ignored.
I've seen 3 bug reports about the side-effects of this 12-year old workaround reported in the last ~1 year. Isn't it better to mimic other browsers' behavior about the current behavior at this point?
In the worst case, when someone downloads a (Such old Apache servers already have multiple critical vulnerabilities, they should better be updated if they're not hacked already)..tar.gz
file from their archaic server they'll just download it as .tar
and not notice an issue
Currently, e.g. for Bug 1560525 we need to remove it as a workaround for modern misconfigured servers, which are far more important (remember webcompat?). The standard workaround is not to have a workaround. Let's rely on the default behavior!
Websites most likely won't fix themselves for only Firefox anyway... An alternative approach is to print a console warning when the workaround is triggered? Or even better, a warning when the workaround is triggered but Content-Encoding is not dropped? The latter both helps fixing the web and keeps things working. Currently, we do neither.
Assignee | ||
Comment 26•3 years ago
|
||
Other browsers don't do this and it causes compat issues.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 27•3 years ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/246cfe5d5a05 Disable ClearBogusContentEncodingIfNeeded. r=necko-reviewers,valentin
Comment 28•3 years ago
|
||
bugherder |
Description
•