Newer Apache configurations reportedly do not require this hack. Due to the broken patches of bug 542222 I cannot find the history of this method.
Introduced here http://hg.mozilla.org/mozilla-central/diff/1238046c4cce/netwerk/protocol/http/src/nsHttpChannel.cpp. Why do you believe we need to remove it?
Luckily, this is actually CVS-era code. ;) This was added in bug 35956. In particular, see bug 35956 comment 64.
It seems bug comment 120 goes to the heart of the issue. Honza, we should either get our non-standard behavior in the network layer standardized, or remove it. Otherwise Servo will have a hard time competing with us.
Note that specifically for the save-to-file behavior what we actually do is completely ignore the Content-Type when deciding whether to decode. We just compare the URI filename extension, if any, to the Content-Encoding. If the Content-Encoding matches the filename extension, we don't undo the content encoding; otherwise we undo it. So if you send: Content-Type: text/html Content-Encoding: gzip and the filename is "foo.html" then we'll gunzip when saving, but if it's "foo.html.gz" then we won't.
Any chance of getting telemetry on how often this actually happens?
I came across this bug from the Fetch spec. Chrome hasn't had this behavior for about two years now, neither the Content-Type one in the spec nor the filename one in comment #4. See https://chromium.googlesource.com/chromium/src.git/+/87b5f2ece37da37ef0b0a04da9aee538ef0f9c4b. I also did a quick test and neither Safari nor IE11 (I don't currently have Edge on hand, but I assume the behavior is the same) have this quirk either. Since Firefox is the only browser that does this, probably it shouldn't be in the Fetch spec. Should I file a spec bug? I've seen one Apache deployment with the issue, and the culprit was the AddEncoding sample here. As far as I can tell, that directive is pointless. What you actually want is for foo.html.gz to be served with Content-Encoding gzip when fetching foo.html. I think that was a different module altogether, though I forget the details. https://httpd.apache.org/docs/2.4/mod/mod_mime.html#addencoding
Just to check... Is it no longer the case that Apache serves "foo.gz" as Content-Type: application/x-gzip Content-encoding: gzip by default? As far as what browsers do, I'd want to see some explicit tests, not just code pointers. There is various logic in Firefox around this in higher levels than necko that prevents decompression in some cases even if the Content-Encoding header _is_ present; I can't speak to other browsers.
I don't believe we have much in the way of special-cases here. That code link is it. For a test, here are two URLs for the same tarball. The first does not have Content-Encoding gzip, the second does. https://davidben.scripts.mit.edu/cuttlefish.tar.gz https://davidben.scripts.mit.edu/cuttlefish-2.tar.gz In every browser I tested except Firefox, the second link is decompressed and the first link is not. Firefox leaves both compressed.
Just for the record, if I remove ClearBogusEncodingIfNeeded completely, that doesn't change Firefox behavior on the testcases linked in comment 8. That's because even if the "Content-Encoding" header is present, for the specific case of saving Firefox will ignore it if that would produce garbage in the resulting local file (e.g. a non-gzipped tar archive with a .tar.gz extension). Specifically, we map encodings to MIME types per http://searchfox.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#1251-1306 and then disable decompression if the extension and encoding-derived type match one of the pairs in http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/uriloader/exthandler/nsExternalHelperAppService.cpp#600-606 The fact that other UAs just happily save garbage in that situation is rather user-hostile. :(
*shrug* I think this is just this usual debate: https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00
Sort of. Consider the case of a ".foo.gz" file on the server that the server sends with: Content-Type: application/foo Content-Encoding: gzip with the browser not knowing what extension "application/foo" should map to. This is perfectly correct server behavior, so Postel's Law doesn't even apply in this situation. When the browser goes to save this, there are several options: 1) Save as a ".foo.gz" file, save compressed data. 2) Save as a ".foo.gz" file, save decompressed data. 3) Save as a ".foo" file, save compressed data. 4) Save as a ".foo" file, save decompressed data. I will claim that options 2 and 3 are just totally broken behavior for a browser. In practice browsers aim for #4 in cases when they know what extension corresponds to "application/foo", by appending the relevant extension to the filename. But in cases in which they don't know, they can't very well do that. They could try stripping the .gz extension, but that carries a risk of user confusion and dataloss too. So in Firefox, after the "try to append an extension to make the filename match the actual type we are saving" bit, there is a check for whether our resulting extension still looks like something that triggered a content-encoding and a fallback to #1 to avoid ending up in #2. It just happens that when "application/foo" is "application/x-gzip", that fallback path ends up triggering as well, because even through we know the corresponding extension it's already on the filename and hence the filename is unchanged.
I think the server should send what it means. I believe that example means (2). If the server wanted something else, it shouldn't send something that means (2). And indeed our metrics suggested servers generally don't, which is why we removed that special-case. It also matches other browsers. If the server wants (1), it should: Host at .foo.gz (or use Content-Disposition) Content-Type: application/foo (No Content-Encoding) If the server wants (4), it should: Host at .foo (or use Content-Disposition) Content-Type: application/foo Content-Encoding: gzip (2) and (3) are indeed silly but, for all the browser knows, the server was configured to blindly wrap everything in a layer of gzip with Content-Encoding gzip, even a download that legitimately was meant to save as .foo.gz. Perhaps the decompressed data is itself a gzipped application/foo. Working around server misbehaviors means we end up in the cycle of problems in that document. Given that all only one UA has this behavior and another UA used to and managed to remove it, we should take advantage of this rare opportunity to simplify the platform. Certainly I don't think behavior like this belongs in the Fetch spec.
> I think the server should send what it means. It does, though! For the case when application/foo should be rendered inline in the browser, the server is doing _exactly_ the right thing in my comment 11 example. Consider the case when "application/foo" is "text/html", say. The filename on the server should not affect whether a browser is or is not able to handle the content inline. For what it's worth, I agree that the specifics of the behavior probably do not need to be in fetch, except insofar as higher layers should be able to instruct a fetch to not decompress based on examining the headers.