Is ClearBogusContentEncodingIfNeeded() needed?

Status: NEW
Assignee: Unassigned
Product: Core
Component: Networking
Reported: 3 years ago
Last modified: a month ago
Reporter: annevk
Firefox Tracking Flags: (Not tracked)
Whiteboard: [necko-would-take]

Description (Reporter, 3 years ago)

Newer Apache configurations reportedly do not require this hack.

Comment 1 (3 years ago)

Due to the broken patches of bug 542222 I cannot find the history of this method.
It was introduced here: http://hg.mozilla.org/mozilla-central/diff/1238046c4cce/netwerk/protocol/http/src/nsHttpChannel.cpp

Why do you believe we need to remove it?

Comment 2 (3 years ago)

Luckily, this is actually CVS-era code.  ;)

This was added in bug 35956.  In particular, see bug 35956 comment 64.
Comment 3 (Reporter, 3 years ago)

It seems bug 35956 comment 120 goes to the heart of the issue.

Honza, we should either get our non-standard behavior in the network layer standardized, or remove it. Otherwise Servo will have a hard time competing with us.

Comment 4 (3 years ago)

Note that specifically for the save-to-file behavior, what we actually do is completely ignore the Content-Type when deciding whether to decode.  We just compare the URI filename extension, if any, to the Content-Encoding.  If the Content-Encoding matches the filename extension, we don't undo the content encoding; otherwise we undo it.

So if you send:

  Content-Type: text/html
  Content-Encoding: gzip

and the filename is "foo.html" then we'll gunzip when saving, but if it's "foo.html.gz" then we won't.
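
To make that concrete, here's a minimal sketch of the decision just described.  This is illustrative only; the function name and the encoding-to-extension table are my assumptions, not the actual necko code:

  // Illustrative sketch, not the real Gecko implementation.
  #include <map>
  #include <string>

  static bool ShouldKeepEncodedWhenSaving(const std::string& aFilename,
                                          const std::string& aEncoding) {
    // Assumed Content-Encoding -> filename extension mapping.
    static const std::map<std::string, std::string> kExtForEncoding = {
        {"gzip", ".gz"}, {"x-gzip", ".gz"}, {"compress", ".Z"}};
    auto it = kExtForEncoding.find(aEncoding);
    if (it == kExtForEncoding.end()) {
      return false;  // Unknown encoding: undo it, as for inline display.
    }
    // "foo.html.gz" ends with ".gz" -> keep the bytes as received;
    // "foo.html" does not -> gunzip before writing the file.
    const std::string& ext = it->second;
    return aFilename.size() >= ext.size() &&
           aFilename.compare(aFilename.size() - ext.size(), ext.size(),
                             ext) == 0;
  }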

Comment 5 (2 years ago)

Any chance of getting telemetry on how often this actually happens?
Whiteboard: [necko-would-take]

Comment 6 (a month ago)

I came across this bug from the Fetch spec. Chrome hasn't had this behavior for about two years now, neither the Content-Type one in the spec nor the filename one in comment #4. See https://chromium.googlesource.com/chromium/src.git/+/87b5f2ece37da37ef0b0a04da9aee538ef0f9c4b.

I also did a quick test: neither Safari nor IE11 (I don't currently have Edge on hand, but I assume the behavior is the same) has this quirk. Since Firefox is the only browser that does this, it probably shouldn't be in the Fetch spec. Should I file a spec bug?

I've seen one Apache deployment with the issue, and the culprit was the AddEncoding sample in https://httpd.apache.org/docs/2.4/mod/mod_mime.html#addencoding. As far as I can tell, that directive is pointless. What you actually want is for foo.html.gz to be served with Content-Encoding gzip when fetching foo.html. I think that was a different module altogether, though I forget the details.
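
For reference, the sample there is, assuming the docs still show the same snippet:

  AddEncoding x-gzip .gz
  AddEncoding x-compress .Z

With that in place, mod_mime strips the .gz when determining the type, so foo.tar.gz gets served with the Content-Type for .tar plus Content-Encoding: x-gzip, which is exactly the pattern at issue here.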

Comment 7 (a month ago)

Just to check...

Is it no longer the case that Apache serves "foo.gz" as

  Content-Type: application/x-gzip
  Content-Encoding: gzip

by default?

As far as what browsers do, I'd want to see some explicit tests, not just code pointers.  There is various logic in Firefox around this in higher levels than necko that prevents decompression in some cases even if the Content-Encoding header _is_ present; I can't speak to other browsers.

Comment 8 (a month ago)

I don't believe we have much in the way of special cases here. That code link is it. For a test, here are two URLs for the same tarball. The first does not have Content-Encoding gzip; the second does.

https://davidben.scripts.mit.edu/cuttlefish.tar.gz
https://davidben.scripts.mit.edu/cuttlefish-2.tar.gz

In every browser I tested except Firefox, the second link is decompressed and the first link is not. Firefox leaves both compressed.

Comment 9 (a month ago)

Just for the record, if I remove ClearBogusContentEncodingIfNeeded() completely, that doesn't change Firefox behavior on the testcases linked in comment 8.

That's because even if the Content-Encoding header is present, for the specific case of saving, Firefox will ignore it if honoring it would produce garbage in the resulting local file (e.g. a non-gzipped tar archive with a .tar.gz extension).  Specifically, we map encodings to MIME types per http://searchfox.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#1251-1306 and then disable decompression if the extension and encoding-derived type match one of the pairs in http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/uriloader/exthandler/nsExternalHelperAppService.cpp#600-606.
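
In other words, roughly the following two-step check.  This is a sketch of the shape of the code at those links, with guessed names and tables, not a copy of it:

  // Rough sketch; both tables below are illustrative guesses.
  #include <string>
  #include <utility>
  #include <vector>

  // Step 1: map a Content-Encoding to the MIME type it implies
  // (the HttpBaseChannel.cpp link above).
  static std::string TypeForEncoding(const std::string& aEncoding) {
    if (aEncoding == "gzip" || aEncoding == "x-gzip") {
      return "application/x-gzip";
    }
    if (aEncoding == "compress" || aEncoding == "x-compress") {
      return "application/x-compress";
    }
    return "";
  }

  // Step 2: skip decompression when saving if the filename extension and
  // the encoding-derived type form one of a small set of known pairs
  // (the nsExternalHelperAppService.cpp link above).
  static bool ShouldSkipDecoding(const std::string& aExtension,
                                 const std::string& aEncoding) {
    static const std::vector<std::pair<std::string, std::string>> kPairs = {
        {"application/x-gzip", "gz"},
        {"application/x-gzip", "tgz"},
        {"application/x-gzip", "svgz"},
        {"application/x-compress", "Z"}};
    const std::string type = TypeForEncoding(aEncoding);
    for (const auto& pair : kPairs) {
      if (pair.first == type && pair.second == aExtension) {
        return true;  // e.g. a ".tar.gz" download stays gzipped on disk.
      }
    }
    return false;
  }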

The fact that other UAs just happily save garbage in that situation is rather user-hostile.  :(

Comment 10 (a month ago)

*shrug* I think this is just the usual debate:
https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00

Comment 11 (a month ago)

Sort of.  Consider the case of a ".foo.gz" file on the server that the server sends with:

  Content-Type: application/foo
  Content-Encoding: gzip

with the browser not knowing what extension "application/foo" should map to.  This is perfectly correct server behavior, so Postel's Law doesn't even apply in this situation.

When the browser goes to save this, there are several options:

1)  Save as a ".foo.gz" file, save compressed data.
2)  Save as a ".foo.gz" file, save decompressed data.
3)  Save as a ".foo" file, save compressed data.
4)  Save as a ".foo" file, save decompressed data.

I will claim that options 2 and 3 are just totally broken behavior for a browser.  In practice browsers aim for #4 in cases when they know what extension corresponds to "application/foo", by appending the relevant extension to the filename.  But in cases in which they don't know, they can't very well do that.  They could try stripping the .gz extension, but that carries a risk of user confusion and data loss too.

So in Firefox, after the "try to append an extension to make the filename match the actual type we are saving" bit, there is a check for whether our resulting extension still looks like something that triggered a content-encoding, and a fallback to #1 to avoid ending up in #2.

It just happens that when "application/foo" is "application/x-gzip", that fallback path ends up triggering as well, because even though we know the corresponding extension, it's already on the filename and hence the filename is unchanged.

Comment 12 (a month ago)

I think the server should send what it means. I believe that example means (2). If the server wanted something else, it shouldn't send something that means (2). And indeed our metrics suggested servers generally don't, which is why we removed that special case. It also matches other browsers.

If the server wants (1), it should:

  Host at .foo.gz (or use Content-Disposition)
  Content-Type: application/foo
  (No Content-Encoding)

If the server wants (4), it should:

  Host at .foo (or use Content-Disposition)
  Content-Type: application/foo
  Content-Encoding: gzip

(2) and (3) are indeed silly but, for all the browser knows, the server was configured to blindly wrap everything in a layer of gzip with Content-Encoding gzip, even a download that legitimately was meant to save as .foo.gz. Perhaps the decompressed data is itself a gzipped application/foo.

Working around server misbehaviors means we end up in the cycle of problems in that document. Given that only one UA has this behavior, and another UA used to and managed to remove it, we should take advantage of this rare opportunity to simplify the platform. Certainly I don't think behavior like this belongs in the Fetch spec.

Comment 13 (a month ago)

> I think the server should send what it means.

It does, though!  For the case when application/foo should be rendered inline in the browser, the server is doing _exactly_ the right thing in my comment 11 example.  Consider the case when "application/foo" is "text/html", say.

The filename on the server should not affect whether a browser is or is not able to handle the content inline.

For what it's worth, I agree that the specifics of the behavior probably do not need to be in fetch, except insofar as higher layers should be able to instruct a fetch to not decompress based on examining the headers.