Closed Bug 1030660 Opened 10 years ago Closed 3 years ago

Is ClearBogusContentEncodingIfNeeded() needed?

Categories

(Core :: Networking, defect, P3)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: annevk, Assigned: emilio)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

Newer Apache configurations reportedly do not require this hack.

Due to the broken patches of bug 542222 I cannot find the history of this method.
Luckily, this is actually CVS-era code.  ;)

This was added in bug 35956.  In particular, see bug 35956 comment 64.
It seems bug 35956 comment 120 goes to the heart of the issue.

Honza, we should either get our non-standard behavior in the network layer standardized, or remove it. Otherwise Servo will have a hard time competing with us.
Note that specifically for the save-to-file behavior what we actually do is completely ignore the Content-Type when deciding whether to decode. We just compare the URI filename extension, if any, to the Content-Encoding.  If the Content-Encoding matches the filename extension, we don't undo the content encoding; otherwise we undo it.

So if you send:

  Content-Type: text/html
  Content-Encoding: gzip

and the filename is "foo.html" then we'll gunzip when saving, but if it's "foo.html.gz" then we won't.
Any chance of getting telemetry on how often this actually happens?
Whiteboard: [necko-would-take]
I came across this bug from the Fetch spec. Chrome hasn't had this behavior for about two years now, neither the Content-Type one in the spec nor the filename one in comment #4. See https://chromium.googlesource.com/chromium/src.git/+/87b5f2ece37da37ef0b0a04da9aee538ef0f9c4b.

I also did a quick test and neither Safari nor IE11 (I don't currently have Edge on hand, but I assume the behavior is the same) have this quirk either. Since Firefox is the only browser that does this, probably it shouldn't be in the Fetch spec. Should I file a spec bug?

I've seen one Apache deployment with the issue, and the culprit was the AddEncoding sample here. As far as I can tell, that directive is pointless. What you actually want is for foo.html.gz to be served with Content-Encoding gzip when fetching foo.html. I think that was a different module altogether, though I forget the details.
https://httpd.apache.org/docs/2.4/mod/mod_mime.html#addencoding
Just to check...

Is it no longer the case that Apache serves "foo.gz" as

  Content-Type: application/x-gzip
  Content-encoding: gzip

by default?

As far as what browsers do, I'd want to see some explicit tests, not just code pointers.  There is various logic in Firefox around this in higher levels than necko that prevents decompression in some cases even if the Content-Encoding header _is_ present; I can't speak to other browsers.
I don't believe we have much in the way of special-cases here. That code link is it. For a test, here are two URLs for the same tarball. The first does not have Content-Encoding gzip, the second does.

https://davidben.scripts.mit.edu/cuttlefish.tar.gz
https://davidben.scripts.mit.edu/cuttlefish-2.tar.gz

In every browser I tested except Firefox, the second link is decompressed and the first link is not. Firefox leaves both compressed.
Just for the record, if I remove ClearBogusEncodingIfNeeded completely, that doesn't change Firefox behavior on the testcases linked in comment 8.

That's because even if the "Content-Encoding" header is present, for the specific case of saving Firefox will ignore it if that would produce garbage in the resulting local file (e.g. a non-gzipped tar archive with a .tar.gz extension).  Specifically, we map encodings to MIME types per http://searchfox.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#1251-1306 and then disable decompression if the extension and encoding-derived type match one of the pairs in http://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/uriloader/exthandler/nsExternalHelperAppService.cpp#600-606

The fact that other UAs just happily save garbage in that situation is rather user-hostile.  :(
*shrug* I think this is just this usual debate:
https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00
Sort of.  Consider the case of a ".foo.gz" file on the server that the server sends with:

  Content-Type: application/foo
  Content-Encoding: gzip

with the browser not knowing what extension "application/foo" should map to.  This is perfectly correct server behavior, so Postel's Law doesn't even apply in this situation.

When the browser goes to save this, there are several options:

1)  Save as a ".foo.gz" file, save compressed data.
2)  Save as a ".foo.gz" file, save decompressed data.
3)  Save as a ".foo" file, save compressed data.
4)  Save as a ".foo" file, save decompressed data.

I will claim that options 2 and 3 are just totally broken behavior for a browser.  In practice browsers aim for #4 in cases when they know what extension corresponds to "application/foo", by appending the relevant extension to the filename.  But in cases in which they don't know, they can't very well do that.  They could try stripping the .gz extension, but that carries a risk of user confusion and dataloss too.

So in Firefox, after the "try to append an extension to make the filename match the actual type we are saving" bit, there is a check for whether our resulting extension still looks like something that triggered a content-encoding and a fallback to #1 to avoid ending up in #2.

It just happens that when "application/foo" is "application/x-gzip", that fallback path ends up triggering as well, because even through we know the corresponding extension it's already on the filename and hence the filename is unchanged.
I think the server should send what it means. I believe that example means (2). If the server wanted something else, it shouldn't send something that means (2). And indeed our metrics suggested servers generally don't, which is why we removed that special-case. It also matches other browsers.

If the server wants (1), it should:

  Host at .foo.gz (or use Content-Disposition)
  Content-Type: application/foo
  (No Content-Encoding)

If the server wants (4), it should:

  Host at .foo (or use Content-Disposition)
  Content-Type: application/foo
  Content-Encoding: gzip

(2) and (3) are indeed silly but, for all the browser knows, the server was configured to blindly wrap everything in a layer of gzip with Content-Encoding gzip, even a download that legitimately was meant to save as .foo.gz. Perhaps the decompressed data is itself a gzipped application/foo.

Working around server misbehaviors means we end up in the cycle of problems in that document. Given that all only one UA has this behavior and another UA used to and managed to remove it, we should take advantage of this rare opportunity to simplify the platform. Certainly I don't think behavior like this belongs in the Fetch spec.
> I think the server should send what it means.

It does, though!  For the case when application/foo should be rendered inline in the browser, the server is doing _exactly_ the right thing in my comment 11 example.  Consider the case when "application/foo" is "text/html", say.

The filename on the server should not affect whether a browser is or is not able to handle the content inline.

For what it's worth, I agree that the specifics of the behavior probably do not need to be in fetch, except insofar as higher layers should be able to instruct a fetch to not decompress based on examining the headers.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P5
I created https://github.com/whatwg/fetch/pull/816 to remove the quirk from Fetch, sorry for the delay.

(In reply to Anne (:annevk) from comment #15)

I created https://github.com/whatwg/fetch/pull/816 to remove the quirk from
Fetch, sorry for the delay.

I guess this means the consensus is to remove ClearBogusContentEncodingIfNeeded() in our code.
Anne, do you know if there is any concern about removing this?

Flags: needinfo?(annevk)
See Also: → 1560525, 1597602

I would recommend unshipping this in EARLY_BETA_OR_EARLIER and then unshipping it completely in a follow-up release (or perhaps two releases later now that releases are more frequent). Please also communicate this in an Intent to unship. I would not expect much fallout as other browsers do not have this, but better to be on the safe side here.

Flags: needinfo?(annevk)

So just to be clear, the proposal here is:

  1. Not to change the higher-level "what filename do we pick when saving, and do we decompress when saving?" behavior.
  2. Remove the encoding-stripping in the necko code.

Is that correct?

Yes, this is only about 2, though I do think doing something different from other browsers is risky for 1, even if we're arguably more correct, as folks often stop trying the moment the thing they are deploying works in whatever browser they test in (which is likely a majority market share browser).

Sure, but are there cases where our behavior for 1 would make things worse than the other browsers in practice?

Priority: P5 → P2
Whiteboard: [necko-would-take] → [necko-triaged]

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #18)

So just to be clear, the proposal here is:

  1. Not to change the higher-level "what filename do we pick when saving, and do we decompress when saving?" behavior.
  2. Remove the encoding-stripping in the necko code.

Is that correct?

As said in comment 9, only doing 2 doesn't change firefox's behavior. Removing ClearBogusContentEncodingIfNeeded() only helps bug 1597602, but it's apparently that the issue in bug 1597602 is at server side.

After some thinking, I'm inclined to not change firefox's behavior.

Priority: P2 → P3

I've came across a case when this heuristic does things wrong.

My Mattermost installation compresses everything it sends out, even .tar.gz attachments. The response headers look like this:

< content-type: application/gzip
< content-disposition: attachment;filename="shifty.tar.gz"; filename*=UTF-8''shifty.tar.gz
< content-encoding: gzip

This heuristic strips the content-encoding, and Firefox saves an archive compressed with gzip twice.

Content-Type shouldn't affect Content-Encoding. All other browsers decode from transport first using Content-Encoding, then decide if how to download it (Open it with application X or just unzip or just store) according to Content-Type. Content-Encoding should never be ignored.

I've seen 3 bug reports about the side-effects of this 12-year old workaround reported in the last ~1 year. Isn't it better to mimic other browsers' behavior about the current behavior at this point?

In the worst case, when someone downloads a .tar.gz file from their archaic server they'll just download it as .tar and not notice an issue (Such old Apache servers already have multiple critical vulnerabilities, they should better be updated if they're not hacked already).

Currently, e.g. for Bug 1560525 we need to remove it as a workaround for modern misconfigured servers, which are far more important (remember webcompat?). The standard workaround is not to have a workaround. Let's rely on the default behavior!

Websites most likely won't fix themselves for only Firefox anyway... An alternative approach is to print a console warning when the workaround is triggered? Or even better, a warning when the workaround is triggered but Content-Encoding is not dropped? The latter both helps fixing the web and keeps things working. Currently, we do neither.

See Also: → 1733555

Other browsers don't do this and it causes compat issues.

Assignee: nobody → emilio
Status: NEW → ASSIGNED
Attachment #9243917 - Attachment description: Bug 1030660 - Remove ClearBogusContentEncodingIfNeeded. r=#necko-reviewers! → Bug 1030660 - Disable ClearBogusContentEncodingIfNeeded. r=#necko-reviewers!
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/246cfe5d5a05
Disable ClearBogusContentEncodingIfNeeded. r=necko-reviewers,valentin
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
See Also: → 1742629
See Also: 1733555
See Also: 1597602
Blocks: 1842173
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: