.tar.gz files mangled on download by double compression
Categories
(Core :: Networking, defect, P2)
People
(Reporter: RossBoylan, Unassigned)
Details
(Whiteboard: [necko-triaged][necko-priority-next])
Comment 13•5 years ago
I'm finding this same issue when I'm trying to download the Linux tar.gz file from here:
https://ootrandomizer.com/downloads
Only Firefox downloads the tar.gz double-gzipped; you have to gunzip it once before you can extract it with tar. Chrome handles this correctly, and so does wget.
Running file(1) on the working downloads gives:
gzip compressed data, was "0.tar", last modified: Wed Oct 16 16:28:21 2019, max compression, from Unix, original size modulo 2^32 183767040
For the double-gzipped files downloaded via Firefox:
gzip compressed data, original size modulo 2^32 66735979
Then after gunzipping once:
gzip compressed data, was "0.tar", last modified: Wed Oct 16 16:28:21 2019, max compression, from Unix, original size modulo 2^32 183767040
as expected.
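For anyone who wants to double-check a download without file(1), here is a minimal Python sketch (the path is a placeholder taken from the file output above) that counts gzip layers by repeatedly checking the gzip magic number (0x1f 0x8b) and peeling one layer at a time:

import gzip

GZIP_MAGIC = b"\x1f\x8b"

def gzip_layers(path, max_layers=4):
    # Count how many times the file has been gzip-compressed by
    # decompressing one layer at a time and re-checking the magic number.
    with open(path, "rb") as f:
        data = f.read()
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return layers

# A correctly downloaded *.tar.gz should report 1 layer; the broken
# Firefox downloads described above report 2.
print(gzip_layers("0.tar.gz"))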
Comment 14•5 years ago
I am experiencing this with 72.0.1 on Linux. It is a real problem: we had to change our server software to avoid gzip transfer compression on files that are already gzipped.
Comment 15•4 years ago
Still having this problem on Linux with 79.0. I understand that it is arguably the web server's fault for gzipping something that is already gzipped, but that should not cause this kind of breakage.
Comment 16•4 years ago
Ah, and Chromium handles it fine :-)
Comment 17•3 years ago
This is still occurring in Firefox 94.0. It has been confirmed by multiple people.
Comment 18•3 years ago
First of all, a server should guess the Content-Type and Content-Encoding not only by file extension but also by the magic number found at the start of the file (see also the standard IANA MIME types: https://www.iana.org/assignments/media-types/media-types.xhtml).
Of course, almost no web server actually looks at the magic number at the start of each file (for performance reasons, among others); this job is usually left to browsers.
If a server sends a *.tar.gz file with the following HTTP headers:
Content-Length: nnnnnn
Content-Type: application/gzip
Content-Encoding: gzip
or
Content-Length: nnnnnn
Content-Type: application/x-gzip
Content-Encoding: gzip
then the server is sending WRONG HTTP headers: the innermost extension must determine the Content-Type, and only the outer extensions should be mapped to Content-Encoding. So, for a *.tar.gz file, it should use these HTTP headers:
Content-Length: nnnnnn
Content-Type: application/x-tar
Content-Encoding: gzip
If the file were named *.tar.gz.Z, the HTTP headers should be:
Content-Length: nnnnnn
Content-Type: application/x-tar
Content-Encoding: gzip, compress
The content codings must be listed in the order they were applied to the representation (file) (see RFC 7231, section 3.1.2.2, Content-Encoding).
If the file were named *.gz (with only one extension), these HTTP headers should be used (no Content-Encoding):
Content-Length: nnnnnn
Content-Type: application/gzip
See also: mime types (https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types).
If the server served a *.tar file and tried to compress the content on the fly, so that it did not know the final length of the requested representation in advance, it should use compressed chunked encoding (for an HTTP/1.1 client), i.e.:
Content-Type: application/x-tar
Transfer-Encoding: gzip, chunked
without Content-Length and without Content-Encoding headers (see RFC 7230, section 3.3.1, Transfer-Encoding).
These are the basic rules that have applied since RFC 2616 (1999).
Nobody in their right mind would compress a file twice, with the only result being a bigger file.
HINT
I suggest that Firefox should intercept these cases (bad Content-Type/Content-Encoding HTTP headers) and emit a warning for each one, so that web administrators could verify whether their servers are properly configured.
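A minimal Python sketch of that warning heuristic (illustrative only, not a real Firefox API): flag responses whose Content-Type already denotes a compression format while Content-Encoding repeats the same coding.

# Content-Types that already denote compressed payloads,
# mapped to the content coding they imply.
COMPRESSED_TYPES = {
    "application/gzip": "gzip",
    "application/x-gzip": "gzip",
    "application/x-compress": "compress",
}

def check_headers(content_type, content_encoding):
    # Parse the comma-separated coding list and compare it against
    # the coding implied by the Content-Type itself.
    codings = [c.strip().lower() for c in (content_encoding or "").split(",")]
    implied = COMPRESSED_TYPES.get(content_type.lower())
    if implied and implied in codings:
        return ("warning: Content-Type %r already denotes %s-compressed data, "
                "but Content-Encoding repeats it; the payload is probably "
                "double-compressed" % (content_type, implied))
    return None

print(check_headers("application/gzip", "gzip"))   # warns: the broken case above
print(check_headers("application/x-tar", "gzip"))  # None: correct *.tar.gz headers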
Comment 19•3 years ago
My conclusions are the following.
1) The best strategy when handling file name extensions and encodings is to start from the end of the file name and walk backwards, collecting all known extensions associated with encodings (i.e. .gz, .Z, .uu, etc.), then pick the first known extension which is not an encoding (see the sketch below). If there is no known extension, there are two options:
A) if the last known extension was .gz (an encoding), the web server can use:
Content-Type: application/x-gzip
or
B) Content-Type: application/octet-stream
Content-Encoding: gzip
2) Firefox should never transform the received web resource when storing it to disk, unless there is a Transfer-Encoding header naming a compression transformation (i.e. chunked, gzip / compress). Right now it looks like, when receiving the above-mentioned HTTP headers, Firefox compresses the received file before storing it to disk in order to make its content consistent with the HTTP headers; but this is wrong, because the web server may be lying about Content-Type and Content-Encoding, and in any case the content of the named file should never be re-compressed by the user agent (browser).
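A small Python sketch of the backward extension walk from point 1 (the encoding table is an assumption; extend it as needed), producing the headers described in comment 18:

import mimetypes

# Extensions that denote content codings (assumed table).
ENCODING_EXTS = {".gz": "gzip", ".z": "compress"}

def headers_for(filename):
    # Walk from the end of the file name backwards, peeling off known
    # encoding extensions, then derive the Content-Type from the first
    # extension that is not an encoding.
    codings = []
    name = filename
    while "." in name:
        base, ext = name.rsplit(".", 1)
        coding = ENCODING_EXTS.get("." + ext.lower())
        if coding is None:
            break
        codings.insert(0, coding)  # list codings in the order applied
        name = base
    ctype, _ = mimetypes.guess_type(name)
    if ctype is None:
        if codings and codings[-1] == "gzip":
            # Option A: the outermost coding becomes the type itself.
            return {"Content-Type": "application/x-gzip"}
        # Option B: opaque bytes, codings declared explicitly.
        ctype = "application/octet-stream"
    headers = {"Content-Type": ctype}
    if codings:
        headers["Content-Encoding"] = ", ".join(codings)
    return headers

print(headers_for("backup.tar.gz"))
# {'Content-Type': 'application/x-tar', 'Content-Encoding': 'gzip'}
print(headers_for("data.gz"))
# {'Content-Type': 'application/x-gzip'} (option A)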
Comment 20•3 years ago
Errata: in comment 18, the following sentence:
"If the file were named *.tar.gz.Z, the HTTP headers should be:"
Content-Length: nnnnnn
Content-Type: application/x-tar
Content-Encoding: gzip, compress
should read:
"If the file were named *.tar.gz.Z, the HTTP headers should be:"
Content-Length: nnnnnn
Content-Type: application/x-tar
Content-Encoding: gzip
Comment 21•3 years ago
Actually we had the same problem over here.
"The web page owner reported: "My CentOS server has gzip compression set. That's a fairly basic option. Looks like firefox's down loader is too dumb to realize the server is using compression. I added a line to the htaccess file to disable gzip in the dev directory."
This indeed resolved the problem we were discussing.
:D
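For reference, the usual mod_deflate exclusion for this on Apache (assuming mod_setenvif is available; the extension list is illustrative) looks like:

# In .htaccess: don't apply on-the-fly gzip to already-compressed files
SetEnvIfNoCase Request_URI \.(?:gz|tgz|bz2|xz|zip)$ no-gzip dont-vary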
Comment 24•2 years ago
It happened to me too; this still exists in 112.0 (64-bit).
Steps to reproduce:
Try downloading this file with wget and with Firefox; Firefox still double-compresses it.
It would be appreciated if you could fix this bug.
Comment 25•2 years ago
Sorry, I forgot to add the link:
https://www.hdsentinel.com/hdslin/hdsentinel-019b.gz
Comment 26•2 years ago
Fixing this bug will regress bug 35956, and Chrome has that bug. I uploaded the same file to my server: https://emk.name/test/hdsentinel-019b.gz
Chrome will download and decompress this file, but it does not remove the .gz extension. That is exactly the problem we fixed in bug 35956.
We should either
- WONTFIX this bug, or
- fix this bug and WONTFIX bug 35956.
It would be the product owner's decision.
Comment 27•2 years ago
(In reply to Masatoshi Kimura [:emk] from comment #26)
I see. Could we make decompression optional in the settings, and make the default behaviour not to decompress .gz files? I think that should fix both bugs, as far as I understand. I'm not experienced, so feel free to correct me.
Comment 29•1 year ago
Given that we got so many bug reports and comments, I think we might want to give this a higher priority.
Comment 31•1 year ago
Filed https://bugs.chromium.org/p/chromium/issues/detail?id=1473207
Comment 32•1 year ago
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #31)
Filed https://bugs.chromium.org/p/chromium/issues/detail?id=1473207
That got duped and then https://bugs.chromium.org/p/chromium/issues/detail?id=1484221 got wontfixed. Does this mean we should adjust necko/gecko's behaviour to match chrome/wget?
Comment 33•1 year ago
Dup of bug 610679?
Comment 34•1 year ago
(In reply to Vincent Lefevre from comment #33)
Dup of bug 610679?
Probably, but there is useful context here that is not there, and vice versa. Let's see what Jesup/Kershaw want to do.
Comment 35•1 year ago
(In reply to :Gijs (he/him) from comment #32)
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #31)
Filed https://bugs.chromium.org/p/chromium/issues/detail?id=1473207
That got duped and then https://bugs.chromium.org/p/chromium/issues/detail?id=1484221 got wontfixed. Does this mean we should adjust necko/gecko's behaviour to match chrome/wget?
Yes, I think we should match chrome's behavior.
Comment 36•9 months ago
We should put this change behind a pref and see if we can land this sometime soon.