Tarball is double gzipped when downloaded with Firefox

Status: NEW, Unassigned
Product: Firefox
Component: File Handling
Opened 7 years ago, last updated 14 days ago
Reporter: Matěj Cepl
Tracking: Not tracked
Attachments: 4

(Reporter)

Description

7 years ago
When I download file http://rrakus.fedorapeople.org/fmci-final.tgz with Firefox I get double gzipped file:

jakoubek:~ $ cd
jakoubek:~ $ ls fmci-final.t*
fmci-final.tar  fmci-final.tgz
jakoubek:~ $ rm fmci-final.tar 
jakoubek:~ $ 
jakoubek:~ $ ls -lh fmci-final.tgz 
-rw-rw-r--. 1 matej matej 2,8K  9. lis 17.19 fmci-final.tgz
jakoubek:~ $ file fmci-final.tgz 
fmci-final.tgz: gzip compressed data, from Unix
jakoubek:~ $ gunzip fmci-final.tgz 
jakoubek:~ $ ls -lh fmci-final.t*
-rw-rw-r--. 1 matej matej 2,8K  9. lis 17.19 fmci-final.tar
jakoubek:~ $ file fmci-final.tar 
fmci-final.tar: gzip compressed data, from Unix, last modified: Mon Nov  8 19:44:39 2010
jakoubek:~ $ mv fmci-final.tar fmci-final.tar.gz
jakoubek:~ $ gunzip fmci-final.tar.gz 
jakoubek:~ $ file fmci-final.tar 
fmci-final.tar: POSIX tar archive (GNU)
jakoubek:~ $ tar tvf fmci-final.tar 
drwxrwxr-x rrakus/rrakus     0 2010-11-08 19:44 fmci-final/
-rw-rw-r-- rrakus/rrakus  1602 2010-11-08 19:21 fmci-final/fmci-server.h
-rw-rw-r-- rrakus/rrakus  1008 2010-11-08 18:15 fmci-final/fmci.h
-rw-r--r-- rrakus/rrakus   691 2010-11-08 19:39 fmci-final/org.fedoraproject.fmci.conf
-rw-rw-r-- rrakus/rrakus  1443 2010-11-08 18:40 fmci-final/Makefile
-rw-rw-r-- rrakus/rrakus   326 2010-11-08 19:20 fmci-final/fmci.xml
-rw-rw-r-- rrakus/rrakus  1954 2010-11-08 19:32 fmci-final/fmci-client.c
-rw-rw-r-- rrakus/rrakus  2605 2010-11-08 19:34 fmci-final/fmci-server.c
jakoubek:~ $ 
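
The manual gunzip/file sequence above can also be scripted. This is a minimal Python sketch, not part of the original report; the name count_gzip_layers is made up for illustration. It counts how many gzip layers wrap some data by checking for the gzip magic bytes (0x1f 0x8b) and decompressing until they no longer appear.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def count_gzip_layers(data: bytes, max_layers: int = 10) -> int:
    """Return how many times data has to be gunzipped before it stops
    looking like gzip data. max_layers is an arbitrary safety limit."""
    layers = 0
    while layers < max_layers and data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
        layers += 1
    return layers


if __name__ == "__main__":
    tarball = b"pretend this is a tar archive"
    once = gzip.compress(tarball)
    twice = gzip.compress(once)
    print(count_gzip_layers(once))   # single gzip, like the curl download
    print(count_gzip_layers(twice))  # double gzip, like the Firefox download
```

Reading a downloaded file in binary mode and passing its bytes to the function would tell the two cases apart without renaming files by hand.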

When downloading with curl, I get the tarball gzipped only once, as expected:

jakoubek:~ $ curl -I http://rrakus.fedorapeople.org/fmci-final.tgz
HTTP/1.1 200 OK
Date: Tue, 09 Nov 2010 16:21:16 GMT
Server: Apache/2.2.3
Last-Modified: Mon, 08 Nov 2010 18:45:13 GMT
ETag: "b15-4948f06410040"
Accept-Ranges: bytes
Content-Length: 2837
Vary: Accept-Encoding,User-Agent
Connection: close
Content-Type: application/x-gzip

jakoubek:~ $ rm fmci-final.tar 
jakoubek:~ $ curl -L -O http://rrakus.fedorapeople.org/fmci-final.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2837  100  2837    0     0   8929      0 --:--:-- --:--:-- --:--:-- 20860
jakoubek:~ $ file fmci-final.t*
fmci-final.tgz: gzip compressed data, from Unix, last modified: Mon Nov  8 19:44:39 2010
jakoubek:~ $ gunzip fmci-final.tgz 
jakoubek:~ $ file fmci-final.t*
fmci-final.tar: POSIX tar archive (GNU)
jakoubek:~ $ tar tvf fmci-final.tar 
drwxrwxr-x rrakus/rrakus     0 2010-11-08 19:44 fmci-final/
-rw-rw-r-- rrakus/rrakus  1602 2010-11-08 19:21 fmci-final/fmci-server.h
-rw-rw-r-- rrakus/rrakus  1008 2010-11-08 18:15 fmci-final/fmci.h
-rw-r--r-- rrakus/rrakus   691 2010-11-08 19:39 fmci-final/org.fedoraproject.fmci.conf
-rw-rw-r-- rrakus/rrakus  1443 2010-11-08 18:40 fmci-final/Makefile
-rw-rw-r-- rrakus/rrakus   326 2010-11-08 19:20 fmci-final/fmci.xml
-rw-rw-r-- rrakus/rrakus  1954 2010-11-08 19:32 fmci-final/fmci-client.c
-rw-rw-r-- rrakus/rrakus  2605 2010-11-08 19:34 fmci-final/fmci-server.c
jakoubek:~ $ 

Why doesn't Firefox just download the file and leave it as it is (especially considering there is no Content-Encoding header whatsoever)?

The file should stay at the same URL for some time, so there shouldn't be any problem reproducing this at will.

Using Mozilla/5.0 (X11; U; Linux x86_64; cs-CZ; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc14 Firefox/3.6.12
(Reporter)

Comment 1

7 years ago
Created attachment 489186 [details]
screenshot of httpFox

Hmm, so there IS some HTTP header munging based on User-Agent. Still, Firefox shouldn't screw this up IMHO.

Comment 2

6 years ago
I have exactly the same problem in FF 5.0. It's extremely annoying and I have found no reasonable workaround. It is not triggered just by the MIME type; the filename matters too, depending on the URL format.

1) http://www.example.com/FILENAME

FILENAME   MIME-TYPE
=============================================================
test.bin + application/octet-stream => OK
test.bin + application/gzip => gzipped (shouldn't be)
test.tgz + application/octet-stream => gzipped (shouldn't be)
test.tgz + application/gzip => gzipped (shouldn't be)

So there's no way to have 'nice' URLs and correct behavior.

2) http://www.example.com/download.php?filename=FILENAME

FILENAME   MIME-TYPE
=============================================================
test.bin + application/octet-stream => OK
test.bin + application/gzip => gzipped (shouldn't be)
test.tgz + application/octet-stream => OK
test.tgz + application/gzip => gzipped (shouldn't be)

In this case it's driven just by mime-type, because the filename is not part of the URL. It's possible to set the filename with content-disposition header, but the URLs remain ugly.

I see no reason why FF 5 behaves like this. And I haven't noticed this on Windows or 32bit Linux (right now I'm on a new amd64 box).

Comment 3

6 years ago
After a bit more testing I've discovered this is somehow connected to automatic compression of the data transferred to the client.

For example, PHP has an ob_gzhandler that automatically compresses the data sent to the client (whenever the client accepts gzip-encoded data in the HTTP Accept-Encoding header), and most servers have this enabled by default. If I disable output buffering by adding these two lines to the .htaccess

php_flag output_buffering false
php_value output_handler NULL

then everything works fine (i.e. Firefox no longer ends up with doubly gzipped data), but it obviously prevents transparent compression and thus more data needs to be transferred.
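
To make the mechanism concrete, here is a rough Python model of what a transparent output compressor such as ob_gzhandler does. It is a simplification written for this discussion, not PHP's actual implementation; transparent_gzip and its return shape are hypothetical, and q-values in Accept-Encoding are ignored. The point is that such a handler looks only at the request's Accept-Encoding header, not at whether the body is already a gzip file.

```python
import gzip


def transparent_gzip(body: bytes, accept_encoding: str):
    """Hypothetical model of a transparent compression handler.

    Returns (extra_headers, payload). The body is gzipped whenever the
    client lists "gzip" in Accept-Encoding, even if the body is already
    a .tgz, which is how a second gzip layer appears on the wire.
    """
    offered = [token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")]
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


if __name__ == "__main__":
    tgz = gzip.compress(b"tar archive bytes")
    headers, payload = transparent_gzip(tgz, "gzip, deflate")
    print(headers, payload != tgz)
    headers, payload = transparent_gzip(tgz, "")
    print(headers, payload == tgz)
```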

Comment 4

5 years ago
Hi Matej,
I am experiencing a similar bug with http://release.debian.org/britney/update_output.txt.gz
However, that is not a tarball and I can't reproduce your specific problem with Iceweasel 10.0.3.

It seems the server has changed though, it's now running Apache 2.2.15, and no longer compresses the (already compressed) file. I also tried putting the file on my local Apache 2.2.22 and couldn't reproduce, but mine is also not (re-)compressing the file, even though mod_deflate is enabled. It looks like Apache was fixed to avoid compressing tgz-s, since they're already compressed, so Firefox may still be buggy, but Apache no longer exposes the bug. Apache is still compressing .txt.gz-s, though.

Is anyone able to reproduce this with a current Firefox? And if so, can you provide a new test URL?

Matej, note that you can use mozilla-livehttpheaders to get HTTP headers. That extension allows copy-pasting.
Also, cURL does not request compressed replies by default. So I don't think you're seeing "HTTP headers munging based on User-Agent", but rather a difference in replies because Firefox sends Accept-Encoding while cURL does not, unless invoked with --compressed.
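
The difference between the two clients can be reproduced from a script. The sketch below uses Python's urllib.request and only builds the requests; build_request is a hypothetical helper, and the "gzip, deflate" value is an assumption about what Firefox sends (a later comment in this report quotes the same value), not something captured for this URL.

```python
from urllib.request import Request


def build_request(url: str, like_firefox: bool) -> Request:
    """Build a GET request that either advertises gzip support, as
    Firefox does, or does not ask for gzip, as curl does unless it is
    invoked with --compressed.

    Note: in the second case Python's http.client adds
    "Accept-Encoding: identity" on its own, whereas curl sends no such
    header at all. Neither asks for gzip content coding.
    """
    request = Request(url)
    if like_firefox:
        request.add_header("Accept-Encoding", "gzip, deflate")
    return request


if __name__ == "__main__":
    url = "http://rrakus.fedorapeople.org/fmci-final.tgz"
    for like_firefox in (False, True):
        request = build_request(url, like_firefox)
        # urllib stores header names capitalized, hence "Accept-encoding".
        print(like_firefox, request.get_header("Accept-encoding"))
    # Passing either request to urllib.request.urlopen() and comparing
    # the Content-Encoding response header would show whether the server
    # answers the two clients differently. urlopen() does not undo
    # Content-Encoding itself, so the raw payload can be inspected.
```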

Comment 5

5 years ago
I can reproduce using Iceweasel 10 with Matej's file by renaming it to fmci-final.tar.gz and configuring Apache to compress tarballs. I can do this in Debian by adding
AddOutputFilterByType DEFLATE application/x-tar
to /etc/apache2/mods-available/deflate.conf

I hit what I think is this bug on http://release.debian.org/britney/ with update_output.txt.gz. The same problem also affects update_excuses.html.gz. The server is Apache. I was able to reproduce this using my system as the server and a random .txt.gz found locally (a README). The file I use to test is faq.txt.gz and has MD5 sum 3d9b4fbacb5c1c4b6fc6930ad02abd28. When I download it and hit this bug, the double-gzipped version I get has MD5 sum 4b978d76df448f894fcf427f86d74674. I hit this bug when downloading from my local httpd, Apache 2.2.22 on Debian testing, with a highly stock configuration.
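
A checksum comparison like the one above is easy to script. The following Python sketch is illustrative only: classify_download is a made-up name, and it assumes the MD5 sum of the file as stored on the server is known.

```python
import gzip
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, in the same form md5sum prints."""
    return hashlib.md5(data).hexdigest()


def classify_download(downloaded: bytes, server_md5: str) -> str:
    """Compare a downloaded file with the MD5 sum of the server copy.

    Returns "intact" if the bytes match, "double-gzipped" if they match
    only after removing one gzip layer, and "different" otherwise.
    """
    if md5_hex(downloaded) == server_md5:
        return "intact"
    try:
        inner = gzip.decompress(downloaded)
    except (OSError, EOFError, zlib.error):
        return "different"  # not gzip data at all, or corrupt
    if md5_hex(inner) == server_md5:
        return "double-gzipped"
    return "different"


if __name__ == "__main__":
    stored = gzip.compress(b"FAQ text")  # stands in for faq.txt.gz
    print(classify_download(stored, md5_hex(stored)))
    print(classify_download(gzip.compress(stored), md5_hex(stored)))
```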

Amazingly, all major browsers have the same problem: Chromium (see http://code.google.com/p/chromium/issues/detail?id=47951 ), Konqueror 4.7.4, Opera 11.62, Safari 5.1.4 and Internet Explorer 8. Even wget has the problem, in a sense (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=150766 ). The only browser I found which doesn't have this problem is NetSurf (tested on version 2.8). Apart from that, curl doesn't have the problem.


After a good dose of testing the problem as it exists in Firefox, I think I can offer a good description of what the problem may be. Simply, when saving a file served with gzip Content-Encoding and application/x-gzip Content-Type, the file is saved compressed (apparently, Firefox does not decompress the content).

In practice, this only happens when a gzipped file is gzipped during transfer, and this is understandably rare. It's a performance bug that Apache gzips a gzipped file, but it happens in a few situations, as in the case of a gzipped text file. Apache's content coding is managed by mod_deflate. mod_deflate is enabled by default, but only for some MIME types for which compression is known to be efficient, for example text files. The AddOutputFilterByType directive is used to tell mod_deflate to compress some MIME types. The default configuration (deflate.conf) contains:
          AddOutputFilterByType DEFLATE text/html text/plain text/xml
It is arguably an Apache bug that problematic files (.txt.gz) are considered as text/plain, since they're compressed, but this is what happens, and explains the server's behavior.

I verified that the bug is that basic by getting a text file served with Content-Encoding gzip and with Content-Type application/x-gzip. Firefox saved the file gzipped. Of course, Apache doesn't normally serve text files with Content-Type application/x-gzip. To achieve that, I renamed faq.txt to faq.foo and modified Apache's configuration. On my Debian, I added 
AddType application/x-gzip .foo
to /etc/apache2/mods-enabled/mime.conf and
	  AddOutputFilterByType DEFLATE application/x-gzip
to /etc/apache2/mods-enabled/deflate.conf. The bug does not happen if application/x-foo is used instead.

The problem does not only happen with Apache, it also happened when I tried with lighttpd, but not by default. By default, lighttpd would send Content-Type application/octet-stream IIRC. After some hacking in lighttpd.conf (compress.filetype           = ( "application/javascript", "text/css", "text/html", "text/plain", "application/x-gzip" )) and /usr/share/lighttpd/create-mime.assign.pl, I obtained the same problem. I added the following just before the last line of /usr/share/lighttpd/create-mime.assign.pl :
print '      ".tar.gz" => "application/x-tgz",';
print "\n";
print '      ".gz" => "application/x-gzip",';
print "\n";

This can be worked around client-side by setting network.http.accept-encoding to "dummy". This will modify the Accept-Encoding header of requests, and the server won't use content coding.

In Matej's case, I am not sure why Apache compressed the tarball. It may have been a local configuration.

Comment 6

5 years ago
Created attachment 618037 [details]
A compressed readme which is likely to be affected by this

Comment 7

5 years ago
This bug can be reproduced with Iceweasel 10.0.7 (Debian) and Firefox 13 under Fedora 16 on INRIAGForge when browsing the Subversion repository and downloading a .tar.gz file from it: https://gforge.inria.fr/tracker/index.php?func=detail&aid=14579&group_id=1&atid=110 (in French).

It seems to be a consequence of bug 35956, which has not been properly fixed (Mozilla should do something smarter than what is suggested in bug 35956 comment 84, i.e. looking at the URL is not sufficient).

This bug should be fixed together with bug 233047 (a .gz file is downloaded, saved as .gz, but gunzipped though it was gzipped only once), which is actually some kind of opposite bug.

Comment 8

4 years ago
I could reproduce it by trying to download a «.txt.gz» file with Firefox 16.0.2 (Ubuntu x86_64), for example any of these gzip files:

http://lists.openstreetmap.org/pipermail/talk-ja/

If I use Download Helper, «Accept-Encoding: gzip, deflate» is not sent and the received file is correct. The fun part is that Chromium has the same bug :-)

Comment 9

4 years ago
Cannot reproduce this error with FF 19 on Arch Linux and FF 18.0.2 on Mac.

Comment 10

4 years ago
I see this with http://www.trondeau.com/storage/tutorial/mpsk_scripts_3_6.tar.gz and firefox 22 on ubuntu.

Comment 11

4 years ago
The server behind the file linked by Timo Lindfors is reporting as SSWS.

I still hit this with Iceweasel 17. However, there has been an interesting evolution - Apache apparently no longer double-gzips, at least in the case of text files. Downloading the update_output.txt.gz file from a local Apache 2.4.6 on my fresh Debian jessie install, I don't get any Content-Encoding, so the problem is not exposed. I don't know if this was the result of an intentional bugfix. The problem can still be seen on the release.debian.org URL above, which must run Debian 7 (Apache 2.2). Presumably, Apache's bug was fixed in 2.4.

Comment 12

3 years ago
I'm seeing similar behavior with both Firefox 33 and Seamonkey 2.30 on Windows, on downloads of .gz archives from a private Mailman server.

Tweaking .htaccess settings (as suggested above) fixes the problem on the server, but it's still a Mozilla platform thing that needs attention.

Comment 13

2 years ago
The equivalent Chromium bug is reportedly fixed in Chrome 43: https://code.google.com/p/chromium/issues/detail?id=268085

Comment 14

2 years ago
I stumbled upon this problem today when I downloaded a Mailman list archive named archName.txt.gz with Firefox ESR 38.2.1 on a Linux box.
It downloaded the archive double-gzipped.
So unzipping it created a file archName.txt which was again a gzip-compressed file.

What puzzles me is that the same Firefox version, ESR 38.2.1, on a Windows 7 x64 host does not show the problem.
I downloaded the same archName.txt.gz as on the Linux box but it was gzipped only once.

So even if the Mailman list server (version 2.1.14) does something wrong with the headers, why does the same version of Firefox act differently?

Comment 15

2 years ago
motz, could you clarify the results you get? Do you have Firefox working fine on both Windows 7 and Linux, or only on one of these?
Flags: needinfo?(bugzilla)

Comment 16

2 years ago
Firefox is working fine on both Linux and Windows and is used frequently.
I have never had any problem like this before, although I must admit that I rarely download gzipped files on Linux.
The problem was brought to my attention by an internal customer (I work in IT).
He reported to me that a .txt.gz that he saved to his hard drive and unpacked still did not open correctly.
So I used my FF on Windows to try it myself and there was no problem. I downloaded the very same file, unpacked it and the text was right there.
So we were puzzled.
I then used my FF on Linux and I had the same problem as the customer.
As unpacking did not give any errors, it was a valid 'gz' file.
So I ran 'file ...txt' on it and it said it was still a 'gz'.
Then I found this bug report, which is surprisingly old but still seems to be unsolved.
Flags: needinfo?(bugzilla)

Comment 17

2 years ago
motz, I am sorry but I do not quite understand. Which Firefox does your internal customer use?
And does Firefox on your Linux install work correctly or not?

Comment 18

2 years ago
As I wrote: Firefox is working fine on Linux and Windows.
Both Linux and Windows are on Firefox ESR 38.2.1.
My customer uses Firefox ESR 38.2.1 on Linux only, I use Firefox on both platforms.
The tests were all done with the same ..txt.gz file from the internal mailman server.
There are differences in the addons though.

Comment 19

2 years ago
Timo's link is now broken.
lists.openstreetmap.org is now running Apache 2.4 and no longer exposes the bug.
Vincent Lefebvre's link seems broken - it does not cause a download, and I see no attachment in that ticket.
release.debian.org is presumably running Apache 2.4 now - it no longer compresses the file.

That seems to leave us with no readily available test case.

I tested on a local server using the recipe explained in comment 5 and the file in comment 6, and this persists, both on Iceweasel 38.2.0 and on Firefox 40.0.3 on Windows 8.
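
For anyone without an Apache install at hand, a local test case can be approximated with Python's standard http.server module. This is only a sketch under assumptions: the handler name, the served content and the command-line handling are invented here, the Accept-Encoding check is a naive substring match, and it imitates the headers discussed in this report rather than any particular Apache configuration.

```python
import gzip
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a file that is already gzipped on disk, such as faq.txt.gz.
STORED_FILE = gzip.compress(b"This text was gzipped once before upload.\n")


class RecompressingHandler(BaseHTTPRequestHandler):
    """Serves STORED_FILE for any path and gzips it again on the wire
    when the client accepts gzip."""

    def do_GET(self):
        accept = self.headers.get("Accept-Encoding", "").lower()
        recompress = "gzip" in accept
        payload = gzip.compress(STORED_FILE) if recompress else STORED_FILE
        self.send_response(200)
        self.send_header("Content-Type", "application/x-gzip")
        if recompress:
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Vary", "Accept-Encoding")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), RecompressingHandler)


# Start the server only when a port number is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    server = make_server(int(sys.argv[1]))
    print("Try http://127.0.0.1:%d/faq.txt.gz" % server.server_address[1])
    server.serve_forever()
```

Saving the URL from a browser and running "file" on the result after one gunzip should then show whether the content coding layer was removed.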

Comment 20

2 years ago
It still happens with my webserver:
http://www.jankratochvil.net/project/mdsms/dist/mdsms-1.5.3.tar.gz
md5sum of the file on server is:
3388f1d032b8d19e26c8d7d67f614675  mdsms-1.5.3.tar.gz
wget downloads it fine, firefox-40.0.3-1.fc21.x86_64 downloads it as double-gzipped.
It is apparently due to some custom .htaccess settings there, in a different/clean virtualserver on the same machine it does not happen.  I did not investigate exactly which .htaccess setting.
I do not say whether it is a Mozilla bug or I have misconfigured webserver, one should check the headers.

Comment 21

2 years ago
I just checked the link from jankratochvil with FF 38.2.1 ESR on my home Windows machine and it downloaded fine, i.e. no double gzip.
I tried with add-ons enabled and in safe mode, and I also used IE 11, which also worked fine.
I cannot test with Linux at the moment but will do so on Monday when I'm back at the company.

Comment 22

2 years ago
I also checked the link from jankratochvil with both Firefox 40.0.3 and 42.0a2 (2015-09-11) on Linux, and I got the double gzip.
In "about:config", if I set "network.http.accept-encoding" to blank, then I don't get the double gzip. If I use the add-on "DownThemAll", it does not send the "Accept-Encoding" header and I don't get the double encoding.

motz (or anyone else), could you test on Windows with the Firefox dev tools (or Firebug, or httpfox), in the tab "network" if the header "Accept-Encoding" is sent by Firefox, and if the header 'Content-Encoding:"gzip"' is sent by the server in return?
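
Once the raw headers have been copied out of the network tab, the two facts asked for above can be extracted mechanically. This Python sketch is only an illustration; parse_raw_headers and diagnose are hypothetical helpers, and the input format (one "Name: value" pair per line) is an assumption about what the tools paste.

```python
def parse_raw_headers(raw: str) -> dict:
    """Parse "Name: value" lines into a dict keyed by lower-cased
    header name. Request and status lines are skipped, and surrounding
    double quotes (as some tools display values) are removed."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name and " " not in name:
            headers[name.lower()] = value.strip().strip('"')
    return headers


def diagnose(request_raw: str, response_raw: str) -> dict:
    """Report whether gzip was requested and whether it was applied."""
    request = parse_raw_headers(request_raw)
    response = parse_raw_headers(response_raw)
    return {
        "accept_gzip": "gzip" in request.get("accept-encoding", "").lower(),
        "content_encoding_gzip":
            response.get("content-encoding", "").lower() == "gzip",
        "content_type": response.get("content-type", ""),
    }


if __name__ == "__main__":
    request_raw = "GET /f.tar.gz HTTP/1.1\nAccept-Encoding: gzip, deflate\n"
    response_raw = ("HTTP/1.1 200 OK\n"
                    "Content-Type: application/x-gzip\n"
                    'Content-Encoding: "gzip"\n')
    print(diagnose(request_raw, response_raw))
```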

Comment 23

2 years ago
Thank you Jan. I get $ md5sum mdsms-1.5.3.tar.gz 
2ed8c99a32c0c33ebe0d937a86a1c12b  mdsms-1.5.3.tar.gz

This is both a Mozilla bug and a problem with your webserver. For performance, your server should not recompress the file. But Mozilla should decompress it if your server recompresses it.

I agree with Fabimaru.

Comment 24

2 years ago
Interesting that I see the same double-gzip problem with epiphany-3.14.2-5.fc21.x86_64:
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/601.1 (KHTML, like Gecko) Version/8.0 Safari/601.1 Epiphany/3.14.2

Comment 25

2 years ago
Jan, I described the situation with other browsers in comment 5. By the way, I can confirm comment 13 - I cannot reproduce with Chromium 44.

Comment 26

2 years ago
I found where the decision not to decompress is made. The function "ApplyDecodingForExtension" (mozilla-central/uriloader/exthandler/nsExternalHelperAppService.cpp) does not decode when the extension is one of the values in a constant array "nonDecodableExtensions", which contains "gz", "tgz", "zip", "z" and "svgz". After removing the "gz" entry, I no longer get the double-gzip problem.

Here is the call stack:
#0  nsExternalHelperAppService::ApplyDecodingForExtension
#1  nsExternalAppHandler::MaybeApplyDecodingForExtension
#2  nsExternalAppHandler::OnStartRequest
#3  mozilla::dom::ExternalHelperAppParent::OnStartRequest
#4  mozilla::net::HttpChannelParent::StartDiversion
#5  nsRunnableMethodArguments<>::apply<mozilla::net::HttpChannelParent, void 
#6  nsRunnableMethodImpl<void 
#7  nsThread::ProcessNextEvent

The big question is now: is it safe? I don't know. I guess that it should be contextual: decode if "Content-Encoding" is set to gzip, otherwise ignore the decoding.
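
To illustrate the two behaviors being compared, here is a Python paraphrase of the decision as described in this comment. It is not a translation of the actual C++ source: the function names are invented, lower-casing the extension is an assumption of this sketch, and only the list of extensions comes from the text above.

```python
# Extensions named above as members of nonDecodableExtensions.
NON_DECODABLE_EXTENSIONS = ("gz", "tgz", "zip", "z", "svgz")


def apply_decoding_for_extension(extension: str) -> bool:
    """Current behavior as described: return False (leave the
    Content-Encoding layer in place) when the extension is listed."""
    return extension.lower().lstrip(".") not in NON_DECODABLE_EXTENSIONS


def decode_if_content_encoded(content_encoding: str) -> bool:
    """Contextual variant suggested above: decode whenever the response
    carries Content-Encoding: gzip, whatever the extension is."""
    return content_encoding.strip().lower() == "gzip"


if __name__ == "__main__":
    # A double-gzipped faq.txt.gz sent with Content-Encoding: gzip.
    print(apply_decoding_for_extension("gz"))
    print(decode_if_content_encoded("gzip"))
```

For the double-gzipped .gz case the two functions disagree, which is the whole bug in miniature.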

Note that in the file "mozilla-central/netwerk/protocol/http/nsHttpChannel.cpp", in the function "ClearBogusContentEncodingIfNeeded", there is the following comment:

// For .gz files, apache sends both a Content-Type: application/x-gzip
// as well as Content-Encoding: gzip, which is completely wrong.  In
// this case, we choose to ignore the rogue Content-Encoding header. We
// must do this early on so as to prevent it from being seen up stream.
// The same problem exists for Content-Encoding: compress in default
// Apache installs.

I am not sure at all that the problem still exists in Apache. Also, my first attempt was to disable this function, but it had no effect.
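
The effect of the workaround described in that source comment can be sketched as follows. This is a hypothetical Python rendering based only on the quoted comment, not on the body of ClearBogusContentEncodingIfNeeded; the "compress" case mentioned there is left out because the comment does not name its media type.

```python
GZIP_MEDIA_TYPE = "application/x-gzip"


def clear_bogus_content_encoding(headers: dict) -> dict:
    """Return a copy of headers in which Content-Encoding: gzip is
    dropped when the Content-Type is already application/x-gzip."""
    content_type = headers.get("Content-Type", "")
    content_type = content_type.split(";")[0].strip().lower()
    encoding = headers.get("Content-Encoding", "").strip().lower()
    cleaned = dict(headers)
    if encoding == "gzip" and content_type == GZIP_MEDIA_TYPE:
        del cleaned["Content-Encoding"]
    return cleaned


if __name__ == "__main__":
    response = {"Content-Type": "application/x-gzip",
                "Content-Encoding": "gzip"}
    print(clear_bogus_content_encoding(response))
```

If the server really did compress the file a second time, dropping the header means nothing downstream knows there is a layer to remove, so the payload is saved double-gzipped.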

Comment 27

2 years ago
I forgot something: I tested with the latest Windows version of Firefox… with Wine on Linux (I don't have a Windows with me), and the problem was present as well. Testing it under a real Windows would be better.

Comment 28

2 years ago
Thank you Fabimaru. I already reproduced with Firefox 40 on Windows, as indicated in comment 19.

Your finding is key. The ClearBogusContentEncodingIfNeeded function was created by changeset 16322: https://hg.mozilla.org/mozilla-central/diff/1238046c4cce/netwerk/protocol/http/src/nsHttpChannel.cpp

But the code was there before CVS migrated to Mercurial (2007). At that point, the file's path was /netwerk/protocol/http/src/nsHttpChannel.cpp. Unfortunately, there has been a service outage for more than an hour at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp&mark=&rev=1.339 so I could not trace the code further without checking out Mozilla's CVS... which I do not intend to do.

It would be interesting to know precisely who introduced that code and when, but it is already clear that current behavior is a workaround for a very old bug, which is most likely (and visibly) fixed.

Comment 29

2 years ago
(In reply to Fabimaru from comment #26)
> Note that in the file
> "mozilla-central/netwerk/protocol/http/nsHttpChannel.cpp", in the function
> "ClearBogusContentEncodingIfNeeded", there is the following comment:
> 
> // For .gz files, apache sends both a Content-Type: application/x-gzip
> // as well as Content-Encoding: gzip, which is completely wrong.

I don't know whether this was really the case in the past or just a misconfiguration of some servers, but this doesn't happen with my web server:

$ wget -S https://www.vinc17.net/defi14.ps.gz
[...]
  HTTP/1.1 200 OK
  Date: Sun, 13 Sep 2015 22:45:36 GMT
  Server: Apache/2.4.7 (Ubuntu)
  Last-Modified: Sun, 05 Jul 2015 01:17:19 GMT
  ETag: "ad40-51a168ce565c0"
  Accept-Ranges: bytes
  Content-Length: 44352
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: application/postscript
  Content-Encoding: gzip
Length: 44352 (43K) [application/postscript]
Saving to: ‘defi14.ps.gz’

And if I create a "foo.gz" file (i.e. no extensions except the .gz one):

  HTTP/1.1 200 OK
  Date: Sun, 13 Sep 2015 22:49:23 GMT
  Server: Apache/2.4.7 (Ubuntu)
  Last-Modified: Sun, 13 Sep 2015 22:48:51 GMT
  ETag: "1d-51fa8c0abbac0"
  Accept-Ranges: bytes
  Content-Length: 29
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Encoding: gzip
Length: 29
Saving to: ‘foo.gz’

Note that I have the following in my .htaccess, as recommended (this is old, I don't remember where I saw that):

RemoveType .Z .bz2 .gz .tgz
AddEncoding compress .Z
AddEncoding gzip .gz .tgz .svgz
AddEncoding x-bzip2 .bz2

The default mime.conf file contains:

        # AddEncoding allows you to have certain browsers uncompress
        # information on the fly. Note: Not all browsers support this.
        # Despite the name similarity, the following Add* directives have
        # nothing to do with the FancyIndexing customization directives above.
        #
        #AddEncoding x-compress .Z
        #AddEncoding x-gzip .gz .tgz
        #AddEncoding x-bzip2 .bz2
        #
        # If the AddEncoding directives above are commented-out, then you
        # probably should define those extensions to indicate media types:
        #
        AddType application/x-compress .Z
        AddType application/x-gzip .gz .tgz
        AddType application/x-bzip2 .bz2

So, there's the choice between AddEncoding and AddType, but one should not enable both.

Well, in short, if the web server is configured correctly, the HTTP headers are fine.

Comment 30

2 years ago
Vincent, I do not think the way Apache is configured matters with current versions. Quoting the function's comment:

> // For .gz files, apache sends both a Content-Type: application/x-gzip
> // as well as Content-Encoding: gzip, which is completely wrong.

Sending Content-Type: application/x-gzip as well as Content-Encoding: gzip may be *wrong* (since this is inefficient), but it is *not completely wrong*. What *is completely wrong* (and clearly a bug) is to send these when the server has not recompressed the file, which is presumably what some old Apache versions did. Unless there is evidence to the contrary, I would think current Apache versions send HTTP headers which are appropriate for the response content, no matter how they are configured.

Comment 31

2 years ago
(In reply to Filipus Klutiero from comment #30)
> Vincent, I do not think the way Apache is configured matters with current
> versions.

It does matter:

> Quoting the function's comment:
> 
> > // For .gz files, apache sends both a Content-Type: application/x-gzip
> > // as well as Content-Encoding: gzip, which is completely wrong.
> 
> Sending Content-Type: application/x-gzip as well as Content-Encoding: gzip
> may be *wrong* (since this is inefficient), but it is *not completely
> wrong*. What *is completely wrong* (and clearly a bug) is to send these when
> the server has not recompressed the file, which is presumably what some old
> Apache versions did.

This is also what usually happens if both the AddEncoding and the AddType lines in mime.conf / .htaccess files are enabled. The mime.conf directives just modify the HTTP headers. For efficiency reason, the server never recompresses files by default (the bug here can be seen for some web servers that do recompression, but this is not the default and not directly caused by mime.conf configuration). Using

  AddEncoding gzip .gz

means that for .gz files, i.e. files that should have been gzipped by the user (hence the .gz extension), such files are served with the "Content-Encoding: gzip" HTTP header. This allows one to provide the true media type in the "Content-Type:" HTTP header. For instance, a .ps.gz file is normally served as:

  Content-Type: application/postscript
  Content-Encoding: gzip

i.e. once decompressed by the user agent (web browser), one gets a PostScript file. However, configuring the server with both

  AddEncoding gzip .gz
  AddType application/x-gzip .gz

is incorrect because the server serves the file as:

  Content-Type: application/x-gzip
  Content-Encoding: gzip

even when the filename has a single .gz extension and is gzipped only once (tested).

Note: One could have intuitively thought that "AddEncoding gzip .gz" would apply the AddType rules with the ".gz" extension removed (since it is already taken into account for Content-Encoding), but this does NOT work like that. The ".gz" extension can only be removed if it is unknown for AddType: whenever an extension is unknown, Apache considers the next one to determine the media type.
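
The extension handling described in this comment can be modeled in a few lines of Python. This is a deliberately simplified model for illustration, not Apache's code: mime_headers is a made-up name, the two dicts stand in for the AddType and AddEncoding directives, and details such as multiple encodings or languages are ignored.

```python
def mime_headers(filename: str, add_type: dict, add_encoding: dict) -> dict:
    """Map a file name's extensions to response headers.

    Extensions are read left to right. For the media type, a later
    known extension overrides an earlier one, and an extension unknown
    to add_type leaves the earlier type in place.
    """
    headers = {}
    for ext in filename.lower().split(".")[1:]:
        if ext in add_type:
            headers["Content-Type"] = add_type[ext]
        if ext in add_encoding:
            headers["Content-Encoding"] = add_encoding[ext]
    return headers


if __name__ == "__main__":
    encodings = {"gz": "gzip"}  # AddEncoding gzip .gz
    both = {"ps": "application/postscript", "gz": "application/x-gzip"}
    without_gz_type = {"ps": "application/postscript"}  # after RemoveType .gz
    print(mime_headers("defi14.ps.gz", both, encodings))
    print(mime_headers("defi14.ps.gz", without_gz_type, encodings))
```

With both directives active for .gz, the model yields the application/x-gzip plus gzip pair for a file that is gzipped only once; with the .gz type removed, the PostScript type survives.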

Comment 32

2 years ago
Vincent, perhaps counter-intuitively, the mime.conf directives do not just modify the HTTP headers. Some of them do, but as explained in httpd.apache.org/docs/current/en/mod/mod_mime.html:
In addition, mod_mime may define the handler and filters that originate and process content.

I do not know about AddType and AddEncoding. Using AddEncoding does change Apache's behavior (it causes Content-encoding to be set when serving a .ps.gz). But it does not cause the bug with my Apache 2.4.10.

Are you saying that you can reproduce this bug with Apache 2.4?

Comment 33

2 years ago
While you are discussing all the httpd configuration possibilities: on my server mod_deflate is also active, which was already discussed in the 2014 comments above.

Comment 34

2 years ago
The tests I did in Comment 31 were on my Debian laptop with Apache 2.4.16, with something very close to the default configuration (except for the various changes I've mentioned in this comment). In particular, I did not define specific handlers.

There's mod_deflate loaded, but in its default configuration (which I use), it handles only some types:
  AddOutputFilterByType DEFLATE text/html text/plain text/xml
  AddOutputFilterByType DEFLATE text/css
  AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
  AddOutputFilterByType DEFLATE application/rss+xml
  AddOutputFilterByType DEFLATE application/xml
while I tried on a .ps.gz file.

Comment 35

2 years ago
Well, there's still a potential problem if one wants to support both AddOutputFilterByType from mod_deflate (to get compression by the server) and AddEncoding from mod_mime (if the file is already compressed). For instance, one may have both .ps and .ps.gz files and want both to be sent compressed (the former needs to be compressed by the server, but not the latter). I have not tried (one might get what is seen in comment 5, or something like that). Anyway, I don't think that Firefox should try to support content served by misconfigured servers, otherwise there's a risk of breaking things even more, as happened here (by "misconfigured", I mean really misconfigured, i.e. with incorrect HTTP headers).

Apache does not try to absolutely avoid inconsistencies between the content that it really served and the HTTP headers, and it is really up to the webmaster not to break things. http://httpd.apache.org/docs/2.4/en/mod/mod_deflate.html#precompressed gives an example to "serve correct content types, and prevent mod_deflate double gzip" (things may be more complex with other configuration).

Comment 36

2 years ago
I just did some tests with the link from Jan in comment 20 and here are the results:

w7    Firefox esr 38.2.1                      => double gzipped
w7    Firefox esr 33.2.1 with DownThemAll!    => single gzipped
Linux Firefox esr 38.2.1                      => double gzipped
w7    ie 10.0.31                              => double gzipped
w7    chrome 45.0.2454.85 m                   => single gzipped

Comment 37

2 years ago
Vincent, it is not incorrect to serve both
Content-Type: application/x-gzip
Content-Encoding: gzip
if the content is double-gzipped.

motz, do you really mean Firefox 33.2.1? And are you familiar with Firefox's network monitor or Firebug? If so, could you answer comment 22?

Comment 38

2 years ago
(In reply to Filipus Klutiero from comment #37)
> Vincent, it is not incorrect to serve both
> Content-Type: application/x-gzip
> Content-Encoding: gzip
> if the content is double-gzipped.

I agree, but what I mean is that one can easily misconfigure the server and get these headers while the content is gzipped only once. I would not be surprised if this is what led to the code mentioned in Comment 26 (which is the cause of this bug).

Note also that Apache can also use /etc/mime.types (see "TypesConfig /etc/mime.types" in mime.conf) to determine the HTTP headers, so that not everything is its "fault". In particular, there was some confusion in the past to decide whether compression formats should be supported in this file:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=688872

and changes in /etc/mime.types would affect Apache.

Comment 39

2 years ago
Vincent, how could one misconfigure the server to "get these headers while the content is gzipped only once"? Again, did you manage to do that with an Apache 2.4?

Comment 40

2 years ago
(In reply to Filipus Klutiero from comment #39)
> Vincent, how could one misconfigure the server to "get these headers while
> the content is gzipped only once"?

Just add the following to the .htaccess file (or mime.conf, though I haven't tested this):

  AddEncoding gzip .gz

The default mime.conf file should already have:

  AddType application/x-gzip .gz .tgz

Otherwise I suppose that it should be added there (or to the .htaccess file).

$ wget -S 'http://localhost/~vinc17/defi14.ps.gz'
[...]
  Content-Type: application/x-gzip
  Content-Encoding: gzip
[...]
$ gunzip defi14.ps.gz
$ file defi14.ps
defi14.ps: PostScript document text conforming DSC level 2.0

As you can see, it is gzipped only once. I'd say that this is the typical misconfiguration. To avoid that, one should also add the following line to the .htaccess file:

  RemoveType .gz

> Again, did you manage to do that with a Apache 2.4?

This is with: Apache/2.4.16 (Debian)

Note also that the proposal in Comment 26 [decode if "Content-Encoding" is set to gzip] is incorrect. If the file has a .gz extension, one should not decode if the file is to be saved (with this extension), except in the particular case of double-compression.
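
The manual check with gunzip and file shown above can be automated. The following is only a sketch in Python, not anything Firefox or Apache does: the helper name count_gzip_layers and the max_layers limit are hypothetical, and the check relies on the gzip magic bytes, so data that merely happens to look like gzip would be miscounted.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def count_gzip_layers(data: bytes, max_layers: int = 8) -> int:
    """Return how many times in a row `data` can be gunzipped.

    Hypothetical helper: 1 means "gzipped once", 2 means "double gzipped".
    """
    layers = 0
    while layers < max_layers and data[:2] == GZIP_MAGIC:
        try:
            data = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # Looked like gzip but could not be decoded: stop counting.
            break
        layers += 1
    return layers
```

Running it on a downloaded file (for example count_gzip_layers(open("defi14.ps.gz", "rb").read())) would tell whether the bytes on disk are wrapped once or twice.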

Comment 41

2 years ago
@Filipus Klutiero
 sorry, 33.2.1 was obviously a typo and should read 38.2.1.
And sorry again, I don't know anything about firebug or firefox network monitor

Comment 42

2 years ago
(In reply to motz from comment #41)
> @Filipus Klutiero
>  sorry, 33.2.1 was obviously a typo and should read 38.2.1.
> And sorry again, I don't know anything about firebug or firefox network
> monitor

Thank you motz. Would you mind answering Fabimaru's question from comment 22? It is a 1 or 2 minute job which I am sure even someone who never used the dev tools will do in 10 minutes. And someone on Bugzilla should really get familiar with these tools!

Just open a new tab, hit Ctrl+Shift+Q to bring up the network monitor ( https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor ). Then load the file and you should see the request appear. Select the request and on the right, select raw headers. Then simply copy and paste the request headers here, and do the same with the response headers.

According to https://en.wikipedia.org/wiki/HTTP_compression#Problems_preventing_the_use_of_HTTP_compression an antivirus could prevent content compression.

Comment 43

2 years ago
(In reply to Vincent Lefevre from comment #40)
> (In reply to Filipus Klutiero from comment #39)
> > Vincent, how could one misconfigure the server to "get these headers while
> > the content is gzipped only once"?
> 
> Just add the following to the .htaccess file (or mime.conf, though I haven't
> tested this):
> 
>   AddEncoding gzip .gz
> 
> The default mime.conf file should already have:
> 
>   AddType application/x-gzip .gz .tgz
> 
> Otherwise I suppose that it should be added there (or to the .htaccess file).
> 
> $ wget -S 'http://localhost/~vinc17/defi14.ps.gz'
> [...]
>   Content-Type: application/x-gzip
>   Content-Encoding: gzip
> [...]
> $ gunzip defi14.ps.gz
> $ file defi14.ps
> defi14.ps: PostScript document text conforming DSC level 2.0
> 
> As you can see, it is gzipped only once. I'd say that this is the typical
> misconfiguration. To avoid that, one should also add the following line to
> the .htaccess file:
> 
>   RemoveType .gz
> 
> > Again, did you manage to do that with a Apache 2.4?
> 
> This is with: Apache/2.4.16 (Debian)
> 
[...]

I was skeptical of Vincent's comments, but I can confirm that AddEncoding simply changes the headers (with Apache 2.4.10 on Debian 8). I did not trust using buggy clients, but I verified with netcat and the content is not compressed twice (it is identical). http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html does contain:
>The content-coding is a characteristic of the entity identified by the Request-URI. Typically, the entity-body is stored with this encoding and is only decoded before rendering or analogous usage.
This seems to be designed to allow high efficiency with a single compression no matter how many times the file is served (thanks to compression on the filesystem).

That being said, if Vincent's comment 29 meant that Apache can be configured to behave in a way which triggers this bug, I cannot confirm. What was shown is the opposite.

Comment 44

2 years ago
I have now tested that CentOS 7.1 x86_64 reproduces the problem with the following /etc/httpd/conf.d/test.conf:
SetOutputFilter DEFLATE
RemoveEncoding .gz .Z .bz .bz2 .zip
AddType application/x-gzip .gz
It is served at: http://vps2.jankratochvil.net/mdsms-1.5.3.tar.gz

GET /mdsms-1.5.3.tar.gz HTTP/1.1
Host: vps2.jankratochvil.net
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Via: 1.1 host2.jankratochvil.net (squid/3.4.12)
X-Forwarded-For: ::1
Cache-Control: max-age=259200
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 15 Sep 2015 12:54:37 GMT
Server: Apache/2.4.6 (CentOS)
Last-Modified: Tue, 15 Sep 2015 12:38:33 GMT
ETag: "4759b-51fc875c02440-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-gzip

1faa
<double-gzipped-tar>

Comment 45

2 years ago
(In reply to Filipus Klutiero from comment #43)
> I was skeptical of Vincent's comments, but I can confirm that AddEncoding
> simply changes the headers (with Apache 2.4.10 on Debian 8). I did not trust
> using buggy clients, but I verified with netcat and the content is not
> compressed twice (it is identical).
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html does contain:
> >The content-coding is a characteristic of the entity identified by the Request-URI.
> >Typically, the entity-body is stored with this encoding and is only decoded before
> >rendering or analogous usage.
> This seems to be designed to allow high efficiency with a single compression
> no matter how many times the file is served (thanks to compression on the
> filesystem).

Yes, that was the initial goal. And this is what should probably be done when possible. Now, contents are more and more dynamic, so that if one wants to be able to serve such contents in a compressed form (useful on slow networks or in case of quotas), the web server needs to have a way to compress content on the fly. This is what mod_deflate does:

  http://httpd.apache.org/docs/2.4/en/mod/mod_deflate.html

So...

> That being said, if Vincent's comment 29 meant that Apache can be configured
> to behave in a way which triggers this bug, I cannot confirm. What was shown
> is the opposite.

To trigger this bug, one needs
1. a gzipped file;
2. a mime.conf / .htaccess configuration where the "Content-Encoding:" HTTP header is NOT associated with gzip files (this is valid as long as the "Content-Type:" header says application/gzip or similar, one just "loses" the type of the uncompressed contents if it were known);
3. on-the-fly compression done by the server.

With just (1) and (2), the gzipped file would be served as:

  Content-Type: application/gzip

and no "Content-Encoding:" HTTP header. The contents are compressed once: from the gzip compression on the file system. Firefox can handle that.

If one adds (3), one gets a second compression, and the gzipped file should be served as:

  Content-Type: application/gzip
  Content-Encoding: gzip

mentioning both compressions (one from the file itself and one by the server), which is the case where the bug in Firefox occurs. I suppose that (3) can simply be enabled with:

  AddOutputFilterByType DEFLATE application/gzip

Of course, this looks silly (though valid), but in practice, this can occur when servers compress everything on-the-fly, without making an exception for contents that are known to be already compressed (such as application/gzip). I have not checked that this line works, but you get the idea.
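
The three conditions above can be modelled with a toy server in Python. This is a sketch under stated assumptions, not Apache's actual code path: the function serve and its compress_on_the_fly flag are hypothetical stand-ins for conditions (2) and (3).

```python
import gzip


def serve(stored: bytes, compress_on_the_fly: bool):
    """Toy model of a server that labels .gz files by Content-Type only.

    Hypothetical: condition (2) is the fixed Content-Type below,
    condition (3) is the optional second compression.
    """
    headers = {"Content-Type": "application/gzip"}
    body = stored
    if compress_on_the_fly:
        # The server compresses the response again and says so.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


# Condition (1): the file is already gzipped on the file system.
tarball = gzip.compress(b"pretend this is a tar archive")

# (1) + (2): one compression, no Content-Encoding header.
headers, body = serve(tarball, compress_on_the_fly=False)

# (1) + (2) + (3): two compressions, both declared.
headers2, body2 = serve(tarball, compress_on_the_fly=True)

# A client that undoes the declared Content-Encoding gets the stored file back.
restored = gzip.decompress(body2)
```

In this model, saving body2 as received yields a double-gzipped file, while saving restored yields the file exactly as stored on the server.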

Comment 46

2 years ago
(In reply to Jan Kratochvil from comment #44)
> I have tested now that CentOS-7.1 x86_64 with /etc/httpd/conf.d/test.conf:
> SetOutputFilter DEFLATE
> RemoveEncoding .gz .Z .bz .bz2 .zip
> AddType application/x-gzip .gz
> Reproduces the problem, it is served at:
> http://vps2.jankratochvil.net/mdsms-1.5.3.tar.gz
> 
[...]

Thank you Jan.
Could someone update the URL field to stop pointing to the current broken value and point to Jan's instead?
This could also be retitled more correctly and precisely to "Does not decompress some double-gzipped responses (downloads)".

Comment 47

2 years ago
(In reply to Vincent Lefevre from comment #45)
> (In reply to Filipus Klutiero from comment #43)
[...]
> I suppose that
> (3) can simply be enabled with:
> 
>   AddOutputFilterByType DEFLATE application/gzip
> 
> Of course, this looks silly (though valid), but in practice, this can occur
> when servers compress everything on-the-fly, without doing an exception for
> contents that are known to be already compressed (such as application/gzip).
> I have not tried that this line works, but you get the idea.

The type is "application/x-gzip", not "application/gzip", but I can indeed reproduce with Debian 8's Apache (2.4.10) with the right type.

Considering this result, I guess there has been no real bugfix in Apache, but perhaps Apache instances which have this "performance bug" are becoming rarer due to a change in Apache's default behavior. That could explain why all the example URLs we had which are still valid now work properly. Unless all server administrators realized the problem and changed their configuration.

If Apache has improved by default, this bug may have lower importance, but it clearly persists.

Comment 48

2 years ago
(In reply to Filipus Klutiero from comment #47)
> The type is "application/x-gzip", not "application/gzip", but I can indeed
> reproduce with Debian 8's Apache (2.4.10) with the right type.

OK, so this is not much of a problem since application/x-gzip should no longer be used[*]. I would like to hear what the correct browser behavior should be. Should Firefox detect whether the content is double-gzipped? Or should it have a list of content-type values to decide whether to uncompress the content before saving?

[*] http://tools.ietf.org/html/rfc6648

Comment 49

2 years ago
(In reply to Vincent Lefevre from comment #48)
[...]
> I would like to hear what the correct browser behavior should
> be. Should Firefox detect whether the content is double-gzipped? Or should
> it have a list of content-type values to decide whether to uncompress the
> content before saving?

UNIX generally does not consider that a given file extension indicates a given file format. There is also no way to determine a file's format from its content in an absolutely positive way. If Firefox tries to stay clever about decompression, either with file extensions or content, the situation will stay aberrant. For example, we may remain in the situation where Firefox downloads files A and B, where B is a compression of A, and saves these 2 files (which differ on the server) in 2 identical files.

I think Firefox should respect HTTP headers. If it wants to keep playing the smart guy, it should at least prompt the user about whether decompression should be performed or not (or, at the very least, warn the user that decompression was not performed, despite what the server requested).

Comment 50

2 years ago
(In reply to Filipus Klutiero from comment #49)
> UNIX generally does not consider that a given file extension indicates a
> given file format.

This completely depends on the application. For instance, Firefox does (e.g. when opening a "file:" URL). More importantly, Firefox should do what the user expects. See bug 233047, which is directly related to this one.

> There is also no way to determine a file's format from
> its content in an absolutely positive way.

My question was about the particular case of possibly gzipped content.

> I think Firefox should respect HTTP headers.

Yes, but the issue is that there are several ways to respect them. Hence bug 233047.

Comment 51

2 years ago
(In reply to Vincent Lefevre from comment #50)
> (In reply to Filipus Klutiero from comment #49)
> > UNIX generally does not consider that a given file extension indicates a
> > given file format.
> 
> This completely depends on the application. For instance, Firefox does (e.g.
> when opening a "file:" URL).

When you're opening a file, Firefox asks you what action to take (though it can indeed be configured to take one by default).

> More importantly, Firefox should do what the user expects.

I think users expect Firefox not to behave incorrectly.

> 
> > There is also no way to determine a file's format from
> > its content in an absolutely positive way.
> 
> My question was about the particular case of possibly gzipped content.
> 

My answer also applies to gzip. A file whose content was generated randomly *could* (theoretically) happen to be in gzip format without being a gzip file.


I am sorry for my comment 43. I think I now understand what your point was in comment 31 and comment 40.

To sum up, this bug is related to 2 problematic Apache behaviors:
1. the behavior which presumably came first is Apache sending both Content-encoding and Content-type gzip
2. sending some gzipped files gzipped (double-gzipped).

Behavior 1 is a real bug and Mozilla (and other browsers) apparently tried to work around it by not decompressing. Behavior 2 is merely a performance bug, but with Mozilla's workaround for behavior 1, it causes a real problem.

If behavior 1 no longer happened, the issue could be solved simply by dropping the workaround. But as shown by comment 31 and comment 40, behavior 1 can still be observed with Apache 2.4. Unfortunately, as shown in comment 45, behavior 2 can still be observed in Apache 2.4 too, even though it no longer(?) happens by default.

Since both behaviors persist in Apache 2.4, there is no way to have Firefox always behave as the user would want with Apache 2.4, except by being dangerously "clever". Indeed, I verified that Chrome's change of behavior means it is now exposing the problem caused by behavior 1 (it downloads altered (uncompressed) gzip files).

Comment 52

2 years ago
(In reply to Filipus Klutiero from comment #51)
> (In reply to Vincent Lefevre from comment #50)
> > (In reply to Filipus Klutiero from comment #49)
> > > UNIX generally does not consider that a given file extension indicates a
> > > given file format.
> > 
> > This completely depends on the application. For instance, Firefox does (e.g.
> > when opening a "file:" URL).
> 
> When you're opening a file, Firefox asks you what action to take (though it
> can indeed be configured to take one by default).

No, it doesn't, at least not by default (with a fresh profile in safe-mode, I've just tried with a file ending with a .html extension, which Firefox opens as text/html though it isn't).

> > > There is also no way to determine a file's format from
> > > its content in an absolutely positive way.
> > 
> > My question was about the particular case of possibly gzipped content.
> > 
> 
> My answer also applies to gzip. A file whose content was generated randomly
> *could* (theoretically) happen to be in gzip format without being a gzip
> file.

So, with this point of view, the .gz extension is the only safe way to keep track of the expected contents on the file system. Thus for saving, you should agree with me that if a file is served with the .gz extension and is declared as being gzipped by one of the HTTP headers (Content-Type or Content-Encoding), Firefox should make sure that it saves a gzip file. This means that Firefox should not decompress the file, except in the case where *both* headers declare the file to be gzipped (this is either the typical double-gzipped case, or a bug in the web server configuration).

The problem is that for Content-Type, it may be difficult to know whether it means gzip or not. Hence my comment 48. For instance, under Debian, /etc/mime.types contains:

application/x-gtar-compressed                   tgz taz

so that the application/x-gtar-compressed Content-Type would imply that the content is gzipped. But this means that a browser that doesn't do content sniffing would need to know such types. Good luck getting all of them for every OS!
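
The saving rule proposed above can be sketched in Python. This is only an illustration, not Firefox's code: the function name, the GZIP_TYPES set (deliberately incomplete, which is exactly the difficulty just described), the treatment of .tgz like .gz, and the fallback for names without a gzip extension are all assumptions.

```python
from typing import Optional

# Hypothetical, incomplete list of Content-Type values that mean "gzip data".
GZIP_TYPES = {
    "application/gzip",
    "application/x-gzip",
    "application/x-gtar-compressed",
}


def should_decode_before_saving(
    filename: str, content_type: str, content_encoding: Optional[str]
) -> bool:
    """Decide whether to undo Content-Encoding: gzip before saving."""
    encoding = (content_encoding or "").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        return False  # nothing declared, nothing to decode

    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    type_is_gzip = media_type in GZIP_TYPES

    if filename.lower().endswith((".gz", ".tgz")):
        # The saved file must be a gzip file: decode only when *both*
        # headers declare gzip (the double-gzipped case).
        return type_is_gzip

    # Assumption: without a gzip extension, decode as declared.
    return True
```

With this rule, a .tar.gz served with both headers is decoded once before saving, while a .ps.gz served as application/postscript with Content-Encoding: gzip is saved as received.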

> I am sorry for my comment 43. I think I now understand what your point was
> in comment 31 and comment 40.
> 
> To sum up, this bug is related to 2 problematic Apache behaviors:
> 1. the behavior which presumably came first is Apache sending both
> Content-encoding and Content-type gzip
> 2. sending some gzipped files gzipped (double-gzipped).
> 
> Behavior 1 is a real bug and Mozilla (and other browsers) apparently tried
> to work around it by not decompressing. Behavior 2 is merely a performance
> bug, but with Mozilla's work around for behavior 1, it causes a real problem.

Yes, but I really think that behavior 1 was a misconfiguration from the beginning, not an actual Apache bug (though I wonder whether this could have been the default configuration at some places). Note that the Apache configuration is not self-contained: it can depend on system files (at least /etc/mime.types), which are not part of the Apache distribution. Anyway, whether this was a real bug or a misconfiguration should not matter from the Firefox point of view, IMHO.

> If behavior 1 would no longer happen, the issue could be simply solved by
> dropping the workaround. But as shown by comment 31 and comment 40, behavior
> 1 can still be observed with Apache 2.4. Unfortunately, as shown in comment
> 45, behavior 2 can still be observed in Apache 2.4 too, even though it no
> longer(?) happens by default.

I would say: Behavior 1 (with single gzip) can still occur, but this is a major misconfiguration of the web server. Behavior 2 can still occur; this is a suboptimal configuration, but this is correct, IMHO (I have not checked what the specs exactly say in such a case). So, because the workaround has drawbacks with some server configurations (correct, though not optimal), I think that the right solution would be that the server configuration in behavior 1 be fixed by the webmaster, and drop the workaround in Firefox. But be careful not to break something else at the same time!

Workarounds to support misconfigurations or bugs in other software may be OK only if they don't introduce incorrect behavior like here with double-gzipped content.

Comment 53

2 years ago
(In reply to Vincent Lefevre from comment #52)
> (In reply to Filipus Klutiero from comment #51)
> > (In reply to Vincent Lefevre from comment #50)
> > > (In reply to Filipus Klutiero from comment #49)
> > > > UNIX generally does not consider that a given file extension indicates a
> > > > given file format.
> > > 
> > > This completely depends on the application. For instance, Firefox does (e.g.
> > > when opening a "file:" URL).
> > 
> > When you're opening a file, Firefox asks you what action to take (though it
> > can indeed be configured to take one by default).
> 
> No, it doesn't, at least not by default (with a fresh profile in safe-mode,
> I've just tried with a file ending with a .html extension, which Firefox
> opens as text/html though it isn't).

Firefox can handle HTML itself. In general (for example, with a .pdf.gz), it will ask what to do. My point is really that Firefox does not force a given handler for a given extension. It merely has default behaviors for some.

> 
> > > > There is also no way to determine a file's format from
> > > > its content in an absolutely positive way.
> > > 
> > > My question was about the particular case of possibly gzipped content.
> > > 
> > 
> > My answer also applies to gzip. A file whose content was generated randomly
> > *could* (theoretically) happen to be in gzip format without being a gzip
> > file.
> 
> So, with this point of view, the .gz extension is the only safe way to keep
> track of the expected contents on the file system.

That was not my point. My point is that browsers have *no* reliable way to tell what format the file contents are in.

> Thus for saving, you
> should agree with me that if a file is served with the .gz extension and is
> declared as being gzipped by one of the HTTP headers (Content-Type or
> Content-Encoding), Firefox should make sure that it saves a gzip file.

No, Firefox receives data and a filename. What it needs to do is to create a file with the given filename and data. If I save a sound in Ogg format, an audio editor will let me name it foo.mp3. If I don't, my OS will let me rename the file foo.mp3. If I want to serve that file, Apache will let me serve it unaltered.

If Firefox then receives a foo.mp3 response but an ogg-formatted content, it should also allow saving the file intact. It should not try converting the file to MP3 format.

> This
> means that Firefox should not decompress the file, except in the case where
> *both* headers declare the file to be gzipped (this either the typical
> double-gzipped case, or a bug in the web server configuration).
 
Indeed... though in fact, Content-type is irrelevant. Browsers should simply decompress the response if they're told to do so, or leave it as-is.

[...]
> 
> > I am sorry for my comment 43. I think I now understand what your point was
> > in comment 31 and comment 40.
> > 
> > To sum up, this bug is related to 2 problematic Apache behaviors:
> > 1. the behavior which presumably came first is Apache sending both
> > Content-encoding and Content-type gzip
> > 2. sending some gzipped files gzipped (double-gzipped).
> > 
> > Behavior 1 is a real bug and Mozilla (and other browsers) apparently tried
> > to work around it by not decompressing. Behavior 2 is merely a performance
> > bug, but with Mozilla's work around for behavior 1, it causes a real problem.
> 
> Yes, but I really think that behavior 1 was a misconfiguration from the
> beginning, not an actual Apache bug (though I wonder whether this could have
> been the default configuration at some places). Note that the Apache
> configuration is not self-contained: it can depend on system files (at least
> /etc/mime.types), which are not part of the Apache distribution. Anyway,
> whether this was a real bug or a misconfiguration should not matter from the
> Firefox point of view, IMHO.

I basically agree, although if some argue the workaround should be kept as an option, the relevance of offering such an option depends on how common behavior 1 is, which does depend on whether behavior 1 was ever a default Apache behavior.

> > If behavior 1 would no longer happen, the issue could be simply solved by
> > dropping the workaround. But as shown by comment 31 and comment 40, behavior
> > 1 can still be observed with Apache 2.4. Unfortunately, as shown in comment
> > 45, behavior 2 can still be observed in Apache 2.4 too, even though it no
> > longer(?) happens by default.
> 
> I would say: Behavior 1 (with single gzip) can still occur, but this is a
> major misconfiguration of the web server. Behavior 2 can still occur; this
> is a suboptimal configuration, but this is correct, IMHO (I have not checked
> what the specs exactly say in such a case).

Indeed, behavior 1 is a real bug, behavior 2 is a "performance bug" (which is arguably not an actual bug). 

> So, because the workaround has
> drawbacks with some server configurations (correct, though not optimal), I
> think that the right solution would be that the server configuration in
> behavior 1 be fixed by the webmaster, and drop the workaround in Firefox.
> But be careful not to break something else at the same time!

The servers should certainly be fixed. As for Firefox, it should either drop the workaround indeed, or modify it.

> Workarounds to support misconfigurations or bugs in other software may be OK
> only if they don't introduce incorrect behavior like here with
> double-gzipped content.

Indeed. If we want to keep the workaround, we should ask users whether contents should be decompressed or not.

Comment 54

2 years ago
(In reply to Filipus Klutiero from comment #53)
> Firefox can handle HTML itself. In general (for example, with a .pdf.gz), it
> will ask what to do. My point is really that Firefox does not force a given
> handler for a given extension. It merely has default behaviors for some.

These are not just default behaviors, these are hard-coded behaviors. Some of them cannot be configured even though the behavior is arbitrary and controversial. Anyway this is off-topic here. What matters here is that filename extensions will be taken into account in some contexts.

> > So, with this point of view, the .gz extension is the only safe way to keep
> > track of the expected contents on the file system.
> 
> That was not my point. My point is that browsers have *no* reliable way to
> tell what format has the file contents.

I disagree. The HTTP headers should be regarded as reliable. If some server is misconfigured, that's the fault of the webmaster.

> > Thus for saving, you
> > should agree with me that if a file is served with the .gz extension and is
> > declared as being gzipped by one of the HTTP headers (Content-Type or
> > Content-Encoding), Firefox should make sure that it saves a gzip file.
> 
> No, Firefox receives data and a filename.

and information about the file format and encoding, via the HTTP headers. The issue here is how to interpret the Content-encoding HTTP header: whether the encoding is part of the file contents, in which case the file should be saved undecoded, or this is just a temporary encoding (compression), in which case the data should be decoded before saving. Apache allows one to follow either choice. What I'm saying here is that the filename extension (such as .gz) helps to guess what was intended, but the Content-encoding HTTP header and the filename extension are not sufficient to decide.

Comment 55

2 years ago
(In reply to Vincent Lefevre from comment #54)
> (In reply to Filipus Klutiero from comment #53)

[...]
> 
> > > So, with this point of view, the .gz extension is the only safe way to keep
> > > track of the expected contents on the file system.
> > 
> > That was not my point. My point is that browsers have *no* reliable way to
> > tell what format has the file contents.
> 
> I disagree. The HTTP headers should be regarded as reliable. If some server
> is misconfigured, that's the fault of the webmaster.

Absolutely. What I was saying is that there is no way to determine a file's format from its content in an absolutely positive way. And even if extensions or Content-type were reliable, since both are optional, that leaves us with no reliable way to determine file format (and even if there were, we cannot handle an unlimited number of file formats).

That means Firefox should have a decoding strategy independent of the file format (at least by default).

> > > Thus for saving, you
> > > should agree with me that if a file is served with the .gz extension and is
> > > declared as being gzipped by one of the HTTP headers (Content-Type or
> > > Content-Encoding), Firefox should make sure that it saves a gzip file.
> > 
> > No, Firefox receives data and a filename.
> 
> and information about the file format and encoding, via the HTTP headers.
> The issue here is how to interpret the Content-encoding HTTP header: whether
> the encoding is part of the file contents, in which case the file should be
> saved undecoded, or this is just a temporary encoding (compression), in
> which case the data should be decoded before saving. Apache allows one to
> follow either choice. What I'm saying here is that the filename extension
> (such as .gz) helps to guess what was intended, but the Content-encoding
> HTTP header and the filename extension are not sufficient to decide.

Content-encoding alone is sufficient (unless by sufficient, you mean a behavior which works around misbehaving HTTP servers). Even the filename extension should be ignored for decoding, if we want to behave properly (i.e. do what the server is telling us to do, even if the server might be misbehaving).
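
The strategy argued for here, decoding strictly according to Content-Encoding and nothing else, could look like the following Python sketch. The function name decode_as_declared is hypothetical, and only the gzip coding is handled; codings listed in the header are undone in reverse order, since the header lists them in the order they were applied.

```python
import gzip
from typing import Optional


def decode_as_declared(body: bytes, content_encoding: Optional[str] = None) -> bytes:
    """Undo exactly the codings listed in Content-Encoding.

    Hypothetical helper: filename and Content-Type are deliberately ignored.
    """
    if not content_encoding:
        return body
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    # The last coding applied is the first one to remove.
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        else:
            raise ValueError(f"unsupported content coding: {coding}")
    return body
```

Under this strategy a double-gzipped tarball served with Content-Encoding: gzip is saved gzipped once, whatever its name or Content-Type.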

Comment 56

2 years ago
(In reply to Filipus Klutiero from comment #55)
> Content-encoding alone is sufficient (unless by sufficient, you mean a
> behavior which works around misbehaving HTTP servers).

It isn't. Consider data served as

  Content-Type: application/postscript
  Content-Encoding: gzip

(that is, a PostScript file sent compressed with gzip). To follow the usual conventions under Unix, if the filename ends with the .gz extension, then it is expected that the data will be stored compressed (i.e. the browser shouldn't decompress the data). Otherwise the browser should decompress the data.

Examples of typical filenames:
  * Former case: file.ps.gz
  * Latter case: file.ps

> Even the filename
> extension should be ignored for decoding, if we want to behave properly
> (i.e. do what the server is telling us to do, even if the server might be
> misbehaving).

The server doesn't tell us what to do. It just sends information about data, as specified by the RFCs. A "Content-Encoding: gzip" just means that when decompressed with gzip, the resulting data are in the format referenced by the Content-Type header. It does not tell whether and when the transmitted data should be decompressed.
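
The file.ps.gz / file.ps convention described above can be sketched in Python. This is only an illustration of that convention, not Firefox's behavior; the function name bytes_to_save is hypothetical, and the sketch ignores Content-Type, so it does not address the double-gzipped case.

```python
import gzip
from typing import Optional


def bytes_to_save(filename: str, body: bytes, content_encoding: Optional[str]) -> bytes:
    """Pick the bytes to write to disk for a body sent with Content-Encoding."""
    if (content_encoding or "").strip().lower() != "gzip":
        return body  # no gzip coding declared: save as received
    if filename.lower().endswith(".gz"):
        # Former case (file.ps.gz): the name promises compressed data.
        return body
    # Latter case (file.ps): the name promises the decoded data.
    return gzip.decompress(body)
```

For a PostScript document sent with Content-Encoding: gzip, the name file.ps.gz keeps the compressed bytes and the name file.ps gets the decompressed document.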

Comment 57

2 years ago
(In reply to Vincent Lefevre from comment #31)
> For instance, a .ps.gz file is normally served as:
> 
>   Content-Type: application/postscript
>   Content-Encoding: gzip
> 
> i.e. once decompressed by the user agent (web browser), one gets a
> PostScript file.

My testing contradicts that. While this can be the case (for example, https://www.cs.cmu.edu/~crary/papers/1998/param/param.ps.gz), my local Apache 2.4.10 (default Debian 8 behavior) serves a policy.ps.gz as just
  Content-Type: application/x-gzip
The same behavior can be seen on:
http://www.jmlr.org/papers/volume4/saul03a/saul03a.ps.gz
https://www.gnu.org/software/bc/manual/ps/bc.ps.gz

Chromium 45 stores param.ps.gz uncompressed. Others are saved compressed.

Comment 58

2 years ago
(In reply to Filipus Klutiero from comment #57)
> (In reply to Vincent Lefevre from comment #31)
> > For instance, a .ps.gz file is normally served as:
> > 
> >   Content-Type: application/postscript
> >   Content-Encoding: gzip
> > 
> > i.e. once decompressed by the user agent (web browser), one gets a
> > PostScript file.
> 
> My testing infirms that. While this can be the case (for example, 
> https://www.cs.cmu.edu/~crary/papers/1998/param/param.ps.gz
> ), my local Apache 2.4.10 (default Debian 8 behavior) serves a policy.ps.gz
> as just
> Content-Type	application/x-gzip
> The same behavior can be seen on:
> http://www.jmlr.org/papers/volume4/saul03a/saul03a.ps.gz
> https://www.gnu.org/software/bc/manual/ps/bc.ps.gz

I said "normally". Here your local web server is poorly configured, just like the last two servers: if I try to open such a URL with Firefox (with the intent to open the file with an application, not to save it), Firefox does not know which application to use to open the PostScript file. This is a correct configuration, but a very suboptimal one, because the file is just seen here as gzipped data; this is not very informative! The Content-Type + "Content-Encoding: gzip" configuration is much better as it gives more information on the data.

There would be the same issue with .html.gz; see for instance the HTML compressed page (entirely on one web page) on:
  https://www.gnu.org/software/libtool/manual/
which corresponds to the link:
  https://www.gnu.org/software/libtool/manual/libtool.html.gz
Firefox cannot open the page, while if it were served as:
  Content-Type: text/html
  Content-Encoding: gzip
Firefox could open it.

Comment 59

2 years ago
(In reply to Vincent Lefevre from comment #58)
> (In reply to Filipus Klutiero from comment #57)
> > (In reply to Vincent Lefevre from comment #31)
> > > For instance, a .ps.gz file is normally served as:
> > > 
> > >   Content-Type: application/postscript
> > >   Content-Encoding: gzip
> > > 
> > > i.e. once decompressed by the user agent (web browser), one gets a
> > > PostScript file.
> > 
> > My testing infirms that. While this can be the case (for example, 
> > https://www.cs.cmu.edu/~crary/papers/1998/param/param.ps.gz
> > ), my local Apache 2.4.10 (default Debian 8 behavior) serves a policy.ps.gz
> > as just
> > Content-Type	application/x-gzip
> > The same behavior can be seen on:
> > http://www.jmlr.org/papers/volume4/saul03a/saul03a.ps.gz
> > https://www.gnu.org/software/bc/manual/ps/bc.ps.gz
> 
> I've said "normally". Here your local web server is poorly configured, just
> like the last two servers: if I try to open such a URL with Firefox (with
> the intent to open the file with an application, not to save it), Firefox
> does not know which application to use to open the PostScript file.

I did not say my local web server is well configured, I said that's how Debian 8 behaves by default. This server runs on my PC only for testing purposes. If you really think Debian's default Apache configuration is poor, you are welcome to file a ticket against Debian or provide a link to one if this is a known issue.

The 3 actual URIs I provided were the first 3 .ps.gz files found with a Google search for ".ps.gz". If you think my tests are not representative and maintain your claim, you may want to provide evidence.

[...]

Comment 60

2 years ago
(In reply to Filipus Klutiero from comment #59)
> I did not say my local web server is well configured, I said that's how
> Debian 8 behaves by default. This server runs on my PC only for testing
> purposes. If you really think Debian's default Apache configuration is poor,
> you are welcome to file a ticket against Debian or provide a link to one if
> this is a known issue.

Not everyone needs to store compressed files on their server, and such compressed files could be handled in some other way. It's up to the webmaster to configure his server according to his needs. The default configuration is just for very basic use.

Anyway, Firefox should handle any valid configuration. By "valid", I mean one that conforms to the RFCs. This includes poor configurations (such as double-gzipped). And when there are multiple choices for the behavior (for instance, when a file with gzip Content-Encoding is saved, whether it should be saved as is or uncompressed), Firefox should do what the user expects, following the conventions of the platform by default.

Comment 61

2 years ago
(In reply to Vincent Lefevre from comment #60)
> And when there are multiple choices for the behavior (for
> instance, when a file with gzip Content-Encoding is saved, whether it should
> be saved as is or uncompressed), Firefox should do what the user expects,
> [...].

Well, it should certainly try to... what needs to be determined is the best way to do that.

Comment 62

a year ago
OK guys, I can confirm that this bug persists in FF 44.0.2 (latest version as of today).
After spending the whole afternoon with the Copperhead.co guys trying to solve an issue regarding file corruption when downloading with FF, we ended up confirming this bug.

The deal is that FF asks for gzip but doesn't un-gzip it after that.  It's a **** bug.

Same behaviour can be achieved with
> wget --header='Accept-Encoding: gzip, deflate' https://....
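
The symptom can also be simulated offline. The following is only a sketch under stated assumptions: the tarball contents and the helper names `make_tgz` and `is_gzip` are made up, and the "server" is just a second `gzip.compress` call standing in for a response sent with "Content-Encoding: gzip". It compares saving the body as received with saving it after undoing the content-coding once.

```python
import gzip
import io
import tarfile


def make_tgz():
    """Build a small .tgz in memory (made-up contents, for illustration)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("fmci-final/README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def is_gzip(data):
    """True if the bytes start with the gzip magic number."""
    return data[:2] == b"\x1f\x8b"


tgz = make_tgz()            # the file as stored on the server
wire = gzip.compress(tgz)   # body as sent with Content-Encoding: gzip

saved_raw = wire                        # client saves the body without decoding
saved_decoded = gzip.decompress(wire)   # client undoes the content-coding once

# Saved as received: gunzip once and the result is still gzip data.
print(is_gzip(saved_raw), is_gzip(gzip.decompress(saved_raw)))

# Decoded once: the result is the original tarball.
with tarfile.open(fileobj=io.BytesIO(saved_decoded), mode="r:gz") as tar:
    names = tar.getnames()
print(names)
```

The file saved as received behaves like the double-gzipped download from the original report: one gunzip still leaves gzip data rather than a tar archive.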

Comment 63

a year ago
I'm running into this bug as well with http://www.shlomifish.org/ and related domains after I enabled gzip encoding. I'll try to look into fixing it, but no promises.

Comment 64

a year ago
Created attachment 8723797 [details] [diff] [review]
Tentative patch for 285540:c1e0d1890cfe hg default head to fix the problem.

This is a tentative, preliminary, proof-of-concept, and very hacky patch that appears to fix the problem, based on my relatively brief testing with ./mach run. I think the whole nsHttpChannel::ClearBogusContentEncodingIfNeeded() method in nsHttpChannel.cpp can be removed, but testing is welcome. Also, I noticed some tests that may fail.

Comment 65

a year ago
Created attachment 8724064 [details] [diff] [review]
firefox-double-gzip-bug610679-v0.4.1.patch

This is a cleaned up patch. I'm still not sure it's 100% correct.

Comment 66

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ba2ca223820

Not complete yet, but looks like there are download tests for related things that are failing with this change (test_with_content_encoding_ignore_extension, see https://dxr.mozilla.org/mozilla-central/rev/eb25b90a05c194bfd4f498ff3ffee7440f85f1cd/toolkit/components/jsdownloads/test/unit/common_test_Download.js#1496-1533 )

Comment 67

6 months ago
I ran into this too.  When gzip encoding is used Firefox needs to always unzip.

Neither the content nor the file name extension should be tested to determine whether the body needs to be unzipped; the HTTP header, and the HTTP header alone, should be used.

Yes, it's an error for a server to double zip.  But Chrome and wget correctly unzip.  Firefox does not.

https://jira.mariadb.org/browse/ODBC-59?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=87253#comment-87253

https://bugzilla.mozilla.org/show_bug.cgi?id=610679#c62
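
As a rough illustration of the header-only rule, here is a minimal Python sketch. It is not the Firefox implementation; `decode_by_header` is a hypothetical helper that looks only at the Content-Encoding value, never at the file name or the payload, and undoes the listed codings in reverse order of application.

```python
import gzip
import zlib


def decode_by_header(content_encoding, body):
    """Undo the content-codings named in a Content-Encoding header value.

    Hypothetical helper: only gzip, x-gzip, deflate and identity are handled,
    and "deflate" is assumed to be zlib-wrapped.
    """
    codings = [c.strip().lower()
               for c in (content_encoding or "").split(",")
               if c.strip()]
    # Codings are listed in the order they were applied, so undo them in reverse.
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding == "deflate":
            body = zlib.decompress(body)
        elif coding == "identity":
            continue
        else:
            raise ValueError("unsupported content-coding: " + coding)
    return body


tgz_bytes = gzip.compress(b"pretend this is a tar archive")  # file is already gzip
wire = gzip.compress(tgz_bytes)                               # plus Content-Encoding: gzip

# Exactly one layer is removed, because the header names exactly one coding.
print(decode_by_header("gzip", wire) == tgz_bytes)
# Without the header, the body is left untouched even though it is gzip data.
print(decode_by_header(None, tgz_bytes) == tgz_bytes)
```

With this rule a .tgz served with "Content-Encoding: gzip" comes out as the original .tgz, and a .tgz served without that header is never decompressed by mistake.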

Comment 68

14 days ago
Bug still present in Firefox 52.0 (64-bit) on Ubuntu 16.04.2 LTS.

Chrome indeed correctly unzips, but that is a sub-optimal workaround.