Open Bug 233047 Opened 20 years ago Updated 1 year ago

From some download servers, browser uncompresses gzip file, but keeps gz extension. Download manager is confused over file size and time remaining (nsBinaryDetector)

Categories

(Firefox :: File Handling, defect)


People

(Reporter: waynegwoods, Unassigned)


Details

(Keywords: regression)

Attachments

(1 obsolete file)

User-Agent:       
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20040203 Firebird/0.8.0+

If I go to the mozilla download mirror at
http://mozilla.gnusoft.net/firebird/nightly/2004-02-03-02-trunk/ (or any of the
archive directories at that site, this is just an example)
and click on Firebird-mac.dmg.gz, it'll download the file and gunzip it at the
same time. The unzipping is unasked for, and occurs despite the fact that the
save dialog (on Mac at least), says it's being saved as a Gnu ZIP archive.

If it's supposed to uncompress by default, that's fine. However:
(i) it only does it from some servers... as it doesn't do it from the identical
file at the official Mozilla site (i.e.
http://ftp.mozilla.org/pub/mozilla.org/firebird/nightly/2004-02-03-02-trunk/),

(ii) it leaves the .gz extension. The fact that it's gunzipped is only evident
by the greater file size, and if I remove the .gz extension, I can open it as a
normal disk image.

(iii) during download, the download manager assumes the original gzipped size
when calculating the total file size and the time remaining. This leads to it
saying things like "9.0 of 8.3 MB" and giving a negative time remaining.
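Observation (ii) — that the file is gunzipped despite keeping its .gz name — can be verified by checking the first two bytes of the saved file for the gzip magic number. This is an illustrative Python helper, not part of Mozilla's code:

```python
# Check whether a file on disk is still gzip-compressed by looking for the
# two-byte gzip magic number at its start. A genuinely gzipped .dmg.gz starts
# with these bytes; a silently decompressed one does not.
GZIP_MAGIC = b"\x1f\x8b"

def is_gzip_file(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```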

I can reproduce this in both Firebird and Seamonkey. I have also reproduced it
on Firebird on both Mac OS X and Windows XP, on computers at totally different
locations, using different types of networks (ethernet and home dial-up). It
also occurs with a fresh profile.

Reproducible: Always
Steps to Reproduce:
1. Navigate to http://mozilla.gnusoft.net/firebird/nightly/2004-02-03-02-trunk/
2. Click on Firebird-mac.dmg.gz and go through the save dialog to save it to
your desktop

Actual Results:  
File uncompresses (gunzips) during download, but download manager gets confused
over file size and time remaining. File .gz extension remains.

Expected Results:  
Possibly the file should not be gunzipped during download. Or at least the
download manager should have calculated its stats based on the correct file
size, and the .gz extension should have been removed from the file.

Sounds like a partial regression of bug 35956, as in it only exhibits the
problems at some servers and not others.
gnusoft.net sends:
Content-Encoding: x-gzip
Content-Type: text/plain

ftp.m.o sends:
Content-Encoding: x-gzip
Content-Type: application/x-gzip
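The header difference above explains the server-dependent behaviour. An illustrative predicate (this is not actual Necko logic, just a sketch of the dispatch the two servers trigger):

```python
# The nsBinaryDetector re-sniffing path is taken for data declared text/plain,
# which is why the gnusoft.net download is routed through it (and decompressed
# along the way) while ftp.mozilla.org's concrete application/x-gzip type is
# left alone. Illustrative only.
def takes_binary_detector_path(content_type: str) -> bool:
    # Compare only the media type, ignoring any charset parameter.
    return content_type.split(";")[0].strip().lower() == "text/plain"
```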

>(iii) during download, the download manager assumes the original gzipped size
>when calculating the total file size and the time remaining. This leads to it
>saying things like "9.0 of 8.3 MB" and giving a negative time remaining.

heh
well, it's clear how that happens - the total size is the Content-Length that
the server sent, the current position is the sum of bytes saved. could you file
a separate bug about that, though? component would probably be file handling.
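The arithmetic described above can be sketched in a few lines (names are illustrative, not Mozilla's actual code): the total is the compressed Content-Length, while progress counts decompressed bytes written to disk, so progress overshoots the total and the remaining-time estimate goes negative.

```python
# Why the download manager shows "9.0 of 8.3 MB": the denominator is the
# compressed Content-Length, the numerator is decompressed bytes saved.
MB = 1024 * 1024

def progress_string(bytes_saved: int, content_length: int) -> str:
    return f"{bytes_saved / MB:.1f} of {content_length / MB:.1f} MB"

def seconds_remaining(bytes_saved: int, content_length: int, rate: float) -> float:
    # Goes negative once decompressed output exceeds the compressed total.
    return (content_length - bytes_saved) / rate
```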

god I hate this entire uncompress issues... I want bug 57342 fixed...

this may be a duplicate, not sure...
Assignee: general → file-handling
Component: Browser-General → File Handling
QA Contact: general → ian
Oh, this is an interesting case...  The problem here is that there is a stream
converter between the download code and the channel -- the nsBinaryDetector
trying to guess what type the data _really_ is.  This was the case we thought
we'd never hit because we figured that no server would send no content-type and
a content-encoding, but now of course we pump data through the binary detector
in this case, decompressing it as we go.  I'm not sure whether we have bugs on
this issue, but it _is_ one we need to solve for stream converters in general
and we've known that for a while.

There is no simple solution here, really, short of not using nsBinaryDetector in
cases when we have content-encoding (which would mean that this file would be
displayed as text/plain in the browser).
So just to clarify, the download code DOES tell Necko not to gunzip the file.
Necko just can't comply because, thanks to the stream converter mess, it's too
late -- the file has already been gunzipped.
Note that if we do this right nsBinaryDetector will _always_ be looking at
gzipped data in cases like this, and hence will fail to detect some actual
plaintext as such.  Which would be bad.  So once we fix the general
streamconverter bug, we will still need to do something special with
nsBinaryDetector.  Presumably the same thing as what nsUnknownDecoder will have
to do to properly sniff the data inside a gzipped file...
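The failure mode described above can be demonstrated with a toy sniffer (illustrative Python, not nsUnknownDecoder): a detector that sees the gzip bytes rather than the payload will classify even gzipped plaintext as binary.

```python
import gzip

# Toy text/binary sniffer: call data "text" if its first bytes contain no NULs.
# Fed gzip-compressed data, it sees the gzip header (which contains NUL bytes)
# instead of the payload, so gzipped plaintext is misclassified as binary.
def sniff_is_text(data: bytes, probe: int = 512) -> bool:
    return b"\x00" not in data[:probe]
```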
Depends on: 220807
*** Bug 235859 has been marked as a duplicate of this bug. ***
Summary: From some download servers, browser uncompresses gzip file, but keeps gz extension. Download manager is confused over file size and time remaining → From some download servers, browser uncompresses gzip file, but keeps gz extension. Download manager is confused over file size and time remaining (nsBinaryDetector)
*** Bug 235829 has been marked as a duplicate of this bug. ***
*** Bug 233534 has been marked as a duplicate of this bug. ***
*** Bug 236710 has been marked as a duplicate of this bug. ***
*** Bug 236978 has been marked as a duplicate of this bug. ***
Keywords: regression
Status: UNCONFIRMED → NEW
Ever confirmed: true
OK, at this point the "right" fix is not happening for 1.7.  Nevertheless, I
would rather not ship 1.7 with this bug (and certainly Firefox 1.0 doesn't want
this bug).  Here are the interim solutions I see that could be implemented very
easily (as long as we pick one):

1)  Back out the fix to bug 220807 altogether
2)  Disable that fix in cases when content-encoding is set. This will rebreak
    .dmg.gz files served by Apache servers (but right now we screw them up
    anyway -- we save them decompressed, but with the .dmg.gz name).
3)  Disable decompression for files served as text/plain.  This will break sites
    that actually send such files as text/plain (e.g., SourceForge).

There is also

4)  Do the solution I mentioned in comment 4.  This would need to go into 1.7b
    to have a hope in hell of happening for 1.7, and I just don't have the time
    to do it in the sort of timeframe involved there... If someone else wants to
    do it, I can elaborate on what needs to be done.

Frankly, I prefer option #2.  It doesn't really break anything that's not
already broken as things stand....  Pinkerton, Ben, what are your exact plans as
far as branches and such go?
Flags: blocking1.7b?
(In reply to comment #10)
Please note that in my original report (235859) the reference was to firefox's
behaviour when right-clicking on a link and choosing "Save Link to Disk...". In
this case decompressing the file would always be broken behaviour; it should be
saved in the exact form that it comes from the server. This is particularly true
for the Debian source packages I refer to in 235859; subsequent processing of
these files, once downloaded, by the Debian package management tools, assumes
that they are in compressed form. The Debian server also provides md5sums to
check the integrity of the downloaded file, which of course don't work with the
uncompressed files.

Since making the report in 235859 I have also discovered that re-compressing the
downloaded files with gzip does not restore them to the exact same form as they
were on the Debian server; the Debian package management tools complain that the
.tar.gz file is a few tens of bytes larger than the expected size - if I
retrieve the files unmodified using wget, this problem does not arise, so it is
an artefact of the decompression/recompression fudge. This is a further reason
why right-click/"Save Link to Disk..." should not modify the file in any way.
Flags: blocking1.7+
(In reply to comment #11)
> Please note that in my original report (235859) the reference was to firefox's
> behaviour when right-clicking on a link and choosing "Save Link to Disk..."

In that case, it's not a duplicate of this bug.
I must consider the deafening silence to mean agreement.
Attachment #143809 - Flags: superreview?(darin)
Attachment #143809 - Flags: review?(cbiesinger)
Attachment #143809 - Flags: review?(cbiesinger) → review+
Comment on attachment 143809 [details] [diff] [review]
Patch to implement my proposal (option #2)

why are we comparing text/plain variants?  why not use NS_ParseContentType (see
nsNetUtils.h) to extract the mime-type?

i'm guessing that this code is trying to work around bugs, but do we assume
those bugs only happen in ISO-8859-1 locales??
Attachment #143809 - Flags: superreview?(darin) → superreview+
Comment on attachment 143809 [details] [diff] [review]
Patch to implement my proposal (option #2)

> but do we assume
> those bugs only happen in ISO-8859-1 locales??

We're working around a default misconfiguration of Apache.  If the charset is
something else, then the server has been reconfigured from the default config.
Attachment #143809 - Flags: approval1.7b?
Comment on attachment 143809 [details] [diff] [review]
Patch to implement my proposal (option #2)

a=chofmann for 1.7b
Attachment #143809 - Flags: approval1.7b? → approval1.7b+
Comment on attachment 143809 [details] [diff] [review]
Patch to implement my proposal (option #2)

Checked in to the 1.7b trunk.  Leaving bug open for the real fix.
Attachment #143809 - Attachment is obsolete: true
We should mark this bug fixed so it makes it into testing and the release
notes, and follow up with a new bug for the remaining work.
Flags: blocking1.7b? → blocking1.7b+
No.  Please no.  No release note on this topic.  In fact, I would prefer no
release note on bug 220807 until we get this bug fixed.  And if someone takes it
upon themselves to resolve this bug fixed, which I highly recommend against,
since it's not, please have the decency to move over all the pertinent
discussion to the bug you file as a followup.
(In reply to comment #15)
> We're working around a default misconfiguration of apache.  If the charset is
> something else, then the server has been reconfigured from th default config.

I'm just surprised that that old default config of Apache doesn't use the result
of nl_langinfo(CODESET) or getenv("LANG") or something like that.  I guess
that's just reality.
*** Bug 240055 has been marked as a duplicate of this bug. ***
*** Bug 240586 has been marked as a duplicate of this bug. ***
You can reproduce this problem consistently with
http://www.munuc.org/mygraph.svgz, which is a gzip-encoded SVG file.

The Apache server it's running on has the following line set:
AddEncoding x-gzip .gz .tgz .svgz

This means it sends the HTTP header 'Content-Encoding: gzip' when it serves the
file. In this case, the file is silently un-gzipped as per the bug, but retains
its mygraph.svgz filename.

If I remove the .svgz from the AddEncoding directive, then no Content-Encoding
header is sent, and the file is downloaded correctly as a gzip-encoded SVG file
called mygraph.svgz.

In *both* cases, the Content-Type is image/svg+xml.

I think this might clear things up a bit, at least in terms of where this
happens. It appears that Mozilla is confused about what to do when the
'Content-Encoding: gzip' header is set. It should only decompress the file when
it's rendered in the browser, not when it's saved to disk.
no, it should also uncompress when opening in a helper app, and "sometimes" when
saving to disk. for example, would you appreciate a .html file that is saved
gzipped because the server wanted to save bandwidth?
(In reply to comment #24)
> no, it should also uncompress when opening in a helper app, and "sometimes" when
> saving to disk. for example, would you appreciate a .html file that is saved
> gzipped because the server wanted to save bandwidth?

Point taken, but how can we tell the difference? On the one hand, if we never
decompress when saving then we run into issues like you mentioned, but if
we always decompress we also run into some nasty issues.

Perhaps we need to introduce some kind of smart logic thing like we did for
solving MIME type issues in bug 220807. In all seriousness, we could maintain a
small database of common MIME types for which data should be saved compressed
when saved to disk (like application/x-gzip), and for MIME types for which it
should be saved uncompressed (like text/html). Anything that's not in the
database could be determined by some kind of logic like:

a) Uncompressed data = ASCII text --> uncompress
b) Uncompressed data = Binary data --> leave compressed

I realise that this is a very simplistic solution, and is certainly not optimal.
But at the same time this bug needs to be mitigated somehow for 1.7 (using this
solution or some other solution). We certainly shouldn't allow Firefox 1.0 and
Netscape 7.2 to ship with this bug in its current state.

I know this is a tough one to find an acceptable solution for ...
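The commenter's simplistic heuristic (cases a/b above) could be sketched like this — illustrative Python only, not what Mozilla implements:

```python
import gzip

# Decompress a small prefix of the payload and store the file decompressed
# only when that prefix looks like ASCII text (case a); keep binary payloads
# compressed (case b). Purely a sketch of the proposal above.
def store_decompressed(compressed: bytes, probe: int = 512) -> bool:
    head = gzip.decompress(compressed)[:probe]
    return all(b in (9, 10, 13) or 32 <= b < 127 for b in head)
```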
the current solution for that is a list of content-encoding/mimetype pairs which
will never be decoded...
(In reply to comment #25)
> Point taken, but how can we tell the difference?

In practice?  We can't.  Blame the utter lack of implementation of
transfer-encoding in both browsers and servers.

So we have some heuristics that catch common cases.  We can adjust those for
.svgz, but that has nothing to do with this bug.  Please file a separate bug. 
The relevant code is at
http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#623

If you want to suggest improvements to the overall architecture, feel free to do
so, in the separate bug.
Is there a fix needed here to block 1.7?

Or can I remove the blocking 1.7 flag?
The part that was blocking 1.7 was checked in on the 1.7 branch.
Flags: blocking1.7+
*** Bug 243523 has been marked as a duplicate of this bug. ***
I have started seeing this bug with Firefox 3.0 (Mac OS X 10.4.11); it didn't occur with Firefox versions < 3.

It happens when using phpMyAdmin to download a gzipped SQL dump.  Firefox asks what I want to do with the file, I say "Save to Disk", and it appears to save ok with the correct .sql.gz extension.  However somewhere along the way it has been unzipped (despite the .gz extension), so it's actually a decompressed file that is saved.
Assignee: file-handling → nobody
QA Contact: ian → file-handling
I am also experiencing this issue with gzipped phpMyAdmin exports, but it seems that I shouldn't be, given the response headers.  The following are the response headers sent by the server:

HTTP/1.0 200 OK
Server: cpsrvd/11.2.2
Connection: close
X-Powered-By: PHP/5.2.3
Set-Cookie: pma_fontsize=82%25; expires=Tue, 02-Jun-2009 20:31:35 GMT; path=/3rdparty/phpMyAdmin/; httponly
Cache-Control: private, max-age=10800, pre-check=10800
Last-Modified: Fri, 14 Dec 2007 20:36:48 GMT
Set-Cookie: pma_theme=deleted; expires=Sat, 03-May-2008 20:31:34 GMT; path=/3rdparty/phpMyAdmin/
Set-Cookie: pma_server_filename_template=database-%25Y%25m%25d_%25H%25M%25S; expires=Tue, 02-Jun-2009 20:31:35 GMT; path=/3rdparty/phpMyAdmin/; httponly
Content-Encoding: x-gzip
Content-Type: application/x-gzip
Expires: Sun, 03 May 2009 20:31:35 GMT
Content-Disposition: attachment; filename="database-20090503_133135.sql.gz"
Pragma: no-cache

Since the server sends the Content-Type as application/x-gzip, FF doesn't need to guess the type, so I would expect it to keep the file compressed (at least when saving to disk).  This was not a problem with FF2; it only became an issue once I upgraded to FF3.
I'm also experiencing this problem when saving gzipped SQL dumps from phpMyAdmin. The worst part is not the fact that I get an uncompressed file... the worst part is that I end up getting a CORRUPTED (truncated) file!! Damn... fortunately I noticed it, and from now on I will use another browser to get SQL dumps; otherwise I would have kept trusting a few corrupted SQL backup files... the day I would need them... I can't imagine :(

This is a serious bug!
Just to add: phpMyAdmin 2.11.7 seems to work around the problem. Anyway, this is still a Firefox bug, as it should save either the compressed or the uncompressed file, but never a corrupted one.
Nuno:
Content-Encoding: x-gzip means that the received file must be unzipped to get the content type specified with Content-Type: application/x-gzip.
That would mean the file has to be double-gzipped if you send both headers according to the RFC; phpMyAdmin doesn't seem to do that, which makes it a phpMyAdmin bug.
The phpMyAdmin bug is bug 424306

read https://bugzilla.mozilla.org/show_bug.cgi?id=424306#c9 and
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Comment 36 is wrong, actually.  Content-Encoding actually means the data should NOT be gunzipped, ever, when saving.  However, some servers use it to mean 
"gunzip when saving" due to lack of Transfer-Encoding support.  Please read my earlier comments in this bug.

In any case, someone needs to sit down, pick one of the implementation choices described in my earlier comments here, and implement it.
Matthias: the most important thing is not whether the file should be decompressed or not; the main problem is that we end up with a truncated/corrupted file, and that is not normal, right?
I agree with comment 37, though Transfer-Encoding, even when supported, has its own problems with proxies: http://stackapps.com/questions/916/why-content-encoding-gzip-rather-than-transfer-encoding-gzip

So, it may not be a solution.

This bug still occurs, at least with Iceweasel 10.0.7 (Debian). An example:

  http://www.vinc17.net/test/gztest-compr.gz

I think that for files served with "Content-Encoding: gzip" (or x-gzip) and saved with an ending .gz extension (or any equivalent extension, such as .tgz -- the list of such extensions should probably be configurable), Firefox should do the following:
1. Decompress the file.
2. If the result is still in the gzip format, save this uncompressed file (this step would solve bug 610679 about double gzipped files).
3. Otherwise save the file uncompressed.
For (3), I meant: "save the file compressed (as received from the server)".
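The three-step proposal above (with the correction that step 3 saves the bytes as received) could be sketched like this — illustrative Python, not Firefox code:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

# For a download served with Content-Encoding: gzip and a .gz-style extension:
# decompress once; if the result is itself gzip data the server double-encoded
# the file, so save the once-decoded bytes; otherwise save the bytes exactly
# as received from the server.
def bytes_to_save(received: bytes) -> bytes:
    decoded = gzip.decompress(received)   # step 1: decompress once
    if decoded[:2] == GZIP_MAGIC:         # step 2: double-gzipped file
        return decoded                    #   save the once-decoded .gz
    return received                       # step 3: save as received
```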
I can confirm that the bug is still here in the latest Nightly version of Firefox - the files get saved with a .gz extension (in my case an SQL dump from phpMyAdmin) but they are actually uncompressed.
Flags: approval1.7b+
Product: Core → Firefox
Version: Trunk → unspecified
Not sure if this is exactly the same issue, but Jake Archibald created a demo that shows we still get the sizes confused when decompressing gzip for a download:

https://gzipped-download-test.glitch.me/

Note, though, this demo does not leave an extraneous .gz extension on the file.
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 6 duplicates.
:Gijs, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gijskruitbosch+bugs)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(gijskruitbosch+bugs)