Open Bug 233047 Opened 20 years ago Updated 9 months ago
From some download servers, browser uncompresses gzip file, but keeps gz extension. Download manager is confused over file size and time remaining (nsBinaryDetector)
User-Agent: Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7a) Gecko/20040203 Firebird/0.8.0+

If I go to the mozilla download mirror at http://mozilla.gnusoft.net/firebird/nightly/2004-02-03-02-trunk/ (or any of the archive directories at that site, this is just an example) and click on Firebird-mac.dmg.gz, it'll download the file and gunzip it at the same time. The unzipping is unasked for, and occurs despite the fact that the save dialog (on Mac at least) says it's being saved as a Gnu ZIP archive.

If it's supposed to uncompress by default, that's fine. However:
(i) it only does it from some servers... as it doesn't do it from the identical file at the official Mozilla site (i.e. http://ftp.mozilla.org/pub/mozilla.org/firebird/nightly/2004-02-03-02-trunk/),
(ii) it leaves the .gz extension. The fact that it's gunzipped is only evident by the greater file size, and if I remove the .gz extension, I can open it as a normal disk image.
(iii) during download, the download manager assumes the original gzipped size when calculating the total file size, and the time remaining. This leads to it saying things like "9.0 of 8.3 MB" and giving negative time remaining.

I can reproduce this in both Firebird and Seamonkey. I have also reproduced it on Firebird on both Mac OS X and Windows XP, on computers at totally different locations, using different types of networks (ethernet and home dial-up). It also occurs with a fresh profile.

Reproducible: Always

Steps to Reproduce:
1. Navigate to http://mozilla.gnusoft.net/firebird/nightly/2004-02-03-02-trunk/
2. Click on Firebird-mac.dmg.gz and go through the save dialog to save it to your desktop

Actual Results:
File uncompresses (gunzips) during download, but download manager gets confused over file size and time remaining. File .gz extension remains.

Expected Results:
Possibly the file should not have been gunzipped during download. Or at least the download manager should have calculated stats based on the correct file size, and the .gz extension should have been removed from the file.

Sounds like a partial regression of bug 35956, as in it only exhibits the problems at some servers and not others.
gnusoft.net sends:
Content-Encoding: x-gzip
Content-Type: text/plain

ftp.m.o sends:
Content-Encoding: x-gzip
Content-Type: application/x-gzip

>(iii) during download, the download manager assumes the original gzipped size
>when calculating the total file size, and the time remaining. This leads it to
>it saying things like "9.0 of 8.3 MB" and giving negative time remaining.

heh well, it's clear how that happens - the total size is the Content-Length that the server sent, the current position is the sum of bytes saved. could you file a separate bug about that, though? component would probably be file handling.

god I hate this entire uncompress issues... I want bug 57342 fixed... this may be a duplicate, not sure...
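The arithmetic behind the "9.0 of 8.3 MB" display can be reproduced outside the browser. This is a minimal sketch in Python, assuming the total comes from Content-Length (the compressed size) while the position counts decompressed bytes written to disk. `progress_report` and `seconds_remaining` are hypothetical helpers for illustration, not Mozilla code.

```python
import gzip


def progress_report(bytes_saved: int, content_length: int) -> str:
    """Format progress with the total taken from the Content-Length header."""
    percent = 100.0 * bytes_saved / content_length
    return f"{bytes_saved} of {content_length} bytes ({percent:.0f}%)"


def seconds_remaining(bytes_saved: int, content_length: int,
                      bytes_per_second: float) -> float:
    """Naive estimate: bytes still expected divided by the current rate."""
    return (content_length - bytes_saved) / bytes_per_second


# Stand-in for the file on the server: 100000 bytes before compression.
payload = b"disk image contents " * 5000
wire_body = gzip.compress(payload)

# The server advertises the compressed size ...
content_length = len(wire_body)
# ... but the saved byte count is measured after decompression.
bytes_saved = len(gzip.decompress(wire_body))

print(progress_report(bytes_saved, content_length))
print(seconds_remaining(bytes_saved, content_length, 50_000.0))
```

Once the saved count passes the advertised total, the percentage exceeds 100 and the time estimate goes negative, which matches the symptom in the report.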
Assignee: general → file-handling
Component: Browser-General → File Handling
QA Contact: general → ian
Oh, this is an interesting case... The problem here is that there is a stream converter between the download code and the channel -- the nsBinaryDetector trying to guess what type the data _really_ is. This was the case we thought we'd never hit because we figured that no server would send no content-type and a content-encoding, but now of course we pump data through the binary detector in this case, decompressing it as we go. I'm not sure whether we have bugs on this issue, but it _is_ one we need to solve for stream converters in general and we've known that for a while. There is no simple solution here, really, short of not using nsBinaryDetector in cases when we have content-encoding (which would mean that this file would be displayed as text/plain in the browser).
So just to clarify, the download code DOES tell necko not to gunzip the file. Necko just can't do that because due to the stream converter mess it's too late -- the file has been gunzipped.
Note that if we do this right nsBinaryDetector will _always_ be looking at gzipped data in cases like this, and hence will fail to detect some actual plaintext as such. Which would be bad. So once we fix the general streamconverter bug, we will still need to do something special with nsBinaryDetector. Presumably the same thing as what nsUnknownDecoder will have to do to properly sniff the data inside a gzipped file...
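One way to picture "sniffing the data inside a gzipped file" without touching what gets saved: decode only a short prefix for the detector and keep the wire bytes as they are. This is a sketch in Python under that assumption. `sniff_inside_gzip` and `looks_like_text` are hypothetical names, and the text heuristic is deliberately crude; it is not how nsBinaryDetector or nsUnknownDecoder work.

```python
import gzip
import zlib


def sniff_inside_gzip(wire_prefix: bytes, limit: int = 512) -> bytes:
    """Decode at most `limit` bytes from the start of a gzip stream."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    return decoder.decompress(wire_prefix, limit)


def looks_like_text(sample: bytes) -> bool:
    """Crude stand-in for a binary detector: reject NUL and most control bytes."""
    return all(b in (9, 10, 13) or b >= 32 for b in sample)


text_wire = gzip.compress(b"plain text line\n" * 200)
binary_wire = gzip.compress(bytes(range(256)) * 8)

# The detector sees decoded samples ...
text_is_text = looks_like_text(sniff_inside_gzip(text_wire))
binary_is_text = looks_like_text(sniff_inside_gzip(binary_wire))

# ... while the bytes handed to the download code stay compressed.
saved_bytes = text_wire
```

The point of the design is that the decoder is a side channel for type detection only, so the download code can still decide for itself whether to gunzip.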
*** Bug 235859 has been marked as a duplicate of this bug. ***
Summary: From some download servers, browser uncompresses gzip file, but keeps gz extension. Download manager is confused over file size and time remaining → From some download servers, browser uncompresses gzip file, but keeps gz extension. Download manager is confused over file size and time remaining (nsBinaryDetector)
*** Bug 235829 has been marked as a duplicate of this bug. ***
*** Bug 233534 has been marked as a duplicate of this bug. ***
*** Bug 236710 has been marked as a duplicate of this bug. ***
*** Bug 236978 has been marked as a duplicate of this bug. ***
OK, at this point the "right" fix is not happening for 1.7. Nevertheless, I would rather not ship 1.7 with this bug (and certainly Firefox 1.0 doesn't want this bug). Here are the interim solutions I see that could be implemented very easily (as long as we pick one):

1) Back out the fix to bug 220807 altogether
2) Disable that fix in cases when content-encoding is set. This will rebreak .dmg.gz files served by Apache servers (but right now we screw them up anyway -- we save them decompressed, but with the .dmg.gz name).
3) Disable decompression for files served as text/plain. This will break sites that actually send such files as text/plain (sourceforge, eg).

There is also

4) Do the solution I mentioned in comment 4. This would need to go into 1.7b to have a hope in hell of happening for 1.7, and I just don't have the time to do it in the sort of timeframe involved there... If someone else wants to do it, I can elaborate on what needs to be done.

Frankly, I prefer option #2. It doesn't really break anything that's not already broken as things stand....

Pinkerton, Ben what are your exact plans as far as branches and such go?
(In reply to comment #10) Please note that in my original report (235859) the reference was to firefox's behaviour when right-clicking on a link and choosing "Save Link to Disk...". In this case decompressing the file would always be broken behaviour; it should be saved in the exact form that it comes from the server. This is particularly true for the Debian source packages I refer to in 235859; subsequent processing of these files, once downloaded, by the Debian package management tools, assumes that they are in compressed form. The Debian server also provides md5sums to check the integrity of the downloaded file, which of course don't work with the uncompressed files. Since making the report in 235859 I have also discovered that re-compressing the downloaded files with gzip does not restore them to the exact same form as they were on the Debian server; the Debian package management tools complain that the .tar.gz file is a few tens of bytes larger than the expected size - if I retrieve the files unmodified using wget, this problem does not arise, so it is an artefact of the decompression/recompression fudge. This is a further reason why right-click/"Save Link to Disk..." should not modify the file in any way.
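The size and checksum mismatch after re-compressing is what the gzip format would lead one to expect. The member header (RFC 1952) carries a modification time and optionally the original file name, and the deflate output depends on the compressor and its settings. This is a small Python sketch of the effect; the payload and timestamps are made up for illustration, and it does not reproduce the exact Debian files.

```python
import gzip
import hashlib

original = b"debian source package contents\n" * 200

# What the server holds: compressed once, at some point in the past.
server_copy = gzip.compress(original, compresslevel=9, mtime=1_000_000_000)

# What you get after the browser gunzips and you gzip the result again later.
recompressed = gzip.compress(gzip.decompress(server_copy),
                             compresslevel=6, mtime=1_100_000_000)

server_md5 = hashlib.md5(server_copy).hexdigest()
local_md5 = hashlib.md5(recompressed).hexdigest()

print(gzip.decompress(recompressed) == original)  # the payload survives
print(server_md5 == local_md5)                    # the checksum does not match
```

So a published md5sum can only be checked against the bytes exactly as the server sent them.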
(In reply to comment #11) > Please note that in my original report (235859) the reference was to firefox's > behaviour when right-clicking on a link and choosing "Save Link to Disk..." In that case, it's not a duplicate of this bug.
I must consider the deafening silence to mean agreement.
Attachment #143809 - Flags: review?(cbiesinger) → review+
Comment on attachment 143809 [details] [diff] [review] Patch to implement my proposal (option #2) why are we comparing text/plain variants? why not use NS_ParseContentType (see nsNetUtils.h) to extract the mime-type? i'm guessing that this code is trying to work around bugs, but do we assume those bugs only happen in ISO-8859-1 locales??
Attachment #143809 - Flags: superreview?(darin) → superreview+
Comment on attachment 143809 [details] [diff] [review] Patch to implement my proposal (option #2)
> but do we assume those bugs only happen in ISO-8859-1 locales??
We're working around a default misconfiguration of Apache. If the charset is something else, then the server has been reconfigured from the default config.
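To make the shape of option #2 concrete, here is a rough Python sketch of the decision as discussed in this thread: only second-guess the type when the header looks like the Apache default, and skip the detector whenever a Content-Encoding is present. The function name and the exact header strings in `APACHE_DEFAULT_TYPES` are assumptions for illustration; the real patch is C++ and may differ.

```python
from typing import Optional

# Assumed spellings of the Content-Type an unconfigured Apache would send.
APACHE_DEFAULT_TYPES = {
    "text/plain",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
}


def should_run_binary_detector(content_type: str,
                               content_encoding: Optional[str]) -> bool:
    """Decide whether to sniff a text/plain response for binary data."""
    if content_encoding:
        # Option #2: with an encoding set, sniffing would force decompression.
        return False
    # Any other charset means someone reconfigured the server on purpose.
    return content_type in APACHE_DEFAULT_TYPES


print(should_run_binary_detector("text/plain; charset=ISO-8859-1", None))
print(should_run_binary_detector("text/plain", "x-gzip"))
```

Comparing whole header strings rather than the parsed MIME type is what keeps deliberately configured charsets out of the workaround.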
Attachment #143809 - Flags: approval1.7b?
Comment on attachment 143809 [details] [diff] [review] Patch to implement my proposal (option #2) a=chofmann for 1.7b
Attachment #143809 - Flags: approval1.7b? → approval1.7b+
Comment on attachment 143809 [details] [diff] [review] Patch to implement my proposal (option #2) Checked in to the 1.7b trunk. Leaving bug open for the real fix.
Attachment #143809 - Attachment is obsolete: true
should mark this bug fixed so it makes it into testing and release notes and follow up with new bug for remaining work
Flags: blocking1.7b? → blocking1.7b+
No. Please no. No release note on this topic. In fact, I would prefer no release note on bug 220807 until we get this bug fixed. And if someone takes it upon themselves to resolve this bug fixed, which I highly recommend against, since it's not, please have the decency to move over all the pertinent discussion to the bug you file as a followup.
(In reply to comment #15)
> We're working around a default misconfiguration of apache. If the charset is
> something else, then the server has been reconfigured from the default config.
I'm just surprised that that old default config of Apache doesn't use the result of nl_langinfo(CODESET) or getenv("LANG") or something like that. I guess that's just reality.
*** Bug 240055 has been marked as a duplicate of this bug. ***
*** Bug 240586 has been marked as a duplicate of this bug. ***
You can reproduce this problem consistently with http://www.munuc.org/mygraph.svgz which is a gzip-encoded SVG file. The Apache server it's running on has the following line set: AddEncoding x-gzip .gz .tgz .svgz This means it sends the HTTP header 'Content-Encoding: gzip' when it serves the file. In this case, the file is silently un-gzipped as per the bug, but retains its mygraph.svgz filename. If I remove the .svgz from the AddEncoding directive, then no Content-Encoding header is sent, and the file is downloaded correctly as a gzip-encoded SVG file called mygraph.svgz. In *both* cases, the Content-Type is image/svg+xml. I think this might clear things up a bit, at least in terms of where this happens. It appears that Mozilla is confused about what to do when the 'Content-Encoding: gzip' header is set. It should only decompress the file when it's rendered in the browser, not when it's saved to disk.
no, it should also uncompress when opening in a helper app, and "sometimes" when saving to disk. for example, would you appreciate a .html file that is saved gzipped because the server wanted to save bandwidth?
(In reply to comment #24)
> no, it should also uncompress when opening in a helper app, and "sometimes" when
> saving to disk. for example, would you appreciate a .html file that is saved
> gzipped because the server wanted to save bandwidth?

Point taken, but how can we tell the difference? On the one hand, if we never decompress when saving then we run into issues like you mentioned, but if we always decompress we also run into some nasty issues. Perhaps we need to introduce some kind of smart logic like we did for solving MIME type issues in bug 220807.

In all seriousness, we could maintain a small database of common MIME types for which data should be saved compressed when saved to disk (like application/x-gzip), and of MIME types for which it should be saved uncompressed (like text/html). Anything that's not in the database could be determined by some kind of logic like:
a) Uncompressed data = ASCII text --> uncompress
b) Uncompressed data = Binary data --> leave compressed

I realise that this is a very simplistic solution, and is certainly not optimal. But at the same time this bug needs to be mitigated somehow for 1.7 (using this solution or some other solution). We certainly shouldn't allow Firefox 1.0 and Netscape 7.2 to ship with this bug in its current state. I know this is a tough one to find an acceptable solution for ...
the current solution for that is a list of content-encoding/mimetype pairs which will never be decoded...
(In reply to comment #25) > Point taken, but how can we tell the difference? In practice? We can't. Blame the utter lack of implementation of transfer-encoding in both browsers and servers. So we have some heuristics that catch common cases. We can adjust those for .svgz, but that has nothing to do with this bug. Please file a separate bug. The relevant code is at http://lxr.mozilla.org/seamonkey/source/uriloader/exthandler/nsExternalHelperAppService.cpp#623 If you want to suggest improvements to the overall architecture, feel free to do so, in the separate bug.
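For readers who have not looked at the linked code, the "list of content-encoding/mimetype pairs which will never be decoded" can be pictured roughly as follows. This is a Python sketch only; the entries in `NEVER_DECODE` and the helper name are assumptions for illustration and are not copied from nsExternalHelperAppService.cpp.

```python
# Hypothetical (MIME type, Content-Encoding) pairs where the encoding is
# really the file format, so the bytes should be saved as received.
NEVER_DECODE = {
    ("application/x-gzip", "gzip"),
    ("application/x-gzip", "x-gzip"),
    ("application/x-compress", "compress"),
    ("application/x-compress", "x-compress"),
}


def should_decode_on_save(mime_type: str, content_encoding: str) -> bool:
    """Return False when the pair is on the never-decode list."""
    key = (mime_type.strip().lower(), content_encoding.strip().lower())
    return key not in NEVER_DECODE


print(should_decode_on_save("text/html", "gzip"))            # decode
print(should_decode_on_save("application/x-gzip", "x-gzip"))  # keep as sent
print(should_decode_on_save("image/svg+xml", "gzip"))         # decode
```

With a list like this, the .svgz case from the earlier comment is decoded because image/svg+xml is not listed, which is why adjusting the heuristics for it belongs in a separate bug.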
Is there a fix needed here to block 1.7? Or can I remove the blocking 1.7 flag?
The part that was blocking 1.7 was checked in on the 1.7 branch.
*** Bug 243523 has been marked as a duplicate of this bug. ***
I have started seeing this bug with Firefox 3.0 (Mac OS X 10.4.11); it didn't occur with Firefox versions < 3. It happens when using phpMyAdmin to download a gzipped SQL dump. Firefox asks what I want to do with the file, I say "Save to Disk", and it appears to save ok with the correct .sql.gz extension. However somewhere along the way it has been unzipped (despite the .gz extension), so it's actually a decompressed file that is saved.
Assignee: file-handling → nobody
QA Contact: ian → file-handling
I am also experiencing this issue with gzipped phpMyAdmin exports, but it seems that I shouldn't be, given the response headers. The following are the response headers sent by the server:

HTTP/1.0 200 OK
Server: cpsrvd/11.2.2
Connection: close
X-Powered-By: PHP/5.2.3
Set-Cookie: pma_fontsize=82%25; expires=Tue, 02-Jun-2009 20:31:35 GMT; path=/3rdparty/phpMyAdmin/; httponly
Cache-Control: private, max-age=10800, pre-check=10800
Last-Modified: Fri, 14 Dec 2007 20:36:48 GMT
Set-Cookie: pma_theme=deleted; expires=Sat, 03-May-2008 20:31:34 GMT; path=/3rdparty/phpMyAdmin/
Set-Cookie: pma_server_filename_template=database-%25Y%25m%25d_%25H%25M%25S; expires=Tue, 02-Jun-2009 20:31:35 GMT; path=/3rdparty/phpMyAdmin/; httponly
Content-Encoding: x-gzip
Content-Type: application/x-gzip
Expires: Sun, 03 May 2009 20:31:35 GMT
Content-Disposition: attachment; filename="database-20090503_133135.sql.gz"
Pragma: no-cache

Since the server sends the Content-Type as application/x-gzip, FF doesn't need to guess the type, so I would expect it to keep the file compressed (at least when saving to disk). This was not a problem with FF2; it only became an issue once I upgraded to FF3.
I'm also experiencing this problem when saving gzipped SQL dumps from phpMyAdmin. The worst part is not the fact that I get an uncompressed file... the worst part is that I end up getting a CORRUPTED (truncated) file!! Damn... fortunately I noticed it and from now on will use another browser to get SQL dumps, else I would have been trusting a few corrupted SQL backup files... the day I would need them... I can't imagine :( This is a serious bug!
Just to add: phpMyAdmin 2.11.7 seems to work around the problem. Anyway, it's still a Firefox bug, as it should save either a compressed or an uncompressed file, but never a corrupted one.
Nuno: Content-Encoding: x-gzip means that the received file must be unzipped to get the content-type specified with Content-Type: application/x-gzip. According to the RFC, that would mean the file has to be double-gzipped if you send both headers. phpMyAdmin doesn't seem to do that, and that makes it a phpMyAdmin bug. The phpMyAdmin bug is bug 424306; read https://bugzilla.mozilla.org/show_bug.cgi?id=424306#c9 and http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
Comment 36 is wrong, actually. Content-Encoding actually means the data should NOT be gunzipped, ever, when saving. However, some servers use it to mean "gunzip when saving" due to lack of Transfer-Encoding support. Please read my earlier comments in this bug. In any case, someone needs to sit down, pick one of the implementation choices described in my earlier comments here, and implement it.
Matthias: the more important point is not whether the file should be decompressed or not; the main problem is that we end up with a truncated/corrupted file, and this is not normal, right?
I agree with comment 37, though Transfer-Encoding, even when supported, has its own problems with proxies: http://stackapps.com/questions/916/why-content-encoding-gzip-rather-than-transfer-encoding-gzip So, it may not be a solution.

This bug still occurs, at least with Iceweasel 10.0.7 (Debian). An example: http://www.vinc17.net/test/gztest-compr.gz

I think that for files served with "Content-Encoding: gzip" (or x-gzip) and saved with an ending .gz extension (or any equivalent extension, such as .tgz -- the list of such extensions should probably be configurable), Firefox should do the following:
1. Decompress the file.
2. If the result is still in the gzip format, save this uncompressed file (this step would solve bug 610679 about double gzipped files).
3. Otherwise save the file uncompressed.
For (3), I meant: "save the file compressed (as received from the server)".
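Taken together with the correction to step 3, the proposal could be sketched like this in Python. It assumes the response was already known to carry "Content-Encoding: gzip" and a .gz-style file name. `choose_bytes_to_save` is a hypothetical name, and checking the two-byte gzip magic number is one simple way to test "still in the gzip format".

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def choose_bytes_to_save(wire_body: bytes) -> bytes:
    """Pick what to write to disk for a gzip-encoded .gz download."""
    decoded = gzip.decompress(wire_body)       # step 1: decompress
    if decoded.startswith(GZIP_MAGIC):
        # Step 2: the server gzipped an already gzipped file, so removing
        # one layer yields the .gz file the user asked for.
        return decoded
    # Step 3 (as corrected): keep the bytes exactly as received.
    return wire_body


single = gzip.compress(b"CREATE TABLE t (id INT);\n")
double = gzip.compress(single)

saved_single = choose_bytes_to_save(single)
saved_double = choose_bytes_to_save(double)
```

Either way the saved file is a single-layer gzip file whose contents match its .gz name.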
I can confirm that the bug is still here in the latest Nightly version of Firefox - the files get saved with .gz extension (in my case SQL dump from phpMyAdmin) but they are actually uncompressed.
Product: Core → Firefox
Version: Trunk → unspecified
Not sure if this is exactly the same issue, but Jake Archibald created a demo that shows we still get the sizes confused when decompressing gzip for a download: https://gzipped-download-test.glitch.me/ Note, though, this demo does not leave an extraneous .gz extension on the file.