Closed Bug 121001 Opened 23 years ago Closed 23 years ago

Mozilla gzip-compresses this file on download, but it opens fine directly in browser.

Categories

(Core Graveyard :: File Handling, defect)

x86
All
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: jesse.houwing, Assigned: bugs)

References

()

Details

Open the URL. Right-click the top FAQ/walkthrough (the one of about 500 KB) and choose "Save Link As". Save it to your hard disk and open it in Notepad or something similar. The result is an unreadable file.
Confirming with 2002-01-19-22 on Linux. For some reason Mozilla seems to gzip-compress this file on download. I ran 'file' on the downloaded file and it reported this:
baldurs_gate_c.txt: gzip compressed data, deflated, last modified: Thu Jan 1 01:00:00 1970, max compression, os: MS-DOS
I renamed the file to baldurs_gate.gz, ran gzip -d on it and then I could view it. If I download it with 'wget' it detects it as text/plain and you can view it right away:
--15:47:26-- http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt
=> `baldurs_gate_c.txt'
Connecting to db.gamefaqs.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 564,597 [text/plain]
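For anyone who wants to repeat the 'file' and 'gzip -d' check above without those tools, here is a minimal Python sketch. It only looks at the two-byte gzip magic number, which is a much weaker test than 'file' performs, and the helper names and sample data are made up for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(data: bytes) -> bool:
    """Rough equivalent of running 'file': check only the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def recover_text(data: bytes) -> bytes:
    """Rough equivalent of 'gzip -d': decompress only if the data looks gzipped."""
    return gzip.decompress(data) if looks_gzipped(data) else data


# Stand-in for the saved file; the real one is the downloaded baldurs_gate_c.txt.
original = b"Baldur's Gate walkthrough\n" * 100
saved = gzip.compress(original)

print(looks_gzipped(saved))
print(recover_text(saved) == original)
```

To check a real download, read the saved file in binary mode and pass its bytes to looks_gzipped().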
Summary: Files opens fine in browser (only a few characters not showing up) bus save as is garbled → Mozilla gzip-compresses this file on download, but it opens fine directly in browser.
Probably a dupe of bug 107991.
This sounds a lot like bug 51852, which I notice got marked WFM a few days ago, but it's got quite a number of dupes.... Andre, when you tested with wget did you send an Accept-Encoding header? Mozilla sends |gzip, deflate, compress;q=0.9| for that header value, but wget does not.
OS: Windows 2000 → Linux
Christopher, I just tried with wget --header="Accept Encoding: gzip, deflate, compress;q=0.9" and got this out of wget: Invalid specification `Accept Encoding: gzip, deflate, compress;q=0.9'. What is the right parameter to send?
Hmm, I guess Accept-Encoding is the right one. Here's what I got when I used that:
--16:47:59-- http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt
=> `baldurs_gate_c.txt'
Connecting to db.gamefaqs.com:80... connected!
HTTP request sent, awaiting response... 200 OK
2 Server: Microsoft-IIS/5.0
3 Connection: keep-alive
4 Date: Sun, 20 Jan 2002 15:38:28 GMT
5 Content-Type: text/plain
6 Accept-Ranges: bytes
7 Last-Modified: Tue, 03 Jul 2001 18:50:21 GMT
8 ETag: "aac9c22f13c11:802"
9 Content-Length: 564597
10
0K .......... .......... .......... .......... .......... 9% @ 45.25 KB/s
50K .......... .......... .......... .......... .......... 18% @ 59.59 KB/s
100K .......... .......... .......... .......... .......... 27% @ 57.94 KB/s
150K .......... .......... .......... .......... .......... 36% @ 59.67 KB/s
200K .......... .......... .......... .......... .......... 45% @ 59.59 KB/s
250K .......... .......... .......... .......... .......... 54% @ 58.00 KB/s
300K .......... .......... .......... .......... .......... 63% @ 59.52 KB/s
350K .......... .......... .......... .......... .......... 72% @ 57.08 KB/s
400K .......... .......... .......... .......... .......... 81% @ 60.46 KB/s
450K .......... .......... .......... .......... .......... 90% @ 59.59 KB/s
500K .......... .......... .......... .......... .......... 99% @ 57.74 KB/s
550K . 100% @ 1.33 MB/s
16:48:09 (57.47 KB/s) - `baldurs_gate_c.txt' saved [564597/564597]
The saved file was in plain text format.
I just tried with telnet, and the file is sent gzip-compressed over the wire. If you set content-encoding in wget it should decode on save (and so it does). Telnet won't do this for you, so it's a good alternative for seeing what the server really sends.
The server sends us gzip'd content as seen with NSPR_LOG_MODULES=nsHttp:5
Here's a partial log of the transfers of that file.
1024[8086de8]: http request [
1024[8086de8]: GET /computer/doswin/file/baldurs_gate_c.txt HTTP/1.1
1024[8086de8]: Host: db.gamefaqs.com
1024[8086de8]: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20020119
1024[8086de8]: Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
1024[8086de8]: Accept-Language: en-us
1024[8086de8]: Accept-Encoding: gzip, deflate, compress;q=0.9
1024[8086de8]: Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
1024[8086de8]: Keep-Alive: 300
1024[8086de8]: Connection: keep-alive
1024[8086de8]: Cache-Control: max-age=0
1024[8086de8]: ]
===================
1026[8128880]: http response [
1026[8128880]: HTTP/1.1 200 OK
1026[8128880]: Server: Microsoft-IIS/5.0
1026[8128880]: Content-Encoding: gzip
1026[8128880]: Date: Sun, 20 Jan 2002 15:51:37 GMT
1026[8128880]: Content-Type: text/plain
1026[8128880]: Accept-Ranges: bytes
1026[8128880]: Last-Modified: Tue, 03 Jul 2001 18:50:19 GMT
1026[8128880]: Etag: "80277c1f13c11:802"
1026[8128880]: Content-Length: 163389
1026[8128880]: Expires: Wed, 01 Jan 1997 12:00:00 GMT
1026[8128880]: Cache-Control: max-age=86400
1026[8128880]: Vary: Accept-Encoding
1026[8128880]: ]
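The response in this log carries Content-Encoding: gzip, and that header is the only thing telling the client the body needs decoding. As a rough illustration (not Mozilla's code; the function names are invented here), this Python sketch parses a few of the headers from the log and undoes the encoding they announce.

```python
import gzip
import zlib

# A subset of the response headers from the log above.
RAW_HEADERS = """\
Content-Encoding: gzip
Date: Sun, 20 Jan 2002 15:51:37 GMT
Content-Type: text/plain
Content-Length: 163389
Vary: Accept-Encoding
"""


def parse_headers(raw: str) -> dict:
    """Turn 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers


def decode_body(headers: dict, body: bytes) -> bytes:
    """Undo the Content-Encoding announced in the headers, if any."""
    encoding = headers.get("content-encoding", "identity").lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        # Assumes the zlib-wrapped form; some servers send raw deflate instead.
        return zlib.decompress(body)
    return body  # identity or unknown: pass through untouched


headers = parse_headers(RAW_HEADERS)
body_on_the_wire = gzip.compress(b"stand-in for the 564597-byte text file")
print(headers["content-encoding"])
print(decode_body(headers, body_on_the_wire))
```

Saving body_on_the_wire directly, without going through something like decode_body(), gives the unreadable file described in this bug.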
Based on that log, I think this probably *is* a dupe of bug 51852 in which case it needs to be re-opened.
OS: Linux → All
See bug 87449 and bug 108688 for the same problem on gamefaqs. Dupe. *** This bug has been marked as a duplicate of 51852 ***
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Please see my comments in bug 51852. This is _not_ a duplicate, but rather a new bug that's in a totally different area of the code and needs to be dealt with by a totally different person.
Reopening per bz's comments
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
->ben
Assignee: darin → ben
Note that per the HTTP/1.1 RFC it's not clear whether content-encoding should or should not be decoded if the data is just being saved...
Component: Networking: HTTP → File Handling
WFM using 2002011503 Win2k. Maybe a Linux-only bug?
Does not WFM 2002011604 win2k
*** Bug 96490 has been marked as a duplicate of this bug. ***
confirming, based on bug 96490...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Confirmed on 2002012203 WinXP. Also note that bug #51852 (closed at the moment) has 10 votes. This bug has replaced #51852. Please look into this and set it to some near-future milestone because this behavior is EXTREMELY annoying. It also happens when you use File->Save Page As... on a gzipped/deflated page. BTW, why is this bug still marked as 'new' and why was #51852 closed and not this one?
Please see comment 10.
This bug is NEW because no one has started working on it yet. This is because it's not clear what we should do:
1) You want text files that are gzipped to get unzipped.
2) You want tar files that are gzipped to _not_ get unzipped.
3) What do you want with gzipped postscript files?
Oh, and servers send basically identical data in cases #1, #2, and #3... :)
What to do is simple: The layer which is responsible for receiving the HTTP response must unzip the content. Reasons:
- The browser requested the content as gzipped ("accept-encoding: gzip") (Maybe the accept-encoding is a std request header without brain)
- The user do not request it as compressed.
- The information that the response is compressed ("content-encoding: gzip") inside the http-header is lost after leaving the http-layer.
Another solution is to request a file not as compressed if the decompression cannot be done.
How it is implemented inside the renderer? How does it know the content is compressed: by checking the first bytes or by knowing the HTTP response?
> Maybe the accept-encoding is a std request header without brain

It is. We _always_ list the encodings we support.

> The user do not request it as compressed

Not true. When I request a .tar.gz file I _do_ request it as compressed. When I request a text file, I usually want it uncompressed. For postscript, I tend to want one or the other depending on the size of the file.

> How it is implemented inside the renderer?

The renderer asks the HTTP layer to always decompress everything before passing it on. We can't do that when saving files because servers send archive files with type and encoding headers such that they would be decompressed. _That_ is the crux of the problem, as I said. Servers make no distinction between files they send that should be decompressed before saving and files they send that should _not_ be decompressed before saving. This happens because Transfer-Encoding is unsupported in nearly all servers and browsers (including Mozilla) so servers have to use Content-Encoding to mean both content-encoding and transfer-encoding. Then we're left trying to guess which they meant...

> Another solution is to request a file not as compressed if the decompression
> cannot be done.

Hmm. That's a possibility, actually.... It doesn't help the case when we click on a link, then the user picks "save" from the dialog. At that point, the response is long since made... But it could help the "save as" case.

I'm glad we're having this discussion.... This thing is pretty broken as it stands. :)
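The "request it uncompressed when we are only going to save it" idea could look roughly like this in Python. This is an illustration only, not how Mozilla builds its requests; build_request and for_saving are hypothetical names, and no request is actually sent.

```python
from urllib.request import Request

# Header value Mozilla sends by default, as quoted earlier in this bug.
BROWSER_ENCODINGS = "gzip, deflate, compress;q=0.9"

URL = "http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt"


def build_request(url: str, for_saving: bool) -> Request:
    """Build a request object; ask for an unencoded body when the result will
    be written straight to disk ("Save Link As") instead of being rendered."""
    accept = "identity" if for_saving else BROWSER_ENCODINGS
    return Request(url, headers={"Accept-Encoding": accept})


view = build_request(URL, for_saving=False)
save = build_request(URL, for_saving=True)

# urllib stores header names capitalized as "Accept-encoding".
print(view.get_header("Accept-encoding"))
print(save.get_header("Accept-encoding"))
```

As noted in the comment above, this only helps when the client knows before the request is made that the result will be saved. It does nothing for the case where the user picks "save" from the dialog after the response has already arrived.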
> > Maybe the accept-encoding is a std request header without brain
>
> It is. We _always_ list the encodings we support.

I understand supporting if the response of the request is decoded. So accept-encoding is not static like the agent statement, it should be set per request.

> > The user do not request it as compressed
>
> Not true. When I request a .tar.gz file I _do_ request it as compressed. When
> I request a text file, I usually want it uncompressed. For postscript, I tend
> to want one or the other depending on the size of the file.

No. If the browser accepts the encoding as gzipped and the web server may compress it again (like a tar.gz.gz) and says in the response "content-encoding: gzip". And the http-engine must it unzip to save a tar.gz and not a tar.gz.gz file. Or the webserver says nothing about the content-encoding and the file is transferred as .tar.gz. In this case the response data is compressed but this doesn't matter in the view of the http-engine.

> > How it is implemented inside the renderer?
>
> The renderer asks the HTTP layer to always decompress everything before passing
> it on. We can't do that when saving files because servers send archive files
> with type and encoding headers such that they would be decompressed. _That_ is
> the crux of the problem, as I said. Servers make no distinction between files
> they send that should be decompressed before saving and files they send that
> should _not_ be decompressed before saving. This happens because
> Transfer-Encoding is unsupported in nearly all servers and browsers (including
> Mozilla) so servers have to use Content-Encoding to mean both content-encoding
> and transfer-encoding. Then we're left trying to guess which they meant...

The engine must not try to decompress a file, but it must decompress it in case of the content-encoding.
(A file .tar.gz is not marked with content-encoding: gzip, because a zip file is not zipped again)

> > Another solution is to request a file not as compressed if the decompression
> > cannot be done.
>
> Hmm. That's a possibility, actually.... It doesn't help the case when we click
> on a link, then the user picks "save" from the dialog. At that point, the
> response is long since made... But it could help the "save as" case.
>
> I'm glad we're having this discussion.... This thing is pretty broken as it
> stands. :)
> I understand supporting if the response of the request is decoded.

For transfer-encoding, sure. For content-encoding, I'm not convinced... Like I said, the problem is servers using content-encoding to mean both...

> And the http-engine must it unzip to save a tar.gz and not a tar.gz.gz file.

Too bad web servers send a .tar.gz _singly_ compressed and just claim it to be type application/tar, encoding gzip.

> (A file .tar.gz is not marked with content-encoding: gzip, because a zip file
> is not zipped again)

In an ideal world, sure. In practice, a .tar.gz is sent as either
content-type: application/tar
content-encoding: gzip
or
content-type: application/x-gzip
content-encoding: gzip
by Apache. Since a large fraction of the servers out there are Apache, especially for unix-oriented sites, we have to deal with these broken behaviors.
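Since servers send the same Content-Encoding: gzip header for both kinds of file, the client is left to guess. One possible guess, written as a Python sketch, is below. It is a hypothetical heuristic for illustration, not the fix that went into bug 69306, and the function name, type set and suffix list are all assumptions.

```python
# Assumptions for illustration: which types and suffixes mean "already a gzip file".
GZIP_CONTENT_TYPES = {"application/x-gzip", "application/gzip"}
GZIP_SUFFIXES = (".gz", ".tgz")


def should_decode_on_save(filename: str, content_type: str, content_encoding: str) -> bool:
    """Guess whether a gzip Content-Encoding should be undone before saving.

    Decode when the encoding looks like a transport detail (a .txt sent gzipped);
    leave the bytes alone when the file itself is meant to be a gzip archive.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        return False  # nothing we know how to undo
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in GZIP_CONTENT_TYPES:
        return False  # server says the content itself is gzip data
    if filename.lower().endswith(GZIP_SUFFIXES):
        return False  # user asked for a .gz/.tgz file, keep it compressed
    return True


# The file from this bug: text/plain sent with Content-Encoding: gzip.
print(should_decode_on_save("baldurs_gate_c.txt", "text/plain", "gzip"))
# The two ways a .tar.gz is described as being sent by Apache in the comment above.
print(should_decode_on_save("foo.tar.gz", "application/tar", "gzip"))
print(should_decode_on_save("foo.tar.gz", "application/x-gzip", "gzip"))
```

A rule like this still leaves the gzipped postscript question from earlier in this bug open: a file named foo.ps sent with Content-Encoding: gzip would be decoded, which may or may not be what the user wanted.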
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
Depends on: 69306
Fixed by checkin for bug 69306
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I can verify that the original testcase works with 2002-03-28-08 on Linux. VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard