Closed
Bug 121001
Opened 23 years ago
Closed 23 years ago
Mozilla gzip-compresses this file on download, but it opens fine directly in browser.
Categories
(Core Graveyard :: File Handling, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.2alpha
People
(Reporter: jesse.houwing, Assigned: bugs)
Details
Open the URL.
Right-click the top FAQ/walkthrough (the one of about 500 KB) and choose Save Link As.
Save it to your hard disk and open it in Notepad or something similar. The result is an
unreadable file.
Comment 1•23 years ago
Confirming with 2002-01-19-22 on Linux. For some reason Mozilla seems to
gzip-compress this file on download. I ran 'file' on the downloaded file and it
reported this:
baldurs_gate_c.txt: gzip compressed data, deflated, last modified: Thu Jan 1
01:00:00 1970, max compression, os: MS-DOS
I renamed the file to baldurs_gate.gz, ran gzip -d on it and then I could view
it. If I download it with 'wget', it is detected as text/plain and you can view it
right away:
--15:47:26-- http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt
=> `baldurs_gate_c.txt'
Connecting to db.gamefaqs.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 564,597 [text/plain]
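The check that 'file' performed above can be approximated in a few lines. This is only a minimal sketch, assuming the saved bytes are already in memory; the helper names looks_gzipped and recover_text are hypothetical and are not part of Mozilla or wget.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def recover_text(data: bytes) -> bytes:
    """Decompress data that was saved still gzip-encoded; pass plain data through."""
    return gzip.decompress(data) if looks_gzipped(data) else data
```

This mirrors the manual workaround of renaming the file and running gzip -d on it.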
Updated•23 years ago
Summary: Files opens fine in browser (only a few characters not showing up) bus save as is garbled → Mozilla gzip-compresses this file on download, but it opens fine directly in browser.
Comment 2•23 years ago
Probably a dupe of bug 107991.
Comment 3•23 years ago
This sounds a lot like bug 51852, which I notice got marked WFM a few days ago,
but it's got quite a number of dupes....
Andre, when you tested with wget did you send Accept Encoding header? Mozilla
sends |gzip, deflate, compress;q=0.9| for that header value, but wget does not.
OS: Windows 2000 → Linux
Comment 4•23 years ago
Christopher, I just tried with wget --header="Accept Encoding: gzip, deflate,
compress;q=0.9" and got this out of wget:
Invalid specification `Accept Encoding: gzip, deflate, compress;q=0.9'.
What is the right parameter to send?
Comment 5•23 years ago
Hmm, I guess Accept-Encoding is the right one. Here's what I got when I used that:
--16:47:59-- http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt
=> `baldurs_gate_c.txt'
Connecting to db.gamefaqs.com:80... connected!
HTTP request sent, awaiting response... 200 OK
2 Server: Microsoft-IIS/5.0
3 Connection: keep-alive
4 Date: Sun, 20 Jan 2002 15:38:28 GMT
5 Content-Type: text/plain
6 Accept-Ranges: bytes
7 Last-Modified: Tue, 03 Jul 2001 18:50:21 GMT
8 ETag: "aac9c22f13c11:802"
9 Content-Length: 564597
10
0K .......... .......... .......... .......... .......... 9% @ 45.25 KB/s
50K .......... .......... .......... .......... .......... 18% @ 59.59 KB/s
100K .......... .......... .......... .......... .......... 27% @ 57.94 KB/s
150K .......... .......... .......... .......... .......... 36% @ 59.67 KB/s
200K .......... .......... .......... .......... .......... 45% @ 59.59 KB/s
250K .......... .......... .......... .......... .......... 54% @ 58.00 KB/s
300K .......... .......... .......... .......... .......... 63% @ 59.52 KB/s
350K .......... .......... .......... .......... .......... 72% @ 57.08 KB/s
400K .......... .......... .......... .......... .......... 81% @ 60.46 KB/s
450K .......... .......... .......... .......... .......... 90% @ 59.59 KB/s
500K .......... .......... .......... .......... .......... 99% @ 57.74 KB/s
550K . 100% @ 1.33 MB/s
16:48:09 (57.47 KB/s) - `baldurs_gate_c.txt' saved [564597/564597]
The saved file was in plain text format.
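For reference, the reason "Accept Encoding" was rejected while "Accept-Encoding" was accepted is that an HTTP header name must be a single token with no spaces. The sketch below checks a "Name: value" specification against the token grammar from RFC 7230, section 3.2.6. It is an illustration under that assumption; wget's own validation may differ, and the function names are hypothetical.

```python
import re
from typing import Tuple

# RFC 7230 "token": one or more tchar; spaces and colons are not allowed.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_valid_header_name(name: str) -> bool:
    """Return True if name is a syntactically valid HTTP header field name."""
    return _TOKEN.fullmatch(name) is not None


def split_header(spec: str) -> Tuple[str, str]:
    """Split 'Name: value' into its parts, rejecting malformed names."""
    name, sep, value = spec.partition(":")
    if not sep or not is_valid_header_name(name):
        raise ValueError(f"invalid header specification: {spec!r}")
    return name, value.strip()
```

With this check, "Accept-Encoding: gzip, deflate, compress;q=0.9" splits cleanly, while the variant with a space in the name raises ValueError.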
Reporter | ||
Comment 6•23 years ago
I just tried with telnet, and the file is sent gzip-compressed over the line. If
you set content-encoding in wget it should decode on save (and so it does);
telnet won't do this for you, so it's a good alternative.
Comment 7•23 years ago
The server sends us gzip'd content as seen with NSPR_LOG_MODULES=nsHttp:5
Here's a partial log of the transfers of that file.
1024[8086de8]: http request [
1024[8086de8]: GET /computer/doswin/file/baldurs_gate_c.txt HTTP/1.1
1024[8086de8]: Host: db.gamefaqs.com
1024[8086de8]: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20020119
1024[8086de8]: Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
1024[8086de8]: Accept-Language: en-us
1024[8086de8]: Accept-Encoding: gzip, deflate, compress;q=0.9
1024[8086de8]: Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
1024[8086de8]: Keep-Alive: 300
1024[8086de8]: Connection: keep-alive
1024[8086de8]: Cache-Control: max-age=0
1024[8086de8]: ]
===================
1026[8128880]: http response [
1026[8128880]: HTTP/1.1 200 OK
1026[8128880]: Server: Microsoft-IIS/5.0
1026[8128880]: Content-Encoding: gzip
1026[8128880]: Date: Sun, 20 Jan 2002 15:51:37 GMT
1026[8128880]: Content-Type: text/plain
1026[8128880]: Accept-Ranges: bytes
1026[8128880]: Last-Modified: Tue, 03 Jul 2001 18:50:19 GMT
1026[8128880]: Etag: "80277c1f13c11:802"
1026[8128880]: Content-Length: 163389
1026[8128880]: Expires: Wed, 01 Jan 1997 12:00:00 GMT
1026[8128880]: Cache-Control: max-age=86400
1026[8128880]: Vary: Accept-Encoding
1026[8128880]: ]
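To read the relevant facts out of a log like this one, a small parser is enough. The following is a rough sketch that assumes each line carries a "thread[address]: " prefix as in the excerpt above; parse_logged_headers is a hypothetical helper, not part of NSPR logging.

```python
def parse_logged_headers(log: str) -> dict:
    """Collect 'Name: value' pairs from nsHttp-style log lines."""
    headers = {}
    for line in log.splitlines():
        # Drop the "1026[8128880]: " thread prefix if present.
        _, sep, rest = line.partition("]: ")
        text = rest if sep else line
        name, colon, value = text.partition(": ")
        # Skip the status line and the "[" / "]" bracket lines.
        if colon and " " not in name:
            headers[name.lower()] = value.strip()
    return headers


LOG = """\
1026[8128880]: HTTP/1.1 200 OK
1026[8128880]: Content-Encoding: gzip
1026[8128880]: Content-Type: text/plain
1026[8128880]: Content-Length: 163389
"""

headers = parse_logged_headers(LOG)
compressed = int(headers["content-length"])
ratio = compressed / 564597  # uncompressed size reported by wget in comment 5
print(headers["content-encoding"], f"{ratio:.0%}")
```

The compressed body is a bit under a third of the 564,597-byte original, which matches a text/plain file being sent with Content-Encoding: gzip.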
Comment 8•23 years ago
Based on that log, I think this probably *is* a dupe of bug 51852 in which case
it needs to be re-opened.
OS: Linux → All
Comment 9•23 years ago
See bug 87449 and bug 108688 for the same problem on gamefaqs.
Dupe.
*** This bug has been marked as a duplicate of 51852 ***
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Comment 10•23 years ago
Please see my comments in bug 51852. This is _not_ a duplicate, but rather a
new bug that's in a totally different area of the code and needs to be dealt
with by a totally different person.
Comment 11•23 years ago
Reopening per bz's comments
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Comment 13•23 years ago
Note that per the HTTP/1.1 RFC it's not clear whether content-encoding should or
should not be decoded if the data is just being saved...
Component: Networking: HTTP → File Handling
Comment 14•23 years ago
WFM using 2002011503 Win2k. Maybe a Linux-only bug?
Reporter | ||
Comment 15•23 years ago
Does not WFM 2002011604 win2k
Comment 16•23 years ago
*** Bug 96490 has been marked as a duplicate of this bug. ***
Comment 17•23 years ago
confirming, based on bug 96490...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 18•23 years ago
Confirmed on 2002012203 WinXP
Also note that bug #51852 (closed at the moment) has 10 votes. This vote has
replaced #51852.
Please look into this and set it to some near future milestone because this
behavior is EXTREMELY annoying. It also happens when you use File->Save Page
As... on a Gzipped/deflated page.
BTW why is this bug still marked as 'new' and why was #51852 closed and not this
one?
Reporter | ||
Comment 19•23 years ago
Please see comment 10.
Comment 20•23 years ago
This bug is NEW because no one has started working on it yet. This is because
it's not clear what we should do:
1) You want text files that are gzipped to get unzipped.
2) You want tar files that are gzipped to _not_ get unzipped
3) What do you want with gzipped postscript files?
Oh, and servers send basically identical data in cases #1, #2, and #3... :)
Comment 21•23 years ago
What to do is simple:
The layer which is responsible for receiving the HTTP-Response must
unzip the content.
Reasons:
- The browser requested the content as gzipped ("accept-encoding: gzip")
(Maybe the accept-encoding is a std request header without brain)
- The user do not request it as compressed.
- The information that the response is compressed ("content-encoding: gzip")
inside the http-header is lost after leaving the http-layer.
Another solution is to request a file not as compressed if the decompression
cannot be done.
How it is implemented inside the renderer? How does it know the data is compressed:
by checking the first bytes or from the HTTP response?
Comment 22•23 years ago
> Maybe the accept-encoding is a std request header without brain
It is. We _always_ list the encodings we support.
> The user do not request it as compressed
Not true. When I request a .tar.gz file I _do_ request it as compressed. When
I request a text file, I usually want it uncompressed. For postscript, I tend
to want one or the other depending on the size of the file.
> How it is implemented inside the renderer?
The renderer asks the HTTP layer to always decompress everything before passing
it on. We can't do that when saving files because servers send archive files
with type and encoding headers such that they would be decompressed. _That_ is
the crux of the problem, as I said. Servers make no distinction between files
they send that should be decompressed before saving and files they send that
should _not_ be decompressed before saving. This happens because
Transfer-Encoding is unsupported in nearly all servers and browsers (including
Mozilla) so servers have to use Content-Encoding to mean both content-encoding
and transfer-encoding. Then we're left trying to guess which they meant...
> Another solution is to request a file not as compressed if the decompression
> cannot be done.
Hmm. That's a possibility, actually.... It doesn't help the case when we click
on a link, then the user picks "save" from the dialog. At that point, the
response is long since made... But it could help the "save as" case.
I'm glad we're having this discussion.... This thing is pretty broken as it
stands. :)
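The "request it uncompressed" idea for the explicit "save as" path could look roughly like the sketch below. It is an illustration only, not how Mozilla builds its requests; headers_for_request and its for_save flag are hypothetical names. "identity" is the standard content-coding meaning "no encoding".

```python
BROWSING_ENCODINGS = "gzip, deflate, compress;q=0.9"  # value quoted in comment 3


def headers_for_request(host: str, for_save: bool) -> dict:
    """Build request headers; ask for an unencoded body when saving to disk."""
    headers = {"Host": host}
    # When the bytes go straight to a file nothing will decode them,
    # so only advertise the identity (no-op) coding in that case.
    headers["Accept-Encoding"] = "identity" if for_save else BROWSING_ENCODINGS
    return headers
```

As noted above, this only covers requests that are known to be saves before they are sent. It does nothing for a response that has already arrived when the user picks "save" from the dialog.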
Comment 23•23 years ago
> > Maybe the accept-encoding is a std request header without brain
>
> It is. We _always_ list the encodings we support.
I understand supporting if the response of the request is decoded.
So accept-encoding should not be static like the user-agent string; it should
be set per request.
>
> > The user do not request it as compressed
>
> Not true. When I request a .tar.gz file I _do_ request it as compressed. When
> I request a text file, I usually want it uncompressed. For postscript, I tend
> to want one or the other depending on the size of the file.
No. If the browser accepts gzip encoding, the web server may compress the file
again (producing something like a tar.gz.gz) and say "content-encoding: gzip"
in the response. And the http-engine must unzip it to save a tar.gz and not a
tar.gz.gz file. Alternatively, the web server says nothing about the
content-encoding and the file is transferred as .tar.gz. In that case the
response data is compressed, but this doesn't matter from the point of view
of the http-engine.
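The model argued for here, where the server layers a gzip content-encoding on top of a file that is already a .tar.gz, can be illustrated as follows. This is only a sketch of that hypothetical server behaviour with placeholder bytes; whether real servers behave this way is disputed in the next comment.

```python
import gzip

tar_bytes = b"pretend this is a tar archive" * 10
tar_gz = gzip.compress(tar_bytes)      # the .tar.gz file stored on the server
on_the_wire = gzip.compress(tar_gz)    # server adds Content-Encoding: gzip on top

# Undoing the content-encoding exactly once yields the stored .tar.gz ...
saved = gzip.decompress(on_the_wire)
assert saved == tar_gz
# ... which is still a gzip stream, so the user gets the archive they asked for.
assert saved[:2] == b"\x1f\x8b"
```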
>
> > How it is implemented inside the renderer?
>
> The renderer asks the HTTP layer to always decompress everything before passing
> it on. We can't do that when saving files because servers send archive files
> with type and encoding headers such that they would be decompressed. _That_ is
> the crux of the problem, as I said. Servers make no distinction between files
> they send that should be decompressed before saving and files they send that
> should _not_ be decompressed before saving. This happens because
> Transfer-Encoding is unsupported in nearly all servers and browsers (including
> Mozilla) so servers have to use Content-Encoding to mean both content-encoding
> and transfer-encoding. Then we're left trying to guess which they meant...
The engine must not try to decompress the file itself, but it must decode
it when a content-encoding is declared. (A file .tar.gz is not marked with
content-encoding: gzip, because a zip file is not zipped again)
>
> > Another solution is to request a file not as compressed if the decompression
> > cannot be done.
>
> Hmm. That's a possibility, actually.... It doesn't help the case when we click
> on a link, then the user picks "save" from the dialog. At that point, the
> response is long since made... But it could help the "save as" case.
>
> I'm glad we're having this discussion.... This thing is pretty broken as it
> stands. :)
Comment 24•23 years ago
> I understand supporting if the response of the request is decoded.
For transfer-encoding, sure. For content-encoding, I'm not convinced... Like I
said, the problem is servers using content-encoding to mean both...
> And the http-engine must unzip it to save a tar.gz and not a tar.gz.gz file.
Too bad web servers send a .tar.gz _singly_ compressed and just claim it to be
type application/tar, encoding gzip.
> (A file .tar.gz is not marked with content-encoding: gzip, because a zip file
> is not zipped again)
In an ideal world, sure. In practice, a .tar.gz is sent as either
content-type: application/tar
content-encoding: gzip
or
content-type: application/x-gzip
content-encoding: gzip
by Apache. Since a large fraction of the servers out there are Apache,
especially for unix-oriented sites, we have to deal with these broken behaviors.
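One way to picture the guessing involved is a heuristic that decides, from the headers and the file name, whether to undo Content-Encoding before saving. This is a hypothetical sketch for illustration and not the fix that eventually landed; the function name, the extension list and the type list are all assumptions.

```python
GZIP_EXTENSIONS = (".gz", ".tgz")
GZIP_TYPES = {"application/x-gzip", "application/gzip"}


def should_decode_on_save(filename: str, content_type: str, content_encoding: str) -> bool:
    """Guess whether a gzip Content-Encoding is transport-only and should be undone."""
    if content_encoding.lower() not in ("gzip", "x-gzip"):
        return False  # nothing to undo
    if content_type.lower() in GZIP_TYPES:
        return False  # the gzip layer is the file itself
    if filename.lower().endswith(GZIP_EXTENSIONS):
        return False  # the user asked for a compressed file by name
    return True  # e.g. text/plain compressed only for the transfer
```

Under these assumptions baldurs_gate_c.txt sent as text/plain with gzip encoding is decoded, while a .tar.gz sent with either of the header combinations above is saved as received. Gzipped postscript stays a judgment call: the file name is the only hint.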
Assignee | ||
Updated•23 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
Comment 25•23 years ago
Fixed by checkin for bug 69306
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 26•23 years ago
I can verify that the original testcase works with 2002-03-28-08 on Linux.
VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Updated•8 years ago
Product: Core → Core Graveyard