Closed Bug 121001 Opened 23 years ago Closed 23 years ago

Mozilla gzip-compresses this file on download, but it opens fine directly in browser.

Categories

(Core Graveyard :: File Handling, defect)

x86
All
defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: jesse.houwing, Assigned: bugs)

Details

Open the url.

Right-click the top FAQ/walkthrough (the one of about 500 KB) and choose "Save Link As".

Save to your hard disk and open in Notepad or something similar. The result is an
unreadable file.
Confirming with 2002-01-19-22 on Linux. For some reason mozilla seems to gzip
compress this file on download. I ran 'file' on the downloaded file and it
reported this:

baldurs_gate_c.txt: gzip compressed data, deflated, last modified: Thu Jan  1 01:00:00 1970, max compression, os: MS-DOS

I renamed the file to baldurs_gate.gz, ran gzip -d on it and then I could view
it. If I download it with 'wget' it detects it as text/plain and you can view it
right away:

--15:47:26--  http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt
           => `baldurs_gate_c.txt'
Connecting to db.gamefaqs.com:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 564,597 [text/plain]
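
The check described above (running 'file' on the download, then 'gzip -d') can be approximated in a few lines. This is a minimal sketch, assuming Python; the name `maybe_gunzip` is hypothetical, and it only inspects the two-byte gzip magic number rather than doing everything 'file' does.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed data if it looks like gzip, else return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data

# Simulate a saved file that is secretly gzip data despite its .txt name.
saved = gzip.compress(b"Baldur's Gate walkthrough\n")
print(maybe_gunzip(saved))
```

A file that does not start with the magic bytes is passed through untouched, so the helper is safe to run on a download that was saved correctly.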
Summary: Files opens fine in browser (only a few characters not showing up) bus save as is garbled → Mozilla gzip-compresses this file on download, but it opens fine directly in browser.
Probably a dupe of bug 107991.
This sounds a lot like bug 51852 which I notice got marked WFM a few days ago,
but it's got quite a number of dupes....

Andre, when you tested with wget did you send Accept Encoding header?  Mozilla
sends |gzip, deflate, compress;q=0.9| for that header value, but wget does not. 
OS: Windows 2000 → Linux
Christopher, I just tried with wget --header="Accept Encoding: gzip, deflate,
compress;q=0.9" and got this out of wget:

Invalid specification `Accept Encoding: gzip, deflate, compress;q=0.9'.

What is the right parameter to send?
Hmm, I guess Accept-Encoding is the right one. Here's what I got when I used that:

--16:47:59--  http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt
           => `baldurs_gate_c.txt'
Connecting to db.gamefaqs.com:80... connected!
HTTP request sent, awaiting response... 200 OK
2 Server: Microsoft-IIS/5.0
3 Connection: keep-alive
4 Date: Sun, 20 Jan 2002 15:38:28 GMT
5 Content-Type: text/plain
6 Accept-Ranges: bytes
7 Last-Modified: Tue, 03 Jul 2001 18:50:21 GMT
8 ETag: "aac9c22f13c11:802"
9 Content-Length: 564597
10 

    0K .......... .......... .......... .......... ..........  9% @  45.25 KB/s
   50K .......... .......... .......... .......... .......... 18% @  59.59 KB/s
  100K .......... .......... .......... .......... .......... 27% @  57.94 KB/s
  150K .......... .......... .......... .......... .......... 36% @  59.67 KB/s
  200K .......... .......... .......... .......... .......... 45% @  59.59 KB/s
  250K .......... .......... .......... .......... .......... 54% @  58.00 KB/s
  300K .......... .......... .......... .......... .......... 63% @  59.52 KB/s
  350K .......... .......... .......... .......... .......... 72% @  57.08 KB/s
  400K .......... .......... .......... .......... .......... 81% @  60.46 KB/s
  450K .......... .......... .......... .......... .......... 90% @  59.59 KB/s
  500K .......... .......... .......... .......... .......... 99% @  57.74 KB/s
  550K .                                                     100% @   1.33 MB/s

16:48:09 (57.47 KB/s) - `baldurs_gate_c.txt' saved [564597/564597]

The saved file was in plain text format.
I just tried with telnet, and it is sent gzip-compressed over the line. If you set
content-encoding in wget it should decode on save (and so it does). Telnet won't
do this for you, so it's a good alternative for seeing what is actually sent.
The server sends us gzip'd content as seen with NSPR_LOG_MODULES=nsHttp:5

Here's a partial log of the transfers of that file.

1024[8086de8]: http request [
1024[8086de8]:   GET /computer/doswin/file/baldurs_gate_c.txt HTTP/1.1
1024[8086de8]:   Host: db.gamefaqs.com
1024[8086de8]:   User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7+) Gecko/20020119
1024[8086de8]:   Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,text/css,*/*;q=0.1
1024[8086de8]:   Accept-Language: en-us
1024[8086de8]:   Accept-Encoding: gzip, deflate, compress;q=0.9
1024[8086de8]:   Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
1024[8086de8]:   Keep-Alive: 300
1024[8086de8]:   Connection: keep-alive
1024[8086de8]:   Cache-Control: max-age=0
1024[8086de8]: ]

===================

1026[8128880]: http response [
1026[8128880]:   HTTP/1.1 200 OK
1026[8128880]:   Server: Microsoft-IIS/5.0
1026[8128880]:   Content-Encoding: gzip
1026[8128880]:   Date: Sun, 20 Jan 2002 15:51:37 GMT
1026[8128880]:   Content-Type: text/plain
1026[8128880]:   Accept-Ranges: bytes
1026[8128880]:   Last-Modified: Tue, 03 Jul 2001 18:50:19 GMT
1026[8128880]:   Etag: "80277c1f13c11:802"
1026[8128880]:   Content-Length: 163389
1026[8128880]:   Expires: Wed, 01 Jan 1997 12:00:00 GMT
1026[8128880]:   Cache-Control: max-age=86400
1026[8128880]:   Vary: Accept-Encoding
1026[8128880]: ]
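
The log shows the deciding detail: the response carries Content-Encoding: gzip with a Content-Length of 163389, against 564597 bytes for the unencoded file fetched by wget. A minimal sketch of how a client could undo the encoding from the headers alone, assuming Python; `decode_body` is a hypothetical helper, not Mozilla's code, and it handles only gzip.

```python
import gzip

def decode_body(headers: dict, body: bytes) -> bytes:
    """Undo a gzip Content-Encoding; pass through any other body unchanged."""
    # Header names are case-insensitive in HTTP, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    encoding = normalized.get("content-encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    return body

headers = {"Content-Type": "text/plain", "Content-Encoding": "gzip"}
body = gzip.compress(b"plain text FAQ")
print(decode_body(headers, body))
```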
Based on that log, I think this probably *is* a dupe of bug 51852 in which case
it needs to be re-opened.
OS: Linux → All
See bug 87449 and bug 108688 for the same problem on gamefaqs.

Dupe.

*** This bug has been marked as a duplicate of 51852 ***
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
Please see my comments in bug 51852.  This is _not_ a duplicate, but rather a
new bug that's in a totally different area of the code and needs to be dealt
with by a totally different person.
Reopening per bz's comments
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
->ben
Assignee: darin → ben
Note that per the HTTP/1.1 RFC it's not clear whether content-encoding should or
should not be decoded if the data is just being saved...
Component: Networking: HTTP → File Handling
WFM using 2002011503 Win2k. Maybe a Linux-only bug?
Does not WFM 2002011604 win2k
*** Bug 96490 has been marked as a duplicate of this bug. ***
confirming, based on bug 96490...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Confirmed on 2002012203 WinXP

Also note that bug #51852 (closed at the moment) has 10 votes. This bug has
replaced #51852.

Please look into this and set it to some near future milestone because this
behavior is EXTREMELY annoying. It also happens when you use File->Save Page
As... on a Gzipped/deflated page.

BTW why is this bug still marked as 'new' and why was #51852 closed and not this
one?
Please see comment 10.
This bug is NEW because no one has started working on it yet.  This is because
it's not clear what we should do:

1)  You want text files that are gzipped to get unzipped.
2)  You want tar files that are gzipped to _not_ get unzipped
3)  What do you want with gzipped postscript files?

Oh, and servers send basically identical data in cases #1, #2, and #3... :)
What to do is simple:

The layer which is responsible for receiving the HTTP-Response must
unzip the content.
Reasons:
 - The browser requested the content as gzipped ("accept-encoding: gzip")
   (Maybe the accept-encoding is a std request header without brain)
 - The user does not request it as compressed.
 - The information that the response is compressed ("content-encoding: gzip")
   inside the http-header is lost after leaving the http-layer.

Another solution is to request a file not as compressed if the decompression 
cannot be done.

How is it implemented inside the renderer? How does it know the content is
compressed: by checking the first bytes or by knowing the HTTP response?
> Maybe the accept-encoding is a std request header without brain

It is.  We _always_ list the encodings we support.

> The user does not request it as compressed

Not true.  When I request a .tar.gz file I _do_ request it as compressed.  When
I request a text file, I usually want it uncompressed.  For postscript, I tend
to want one or the other depending on the size of the file.

> How is it implemented inside the renderer?

The renderer asks the HTTP layer to always decompress everything before passing
it on.  We can't do that when saving files because servers send archive files
with type and encoding headers such that they would be decompressed. _That_ is
the crux of the problem, as I said.  Servers make no distinction between files
they send that should be decompressed before saving and files they send that
should _not_ be decompressed before saving.  This happens because
Transfer-Encoding is unsupported in nearly all servers and browsers (including
Mozilla) so servers have to use Content-Encoding to mean both content-encoding
and transfer-encoding.  Then we're left trying to guess which they meant...

> Another solution is to request a file not as compressed if the decompression 
> cannot be done.

Hmm.  That's a possibility, actually....  It doesn't help the case when we click
on a link, then the user picks "save" from the dialog.  At that point, the
response is long since made...  But it could help the "save as" case.

I'm glad we're having this discussion....  This thing is pretty broken as it
stands.  :)
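
The "save as" idea above, asking the server not to compress when the client does not intend to decompress, could look like this. A sketch assuming Python's urllib.request; `build_save_request` is a hypothetical name, and the request is only constructed here, not sent.

```python
import urllib.request

def build_save_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server for an unencoded body."""
    # "identity" means "no content-coding"; a server honoring it sends raw bytes.
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})

req = build_save_request(
    "http://db.gamefaqs.com/computer/doswin/file/baldurs_gate_c.txt"
)
# urllib stores header names capitalized as "Accept-encoding".
print(req.get_header("Accept-encoding"))
```

As noted above, this only helps when the save is known before the request goes out; it does nothing for a response that has already arrived encoded.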


> > Maybe the accept-encoding is a std request header without brain
> 
> It is.  We _always_ list the encodings we support.

I understand "supporting" to mean that the response to the request gets decoded.
So accept-encoding is not static like the user-agent string; it should
be set per request.

> 
> > The user does not request it as compressed
> 
> Not true.  When I request a .tar.gz file I _do_ request it as compressed.  When
> I request a text file, I usually want it uncompressed.  For postscript, I tend
> to want one or the other depending on the size of the file.

No. If the browser accepts gzip encoding, the web server may compress the
file again (like a tar.gz.gz) and say "content-encoding: gzip" in the
response. Then the http-engine must unzip it to save a tar.gz and not a
tar.gz.gz file. Or the web server says nothing about the content-encoding
and the file is transferred as .tar.gz. In this case the response data is
compressed, but this doesn't matter from the point of view of the
http-engine.

> 
> > How is it implemented inside the renderer?
> 
> The renderer asks the HTTP layer to always decompress everything before passing
> it on.  We can't do that when saving files because servers send archive files
> with type and encoding headers such that they would be decompressed. _That_ is
> the crux of the problem, as I said.  Servers make no distinction between files
> they send that should be decompressed before saving and files they send that
> should _not_ be decompressed before saving.  This happens because
> Transfer-Encoding is unsupported in nearly all servers and browsers (including
> Mozilla) so servers have to use Content-Encoding to mean both content-encoding
> and transfer-encoding.  Then we're left trying to guess which they meant...


The engine must not try to decompress a file, but it must decompress
it in case of the content-encoding. (A file .tar.gz is not marked with
content-encoding: gzip, because a zip file is not zipped again)

> 
> > Another solution is to request a file not as compressed if the decompression
> > cannot be done.
> 
> Hmm.  That's a possibility, actually....  It doesn't help the case when we click
> on a link, then the user picks "save" from the dialog.  At that point, the
> response is long since made...  But it could help the "save as" case.
> 
> I'm glad we're having this discussion....  This thing is pretty broken as it
> stands.  :)
> 
> I understand supporting if the response of the request is decoded.

For transfer-encoding, sure.  For content-encoding, I'm not convinced...  Like I
said, the problem is servers using content-encoding to mean both...

> And the http-engine must it unzip to save a tar.gz and not a tar.gz.gz file.

Too bad web servers send a .tar.gz _singly_ compressed and just claim it to be
type application/tar, encoding gzip.

> (A file .tar.gz is not marked with content-encoding: gzip, because a zip file
> is not zipped again)

In an ideal world, sure.  In practice, a .tar.gz is sent as either 

content-type: application/tar
content-encoding: gzip

or

content-type: application/x-gzip
content-encoding: gzip

by Apache.  Since a large fraction of the servers out there are Apache,
especially for unix-oriented sites, we have to deal with these broken behaviors.
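
Given headers like these, a client saving to disk has to guess. One possible heuristic, sketched in Python under loud assumptions: `should_decode_on_save` and the lists of suffixes and types are hypothetical, and this is not claimed to be what the fix for bug 69306 does. The idea is to keep the encoding when the file name or content type already says the file is gzip data, and to decode otherwise.

```python
GZIP_SUFFIXES = (".gz", ".tgz")
GZIP_TYPES = ("application/x-gzip", "application/gzip")

def should_decode_on_save(filename: str, content_type: str, content_encoding: str) -> bool:
    """Guess whether a gzip Content-Encoding should be undone before saving."""
    if content_encoding.strip().lower() not in ("gzip", "x-gzip"):
        return False  # nothing to decode
    if filename.lower().endswith(GZIP_SUFFIXES):
        return False  # name promises gzip data: keep it compressed
    if content_type.split(";")[0].strip().lower() in GZIP_TYPES:
        return False  # type says the entity itself is a gzip file
    return True

print(should_decode_on_save("baldurs_gate_c.txt", "text/plain", "gzip"))
print(should_decode_on_save("source.tar.gz", "application/tar", "gzip"))
```

Under this sketch the text file from this bug is decoded, while a .tar.gz sent as application/tar with encoding gzip is saved as received. A gzipped postscript file would follow whatever its file name says, which is only one of several defensible answers.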
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
Depends on: 69306
Fixed by checkin for bug 69306
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I can verify that the original testcase works with 2002-03-28-08 on Linux.

VERIFIED FIXED.
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard