Closed Bug 35956 Opened 24 years ago Closed 23 years ago

File extension not changed but gzipped files expanded when saving

Categories

(Core :: Networking: HTTP, defect, P2)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.2

People

(Reporter: myk, Assigned: darin.moz)

References

()

Details

(Keywords: regression, Whiteboard: [PDTP2] relnote-user)

Attachments

(5 files)

Overview Description:

Mozilla now automatically decompresses a file in gzip format (.gz extension)
when downloading it, but it doesn't change the file extension.  This is
misleading: I think the file is still a gzipped file and unsuccessfully try to
gunzip it when I should be untarring it.  It probably also causes problems in
graphical file managers that base a file's type on its extension.

Steps to Reproduce:

1) go to the url and save the file when mozilla asks you what to do with it.
2) after the file has finished downloading, browse to its location in your
filesystem

Actual Results: file is named cervisia-0.6.0.tar.gz

Expected Results: file is named cervisia-0.6.0.tar

Build Date & Platform Bug Found: Linux 2000-04-14-09

Additional Builds and Platforms Tested On: none
Shouldn't it also ask before decompressing?  If you click on a link to a .gz
file, you probably want to get a .gz file.

Somewhat related is bug 31519, "Save as: should add extension to match content 
type".
->law
Assignee: gagan → law
Target Milestone: --- → M18
*** Bug 39964 has been marked as a duplicate of this bug. ***
I really think the behaviour we want here is to keep the .gz extension, but save
the file as gzipped data. We only want to uncompress if we're going to view the
file internally (how are we detecting this anyway, *.txt.gz?). It's wrong to have
mozilla silently gunzipping all downloads. It does nothing but waste drive space,
as most applications do this at runtime (vi, less, whatever else you'd be
viewing gzipped ascii with...) and for a .gz of non-ascii, it's even more useless.

since test case involves two linked files, i put it at:
http://turbogeek.org/mozilla/gzip.html
Move to M21 target milestone.
Target Milestone: M18 → M21
What Mr. Dolan said, except twice.  Mozilla has no business decompressing files
just because I download them.
*** Bug 42019 has been marked as a duplicate of this bug. ***
Upping severity. This is freaking huge folks. If I try to download a mozilla
build with mozilla, instead of 4.7, after it downloads, I get to wait while
mozilla's sloooooow decompression process sucks 100% CPU for longer than it
actually took to download. Then, to make matters worse, mozilla leaks a chunk
the size of the uncompressed version of the file (25-30M, in the case of a
mozilla nightly), occasionally crashing the whole thing, as it tries to swap in
chrome or something. Marking minor->major, perf, crash, mlk. Please reconsider
targeting. This really should be a M16 blocker.
Severity: minor → major
Keywords: crash, mlk, perf
It looks like 39241 may have been another report of this crash. Looks like
someone reported it while downloading quakeforge (fairly large), then it ended
up as WORKSFORME, as the testcase was a much smaller file, not causing enough of
a memory leak. Upping severity again, as this confirms my crash reporting.
Severity: major → critical
I'm reassigning this to the Networking component.  I've recently been dealing 
with some more bugs very similar to this one.  If there is a way to open an
input stream in such a way to avoid the decompressing, then please fill me in 
and I can fix it in nsStreamXferOp.cpp.
Assignee: law → gagan
Actually, I just went back and re-opened Bug 39241.  It is a reliable crasher
with a testcase.  The bug "went away", without an explicit fix, so I marked it
WORKSFORME.  We need to get stack traces to determine where this is crashing, so
we can sort out what is causing the problem.
nsbeta2 radar
Keywords: nsbeta2
Putting on [nsbeta2+] radar for beta2 fix. 
Whiteboard: [nsbeta2+]
Bug 33808 is one of the similar bugs.
After talking to mscott, I think this bug is best resolved in the URI loader 
area. However for now I am adding a call in nsIHTTPChannel to allow doing the 
conversion inside of HTTP. 

law: after I finish adding that you'd need to QI the channel to nsIHTTPChannel
and then set doConversion to false, before you start reading anything off of it. 
Status: NEW → ASSIGNED
*** Bug 33808 has been marked as a duplicate of this bug. ***
I just finished implementing this. The correct call is 
SetApplyConversion(PR_FALSE). After I check it in tonight I will reassign this 
back to you law. 
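(For reference, the caller-side pattern this implies, as a minimal sketch assuming
the nsIHTTPChannel interface just described; error handling is omitted and the
real call sites appear in the patches further down:)

    // Before reading anything off the channel, ask HTTP not to undo the
    // Content-Encoding (e.g. gzip) for this load.
    nsCOMPtr<nsIHTTPChannel> httpChannel = do_QueryInterface(aChannel);
    if (httpChannel)
        httpChannel->SetApplyConversion(PR_FALSE);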
Adding mostfreq as a mostfreq bug was marked a dupe of this one.

Gerv
Keywords: mostfreq
I would like to vote YES to decompressing data if an (external) viewer for the
decompressed version is defined. Or at least ask the user.

I make use of this feature of netscape 4 all the time. Compressing all my files 
and letting netscape do the right thing when I view them - it decompresses them 
and then opens their defined viewer.

If it's a save as, then it should not decompress.

So, if it asks "save it, or view it": if I pick save, it's saved unchanged; if I
pick view, then it's decompressed before the viewer gets it.

I guess some people would want the viewer to get the compressed data - perhaps a 
configuration in the viewer definition?
I don't think anyone is arguing to not un-gzip stuff if we're going to view it.
Since the gzip code has to be in mozilla for HTTP compression, might as well
support viewing normal .gz files. Rewording summary.
Summary: file extension not changed when gzipped (.gz) files expanded on download → File extension not changed but gzipped files expanded when saving
checked in. law take it away...
Assignee: gagan → law
Status: ASSIGNED → NEW
I've got this patch; Gagan is reviewing...

Index: nsStreamTransfer.cpp
===================================================================
RCS file: /cvsroot/mozilla/xpfe/components/xfer/src/nsStreamTransfer.cpp,v
retrieving revision 1.19
diff -u -r1.19 nsStreamTransfer.cpp
--- nsStreamTransfer.cpp        2000/06/09 00:49:40     1.19
+++ nsStreamTransfer.cpp        2000/06/24 00:57:53
@@ -131,6 +131,13 @@
          NS_SUCCEEDED( outputFile->IsValid( &isValid ) )
          &&
          isValid ) {
+        // Try to get HTTP channel.
+        nsCOMPtr<nsIHTTPChannel> httpChannel = do_QueryInterface( aChannel );
+        if ( httpChannel ) {
+            // Turn off content encoding conversions.
+            httpChannel->SetApplyConversion( PR_FALSE );
+        }
+
         // Construct stream transfer operation to be given to dialog.
         nsStreamXferOp *p= new nsStreamXferOp( aChannel, outputFile );
Status: NEW → ASSIGNED
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
verified:
Linux 2000062808
Status: RESOLVED → VERIFIED
I'm seeing this bug again.  Both with the URL above and other tar.gz files.

WinNT buildID 2000081504
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
This has regressed as of at least Linux 2000-08-16-05.  When downloading the
given URL, the file is expanded but the name is not changed.  Also reported on
WinNT -> all/all.  No crash this time, changing severity to normal.
Severity: critical → normal
Keywords: regression
OS: Linux → All
*** Bug 49279 has been marked as a duplicate of this bug. ***
nav triage team: [nsbeta3+]
regression
Whiteboard: [nsbeta2+] → [nsbeta2+][nsbeta3+]
Marking P1.
Priority: P3 → P1
Setting to [nsbeta2-] since that is outta here.  [nsbeta3+] is already 
indicated.
Whiteboard: [nsbeta2+][nsbeta3+] → [nsbeta2-][nsbeta3+]
There is an extremely important detail that, if provided, would greatly help me
to figure out the source of the problem:

When this fails, is it after seeing the (newer) "Downloading" (aka "super helper
app") dialog?  If yes, does it behave the same way if you right-click on the
link and choose "Save link as...?"

My theory is that the answers to those two questions are "yes" and "no."  This
would be caused by the fact that the SetApplyConversion( PR_FALSE ) never
happens when you go through that other dialog.

This might get messy.  Scott, we might have to reload the URL in this case to
get the bits without the conversion. 
PDT downgrading to P2 and leaving [PDTP2] in status whiteboard
Priority: P1 → P2
Whiteboard: [nsbeta2-][nsbeta3+] → [nsbeta2-][nsbeta3+][PDTP2]
Unable to reproduce this bug on PC/Linux build 2000090111 with the given URL.
I see an "Unknown File Type" dialog (application/x-gzip), click "Save File..."
and then "Ok". As a result, cervisia-0.6.0.tar.gz is saved, and an integrity
test "tar ztf ..." does not complain, whereas "tar tf ..." shows an error.
Thus the saved file is still gzipped (as it should).
I suspect we might see different behavior depending on *how* you download.

If you click on a link directly and see the "Downloading" dialog, that might 
result in different behavior than if you right-click on the link and choose 
"Save link as..." (which produces the "unknown content" dialog and then the 
download progress dialog).  Typing the url in the location bar seems to produce 
the latter behavior.

Do *both* of those techniques produce the same behavior?
Blocks: 50326
For the URL specified it never decompresses the file.  I've tried all three
methods (left click, Save link as, typing it in the URL bar). All three methods
work the same in the build 2000090321. It doesn't decompress the file. So I
think it's WORKSFORME, or not?

If they all work the same then that's a different problem, perhaps.  The data 
should be uncompressed as it is loaded into the browser (in case it is html).  
Maybe Necko isn't uncompressing gzip data now?
I think there are two bugs here.  After testing this out (using the bug's URL) I
see that if you right-click and do "Save link as...", the file is decompressed
(but of course should not be).  This despite (I believe), calling
SetTransferEncoding(PR_FALSE) on the channel.  This appears to be a networking bug.

If I click on the link and then choose "Save to disk" on the Downloading dialog,
it is also decompressed.  This will still be broken when the networking glitch
is fixed, I believe.  That's the second bug.  We need to issue the
SetTransferEncoding(PR_FALSE) when we decide to save to disk (or to open a
helper app, for that matter).

One other detail:  This may be due to relatively recent changes that fixed the
downloading code to use the cached version of the file.  If the version in the
cache is decompressed, then that might account for what we're seeing.

I'm going to investigate a bit more and then (probably) reassign to Networking.
Turning off cache got Save link as... working properly.  Seems to be a cache
interaction.  Basically, the data is in the cache decompressed and when we ask
for it again, we get the decompressed data, even though we've specified
SetAutoEncoding(PR_FALSE).  I'm reassigning to the Networking component so that
that can be fixed.

I think there's still the bug in the new Downloading code whereby the data will
be decompressed if you just click on the link.  Note that this is somewhat
tricky to detect because if the data in the cache is compressed, then when
*that* code loads it, it gets the compressed data (and thus appears to be
working properly).

The right thing is for Networking to straighten out the cache business and then
reassign this to mscott to get the helper app service to turn off decompressing
when it starts saving to the temporary file.
Assignee: law → gagan
Status: REOPENED → NEW
->neeti
Assignee: gagan → neeti
Not holding PR3 for this, so marking nsbeta3-. Seems serious enough to nominate 
for rtm, though.
Keywords: rtm
Whiteboard: [nsbeta2-][nsbeta3+][PDTP2] → [nsbeta2-][nsbeta3-][PDTP2]
*** Bug 54704 has been marked as a duplicate of this bug. ***
approving for rtm. gordon can you help neeti here? 
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2] → [nsbeta2-][nsbeta3-][PDTP2][rtm need info]
*** Bug 56439 has been marked as a duplicate of this bug. ***
*** Bug 56449 has been marked as a duplicate of this bug. ***
*** Bug 56846 has been marked as a duplicate of this bug. ***
*** Bug 56856 has been marked as a duplicate of this bug. ***
The cache is not doing anything special. The flag (SetApplyConversion) is not
set for Save As /Right click cases. This needs to be set for necko to not
automatically apply the content conversion. Assigning to mscott for the cases
law describes. mscott: give me a call if you need help. 
Assignee: neeti → mscott
Keywords: relnoteRTM
Here's the fix for disabling conversion for content that's getting dispatched
via the exthandler. Please ignore the first 3 lines of this patch which are part
of another bug. All we care about here are the lines in OnStartRequest which
disable conversion:
  nsCOMPtr<nsIHTTPChannel> httpChannel = do_QueryInterface( aChannel );
  if ( httpChannel )
  {
    // Turn off content encoding conversions.
    httpChannel->SetApplyConversion( PR_FALSE );
  }

http doesn't require this flag until the consumer starts to read out data so
setting it in the OnStart call has the desired effect.

If there are still cases via the save as and right click cases where this flag
isn't getting set then those would go back to law.

gagan, can I get r=gagan on this change from ya?
r=gagan
Scott, the PDT is going to want the exact fix attached without the extraneous
lines.  Might as well just do that now...
Turns out this problem is worse than we thought.

My fix works great for windows. However it doesn't work on linux. On top of
that, nothing works on linux! Let me rephrase that.

Without my changes, go to http://www.turbogeek.com/mozilla/gzip.html.
1) Save Link As for both of the example links and save them to a local file.
2) On windows, both of these files are still gzipped. They aren't uncompressed.
This is GOOD as this is what we want.
3) On Linux, both of these files are uncompressed!!

I've verified that the compress flag from Bill's change is getting set on the
http channel before he opens the channel and starts reading from it. http is
choking on linux for some reason and still decompressing it.

So what did I fix? Well I fixed it for the case where you click on a url and it
causes the helper app dialog to come up. When you saved the content, we were
always uncompressing the data.

So I believe my patch is a requirement as it makes things work great on windows,
but we need to figure out why http is giving us uncompressed data on linux for
all scenarios when we set the convert flag to false.

back to gagan for that one =). hot potato hot potato.
Assignee: mscott → gagan
*** Bug 57410 has been marked as a duplicate of this bug. ***
->darin
Assignee: gagan → darin
Status: NEW → ASSIGNED
*** Bug 57249 has been marked as a duplicate of this bug. ***
The behavior of Netscape 4.75 under Linux is also to uncompress .gz files.
Except that it "correctly" renames the files.  An interesting thing, however,
is that N4.75 does not uncompress (or rename) .tar.gz files.
Using Netscape 6 [Linux build 2000102309] and starting with an empty cache, I
see the following behavior:

1) Left click on a .gz http link.  NS tries to display the file (even if it is
   binary).  This is how 4.X works as well.
2) Now right click on the same .gz http link and select save link as.  Then in
   NS 6 you get a dialog with the full name of the file (including the .gz) but
   the file that is saved will not be compressed (BUG).
3) Next, right click on a different .gz http link and select save link as.
   This time the file is not uncompressed (or renamed).  This is what we expect.

So, from this sequence of events, it is clear that the problem is related to
the cache.  Whether it is the fault of the cache or not is unclear.  As far as
what is going on, my guess would be that the cache is storing the uncompressed
data, since that is what was needed for display.  However, it is associating
the uncompressed data with the URL to the compressed data.  Thus, when we later
request the URL to the compressed data, the cache simply gives us the
uncompressed data instead.  We of course have no knowledge of this, and
therefore we do not save the data with the correct name.

It looks to me like we are not using the cache correctly.  There should be two
entries in the cache, one for the compressed data and one for the uncompressed
data.  I have to investigate this further since I'm not too familiar with how
we pass data to the cache.

If what I'm saying is true, we should be able to see the same behavior under
Windows.  Hmmmm...
Ok, under WinNT [build 20001023], I find the exact same behavior that I just
described.  The next thing to investigate is how entries are added to the cache.
Do people agree that there should be two cache entries?  One for the original
compressed data and one for the uncompressed data?  But, then what would the URL
be for the uncompressed data?  Perhaps we should only cache the compressed one?
How does NS 4.X handle this?

Ans: Under Windows NT, it doesn't... NS 4.7 has a similar problem.  In this
case, it doesn't seem to matter if I first follow a link to a .gz file (not a
.tar.gz though) and then try to save the .gz file, the result is always an
uncompressed file with a .gz extension.  The Linux version of 4.X gets this
right, however, as I previously noted.
The issue we should address is that Mozilla has no business uncompressing any
files.  Mozilla should only decompress content if it has been encoded for
transfer (e.g. gzipped HTTP response body), but it should never uncompress files
simply because they are compressed.

For example, if I download the Linux kernel source code, I do not want mozilla
to uncompress it.  The compressed archive is 17.7 megabytes.  The uncompressed
archive is 562% larger at 99.5 megabytes.  I do not want this file to take up
six times more room than it needs to.  Mozilla should leave it compressed on the
disk.  If Mozilla automatically decompresses it, it wastes my time and CPU
time, since I then have to recompress it.

The second issue is consistency.  Mozilla decompresses gzip files, but what
about bzip, zip, lha, arc, shar, ice, et al.?  After we unzip the file, why
aren't we untarring it?  Do we also handle cpio?  Consistent handling of
compressed files is important.

The bottom line is that Mozilla should only automatically handle compression
which is meant to be invisible to the user.  The only example that I am aware of
is Content-Encoding and friends from the HTTP 1.1 spec.
I agree, but in fact, mozilla does not have a problem with .tar.gz files.  It
does not try to decompress them.  The problem is with .gz files.
> Perhaps we should only cache the compressed one?

Agreed.

> The issue we should address is that Mozilla has no business uncompressing any
> files.

If Mozilla can *display* the file and intends to do so, uncompressing it is OK.
It just should never uncompress it when saving to disk.
Hey Darin, I think there's more than just the cache that's causing a problem
here. If I click on a link that isn't in the cache, then on linux, I still see
the content get uncompressed. On windows, the content is properly handled.

To see this, apply my patch to the exthandler to disable conversion. Now visit:
www.mozilla.org and click on a linux nightly tarball. On windows, you'll see
that we don't unzip the content but on linux we still do. The linux tarball
isn't in my cache.


I had a discussion with Gagan on this... and, irrespective of how we are
currently doing things, the "correct" thing to do (following the SPEC) is to
give the file to the user in the format corresponding to the Content-Type
HTTP header.  I'm going to say up front that this is not what the user would
expect in many cases.  For example, the server at turbogeek.org reports the
Content-Type of both gzip-test.gz and gziped-ascii.txt.gz as text/plain.  And,
it specifies the Content-Encoding as gzip.  According to the SPEC this tells
the browser that the content is only gzip compressed for the purposes of getting
the actual data to the user, but that the user ultimately wants the data in the
format specified by Content-Type.  Correct me if I'm wrong, but this is how I
interpret the SPEC.

Now the way Apache, for example, handles .gz files is that it tries to guess
the format of the compressed content.  If you have a .tar.gz file, it will
report the Content-Type as application/x-tar and if it doesn't recognize a
contained extension (eg. whatever.gz) then it will just report the Content-Type
as text/plain, which often times is not correct.  In both of these cases, it
will report the Content-Encoding as gzip.

What all of this means, of course, is that if the server is not reporting the
content type as application/gzip then we should decompress.  Ultimately, I think
the user should be given a choice: if the server is providing compressed data,
and the user wishes to save that data to a file, we should ask the user if they
want the data in the compressed form or the uncompressed form.

The current behavior of mozilla and netscape 4.X does not follow the SPEC
in this regard.  It is inconsistent at best, and so we have to decide what
behavior to actually implement.
What do other servers, e.g. MS IIS, do by default (many people have only ftp
access to their webserver, so the default matters a lot)?

What happens for sea.gz? .tar.bz2? .txt.bz2, sea.bz2? Do we recognize bz2 at
all? Does Apache, MS IIS?
Is apache aware of the problem, then?  The closest bug I found by searching 
http://bugs.apache.org/ for "gz" was http://bugs.apache.org/index.cgi/full/3892.
Indeed, it seems that the Apache Group have done this on purpose.  Quoting from
httpd.conf.dist:

    #
    # AddEncoding allows you to have certain browsers (Mosaic/X 2.1+) uncompress
    # information on the fly. Note: Not all browsers support this.
    # Despite the name similarity, the following Add* directives have nothing
    # to do with the FancyIndexing customization directives above.
    #
    AddEncoding x-compress Z
    AddEncoding x-gzip gz tgz

I'm contacting the Apache people by mail.
> AddEncoding x-gzip gz tgz

Please note the "tgz": ".tgz" is short for ".tar.gz" (in order to stay within the
DOS 8.3 scheme). -> We will also uncompress tarballs. Yes, this is an Apache bug,
but it is relevant to our decision.
Please disregard my last comments. I misunderstood you. If Apache always adds the
encoding header, even for .tar.gz and .tgz, we must ignore it when saving to a
file (this includes the cache), or at least ask the user.

> According to the SPEC this tells
> the browser that the content is only gzip compressed for the purposes of
> getting the actual data to the user, but that the user ultimately wants the
> data in the format specified by Content-Type.  Correct me if I'm wrong, but
> this is how I interpret the SPEC.

Which spec and which sentence do you interpret this way? I checked in RFC1945,
10.3 and RFC2616, 14.11, and I see nothing that suggests this.

If I interpret it correctly, I suggest we just ignore it and always save the file
compressed, without bothering to ask the user.
BTW: The relevant Apache bug reports are
<http://bugs.apache.org/index.cgi/full/2364> and
<http://bugs.apache.org/index.cgi/full/1439> (please note that the latter
predates the former, it is just an example of potential harm).
RFC 2616 says that the Content-Encoding should not be undone until display or
other presentation of data.  If Mozilla is going to save the data to disk, it
must leave the content encoding in place, which means not unzipping zipped
files.  If it intends to display the file, then it must unzip the file first.

So two example scenarios, to clarify all this posting:
1)
Content-Type: application/x-tar
Content-Encoding: x-gzip

Mozilla must save the file to disk without changing the name or unzipping the data.

2)
Content-Type: text/plain (or other displayable format)
Content-Encoding: x-gzip

Mozilla should unzip the file and display it.
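A standalone sketch of the rule those two scenarios boil down to (hypothetical
helper, not actual Mozilla code; the function and parameter names are invented
for illustration):

    #include <string>

    // Decide whether a gzip Content-Encoding should be undone, per the two
    // scenarios above: decode only when the entity is going to be displayed;
    // when saving to disk, keep the bytes exactly as the server sent them.
    bool ShouldDecodeContentEncoding(const std::string& contentEncoding,
                                     bool willDisplay)
    {
        if (contentEncoding.empty())
            return false;       // nothing was encoded, nothing to decode
        return willDisplay;     // scenario 2: decode for display
                                // scenario 1: keep encoded for save-to-disk
    }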
Marking relnote-user in case we don't come up with a fix for this.

Gerv
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] → [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user
The SPEC I was previously referring to is RFC 2616.  Take a look at
Section 3.5 "Content Codings"... the first paragraph in particular.

Clearly, decoding is necessary when we view content.  The question
(which the HTTP SPEC does not answer) is: what do we do when the user
asks to save the content to a file?  And, in what format should we
store the content in the cache?

As far as question 2 is concerned, storing the content in the decoded
form would make sense from the point of view of efficiency -- we don't
want to be decoding the content every time we wish to display it!  On
the other hand, the encoded form of the content may be smaller (in the
case of compression) and therefore would help us conserve disk space.
This would probably be a benefit to embedded device implementers.  So,
perhaps this should be a preference?!?

Now back to question 1.  One very common use of the Save Link As option
is for downloading a file from a HTTP server.  In this way, the HTTP
server is being used like a FTP server.  And, in this case, the user
almost always expects the data to be in the original format... usually
compressed.  We should not decompress such content.  But, this is not
the only way that the Content-Encoding header is used.  This header was
intended to be used by the server when it needs (or wants) to encode the
content for transmission or for whatever reason.  We don't have any way
of knowing what the intent of this encoding is.

The way we've attempted to solve this problem so far, is to give the user
the content in the "raw" encoded form when they click Save Link As (or
some equivalent).  However, there are some bugs in the way we do this now.
And, moreover if the content is already in the cache (in decoded form),
then what should we give the user when they ask for the content to be saved
to a file?  Should we re-encode the data?  Should we re-fetch the content?
Or, should we give them the decoded data, and somehow guess the correct
filename as Netscape 4 tries to do?

Also, what if the content is not in the cache, and then if we save the
content undecoded, should we cache that?  If we do, then if the user later
asks to display the content, we will have to remember to decode at that time.
BTW.. this is currently a problem.  Clear your cache, go to turbogeek.org/mozilla
and download [right click->Save Link As] one of the test files (eg.
gzip-test.gz).  Then left click on the saved file, notice that the displayed
content is binary, which means the data is not being decoded!!

The cache is already slated for an overhaul in the very near future.  I think
it should incorporate knowledge of the encoding and possibly be able to provide
the data in either format on-the-fly?!?
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user → [nsbeta2-][nsbeta3-][PDTP2][rtm need info]
mscott: with your patch (which appears to be checked into the trunk... i haven't
looked for it on the branch) I do not see a difference in the behavior on Linux
versus Windows.  I am using trunk pulls from yesterday (10-23-2000).  Perhaps I
could swing by your cube and have you show me the difference?
darin, to see the difference go to www.mozilla.org and click on the linux
nightly tarball. When the helper app dialog comes up, select save to disk.

On windows, the file is correctly saved still compressed. On linux, the file is
uncompressed and still has a .gz extension.

Is that what you were trying? That's what my patch fixed for windows. 
Darin,
IMO, the cache should not hold decompressed data. It is a network cache,
supposed to reduce redundant network fetches, not save processing time.
Decompressing is so fast that it might be even faster than reading the
decompressed file from disk (but I have no data supporting this). In any case, a
cache hit for compressed data (in contrast to stylesheets etc.) seems to be
unlikely, so keeping the decompressed file just to save some processing seconds
seems like a waste of cache space to me (especially for the mem cache).
I agree with your arguments that the cache should not hold decoded data.
It's unfortunate that this is not the current behavior.  At the moment, the
stream conversion (decoding) is happening as the data arrives, unless the
HTTP channel has the ApplyConversion flag set to FALSE.  The converted stream
is passed on to the channel's listener (eg. the parser).  The cache intercepts
this stream, so it never sees the encoded data.  Clearly, then, we need to
re-think how data is put into the cache.  This is probably a major change.
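To make the ordering concrete, an informal sketch of the pipeline described above
(not actual class names):

    server bytes -> http channel -> [gzip stream converter] -> cache intercept
                 -> channel's listener (parser, saver, ...)

For the cache to hold the encoded data, the interception would have to happen
before the converter rather than after it.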
mscott: as strange as this may sound, when I go to www.mozilla.org and
grab a nightly build (like mozilla-i686-pc-linux-gnu-sea.tar.gz) by left-
clicking the link and selecting save in the dialog, I _do_ get a gzip'd
tar file.  I tested this using a CVS pull from around 4 pm today (10/24).
With which version of the code are you seeing the discrepancy between
Linux and Windows?
Most of the discussion here is around HTTP; FTP also shows the same behaviour.
(Should this be entered as a new bug, like 57619?)  If I download
http://ftp.mozilla.org/pub/mozilla/nightly/2000-10-25-08-Mtrunk/mozilla-i686-pc-linux-gnu-sea.tar.gz
(with 2000102508 in Linux) I get a compressed version of the .tar.gz.  If I
download
ftp://ftp.mozilla.org/pub/mozilla/nightly/2000-10-25-08-Mtrunk/mozilla-i686-pc-linux-gnu-sea.tar.gz
(the same file with a different protocol) I get an uncompressed version.
*** Bug 57625 has been marked as a duplicate of this bug. ***
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] → [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user
This is not something we can easily fix for RTM.  Moving the target to Future,
and marking rtm- in the status whiteboard.
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm need info] relnote-user → [nsbeta2-][nsbeta3-][PDTP2][rtm-] relnote-user
Target Milestone: M21 → Future
Note that when Content-Encoding is set to 'gzip', Mozilla will save files in
compressed format, regardless.  I have a proxy which compresses html content
(sets Content-Encoding to gzip), and *all* files are saved as gzipped when I hit
save-as (but mozilla does not append a .gz extension).

It appears that with this build (2000110308), .tar.gz files are handled
correctly, but other types are not.  From my reading of rfc2616, section 7.2.1,
  "Any HTTP/1.1 message containing an entity-body SHOULD include a
   Content-Type header field defining the media type of that body. If
   and only if the media type is not given by a Content-Type field, the
   recipient MAY attempt to guess the media type via inspection of its
   content and/or the name extension(s) of the URI used to identify the
   resource."
Since this is an ambiguous situation, it is bending the rules a bit, but the
client (mozilla) must examine the URI to determine if the file was gzipped.

The way to unambiguously fix this would be to use 'Transfer-Encoding: gzip' when
the server has compressed the data, and only use 'Content-Encoding' when the
data was compressed to begin with.  But this would require modification of lots
of servers.  (And will probably be incompatible with HTTP/1.0)  I think I will
modify my proxy to use Transfer-Encoding instead (since Transfer-Encoding *must*
be removed by the client).  Will mozilla handle this properly?
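To spell out the distinction being proposed (illustrative responses, not captured
from a real server):

1) The resource itself is a compressed archive; the client keeps it compressed:
Content-Type: application/x-tar
Content-Encoding: gzip

2) The server compressed an HTML page only for transport; the client must undo it:
Content-Type: text/html
Transfer-Encoding: gzip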

"Future" my ass.  (not to be rude...)  This is a bug, and it needs to be fixed.
I think most people would agree that the preferred action taken by the browser
should be to give the file in a format that is consistent with the URL.
Based on the content-type/content-encoding alone, you do not really know
what the user expects when they ask to save the content to disk.  However,
if you inspect the extension on the URL, you can do a mime-type lookup and
then figure out what to do (most of the time).

This bug is "futured" b/c it depends on cache architecture changes (which are 
coming).  Currently, we are not putting content in the cache in a consistent
way, so it is difficult (if not impossible) to properly fix this problem right
now.  Right now, it is possible depending on how you acquire content (either
through left-clicking a link or right-clicking and saving the link) to end up
with content stored in encoded form in the cache as well as content stored in
decoded form.  I really believe that this needs to be resolved first.
My suggestion to unambiguously fix the Content-Type/Content-Encoding mess by
using Transfer-Encoding won't work because neither Netscape 4.7x nor Mozilla
support it.  (But it's defined by rfc2616!)

Should I file a separate bug for 'Transfer-Encoding: gzip' not working?
If transfer-encoding is not working properly then YES that should be filed
under a different bug.
Why ever _decompress_ or do _anything_ else (e.g. CRLF conversion) when saving to
disk?  Neither on clicking nor on "Save as...". Leave _any_ changing of the saved
file to the helper applications, or at least to what's declared in the
application preferences, since one never knows what will be done with it.

Why try to be smarter if the result is worse?  Not being able to download without
changes means not being able to download at all. So I consider this bug
"major".
One important reason why we want to "touch" the network data before caching
it is, in the case of HTTP, to parse transfer encodings and group headers
together in the event that headers appear at the end of the data stream.
We really don't want to have to do this kind of parsing every time we fetch
a server response from the cache.
Component: Networking → Networking: HTTP
Blocks: 61688
*** Bug 65573 has been marked as a duplicate of this bug. ***
For *content* vs *transfer* encoding in HTTP/1.1 see my comments in bug 68414.
To clarify:
When we get an entity with "Content-Encoding: gzip" it should keep this encoding
when saved to disk. RFC 2616 says this is a property of the entity. Also, see
W3C-CUAP 3.1 (http://www.w3.org/TR/2001/NOTE-cuap-20010206#cp-save-filenames and
bug 68420), which even more clearly tells us what to do.
However, there could be an option in the filepicker to save uncompressed.
The option to "save uncompressed" is definitely an enhancement worthy of a
different bug report.  For this bug, I agree, content saved to disk should
not be decoded.  In order for this to work, we have to make sure that cached
content is also written in encoded form.

For now, we are blocked waiting for the new cache to land.
In my opinion, the Right Thing for the cache to Do is to store exactly what was
received from the server. Then, on "save as", the file should be copied from the
cache as-is. Only in this way will the final file saved be identical to the
original file from the server.

The "display" section of Mozilla should be treated just like any other helper
app and given the original (possibly compressed) file. That helper app should
uncompress (if necessary) and parse the file on-the-fly and display it. But
there's no reason to store this uncompressed (and parsed and otherwise
fiddled-with) text anywhere.

It's definitely much faster to decompress a file from the RAM cache than it is
to load a uncompressed file from the disk cache. (Sometimes all the compressed
data fits in RAM, but all the compressed + all the uncompressed data overflows
RAM and overflows onto disk).

Darin Fisher may be right that re-parsing the group headers every time a file is
loaded from cache is too slow. Would it be possible to store a short summary of
pre-parsed information and meta-information about a file somewhere else, such
that the original file contents are still unchanged ?
The new cache implementation addresses this concern.
removing stale/old keywords.

adding dependency on bug for new cache design.
Depends on: 68705
Keywords: crash, mlk, nsbeta2, rtm
Whiteboard: [nsbeta2-][nsbeta3-][PDTP2][rtm-] relnote-user → [PDTP2] relnote-user
Ok, doing "save to disk" now leaves the .gz extension and it is a .gz file, so 
this is good. However, if you choose "Open With..." and then a gz-handling 
program, odd things happen (at least for me, on Windows, with PowerArchiver2000 
- appreciate reports on other platforms.) Is it passing it the tar version?

Gerv
this bug hasn't been fixed yet.  it will require a little bit of HTTP reworking.
Keywords: nsbeta1
Target Milestone: Future → mozilla0.9.1
my changes for bug 76866 will fix this bug as well.
Depends on: 76866
*** Bug 80053 has been marked as a duplicate of this bug. ***
the necessary http changes were checked in with the http branch landing. 
however, i'm not sure that the problem is completely fixed.
Keywords: qawanted
No, this is definitely not fixed yet.  Build 2001051616, x86/Linux.
Here's what we should do:
                                        View          Helper        Save As
================================================================================
Content-Type: application/octet-stream  ---           compressed    compressed
Content-Encoding: gzip                  uncompressed  uncompressed  compressed
Transfer-Encoding: gzip                 uncompressed  uncompressed  uncompressed

Transfer-Coding is not working yet (bug 68517). For all other cases I can verify
correct behavior with 2001-05-16-04, Win NT (for helper applications see
bug 69306).

One issue is left: File extension is not changed in the default file name of the
save-as dialog if it does not match the content coding. Is that covered by
bug 31519, should we keep this bug open, or should I file a new one?

-> 0.9.2
Target Milestone: mozilla0.9.1 → mozilla0.9.2
*** Bug 82127 has been marked as a duplicate of this bug. ***
*** Bug 82308 has been marked as a duplicate of this bug. ***
*** Bug 82319 has been marked as a duplicate of this bug. ***
*** Bug 82698 has been marked as a duplicate of this bug. ***
*** Bug 83154 has been marked as a duplicate of this bug. ***
This has started happening to gzip files downloaded from ftp.mozilla.org within
the past few days.
*** Bug 83188 has been marked as a duplicate of this bug. ***
Happens on all .tar.gz and .tgz files too.
At least now I know this and I don't have to erase the files and get them with
another app <g>
Keywords: perf, qawanted, relnoteRTMpatch
good fix. r=gagan
this fix is not really correct.  it breaks down for servers which send text/html
with a content-encoding of gzip.  try saving the toplevel page at
http://sourceforge.net/.  you'll see that the saved page is gzip encoded.

i think what we really need to do here is respect the content-encoding header
except in cases where the content-type is application/x-gzip (and related
variants).  sourceforge.net is basically broken.  it should not be sending a
Content-Encoding header in this case, since it does not intend for the browser
to decode the data.  if we added this application/x-gzip hack we'd be consistent
with the behavior of NS4x.
the question then is: are there other content-types which should be treated in
a similar manner?
> this fix is not really correct.  it breaks down for servers which send text/html
> with a content-encoding of gzip.  try saving the toplevel page at
> http://sourceforge.net/.  you'll see that the saved page is gzip encoded.

That's exactly what we should do. Quoting RFC 2616 once again:

   The content-coding is a characteristic of the entity identified by
   the Request-URI. Typically, the entity-body is stored with this
   encoding and is only decoded before rendering or analogous usage.
but then the filename of the saved page should have a .gz appended to it.
fwiw: nav4x saves the page in text/html format, not application/x-gzip.
> but then the filename of the saved page should have a .gz appended to it.

Yes (strictly speaking: we should use the proper system naming convention for
the content coding, see bug 68420).

OK.. while i agree that this solution could be sufficient for text/html, we'd
then need a way to distinguish:

  Content-Type: text/html
  Content-Encoding: gzip

from:
 
  Content-Type: application/x-gzip
  Content-Encoding: gzip

in terms of whether or not we should gunzip the data.

if we adhere to the spec then we should decode the data in both cases (or at
least assume that the data is not actually of the type given by the Content-Type
header but actually an encoded form of that), and we should choose a filename
extension that matches the content-type.  unfortunately, this doesn't work for
the second case, since the content sent by the server is not actually twice
gzip'd!  ...the server is lying!

we can alternatively assume that when saving to disk, the Content-Encoding
header should be ignored.  this is nice, because in the second case it means
that we'd be OK... we would have conveniently solved the twice gzip'd problem.
but, what about the first case.  what would happen there?  well, we'd probably
want to (as i've already said) adjust the saved file extension to take into
account the fact that the content is gzip'd.  but, how do we know that the
content is gzip'd?  because of the Content-Encoding header right?  but, we're
ignoring the Content-Encoding header, aren't we?  this is where i get stuck.

i think that no matter what we have to assume that servers will not double gzip
content.  otherwise, i'm not sure how we're going to solve this problem.

this is just another example of us having to jump through hoops to support a
commonly accepted (and consistently implemented) violation of the spec.
darin, what if you have a URL .../foo.html.gz giving HTTP
 Content-Type: text/html
 Content-Encoding: gzip
? You don't want to save that as foo.html.gz.gz, do you? You want to save it as
foo.html.gz, no matter if the URL was ../foo.html or ../foo.html.gz, right?

So, can we avoid the check, if the filename already has an acceptable extension
(Note: .tar.gz means the same as .tgz!)?

If we do extension guessing, and considering that we can't (yet) do anything
sensible with application/x-gzip other than saving, do we have to special-case
for the "violation" (Content-Type: application/x-gzip, Content-Encoding: gzip)
at all?
FYI: It would be fine with me, if you
- didn't decompress Content-Type: gzip when saving
- decompressed Transfer-Encoding: gzip when saving
- proposed the "filename portion" (part after the last slash; "index.html", if
null) of the URL as filename for the local disk.
This would mean that the URL .../foo.html giving HTTP
 Content-Type: text/html
 Content-Encoding: gzip
would be stored gzipped as foo.html, but that's a fault of the web site, no?

(BTW: What happens on Windows, if I save a normal foo.html? Is foo.html or
foo.htm proposed as filename? Doesn't the former break Windows extensions?)
Darin wrote:
> this is just another example of us having to jump through hoops to support a
> commonly accepted (and consistently implemented) violation of the spec.

It seems there is no solution that will work right for all situations, so,
failing that, we mine as well implement it right (RFC-style), and get Apache to
fix the Content-type/encoding ambiguities before next release. It mean's we'd
screw up saves for older (broken) Apache servers, but hell, Netscape 4.x screws
up gzip saves as well, so we're at 4.x parity for broken servers, and correct
for proper servers.

If we don't do it right now, more browsers and web servers that serve up
ambiguous content-type/-encodings will be released, and it'll never be
corrected. And if we don't do it right now, we're stuck with the problem of
trying to guess what the server meant, and eventually we're gonna guess wrong,
and force new web servers to remain broken for broken browser compat.

I say do it RFC style, relnote the user, and contact Apache (and any other
server whose default config is sending twice-gzip'd headers and only gzipping it
once).
We are the Moz-cops, upholding the RFCs!!  WooHoo!!
The new attachment "alternative solution" openly admits to breaking the RFC and
it doesn't solve the problem properly.

The 06/01/01 solution with Gagan's r= on it sounds fine and hopefully can be
checked in soon so that we can verify and get out of here.

If people are downloading gzip compressed HTML files and want them to be named
foo.html.gz that's an issue for a different bug (someone mentioned the bug #
already) PLEASE don't try to solve all the world's trouble here in #35956
my previous patch (the one gagan r='d) simply ignored the Content-Encoding
header when saving to disk, but as i described in my previous patch this makes
it impossible to get the file extension right, unless we explicitly encode 
(someplace) the fact that apache doesn't really double gzip such content.

moreover, my previous patch breaks necko convention by not calling
OnStartRequest for the stream converter.  this is only a minor detail, of
course, and for the gzip stream converter it fortunately has benign side 
effects... but http's not supposed to know that, right? ;-)
   ...but as i described in my previous _comments_ this makes...

.tar.gz is usually the same as .tgz unless .tgz means a slackware package on
linux which is VERY different
What the hell are you talking about?  A slackware package is just a gzipped tar
file.
"my previous patch (the one gagan r='d) simply ignored the Content-Encoding
header when saving to disk,"

Good. Fix the Necko nit you mentioned, get approval, check in fix, verify and
kill this bug.

I really don't think the 06/01/01 patch prevents bug 68420 from being fixed,
Darin can you explain why you believe that to be true? Maybe my understanding
of Mozilla's architecture is too weak to see the problem.
my point is that this is not a bug with mozilla, it is a bug with apache.
all we can do is work around apache's bug.  that is the intent of my latter
patch.  i'm going to add an additional check that only enables the workaround
logic if the server is apache.
r=gagan
*** Bug 84899 has been marked as a duplicate of this bug. ***
+    const char *encoding = mResponseHead->PeekHeader(nsHttp::Content_Encoding);
+    if (encoding && PL_strcasestr(encoding, "gzip") && (
+        !PL_strcmp(mResponseHead->ContentType(), APPLICATION_GZIP) ||
+        !PL_strcmp(mResponseHead->ContentType(), APPLICATION_GZIP2))) {
+        // clear the Content-Encoding header
+        mResponseHead->SetHeader(nsHttp::Content_Encoding, nsnull); 

I think I got apache to spit out an encoding of x-gzip at one point (although
that was without sending any accept-encoding headers). You should probably check
for that as well.
an encoding of x-gzip would be picked up by this patch as well.  note the call to
PL_strcasestr.
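(Side note for readers: PL_strcasestr is NSPR's case-insensitive substring
search; a hedged sketch of what the check above amounts to, with a made-up
helper name:)

    #include "plstr.h"   // NSPR's case-insensitive string helpers

    // Hypothetical helper mirroring the check in the patch above: PL_strcasestr
    // does a case-insensitive substring search, so "gzip", "x-gzip", and even
    // "not-gzip" (see the next comment) would all count as gzip encodings.
    static bool LooksGzipEncoded(const char *encoding)
    {
        return encoding && PL_strcasestr(encoding, "gzip") != 0;
    }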
Oops. Of course, we'll now match against not-gzip :)
Not sure if the following situation is relevant to this bug, but..

I've tried to download something from this link..
http://mylookandfeel.l2fprod.com/portal.php3?action=plaf&id=skinlf#resources

and I'm offered to save a *.php file. I don't think that's right.. it should let
me save the *.gz file that is behind there (worked well in IE when I copied the
link)

(milos.kleint@czech.sun.com)
remove the printf in nsHttpChannel::SetApplyConversion and it is good to go.
*** Bug 85476 has been marked as a duplicate of this bug. ***
dougt: yikes! thanks for catching the printf.
Whiteboard: [PDTP2] relnote-user → [PDTP2] relnote-user, r=gagan, sr=dougt, a=?
Has anyone contacted Apache to get a fix in there?
Blocks: 83989
a= asa@mozilla.org for checkin to the trunk.
(on behalf of drivers)
fix checked in!!  hooray!!  hooray!!
Whiteboard: [PDTP2] relnote-user, r=gagan, sr=dougt, a=? → [PDTP2] relnote-user, r=gagan, sr=dougt, a=asa
marking FIXED
Status: ASSIGNED → RESOLVED
Closed: 24 years ago23 years ago
Resolution: --- → FIXED
http://mylookandfeel.l2fprod.com/portal.php3?action=plaf&id=skinlf#resources

still tries to save the .zip files as .php... separate issue, or was the patch
supposed to take care of that one too?
yes that's a separate bug which you should be able to find in bugzilla (it
might even be mostfreq), basically we don't honor the suggested file name 
field, which pairs nicely w/ the fact that we don't provide normal filename 
fields :)
Keywords: patch
Whiteboard: [PDTP2] relnote-user, r=gagan, sr=dougt, a=asa → [PDTP2] relnote-user
*** Bug 85854 has been marked as a duplicate of this bug. ***
This fix seems to have broken some (all?) pages that are gz encoded. For
example, go to http://www.mutt.org/ and click on the FAQ link (it's near the top). 

In 4.77 this link brings up the page, but in Mozilla I now get a gzip file
displayed in the browser.
The mutt faq page WFM on 2001061309/Linux.
I was using 2001061308/Linux (Navigator only).

Installing 2001061408 has only made things worse... it now segfaults whenever I
click on the FAQ link:

/usr/local/mozilla/run-mozilla.sh: line 72: 15501 Segmentation fault      $prog
${1+"$@"}


WFM linux 2001061308, but I don't think this page is a "normal" encoding setup.

Upon requesting "GET /muttfaq/faq", it 302's you to 
http://www.fefe.de:80/muttfaq/faq.html.gz

Which is sent:   Content-Type: text/html..Content-Encoding: gzip

Doing a Save As... on the page saves with the name faq.html.gz, and with the
data gzipped. Doing a "Save Link As..." on the page from mutt.org saves the
uncompressed version to a file named 'faq' (no .html).
Travis, the bug you mentioned (with x-gzip encoding) is known as bug 85887 and
was fixed recently.
*** Bug 87016 has been marked as a duplicate of this bug. ***
Hi

I got caught by this one too!
*** Bug 87781 has been marked as a duplicate of this bug. ***
Gzipped files from citeseer still expanded:

http://citeseer.nj.nec.com/rd/44385488%2C319362%2C1%2C0.25%2CDownload/http%253A%252F%252Fciteseer.nj.nec.com/cache/papers/cs/14081/http%253AzSzzSzwww.brics.dkzSz%257EmiszSzmacro.ps.gz/brabrand00growing.ps.gz

The URL above is a redirect to a gzipped ps file which is auto-expanded when
mozilla downloads it. It is saved by default as .ps.gz which is incorrect.

Build id: 2001062608
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
robin: what platform are you noticing this on?  i just tried with the linux
6-26/08 and didn't have any problems.  the saved file was compressed.
robin: make sure you clear your cache... there was a bug that just got fixed
which made it possible for the uncompressed content to be written to the cache,
which if later saved to disk would also be uncompressed.  this was fixed, however.

marking FIXED... please reopen if after clearing your cache you still see the
problem.  thx!
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
*** Bug 88059 has been marked as a duplicate of this bug. ***
*** Bug 88619 has been marked as a duplicate of this bug. ***
No longer blocks: 68420
I filed bug 90490 for the remaining issue (no .gz extension added).
*** Bug 90711 has been marked as a duplicate of this bug. ***
People are still filing dupes. Should this be re-opened?
benc: I don't think this made 0.9.2 - are there any reports with current 
nightlies? I'd have to check that though.
benc: Some bugs are filed against old builds (bug 87016, bug 88059). Bug 85854
is filed against build 2001061308; the fix had been checked in 9 hours before.
Bug 87781 and bug 90711 have no build ID. Bug 88619 has probably the wrong ID
(that of a newer build downloaded with an old build).
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010701. 

The most annoying bug in the world has been squashed. Congratulations
FT
WFM on WindowsME, 2001072618 trunk installer build.

Isn't this bug ripe for fixed/verified?
verified:
Linux rh6 2001080106
Win NT4 2001080103
Mac os9 2001080108
Status: RESOLVED → VERIFIED
*** Bug 95242 has been marked as a duplicate of this bug. ***
I'm seeing this again on build linux gcc3.0 2001122021 while downloading files from
http://ftp.mozilla.org/pub/mozilla/nightly/latest/ using Save As, not when
clicking on the links and waiting for the download menu to pop up!
The link in this bug works fine, however?!?
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
confirmed... but, the problem you are reporting is a different bug.  i could
reproduce it by right clicking and pressing "Save link as"  ...that's not what
this bug report is about.  

please see bug 116445.

marking FIXED.
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
This bug is still present in build 2002010208.  I tried three cases with the
mozilla installer on http://www.mozilla.org/ (i686-pc-linux)

click: downloaded and saved as .tar.gz, file is a gzipped tar.
     file save as dialog box showed: Files of type: *.gz (*.gz)

shift+click: downloaded and saved as .tar.gz, file is a tar (not gzipped)
     file save as dialog box showed: Files of type: All Files (*.*)(*.*)

right click+save link as: downloaded and saved as i386 Linux, file is tar (not
gzipped)
     file save as dialog box showed: Files of type: All Files (*.*)(*.*)

Philip, that's bug 116445.
v fixed. new issues/regressions should be filed as new bugs.
Status: RESOLVED → VERIFIED
See Also: → 1470011