Closed Bug 86235 Opened 23 years ago Closed 23 years ago

saving Yahoo News page results in gzip encoded page

Categories

(Core Graveyard :: File Handling, defect, P2)

x86
Windows 98
defect

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 51852
mozilla0.9.5

People

(Reporter: doctor__j, Assigned: paulkchen)

Details

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.1+) Gecko/20010615
BuildID:    2001061309

Reproducible: Always
Steps to Reproduce:
1. Load the above URL (http://finance.yahoo.com/q?s=AOL&d=v1)
2. Press Ctrl-S to save the page.

Actual Results:  The file saved is binary crap.

Expected Results:  The web page is saved in HTML format.

Workaround: Open view source window, then choose "File -> Save As"
Crap, huh?  Darin specializes in stinky problems.
Assignee: dougt → darin
WORKSFORME as of June 15 (tested on Win2k)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
OK, second try...

1. Open browser.
2. Set "Character Coding -> Auto-Detect" to Japanese.
3. Go to http://dailynews.yahoo.com/h/nm/20010613/ts/crime_survey_dc_1.html
4. Save the page.
5. Observe the crap saved.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: Binary crap is saved on Yahoo Finance stock quote page → Binary crap is saved on Yahoo News page
OK.. i'm seeing this too with a trunk build dated 6/19 or so.
Status: REOPENED → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla0.9.3
it looks like File->SaveAs is calling nsIHttpChannel::SetApplyConversion(false)
which is the problem here.  IMO File->SaveAs should save the document in the
format with which it is being viewed.
that belongs to law, right? ->law
Assignee: darin → law
Status: ASSIGNED → NEW
nav triage team:

Marking nsbeta1+, nsBranch, p2, and reassigning to pchen
Assignee: law → pchen
Keywords: nsbeta1+, nsBranch
Priority: P3 → P2
So the lines at:
http://lxr.mozilla.org/seamonkey/source/xpfe/components/xfer/src/nsStreamTransfer.cpp#131

were added to fix bug 39596. So if I remove those lines, we don't get crap when
saving this file. My question is, will that regress 39596 (where we
automatically gunzipped files)? Since there were other parts to the fix for
39596, maybe not, but I'm not the expert. People on the cc-list are. ;-)
Um, 39596 does not have any patch attached. Are you sure that's the right bug
number? Also, I only see a comment on line 131 of that file. Could you confirm
that's the right code segment?
Ok, I'm dyslexic. That's bug 35956. Also, I meant the lines STARTING at line 131
in nsStreamTransfer.cpp, more specifically lines 131-136. That's what bill
checked in to fix 35956.

Since I think this happens for any web server that serves up pages in gzip
encoding (technical term??), we should look hard at getting a fix for the limbo
phase.
Darin/Doug/Gagan - could one of you take this one if you don't have any urgent
bugs? Paul has a topcrash which he needs to try and nail soon. Thanks! Vishy
-> dougt. 
Assignee: pchen → dougt
well, I have a hand-waving fix, if you are interested.  

Basically, there are three callers into |SelectFileAndTransferLocationSpec|:  

/xpfe/communicator/resources/content/contentAreaUtils.js, line 96
/xpfe/communicator/resources/content/nsContextMenu.js, line 704
/xpfe/components/ucth/resources/unknownContent.js, line 120

Two of these calls are related to the downloading of the html page to disk.  The 
other is related to downloading of files (eg. content click -> save this 
link as).  The problem with this is that to fix 35956, all content encoding was 
made to be ignored.  For the two calls that are interested in downloading what is 
currently viewed, this breaks, or so is reported... (Actually it doesn't really 
break as the page you are seeing is really compressed and mozilla just downloads 
the compressed form.  You can un gzip the file and presto - human readable html, 
but I guess that is not how most users expect save as to work.)

So, in these two cases, it knows something about what it is downloading, so it 
needs to hint the stream transfer code that it should apply content conversion.  
This basically takes the patch in 35956 and pushes the logic down to the 
raw consumer.

Now the problem with this solution, is that if you click on a link which is 
compressed html, and ask it to be saved, you will download the compressed 
gzip'ed form.  Ugh.  

Maybe we should just have a list of human readable file extensions, and check 
those before removing content encoding?.

Gagan, you reviewed the gzip patch.  Do you have any input on this dilemma?
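
A minimal sketch of the hinting idea described in the comment above, written in Python rather than the actual Mozilla C++/JS. The names `transfer` and `apply_conversion` are hypothetical stand-ins and are not the real nsStreamTransfer or nsIHttpChannel API; the only point is that the caller, which knows whether it is saving the viewed page or a linked file, decides whether the gzip content encoding is undone before writing to disk.

```python
import gzip
from typing import Optional


def transfer(body: bytes, content_encoding: Optional[str], apply_conversion: bool) -> bytes:
    """Return the bytes that would be written to disk (hypothetical helper)."""
    # Only undo the content encoding when the caller asked for it.
    if apply_conversion and content_encoding == "gzip":
        return gzip.decompress(body)
    return body


html = b"<html><body>stock quote</body></html>"
wire = gzip.compress(html)  # what a server sends with Content-Encoding: gzip

# "Save Page As": the caller knows it wants what is being viewed.
saved_page = transfer(wire, "gzip", apply_conversion=True)

# "Save Link As": no hint, so the raw (still compressed) bytes are kept.
saved_link = transfer(wire, "gzip", apply_conversion=False)
```

Under these assumptions `saved_page` is the readable HTML, while `saved_link` is still the compressed form, which is the leftover problem the comment points out for links to compressed HTML.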
Hmmm... All this should only ever be a concern if there is a discrepancy in the 
Content-type and the Content-Encoding. So maybe the right thing to do here is to 
check for a difference and then only for those cases where it's different we'd 
setApplyConversion as needed. 

Makes sense? I can try and take a whack at this if you want doug...
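
One possible reading of the check proposed above, sketched in Python. It assumes "discrepancy" means that the Content-Type does not itself describe a gzip file while the Content-Encoding says gzip. The function name `should_apply_conversion` and the set of media types treated as "already gzip" are illustrative assumptions, not anything taken from the Mozilla source.

```python
from typing import Optional

# Assumption: media types that mean "the file itself is a gzip archive".
GZIP_MEDIA_TYPES = {"application/x-gzip", "application/gzip"}


def should_apply_conversion(content_type: str, content_encoding: Optional[str]) -> bool:
    """Decide whether to decode before saving (hypothetical heuristic)."""
    if not content_encoding:
        return False  # nothing to undo
    encoding = content_encoding.strip().lower()
    # Drop parameters such as "; charset=..." before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # Type and encoding disagree: gzip was only a transfer wrapper.
        return media_type not in GZIP_MEDIA_TYPES
    return False
```

With this heuristic a `text/html` page sent with `Content-Encoding: gzip` would be decoded before saving, while a download whose type is already a gzip media type would be left alone, which is the behavior bug 35956 was after.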
at 90 mph on 280, I realized the same thing.  we really need voice access to
bugzilla!  :-)
Taking back from dougt since he's busy.

Gagan and Doug, can I get content-type and content-encoding out of the channel
while inside SelectFileAndTransferLocationSpec()?

90 mph on 280, eh? Must've been in the slow lane, then. ;-)

Assignee: dougt → pchen
Keywords: nsBranch
nav triage team:

Moving out to mozilla0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4
*** Bug 90693 has been marked as a duplicate of this bug. ***
nav triage team:

Booting to mozilla0.9.5
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Please update summary, component and default qa.
Rather than adding a duplicate bug, I'll just note that this happens on Yahoo
Mail pages too.  This makes it a little more important, because some people
(like me) save e-mailed receipts to their hard drive, and the average person
sees binary crap, not a gzipped file disguised as an HTML file.
*** Bug 86290 has been marked as a duplicate of this bug. ***
With regard to the dup I've just marked (it contains not much new), and with regard
to bug #51852 and content-encoding related issues, I'd like to emphasize that I
still cannot confirm Yahoo's saved pages to be gzipped (.gz), compressed (.Z),
or bzip2-compressed (.bz2):

$ file savedpage
savedpage: data
$ mv savedpage savedpage.gz ; gunzip savedpage.gz
gunzip: savedpage.gz: not in gzip format

However, the same page saved from "View | Page Source" menu and then gzipped -9
is _very_ close in size to the bugged save file. Just a few bytes off (probably
due to dynamically created page content).

A test URL that still exists (any public archive page at Yahoo!Groups would do
it): http://groups.yahoo.com/group/c64rmx/message/2231
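
For anyone wanting to repeat the check above without `file` or `gunzip`, here is a rough Python sketch that looks only at the leading magic bytes of the three formats mentioned. It is a simplification of what `file` does, and `sniff_compression` is a made-up helper name.

```python
import bz2
import gzip

# Leading magic bytes of the three formats named in the comment above.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),      # .gz
    (b"\x1f\x9d", "compress"),  # .Z
    (b"BZh", "bzip2"),          # .bz2
]


def sniff_compression(data: bytes) -> str:
    """Return the format whose magic bytes start the data, else 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


# Known-good samples to compare against a saved page.
gz_sample = gzip.compress(b"<html></html>")
bz_sample = bz2.compress(b"<html></html>")
```

A saved page for which this returns "unknown" would be consistent with the `file` output of `data` shown above; the sketch does not say what the bytes actually are.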
more dups? bug 99448, bug 94596, bug 99951
*** Bug 99951 has been marked as a duplicate of this bug. ***
**** is not a useful technical term.
Summary: Binary crap is saved on Yahoo News page → saving Yahoo News page results in gzip encoded page

*** This bug has been marked as a duplicate of 51852 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
this happens with php3 pages also as in my bug filed 99951.
this is also a problem with .jsp files on www.storagereview.com threads.
-> XPAPPS
VERIFIED:
problem descriptions similar.
Status: RESOLVED → VERIFIED
Component: Networking: File → File Handling
Product: Core → Core Graveyard