86235 - saving Yahoo News page results in gzip encoded page

Reporter

Description

•

23 years ago

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.1+) Gecko/20010615
BuildID:    2001061309

Reproducible: Always
Steps to Reproduce:
1. Load the above URL (http://finance.yahoo.com/q?s=AOL&d=v1)
2. Press Ctrl-S to save the page.

Actual Results:  The file saved is binary crap.

Expected Results:  The web page is saved in HTML format.

Workaround: Open view source window, then choose "File -> Save As"

Doug Turner (:dougt)

Comment 1

•

23 years ago

Crap, huh?  Darin specializes in stinky problems.

Assignee: dougt → darin

Darin Fisher

Comment 2

•

23 years ago

WORKSFORME as of June 15 (tested on Win2k)

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → WORKSFORME

doctor__j

Reporter

Comment 3

•

23 years ago

OK, second try...

1. Open browser.
2. Set "Character Coding -> Auto-Detect" to Japanese.
3. Go to http://dailynews.yahoo.com/h/nm/20010613/ts/crime_survey_dc_1.html
4. Save the page.
5. Observe the crap saved.

URL: http://finance.yahoo.com/q?s=AOL&d=v1 → http://dailynews.yahoo.com/h/nm/20010...

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Summary: Binary crap is saved on Yahoo Finance stock quote page → Binary crap is saved on Yahoo News page

Darin Fisher

Comment 4

•

23 years ago

OK.. i'm seeing this too with a trunk build dated 6/19 or so.

Status: REOPENED → ASSIGNED

Priority: -- → P3

Target Milestone: --- → mozilla0.9.3

Darin Fisher

Comment 5

•

23 years ago

it looks like File->SaveAs is calling nsIHttpChannel::SetApplyConversion(false)
which is the problem here.  IMO File->SaveAs should save the document in the
format with which it is being viewed.
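
A minimal sketch of the effect described above, assuming a gzip Content-Encoding. The function `save_body` and its `apply_conversion` parameter are hypothetical stand-ins for the Save As path and the channel's conversion flag; this is not Mozilla code.

```python
import gzip


def save_body(wire_bytes: bytes, content_encoding: str, apply_conversion: bool) -> bytes:
    """Return the bytes a hypothetical Save As would write to disk."""
    # With conversion on, a gzip Content-Encoding is undone before saving.
    if apply_conversion and content_encoding.lower() in ("gzip", "x-gzip"):
        return gzip.decompress(wire_bytes)
    # With conversion off, the bytes go to disk exactly as they came off the wire.
    return wire_bytes


html = b"<html><body>quote page</body></html>"
wire = gzip.compress(html)  # what a server sending Content-Encoding: gzip transmits

raw = save_body(wire, "gzip", apply_conversion=False)
decoded = save_body(wire, "gzip", apply_conversion=True)

print(raw[:2] == b"\x1f\x8b")  # saved file starts with the gzip magic bytes
print(decoded == html)         # saved file is the HTML as viewed
```

With conversion off, the saved file is the compressed stream, which is what the reporter sees as binary data.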

Gagan

Comment 6

•

23 years ago

that belongs to law, right? ->law

Assignee: darin → law

Status: ASSIGNED → NEW

Paul Chen

Assignee

Comment 7

•

23 years ago

nav triage team:

Marking nsbeta1+, nsBranch, p2, and reassigning to pchen

Assignee: law → pchen

Keywords: nsbeta1+, nsBranch

Priority: P3 → P2

Paul Chen

Assignee

Comment 8

•

23 years ago

So the lines at:
http://lxr.mozilla.org/seamonkey/source/xpfe/components/xfer/src/nsStreamTransfer.cpp#131

were added to fix bug 39596. So if I remove those lines, we don't get crap when
saving this file. My question is, will that regress 39596 (where we
automatically gunzipped files)? Since there were other parts to the fix for
39596, maybe not, but I'm not the expert. People on the cc-list are. ;-)

Viswanath Ramachandran

Comment 9

•

23 years ago

Um, 39596 does not have any patch attached. Are you sure that's the right bug
number? Also, I only see a comment on line 131 of that file. Could you confirm
that's the right code segment?

Paul Chen

Assignee

Comment 10

•

23 years ago

Ok, I'm dyslexic. That's bug 35956. Also, I meant the lines STARTING at line 131
in nsStreamTransfer.cpp, more specifically lines 131-136. That's what bill
checked in to fix 35956.

Viswanath Ramachandran

Comment 11

•

23 years ago

Since I think this happens for any web server that serves up pages in gzip
encoding (technical term??), we should look hard at getting a fix for the limbo
phase.

Viswanath Ramachandran

Comment 12

•

23 years ago

Darin/Doug/Gagan - could one of you take this one if you don't have any urgent
bugs? Paul has a topcrash which he needs to try and nail soon. Thanks! Vishy

Viswanath Ramachandran

Comment 13

•

23 years ago

-> dougt.

Assignee: pchen → dougt

Doug Turner (:dougt)

Comment 14

•

23 years ago

well, I have a hand-waving fix, if you are interested.  

Basically, there are three callers into |SelectFileAndTransferLocationSpec|:  

/xpfe/communicator/resources/content/contentAreaUtils.js, line 96
/xpfe/communicator/resources/content/nsContextMenu.js, line 704
/xpfe/components/ucth/resources/unknownContent.js, line 120

Two of these calls are related to the downloading of the html page to disk.  The
other is related to downloading of files (e.g. content click -> save this
link as).  The problem with this is that to fix 35956, all content encoding was
made to be ignored.  For the two calls that are interested in downloading what is
currently viewed, this breaks, or so is reported... (Actually it doesn't really
break, as the page you are seeing really is compressed and mozilla just downloads
the compressed form.  You can gunzip the file and presto - human readable html,
but I guess that is not how most users expect save as to work.)

So, in these two cases, it knows something about what it is downloading, so it 
needs to hint the stream transfer code that it should apply content conversion.  
This basically takes the patch in 35956 and pushes the logic down to the 
raw consumer.

Now the problem with this solution is that if you click on a link which is
compressed html, and ask for it to be saved, you will download the compressed
gzip'ed form.  Ugh.

Maybe we should just have a list of human readable file extensions, and check
those before removing content encoding?

Gagan, you reviewed the gzip patch.  Do you have any input on this dilemma?
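
A rough sketch of the extension-list idea, for illustration only. The set `HUMAN_READABLE_EXTENSIONS` and the function `keep_content_decoding` are hypothetical names, and the list contents are an assumption, not something agreed on in this bug.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list; the real set of "human readable" extensions is undecided.
HUMAN_READABLE_EXTENSIONS = {".html", ".htm", ".txt", ".xml", ".css", ".js"}


def keep_content_decoding(url: str) -> bool:
    """True if the URL's path ends in an extension we expect users to read."""
    path = urlsplit(url).path                   # drop query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in HUMAN_READABLE_EXTENSIONS


news = "http://dailynews.yahoo.com/h/nm/20010613/ts/crime_survey_dc_1.html"
quote = "http://finance.yahoo.com/q?s=AOL&d=v1"

print(keep_content_decoding(news))
print(keep_content_decoding(quote))
```

One weakness shows up immediately: the Yahoo Finance URL from the original report has no file extension at all, so an extension check alone would not cover it.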

Gagan

Comment 15

•

23 years ago

Hmmm... All this should only ever be a concern if there is a discrepancy between
the Content-Type and the Content-Encoding. So maybe the right thing to do here is
to check for a difference, and only in those cases where they differ would we
setApplyConversion as needed.
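
One possible reading of this check, as a sketch: if the Content-Type already names the same compressed format as the Content-Encoding (the 35956 case, a .gz download), leave the bytes alone; if they differ (text/html sent gzip-encoded), decode. The table `ENCODING_TO_TYPES` and the function `should_apply_conversion` are hypothetical, and the media types listed are assumptions for illustration.

```python
# Hypothetical table: media types that describe the same format as an encoding token.
ENCODING_TO_TYPES = {
    "gzip": {"application/x-gzip", "application/gzip"},
    "x-gzip": {"application/x-gzip", "application/gzip"},
    "compress": {"application/x-compress"},
    "x-compress": {"application/x-compress"},
}


def should_apply_conversion(content_type: str, content_encoding: str) -> bool:
    """Decode only when the type and the encoding describe different things."""
    encoding = content_encoding.strip().lower()
    if not encoding:
        return False  # no Content-Encoding, nothing to undo
    # Ignore parameters such as "; charset=..." when comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type not in ENCODING_TO_TYPES.get(encoding, set())


print(should_apply_conversion("text/html; charset=iso-8859-1", "gzip"))
print(should_apply_conversion("application/x-gzip", "gzip"))
```

Under this reading the Yahoo pages would be decoded on save, while a gzip archive served with a redundant gzip encoding would be saved untouched.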

Makes sense? I can try and take a whack at this if you want doug...

Doug Turner (:dougt)

Comment 16

•

23 years ago

at 90 mph on 280, I realized the same thing.  we really need voice access to
bugzilla!  :-)

Paul Chen

Assignee

Comment 17

•

23 years ago

Taking back from dougt since he's busy.

Gagan and Doug, can I get content-type and content-encoding out of the channel
while inside SelectFileAndTransferLocationSpec()?

90 mph on 280, eh? Must've been in the slow lane, then. ;-)

Assignee: dougt → pchen

Viswanath Ramachandran

Updated

•

23 years ago

Keywords: nsBranch

Paul Chen

Assignee

Comment 18

•

23 years ago

nav triage team:

Moving out to mozilla0.9.4

Target Milestone: mozilla0.9.3 → mozilla0.9.4

John Taylor

Comment 19

•

23 years ago

*** Bug 90693 has been marked as a duplicate of this bug. ***

Paul Chen

Assignee

Comment 20

•

23 years ago

nav triage team:

Booting to mozilla0.9.5

Target Milestone: mozilla0.9.4 → mozilla0.9.5

benc

Comment 21

•

23 years ago

Please update summary, component and default qa.

Greg Valure

Comment 22

•

23 years ago

Rather than adding a duplicate bug, I'll just note that this happens on Yahoo
Mail pages too.  This makes it a little more important, because some people
(like me) save e-mailed receipts to their hard drive, and the average person
sees binary crap, not a gzipped file disguised as an HTML file.

Michael Schwendt

Comment 23

•

23 years ago

*** Bug 86290 has been marked as a duplicate of this bug. ***

Michael Schwendt

Comment 24

•

23 years ago

With regard to the dup I've just marked (contains not much new), and with regard
to bug #51852 and content-encoding related issues, I'd like to emphasize that I
still cannot confirm Yahoo's saved pages to be gzipped (.gz), compressed (.Z),
or bzipped2 (.bz2):

$ file savedpage
savedpage: data
$ mv savedpage savedpage.gz ; gunzip savedpage.gz
gunzip: savedpage.gz: not in gzip format

However, the same page saved from "View | Page Source" menu and then gzipped -9
is _very_ close in size to the bugged save file. Just a few bytes off (probably
due to dynamically created page content).
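
To narrow down what the saved file actually contains, its leading bytes can be compared against the well-known signatures of the formats mentioned above, plus a zlib stream. This is only a sketch: `sniff_compression` is a hypothetical helper, the zlib branch is an added guess rather than a diagnosis of the Yahoo files, and a header match does not prove the rest of the file is valid.

```python
import bz2
import gzip
import zlib


def sniff_compression(data: bytes) -> str:
    """Guess a compression format from the first bytes of a saved file."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:2] == b"\x1f\x9d":
        return "compress"
    if data[:3] == b"BZh":
        return "bzip2"
    # zlib header: low nibble of byte 0 is 8 (deflate) and the
    # 16-bit big-endian header value is a multiple of 31.
    if len(data) >= 2 and (data[0] & 0x0F) == 8 and ((data[0] << 8) | data[1]) % 31 == 0:
        return "zlib"
    return "unknown"


page = b"<html><body>archive message</body></html>"
for blob in (gzip.compress(page), zlib.compress(page), bz2.compress(page), page):
    print(sniff_compression(blob))
```

Running this over the bugged save file would at least say whether the header matches any of these formats or none of them.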

Michael Schwendt

Comment 25

•

23 years ago

A test URL that still exists (any public archive page at Yahoo!Groups would do
it): http://groups.yahoo.com/group/c64rmx/message/2231

R.K.Aa.

Comment 26

•

23 years ago

more dups? bug 99448, bug 94596, bug 99951

Ashley Bischoff (blog at handcoding.com)

Comment 27

•

23 years ago

*** Bug 99951 has been marked as a duplicate of this bug. ***

timeless

Comment 28

•

23 years ago

**** is not a useful technical term.

Summary: Binary crap is saved on Yahoo News page → saving Yahoo News page results in gzip encoded page

Peter Trudelle

Comment 29

•

23 years ago


*** This bug has been marked as a duplicate of 51852 ***

Status: NEW → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → DUPLICATE

[not reading bugmail]

Comment 30

•

23 years ago

this happens with php3 pages also, as in the bug I filed, 99951.

[not reading bugmail]

Comment 31

•

23 years ago

this is also a problem with .jsp files on www.storagereview.com threads.

benc

Comment 32

•

23 years ago

-> XPAPPS
VERIFIED:
problem descriptions similar.

Status: RESOLVED → VERIFIED

benc

Updated

•

23 years ago

Component: Networking: File → File Handling

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: Core → Core Graveyard