downloaded pdf files are corrupted with additional html headers

RESOLVED FIXED

Status

Product: Core
Component: Networking: HTTP
Priority: --
Severity: major
Status: RESOLVED FIXED
Reported: 14 years ago
Modified: 12 years ago

People

(Reporter: Menno Deij, Assigned: Darin Fisher)

Tracking

Version: Trunk
Platform: x86
OS: All
Points: ---
Bug Flags: blocking-aviary1.0 -

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(3 attachments)

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

When I download pdf-files from websites of professional journals by
right-clicking and selecting "Save Link As..." I _sometimes_ get corrupted
pdf-files as a result. They appear to have some sort of HTML-headers before the
pdf-data starts and also interspersed in the pdf-data. These headers look like
this, I get like 30 or so before the file starts:

HTTP/1.1 200 OK
Server: Netscape-Enterprise/4.1
Date: Wed, 30 Jul 2003 06:11:32 GMT
Set-Cookie: ERIGHTS=-7085262077103818777
Accept-ranges: bytes
Content-type: application/pdf
Content-length: 341000

This prevents me from opening the pdf files. 

This problem is both in Mozilla and in FireBird. When I disable the download
manager and the progress indicator in Mozilla, the files get saved ok. 

I can reproduce the problem every time with the same pdf file, but it does not
appear with all downloadable pdf files.

A problem for reproduction of this bug could be that you may have to have
electronic access to these journals. However many journals provide a sample
issue, for instance by following the following link:

http://pubs3.acs.org/acs/journals/toc.page?incoden=achre4&indecade=0&involume=36&inissue=1

This is a sample issue of Accounts of Chemical Research, and it should be freely
accessible. Downloading one of the articles there using the "save link as..."
with downloadmanager or progress indicator gives the html headers in the file.

Reproducible: Always

Steps to Reproduce:
1. Open link
http://pubs3.acs.org/acs/journals/toc.page?incoden=achre4&indecade=0&involume=36&inissue=1
2. Save an article in the list
3. In a shell do "less <saved-article-name.pdf>" and see the HTML headers.

Actual Results:  
Unable to open the saved pdf with acroread. Gives an error about the file format
(which is not surprising given that there are HTML headers in the file).

Expected Results:  
The pdf file is saved without the HTML headers in it.
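
The check in step 3 can also be scripted. This is a minimal Python sketch and not part of the original report: it assumes the stray blocks always begin with an `HTTP/1.x 200 OK` status line like the one quoted above, and the function names are hypothetical.

```python
import re

# Assumption: every stray block starts with a status line like the one
# quoted in the report ("HTTP/1.1 200 OK").
STATUS_LINE = re.compile(rb"HTTP/1\.[01] 200 OK\r?\n")


def find_embedded_status_lines(data):
    """Return the byte offset of every HTTP status line found in data."""
    return [m.start() for m in STATUS_LINE.finditer(data)]


def looks_like_clean_pdf(data):
    """True if data starts with the %PDF- marker and has no status lines."""
    return data.startswith(b"%PDF-") and not find_embedded_status_lines(data)
```

To check a saved file, read it in binary mode and pass the bytes in, for example `find_embedded_status_lines(open(path, "rb").read())`. A non-empty result means the file contains at least one embedded header block.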

Comment 1

14 years ago
I was able to reproduce the problem with Moz 1.5a on WinXP.  I don't have the
download manager enabled though, and it doesn't happen every time.  Not sure
what component this should be in.  Might also be a dupe of bug 184019.
(Reporter)

Comment 2

14 years ago
I tried the second link (first one is down) in bug 184019 but it downloads the
pdf correctly. So, maybe it's not a dupe...

Comment 3

14 years ago
I noticed that the server sent 66 200 OK messages to Mozilla's HEAD request. It
seems that Mozilla is adding most of the 200 OK messages to the top of the
downloaded file.  This seems similar to bug 208173 and bug 212654 except they
deal with the server actually sending data with the HEAD request.
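
A HEAD request should be answered by a single header block with no body, so counting the blocks in a capture makes the surplus visible. This is a rough Python sketch under stated assumptions: the input is the raw bytes received on the connection (for instance extracted from a tcpdump log), it contains only header blocks separated by blank lines, and `split_header_blocks` is a hypothetical helper, not anything from Mozilla's code.

```python
def split_header_blocks(raw):
    """Split back-to-back header-only responses into one dict per response."""
    responses = []
    # Normalize line endings, then split on the blank line ending each block.
    for block in raw.replace(b"\r\n", b"\n").split(b"\n\n"):
        lines = [ln for ln in block.split(b"\n") if ln]
        if not lines or not lines[0].startswith(b"HTTP/"):
            continue  # skip anything that is not a header block
        headers = {}
        for ln in lines[1:]:
            name, sep, value = ln.partition(b":")
            if sep:
                key = name.strip().lower().decode("latin-1")
                headers[key] = value.strip().decode("latin-1")
        responses.append({"status": lines[0].decode("latin-1"),
                          "headers": headers})
    return responses
```

For a well-behaved server, `len(split_header_blocks(raw))` would be 1 for one HEAD request. Anything above that is surplus that a client has to discard, not pass on as data.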

Comment 4

14 years ago
I'm seeing this with linux trunk 20030729, without download manager.
nsHttp:5 / tcpdump logs to follow

marking NEW
=> Http
Assignee: blake → darin
Status: UNCONFIRMED → NEW
Component: Download Manager → Networking: HTTP
Ever confirmed: true
QA Contact: petersen → httpqa

Comment 5

14 years ago
Created attachment 128907 [details]
tcpdump log

this is a log with the bug showing up.  I couldn't get the bug to not happen
while I was capturing packets...

Comment 6

14 years ago
Created attachment 128908 [details]
bzipped nsHttp:5 log

this is a http log loading the page and downloading the pdf three times.  the
first two times worked fine (no extra http headers) and the bug showed up on
the third try.
(Assignee)

Comment 7

14 years ago
andrew: can you upload or send me a copy of the corrupted PDF file corresponding
to the HTTP log you just uploaded?  i'd like to be able to match up byte offsets
between the log file and the PDF file.  thanks!

Comment 8

14 years ago
Created attachment 128939 [details]
corrupt pdf + tcpdump log + http log

this is a corrupted pdf (pg 59-65 in the Accounts of Chemical Research) along
with its own http and tcpdump log

Updated

14 years ago
OS: Linux → All
Summary: when using dl manager, downloaded pdf files are corrupted with additional html headers → downloaded pdf files are corrupted with additional html headers

Comment 9

14 years ago
I did some testing and I believe that the number of inserted headers is
proportional to the size of the PDF (5.3K or so = 1 header). Also it seems that
if you wait long enough at the 'Save As' window before clicking the 'Save'
button the download will not be corrupted. Finally, every time the last but one
header had a Content-length value that was smaller than the actual filesize. 
I used Gecko/20040515 and the PDF files from
http://pubs.acs.org/journals/joceah/index.html (see bug 245716).
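
The proportionality estimate can be written out as a quick calculation. This is only a sketch: it assumes "5.3K" means roughly 5300 bytes, and it reuses the Content-length of 341000 quoted in the description purely as an example size. The function name is hypothetical.

```python
def estimate_inserted_headers(pdf_size_bytes, bytes_per_header=5300.0):
    """Estimate the header count at about one header per 5.3K of PDF data.

    bytes_per_header is an assumed reading of "5.3K or so".
    """
    return round(pdf_size_bytes / bytes_per_header)


# Example size taken from the Content-length in the description.
print(estimate_inserted_headers(341000))  # prints 64
```

That is in the same range as the 66 responses counted in comment 3, although the thread does not say whether the two observations refer to the same file.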

Comment 10

14 years ago
*** Bug 245716 has been marked as a duplicate of this bug. ***

Comment 11

13 years ago
This is a serious and very frustrating bug: I can't open some important pdf
files after downloading them, and I have to redownload them with IE.
I updated firefox to a nightly build and the problem still exists.
My firefox version is:
Mozilla/5.0 (Windows; U; Windows NT 5.0; rv:1.7.3) Gecko/20041001 Firefox/0.10.1
I didn't encounter this problem in firefox versions 0.7 through 0.9. Could an
extension be causing it? I will try a clean installation of firefox, and if that
doesn't work I will wait for a fix.
Flags: blocking-aviary1.0?

Comment 12

13 years ago
*** Bug 264389 has been marked as a duplicate of this bug. ***

Comment 13

13 years ago
this server bug was probably worked around by Bug 160454's checkin (mozilla
only). can someone retest using a current nightly build (from the trunk)?
Depends on: 160454
Flags: blocking-aviary1.0? → blocking-aviary1.0-

Comment 14

13 years ago
I can see the bug in:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a5) Gecko/20041007
The file picker gives the pdf file a .htm extension and All Files as filetype.

But I can't see the bug in:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a5) Gecko/20041018
Now the file picker gives a .pdf extension and as file type Adobe Acrobat Document.

But the downloaded file is not a pdf file. In order to download any pdf file
from that site, I need to log in (but I can't, because you need to be a member).

Comment 15

13 years ago
> Now the file picker gives a .pdf extension and as file type Adobe Acrobat
> Document.

this is not what this bug is about.

Comment 16

13 years ago
with linux release 1.7.3, 6/9 PDFs at the URL exhibited this problem
with linux trunk 2004101806, all 9 were OK.

resolving as FIXED-by-bug 160454
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED