User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

When I download PDF files from websites of professional journals by right-clicking and selecting "Save Link As..." I _sometimes_ get corrupted PDF files as a result. They appear to have some sort of HTTP headers before the PDF data starts and also interspersed in the PDF data. These headers look like this; I get 30 or so before the file starts:

HTTP/1.1 200 OK
Server: Netscape-Enterprise/4.1
Date: Wed, 30 Jul 2003 06:11:32 GMT
Set-Cookie: ERIGHTS=-7085262077103818777
Accept-ranges: bytes
Content-type: application/pdf
Content-length: 341000

This prevents me from opening the PDF files. The problem occurs in both Mozilla and Firebird. When I disable the download manager and the progress indicator in Mozilla, the files get saved OK. I can reproduce the problem every time with the same PDF file, but it does not appear with all downloadable PDF files.

One obstacle to reproducing this bug is that you may need electronic access to these journals. However, many journals provide a sample issue, for instance at the following link:
http://pubs3.acs.org/acs/journals/toc.page?incoden=achre4&indecade=0&involume=36&inissue=1
This is a sample issue of Accounts of Chemical Research, and it should be freely accessible. Downloading one of the articles there using "Save Link As..." with the download manager or progress indicator enabled gives the HTTP headers in the file.

Reproducible: Always

Steps to Reproduce:
1. Open link http://pubs3.acs.org/acs/journals/toc.page?incoden=achre4&indecade=0&involume=36&inissue=1
2. Save an article in the list
3. In a shell do "less <saved-article-name.pdf>" and see the HTTP headers.

Actual Results:
Unable to open the saved PDF with acroread. It gives an error about the file format (which is not surprising given that there are HTTP headers in the file).

Expected Results:
The PDF file is saved without the HTTP headers in it.
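For anyone who wants to check a saved file without paging through it by hand, here is a minimal Python sketch of step 3. It is only a heuristic and not part of the original report: it assumes the stray headers always begin with an HTTP/1.x status line like the one quoted above, and the function names are made up for illustration.

```python
import re
from typing import List

# Matches an HTTP/1.x status line such as b"HTTP/1.1 200 OK".
# Assumption: every stray header block starts with a line of this shape.
STATUS_LINE = re.compile(rb"HTTP/1\.[01] \d{3} [^\r\n]*")


def find_status_lines(data: bytes) -> List[int]:
    """Return the byte offset of every HTTP status line found in data."""
    return [m.start() for m in STATUS_LINE.finditer(data)]


def looks_corrupted(data: bytes) -> bool:
    """Heuristic: a clean PDF starts with b"%PDF-" and has no status lines."""
    return not data.startswith(b"%PDF-") or bool(find_status_lines(data))


# Example use on a saved file (the path is a placeholder):
#   with open("saved-article-name.pdf", "rb") as f:
#       data = f.read()
#   print(looks_corrupted(data), find_status_lines(data)[:5])
```

A PDF could in principle contain these bytes legitimately, so a positive result is a hint to inspect the file, not proof of this bug.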
I was able to reproduce the problem with Moz 1.5a on WinXP. I don't have the download manager enabled though, and it doesn't happen every time. Not sure what component this should be in. Might also be a dupe of bug 184019
I tried the second link (first one is down) in bug 184019 but it downloads the pdf correctly. So, maybe it's not a dupe...
I noticed that the server sent 66 "200 OK" responses to Mozilla's HEAD request. It seems that Mozilla is adding most of those "200 OK" responses to the top of the downloaded file. This seems similar to bug 208173 and bug 212654, except that those deal with the server actually sending data in response to the HEAD request.
I'm seeing this with linux trunk 20030729, without download manager. nsHttp:5 / tcpdump logs to follow.

Marking NEW => Http
Created attachment 128907 [details]
tcpdump log

This is a log with the bug showing up. I couldn't get the bug to not happen while I was capturing packets...
Created attachment 128908 [details]
bzipped nsHttp:5 log

This is an HTTP log of loading the page and downloading the PDF three times. The first two times worked fine (no extra HTTP headers) and the bug showed up on the third try.
andrew: can you upload or send me a copy of the corrupted PDF file corresponding to the HTTP log you just uploaded? i'd like to be able to match up byte offsets between the log file and the PDF file. thanks!
Created attachment 128939 [details]
corrupt pdf + tcpdump log + http log

This is a corrupted PDF (pg 59-65 in the Accounts of Chemical Research) along with its own HTTP and tcpdump logs.
I did some testing and I believe that the number of inserted headers is proportional to the size of the PDF (5.3K or so = 1 header). Also it seems that if you wait long enough at the 'Save As' window before clicking the 'Save' button the download will not be corrupted. Finally, every time the last but one header had a Content-length value that was smaller than the actual filesize. I used Gecko/20040515 and the PDF files from http://pubs.acs.org/journals/joceah/index.html (see bug 245716).
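The measurements described above (number of inserted headers versus file size, and the Content-length values versus the actual size) could be gathered with something like the following Python sketch. It is only an illustration under assumptions: the function name and the returned keys are invented here, and the regular expressions assume headers shaped like the ones quoted in the original report.

```python
import re

# Assumption: each inserted header block starts with an HTTP/1.x status line
# and may carry a Content-length field, as in the headers quoted in this bug.
STATUS_LINE = re.compile(rb"HTTP/1\.[01] \d{3} [^\r\n]*")
CONTENT_LENGTH = re.compile(rb"(?i)content-length:[ \t]*(\d+)")


def header_stats(data: bytes) -> dict:
    """Summarize the stray HTTP headers found in a saved file."""
    count = len(STATUS_LINE.findall(data))
    lengths = [int(m.group(1)) for m in CONTENT_LENGTH.finditer(data)]
    return {
        "headers": count,
        # Content-length values in the order they appear in the file.
        "content_lengths": lengths,
        # Size on disk, which includes the inserted header bytes.
        "file_size": len(data),
        # Average number of file bytes per inserted header, None if clean.
        "bytes_per_header": len(data) / count if count else None,
    }


# Example use (the path is a placeholder):
#   with open("saved-article-name.pdf", "rb") as f:
#       print(header_stats(f.read()))
```

Comparing "bytes_per_header" across several corrupted files would show whether the roughly 5.3K-per-header ratio holds, and "content_lengths" can be checked against "file_size" directly.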
*** Bug 245716 has been marked as a duplicate of this bug. ***
This is a serious and very frustrating bug. I can't open some important PDF files after downloading them, and I have to download them again with IE. I updated Firefox to a nightly build and the problem still exists. My Firefox version is: Mozilla/5.0 (Windows; U; Windows NT 5.0; rv:1.7.3) Gecko/20041001 Firefox/0.10.1 I didn't encounter this problem in Firefox versions 0.7-0.9. Could an extension be causing it? I will try a clean installation of Firefox, and if that doesn't work I will wait for a fix.
*** Bug 264389 has been marked as a duplicate of this bug. ***
this server bug was probably worked around by Bug 160454's checkin (mozilla only). can someone retest using a current nightly build (from the trunk)?
I can see the bug in:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a5) Gecko/20041007
The file picker gives the PDF file a .htm extension and All Files as the file type.

But I can't see the bug in:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a5) Gecko/20041018
Now the file picker gives a .pdf extension and Adobe Acrobat Document as the file type. But the downloaded file is not a PDF file. In order to download any PDF file from that site, I need to log in (but I can't, because you need to be a member).
> Now the file picker gives a .pdf extension and as file type Adobe Acrobat Document. this is not what this bug is about.
With linux release 1.7.3, 6/9 PDFs at the URL exhibited this problem. With linux trunk 2004101806, all 9 were OK.

Resolving as FIXED-by-bug 160454