Closed Bug 32407 Opened 25 years ago Closed 25 years ago

large files corrupted on download

Categories

(Core Graveyard :: Tracking, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: omerk, Assigned: law)

References

Details

(Whiteboard: [PDT+] fix in hand)

Any ZIP file saved w. Mozilla reports a bad CRC when tested with Winzip. Tried 2 skinz at Chromezone and a few other zip d/l's-- files were ok with Nav 4.7 and IE, corrupt with Moz.
valeski?
Assignee: chofmann → valeski
Bill, maybe we're missing the initial or final OnData*() data for writing? I tested (mozilla nightly build) this with an HTTP download of a zip file and couldn't download the zip.
Assignee: valeski → law
Keywords: beta1
CRC error also mentioned in # 32307 and # 32490
I can't spot any pattern here. I've downloaded .exe files (via http) OK. I've downloaded .zip files (via ftp) OK. I've downloaded .zip files via ftp: and have had it fail. The file lengths always look OK. I compared M14 downloads from mozilla.org, one using mozilla (as of this morning), one using 4.x. The files differ at offset 0x10000 which looks like a good spot for somebody's code to break down. I'll look further.
I applied the patch below to test whether the data is getting corrupted coming in vs. going out. I dump the raw data to a file. When I compare this file to the resulting .zip that the download produces, they do not match. When I compare the test file to what communicator downloaded, they match. So the data is getting corrupted on output. My guess is either a bug in the file transport logic or in nsIFile. I'm cc:ing Doug Turner because he might know more about the latter. The code I resurrected last week to fix 31804 was code older than nsIFile. Is that code not correct for use with nsIFile underlying nsIFileTransport? I'm guessing here (I don't even know for sure that file transport uses nsIFile). Anyway, here is the patch. You'll have to change the DEBUG_law string and the hard-coded file name to make it work for you. Index: nsStreamXferOp.cpp =================================================================== RCS file: /cvsroot/mozilla/xpfe/components/xfer/src/nsStreamXferOp.cpp,v retrieving revision 1.24 diff -r1.24 nsStreamXferOp.cpp 320a321,328 > #ifdef DEBUG_law > // Write raw data to file for test. > static FILE *test = 0; > if ( !test ) { > test = fopen( "c:\\WinNT\\Profiles\\law\\Desktop\\test", "wb" ); > } > fwrite( buffer, 1, bytesRead, test ); > #endif
Status: UNCONFIRMED → NEW
Ever confirmed: true
Do we want to apply this patch for beta 1 at this point?
Priority: P3 → P1
Target Milestone: M15
Putting on PDT- radar for beta1.
Whiteboard: [PDT-]
Changing summary to state more explicitly that *all* large files (larger than 64K, apparently) are corrupted when downloaded. Please re-evaluate PDT status. Doug and I (and Warren) are looking for a fix.
Summary: .zip files save with bad CRC → large files corrupted on download
Whiteboard: [PDT-]
*** Bug 32490 has been marked as a duplicate of this bug. ***
*** Bug 32575 has been marked as a duplicate of this bug. ***
I built a file with known content so we could look for a pattern in how the data was being scrambled. You can get that file at ftp://law.mcom.com/d/download/bytes.zip. I generated the file with this program: #include <stdio.h> void main() { FILE *f = fopen( "bytes.zip", "wb" ); unsigned int p = 0; // Fill 64K plus a few bytes. while ( p < 0x11000 ) { // Write the data 4 times. fwrite( (char*)&p, 1, 4, f ); fwrite( (char*)&p, 1, 4, f ); fwrite( (char*)&p, 1, 4, f ); fwrite( (char*)&p, 1, 4, f ); // Advance pointer. p += 0x10; } } First funny thing is that I can download this file OK via ftp. Via http: it gets scrambled, though. So this means the bug is likely not a straightforward buffering algorithm thing. Looking at the scrambled output, things are broken at offset 0x10000 which happens to be the buffer (segment) size. Instead of 00000100, I see 0030FF00. If we shift this left 1 byte (so it aligns), then we're off by 0xD0 bytes. This is confirmed by examining the data at offset 0x100D0, which is 00000001 (if we shift that left 1 byte, we get the 00000100 we expected at offset 0x10000). I'm going to try the same thing with the beta1 branch version of Mozilla and will report the results in a moment.
I get the same output downloading (via http:, ftp: works OK) using a Mozilla trunk build as I do using a beta1 branch build. You can access this test file via http: at http://law.mcom.com/bytes.zip.
I spoke with Phil, and he indicated that the PDT did not understand this bug when they marked it as PDT- (I was out of the room). I'm upping it to PDT+, and would REALLY like both info on the problem, as well as the potential for a solution. We need a lot of focus on this RSN, as this seems like a VERY dangerous bug :-( Thanks, Jim
Whiteboard: [PDT+]
I added this code to nsBufferedStreams.cpp to catch when/how my "bytes.zip" file was getting corrupted. When I hit this breakpoint, it was called by way of nsStreamXferOp::OnStopRequest->nsOutputFileStream::Close->nsBufferedStreams::Flu sh (I don't know exactly what was in the middle, though). Basically, the http channel got to the end, called OnStopRequest. That triggered the output stream close, which of course flushed the file. The problem is that (apparently; I'm verifying) the OnStopRequest was occurring on the wrong thread (*not* the UI thread). There are still OnDataAvailable notifications queued up on the UI thread. The close/flush that occurs on the second thread resets things like mCursor, which swallows data that a pending OnDataAvailable has just written (or is just about to write). Anyway, that's the theory. I'm going to investigate this issue of what thread OnStopRequest is called on more closely. If it isn't the UI thread (regardless of whether it should be), I'm going to try proxying it over to the UI thread inside my OnStopRequest code. That way, we have a fix that doesn't affect any code aside from the file download code. Of course, any of you should feel free to jump in. Bill x2296 Index: nsBufferedStreams.cpp =================================================================== RCS file: /cvsroot/mozilla/netwerk/base/src/nsBufferedStreams.cpp,v retrieving revision 1.5 diff -r1.5 nsBufferedStreams.cpp 300a301,306 > #ifdef DEBUG_law > // Ensure data at file offset matches the offset. > // This means you can't download anything but bytes.zip! > PRUint32 *p = (PRUint32*)mBuffer; > NS_ASSERTION( p && *p == mBufferStartOffset, "Buffer corrupted!" ); > #endif 313c319 < nsCRT::memcpy(mBuffer, mBuffer + amt, rem); --- > nsCRT::memmove(mBuffer, mBuffer + amt, rem); 314a321,324 > #ifdef DEBUG_law > // force notification if we hit this code! > NS_ENSURE_TRUE( rem == 0, NS_ERROR_FAILURE ); > #endif
Adding myself to cc list. (sorry about spam)
Added some folks to the CC list, as this seems to be the most perilous bug facing beta 1 at this point. In fact... it is currently the *only* PDT+ bug that requires a checkin!!!
OK, it turns out I was mistaken about the OnStopRequest; that seems to be a red herring. If I increase my test file to 2*64K bytes, then the corrupted buffer appears on an OnDataAvailable (*it* forces the flush at that point). I'm going to add some code in nsBufferedStreams::Write() to try to catch where things are going wrong.
also, you can reduce NS_OUTPUT_STREAM_BUFFER_SIZE to get this assert to hit more regular.
It does not look like |buf| is being incremented after a flush in nsBufferedOutputStream:Write()... Could this be it? still looking...
I think I found the fix. Please apply and verify: Index: nsBufferedStreams.cpp =================================================================== RCS file: /cvsroot/mozilla/netwerk/base/src/nsBufferedStreams.cpp,v retrieving revision 1.5 diff -r1.5 nsBufferedStreams.cpp 282c282 < nsCRT::memcpy(mBuffer + mCursor, buf, amt); --- > nsCRT::memcpy(mBuffer + mCursor, buf + written, amt);
You are going to have to remove law's debug patch 2000-03-20 21:01.
I think that's it, Doug. I independently arrived at the same fix. I can't believe I looked at this for so long and didn't see it. The nasty thing was that the error didn't occur when the bogus data was being written, it occurred when the previous segment was flushed. When my assertion was hit, the problem was way in the past. I'm going to check in that fix on the trunk. We'll let the PDT decide how to deal with the branch tomorrow.
Updated status whiteboard.
Whiteboard: [PDT+] → [PDT+] Investigation in progress ...
The investigation is done, I think. Doug and I found a fix that tests out. I ran checkin tests with no problem and checked the code into the trunk last night. I'm just not sure what's the deal with beta1.
updating whiteboard.
Whiteboard: [PDT+] Investigation in progress ... → [PDT+] fix in hand
Changing Platform/OS to 'All' --- this occurs with files downloaded on Mac OS & Linux beta branch builds, as well.
OS: Windows NT → All
Hardware: PC → All
Fix checked in on beta1 branch. This better work now, or else.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Updating QA Contact.
QA Contact: leger → sairuh
verified fixed w/ today's beta1 branch bits on linux, winNT and mac (opt comm, 2000.03.22.06-nb1b).
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.