Closed
Bug 208843
Opened 21 years ago
Closed 19 years ago
large downloads fail (partial file downloaded)
Categories
(Core Graveyard :: Networking: FTP, defect)
Core Graveyard
Networking: FTP
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bugzilla, Assigned: sfraser_bugs)
References
()
Details
(4 keywords)
Attachments
(2 files)
1.49 KB,
patch
|
dougt
:
review+
wtc
:
review-
|
Details | Diff | Splinter Review |
1.50 KB,
patch
|
dougt
:
review+
wtc
:
review+
darin.moz
:
superreview+
dveditz
:
approval-aviary1.0.8+
dveditz
:
approval1.7.13+
dveditz
:
approval1.8.0.1+
dveditz
:
approval1.8.1+
|
Details | Diff | Splinter Review |
simon pointed me to this bug. he can reproduce it on mozilla and camino (trunk based), and i was able to reproduce it using today's commercial branch (2003.06.09.05-1.4) on OS X 10.2.6. large downloads from external sites fail because only part of the file is downloaded. this differs from bug 198076 because i don't have a problem with Stuffit Expander hanging. 0. this test makes use of an Apple Developer account --iirc, it's free to register. adding 'qawanted' if there's an easier test for this. 1. sign on to Apple's Developer Connection at http://connect.apple.com 2. go to the Download Software section, and click on Developer Tools, which contains big files. 3. attempt to download the OS X Developer Tools (mac binary disk image), which is about 301MB. results: the download progress seems unusually fast. ultimately, i get a Disk Copy Progress error, where the .dmg file fails to mount "due to error 95 (no mountable file systems)." also, back on the browser side i get an alert, "426 Data connection: Broken pipe."
Comment 1•21 years ago
|
||
can this be reproduce on windows or linux?
Reporter | ||
Comment 3•21 years ago
|
||
so far cannot repro on win2k and linux rh8.0, at least using the above test case. i used Winzip as the opening app on win2k and just saved the whole (not partial) file on linux. i'm trying to see if this was a regression, but it looks like this was a problem (looking at OS X builds) since at lease 31 March. :-/
Reporter | ||
Comment 4•21 years ago
|
||
this might've regressed btwn the 2003.01.22.07 (works, but not sure) and 2003.01.23.07 (broken) OS X mach-o comm builds. currently saying "not sure" since the dmg file, while 301.2MB in size on both my desktop and as described in the Apple page, doesn't seem to open up.
Comment 5•21 years ago
|
||
I couldn't reproduce based on the steps provided. At the beginning of the download, I specified the "save to disk" option. Tested under 2003-06-05-05 branch build under 10.2.6.
Reporter | ||
Comment 6•21 years ago
|
||
for OS X, i didn't select "save to disk", i took the default (OS-associated helper app) of opening with Stuffit.
Assignee | ||
Comment 7•21 years ago
|
||
Chunk of socket transport log (the only log available in opt builds) where download fails: 5327760[4f3a80]: nsSocketTransport::SendStatus [this=7c7dca0 status=804b0006] 5327760[4f3a80]: nsSocketInputStream::Read [this=7c7dd14 count=2728] 5327760[4f3a80]: calling PR_Read [count=2728] 5327760[4f3a80]: PR_Read returned [n=1360] 5327760[4f3a80]: nsSocketTransport::SendStatus [this=7c7dca0 status=804b0006] 5327760[4f3a80]: nsSocketInputStream::Read [this=7c7dd14 count=1368] 5327760[4f3a80]: calling PR_Read [count=1368] 5327760[4f3a80]: PR_Read returned [n=-1] 5327760[4f3a80]: nsSocketInputStream::Read [this=7c7dd14 count=2736] 5327760[4f3a80]: calling PR_Read [count=2736] 5327760[4f3a80]: PR_Read returned [n=-1] 5327760[4f3a80]: nsSocketInputStream::AsyncWait [this=7c7dd14] 5327760[4f3a80]: active [1] { handler=7c7dca0 condition=0 pollflags=5 } 5327760[4f3a80]: active [0] { handler=7c806f0 condition=0 pollflags=5 } 5327760[4f3a80]: idle [2] { handler=6e6d230 condition=0 pollflags=0 } 5327760[4f3a80]: idle [1] { handler=6a13b30 condition=0 pollflags=0 } 5327760[4f3a80]: idle [0] { handler=6db7920 condition=0 pollflags=0 } 5327760[4f3a80]: calling PR_Poll [active=2 idle=3] 5327760[4f3a80]: nsSocketTransport::OnSocketReady [this=7c7dca0 outFlags=1] 5327760[4f3a80]: nsSocketInputStream::OnSocketReady [this=7c7dd14 cond=0] 5327760[4f3a80]: nsSocketInputStream::CloseEx [this=7c7dd14 reason=80470002] 5327760[4f3a80]: nsSocketTransport::OnMsgInputClosed [this=7c7dca0 reason=80470002] 5327760[4f3a80]: nsSocketTransportService::DetachSocket [handler=7c7dca0] 5327760[4f3a80]: nsSocketTransport::OnSocketDetached [this=7c7dca0 cond=80470002] 5327760[4f3a80]: nsSocketTransport::RecoverFromError [this=7c7dca0 state=4 cond=80470002] 5327760[4f3a80]: nsSocketInputStream::OnSocketReady [this=7c7dd14 cond=80470002] 5327760[4f3a80]: nsSocketOutputStream::OnSocketReady [this=7c7dd2c cond=80470002] 5327760[4f3a80]: nsSocketTransport: calling PR_Close [this=7c7dca0] The error, 80470002, is NS_BASE_STREAM_CLOSED.
Comment 8•21 years ago
|
||
can you get a FTP log (not nspr log, but some other packet trace)?
Comment 9•21 years ago
|
||
btw, i downloaded the file successfully on windows using 1.4rc1. Final file size: 315,922,432 (size on disk 315,924,480).
Comment 10•21 years ago
|
||
nsSocketInputStream::CloseEx [this=7c7dd14 reason=80470002] is equivalent to nsSocketInputStream::Close [this=7c7dd14] NS_BASE_STREAM_CLOSED doesn't indicate a real failure so to speak.
Comment 11•21 years ago
|
||
coudl it be that you ran out of tmp space. simon says no, what about others?
Assignee | ||
Comment 12•21 years ago
|
||
I don't think temp space was the issue; consecutive downloads would fail in different places.
Comment 13•19 years ago
|
||
is this still a problem with recent builds?
Comment 14•19 years ago
|
||
I still see this problem. Actually, using the same test as mentioned in the description: I tried to download the Xcode 2.1 disk image many times (at least 5), and each time the download would finish (with no error), but the file would be much smaller than expected. After these repeated failures, I was able to download it using Safari on the very first attempt. Here is a snippet from my "nsHTTP" log: ====================================== 50778112[1324c50]: nsHttpTransaction::ProcessData [this=2224a0b0 count=1360] 50778112[1324c50]: nsHttpTransaction::HandleContent [this=2224a0b0 count=1360] 50778112[1324c50]: nsHttpTransaction::HandleContent [this=2224a0b0 count=1360 read=1360 mContentRead=102016712 mContentLength=788343619] 50778112[1324c50]: nsHttpTransaction::OnSocketStatus [this=2224a0b0 status=804b0006 progress=102018419] 50778112[1324c50]: nsHttpTransaction::ProcessData [this=2224a0b0 count=1376] 50778112[1324c50]: nsHttpTransaction::HandleContent [this=2224a0b0 count=1376] 50778112[1324c50]: nsHttpTransaction::HandleContent [this=2224a0b0 count=1376 read=1376 mContentRead=102018088 mContentLength=788343619] -1610551400[1306b30]: nsHttpChannel::OnDataAvailable [this=222ce790 request=22222ee0 offset=102015352 count=2736] -1610551400[1306b30]: sending status notification [this=222ce790 status=804b0006 progress=102018088/788343619] -1610551400[1306b30]: nsHttpChannel::OnStopRequest [this=222ce790 request=22222ee0 status=0] -1610551400[1306b30]: nsHttpChannel::FinalizeCacheEntry [this=222ce790] -1610551400[1306b30]: calling OnStopRequest 50778112[1324c50]: nsHttpConnection::OnSocketReadable [this=1dfa9990] 50778112[1324c50]: nsHttpConnection::CloseTransaction[this=1dfa9990 trans=2224a0b0 reason=80470002] 50778112[1324c50]: nsHttpTransaction::Close [this=2224a0b0 reason=0] 50778112[1324c50]: nsHttpConnectionMgr::ReclaimConnection [conn=1dfa9990] 50778112[1324c50]: nsHttpTransaction::DeleteSelfOnConsumerThread [this=2224a0b0] 50778112[1324c50]: proxying delete to consumer thread... ====================================== As you can see from that, I also get the 80470002 error (as mentioned in comment #7). But Darin says that might be nothing. So, what's the best way to debug this?
Comment 15•19 years ago
|
||
Some other observations: I cannot reproduce this on a Mac Mini 1.42Ghz, whereas it would always fail for me on a dual-processor G5 1.8Ghz (both using DP2 release). I could get it to fail consistently on a debug build on the dual-processor G5, but as soon as I enabled NSPR logging for "nsPipe:5", or set a conditional breakpoint at a certain spot in the debugger, then the download would finish as expected. NSPR logging of "nsHTTP:5" and/or "nsStreamPump:5" still showed the failure; only "nsPipe:5" resulted in sucess. Timing issue, maybe?
Assignee | ||
Comment 16•19 years ago
|
||
Sounds like a nasty race condition in NSPR.
Assignee | ||
Comment 17•19 years ago
|
||
Saw this today downloading a 600Mb file on a fast network, on a dual G5, with Camino 1.0b1.
Assignee | ||
Comment 18•19 years ago
|
||
I was able to reproduce this in about 1 out of 8 tries downloading a 300Mb file over my local network (about about 10MB/sec), using an optimized build. In a debug build, it's much less frequent.
Comment 19•19 years ago
|
||
I also see this problem. Saw it twice in a row downloading a 6GB file with Camino 1.0b1.
Assignee | ||
Comment 20•19 years ago
|
||
By debugging, I tracked this problem to a premature nsPipeInputStream::Close() call, coming out of nsPipeInputStream::Release() when mReaderRefCnt dropped to zero. However, the lifetimes of nsPipeInputStream/nsPipeOutputStream are tied to their owning nsPipe, so it seemed that the refCnt should never drop to 0 while the pipe was alive. And, since nsPipe has threadsafe addref/release, I think we should use threadsafe increment/decrement for the reader/writer counts too.
Assignee: dougt → sfraser_bugs
Status: NEW → ASSIGNED
Attachment #202768 -
Flags: superreview?
Attachment #202768 -
Flags: review?(dougt)
Assignee | ||
Updated•19 years ago
|
Attachment #202768 -
Flags: superreview? → superreview?(darin)
Comment 21•19 years ago
|
||
Comment on attachment 202768 [details] [diff] [review] Patch: use atomic increment/decrement for reader/writer counts nice detective work.
Attachment #202768 -
Flags: review?(dougt) → review+
Comment 22•19 years ago
|
||
Comment on attachment 202768 [details] [diff] [review] Patch: use atomic increment/decrement for reader/writer counts > NS_IMETHODIMP_(nsrefcnt) > nsPipeOutputStream::Release() > { >- if (--mWriterRefCnt == 0) >+ PR_AtomicDecrement((PRInt32 *)&mWriterRefCnt); >+ if (mWriterRefCnt == 0) > Close(); > return mPipe->Release(); > } You can't read mWriterRefCnt naked. Instead, use the return value of PR_AtomicDecrement as you do in nsPipeInputStream::Release. Is nsrefcnt always four bytes? Otherwise the (PRInt32 *) cast is not safe.
Attachment #202768 -
Flags: superreview?(darin) → review-
Assignee | ||
Comment 23•19 years ago
|
||
Fix nsPipeOutputStream::Release() to use return value from PR_AtomicDecrement. Wan-Teh: nsrefcnt is defined like this: #if defined(XP_WIN) && PR_BYTES_PER_LONG == 4 typedef unsigned long nsrefcnt; #else typedef PRUint32 nsrefcnt; #endif so will always be 4 bytes.
Attachment #202777 -
Flags: superreview?(darin)
Attachment #202777 -
Flags: review?(dougt)
Assignee | ||
Updated•19 years ago
|
Attachment #202777 -
Flags: review?(wtchang)
Updated•19 years ago
|
Attachment #202777 -
Flags: review?(wtchang) → review+
Assignee | ||
Comment 24•19 years ago
|
||
I would strongly recommend putting this on the 1.8 branch.
Flags: blocking1.8.1?
Comment 25•19 years ago
|
||
Comment on attachment 202777 [details] [diff] [review] Revised patch >+ nsrefcnt count = PR_AtomicDecrement((PRInt32 *)&mReaderRefCnt); >+ if (count == 0) > Close(); nit: PR_AtomicDecrement returns PRInt32 maybe just do: if (PR_AtomicDecrement((PRInt32 *)&mReaderRefCnt) == 0) Close(); Good find! This code probably assumed that a stream would only be used on a single thread. It forgot about the fact that the stream might be created on a different thread :-( sr=darin
Attachment #202777 -
Flags: superreview?(darin) → superreview+
Assignee | ||
Comment 26•19 years ago
|
||
Checked in on trunk: Checking in nsPipe3.cpp; /cvsroot/mozilla/xpcom/io/nsPipe3.cpp,v <-- nsPipe3.cpp new revision: 1.14; previous revision: 1.13 Seeking approval for branch.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 27•19 years ago
|
||
Comment on attachment 202777 [details] [diff] [review] Revised patch I have some concerns about the AddRef and Release methods of nsPipeInputStream and nsPipeOutputStream. These methods increment or decrement two reference counts. For example, nsPipeInputStream::AddRef increments mReaderRefCnt and the reference count of mPipe. This patch makes all the increments and decrements atomic, but this may not be enough if the two reference counts must be incremented or decremented as a single transaction. I don't have time to study the code to see whether this is necessary. A common example of the need for a transaction is to transfer money between bank accounts. It is not sufficient that the "from" account is debited atomically and the "to" account is credited atomically. You also need the debit and credit to be performed "at the same time", so that the sum of the balances in the two bank accounts stays the same. Back to this patch. If we need to maintain a certain invariant on the two reference counts, then it is not sufficient that they each be incremented or decremented atomically; we need to use a higher-level lock to ensure that the two reference counts stay consistent with each other. Another note: it may be a good idea to do the increments and decrements in opposite orders. That is, we may want to do AddRef in this order: NS_IMETHODIMP_(nsrefcnt) nsPipeInputStream::AddRef(void) { nsrefcnt rv = mPipe->AddRef(); PR_AtomicIncrement((PRInt32*)&mReaderRefCnt); return rv; }
Comment 28•19 years ago
|
||
Which Solaris (sparc or x86) build of Mozilla can I get with this fix ? I have had a similar problem - see bug 248414 . I never had time to provide the trace, but the problem was easily reproducible with the URL I gave, and I could try that again. If this fixes it, I would like to mark the other bug as a dupe of this one.
Updated•19 years ago
|
Attachment #202777 -
Attachment description: Revised patch → Revised patch
Attachment #202777 -
Flags: review?(dougt) → review+
Assignee | ||
Updated•19 years ago
|
Attachment #202777 -
Flags: approval1.8rc2?
Comment 29•19 years ago
|
||
Excellent catch. I was completely unable to download an 800MB file over a fast connection with a dual-CPU machine until I updated to get this patch.
Assignee | ||
Comment 30•19 years ago
|
||
*** Bug 248414 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•19 years ago
|
OS: MacOS X → All
Hardware: Macintosh → All
Assignee | ||
Updated•19 years ago
|
Flags: blocking1.8.1? → blocking1.8.0.1?
Updated•19 years ago
|
Attachment #202777 -
Flags: approval1.8rc2? → approval1.8.0.1?
Comment 31•19 years ago
|
||
Darin - any thoughts on Comment 27?
Comment 32•19 years ago
|
||
WTC: I don't think we need to worry about building a transaction around the two AddRef calls. The second AddRef, which affects the destruction of the object itself, can happen indepedently since the caller of the AddRef method must itself have an owning reference to the object. So, the result of calling AddRef is always to change the reference count of the object to a value of 2 or more (except in the very special case of initial object construction).
Comment 33•19 years ago
|
||
I meant to add that I think we should take this fix for FF 1.5.0.1. It has zero risk and big-time reward on dual-core systems.
Comment 34•19 years ago
|
||
i landed this on a minibranch for camino1.0b2 since we are getting no approval traction
Comment 35•19 years ago
|
||
Comment on attachment 202777 [details] [diff] [review] Revised patch a=dveditz
Attachment #202777 -
Flags: approval1.8.1+
Attachment #202777 -
Flags: approval1.8.0.1?
Attachment #202777 -
Flags: approval1.8.0.1+
Updated•19 years ago
|
Flags: blocking1.8.1+
Flags: blocking1.8.0.1?
Flags: blocking1.8.0.1+
Comment 36•19 years ago
|
||
Checked in to MOZILLA_1_8_BRANCH and MOZILLA_1_8_0_BRANCH.
Assignee | ||
Updated•19 years ago
|
Keywords: fixed1.8,
fixed1.8.0.1
Updated•19 years ago
|
Keywords: fixed1.8 → fixed1.8.1
Comment 38•19 years ago
|
||
This fix doesn't appear to have gone into any official release of Seamonkey, ie. Mozilla 1.7.x . Mozilla 1.8 is still not released, so it can't be used for production. Please backport this to MOZILLA_1_7_BRANCH so that the next official Seamonkey patch release can actually perform downloads on SMP workstations. Thanks.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 39•19 years ago
|
||
FIXED means fixed on the trunk. Flags are used to indicate branch status. You could request blocking1.7.13/14 if you think it should block the next 1.7.x release.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Comment 40•19 years ago
|
||
Julien, I verified that this fix is in SeaMonkey 1.0. (The Mozilla Application Suite has been renamed SeaMonkey, so SeaMonkey 1.0 is what would have been Mozilla 1.8 or 1.8.0.1.)
Comment 41•19 years ago
|
||
Wan-Teh, I was unaware of that change. When I go to http://www.mozilla.org/products and click on mozilla suite, I get prompted to download mozilla 1.7.12 . The 1.7 is also what's being updated in solaris right now and I'm not sure if they will switch to 1.8. So I would like to see this fix on the 1.7 branch too.
Flags: blocking1.7.13?
Comment 42•19 years ago
|
||
Comment on attachment 202777 [details] [diff] [review] Revised patch comment 31 applies to older supported releases, too.
Attachment #202777 -
Flags: approval1.7.13+
Attachment #202777 -
Flags: approval-aviary1.0.8+
Comment 43•19 years ago
|
||
Fix checked into old supported branches
Flags: blocking1.7.13?
Flags: blocking1.7.13+
Flags: blocking-aviary1.0.8+
Keywords: fixed-aviary1.0.8,
fixed1.7.13
Comment 44•18 years ago
|
||
verified on the 1.0.1 branch using Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.13) Gecko/20060302 Firefox/1.0.8. I followed the original reporter's steps and downloaded the same file from the Apple Developer connection and no issues were encountered. Adding keyword. Thanks to Simon for the good debugging of the problem.
Keywords: fixed-aviary1.0.8 → verified-aviary1.0.8
Comment 45•18 years ago
|
||
verified on the 1.7 branch using Mozilla 1.7.13 Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.13) Gecko/20060309 following same steps in Comment 44.
Keywords: fixed1.7.13 → verified1.7.13
Updated•3 months ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•