Closed Bug 85514 Opened 19 years ago Closed 19 years ago

downloading files on Mac sometimes fails [hangs] midway

Categories

(NSPR :: NSPR, defect, P2, critical)

PowerPC
Mac System 9.x
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: bugzilla, Assigned: sfraser_bugs)

Details

(Keywords: platform-parity)

Attachments

(2 files)

couldn't find an existing bug, but pls do dup as needed.

summary: when i do an ftp download of a file on the mac, the download sometimes
fails [download progress meter hangs] midway.

to repro:
1. go to an ftp site, such as the one at sweetlou above [go into a build folder,
then].
2. single-click a file --i have been choosing files that are at least 10 Mbytes
in size. this should bring up the downloading/helper app dialog.
3. in the downloading dialog, make sure the "save to disk" option is selected,
then click OK.
4. in the resulting file picker ["enter name of file to save to..."], choose a
location --fwiw, here i don't save on the desktop; rather, i've been selecting a
subfolder that's located on the non-startup disk.
5. click Save button, and the download progress ["saving file"] dialog will
appear.
6. wait for the download to complete.

result: about half to three-quarters of the way through, the progress meter in
the progress dialog stops moving. after about 3-5 min, i give up, cancel the
download, and retry. roughly 2 out of 3 attempts end in this transfer
failure.

tested using 2001.06.07.11-branch comm bits on Mac OS 9.0x G3.

grace et al., have you seen or heard of this?
Might be a dup of bug 71204, probably related to bug 53463
hmmm...i read bug 71204, so yeah it's possible this is a dup. fwiw [in contrast
to pchen's 2001-03-07 11:58 comments], i've been using installer bits when i
encounter this, so i don't think that matters.

also, i haven't really seen failures of this sort on win32 or linux, at least
not for some time --so adding 'pp'.

need to see if this is limited to ftp transfers...or if it also happens with
http downloads...
oops, really adding those kw's...
i have seen this recently (the 6/5/01 trunk bits), trying to d/l a 15+MB file. 
Severity: major → critical
Snippet from a protocol log:

3[d852398]: nsSocketReadRequest: [this=dfd1d58] inside OnRead.
3[d852398]: nsSocketReadRequest: [this=dfd1d58] calling listener [offset=8528440, count=8192]
3[d852398]: nsSocketIS: PR_Read(count=8192) returned 2920
3[d852398]: nsSocketIS: PR_Read(count=5272) returned -1
3[d852398]: nsSocketIS: PR_Read() failed with PR_WOULD_BLOCK_ERROR
3[d852398]: nsSocketReadRequest: listener returned [rv=0]
3[d852398]: nsSocketReadRequest: [this=dfd1d58] read 2920 bytes [offset=8531360]
3[d852398]: nsSocketTransport: doReadWrite [readstatus=80470007 writestatus=0 readsuspend=0 writesuspend=0 mSelectFlags=5]
3[d852398]: nsSocketTransport: Leaving Process() [host=208.12.36.227:23593 this=e1e9f5c], mStatus = 80470007, CurrentState=5, mSelectFlags=5

3[d852398]: nsSocketTransport: Entering Process() [host=208.12.36.227:23593 this=e1e9f5c], aSelectFlags=1, CurrentState=5.
3[d852398]: nsSocketTransport: Transport [host=208.12.36.227:23593 this=e1e9f5c] is in WaitReadWrite state [readtype=1 writetype=0 status=80470007].
3[d852398]: nsSocketTransport: doReadWrite [this=e1e9f5c, aSelectFlags=1, mReadRequest=dfd1d58, mWriteRequest=0

This sounds similar to bug 70408
This is a Mac NSPR bug.

To explain the protocol log stuff I pasted above (PR_Read returning
PR_WOULD_BLOCK_ERROR) -- darin says that PR_Read can be called repeatedly
(without intervening PR_Poll calls) until the necko buffer is full, until
PR_Read returns 0 (to indicate EOF), or until it fails with
PR_WOULD_BLOCK_ERROR. It happens that on Mac, we can only detect an EOF by
virtue of receiving an orderly release request from the server, and this can
happen some time after we've read all available data from the stream. So there
is a time window in which a second read (after the first has read all available
data) will return PR_WOULD_BLOCK_ERROR, because OTRcv gives us a kOTNoDataErr,
but we have not yet received the orderly release request.

This explains why protocol logs can look different between platforms, and why 
seeing PR_WOULD_BLOCK_ERROR in Mac logs is benign.
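
A minimal sketch of the read loop described above, as a toy Python model rather than actual NSPR or necko code. FakeSocket, fill_buffer and the WOULD_BLOCK constant are hypothetical names invented for illustration; the only things taken from the log are the byte counts (8192 requested, 2920 available).

```python
# Toy model only: none of these names exist in NSPR or necko.
WOULD_BLOCK = -1  # stand-in for PR_Read failing with PR_WOULD_BLOCK_ERROR


class FakeSocket:
    """Hypothetical non-blocking socket with a fixed amount of arrived data."""

    def __init__(self, arrived, peer_closed=False):
        self.arrived = arrived          # bytes already delivered by the network
        self.peer_closed = peer_closed  # True once the orderly release was seen

    def read(self, count):
        if self.arrived > 0:
            n = min(count, self.arrived)
            self.arrived -= n
            return n
        # No data left: EOF only if the orderly release has already arrived.
        return 0 if self.peer_closed else WOULD_BLOCK


def fill_buffer(sock, buffer_size):
    """Read until the buffer is full, EOF is seen, or the socket would block."""
    total = 0
    while total < buffer_size:
        n = sock.read(buffer_size - total)
        if n == 0:
            return total, "eof"
        if n == WOULD_BLOCK:
            return total, "would_block"
        total += n
    return total, "full"


# Mirrors the log: 2920 bytes available, then the second read would block.
result_open = fill_buffer(FakeSocket(arrived=2920), 8192)
# Same data, but the release arrived first, so the second read reports EOF.
result_closed = fill_buffer(FakeSocket(arrived=2920, peer_closed=True), 8192)
print(result_open, result_closed)
```

In this model the only difference between the two runs is whether the orderly release has been seen yet, which is the timing window described above.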

So the real problem in this bug is a race condition in Mac NSPR, I think. Some 
instrumentation shows that we stall when the OT notifier fires while we're inside 
of SendReceiveStream(). I think we're clobbering the value of me->io_pending, 
which needs to be protected by a lock.
Assignee: dougt → gordon
Wrapping the code in _PR_INTSOFF/PR_Lock and PR_Unlock/_PR_FAST_INTSON did not
help. Looking some more, what I believe is happening is this:

  We're in SendReceiveStream(), calling OTRcv(). While we are in OTRcv(), the
  notifier strikes. The OTRcv() returns kOTNoDataErr, so we set 
  fd->secret->md.readReady to FALSE (thus clobbering the value that the notifier
  put in there).

Messing with interrupts and locks therefore doesn't help.
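
The suspected sequence can be replayed deterministically in a toy Python model. This is a sketch under assumptions, not the Mac NSPR code: the Endpoint class, its method names and the `interrupt` hook are hypothetical, and the hook simply forces the notifier to run at the moment described above (after the receive call has decided there is no data, before the caller clears the flag).

```python
K_OT_NO_DATA = -1  # stand-in for kOTNoDataErr


class Endpoint:
    """Hypothetical model of one socket's state; not real NSPR structures."""

    def __init__(self):
        self.pending = 0         # bytes delivered but not yet read
        self.read_ready = False  # models fd->secret->md.readReady

    def notifier(self, nbytes):
        """Models the OT notifier: data arrived, mark the fd readable."""
        self.pending += nbytes
        self.read_ready = True

    def ot_rcv(self, interrupt=None):
        """Models OTRcv(); `interrupt` fires after the result is decided."""
        if self.pending:
            result, self.pending = self.pending, 0
        else:
            result = K_OT_NO_DATA
        if interrupt:
            interrupt()  # the notifier strikes while we are still "inside"
        return result

    def send_receive_stream(self, interrupt=None):
        result = self.ot_rcv(interrupt)
        if result == K_OT_NO_DATA:
            # Clobbers the TRUE that the notifier stored a moment ago.
            self.read_ready = False
        return result


ep = Endpoint()
ep.send_receive_stream(interrupt=lambda: ep.notifier(1460))

# Data is waiting, but the flag says otherwise, so nothing polls it again.
stalled = ep.pending > 0 and not ep.read_ready
print("stalled:", stalled)
```

Note that a lock taken only by the reading code would not change the outcome in this model, because the notifier's write lands between two steps of the same call.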
Using OTEnterNotifier/OTLeaveNotifier in SendReceiveStream() fixes this. These
calls prevent the notifier from firing while we are in the read/write loop, and
so prevent clobbering of fd->secret->md.readReady, fd->secret->md.writeReady
and me->io_pending.

I have *not* tested the blocking version of the code. We may also need a similar 
fix in SendReceiveDgram() (what uses this?).
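
How deferring the notifier avoids the clobbering can be sketched with a toy Python model. Everything here is an assumption made for illustration: the Endpoint class and its methods are hypothetical, enter_notifier/leave_notifier only imitate the effect attributed to OTEnterNotifier/OTLeaveNotifier above (notifications held back, then delivered on leave), and this is not the checked-in patch.

```python
K_OT_NO_DATA = -1  # stand-in for kOTNoDataErr


class Endpoint:
    """Hypothetical model of one socket's state; not real NSPR structures."""

    def __init__(self):
        self.pending = 0          # bytes delivered but not yet read
        self.read_ready = False   # models fd->secret->md.readReady
        self.in_notifier = False  # True between enter_notifier/leave_notifier
        self.deferred = 0         # notifications held back while entered

    def deliver(self, nbytes):
        """Network event: data arrives; the notification may be deferred."""
        self.pending += nbytes
        if self.in_notifier:
            self.deferred += 1
        else:
            self.notifier()

    def notifier(self):
        self.read_ready = True

    def enter_notifier(self):
        self.in_notifier = True

    def leave_notifier(self):
        self.in_notifier = False
        while self.deferred:      # run everything that was held back
            self.deferred -= 1
            self.notifier()

    def ot_rcv(self, interrupt=None):
        """Models OTRcv(); `interrupt` fires after the result is decided."""
        if self.pending:
            result, self.pending = self.pending, 0
        else:
            result = K_OT_NO_DATA
        if interrupt:
            interrupt()
        return result

    def send_receive_stream(self, interrupt=None):
        self.enter_notifier()
        try:
            result = self.ot_rcv(interrupt)
            if result == K_OT_NO_DATA:
                self.read_ready = False
        finally:
            self.leave_notifier()  # deferred notifier runs after the clear
        return result


ep = Endpoint()
ep.send_receive_stream(interrupt=lambda: ep.deliver(1460))
print("pending:", ep.pending, "read_ready:", ep.read_ready)
```

In this model the notifier's write always lands after the read path has finished updating the flag, so data that arrived mid-call still leaves the endpoint marked readable.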
r=wtc.  By the way, please restore the original while (...) {
indentation style.  I would not worry about the blocking version
of the code as long as your patch is a strict improvement over
the current code.
over to Simon. Thanks.
Assignee: gordon → sfraser
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.2
0.9.2
I tested https, FTP, IMAP, and the page load tests. Everything checks out. Since 
I can check in to NSPR with just an r=, I'm ready to go.
Priority: -- → P2
*** Bug 84826 has been marked as a duplicate of this bug. ***
Wrong.  You don't need an sr=, but you need an a=.

Remember to check in the same fix on the trunk of NSPR.
Who has to give a=? An NSPR module owner, or drivers?
drivers@mozilla.org.  Treat the NSPRPUB_CLIENT_BRANCH as if
it were the trunk of Mozilla client.

As for the trunk of NSPR, there is no sr= or a= requirement.
a=blizzard on behalf of drivers for the trunk
Blocks: 83989
Checked into NSPRPUB_CLIENT_BRANCH, and the NSPR tip.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
*** Bug 76899 has been marked as a duplicate of this bug. ***
verified:  mac os9 7/24/01 branch and trunk
Status: RESOLVED → VERIFIED
Component: Networking: FTP → NSPR
Product: Browser → NSPR
Target Milestone: mozilla0.9.2 → ---