Closed
Bug 70408
Opened 24 years ago
Closed 24 years ago
Mac regularly fails to connect to server (PR_Poll fix)
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
VERIFIED
FIXED
Future
People
(Reporter: jrgmorrison, Assigned: sfraser_bugs)
References
()
Details
(Keywords: perf)
Attachments
(3 files)
11.27 KB,
text/plain
|
Details | |
16.65 KB,
text/plain
|
Details | |
660 bytes,
patch
|
Details | Diff | Splinter Review |
Overview Description:
The Mac, for some time now, has occasionally thrown up an alert
"The operation timed out when attempting to contact jrgm.mcom.com"
while running the page loading test.
However, the server is fully reachable (e.g., another Linux or Windows box
on the same subnet running the same test at the same time will never throw
up this alert).
Since builds of Feb 22nd, this has gone from occasional to much more common
(visiting 200 URLs will cause this error to occur 2 or 3 times). Note that
Feb 22nd is the day after dougt's big channel changes.
The good news is that the Mac is on average running this test more quickly
(about 10% faster). Perhaps these two facts (the speedup and the increase in
how often this error occurs) are related.
Steps to Reproduce:
1) go to http://jrgm.mcom.com/page-loader/loader.pl
2) take the default values on that form and hit submit
3) wait until it happens
Actual Results: says it can't connect when server is A-OK
Expected Results: should connect
Reproducibility: between 1 in 100, and one in 70 running this test
Unfortunately, waiting until the alert is spun up may be too late to
reconstruct why the connection failed (but it's a start).
Note: the pages are loaded by doing JS Location.href = "aURL", which is
similar but not identical to the code path for clicking on a link or hitting
the back button.
This is a minor nuisance when running the page loading test (must be watched),
but I expect that real users are occasionally getting bogus failures, so
it would be good to track down why this is happening.
Reporter | ||
Comment 1•24 years ago
|
||
s/often/regularly/. Anyone have a good adjective for "1 in 70"?
Keywords: perf
Summary: Mac often fails to connect to server, running page loading test → Mac regularly fails to connect to server, running page loading test
Updated•24 years ago
|
Target Milestone: --- → Future
Assignee | ||
Comment 2•24 years ago
|
||
This is enough of a show-stopper for me to warrant more attention, because it
often prevents me from getting page load performance data.
Assignee | ||
Comment 4•24 years ago
|
||
Assignee | ||
Comment 5•24 years ago
|
||
Assignee | ||
Comment 6•24 years ago
|
||
Here's the chunk of log where things fail:
3[9afdbd8]: nsSocketTransport: Entering Process() [host=jrgm.mcom.com:80 this=
d4b4138], aSelectFlags=1, CurrentState=5.
3[9afdbd8]: nsSocketTransport: Transport [host=jrgm.mcom.com:80 this=d4b4138] is
in WaitReadWrite state [readtype=1 writetype=0 status=80470007].
3[9afdbd8]: nsSocketTransport: doReadWrite [this=d4b4138, aSelectFlags=1,
mReadRequest=d225320, mWriteRequest=0
3[9afdbd8]: nsSocketReadRequest: [this=d225320] inside OnRead.
3[9afdbd8]: nsSocketReadRequest: [this=d225320] calling listener [offset=39420,
count=8192]
3[9afdbd8]: nsSocketIS: PR_Read(count=1540) returned -1
3[9afdbd8]: nsSocketIS: PR_Read() failed with PR_WOULD_BLOCK_ERROR
3[9afdbd8]: nsSocketReadRequest: [this=d225320] read 0 bytes [offset=39420]
3[9afdbd8]: nsSocketTransport: doReadWrite [readstatus=80470007 writestatus=0
readsuspend=0 writesuspend=0 mSelectFlags=5]
3[9afdbd8]: nsSocketTransport: Leaving Process() [host=jrgm.mcom.com:80 this=
d4b4138], mStatus = 80470007, CurrentState=5, mSelectFlags=5
3[9afdbd8]: nsSocketTransport: Entering Process() [host=jrgm.mcom.com:80 this=
d73edf4], aSelectFlags=1, CurrentState=5.
3[9afdbd8]: nsSocketTransport: Transport [host=jrgm.mcom.com:80 this=d73edf4] is
in WaitReadWrite state [readtype=1 writetype=0 status=80470007].
3[9afdbd8]: nsSocketTransport: doReadWrite [this=d73edf4, aSelectFlags=1,
mReadRequest=d400d68, mWriteRequest=0
3[9afdbd8]: nsSocketTransport: [this=d73edf4] completing read request due to
cancelation
3[9afdbd8]: nsSocketTransport: CompleteAsyncRead [this=d73edf4]
3[9afdbd8]: nsSocketTransport: doReadWrite [readstatus=0 writestatus=0
readsuspend=0 writesuspend=0 mSelectFlags=4]
3[9afdbd8]: nsSocketTransport: Transport [host=jrgm.mcom.com:80 this=d73edf4] is
in Done state.
3[9afdbd8]: nsSocketTransport: Leaving Process() [host=jrgm.mcom.com:80 this=
d73edf4], mStatus = 0, CurrentState=4, mSelectFlags=4
3[9afdbd8]: nsSocketTransport: CheckForTimeout() [jrgm.mcom.com:80 d4b4138].
TIMED OUT... Idle interval: 120003 CurrentState=5
3[9afdbd8]: nsSocketTransport: Entering Process() [host=jrgm.mcom.com:80 this=
d4b4138], aSelectFlags=0, CurrentState=7.
3[9afdbd8]: nsSocketTransport: Transport [host=jrgm.mcom.com:80 this=d4b4138] is
in Timeout state.
3[9afdbd8]: nsSocketTransport: Transport [host=jrgm.mcom.com:80 this=d4b4138] is
in Error state.
3[9afdbd8]: nsSocketTransport: Transport [host=jrgm.mcom.com:80 this=d4b4138] is
in Done state.
3[9afdbd8]: nsSocketTransport: CompleteAsyncRead [this=d4b4138]
3[9afdbd8]: nsSocketTransport: Leaving Process() [host=jrgm.mcom.com:80 this=
d4b4138], mStatus = 0, CurrentState=3, mSelectFlags=4
mCurrentState is eSocketState_WaitReadWrite.
Note these lines:
3[9afdbd8]: nsSocketIS: PR_Read(count=1540) returned -1
3[9afdbd8]: nsSocketIS: PR_Read() failed with PR_WOULD_BLOCK_ERROR
these happen with or without the PR_Poll fix. They don't appear in a log from
Linux.
Assignee | ||
Comment 7•24 years ago
|
||
This seems like another NSPR networking issue on Mac. When we've read all the
available data, the next OTRcv (called from PR_Read) can return either
kOTNoDataErr (there is no more data, but we haven't got an T_ORDREL yet) or
kOTOutStateErr (we got the T_ORDREL already). We currently return an error if
OTRcv returns kOTNoDataErr. Maybe we should just return 0.
Assignee | ||
Comment 8•24 years ago
|
||
Returning 0 when we get kOTNoDataErr is bad. Don't do that.
However, I do find that PR_Poll is lying about data availability. We are
reporting that a socket is ready for reading even when we have not verified that
data is available. This happens in GetState(), which is called from _MD_poll. It
returns readReady based on fd->secret->md.readReady. fd->secret->md.readReady is
set because the last read succeeded, and we haven't had a T_ORDREL yet. Maybe we
should do an OTCountDataBytes() like we do in the blocking case?
Assignee | ||
Comment 9•24 years ago
|
||
Hacky "YMMV" patch:
Index: mozilla/nsprpub/pr/src/md/mac/macsockotpt.c
===================================================================
RCS file: /cvsroot/mozilla/nsprpub/pr/src/md/mac/macsockotpt.c,v
retrieving revision 3.15.2.7
diff -b -u -2 -r3.15.2.7 macsockotpt.c
--- macsockotpt.c 2001/02/07 01:02:27 3.15.2.7
+++ macsockotpt.c 2001/03/08 05:43:14
@@ -1659,6 +1659,9 @@
{
OTResult resultOT;
+ size_t availableData;
- *readReady = fd->secret->md.readReady;
+ resultOT = OTCountDataBytes((EndpointRef)fd->secret->md.osfd, &
availableData);
+
+ *readReady = fd->secret->md.readReady && (availableData > 0);
*exceptReady = fd->secret->md.exceptReady;
This patch seems to do very good things to page load times, reducing them by
about 400ms on averate (2700 -> 2300ms or so). Makes sense, because it avoids
extra PR_Reads and handling.
Assignee: darin → gordon
Reporter | ||
Comment 10•24 years ago
|
||
And no more spurious timeouts to boot! :-]
Assignee | ||
Comment 11•24 years ago
|
||
cc beard
Assignee | ||
Updated•24 years ago
|
Summary: Mac regularly fails to connect to server, running page loading test → Mac regularly fails to connect to server (PR_Poll fix)
Comment 12•24 years ago
|
||
Comment 13•24 years ago
|
||
Patrick, what broke? This looks like it is essentially identical to part of the
checkin I made for Simon earlier this evening (around 11pm).
Comment 14•24 years ago
|
||
I tried copying the patch and applying it with MacCVS, and the patch failed. I
assume it might be cause you already included it, but it didn't look like that to
me.
Comment 15•24 years ago
|
||
i couldn't apply the original code as included in the bug (maybe a linefeed
problem copying it out and patch choked).
so this didn't get checked in?
Comment 16•24 years ago
|
||
It did get checked in after all. I just confirmed that.
Assignee | ||
Comment 18•24 years ago
|
||
Fixed.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 19•24 years ago
|
||
I haven't seen the Timed out error when running jrgm's test for some time now
markinh verified
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•