Closed Bug 193267 Opened 22 years ago Closed 22 years ago

Code cleanup for nsSocketTransport2.cpp

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.3final

People

(Reporter: wtc, Assigned: darin.moz)

References

Details

Attachments

(2 files, 4 obsolete files)

I'm going to propose some cleanup of the code in nsSocketTransport2.cpp around PR_Poll. This should target Mozilla 1.4 alpha so that it gets enough testing on various platforms before Mozilla 1.4 final. The benefit is more accurate error codes if PR_POLL_ERR, PR_POLL_EXCEPT, or PR_POLL_HUP is reported by PR_Poll. It also removes a workaround for an invalid NSPR bug (Necko was using the out_flags fields after a PR_Poll timeout).
Attached patch Proposed patch (obsolete) — Splinter Review
Note that this if statement if ((mPollFlags & PR_POLL_WRITE) && (pollFlags & ~PR_POLL_READ)) { means "if we polled for write and PR_Poll reported at least one event that indicates we can write some data or the write will fail (the flags include PR_POLL_WRITE, PR_POLL_EXCEPT, PR_POLL_ERR, PR_POLL_HUP, and PR_POLL_NVAL but not PR_POLL_READ)".
Comment on attachment 114389 [details] [diff] [review] Same patch, with whitespace changes ignored for easier code review sr=darin.. very clever :)
Attachment #114389 - Flags: superreview+
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.4alpha
Attachment #114389 - Flags: review?(bzbarsky)
Attached patch Alternative patch (obsolete) — Splinter Review
You may find this patch more self-documenting. I explicitly list the PR_POLL_XXX flags that represent errors.
You can actually handle PR_POLL_NVAL as a special case, but I don't know what is the NS_ error code for "bad descriptor (EBADF)" or "invalid handle (ERROR_INVALID_HANDLE)".
yeah, that's probably better. turns out there is a bug on linux (probably XP_UNIX) where the check for PR_POLL_ERR above PR_ConnectContinue was causing bugs. apparently, if a server is not listening on a designated port, PR_Poll will set PR_POLL_ERR (in fact the out_flags are set to decimal 40)! as a result, we end up mapping the connection failure to NS_ERROR_NET_RESET and give the wrong error message to the user. see for example bug 193092. because of this, i'd really like to try to get this patch in for mozilla 1.3 final. i've tested it out and it works well.
Blocks: 193092
Severity: normal → major
Flags: blocking1.3?
Priority: -- → P2
Target Milestone: mozilla1.4alpha → mozilla1.3final
Blocks: 192696
Attachment #114389 - Flags: superreview+
Attachment #114389 - Flags: review?(bzbarsky)
Attachment #114511 - Flags: superreview+
Attachment #114511 - Flags: review?(dougt)
Comment on attachment 114511 [details] [diff] [review] Alternative patch, with whitespace changes ignored for easier code review nit i would have move the #define up to the top of the file and commented on it.
Attachment #114511 - Flags: review?(dougt) → review+
Attachment #114511 - Flags: approval1.3?
doug: yeah, but i think in this case the #define is very localized to this function. if the #define is not near the function, then the casual reader might go looking for PR_POLL_ERRORS in prio.h or one of the other NSPR headers. keeping it local to the function helps make the code more self-documenting (IMO), so i'm inclined to just leave it as is :-/
Comment on attachment 114511 [details] [diff] [review] Alternative patch, with whitespace changes ignored for easier code review We can turn the PR_POLL_ERRORS macro into a const PRInt16 local variable. You may want to rename it to follow your variable naming convention.
yeah, that should do nicely ;-)
Blocks: 193120
actually, i suspect this patch would cause a 100% CPU utilization bug if a server drops a socket connection while necko is not interested in reading or writing on the socket. we'd continuously call PR_Poll with PR_POLL_EXCEPT set, which would return immediately with some kind of error flag set. this would continue until necko's idle timer fires (15 second intervals), at which time necko would check its idle sockets to see if any have become disconnected (using a call to nsSocketTransport::IsAlive). i think we should avoid polling for only PR_POLL_EXCEPT. going to run some tests to verify my assertion...
Attachment #114511 - Attachment is obsolete: true
Attachment #114511 - Flags: approval1.3?
Attached patch revised patchSplinter Review
ok, i discovered a few subtle bugs while testing out this patch. not only do we have the potential problem i described previously, but i also stumbled on a bug in nsSocketTransportService::RemoveFromIdleList, which probably explains the PR_Poll crash reported as bug 193756.
Attachment #114388 - Attachment is obsolete: true
Attachment #114389 - Attachment is obsolete: true
Attachment #114509 - Attachment is obsolete: true
Blocks: 193756
Attachment #114759 - Flags: superreview?(bzbarsky)
Attachment #114759 - Flags: review?(dougt)
Attachment #114759 - Flags: review?(dougt) → review+
Comment on attachment 114759 [details] [diff] [review] revised patch (w/ whitespace changes stripped) >Index: nsSocketTransport2.cpp >-nsSocketTransport::OnSocketReady(PRFileDesc *fd, PRInt16 pollFlags) >+nsSocketTransport::OnSocketReady(PRFileDesc *fd, PRInt16 outFlags) maybe call those "socketFlags"? In any case, update the function def in the header so the name there matches this one. sr=bzbarsky with that addressed. >- memcpy(&mActiveList[index], &mActiveList[mIdleCount - 1], sizeof(SocketContext)); >+ memcpy(&mIdleList[index], &mIdleList[mIdleCount - 1], sizeof(SocketContext)); You know, I spent a lot of time sorting through that code to figure out what it was doing.... can't believe I missed that. :(
Attachment #114759 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 114759 [details] [diff] [review] revised patch (w/ whitespace changes stripped) requesting drivers approval for 1.3 final. this patch fixes a couple critical bugs.
Attachment #114759 - Flags: approval1.3?
bz: i changed the argument to be named |outFlags| because it corresponds to PRPollDesc::out_flags, which is set by PR_Poll after it returns. thanks for noticing that the header file is inconsistent btw!
Comment on attachment 114759 [details] [diff] [review] revised patch (w/ whitespace changes stripped) a=asa (on behalf of drivers) for checkin to 1.3.
Attachment #114759 - Flags: approval1.3? → approval1.3+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
No longer blocks: 192696
Flags: blocking1.3?
*** Bug 194625 has been marked as a duplicate of this bug. ***
VERIFIED: the proxy error (bug 193092) reported this is now working. darin: re #11, is the scenario describe: 1- Server sends a FIN (abortive release) and we try to read from a connection that is now in CLOSE_WAIT 2- Server disappeared, leaving the connection half open? I've begun integrating these situations into the TCP unit test suite.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: