Open Bug 301479 Opened 19 years ago Updated 2 years ago

can't tell when connect job is finished on Windows

Categories

(NSPR :: NSPR, defect)

x86
Windows XP
defect

Tracking

(Not tracked)

People

(Reporter: kamil, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

This bug was observed on Windows. It has been some time since I exercised this
code, so a caveat: I'm not sure whether it was against the 4.4.1 source. But a
brief reading of the 4.4.1 source indicates that it should still be a problem.

PR_QueueJob_Connect does not return consistent error values, and it's not clear
how to use it. Where it should return PR_IN_PROGRESS_ERROR to the implementation
underneath (see nsprpub/pr/src/misc/prtpool.c), it seems to always return
PR_IO_TIMEOUT_ERROR. If you ignore that return code and call PR_Connect with a
PR_INTERVAL_NO_TIMEOUT interval, then, if the connection can be established
quickly (e.g. to a local host), you get back PR_IS_CONNECTED_ERROR (already
connected). Otherwise (if your peer is not listening, for example) you get
PR_IO_TIMEOUT_ERROR when you pass PR_INTERVAL_NO_TIMEOUT. About the only way
this seems to work is to tie up an engine thread waiting for the connect to
return once you are in the handleConnectJob call.

Underneath, on NT/XP, PR_QueueJob_Connect operates by doing a PR_Connect with a
timeout of PR_INTERVAL_NO_WAIT and then posting completion on the job. On
Windows, PR_Connect is implemented by calling the Winsock connect function. But
there is a twist: the timeout is implemented by putting the socket into
non-blocking mode, then calling select with the timeout. If the timeout is 0
(PR_INTERVAL_NO_WAIT) and the connect hasn't finished yet, select returns the
number of ready descriptors (0), and this results in NSPR returning
PR_IO_TIMEOUT_ERROR. The socket is then taken out of non-blocking mode, and the
error is popped back to the thread pool. The queue-job function gets a timeout
error and the job completes immediately.
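To make the failure mode concrete, here is a small model of the timeout emulation described above, assuming (as the report states) that a timed connect is implemented as a non-blocking connect followed by select with the requested timeout. The enum and function names are hypothetical, not NSPR's, and the handshake duration is a plain input rather than a real network measurement.

```c
/* Hypothetical outcome codes standing in for the results named in the report:
   success, PR_IO_TIMEOUT_ERROR, and a hard connect failure. */
typedef enum { EMU_CONNECTED, EMU_IO_TIMEOUT, EMU_FAILED } emu_result;

/* Models "non-blocking connect + select(timeout)".
   handshake_ms: how long the TCP handshake takes to finish (hypothetical input).
   handshake_ok: 1 if the handshake ends in success, 0 if it ends in failure.
   timeout_ms:   timeout given to the emulated connect; 0 models NO_WAIT. */
emu_result emulated_connect(int handshake_ms, int handshake_ok, int timeout_ms)
{
    /* select() reports the socket only if the handshake finishes in time. */
    int nready = (handshake_ms <= timeout_ms) ? 1 : 0;

    if (nready == 0)
        return EMU_IO_TIMEOUT;  /* reported as a timeout error, although the
                                   connect is still in progress */
    return handshake_ok ? EMU_CONNECTED : EMU_FAILED;
}
```

With timeout_ms set to 0, every handshake that has not already finished collapses into EMU_IO_TIMEOUT, whether it will eventually succeed or fail, which is the indeterminate state the report complains about.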

At this point the completion status of the connect is indeterminate. You can
call PR_Connect again to try to find out, but now there is a race. If you call
it before the socket has actually connected, you must handle
PR_ALREADY_INITIATED_ERROR, and if it has connected, you get back
PR_IS_CONNECTED_ERROR. This is because (quoting from the Visual C++ 6.0 help):

"Until the connection attempt completes on a nonblocking socket, all subsequent
calls to connect on the same socket will fail with the error code WSAEALREADY,
and WSAEISCONN when the connection completes successfully. Due to ambiguities in
version 1.1 of the Windows Sockets specification, error codes returned from
connect while a connection is already pending may vary among implementations. As
a result, it is not recommended that applications use multiple calls to connect
to detect connection completion. If they do, they must be prepared to handle
WSAEINVAL and WSAEWOULDBLOCK error values the same way that they handle
WSAEALREADY, to assure robust execution."

WSAEINVAL maps to PR_ALREADY_INITIATED_ERROR. So how do you figure out when
your PR_Connect has completed? Or, for that matter, what its result was? What
happens if it failed and you miss the return code?

Reproducible: Always

Steps to Reproduce:
Create code that uses PR_QueueJob_Connect, and try to use it with a variety of
hosts, some that exist and some that don't.
Actual Results:  
Sometimes the socket was connected, sometimes it was not. There was never a way
to tell what the resulting error was.

Expected Results:  
The connect job should complete only when connect returns a definitive status.
QA Contact: wtchang → nspr
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Severity: normal → S3

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody
Status: ASSIGNED → NEW