Closed Bug 276536 Opened 20 years ago Closed 16 years ago

PR_Connect times out even when told not to timeout

Categories

(NSPR :: NSPR, defect, P2)

4.5.1

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nelson, Assigned: glenbeasley)

Details

Attachments

(1 file, 2 obsolete files)

What is the correct behavior of PR_Connect when called with
PR_INTERVAL_NO_TIMEOUT and no system responds to the connection request?

Is it to time out at the underlying OS's timeout interval?
Or is it to NOT time out at all, but (under the hood) keep retrying 
ad infinitum?  

I found no documentation that speaks to this issue.  So I did some tests.

On a blocking socket, call PR_Connect with an IP address of a non-existent
machine, and with PR_INTERVAL_NO_TIMEOUT.  After a time (~20 seconds on 
windows, a few minutes on Solaris), the PR_Connect call will return -1
and the error code will be PR_IO_TIMEOUT_ERROR.  

I wrote two small test programs, one for Solaris and one for Windows, that 
use the native sockets API on each platform, and blocking sockets, and 
called connect to connect to an IP address that I knew was down.  The 
connect calls returned ETIMEDOUT (on Solaris) and WSAETIMEDOUT (on Windows),
so the behavior of PR_Connect clearly reflects the behavior of the underlying
system call (even though NSPR uses non-blocking sockets to emulate blocking ones).  
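
For reference, a native test of this kind might look like the sketch below. It is not the original test program, only a minimal POSIX reconstruction; the helper name blocking_connect_errno is hypothetical. On Windows the equivalent would use Winsock and report WSAETIMEDOUT instead, which is not shown here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: make one blocking connect() to ip:port.
 * Returns 0 on success, otherwise the errno of the failing call.
 * When the peer never answers, the OS eventually gives up and the
 * value returned here is ETIMEDOUT. */
static int blocking_connect_errno(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd;
    int err = 0;

    assert(ip != NULL);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return EINVAL;              /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* The socket is left in its default blocking mode, so this call
     * returns only on success or when the OS reports a failure. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0)
        err = errno;                /* save before close() can change it */

    close(fd);
    return err;
}
```

Calling the helper with the address of a machine that is known to be down should block for the OS connect timeout and then return ETIMEDOUT; how long that takes depends on the platform.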

Clearly, by having the explicit timeout argument to PR_Connect, which can have
the NO_TIMEOUT value, we are setting expectations about behavior.  

I think we should either

a) make the code implement the expected behavior, or 

b) document the behavior of PR_Connect when the caller-specified timeout
interval exceeds the system's timeout interval.  

I think NSPR *could* close the failed underlying system socket (not the 
PRFileDesc), replace it with a new socket, bind it and configure it as 
the previous socket was configured, and retry the connect, and repeat
this indefinitely, or until the user's specified timeout interval had
elapsed, but it does not appear to me that NSPR does that now for any
platform.   
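
The retry idea can be sketched as a loop around single connect attempts. This is only an illustration of the proposal, not NSPR code: connect_until, attempt_fn, clock_fn, NO_TIMEOUT and the fake_net simulation are all hypothetical names, and the network is simulated so that the loop logic can be followed without waiting for real timeouts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NO_TIMEOUT (-1L)   /* hypothetical stand-in for PR_INTERVAL_NO_TIMEOUT */

/* One connect attempt on a fresh socket: 0 on success, else an errno value. */
typedef int (*attempt_fn)(void *ctx);
/* Monotonic clock in milliseconds. */
typedef long (*clock_fn)(void *ctx);

/* Retry while the OS reports ETIMEDOUT, until the caller's own limit
 * (timeout_ms) has elapsed. With NO_TIMEOUT the loop never gives up. */
static int connect_until(attempt_fn attempt, clock_fn now, void *ctx,
                         long timeout_ms)
{
    long start;

    assert(attempt != NULL && now != NULL);
    start = now(ctx);
    for (;;) {
        int err = attempt(ctx);
        if (err != ETIMEDOUT)
            return err;             /* success (0) or a non-timeout failure */
        if (timeout_ms != NO_TIMEOUT && now(ctx) - start >= timeout_ms)
            return ETIMEDOUT;       /* the caller's limit has been reached */
        /* The OS gave up before the caller did: try again. */
    }
}

/* Simulated network: the first failures_left attempts time out, each
 * consuming os_timeout_ms of simulated time; later attempts succeed. */
struct fake_net {
    int  failures_left;
    long clock_ms;
    long os_timeout_ms;
};

static int fake_attempt(void *ctx)
{
    struct fake_net *net = ctx;
    if (net->failures_left > 0) {
        net->failures_left--;
        net->clock_ms += net->os_timeout_ms;
        return ETIMEDOUT;
    }
    return 0;
}

static long fake_now(void *ctx)
{
    return ((struct fake_net *)ctx)->clock_ms;
}
```

Because each attempt blocks for the full OS interval, this loop can overshoot the caller's limit by up to one OS timeout. A real implementation would have to bound the final attempt separately.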

Wan-Teh, what do you think about this?

On Solaris and Windows, when the OS returns ETIMEDOUT on the socket, that
error is NOT fatal to the socket.  That is, after connect returns -1
with ETIMEDOUT, it is possible to immediately call connect again, with 
the same arguments as before, on the same socket, and have it succeed, 
or time out again, as before.  This works with both native connect calls,
and with PR_Connect calls, on both platforms. 

Hi Nelson,

Thank you for pointing this out.  I think we should
document NSPR's current behavior (to timeout at the
underlying OS's timeout interval).

Wan-Teh, I agree.  I recommend this resolution:
Document current behavior in header file, and correct the following pages:
http://www.mozilla.org/projects/nspr/reference/html/prlayer.html#1037680
http://www.mozilla.org/projects/nspr/reference/html/priofnc.html#18727
Priority: -- → P2
Target Milestone: --- → 4.6
QA Contact: wtchang → nspr
Target Milestone: 4.6 → ---
We should also update the following documentation:

https://developer.mozilla.org/en/PR_Connect

PR_Connect blocks until either the connection is successfully established or an error occurs. If the timeout parameter is not PR_INTERVAL_NO_TIMEOUT and the connection setup cannot complete before the time limit, PR_Connect fails with the error code PR_IO_TIMEOUT_ERROR.

http://www.mozilla.org/projects/nspr/reference/html/prlayer.html

The call to PR_Connect in line 87 binds the address the connection and establishes the virtual circuit to the peer. The use of PR_INTERVAL_NO_TIMEOUT as the second parameter to PR_Connect is risky. It indicates that the operation will either succeed or die trying.
Attached patch update on PR_INTERVAL_NO_TIMEOUT (obsolete) — Splinter Review
I can update the documentation with the following additional sentence:

If the timeout parameter is <CODE>PR_INTERVAL_NO_TIMEOUT</CODE> then
the underlying OS's timeout value will be used.
Assignee: wtc → glen.beasley
Attachment #363172 - Flags: review?(wtc)
Attached patch update on PR_INTERVAL_NO_TIMEOUT (obsolete) — Splinter Review
Sorry, I picked the wrong patch. 

I can update the wiki/html documentation with the following additional sentence:

If the timeout parameter is <CODE>PR_INTERVAL_NO_TIMEOUT</CODE> then
the underlying OS's timeout value will be used.
Attachment #363172 - Attachment is obsolete: true
Attachment #363173 - Flags: review?(wtc)
Attachment #363172 - Flags: review?(wtc)
Attachment #363173 - Flags: review?(wtc) → review-
Comment on attachment 363173 [details] [diff] [review]
update on PR_INTERVAL_NO_TIMEOUT

The new comment is not accurate.  It is accurate to say that the function uses the lesser of the user's chosen timeout time or the OS's connect timeout time.
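
The "lesser of the two" rule can be written down as a small sketch. It is not NSPR code: the function name and the millisecond units are assumptions made for illustration, and NO_TIMEOUT_MS is a hypothetical stand-in for PR_INTERVAL_NO_TIMEOUT, modeled here as the largest representable value so that a plain minimum gives the documented result.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sentinel meaning "the caller set no limit". */
#define NO_TIMEOUT_MS ULONG_MAX

/* Returns the time limit that actually applies to a connect attempt:
 * the caller's timeout or the OS connect timeout, whichever is shorter.
 * The OS limit is assumed to be finite. */
static unsigned long effective_connect_timeout_ms(unsigned long user_ms,
                                                  unsigned long os_ms)
{
    assert(os_ms != NO_TIMEOUT_MS);
    return user_ms < os_ms ? user_ms : os_ms;
}
```

With an assumed OS limit of 20000 ms, a caller timeout of 5000 ms yields 5000, a caller timeout of 60000 ms yields 20000, and NO_TIMEOUT_MS also yields 20000. The last case is the one this bug is about.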
Attachment #363173 - Attachment is obsolete: true
Attachment #365958 - Flags: review?
Attachment #365958 - Flags: review? → review?(nelson)
Attachment #365958 - Flags: review?(nelson) → review+
Comment on attachment 365958 [details] [diff] [review]
update to PR_Connect

r=nelson
Checking in include/prinrval.h;
/cvsroot/mozilla/nsprpub/pr/include/prinrval.h,v  <--  prinrval.h
new revision: 3.7; previous revision: 3.6
done
Checking in include/prio.h;
/cvsroot/mozilla/nsprpub/pr/include/prio.h,v  <--  prio.h
new revision: 3.42; previous revision: 3.41
done


http://www.mozilla.org/projects/nspr/reference/html/prlayer.html#Layer091

-to the peer. The use of <TT>PR_INTERVAL_NO_TIMEOUT</TT> as the second parameter
-to <TT>PR_Connect</TT> is risky. It indicates that the operation will either
-succeed or die trying.
+to the peer. This function uses the lesser of the provided timeout time or the OS'es connect timeout time. 
+Meaning if you specify <TT>PR_INTERVAL_NO_TIMEOUT</TT> as the timeout, the OS's connection
+time limit will be used.

Checking in mozilla-org/html/projects/nspr/reference/html/prlayer.html;
/www/mozilla-org/html/projects/nspr/reference/html/prlayer.html,v  <--  prlayer.html
new revision: 1.4; previous revision: 1.3
done

--- mozilla-org/html/projects/nspr/reference/html/priofnc.html
+++ mozilla-org/html/projects/nspr/reference/html/priofnc.html
@@ -3220,9 +3220,8 @@
 <P>
 
http://www.mozilla.org/projects/nspr/reference/html/priofnc.html

 <A NAME="18764"> </A><CODE>PR_Connect</CODE> blocks until either the connection is successfully established or an 
-error occurs. If the timeout parameter is not <CODE>PR_INTERVAL_NO_TIMEOUT</CODE> and the 
-connection setup cannot complete before the time limit, <CODE>PR_Connect</CODE> fails with the 
-error code <CODE>PR_IO_TIMEOUT_ERROR</CODE>. 
+error occurs. The function uses the lesser of the provided timeout time or the OS'es connect timeout time. 
+Meaning if you specify PR_INTERVAL_NO_TIMEOUT as the timeout, the OS's connection time limit will be used. 
 
 <P>

Checking in mozilla-org/html/projects/nspr/reference/html/priofnc.html;
/www/mozilla-org/html/projects/nspr/reference/html/priofnc.html,v  <--  priofnc.html
new revision: 1.13; previous revision: 1.12
done
    


    
Updated wiki https://developer.mozilla.org/En/PR_Connect
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED