Open Bug 282496 Opened 21 years ago Updated 3 years ago

Inconsistent cross platform behavior of PR_Shutdown

Categories

(NSPR :: NSPR, defect)

x86
All
defect

Tracking

(Not tracked)

People

(Reporter: nelson, Unassigned)

Details

A colleague has reported that when one thread is waiting on a read (e.g. PR_Read or PR_Recv) on an NSPR socket, and another thread calls PR_Shutdown on that socket, shutting it down for read, the effect of the shutdown on the reading thread is not consistent across NSPR platforms. It is reported that on some platforms, this causes the PR_Recv/PR_Read to terminate, as if EOF had been received at that moment. On others (including Solaris), the PR_Read/PR_Recv stays blocked. This is an issue for JSS, which needs a consistent cross-platform way of causing blocked receiving threads to be unblocked and receive EOF. I believe that NSPR socket behavior should be consistent across platforms with respect to this issue, even if the underlying OSes are not. I consider this reported lack of consistency to be a bug.
Blocks: 282732
It may be hard to make the behavior consistent across platforms. Most applications don't need it, and it may be expensive to implement.
In the current NSPR Unix implementation, functions like PR_Read and PR_Recv block in poll() for at most 5 seconds. We can set a flag in the PRFileDesc structure in PR_Shutdown, and have PR_Read and PR_Recv check that flag when they time out from poll. This solution is inexpensive. But I will need one of you to implement it.
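A minimal sketch of the flag-check idea in the comment above, written against plain POSIX poll() rather than NSPR internals. The names demo_fd, demo_shutdown_rcv, demo_read, and the slice_ms parameter are hypothetical and exist only for illustration; an actual change would live in NSPR's own PRFileDesc data and Unix I/O code, which are not reproduced here. The sketch also checks the flag before the first poll, which goes slightly beyond the "check on timeout" wording.

```c
#include <errno.h>
#include <poll.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for PRFileDesc: an OS descriptor plus the proposed flag. */
typedef struct demo_fd {
    int osfd;
    atomic_int shutdown_rcv; /* set to 1 once the read side has been shut down */
} demo_fd;

/* Hypothetical counterpart of PR_Shutdown(fd, PR_SHUTDOWN_RCV): only records the flag. */
void demo_shutdown_rcv(demo_fd *fd) {
    atomic_store(&fd->shutdown_rcv, 1);
}

/*
 * Blocking read that waits in poll() for at most slice_ms at a time.
 * Returns the byte count, 0 for EOF (including a flagged shutdown), or -1 on error.
 */
ssize_t demo_read(demo_fd *fd, void *buf, size_t len, int slice_ms) {
    for (;;) {
        struct pollfd pfd = { .fd = fd->osfd, .events = POLLIN };
        int n;

        /* Checked on entry and again after every poll timeout. */
        if (atomic_load(&fd->shutdown_rcv)) {
            return 0; /* report EOF to the blocked reader */
        }
        n = poll(&pfd, 1, slice_ms);
        if (n > 0) {
            return read(fd->osfd, buf, len);
        }
        if (n < 0 && errno != EINTR) {
            return -1;
        }
        /* n == 0 (timeout) or EINTR: loop and re-check the flag. */
    }
}
```

With this shape, a reader blocked in demo_read notices the shutdown within one poll slice (5 seconds in the NSPR case described above), regardless of what the underlying OS does on shutdown.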
Thanks, Wan-Teh. Assigning the bug to myself to implement.
Assignee: wtchang → glen.beasley
On Windows, when an SSLSocket is blocked on read, execution is blocked in ntio.c in the function _PR_MD_WAIT on rv = WaitForSingleObject(thread->md.blocked_sema, msecs); in this case msecs is set to INFINITE. Calling _PR_MD_SHUTDOWN with how == PR_SHUTDOWN_RCV does not release the wait on the semaphore, but calling _PR_MD_SHUTDOWN with how == PR_SHUTDOWN_BOTH does release it. The goal for bug 282732 is for Socket.close to interrupt threads blocked in I/O, which can be accomplished by: if (ioRead || ioWrite) { shutdownNativeLow(SocketBase.PR_SHUTDOWN_BOTH); } So this bug no longer blocks 282732. JSS does have the methods shutdownInput() and shutdownOutput(), and I will open a bug to discuss the proper behavior for these two methods. Note that in all versions of the JDK, shutdownInput and shutdownOutput are not supported. Test output: java.lang.UnsupportedOperationException: The method shutdownInput() is not supported in SSLSocket
Glen, Can you fix JSS bug 282732 without any NSPR changes? What's this shutdownNativeLow function you referred to? I can't find it in JSS. Did you really mean shutdownNative? Does JSS use real NSPR sockets, or Java sockets wrapped in PRFileDesc? It seems that JSS can use both: http://lxr.mozilla.org/security/source/security/jss/org/mozilla/jss/ssl/common.c#142 If Java sockets are used, they should already have the desired "close" semantics, right?
I can fix bug 282732 without an NSPR fix. If we want to implement the methods shutdownInput and shutdownOutput, then an NSPR fix would be needed, but this is low priority since all versions of the JDK do not support shutdownInput and shutdownOutput. The JDK only supports shutdown of both read and write I/O on a call to close, which is the behaviour 282732 will implement when I complete the fix. shutdownNativeLow is a new function that I will be adding in the fix for 282732. I am just cleaning up 282732 and will attach the fix soon.
> Does JSS use real NSPR sockets, or Java sockets wrapped
> in PRFileDesc? It seems that JSS can use both: http://lxr.mozilla.org/security/source/security/jss/org/mozilla/jss/ssl/common.c#142
I believe 99% of the time users of JSS use NSPR sockets. I have yet to see any code using Java sockets. SSLServerSocket.accept() creates an NSPR JSS SSLSocket, so all SSLServerSocket usage is NSPR sockets. Only clients using SSLSocket can create a Java socket, and only one constructor allows such creation: public SSLSocket(java.net.Socket s, String host, SSLCertificateApprovalCallback certApprovalCallback, SSLClientCertificateSelectionCallback clientCertSelectionCallback) throws IOException
javasock.c handles Java sockets and has its own PRIOMethods. We have no sample programs or QA programs using this constructor. I almost think we should add a comment to this constructor warning that usage of SSLSocket with Java sockets is not well tested or used.
No longer blocks: 282732
Glen, in reply to comment 4,
> but calling _PR_MD_SHUTDOWN with how == PR_SHUTDOWN_BOTH the
> wait on the semaphore is released.
IIRC, we determined that what is actually happening is that the shutdown of both directions causes a TCP FIN to be sent to the peer, and then (in the test setup) the peer closes the connection, which in turn causes the local socket to see an EOF, and it is this EOF that causes the threads to become unblocked. IIRC, we determined that if the remote process blocks for (say) 30 seconds before closing the socket when it receives the FIN, then the local socket also does not become unblocked for 30 seconds. Am I recalling correctly? If so, then IMO we need a solution that does not depend on the remote system closing the socket first.
Reply to comment 7: It is the case that with PR_SHUTDOWN_BOTH or PR_SHUTDOWN_SEND an EOF is sent. See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/socket_2.asp: "If the how parameter is SD_SEND, subsequent calls to the send function are disallowed. For TCP sockets, a FIN will be sent"
On Unix platforms, PR_Shutdown(fd, PR_SHUTDOWN_RCV); will unblock the reader, but on Windows the reader stays blocked. On Windows we need to use PR_Interrupt. PR_Interrupt does unblock the reader, but the NSPR threading model is corrupted and the program will crash 90 percent of the time: 50 percent of the time when the program does a PR_Close of the socket, 40 percent of the time at exit of my test program, and 10 percent it does crash.
I created NSPR bug 288232 for the PR_Interrupt issue. I changed the OS to Windows NT for this bug.
OS: Solaris → Windows NT
(In reply to comment #8)
> On windows we need to use PR_Interrupt. PR_Interrupt does unblock the reader but
> the NSPR threading model is corrupted and the program will crash 90 percent of
> the time. 50 percent of the time when the program does a PR_Close of the
> socket, 40 percent of the time at exit of my test program, and 10 percent it
> does crash.
I meant 10 percent does not crash.
> I created nspr bug 288232 for the PR_Interrupt issue.
> I changed the OS to Windows NT for this bug.
QA Contact: wtchang → nspr
This bug was created while working on bug 282732, which is solved. Closing this bug as WONTFIX.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX
This bug was originally reported against Solaris, then was shown to also be inconsistent with NSPR on Windows NT. While JSS no longer requires that this bug be fixed, this bug remains a true platform inconsistency across supported NSPR platforms. A fix for this bug has been outlined (see comment 2), and I see no reason for us to refuse to fix it. Re: comment 8, this bug is about the effect of PR_Shutdown on the receiving side of the socket, not on the sending side of the socket. It concerns the effect of shutdown on outstanding reads, not writes. Re: comment 9, the "corrupted threading model" allegation was disproven in bug 288232. If we were to choose to refuse to fix it, then we need to document it in a public web page of known deficiencies of NSPR's cross-platform consistency.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
I think the next step for this bug is to produce a simple C test program that can reproduce the problem. The program should become part of our cross-platform NSPR test suite.
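A rough sketch of what such a probe could look like, written at the POSIX socket layer rather than against NSPR. It is an assumption-laden illustration, not the proposed NSPR test: the names probe, reader_main, sleep_ms, and shutdown_rd_unblocks_reader are hypothetical, a Unix-domain socketpair stands in for a TCP connection, and an NSPR version would call PR_Recv and PR_Shutdown(fd, PR_SHUTDOWN_RCV) instead of recv and shutdown. The result is platform dependent, which is the point of the bug.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical shared state between the test driver and the reader thread. */
typedef struct probe {
    int fd;          /* socket the reader blocks on */
    atomic_int done; /* set to 1 when recv() has returned */
    ssize_t result;  /* value returned by recv() */
} probe;

static void *reader_main(void *arg) {
    probe *p = arg;
    char buf[16];
    p->result = recv(p->fd, buf, sizeof buf, 0); /* blocks: no data is ever sent */
    atomic_store(&p->done, 1);
    return NULL;
}

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Returns 1 if shutdown(SHUT_RD) released the blocked reader within wait_ms,
 * 0 if the reader was still blocked, or -1 if the setup failed.
 */
int shutdown_rd_unblocks_reader(int wait_ms) {
    int sv[2];
    int unblocked = 0;
    int waited;
    pthread_t tid;
    probe p;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    p.fd = sv[0];
    p.result = -1;
    atomic_init(&p.done, 0);
    if (pthread_create(&tid, NULL, reader_main, &p) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    sleep_ms(100);            /* give the reader time to block in recv() */
    shutdown(sv[0], SHUT_RD); /* the operation under test */
    for (waited = 0; waited < wait_ms; waited += 10) {
        if (atomic_load(&p.done)) {
            unblocked = 1;
            break;
        }
        sleep_ms(10);
    }
    close(sv[1]);             /* peer EOF releases a reader that is still blocked */
    pthread_join(tid, NULL);
    close(sv[0]);
    return unblocked;
}
```

A driver would call shutdown_rd_unblocks_reader(500) on each platform and compare the results. Note that the peer is closed only after the measurement, so the outcome does not depend on the remote side closing first.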
Assignee: glen.beasley → nobody
Status: REOPENED → NEW
Assignee: nobody → julien.pierre.boogz
Priority: -- → P2
Hardware: Sun → PC
There is some confusion about which OSes are affected by this bug. It includes BOTH Solaris and Windows, and possibly others. There is now evidence that bug 282732 was not actually fixed, and I continue to believe that this bug is part of the cause.
OS: Windows NT → All
Taking.
Assignee: julien.pierre.boogz → nelson

The bug assignee didn't login in Bugzilla in the last 7 months and this bug has priority 'P2'.
:KaiE, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: nelson → nobody
Flags: needinfo?(kaie)

JSS issues aren't a priority

Flags: needinfo?(kaie)
Priority: P2 → --
Severity: normal → S3