ssl_Poll() mishandles PR_POLL_READ during handshake

RESOLVED FIXED in 3.3

Status

P3
major
RESOLVED FIXED
18 years ago
18 years ago

People

(Reporter: jgmyers, Assigned: nelson)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

(Reporter)

Description

18 years ago
ssl_Poll() has code to handle the case where it gets a PR_POLL_WRITE request,
yet the connection would block waiting for the handshake to read.  Code to
handle the inverse case, when ssl_Poll() gets a PR_POLL_READ request, yet the
connection would block waiting for the handshake to write, is missing.

Comment 1

18 years ago
Nelson, could you take a look at this?  Thanks.
Assignee: wtc → nelsonb
(Reporter)

Comment 2

18 years ago
This causes a problem for IMAP and NNTP in Mozilla.  The first thing Mozilla 
will do after creating a new connection is do a read poll on it.  This read poll 
will wait forever, as the other side is waiting for us to send the initial SSL 
handshake record.
(Reporter)

Comment 3

18 years ago
Created attachment 23014 [details] [diff] [review]
proposed partial fix (untested)
(Reporter)

Comment 4

18 years ago
Created attachment 23017 [details] [diff] [review]
revised proposed partial fix (untested)
(Assignee)

Comment 5

18 years ago
Ah, I think I understand now.  
You have a classic case of a half-duplex application protocol
wherein the server speaks first.  That is, after the connection
is created, the first side to send data is the server.  In that
case, it is recommended that the client call SSL_ForceHandshake
after the connection is established.  That will start the 
handshake proceess rolling, by sending out the first SSL record,
and thereafter polling on the socket for read will behave as 
expected without any changes to the ssl_poll function.

This is discussed in chapters 1 and 4 of the SSL reference manual
http://www.mozilla.org/projects/security/pki/nss/ref/ssl/
http://www.mozilla.org/projects/security/pki/nss/ref/ssl/sslfnc.html#1133431
(Reporter)

Comment 6

18 years ago
Calling SSL_ForceHandshake() is one workaround, calling PR_Write(sslfd, 0, 0) is 
another.  Nonetheless, the current behavior of PR_Poll() is incorrect.
(Assignee)

Comment 7

18 years ago
I'm not sure that having poll return a value indicating that
the socket is readable when it isn't is correct behavior.
That's what your patch seems to do.
(Reporter)

Comment 8

18 years ago
After such a call to PR_Poll(), a subsequent (nonblocking) call to PR_Read() will 
result in progress.  This directly corresponds to the existing case of 
PR_POLL_WRITE during a handshake--it returns a value indicating the descriptor is 
writeable when a subsequent call to PR_Write() will result in progress, even 
though that write may not be able to accept any data.

Even after this patch, the code still behaves incorrectly after a write to the 
lower layer descriptor returns WOULDBLOCK.  Fortunately, this is not a common 
occurence.

Comment 9

18 years ago
Nelson, if this bug does not need to be fixed in
NSS 3.2, please set target milestone to 3.3.  Thanks.
Target Milestone: --- → 3.2
(Assignee)

Comment 10

18 years ago
John, you wrote:
> After such a call to PR_Poll(), a subsequent (nonblocking) call to 
> PR_Read() will result in progress. 

The key word there is nonblocking.  A blocking call will block.
There are applications that use blocking sockets, but that wish
to avoid blocking on those sockets.  They use PR_Poll to determine
when data is available to read, and then they call PR_Read to 
read the data.  If PR_Poll returns an indication that the 
socket is readable, when it really isn't, those applications 
will block.

So, perhaps this is a case where the behavior you seem to
want (reporting the socket is readable when it isn't) 
should only be done when the socket is non-blocking.
Yes?
(Reporter)

Comment 11

18 years ago
Any such applications are making incorrect use of PR_Poll().  PR_Poll() makes no 
guarantee that a blocking request will not block.  In the write-during-handshake 
case, the existing code will indicate the SSL socket is writeable even though a 
subsequent blocking PR_Write() call would block.
(Assignee)

Comment 12

18 years ago
John, This bug now seems to describe two different issues, both
related to PR_Poll.  The original bug description had to do with
an application polling for readability on a socket whose handshake 
was blocked on write.  The new bug description had to do with 
polling for readability on a socket on which no handshake has been
initiated.  Those are quite separate problems and have different
resolutions.  Different code would be changed for each one.

Please create a new bug report about the second issue.  
(Reporter)

Comment 13

18 years ago
The second issue is a subset of the first.  Filed bug 66706.
(Assignee)

Comment 14

18 years ago
I think there's a fundamental issue here concerning how PR_Poll
should (is defined to) work when used with blocking sockets.

If, as John suggests, a ready-to-read or ready-to-write return
value from PR_Poll is NOT a guarantee that the respective read
or write operation will not block, the PR_Poll seems useless 
for blocking sockets.  But is that the definition?  

BTW, I think the missing piece of this bug, a flag to remember that
the last attempt to write on the underlying socket blocked, is 
simple enough that I could yet get it into NSS 3.2.  
Proposed patch forthcoming.
(Assignee)

Comment 15

18 years ago
Created attachment 24870 [details] [diff] [review]
YA patch, getting closer, still one issue
(Assignee)

Comment 16

18 years ago
I just realized one more problem with reporting the socket
is ready because the initial handshake hasn't begun.
PR_Poll is also used to wait for a connection to complete.
ssl_Poll should not falsely report the socket readable or
writable if it isn't actually TCP connected yet.  Hmmm.
Status: NEW → ASSIGNED
(Reporter)

Comment 17

18 years ago
If the initial hanshake hasn't been done, it should treat it identically to if 
the handshake last blocked on a write.  In both cases, it should poll the lower 
fd for PR_POLL_WRITE.  That will take care of the connect-in-progress case.
(Assignee)

Comment 18

18 years ago
We don't want to poll on write if the handshake is actually
blocked on read.  

This problem is more complicated than I thought yesterday.
I believe I now know how to get ssl_Poll correct for all
the myriad cases, but I cannot complete this work today.
So, this bug will be deferred until NSS 3.2.1 (at least).
Target Milestone: 3.2 → 3.3
(Reporter)

Comment 19

18 years ago
If the socket hasn't sent the initial handshake, then it isn't blocking on read.
SSL sockets that are in the progress of connecting haven't sent the initial
hello record.  Therefore, polling for write is the correct behavior.
(Assignee)

Comment 20

18 years ago
The test using the handshake pointer and the name of an ssl2 
function is not a sufficient test.  We need a "handshake has 
Begun" flag and a "handshaking as client" flag.  
(Reporter)

Comment 21

18 years ago
Connecting sockets don't handshake as server.  ssl2_BeginClientHandshake() is
used for all beginning client handshakes, including TLS ones.  A "handshake
blocked on write" flag would be sufficient--when a client handshake is
requested, that flag would be set inside SSL_ResetHandshake().

(Assignee)

Comment 22

18 years ago
It is not true that connecting sockets always handshake as clients.
libSSL specifically provides the ability for them to handshake as
servers.  it is possible to "import" SSL into a connected or 
connecting (in progress) socket, and then call PR_Poll before the 
connection is complete and before calling SSL_ResetHandshake. 

The code must be correct for ALL possible legitimate sequences
of PR_Connect, SSL_ImportFD, PR_Poll, SSL_ResetHandshake, and
the PR_Read/Recv/Send/Write calls.
(Reporter)

Comment 23

18 years ago
No extant protocol handshakes as server on a connecting socket.  The existing 
code does not handle this case either.

A fix for this obscure case would be for SSL_ResetHandshake() to set the 
"handshake blocked on write" bit if handshaking as server and the underlying 
socket is in the middle of a nonblocking connect.  This would not interfere 
with and need not hold up a fix for the remaining cases.
(Assignee)

Comment 24

18 years ago
Created attachment 26588 [details] [diff] [review]
proposed patch, see comments below.
(Assignee)

Comment 25

18 years ago
The patch just above is fairly large, in part because I used 
this opportunity to rename an SSL variable from "connected"
to "firstHsDone".  It has always meant that the first handshake
is done.  Now, I also have a variable that records whether the
ssl socket is (or has been) TCPconnected, and I didn't want
name confusion.  

Barring objections, I plan to check it in soon.
(Assignee)

Comment 26

18 years ago
I recently checked in the rewritten version of ssl_Poll.
It is checked in on the trunk and will be part of NSS 3.3.
I believe the new version fixes bug 56924.  

John, will you verify this?
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.