Closed
Bug 80092
Opened 24 years ago
Closed 19 years ago
SSL write indicates all data sent when some is buffered
Categories
(NSS :: Libraries, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11.1
People
(Reporter: jgmyers, Assigned: nelson)
References
Details
Attachments
(4 files, 4 obsolete files)
32.29 KB,
patch
|
Details | Diff | Splinter Review | |
29.85 KB,
patch
|
Details | Diff | Splinter Review | |
38.56 KB,
patch
|
rrelyea
:
review+
|
Details | Diff | Splinter Review |
31.30 KB,
patch
|
Details | Diff | Splinter Review |
During a SSL write on a nonblocking socket, when the lower layer FD returns
WOULDBLOCK, SSL incorrectly indicates to its caller that the data has been sent.
This causes some callers to stop trying to drive the write, so the encrypted
but buffered data never gets sent.
For example, the Mozilla SMTP client has a state machine driven by an event
loop. When sending a message, it has logic like:
while (bytestosend > 0) {
wait in event loop until PR_Poll() indicates SSL fd is PR_POLL_WRITE
n = PR_Write(sslfd, sendptr, bytestosend);
sendptr += n;
bytestosend -= n;
}
inputptr = inputbuf;
while (inputbuf doesn't have a '\n') {
wait in event loop until PR_Poll() indicates SSL fd is PR_POLL_READ
n = PR_Read(sslfd, inputptr, ...);
inputptr += n;
}
So if the last SSL record encounters a lower-level WOULDBLOCK, the SSL code
incorrectly indicates that all the data has been sent and the caller will stop
trying to drive output.
The SSL write code should only return the number of bytes that have been
successfully sent through the lower layer fd. In the case where SSL has to
buffer a partial record, it should remember how many bytes it has already
encoded, so the encoding step knows to skip over that number of bytes of input
on the next call.
A possible workaround would be to modify ssl_Poll() to indicate that an SSL
socket is readable whenever there is buffered write data and the underlying file
descriptor is writable. That will cause the caller to attempt a read on the SSL
socket, causing the existing SSL read code to flush the buffered write data.
Updated•24 years ago
|
Priority: -- → P2
Target Milestone: --- → 3.2.2
Assignee | ||
Comment 1•24 years ago
|
||
John, I set out to implement your "possible workaround" as a
short term solution. But I'm concerned that this could cause
lots of CPU spinning while attempting to upload a large file
over a secure connection.
Reporter | ||
Comment 2•24 years ago
|
||
Perhaps it should only indicate the socket is readable when the previous
conditions are met and there was no request to select for writability? That
way, if the caller polls for both read and write while uploading a large file,
it will only trigger the write code.
Assignee | ||
Comment 3•24 years ago
|
||
The scenario I'm concerned with is that the app stuffs a large
amoung of stuff into the pipe, which is going to take a while
to send over a typical modem. Thinking the write is done, it
polls for read. It gets told the socket is readable (because
of the proposed change) when it isn't, so it calls PR_Read.
The ssl_SecureRecv code tries to send the buffered output,
which fails with EWOULDBLOCK (again), and then tries to read
on the socket, which also fails with EWOULDBLOCK, so it
returns to the caller. This sequence of poll, write(fail),
read(fail) continues as fast as it can until the data
previously sent dribbles out over the modem and is acked.
Reporter | ||
Comment 4•24 years ago
|
||
The poll method should only indicate the socket is readable when all of:
a) PR_POLL_WRITE was not in how_flags
b) There is buffered unsent output
c) The underlying socket is writable
Test (c) will prevent the fast spinloop.
So if cases (a) and (b) are true, ssl_Poll() should add PR_POLL_WRITE to
ret_flags. If the lower poll then places PR_POLL_WRITE in out_flags, ssl_Poll()
should replace it with PR_POLL_READ.
Assignee | ||
Comment 5•24 years ago
|
||
John, I emailed you a patch that I believe solve this (though it may
not be immediately apparent how or why). Please let me know if it
solves this problem, or not.
Status: NEW → ASSIGNED
Assignee | ||
Comment 6•24 years ago
|
||
Here is the patch I sent to John:
RCS file: /cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v
retrieving revision 1.13
diff -u -r1.13 sslsock.c
--- sslsock.c 2001/02/09 02:11:31 1.13
+++ sslsock.c 2001/05/18 03:19:13
@@ -1306,8 +1306,12 @@
if ((ret_flags & PR_POLL_READ) && (SSL_DataPending(fd) > 0)) {
*out_flags = PR_POLL_READ; /* it's ready already. */
-
- } else if (ret_flags && (fd->lower->methods->poll != NULL)) {
+ return ret_flags;
+ }
+ if ((ret_flags & PR_POLL_READ) && (ss->pendingBuf.len != 0)) {
+ ret_flags |= PR_POLL_WRITE;
+ }
+ if (ret_flags && (fd->lower->methods->poll != NULL)) {
ret_flags = fd->lower->methods->poll(fd->lower, ret_flags, out_flags);
}
John replied: "The patch appears to fix the problem."
so I will check it in.
Comment 7•24 years ago
|
||
Does PR_Close drain the encrypted but unsent data in the buffer?
Assignee | ||
Comment 8•24 years ago
|
||
If, by "drain", you mean "force it to be written", the answer is no.
This problem affects only programs that use non-blocking sockets.
AFAIK, that includes only mozilla and maybe iMS.
This solution is a workaround that is satisfactory for mozilla.
I have checked it in on NSS_3_2_BRANCH, rev 1.13.2.1.
Javi (or someone, me?) should move the tag Mozilla uses to that rev.
Comment 9•24 years ago
|
||
Tag has been rolled forward.
Comment 10•24 years ago
|
||
Nelson, the fix for this bug has been checked in, right?
Assignee | ||
Comment 11•24 years ago
|
||
The patch has been checked in on both branch and trunk.
However, I think this is considered more or less a workaround.
I'm not sure that this is the final solution, so
I'm not ready to mark it "fixed" just yet.
Updated•24 years ago
|
Target Milestone: 3.2.2 → 3.4
Comment 12•23 years ago
|
||
Changing my e-mail address.
Comment 13•23 years ago
|
||
Removing old e-mail address.
Comment 14•23 years ago
|
||
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Comment 16•22 years ago
|
||
Nelson, are you still planning to fix this the
right way, or should we mark the bug fixed because
we've checked in a workaround?
Assignee | ||
Comment 17•22 years ago
|
||
I'm reluctant to market this bug fixed, because of the possible data loss
when a write is followed by a close. But the right solution isn't obvious
or it would probably be implemented already.
Reporter | ||
Comment 18•22 years ago
|
||
I believe the right solution is as stated in the initial description: The SSL
write code should only return the number of bytes that have been
successfully sent through the lower layer fd. In the case where SSL has to
buffer a partial record, it should remember how many bytes it has already
encoded, so the encoding step knows to skip over that number of bytes of input
on the next call.
This requires that if a caller gets a partial write or a WOULDBLOCK, then any
future write call start with the same data they previously supplied but which
did not get written.
Assignee | ||
Comment 19•22 years ago
|
||
Right. One way to solve the problem is to put a new requirement on the
calling application that it MUST continue to write the same exact data
repeatedly until it succeeds, and cannot elect to substitute different
data prior to some retry.
That is not a requirement of normal sockets. Depending on that will
probably lead some day to a bug report where the application claims
to have written X, but the peer received Y, (because the application
previously attempted to write Y but failed). That seems like a
rather serious flaw for "security" code.
Assignee | ||
Comment 20•22 years ago
|
||
Mass retarget all my old NSS bugs that were previous targeted at NSS versions
that have now been released.
Target Milestone: 3.5 → 3.7
Comment 21•22 years ago
|
||
Thought I'd suggest a compromise solution.
If I do a PR_Write of, for example, 4096 bytes, but the underlying fd is WOULDBLOCK, have the SSL layer only acknowledge acceptance of 4095 bytes. The SSL layer should not accept the last byte until the fd is unblocked and all data can be written. If that last byte is already encoded as part of an SSL record which hasn't been fully sent, simply remember the last byte in the context, and return EINVAL if the caller tries to write a different value on retry.
This largely achieves the semantics John desires, but also reduces the error condition that concerns Nelson to a detectable single-byte case.
Reporter | ||
Comment 22•22 years ago
|
||
I'm not sure the compromise solution helps much. One still has to add the
requirement that the caller supply the same exact data repeatedly until it
succeeds. Where it does help is that it makes it easier for the implementation
to detect and error out when this requirement is not met, so such problems are
much more likely to be found in testing.
Comment 23•22 years ago
|
||
Moved to target milestone 3.8 because the original
NSS 3.7 release has been renamed 3.8.
Target Milestone: 3.7 → 3.8
Comment 24•22 years ago
|
||
Added myself to the CC list
Assignee | ||
Comment 25•22 years ago
|
||
Remove target milestone of 3.8, since these bugs didn't get into that release.
Target Milestone: 3.8 → ---
Assignee | ||
Updated•20 years ago
|
QA Contact: bishakhabanerjee → jason.m.reid
Comment 26•19 years ago
|
||
Here is a suggestion to solve this problem . This is a variation on comment #21
from Chris Newman .
We can add a new SSL socket option - maybe SSL_MUST_RESEND_IDENTICAL_DATA ? -
that will trigger the behavior where libssl requires the application to resend
the same data bytes. This needs to be an option because normal sockets don't
have this requirement.
libssl would enforce this requirement by comparing the data left in the buffer
with the data sent by the application the 2nd or subsequent times, and would
return a new error such as SSL_ERROR_DIFFERENT_DATA_RESENT . Maybe the
connection should also be aborted in this case, since this is a major bug in the
application that would need to be fixed. I don't think we would want to give the
application a chance to resend the correct old bytes again after this error
happens - it should be a fatal error.
I believe this should avoid the problem where an application claims to have sent
X and the peer received Y, which was Nelson's major concern in comment #19 with
the implementation of the complete fix.
The above fixes the problem for applications which set the option and are
willing to abide by its requirements only. For other applications, the problem
remains unsolved. I think we should at least try to deal with the case of
PR_Close which causes the data remaining in the SSL buffer to be discarded,
because the application has no way of knowing that the buffering happened once
SSL accepts the data.
We could make PR_Close return SECFailure and PR_WOULD_BLOCK_ERROR, but I bet
less than 1% of existing NBIO applications check for a failure from PR_Close,
and 0% would know what flags to PR_Poll on if PR_Close fails with this error ...
So, doing this would probably just result in memory leaks from unclosed sockets
:-( . Another possibility is for ssl_close to dispatch a thread to do the actual
close . But that's unprecedented and I doubt any NBIO application would
appreciate that solution, since the NBIO code is presumably written to
avoid/minimize the use of threads. It is definitely too dangerous as a default,
and I already suggested an option that solves the problem (see above) ...
Assignee | ||
Comment 27•19 years ago
|
||
For an SSL socket to have true non-blocking socket behavior, without adding
any new requirements for re-sending the same data previously sent, it must
return PR_WOULD_BLOCK_ERROR AND leave the socket's internal crypto state
exactly as it had been before the PR_Write/Send call was made.
Each time the SSL library prepares an SSL3/TLS record to be sent, it builds
the record, computes a "MAC", advances the "sequence number" (which is the
count of bytes sent), and encrypts the result. The act of encryption changes
the state of the encryption engine. Each encryption depends on the state of
the encryption engine following the preceeding encryption. Then after all
of this is done, it attempts to send the record, which may result in
EWouldBlock from the lower socket.
So, in order to provide full non-blocking semantics, the SSL code would need
to capture (copy) the state of the encryption engine (and the sequence number)
before encrypting each and every record. Then when EWOULDBLOCK occurs, it
would need to restore the state of the encryption engine and sequence variable
prior to returning. The performance cost of that extra saving of crypto
status prior to encrypting every record would be enormous.
Alternatively, if there was some way to ask the next lower layer a question
such as "can you now send N additional bytes without blocking (or returning
EWouldBlock)?" then we could test for that before encrypting the record and
advancing the sequence number. I do not recall there being any way to do that.
Reporter | ||
Comment 28•19 years ago
|
||
If the lower layer accepted half of the SSL record before returning
PR_WOULD_BLOCK_ERROR, there would be no way for the SSL code to correctly
restore the state of the encryption engine.
Assignee | ||
Comment 29•19 years ago
|
||
Yes, the case of a partial write of an SSL record seems the most difficult.
The SSL state can only be captures on record boundaries, not mid-record.
Once part of an SSL record is sent, libSSL will keep the remainder and
attempt to send it later, as part of any successive write call. But this
puts us right in the same situation again, of having conceptually accepted
more data than has actually been sent to the layer below. The state
capture suggestion only works in the case where the attempt to write an
SSL record from the beginning fails with EWouldBlock.
So, perhaps the rule that the writer must later retry to send the same bytes
that were previously unsent is unavoidable. In that case, this would be a
property of all non-blocking SSL sockets. An extra socket option to enable
this would seem unnecessary.
Assignee | ||
Comment 30•19 years ago
|
||
This patch attempts to change the SSL I/O layer semantics to more
closely match the semantics of native NSPR sockets.
Specifically, it attempts to eliminate short writes on blocking sockets.
It also attempts to implement Chris Newman's suggestion from comment 21,
except that it does not (yet) check that the caller is supplying the
same value in subsequent calls as the last byte of the previous one.
More testing is needed.
Assignee | ||
Updated•19 years ago
|
Attachment #206159 -
Flags: review?(rrelyea)
Comment 31•19 years ago
|
||
Comment on attachment 206159 [details] [diff] [review]
patch v1 (for trunk)
r=relyea
I do have some comments all minor:
1) There is some historical cruft that has moved into ssl3_CompressMACEncryptRecord()
in the line:
if (p1Len < 256) {
...
} else {
p1Len -= oddLen;
}
In the current code oddLen is always '0' in this case, so the entire else clause can be removed.
2) This is a carry over from the previous code:
There are assumptions that cipherBytes always equal the input len. This is true of all our current ciphers, but has not always been true in the past, and may not be true in the future (Fortezza expanded the initial encrypted packet by 8 bytes). We should explicitly document this here in case we need to deal with it in the future (the easiest way is probably to force the single encryption in the case where we know that the particular cipher can expand). Anyway I just want documentation in the code for now, we can worry about fixing it if and when we run into one of these expanding ciphers.
3)The comment 'then buffer remaining bytes..." adding a 'the' between buffer and remaining will make it read more smoothly (I originally read 'buffer' as a noun and had a hard time parsing the sentence).
4) On the phone you said you saved the 'last byte' and compared it with the new byte comming in, but the patch simply remembers that there was a short write and discards the first byte without examining it (ssl3_SendApplicationData).
bob
Attachment #206159 -
Flags: review?(rrelyea) → review+
Comment 32•19 years ago
|
||
re comment 31, review issue 4... oops, I should read your comments before reviewing;).
Assignee | ||
Comment 33•19 years ago
|
||
This is a later version of the above patch.
It adds the missing test for equality of the buffered byte.
Assignee | ||
Comment 34•19 years ago
|
||
Comment on attachment 207691 [details] [diff] [review]
duplicate of previous patch.
oops, this patch was identical to the previous one.
Attachment #207691 -
Attachment description: patch v2 → duplicate of previous patch.
Attachment #207691 -
Attachment is obsolete: true
Assignee | ||
Comment 35•19 years ago
|
||
I'm trying to capture various versions of this patch.
I believe this was my second version.
Assignee | ||
Comment 36•19 years ago
|
||
This is the most recent patch, v3, I think.
Assignee | ||
Updated•19 years ago
|
Attachment #206159 -
Attachment description: patch v1 (under development) → patch v1 (for trunk)
Assignee | ||
Updated•19 years ago
|
Attachment #208810 -
Attachment description: patch v2 (I think) → patch v2 (for 3.11 branch)
Assignee | ||
Updated•19 years ago
|
Attachment #208811 -
Attachment description: patch v3 (I think) → patch v3 (for 3.11 branch)
Assignee | ||
Comment 37•19 years ago
|
||
This should be the same patch as the above patch v1 for trunk,
except that this one is for the 3.11 branch.
It should now be possible to use bugzilla's "interdiff" feature
to diff this v1 against v2 and v3, to see what changed between them.
Assignee | ||
Comment 38•19 years ago
|
||
(In reply to comment #31)
> (From update of attachment 206159 [details] [diff] [review] [edit])
> r=relyea
Bob, thanks for the review of my patch v1 for the trunk.
> I do have some comments all minor:
>
> 1) There is some historical cruft that has moved into
> ssl3_CompressMACEncryptRecord()
>
> in the line:
>
> if (p1Len < 256) {
> ...
> } else {
> p1Len -= oddLen;
> }
>
> In the current code oddLen is always '0' in this case, so the entire else
> clause can be removed.
I don't believe that oddLen is always zero in this case.
oddlen is the content length plus the MAC length, modulo the block size.
So, for a block size of 8 (say DES), and a content length of 257 (say) and
mac length of 16 (say MD5), oddLen will be 1. Since p1Len is 257, this
code will subtract 1 from p1Len, making it a multiple of 8, and will add 1
to p2 Len below it, making it a multiple of 8 also.
That number 256 is a threshold below which the code will avoid doing two
encryption steps by combining the p1 and p2 parts into a single buffer (p2).
The threshold may need adjustment.
> 2) This is a carry over from the previous code:
>
> There are assumptions that cipherBytes always equal the input len.
I found 2 places that make that assumption. Both are the assertions that
immediately follow the calls to cwSpec->encode(), e.g.
PORT_Assert(rv == SECSuccess && cipherBytes == p1Len);
and
PORT_Assert(rv == SECSuccess && cipherBytes == p2Len);
Do you see any other places that make that assumption?
> 4) [...] you said you saved the 'last byte' and compared it with the new
> byte comming in, but the patch simply remembers that there was a short write
> and discards the first byte without examining it (ssl3_SendApplicationData).
Thanks for catching that. It tells me that the patch I asked you to review
was older than my source tree at the time I asked for review. IOW, I asked
you to review the wrong patch. But the differences between what you reviewed
and what I had intended for you to review are quite small.
I have now attached two more patches, v2 and v3. v2 is the patch I thought I
had attached to the bug when I asked you to review it. Since then, I've made
more changes due to review feedback from Chris Newman. The result is v3.
Because patch v1 was for the trunk and v2 was for the 3.11 branch, bugzilla
won't show you the diffs between them. So I created another copy of patch v1
for the 3.11 branch. SO, you should be able to diff the 3.11 v1 patch against
the 3.11 v2 and 3.11 v3 patches with bugzilla's diff tool, to see exactly what I've changed since then. Just be aware that when diffing v2 or v3 against v1,
the newer code will appear on the left, and the older code on the right.
Assignee | ||
Comment 39•19 years ago
|
||
Comment on attachment 206159 [details] [diff] [review]
patch v1 (for trunk)
Please see the other v1 patch attached to this bug.
It's the very same patch, but as applied to the 3.11 branch, not the trunk.
Attachment #206159 -
Attachment is obsolete: true
Assignee | ||
Comment 40•19 years ago
|
||
Comment on attachment 208811 [details] [diff] [review]
patch v3 (for 3.11 branch)
Bob, please review this patch. It's not very different from the v1 patch you previously reviewed. bugzilla will show you the diffs, if you compare it to the v1 patch for the same branch. Thanks.
Attachment #208811 -
Flags: review?(rrelyea)
Assignee | ||
Comment 41•19 years ago
|
||
Comment on attachment 208811 [details] [diff] [review]
patch v3 (for 3.11 branch)
cancelling this review request. ssl.sh is failing. I must see if it's because of this patch.
Attachment #208811 -
Flags: review?(rrelyea)
Assignee | ||
Comment 42•19 years ago
|
||
OK, here's patch v4 for the 3.11 branch. This one passes all.sh!
Attachment #208811 -
Attachment is obsolete: true
Attachment #208826 -
Flags: review?(rrelyea)
Comment 43•19 years ago
|
||
Comment on attachment 208826 [details] [diff] [review]
patch v4 (for 3.11 branch)
r+ relyea assuming patch v1 for 3.11 is equivalent to patch v1 for trunk.
Some comments.
1) I really dislike the mixing of PRInt32 and SECStatus. I would rather it not be propogated further. I recognize it is an existing problem in ssl3_SendRecord.
2) I think adding an explicit mask with your unsigned char cast makes the code a little clearer that you are intentionally truncating ss->appDataBuffered, rather than it just being a side effect of the cast (as written it looks like you just stored a char and an int and the cast was to make the compiler happy).
bob
Attachment #208826 -
Flags: review?(rrelyea) → review+
Assignee | ||
Comment 44•19 years ago
|
||
Comment on attachment 208826 [details] [diff] [review]
patch v4 (for 3.11 branch)
requesting second review for 3.11 branch
Attachment #208826 -
Flags: review?(julien.pierre.bugs)
Assignee | ||
Updated•19 years ago
|
Target Milestone: --- → 3.11.1
Assignee | ||
Updated•19 years ago
|
QA Contact: jason.m.reid → libraries
Assignee | ||
Comment 45•19 years ago
|
||
Comment on attachment 208826 [details] [diff] [review]
patch v4 (for 3.11 branch)
cancelling unneeded second review request for the branch.
Attachment #208826 -
Flags: review?(julien.pierre.bugs)
Assignee | ||
Comment 46•19 years ago
|
||
test results are back, it passed.
Now the question is, how similar is tha patch that passed to the one that
got reviewed :-/
Assignee | ||
Updated•19 years ago
|
Attachment #219122 -
Attachment description: patch for trunk (should match one of the above patches) → (wrong patch)
Attachment #219122 -
Attachment is obsolete: true
Assignee | ||
Comment 47•19 years ago
|
||
This is what I propose to check in, if it's not too different than the
above patch that was reviewed some time ago.
Assignee | ||
Comment 48•19 years ago
|
||
Checkin comment:
Bug 80092: SSL write indicates all data sent when some is buffered.
SSL now follows NSPR socket semantics and never returns a short write
count on a blocking socket. On a blocking socket, it returns either
the full count or -1 (with an error code set).
For non-blocking sockets, SSL no longer returns a full write count
when some of the data remains buffered in the SSL record layer.
Instead it returns a number is that always at least 1 byte short of a
full write count, so that the caller will keep retrying until it is done.
SSL makes sure that the first byte sent by the caller in the retry
matches the last byte previously buffered. r=rrelyea.
Modified Files: ssl3con.c sslcon.c ssldef.c sslimpl.h sslsecur.c
Checked in on trunk:
Checking in ssl3con.c; new revision: 1.88; previous revision: 1.87
Checking in sslcon.c; new revision: 1.30; previous revision: 1.29
Checking in ssldef.c; new revision: 1.11; previous revision: 1.10
Checking in sslimpl.h; new revision: 1.50; previous revision: 1.49
Checking in sslsecur.c;new revision: 1.36; previous revision: 1.35
Summary: SSL write accepts data when lower write WOULDBLOCK → SSL write indicates all data sent when some is buffered
Assignee | ||
Comment 49•19 years ago
|
||
This has been fixed. The work has been backported to the branch.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•