Closed Bug 919877 (CVE-2013-1740) Opened 6 years ago Closed 6 years ago

When false start is enabled, libssl will sometimes return unencrypted, unauthenticated data from PR_Recv

Categories

(NSS :: Libraries, defect, P1, critical)

defect

Tracking

(firefox24 wontfix, firefox25+ wontfix, firefox26+ wontfix, firefox27+ disabled, firefox28 disabled, firefox-esr17- wontfix, firefox-esr24- wontfix, b2g18 unaffected, b2g-v1.1hd ?, b2g-v1.2 ?)

RESOLVED FIXED
3.15.4
Tracking Status
firefox24 --- wontfix
firefox25 + wontfix
firefox26 + wontfix
firefox27 + disabled
firefox28 --- disabled
firefox-esr17 - wontfix
firefox-esr24 - wontfix
b2g18 --- unaffected
b2g-v1.1hd --- ?
b2g-v1.2 --- ?

People

(Reporter: briansmith, Assigned: briansmith)

References

Details

(Keywords: sec-critical, Whiteboard: [fixed with the patch to bug 713933])

Attachments

(3 files)

While working on updating strsclnt to actually test false start [1] as part of bug 713933, I added a SSL_TRC call to ssl_SecureSend that would be triggered whenever we are sending data when !ss->isFirstHsDone. However, when running strsclnt, that condition was never true, even when we were false starting. That is because of this code:

> @@ -103,20 +103,22 @@ ssl_Do1stHandshake(sslSocket *ss)
>  
>  	    SSL_TRC(3, ("%d: SSL[%d]: handshake is completed",
>  			SSL_GETPID(), ss->fd));
>              /* call handshake callback for ssl v2 */
>  	    /* for v3 this is done in ssl3_HandleFinished() */
>  	    if ((ss->handshakeCallback != NULL) && /* has callback */
>  		(!ss->firstHsDone) &&              /* only first time */
>  		(ss->version < SSL_LIBRARY_VERSION_3_0)) {  /* not ssl3 */
> -		ss->firstHsDone     = PR_TRUE;
> +		ss->firstHsDone = PR_TRUE;
> +		ss->enoughFirstHsDone = PR_TRUE;
>  		(ss->handshakeCallback)(ss->fd, ss->handshakeCallbackData);
>  	    }
> -	    ss->firstHsDone         = PR_TRUE;
> +	    ss->firstHsDone = PR_TRUE;
> +	    ss->enoughFirstHsDone = PR_TRUE;
>  	    ss->gs.writeOffset = 0;
>  	    ss->gs.readOffset  = 0;

The problem isn't with the changes made by my patch, but with the existing code that sets ss->firstHsDone = PR_TRUE (the bottom instance).

To verify that this is actually problematic, I modified libssl so that, when ss->sec.isServer is true, it would skip sending change_cipher_spec, skip sending the finished message, and skip switching its write cipher spec to the encrypted/authenticated cipher spec. With these changes, selfserv will send its application_data records unencrypted and unauthenticated immediately after the ServerHelloDone.

Then, I stripped sslstress.txt down to only the new ECC false start cases I added. Finally, I added a breakpoint between the point where strsclnt sends its HTTP request and the point where it receives the HTTP response. In the debugger, I verified that the application data was received by strsclnt unencrypted and unauthenticated, and that libssl returned the unencrypted and unauthenticated data unmodified to strsclnt.

strsclnt seems to use blocking I/O. Firefox and Chromium use non-blocking I/O. I attempted to create a test case using non-blocking I/O to trigger this wrong behavior. However, so far I have not been able to. I am still investigating why. It appears there may some significant difference in the SSL handshake gathering logic in the blocking and non-blocking cases. It could be that the blocking I/O case is the only one that is problematic but I haven't verified that yet.

Again, it does not seem like the changes for bug 713933 caused this problem. However, now I am going to go back and do my experiment on a version of libssl without the patches in that bug.

[1] The existing strsclnt test script, sslstress.txt, only contains entries for false start that use RSA key exchange. However, the default libssl criteria for false start was changed after that those test cases were added, and now libssl won't (by default) allow false start for connections using RSA key exchange, so the tests haven't really testing false start for a while.
Summary: When SSL_ENABLE_FALSE_START is set on a blocking socket, libssl will return unencrypted, authenticated data from PR_Recv → When SSL_ENABLE_FALSE_START is set on a blocking socket, libssl will return unencrypted, unauthenticated data from PR_Recv
I confirmed that this issue affects the NSS trunk with no patches applied.

Here is the edited output from running:

SSLTRACE=80 OS_TARGET=WIN95 NSS_ENABLE_ECC=1 DOMSUF=local4 HOST=localhost NSS_TESTS=ssl NSS_SSL_RUN=stress ./ssl.sh &> strsclnt.log

(localhost.local4 is mapped to 127.0.0.1 in my /etc/hosts).

I cut off the cert generation junk at the top and a lot of other junk at the end.

Search for "raw gather data: [Len: 140]" to find the interesting bits. Notice how the ciphertext == plaintext and notice that strsclnt prints:

strsclnt: connection on thread 0 read 140 bytes (140 total).

Also notice all 383 test cases passed. Those are some robust tests. :)
Note that you must build libssl with NSS_ENABLE_ECC=1 because the only ephemeral key exchange cipher suites we can test are the ECDHE ones, and libssl requires ephemeral key exchange for false start by default.
I think the patch to solve this problem is something like this one. The key thing to keep in mind is that ssl3_FinishHandshake is the ONLY place where it is safe to set ss->firstHsDone--not only because of false start, but also because of the way async certificate verification is done concurrently with sending/receiving handshake messages. In particular, if it is possible to trigger this condition in Firefox, then it means that even without false start, Firefox would accept data from untrustworthy certificate. So, it is urgent to figure out if/how this problem affects applications that use non-blocking sockets to see if/how this affects Firefox and potentially Chromium.

Note that strsclnt executes the following sequence of operations on a blocking socket:

   PR_Send(...);
   SSL_GetChannelInfo(...);
   PR_Recv(...);

When false start is not done, this is perfectly OK to do. However, in the current state of the NSS trunk, this is not OK to do when false start is enabled.

In particular, SSl_GetChannelInfo will fail because the handshake hasn't been completed and so it will call ssl3_CanFalseStart; ssl3_CanFalseStart will fail (in debug builds) because the SSL3 handshake lock isn't being held. The patch for bug 713933 resolves this aspect of the issue.

More seriously, in the current state of the trunk, the pattern of PR_Send() that succeeds followed by PR_Recv() isn't safe when false start is done on a blocking socket. I suspect that many applications follow this pattern though.
Assignee: nobody → brian
Status: NEW → ASSIGNED
Attachment #809017 - Flags: feedback?(wtc)
> @@ -103,20 +103,22 @@ ssl_Do1stHandshake(sslSocket *ss)
> +         PR_ASSERT(ss->firstHsDone);
> 	    ss->firstHsDone = PR_TRUE;
> 	    ss->enoughFirstHsDone = PR_TRUE;
>  	    ss->gs.writeOffset = 0;
>  	    ss->gs.readOffset  = 0;

I added this assertion to Firefox Nightly's copy of libssl to see if Firefox (and, by extension, other applications that use non-blocking sockets) would be affected by this issue. The assertion was triggered when false start is enabled and I saw that ss->ssl3.hs.ws == wait_change_cipher. So, it seems like Firefox 25 (beta) may indeed be affected by this issue. Tomorrow my plan is to confirm this by running Firefox against selfserv with the "investigative patch" applied. If that testing comes back with a bad result then we should hold back the enabling of false start from Firefox 25 (beta) and disable it for Firefox 26 (aurora) and 27 (nightly) until we have a fix for this bug.
Status: ASSIGNED → NEW
Summary: When SSL_ENABLE_FALSE_START is set on a blocking socket, libssl will return unencrypted, unauthenticated data from PR_Recv → When false start is enabled, libssl will sometimes return unencrypted, unauthenticated data from PR_Recv
This does appear to affect Chromium. Tested with a modified Go server which omits the CCS and Finished message.
Comment on attachment 809017 [details] [diff] [review]
WIP: patch that seems to fix the issue

Review of attachment 809017 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/ssl/ssl3con.c
@@ +6864,5 @@
>  PRBool
>  ssl3_CanFalseStart(sslSocket *ss) {
>      PRBool rv;
>  
> +    /*PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );*/

Is this an unrelated change?

::: lib/ssl/sslsecur.c
@@ +106,4 @@
>  			SSL_GETPID(), ss->fd));
> +	    /* for v3 this is done in ssl3_FinishHandshake() */
> +	    if (!ss->firstHsDone && ss->version < SSL_LIBRARY_VERSION_3_0) {
> +		ss->firstHsDone = PR_TRUE;

This change means for SSL 3.0 and TLS, ss->handshake == 0 (checked on line 99)
does not imply the completion of handshake. Did I understand this correctly?

I guess for SSL 3.0 and TLS, ss->handshake is set to 0 at the end of
ssl_GatherRecord1stHandshake. Therefore, if False Start is enabled,
setting ss->handshake to 0 does not mean the handshake is completed.
Correct?
Attachment #809017 - Flags: feedback?(wtc) → feedback+
(In reply to Wan-Teh Chang from comment #6)
> ::: lib/ssl/ssl3con.c
> @@ +6864,5 @@
> >  PRBool
> >  ssl3_CanFalseStart(sslSocket *ss) {
> >      PRBool rv;
> >  
> > +    /*PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );*/
> 
> Is this an unrelated change?

No. strsclnt executes the following:

1. PR_Send
2. SSL_GetChannelInfo
3. PR_Recv

Without this patch, it all "works" except there is this vulnerability. With this patch, the call to SSL_GetChannelInfo fails (assertion in debug build), because SSL_GetChannelInfo will call ssl3_CanFalseStart and ssl3_CanFalseStart has the above check, which fails because we aren't holding the SSL3HandshakeLock. The proper fix for this is to make the type of changes that I made in bug 713933 to limit ssl3_CanFalseStart to being called in just a couple of places where we are already holding the locks.

> ::: lib/ssl/sslsecur.c
> @@ +106,4 @@
> >  			SSL_GETPID(), ss->fd));
> > +	    /* for v3 this is done in ssl3_FinishHandshake() */
> > +	    if (!ss->firstHsDone && ss->version < SSL_LIBRARY_VERSION_3_0) {
> > +		ss->firstHsDone = PR_TRUE;
> 
> This change means for SSL 3.0 and TLS, ss->handshake == 0 (checked on line
> 99)
> does not imply the completion of handshake. Did I understand this correctly?
> 
> I guess for SSL 3.0 and TLS, ss->handshake is set to 0 at the end of
> ssl_GatherRecord1stHandshake. Therefore, if False Start is enabled,
> setting ss->handshake to 0 does not mean the handshake is completed.
> Correct?

I am still looking into this. My patch seems to work but I am surprised it works. I would not be surprised to find some edge cases where it doesn't work.

Also, I am a little concerned about these two lines too:

   ss->gs.writeOffset = 0;
   ss->gs.readOffset  = 0;

At least for SSL 3.0+, this place seems like a bad place to reset these fields.

I remember that when I added the async cert verification mechanism, I had to be very careful with leftover data in the handshake messages in ss->ssl3.hs.msgState.buf when the handshake was interrupted in the middle of a handshake message. However, because ssl3_GatherData avoids reading data from the network beyond the end of the next record (which seems quite inefficient), this may not be something to be concerned about.
(In reply to Brian Smith (:briansmith, was :bsmith@mozilla.com) from comment #7)
> 
> > ::: lib/ssl/sslsecur.c
> > @@ +106,4 @@
> > >  			SSL_GETPID(), ss->fd));
> > > +	    /* for v3 this is done in ssl3_FinishHandshake() */
> > > +	    if (!ss->firstHsDone && ss->version < SSL_LIBRARY_VERSION_3_0) {
> > > +		ss->firstHsDone = PR_TRUE;
> > 
> > This change means for SSL 3.0 and TLS, ss->handshake == 0 (checked on line
> > 99) does not imply the completion of handshake. Did I understand this
> > correctly?

It appears to be the case that, for SSL 3.0 and TLS:

1. (ss->handshake == NULL) means that SSL_ForceHandshake and PR_Send/PR_Write will never advance to handshake state machine.

2. (ss->handshake == NULL) will also make this code in ssl_SecureRecv a no-op even after we fix the ss->firstHsDone issue:

>    rv = 0;
>    /* If any of these is non-zero, the initial handshake is not done. */
>    if (!ss->firstHsDone) {
>	ssl_Get1stHandshakeLock(ss);
>	if (ss->handshake || ss->nextHandshake || ss->securityHandshake) {
>	    rv = ssl_Do1stHandshake(ss);
>	}
>	ssl_Release1stHandshakeLock(ss);
>    }

3. #2 means that ssl_SecureRecv relies on DoRecv to drive the handshake. DoRecv calls ssl3_GatherAppDataRecord to do so. ssl3_GatherAppDataRecord calls ssl3_GatherCompleteHandshake repeatedly in a loop until some application data has been returned.

So, it seems like once we've received ServerHelloDone, SSL_ForceHandshake, PR_Send, and PR_Write will not have any effect as far as driving the handshake forward. IMO, this is a bug, but a different and much less severe one. These functions should always try to make progress in the handshake, especially SSL_ForceHandshake, IMO. Consider an application that only sends application data but never receives any; such an application would not work at all when false start is enabled.

I also re-discovered that ss->ssl3.hs.ws == idle_handshake is used seemingly interchangably with ss->isFirstHsDone. For example, ssl3_GatherCompleteHandshake tests (ss->ssl3.hs.ws == idle_handshake) when AFAICT ss->isFirstHsDone is a more logical test, since ss->isFirstHsDone is used throughout the rest of the gathering code. Also, ssl3_GatherCompleteHandshake is not holding the SSL3HandshakeLock when it tests ss->ssl3.hs.ws == idle_handshake, which also seems wrong. Note that ssl3_FinishHandshake sets ss->isFirstHsDone = PR_TRUE and ss->ssl3.hs.ws = idle_handshake.

> > I guess for SSL 3.0 and TLS, ss->handshake is set to 0 at the end of
> > ssl_GatherRecord1stHandshake. Therefore, if False Start is enabled,
> > setting ss->handshake to 0 does not mean the handshake is completed.
> > Correct?

Right. For SSL 3.0 and TLS, ss->handshake == NULL effectively means "only allow PR_Recv/PR_Read to drive the handshake forward."

> I am still looking into this. My patch seems to work but I am surprised it
> works. I would not be surprised to find some edge cases where it doesn't
> work.

ssl3_SetAlwaysBlock sets ss->handshake = ssl3_AlwaysBlock. This happens when the client auth data callback returns SECWouldBlock. We call the client auth data callback before we make the decision to false start, so it isn't problematic. But, it *would* be problematic, AFAICT, if we were to do things in the opposite order. We should document this, and/or (preferably, IMO) change the async client auth data callback stuff to use the "restartTarget" mechanism I created for async peer cert verification.

> Also, I am a little concerned about these two lines too:
> 
>    ss->gs.writeOffset = 0;
>    ss->gs.readOffset  = 0;

I did not find any issue with this part yet, but I don't completely understand how these fields are being maintained either.
Do we know which versions of Firefox are affected, in order to set status flags?
Versions before firefox-25 are affected only because we have a pref to enable false start that some users may have enabled. The default is to have false start disabled until Firefox 25. I will land the patch in bug 920248 to disable false start on mozilla-beta and mozilla-aurora this weekend now that it has been approved. If we want to disable false start on other branches because of the availability of the pref then please leave a (hidden) comment in bug 920248 or in a new bug.
I believe that the complete version of this patch is going to require some of the changes that are part of the patch for bug 713933, which has already gone through some review. In particular, the change to using ss->ssl3.hs.canFalseStart instead of calling ssl3_CanFalseStart is needed. I am going to fold the fix for this bug into the patch for bug 713933.
Please use CVE-2013-1740 for this vulnerability.
Alias: CVE-2013-1740
Any updates here? We're working on the next release cycle... I assume this is 28 disabled?
(In reply to Al Billings [:abillings] from comment #13)
> Any updates here? We're working on the next release cycle... I assume this
> is 28 disabled?

This bug in NSS is fixed with the patch for bug 713933 that was checked into NSS 3.15.3. The versions that are listed as "affected" in the tracking flags are only affected if the user has set a non-default value for the false-start pref. Do you want me to check in a patch that disables false start, regardless of the value of the pref, to those branches? 

Marking WORKSFORME/Depends on bug 713933 since the fix is in that public bug. Note that the fix is likely to be difficult to notice amidst all the other changes to that bug. On the other hand, bug 713933 comment 77 points out the problem.
Status: NEW → RESOLVED
Closed: 6 years ago
Depends on: 713933
Priority: -- → P1
Resolution: --- → WORKSFORME
Whiteboard: [fixed with the patch to bug 713933]
Target Milestone: --- → 3.15.3
changing target milestone to 3.15.4
Target Milestone: 3.15.3 → 3.15.4
based on comment 14 marking wontfix for ff26 and esr24 if this is only for non-default pref values.  Also minusing tracking for esr17 because it's EOL.
Although the patch is not in this bug we do know what fixed it, rather than the bug mysteriously going away as implied by "worksforme". Changing resolution to FIXED.
Group: crypto-core-security
Resolution: WORKSFORME → FIXED
Group: core-security
You need to log in before you can comment on or make changes to this bug.