Closed Bug 1086145 (CVE-2015-2721) Opened 10 years ago Closed 9 years ago

NSS incorrectly permits skipping of ServerKeyExchange

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(firefox33 wontfix, firefox34+ wontfix, firefox35+ wontfix, firefox36- wontfix, firefox37 wontfix, firefox38 wontfix, firefox38.0.5 wontfix, firefox39 fixed, firefox40 fixed, firefox41 fixed, firefox-esr3139+ fixed, firefox-esr3839+ fixed, b2g-v1.4 wontfix, b2g-v2.0 fixed, b2g-v2.0M fixed, b2g-v2.1 fixed, b2g-v2.1S fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
Tracking Status
firefox33 --- wontfix
firefox34 + wontfix
firefox35 + wontfix
firefox36 - wontfix
firefox37 --- wontfix
firefox38 --- wontfix
firefox38.0.5 --- wontfix
firefox39 --- fixed
firefox40 --- fixed
firefox41 --- fixed
firefox-esr31 39+ fixed
firefox-esr38 39+ fixed
b2g-v1.4 --- wontfix
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.1S --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: karthikeyan.bhargavan, Assigned: mt)

References

()

Details

(Keywords: sec-moderate, Whiteboard: [adv-main39+][adv-esr38.1+][adv-esr31.8+][b2g-adv-main2.2+])

Attachments

(2 files, 8 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36

Steps to reproduce:

We have been performing systematic state machine tests on NSS, OpenSSL, and other SSL implementations, and we found some unexpected behaviours in NSS (not all are immediately exploitable.) We will report the other weirdnesses in due course, but here's one that seems to break False Start's forward secrecy guarantee.

The core issue is that the NSS client allows an ECDHE_ECDSA exchange where the server does not send its ServerKeyExchange message. In this case, NSS will take
the (long-term) EC key from the ECDSA certificate, and use that instead of the
server's ephemeral parameters. This violates the TLS protocol and also has some security implications for forward secrecy.

The application at the client (e.g. the browser) still thinks it is engaged in an ECDHE exchange, but in fact it has been silently downgraded to a non-forward secret mixed-ECDH exchange. Consequently, if False Start is enabled, the client will start sending data encrypted under the (non forward-secret) connection keys. This breaks some of the changes of Bug942729 (https://bugzilla.mozilla.org/show_bug.cgi?id=942729).

To test this behavior, we used Chrome or Firefox to connect to Gmail (which uses ECDHE-ECDSA) via a network proxy. We allow all messages to pass through, but block the ServerKeyExchange going from google to the browser. The browser still completes the handshake and starts sending data. 

The culprit is in the beginning of ssl3_SendClientKeyExchange in ssl3con.c:

if (ss->sec.peerKey == NULL) {
	serverKey = CERT_ExtractPublicKey(ss->sec.peerCert);
        .....
}






Expected results:

One would expect the client to kill the handshake when it receives the ServerHelloDone without having received a ServerKeyExchange in an ECDHE exchange.

A similar behavior occurs for DHE, but requires the server to send a DH-key certificate (which is much rarer in practice.)
I believe the root causes are different than what Karthikeyan suggests:

1. 
    static SECStatus
    ssl3_HandleServerHelloDone(sslSocket *ss)
    {
        [...]

        SSL3WaitState ws          = ss->ssl3.hs.ws;

        [...]

        if (ws != wait_hello_done  &&
------>     ws != wait_server_cert &&
------>	    ws != wait_server_key  &&
------>     ws != wait_cert_request) {
	    SSL3_SendAlert(ss, alert_fatal, unexpected_message);
	    PORT_SetError(SSL_ERROR_RX_UNEXPECTED_HELLO_DONE);
	    return SECFailure;
        }

        rv = ssl3_SendClientSecondRound(ss);

        return rv;
    }

The three lines prefixed with "------>" are wrong. The *only* time we should accept ServerHelloDone is when ss->ssl3.hs.ws == wait_hello_done. More generally, whenever we process a received message, we need to ensure that ss->ssl3.hs.ws is in the wait_* state for that specific message. Whenever we process a message, we need to ensure ss->ssl3.hs.ws is modified to be the next state. We need to be careful about the cases where the next state is ambiguous (e.g. the optional CertificateStatus message).

If this is done, then I think that the line that Karthikeyan pointed out would not be problematic.

2.

    ssl3_SendClientKeyExchange(sslSocket *ss)
    {
        [...]
        if (ss->sec.peerKey == NULL) {
	    serverKey = CERT_ExtractPublicKey(ss->sec.peerCert);
            [...]
        } else {
            [...]
        }

        [...]

        switch (ss->ssl3.hs.kea_def->exchKeyType) {
            [...]
        case kt_ecdh:
	    rv = ssl3_SendECDHClientKeyExchange(ss, serverKey);
	    break;
            [...]

        [...]
    }

instead of checking "if (ss->sec.peerKey == NULL)", NSS should be checking "ss->ssl3.hs.kea_def->exchKeyType" and verifying that for ss->sec.peerKey is non-NULL for key exchange types other than RSA.

Further, the subjectPublicKeyInfo of the certificate will identify the type of the key:

    * id-ecPublicKey - either ECDH or ECDSA
    * id-ecDH - ECDH only

Note that there is no "ECDSA only" type, based on a quick look at the specs. Instead, according to the spec, ECDSA-only is indicated by id-ecPublicKey with keyUsage restricted to digitalSignature. But, neither NSS, nor the applications using it, verify that the keyUsage extension of the certificate is compatible with the way the key is being used.

However, I think if problem #1 is fixed then problem #2 probably won't occur.
Note that this behavior is probably pretty bad for renegotiation, because some state from the previous handshake's Certificate & ServerKeyExchange probably carries over into the next handshake if the Certificate and/or ServerKeyExchange messages are skipped.
Yes, these are the other problematic state machine features I was referring to.
NSS currently allows the ServerHelloDone immediately after ServerHello and in all these other cases too, marked by ---> and these should probably be disallowed (unless there are some use cases I am unaware of).

Other notes:

- If a client receives ServerHelloDone without having received ServerCertificate/ServerKeyExchange, it will accept the ServerHelloDone, but would fail at the line I pointed out in SendClientKeyExchange 

- If a client does not receive a ServerCertificate but then receives a ServerKeyExchange, it fails in HandleServerKeyExchange (where peerCert is checked, for signing)

- If a client receives a ServerCertificate and then a CertificateRequest; ServerHelloDone, it accepts them, asks the user for his certificate, and tries to calculate the ClientKeyExchange. In some cases this will go through (ECDSA cert). In others, it wont go through, but the client will already have been prompted for a certificate.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #2)
> Note that this behavior is probably pretty bad for renegotiation, because
> some state from the previous handshake's Certificate & ServerKeyExchange
> probably carries over into the next handshake if the Certificate and/or
> ServerKeyExchange messages are skipped.

We've been trying to test renegotiation scenarios and trying to see what is leftover between handshakes, but this is more difficult to reproduce through a browser.
static SECStatus
ssl3_HandleServerKeyExchange(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
{
    [...]

        if (ss->ssl3.hs.ws != wait_server_key &&
------>	    ss->ssl3.hs.ws != wait_server_cert) {
	    errCode = SSL_ERROR_RX_UNEXPECTED_SERVER_KEY_EXCH;
	    desc    = unexpected_message;
	    goto alert_loser;
        }
        if (ss->sec.peerCert == NULL) {
	    errCode = SSL_ERROR_RX_UNEXPECTED_SERVER_KEY_EXCH;
	    desc    = unexpected_message;
	    goto alert_loser;
        }

The line marked "------>" is wrong, but the (ss->sec.peerCert == NULL) check and others probably make it a style issue more than a substantive one.


    static SECStatus
    ssl3_HandleCertificateRequest(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
    {
        [...]

        if (ss->ssl3.hs.ws != wait_cert_request &&
            ss->ssl3.hs.ws != wait_server_key) {

        [...]
    }

There's no reason to have this flexibility here. Instead, we should determine from the cipher suite's key exchange algorithm whether a ServerKeyExchange is expected or not, and accept the CertificateRequest message only in the correct state.

    static SECStatus
    ssl3_HandleClientHello(sslSocket *ss, SSL3Opaque *b, PRUint32 length)
    {
        [...]

        if ((ss->ssl3.hs.ws != wait_client_hello) &&
	    (ss->ssl3.hs.ws != idle_handshake)) {
    }

This looks wrong too, because it seems like we should only accept ClientHello in the wait_client_hello state. In particular, does something bad happen if the server sends the client a ClientHello before the client sends the server a ClientHello? The worst case scenerio would be that the server sending a ClientHello would cause the client to switch roles; this would be bad because in the server role we usually don't even require a certificate from the peer! IIRC, I tested that a long time ago and couldn't get a bad thing to happen because there was no certificate configured and/or there was no server session cache configured. However, that might not be the case for DTLS, and maybe I'm misremembering or maybe I tested it wrong.

Regardless, when the socket is configured in the client role, we shouldn't be processing ClientHello messages at all.
Karthikeyan, do you agree that this seems to only be possible for ECDSA keys?

(In reply to Karthikeyan Bhargavan from comment #4)
> We've been trying to test renegotiation scenarios and trying to see what is
> leftover between handshakes, but this is more difficult to reproduce through
> a browser.

See bug 702322 for an abandoned framework for precisely controlling the state machine in tests. See bug 736734 for some example tests. Also, I think recently some similar GTest-based framework was checked into NSS; I suspect that that newer framework may work better.
Since Firefox (and Chromium) on Android apparently depend on OpenSSL, I'll list a couple of funny features in the OpenSSL client implementation.

- During an ECDHE (DHE) handshake, OpenSSL allows a silent failover to static ECDH (DH) in the same way as NSS above (i.e. skip the ServerKeyExchange message.)

- During an RSA (note: not RSA_EXPORT) handshake, OpenSSL client allows an ephemeral RSA handshake. That is, the server can send a ServerKeyExchange message with signed ephemeral RSA parameters (say a small modulus and exponent) and the client will replace the server's public key with the ephemeral RSA public key. Hence, the client is doing an RSA_EXPORT handshake, even though it expected to do an RSA handshake,

- If the server skips the ServerCertificate and sends a ServerKeyExchange, OpenSSL will try its best to process the message, and fail deep inside the signature verification

There are other unexpected behaviours in the OpenSSL server implementation, but I am guessing that is not so relevant here.
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #6)
> Karthikeyan, do you agree that this seems to only be possible for ECDSA keys?

I missed this question before.

Yes, the behavior I am speaking about occurs *only* if the server certificate (peerCert) contains a key that may be used in the current DHE/ECDHE certificate. This means either a DH-Key certificate (which I have never seen, but someone may know of?) or more likely an ECDSA certificate. If the server or m-i-t-m provided an RSA/DSA certificate, the sendDHClientKeyExchange or sendECDHClientKeyExchange will fail.


> 
> (In reply to Karthikeyan Bhargavan from comment #4)
> > We've been trying to test renegotiation scenarios and trying to see what is
> > leftover between handshakes, but this is more difficult to reproduce through
> > a browser.
> 
> See bug 702322 for an abandoned framework for precisely controlling the
> state machine in tests. See bug 736734 for some example tests. Also, I think
> recently some similar GTest-based framework was checked into NSS; I suspect
> that that newer framework may work better.
In private email, I suggested that the researchers try the reverse-handshake race condition attack from comment 5 against other implementations. Karthikeyan replied:

> On NSS, I tried a few such cases (e.g. sending an empty ServerCertificate to see if it
> would treat it as a ClientCertificate. It works to an extent, but HandleNoCertificate
> checks that it is not running on a Server.

It isn't clear to me what you mean, because there's no such check in ssl3_HandleNoCertificate.

Besides trying an empty ServerCertificate message, I suggest also trying one with a certificate having the id-kp-clientAuth EKU that chains up to a root that you trust (your own custom trust anchor, marked as trusted for client certs in the browser). Note that in some cases it might be easy to get an id-kp-clientAuth certificate from a trusted CA, so that would e a serious problem.
> It isn't clear to me what you mean, because there's no such check in
> ssl3_HandleNoCertificate.

You're right, the check is in HandleCertificate. It took me a minute to negate
the conditional below in my head:

if (!(isTLS && isServer)) {
            desc = bad_certificate;
            goto alert_loser;
        }


> Besides trying an empty ServerCertificate message, I suggest also trying one
> with a certificate having the id-kp-clientAuth EKU that chains up to a root
> that you trust (your own custom trust anchor, marked as trusted for client
> certs in the browser). Note that in some cases it might be easy to get an
> id-kp-clientAuth certificate from a trusted CA, so that would e a serious
> problem.

I'll get Antoine to look at this, he understands certificates much better than me.
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #9)
> Besides trying an empty ServerCertificate message, I suggest also trying one
> with a certificate having the id-kp-clientAuth EKU that chains up to a root
> that you trust (your own custom trust anchor, marked as trusted for client
> certs in the browser). Note that in some cases it might be easy to get an
> id-kp-clientAuth certificate from a trusted CA, so that would e a serious
> problem.

I looked at the code and it seems that this could work. Building a test server now...
(In reply to Antoine Delignat-Lavaud from comment #11)
> (In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment
> #9)
> > Besides trying an empty ServerCertificate message, I suggest also trying one
> > with a certificate having the id-kp-clientAuth EKU

Note that this shouldn't work with any recent Firefox, because Firefox has been hard-coded to validate for the id-kp-serverAuth EKU since Firefox 11-ish. Please read bug 674147 comment 57, which is relevant to this. Prior to the fix for that bug, the code in Firefox looked like this:

    certUsage = isServer ? certUsageSSLClient : certUsageSSLServer;
    certificateusage = isServer ? certificateUsageSSLClient : certificateUsageSSLServer;

    if (!nsNSSComponent::globalConstFlagUsePKIXVerification) {
        rv = CERT_VerifyCertNow(CERT_GetDefaultCertDB(), peerCert, checksig, certUsage,
                                pinarg);
    }
    else {
        [...]

        rv = CERT_PKIXVerifyCert(peerCert, certificateusage,
                                survivingParams->GetRawPointerForNSS(),
                                cvout, pinarg);
    }

    if ( rv == SECSuccess && !isServer ) {
        /* cert is OK.  This is the client side of an SSL connection.
        * Now check the name field in the cert against the desired hostname.
        * NB: This is our only defense against Man-In-The-Middle (MITM) attacks!
        */
        if (hostname && hostname[0])
            rv = CERT_VerifyCertName(peerCert, hostname);
        else
            rv = SECFailure;

        [...]

It is more likely that Firefox 10 or older versions would be exploitable by such an inversion of roles caused by losing a race in sending ClientHello.

Also note that if your exploit fails because of a crash due to the fact that the session cache wasn't configured (as I mentioned in bug 674147 comment 57), then you can try configuring NSS with the SSL_NO_CACHE option, which might get around that bug. In recent versions of Firefox, that flag is controlled by the security.ssl.disable_session_identifiers option in about:config. In bug 967977, it was noted that the Tor Browser Bundle was previously patching NSS to disable session caching. So, if your exploit doesn't work against Firefox 10 or earlier versions, your exploit might work against older versions of the Tor Browser Bundle. Please let me know what you find out.
[Tracking Requested - why for this release]:
This seems like a serious issue because websites that use ECC keys are likely expecting them to only be used for ECDSA key exchange, not ECDH key exchange. That means that they are likely less concerned about disclosing their private key after the certificate has expired, since no data is supposed to be encrypted by those keys (however, see bug 610095). Because of the non-secure fallback logic in Firefox (and Chrome), an attacker will be able to silently cause the browser to encrypt the data with the key, and then re-try with a successful handshake, with the intent of stealing the server's private key later.
Assignee: nobody → nobody
Component: Security → Libraries
Keywords: sec-high
Product: Core → NSS
Target Milestone: --- → 3.17.3
Version: Trunk → trunk
Note that this affects all versions of NSS and all versions of all Mozilla products, but I didn't set all the status/tracking flags because I'm not sure which B2G-related flags are relvant.
OS: Mac OS X → All
Hardware: x86 → All
Dan - Is a fix planned for the Firefox 34 time frame?
Flags: needinfo?(dveditz)
That's a better question for the NSS/PKI team. tagging Richard as a first guess.
Flags: needinfo?(dveditz) → needinfo?(rlb)
I don't know if this bug report is the right place for this, but since I've been reporting state machine bugs, I'll continue to add them on this thread, since it seems to be a good place to sort out all these issues in one go.

We've been testing renegotiation behavior on various TLS implementations and one of the scenarios flagged the following behaviour in NSS.

During renegotiation, NSS accepts Application Data from the peer (and this is normal in TLS).
However, it continue to accept Application Data between Change Cipher Spec and Finished, that is, after the new keys have been installed, but before the new security context has been authenticated by the peer.
This feels like dangerous behavior, although not easily exploitable on the web.

The dangerous scenario is when an NSS TLS client is connected to a malicious server, then the server renegotiates but presents the valid certificate of an honest server. It can arbitrarily tamper with the new handshake and proceed all the way to ServerCCS, but it won't be able to forge a valid ServerFinished. However, if the client starts accepting Application Data before ServerFinished, it may be fooled into
accepting data thinking it is talking to the honest server S. 

The above scenario may become exploitable when:
- A client application is willing to accept different server identities during the initial handshake and renegotiation (E.g. the first is anonymous or uses a compromised cert, but the second uses a strong cert)
- The attacker is able to obtain and inject application data (out-of-order) before the ServerFinished.
  (This could happen, e.g. if the connection is using DTLS)

If it seems worth doing, I could try finding a concrete exploit on NSS/DTLS, but perhaps it would be simpler if we agreed this should be fixed. That is, no Application Data should be accepted between CCS and Finished (on the first or any subsequent handshake).

-K.
(In reply to Karthikeyan Bhargavan from comment #18)
> - A client application is willing to accept different server identities
> during the initial handshake and renegotiation (E.g. the first is anonymous
> or uses a compromised cert, but the second uses a strong cert)

I suggest that browsers implement the suggestion in bug 978831 so that this never happens. This is one of multiple reasons to fix bug 978831.

> - The attacker is able to obtain and inject application data (out-of-order)
> before the ServerFinished.
>   (This could happen, e.g. if the connection is using DTLS)
> 
> If it seems worth doing, I could try finding a concrete exploit on NSS/DTLS,
> but perhaps it would be simpler if we agreed this should be fixed. That is,
> no Application Data should be accepted between CCS and Finished (on the
> first or any subsequent handshake).

I agree that that seems like a good idea too, worthy of its own bug.
The DTLS spec seems to agree (https://tools.ietf.org/html/rfc6347#section-4.1)

   Note that in the special case of a rehandshake on an existing
   association, it is safe to process a data packet immediately, even if
   the ChangeCipherSpec or Finished messages have not yet been received
   provided that either the rehandshake resumes the existing session or
   that it uses exactly the same security parameters as the existing
   association.  In any other case, the implementation MUST wait for the
   receipt of the Finished message to prevent downgrade attack.

A second class of attack may arise that does not involve switching server identities
(it's reminiscent of the classic renegotiation attack):
- A malicious client M connects to an honest server S and sends some data D
- An honest client C independently connects to the same server S
- M intercepts the handshake and sends it as a renegotiation over its own (encrypted) connection.
- M changes C's ClientHello to add a Renegotiation Indication extension, but otherwise makes no changes
- M sends C's CCS but drops C's ClientFinished
- M instead sends C's Application Data D' (e.g if using False start with DTLS) to the server.

If the server is accepting Application Data before finished during renegotiation (NSS), it will decrypt the packet and may release it to the application. Now the malicious client's data D and the good clients's D'
have been concatenated at the server. 

This scenario is effectively the same as Ray and Dispensa's renegotiation attack. The key is that the Secure Renego countermeasure only kicks in at the Finished message; if an attacker could bypass that the 
attack reappears.

The first important pre-condition for the above "attack" scenario is that the malicious client M should be able to get a good client's encrypted application data D' before having to send it a ServerFinished. This could easily happen during False Start. 

A second pre-condition is that the honest server should be willing to read the application data even though it appeared out of order. I am guessing this may happen during DTLS, but I don't have an easy way of testing it. In any case, I am pretty sure the server will decrypt the message using the new keys, whether or not it delivers it to the application. In some cases, since the new keys have not yet been authenticated, this may become an attack vector for some computational attacks. 

-Karthik

PS: Should I copy these comments over to a new bug, or shall I keep them here?
Given that we're at the end of the 34 beta cycle, I'm marking this bug as wontfix for 34.
Yeah, looks like we've missed the 34 window.  I'll get someone to take a look at this for 35.
Flags: needinfo?(rlb)
Richard, we're nearing the end of the 35 cycle with only one beta left on Mon Dec 29th - is there any news for this issue, are there options for a low risk resolution?
Flags: needinfo?(rlb)
I guess there are two levels of resolution that seem reasonable here.

As bsmith points out, the NSS state machine is asking for trouble (and will probably lead to new bugs in the future) but designing and implementing a new conservative state machine is probably a bit of work. 

In the short term, just fixing the ECDHE->ECDH fallback should fix the most egregious case above.
In addition, preventing application data between CCS and Finished during renegotiation seems like an easy fix?
Those options sound good, but at this stage in the cycle they will have to be for 36 and not 35. We only have an RC remaining and cannot take on speculative patches.
A heads up, two similar rollback attacks on openssl are due to be patched in openssl-1.0.1k (Jan 8th).
Notably the ECDHE->ECDH rollback in openssl will be disclosed as CVE-2014-3572
What is the plan for 36 wrt this bug? Thanks
No activity on this. I am going to untrack it.
If it would help push this along, I could propose a patch. 

Alternatively, I could set a student to hunt for more attacks, which would not be very productive, considering that we already know that the state machine in NSS has bugs.
(In reply to Karthikeyan Bhargavan from comment #29)
> If it would help push this along, I could propose a patch. 
Yes, it would help a lot.
> > If it would help push this along, I could propose a patch. 
> Yes, it would help a lot.

We'll get on it.
I'll leave this alone for ESR until we see what the plan is here
With no assignee and no plan of action on this bug, given that ESR 38 is 8 weeks away, there's no reason to track this for ESR 31 since we're a week away from the 37 RC and this is very unlikely to be fixed in time.
Discussed on the call today.  I'm going to have a crack at fixing the state machine.

I don't know that I understand enough of how this is playing up to generate a unit test, but I'll see if I can build something.
Assignee: nobody → martin.thomson
Thanks Martin, we'd be happy to provide a FlexTLS server (and client) to test against for compliance and non-compliance to the protocol state machine.
OK, so I've done an initial bit of analysis, and I think that Brian's original analysis in comment 1 wasn't quite correct.  The states from which ServerHelloDone is accepted are indeed correct, since those messages are optional in TLS.  However, I believe that we need additional discipline when the inbound message skips an optional message.

As mentioned in comment 5, we need to be checking the negotiated key agreement to determine whether skipping ServerKeyExchange is permitted.  If the key agreement requires the message, then the skip should be disallowed.  The same logic applies to Certificate also.  It seems valid to always allow skipping of CertificateRequest.

In all cases, an attacker that induces this sort of error can only benefit if they can also subsequently falsify the Finished message.  I will concede might be made feasible by the secondary effects of monkeying with ordering, as Brian described regarding the original issue.

I'm going to add the extra checks for Certificate and ServerKeyExchange; then I'll look more into fortifying the second order problems.

(Note that the point regarding application data between CCS and Finished doesn't bother me, as long as we only report the new status after Finished is verified.  That is, the data might as well belong to the previous generation of application data until the application is told that the new regime is in place.)
Status: NEW → ASSIGNED
First cut done: https://codereview.appspot.com/218070043/
Flags: needinfo?(wtc)
Attached patch bug1086145-1.patch (obsolete) — — Splinter Review
See also rietveld.
Attachment #8580964 - Flags: review?(wtc)
Attached patch bug1086145-2.patch (obsolete) — — Splinter Review
Attachment #8580965 - Flags: review?(wtc)
Comment on attachment 8580964 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8580964 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. I reviewed this patch at https://codereview.appspot.com/218070043/.
I suggested some changes.
Attachment #8580964 - Flags: review?(wtc) → review+
Comment on attachment 8580965 [details] [diff] [review]
bug1086145-2.patch

Review of attachment 8580965 [details] [diff] [review]:
-----------------------------------------------------------------

Eric, could you review the tests that Martin added? Thanks.

My feedback+ indicates an approval for checkin (and an extremely
cursory review) rather than a review.
Attachment #8580965 - Flags: review?(wtc)
Attachment #8580965 - Flags: review?(ekr)
Attachment #8580965 - Flags: feedback+
Attached patch bug1086145-1.patch (obsolete) — — Splinter Review
Wan-Teh did say that he was happy with this, but I'd feel much better if this had a second round of checking given the extent of the changes.  Rietveld is already updated.
Attachment #8580964 - Attachment is obsolete: true
Flags: needinfo?(wtc)
Attachment #8588198 - Flags: review+
Attachment #8588198 - Flags: feedback?(wtc)
Attached patch bug1086145-2.patch (obsolete) — — Splinter Review
Attachment #8580965 - Attachment is obsolete: true
Attachment #8580965 - Flags: review?(ekr)
Attachment #8588199 - Flags: review?(ekr)
Summary: Disable NSS silent fallback to non-forward secret ciphersuites (or disable False Start) → NSS incorrectly permits skipping of ServerKeyExchange
Comment on attachment 8588198 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8588198 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/ssl/ssl3con.c
@@ +6646,5 @@
>      PORT_Assert( ss->opt.noLocks || ssl_HaveRecvBufLock(ss) );
>      PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );
>  
> +    if (ss->ssl3.hs.ws != wait_server_key ||
> +        ss->sec.peerCert == NULL) {

I think the ss->sec.peerCert case should be either removed here.

When could ss->sec.peerCert be NULL here? The only place that sets ssl3.hs.ws = wait_server_key is ssl3_AuthCertificate, and that only happens after the certificate has already been set.

@@ +9840,5 @@
>      PORT_Assert( ss->opt.noLocks || ssl_HaveRecvBufLock(ss) );
>      PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );
>  
> +    if ((isServer && ss->ssl3.hs.ws != wait_client_cert) ||
> +        (!isServer && ss->ssl3.hs.ws != wait_server_cert)) {

Is isn't necessary to check isServer here. ss->ssl3.hs.ws = wait_client_cert is only set by ssl3_SendServerHelloSequence, and ss->ssl3.hs.ws = wait_server_cert is only set by ssl3_HandleServerHello.

IMO, it is better not to add these redundant checks.

::: lib/ssl/sslsecur.c
@@ +399,5 @@
>      }
>  
> +    /* Require a strong, forward-secret key exchange. */
> +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> +      !ss->ssl3.hs.kea_def->is_limited;

I believe the old code was correct, and the old code was better because it made it more clear exactly which key exchange algorithms are being whitelisted. It's safer to explicitly whitelist the allowed KEAs than to allow any non-limited ephemeral key exchange algorithm, when considering that we may add new kinds of ephemeral key exchange in the future.

Also, there should be a follow-up bug to remove the non-ECC DHE_* algorithms from the whitelist, because they are generally not safe for False Start.
Comment on attachment 8588198 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8588198 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/ssl/ssl3con.c
@@ +6586,5 @@
> +    } else if (ss->ssl3.hs.kea_def->ephemeral) {
> +        ss->ssl3.hs.ws = wait_server_key;
> +    } else {
> +        ss->ssl3.hs.ws = wait_cert_request;
> +    }

I would expect it to be organized like this:

if (/*cipher suite uses certs*/) {
  ss->ssl3.hs.ws = wait_server_cert;
} else if (ss->ssl3.hs.kea_def->ephemeral) {
  /* anonymous DH and similar do not use a Certificate
   * message */
  ss->ssl3.hs.ws = wait_server_key;
} else {
  /* A comment explaining which types of cipher suites this
   * case applies to.
   */
  ss->ssl3.hs.ws = wait_cert_request;
} else {
  /* Some cipher suites don't use certificates at all, and
   * aren't allowed to request a client certificate. */
  ss->ssl3.hs.ws = wait_hello_done;
}

In particular:

1. NIT: Order the conditions so that wait_server_cert appears before wait_server_key and wait_server_key before wait_cert_request, to be consistent with the on-the-wire order of those messages.

2. I'm not convinced that "ss->ssl3.hs.kea_def->signKeyType != sign_null" is the best wait to determine whether the cipher suite requires the server to send a Certificate message. Isn't there a clearer way to determine that? If not, documenting the reasoning in a comment would be useful.

3. It seems questionable to me to allow a Certificate-less server to request a certificate from a client at all. It is worth documenting here which kinds of cipher suites need that treatment, if any.

4. It's worth ensuring that cipher suites that shouldn't be allowed to request a client certificate aren't allowed to request one.
I haven't had a chance to review yet, just noting that it would be good to coordinating patching/disclosure so that we can land this to Chrome at an appropriate time as well.
Additional items, perhaps better done in follow-up bugs:

1. It seems like the check of ss->ssl3.hs.ws in ssl3_HandleClientHello should be moved up towards the top of the function.

2. It isn't clear how the code prevents a client from accepting a ClientHello from the server instead of a ServerHello as described above.
Comment on attachment 8588198 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8588198 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. I reviewed this patch at https://codereview.appspot.com/218070043/

::: lib/ssl/sslsecur.c
@@ +399,5 @@
>      }
>  
> +    /* Require a strong, forward-secret key exchange. */
> +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> +      !ss->ssl3.hs.kea_def->is_limited;

I agree it is safer to revert to the whitelist of the old code here. In my code review at https://codereview.appspot.com/218070043/ I noted that we need to disallow the "anon" cipher suites, which are also ephemeral. So the new code seems more error-prone.
Attachment #8588198 - Flags: feedback?(wtc) → feedback+
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #44)
> > +    if (ss->ssl3.hs.ws != wait_server_key ||
> > +        ss->sec.peerCert == NULL) {
> 
> I think the ss->sec.peerCert case should be either removed here.

You are correct.  I was only doing refactoring based on local information.

> IMO, it is better not to add these redundant checks.

I guess we disagree then.  If these get merged into the one state, then this is fine.

(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #45)
[...]
> > +    } else if (ss->ssl3.hs.kea_def->ephemeral) {
> > +        ss->ssl3.hs.ws = wait_server_key;
> > +    } else {
> > +        ss->ssl3.hs.ws = wait_cert_request;
> > +    }
> 
> I would expect it to be organized like this:
> 
> if (/*cipher suite uses certs*/) {
>   ss->ssl3.hs.ws = wait_server_cert;
> } else if (ss->ssl3.hs.kea_def->ephemeral) {
>   /* anonymous DH and similar do not use a Certificate
>    * message */
>   ss->ssl3.hs.ws = wait_server_key;
> } else {
>   /* A comment explaining which types of cipher suites this
>    * case applies to.
>    */
>   ss->ssl3.hs.ws = wait_cert_request;
> } else {
>   /* Some cipher suites don't use certificates at all, and
>    * aren't allowed to request a client certificate. */
>   ss->ssl3.hs.ws = wait_hello_done;
> }

That final else can't be reached if you accept that sign_null is the right signal.  And I don't have any alternative.  So you are describing what I constructed, sans explanatory text, which I'll add.
 
> 3. It seems questionable to me to allow a Certificate-less server to request
> a certificate from a client at all. It is worth documenting here which kinds
> of cipher suites need that treatment, if any.

I thought about that but considered it less of a correctness issue than simply a strictness one.

Technically, with the signature_algorithms extension we can have an anon_DH suite where the server requests a client certificate.  That might sound perverse, but I didn't want to add more spaghetti to what is already an immense pile.  I think that the only place that needs to go is in 2 out of three of the alternatives in ssl3_HandleServerKeyExchange (kt_rsa always corresponds to sign_rsa there).

However, I just looked and it looks like we would explode if someone tried to do anon_DH in any form other than ECDH because there is a consistency check between the certificate and the key exchange.  This is currently safe because we don't support any of those cipher suites.

> 4. It's worth ensuring that cipher suites that shouldn't be allowed to
> request a client certificate aren't allowed to request one.

I guess that I could check for (ssl3_ExtensionNegotiated(ss, ssl_signature_algorithms_xtn) || ss->ssl3.hs.kea_def->signKeyType != sign_null) in both places.  Thoughts?

(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #47)
> Additional items, perhaps better done in follow-up bugs:
> 
> 1. It seems like the check of ss->ssl3.hs.ws in ssl3_HandleClientHello
> should be moved up towards the top of the function.

I'll do that.

> 2. It isn't clear how the code prevents a client from accepting a
> ClientHello from the server instead of a ServerHello as described above.

Well, the checks I have on the certificate help a little.  But I think that you are right to ask this question.  It seems that a client would accept a ClientHello from a server and make it some part of the way through a handshake.  I don't expect it to get very far, not even as far as Certificate messages, but it might trip some edge cases on the way.  It's easy to prevent.
> > +    /* Require a strong, forward-secret key exchange. */
> > +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> > +      !ss->ssl3.hs.kea_def->is_limited;
> 
> I agree it is safer to revert to the whitelist of the old code here. In my
> code review at https://codereview.appspot.com/218070043/ I noted that we
> need to disallow the "anon" cipher suites, which are also ephemeral. So the
> new code seems more error-prone.

Error-prone only because I screwed up and added a potential downgrade from authenticated to unauthenticated modes ? :)

I actually disagree on both points now that I've considered it.

If the client offered an unauthenticated cipher suite, then this is actually fine as-is.  That downgrade is possible anyway with a straight MitM, a far easier downgrade than the theoretical risk that makes us avoid static key exchange modes.  Why deny those clients the performance advantage when they clearly aren't losing anything they weren't prepared to give up in the first place?

As for the relative merits of deciding based on properties vs. an explicit list, I prefer the properties, precisely because that is more maintainable.

My current code says this:

    /* Recommend only if there is a strong, forward-secrecy-capable suite.
     * Suites using kt_dh don't count (without named group support).
     * Note: This also excludes all export cipher suites. */
    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
      !ss->ssl3.hs.kea_def->signKeyType != kt_dh;


We might consider restoring kt_dh if someone implements the named group extension for DHE, but I'm not holding my breath there.
Attached patch bug1086145-1.patch (obsolete) — — Splinter Review
r=wtc

Note that I'm happy to update this if the discussion continues to reach a different conclusion.
Attachment #8588198 - Attachment is obsolete: true
Attachment #8588820 - Flags: review+
Comment on attachment 8588820 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8588820 [details] [diff] [review]:
-----------------------------------------------------------------

I spotted a bug. Please see the comment marked with "IMPORTANT".

::: lib/ssl/ssl3con.c
@@ +288,5 @@
>  { /* indexed by SSL3KeyExchangeAlgorithm */
> +    /* kea            exchKeyType signKeyType is_limited limit tls_keygen ephemeral */
> +    {kea_null,           kt_null, sign_null,  PR_FALSE,   0, PR_FALSE, PR_FALSE},
> +    {kea_rsa,            kt_rsa,  sign_rsa,   PR_FALSE,   0, PR_FALSE, PR_FALSE},
> +    {kea_rsa_export,     kt_rsa,  sign_rsa,   PR_TRUE,  512, PR_FALSE, PR_FALSE},

We need to set ephemeral=PR_TRUE for kea_rsa_export.
We no longer allow weak RSA certificates. For RSA
certificates we allow, the ServerKeyExchange message
is required.

::: lib/ssl/sslsecur.c
@@ +401,5 @@
> +    /* Recommend only if there is a strong, forward-secrecy-capable suite.
> +     * Suites using kt_dh don't count (without named group support).
> +     * Note: This also excludes all export cipher suites. */
> +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> +      !ss->ssl3.hs.kea_def->signKeyType != kt_dh;

IMPORTANT: there is an extraneous '!' at the beginning of this line.
Attachment #8588820 - Flags: feedback-
Attached patch bug1086145-1.patch (obsolete) — — Splinter Review
r=wtc; with snafus removed.
Attachment #8588820 - Attachment is obsolete: true
Attachment #8589196 - Flags: review+
Comment on attachment 8589196 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8589196 [details] [diff] [review]:
-----------------------------------------------------------------

Martin, I believe your conditional expression in SSL_RecommendedCanFalseStart
still has a bug. Please see my comments marked with "IMPORTANT" or "BUG?".
I hope I didn't misunderstand the code.

::: lib/ssl/ssl3con.c
@@ +6582,5 @@
>  
>      ss->ssl3.hs.isResuming = PR_FALSE;
> +    if (ss->ssl3.hs.kea_def->signKeyType != sign_null) {
> +        /* All current cipher suites other than those with sign_null (i.e.,
> +         * anon_dh suites) require a certificate, so use that signal. */

Nit: anon_dh => dh_anon

@@ +6648,5 @@
>  		SSL_GETPID(), ss->fd));
>      PORT_Assert( ss->opt.noLocks || ssl_HaveRecvBufLock(ss) );
>      PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );
>  
> +    if (ss->ssl3.hs.ws != wait_server_key) {

The ss->sec.peerCert == NULL test in the original code seems to detect that we received a ServerKeyExchange message without first receiving a Certificate message. In the new code, I agree this test can be removed. (It doesn't really matter, but that test would be incompatible with the "anon" cipher suites.)

@@ +7652,5 @@
>      PORT_Assert( ss->ssl3.initialized );
>  
> +    if (!ss->sec.isServer ||
> +        ((ss->ssl3.hs.ws != wait_client_hello) &&
> +         (ss->ssl3.hs.ws != idle_handshake))) {

Nit: (I know these come from the original code) we can omit the two pairs of innermost parentheses:

    if (!ss->sec.isServer ||
        (ss->ssl3.hs.ws != wait_client_hello &&
         ss->ssl3.hs.ws != idle_handshake)) {

@@ +7665,5 @@
> +        errCode = SSL_ERROR_RENEGOTIATION_NOT_ALLOWED;
> +        goto alert_loser;
> +    }
> +
> +

Nit: delete one blank line.

@@ +9057,5 @@
>  
>      isTLS = (PRBool)(ss->ssl3.prSpec->version > SSL_LIBRARY_VERSION_3_0);
>      isTLS12 = (PRBool)(ss->ssl3.prSpec->version >= SSL_LIBRARY_VERSION_TLS_1_2);
>  
> +    if (ss->ssl3.hs.ws != wait_cert_verify) {

I verified it is safe to remove the ss->sec.peerCert == NULL test in the original code because we only transition to the wait_cert_verify state when ss->sec.peerCert is not null.

::: lib/ssl/sslimpl.h
@@ +726,5 @@
>      SSL3KEAType              exchKeyType;
>      SSL3SignType             signKeyType;
> +    /* For export cipher suites:
> +     * is_limited identifies a suite as having a limit on the modulus length.
> +     * key_size_limit provides the corresponding maximum modulus length */

Nit: should the wording be more generic and say "key size" instead of "modulus length"?

@@ +731,4 @@
>      PRBool                   is_limited;
>      int                      key_size_limit;
>      PRBool                   tls_keygen;
> +    /* True if the key exchange for the suite can be ephemeral. */

Nit: does "can be" imply the key exchange method doesn't need to be ephemeral?

::: lib/ssl/sslsecur.c
@@ +397,5 @@
>  	PORT_SetError(SSL_ERROR_FEATURE_NOT_SUPPORTED_FOR_SSL2);
>  	return SECFailure;
>      }
>  
> +    /* Recommend only if there is a strong, forward-secrecy-capable suite.

Super nit: I think you use "Recommend" here because the function is named SSL_RecommendedCanFalseStart. But that function name means it uses the NSS recommended criteria. So ideally we should say "NSS recommends False Start require a strong, forward-secrecy-capable suite", which can be shortened to "Require a strong, forward-secrecy-capable suite." Your new wording is a little unclear.

@@ +399,5 @@
>      }
>  
> +    /* Recommend only if there is a strong, forward-secrecy-capable suite.
> +     * Suites using kt_dh don't count (without named group support).
> +     * Note: This also excludes all export cipher suites. */

IMPORTANT: how does this exclude all export cipher suites? The !is_limited test is gone.

@@ +401,5 @@
> +    /* Recommend only if there is a strong, forward-secrecy-capable suite.
> +     * Suites using kt_dh don't count (without named group support).
> +     * Note: This also excludes all export cipher suites. */
> +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> +      ss->ssl3.hs.kea_def->signKeyType != kt_dh;

1. BUG?: I think we should test exchKeyType instead of signKeyType.
This test should read:
    ss->ssl3.hs.kea_def->exchKeyType != kt_dh

Also, this test seems to allow kea_rsa_export, kea_rsa_export_1024,
and all the "anon" key exchange methods.

2. Nit: please indent four spaces relative to the previous line.
Attachment #8589196 - Flags: review-
Comment on attachment 8589196 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8589196 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/ssl/ssl3con.c
@@ +7652,5 @@
>      PORT_Assert( ss->ssl3.initialized );
>  
> +    if (!ss->sec.isServer ||
> +        ((ss->ssl3.hs.ws != wait_client_hello) &&
> +         (ss->ssl3.hs.ws != idle_handshake))) {

Does ss->sec.isServer always get set reliably before ssl3_HandleClientHello is called?

In particular, what if the client does the TCP connection, but the server sends a ClientHello before the client calls ssl3_SendClientHello?

@@ +7653,5 @@
>  
> +    if (!ss->sec.isServer ||
> +        ((ss->ssl3.hs.ws != wait_client_hello) &&
> +         (ss->ssl3.hs.ws != idle_handshake))) {
> +        desc    = unexpected_message;

Nit: collapse multiple spaces.

@@ +7657,5 @@
> +        desc    = unexpected_message;
> +        errCode = SSL_ERROR_RX_UNEXPECTED_CLIENT_HELLO;
> +        goto alert_loser;
> +    }
> +    if (ss->ssl3.hs.ws == idle_handshake  &&

Nit: too many spaces before "&&"

@@ +7660,5 @@
> +    }
> +    if (ss->ssl3.hs.ws == idle_handshake  &&
> +        ss->opt.enableRenegotiation == SSL_RENEGOTIATE_NEVER) {
> +        desc    = no_renegotiation;
> +        level   = alert_warning;

ditto unneccessary spaces.

@@ +9844,5 @@
>      PORT_Assert( ss->opt.noLocks || ssl_HaveRecvBufLock(ss) );
>      PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );
>  
> +    if ((isServer && ss->ssl3.hs.ws != wait_client_cert) ||
> +        (!isServer && ss->ssl3.hs.ws != wait_server_cert)) {

You mentioned that you thought it was unnecessary to revert this change because it didn't matter. Conversely, why add a change that doesn't do anything except make the code less consistent? In particular, note that no other function seems to test isServer when testing ss->ssl3.hs.ws and this code makes the reader wonder if these impossible states are possible. It seems bad to me.

I'd rather have the changes be limited to fixing the problem or making the logic *more* clear.

::: lib/ssl/sslimpl.h
@@ +731,4 @@
>      PRBool                   is_limited;
>      int                      key_size_limit;
>      PRBool                   tls_keygen;
> +    /* True if the key exchange for the suite can be ephemeral. */

s/can be/is/

::: lib/ssl/sslsecur.c
@@ +401,5 @@
> +    /* Recommend only if there is a strong, forward-secrecy-capable suite.
> +     * Suites using kt_dh don't count (without named group support).
> +     * Note: This also excludes all export cipher suites. */
> +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> +      ss->ssl3.hs.kea_def->signKeyType != kt_dh;

Again, why make an change to this code in this patch that is unnecessary for fixing this bug? If you think this code is unclear or you think the false start criteria are wrong, that should be discussed and fixed in a separate bug.

I think by now it is clear that the previous version of the code was less error-prone, given that there have been two errors in the attempts to improve it.

Also, the original code took a whitelisting approach to deciding which specific cipher suites to enable, whereas the new code takes more of a blacklisting approach. The usual style is to prefer the whitelisting approach.

Regarding your argument that it is OK to false start for anonymous cipher suites: I'm not disagreeing with that, but making that improvement is outside the scope of this bug so why discuss that here?

Again, in general, changes should be restricted to what is needed to fix the bug, especially with a security bug like this one.
(In reply to Wan-Teh Chang from comment #54)
> > +    /* True if the key exchange for the suite can be ephemeral. */
> 
> Nit: does "can be" imply the key exchange method doesn't need to be
> ephemeral?

Yes.  It is entirely possible to supply a static key in ServerKeyExchange.  NSS provides the same ephemeral key for export cipher suites, as well as ECDHE.  The latter know has an option to generate a new key pair for every connection, but that's only recent.

Of course, the definition of ephemeral is something that is debatable too.  Some people insist on a definition that allows only one use.  Others insist that it's OK to say that something is ephemeral as long as the key doesn't get stored or reused too much (where stored and reused have varying degrees of flexibility).
 
> > +    /* Recommend only if there is a strong, forward-secrecy-capable suite.
> 
> Super nit: I think you use "Recommend" here because the function is named
> SSL_RecommendedCanFalseStart. But that function name means it uses the NSS
> recommended criteria. So ideally we should say "NSS recommends False Start
> require a strong, forward-secrecy-capable suite", which can be shortened to
> "Require a strong, forward-secrecy-capable suite." Your new wording is a
> little unclear.

I've reworded to:
    /* Recommend false start only if there is a strong, forward-secrecy-capable
     * suite.  Export cipher suites definitely aren't strong enough.  Suites
     * using kt_dh key exchange can't guarantee that they are strong enough. */
 
> IMPORTANT: how does this exclude all export cipher suites? The !is_limited
> test is gone.

I obviously screwed up this one, didn't I.
 
> @@ +401,5 @@
> > +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> > +      ss->ssl3.hs.kea_def->signKeyType != kt_dh;
> 
> 1. BUG?: I think we should test exchKeyType instead of signKeyType.
> This test should read:
>     ss->ssl3.hs.kea_def->exchKeyType != kt_dh

Good catch.  If this were a good language, type checking would have prevented this from compiling.

> Also, this test seems to allow kea_rsa_export, kea_rsa_export_1024,
> and all the "anon" key exchange methods.

The export thing is a mistake.

As I noted in comment 50, I see no reason not to allow false start for "anon" methods.

    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
        !ss->ssl3.hs.kea_def->is_limited &&
        ss->ssl3.hs.kea_def->exchKeyType != kt_dh;
(In reply to Brian Smith (:briansmith, :bsmith, use NEEDINFO?) from comment #55)
> Does ss->sec.isServer always get set reliably before ssl3_HandleClientHello
> is called?

Yes.  Though the logic is a little hard to follow.  SSL_ResetHandshake is passed an 'asServer' flag, which determines whether ssl2_BeginServerHandshake() or ssl2_BeginClientHandshake() is called next.  Those each set the flag differently.

The main thing that this check prevents is a ClientHello being sent by the server on an existing connection.
 
> In particular, what if the client does the TCP connection, but the server
> sends a ClientHello before the client calls ssl3_SendClientHello?

Who starts the TCP connection is not something that libssl seems to care about.  It only really cares about the role.

> ::: lib/ssl/sslimpl.h
> @@ +731,4 @@
> >      PRBool                   is_limited;
> >      int                      key_size_limit;
> >      PRBool                   tls_keygen;
> > +    /* True if the key exchange for the suite can be ephemeral. */
> 
> s/can be/is/

"can be" is more correct.
 
> ::: lib/ssl/sslsecur.c
> @@ +401,5 @@
> > +    /* Recommend only if there is a strong, forward-secrecy-capable suite.
> > +     * Suites using kt_dh don't count (without named group support).
> > +     * Note: This also excludes all export cipher suites. */
> > +    *canFalseStart = ss->ssl3.hs.kea_def->ephemeral &&
> > +      ss->ssl3.hs.kea_def->signKeyType != kt_dh;
> 
> Again, why make an change to this code in this patch that is unnecessary for
> fixing this bug? If you think this code is unclear or you think the false
> start criteria are wrong, that should be discussed and fixed in a separate
> bug.
> 
> I think by now it is clear that the previous version of the code was less
> error-prone, given that there have been two errors in the attempts to
> improve it.

I'm taking it out.
Attached patch bug1086145-1.patch (obsolete) — — Splinter Review
Attachment #8589196 - Attachment is obsolete: true
Attachment #8589306 - Flags: review?(wtc)
Comment on attachment 8589306 [details] [diff] [review]
bug1086145-1.patch

Review of attachment 8589306 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.
Attachment #8589306 - Flags: review?(wtc) → review+
Comment on attachment 8588199 [details] [diff] [review]
bug1086145-2.patch

Review of attachment 8588199 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. Overall this looks great. I found only one bug: the error_code_ member in tls_agent.h needs to be initialized by the constructor. Most of my comments are just nits. I marked two comments as "IMPORTANT".

::: external_tests/README
@@ +22,5 @@
>    ssl_gtest -d ${SSLGTESTDIR}
>  
> +Where $SSLGTESTDIR is a directory with a database containing:
> + - an RSA certificate called server (with its private keys)
> + - an ECDSA certificate called ecdsa (with its private keys)

Nit: I know this comes from the original text, but each certificate should have just one private key (singular).

@@ +30,2 @@
>  
> +  tests_results/security/$(hostname).${NUMBER}/ssl_gtests

Nit: we should use curly braces {} instead of parentheses around the |hostname| variable, right?

::: external_tests/ssl_gtest/ssl_loopback_unittest.cc
@@ -158,5 @@
> -
> -  client_->CheckVersion(SSL_LIBRARY_VERSION_TLS_1_1);
> -}
> -
> -TEST_P(TlsConnectGeneric, ConnectTLS_1_2_Only) {

Please confirm these two tests were not deleted by accident.

@@ +176,5 @@
>  
> +TEST_P(TlsConnectGeneric, ConnectEcdsa) {
> +  SetupEcdsa();
> +  Connect();
> +  client_->CheckAuthType(ssl_auth_ecdsa);

Nit: should we also call client_->CheckVersion()? See line 47 in the TlsConnectGeneric.Connect test.

::: external_tests/ssl_gtest/ssl_skip_unittest.cc
@@ +9,5 @@
> +#include "tls_parser.h"
> +#include "tls_filter.h"
> +#include "tls_connect.h"
> +
> +namespace nss_test {

Please add a comment to summarize what this file is testing.

@@ +13,5 @@
> +namespace nss_test {
> +
> +class TlsHandshakeSkipFilter : public TlsRecordFilter {
> + public:
> +  TlsHandshakeSkipFilter(uint8_t handshake_type)

Nit: please document that |handshake_type| specifies the handshake message to skip.

@@ +16,5 @@
> + public:
> +  TlsHandshakeSkipFilter(uint8_t handshake_type)
> +      : handshake_type_(handshake_type),
> +        changed_(false) {}
> + protected:

Nit: add a blank line before "protected:".

@@ +17,5 @@
> +  TlsHandshakeSkipFilter(uint8_t handshake_type)
> +      : handshake_type_(handshake_type),
> +        changed_(false) {}
> + protected:
> +  virtual bool FilterRecord(uint8_t content_type, uint16_t version,

Nit: please document what this method does.

@@ +55,5 @@
> +        output_offset += entire_length;
> +      } else {
> +        std::cerr << "Dropping handshake: "
> +                  << static_cast<unsigned>(handshake_type_) << std::endl;
> +        changed_ = true;

IMPORTANT: it is strange that we set |changed_| to true if handshake_type == handshake_type_. I would expect "changed" to mean handshake_type != handshake_type_. This is worth a comment.

@@ +64,5 @@
> +  }
> +
> + private:
> +  uint8_t handshake_type_;
> +  bool changed_;

Please document |changed_|.

Does it need to be a member variable? Can it be a local variable in the FilterRecord() method?

@@ +84,5 @@
> +      server_->SetPacketFilter(filter);
> +    }
> +    ConnectExpectFail();
> +    ASSERT_EQ(kTlsAlertFatal, alert_recorder->level());
> +    ASSERT_EQ(alert, alert_recorder->description());

Nit: use EXPECT_EQs.

@@ +122,5 @@
> +TEST_P(TlsSkipTest, SkipCertAndKeyExch) {
> +  auto chain = new ChainedPacketFilter();
> +  chain->Add(new TlsHandshakeSkipFilter(kTlsHandshakeCertificate));
> +  chain->Add(new TlsHandshakeSkipFilter(kTlsHandshakeServerKeyExchange));
> +  ServerSkipTest(chain);

Interesting ... is it easy to verify both handshake messages are skipped? Because just skipping ServerKeyExchange will also result in SSL_ERROR_RX_UNEXPECTED_HELLO_DONE.

It seems that this will require us to record the handshake wait state at the time the handshake fails.

::: external_tests/ssl_gtest/tls_agent.cc
@@ +78,5 @@
>    ASSERT_EQ(SECSuccess, rv);
>    SetState(CONNECTING);
>  }
>  
>  void TlsAgent::EnableSomeECDHECiphers() {

Nit: in your naming convention, ECDHE should be spelled "Ecdhe".

@@ +125,5 @@
>  }
>  
> +void TlsAgent::CheckAuthType(SSLAuthType type) const {
> +  ASSERT_EQ(CONNECTED, state_);
> +  ASSERT_EQ(type, csinfo_.authAlgorithm);

Nit: EXPECT_EQ is more appropriate than ASSERT_EQ here.
You don't need to change this now, but we probably should
make a pass through this file in the future to change
some of these ASSERT_EQs to EXPECT_EQs.

EXPECT_EQ will not abort the test if it fails. It is
more appropriate when you are interested in seeing how
badly the code is broken. ASSERT_EQ is appropriate if
it is meaningless for the test to go on when the check
fails, for example, if the rest of the test would deference
a null pointer or access array elements outside the
array bounds.

::: external_tests/ssl_gtest/tls_agent.h
@@ +40,5 @@
>          pr_fd_(nullptr),
>          adapter_(nullptr),
>          ssl_fd_(nullptr),
>          role_(role),
>          state_(INIT) {

IMPORTANT: The constructor should initialize the new error_code_ member.

::: external_tests/ssl_gtest/tls_connect.cc
@@ +106,5 @@
>    Init();
>  }
>  
> +void TlsConnectTestBase::SetupEcdsa() {
> +  Reset("ecdsa", kt_ecdh);

Nit: based on the source code in sslt.h:
http://mxr.mozilla.org/nss/source/lib/ssl/sslt.h#36

we should use ssl_kea_ecdh instead of kt_ecdh, and similarly ssl_kea_rsa instead of kt_rsa. The kt_xxx symbols are now macros defined for backward compatibility.

::: external_tests/ssl_gtest/tls_connect.h
@@ +40,5 @@
>    // Initialize client and server.
>    void Init();
>    // Re-initialize client and server.
> +  void Reset(const std::string& server_name = "server",
> +             SSLKEAType kea = kt_rsa);

Nit: I know we don't need to follow the Google C++ Style Guide, but that style guide forbids default arguments. It would be nice to avoid them unless we disagree with the rationale.

@@ +51,5 @@
>    void Connect();
>    // Connect and expect it to fail.
>    void ConnectExpectFail();
>  
> +  // Setup to use ECDSA on the server and some ECDSA suites.

Nit: use ECDSA => use an ECDSA certificate ?

ECDSA suites => ECDHE suites ?

::: external_tests/ssl_gtest/tls_filter.h
@@ +44,5 @@
>    TlsHandshakeFilter() {}
>  
> +  // Reads the length from the record header.
> +  // This also reads the DTLS fragment information and checks it.
> +  static bool ReadLength(TlsParser& parser, uint16_t version, uint32_t *length);

Nit: should |parser| be a const reference?

In Google's Style Guide, if the function needs to modify |parser|, |parser| should be declared as a pointer.

::: external_tests/ssl_gtest/tls_parser.h
@@ +31,5 @@
>  const uint8_t kTlsAlertHandshakeFailure = 0x28;
>  const uint8_t kTlsAlertIllegalParameter = 0x2f;
>  const uint8_t kTlsAlertDecodeError = 0x32;
>  const uint8_t kTlsAlertUnsupportedExtension = 0x6e;
>  const uint8_t kTlsAlertNoApplicationProtocol = 0x78;

Nit: since both TLS 1.2 and IANA (http://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml#tls-parameters-6) specify the alert message codes in decimal, I think decimal is better than hexadecimal here.

::: tests/cert/cert.sh
@@ +970,5 @@
> +n
> +CERTSCRIPT
> +  if [ "$RET" -ne 0 ]; then
> +     echo "return value is $RET"
> +     Exit 6 "Fatal - failed to create server cert for ssl_gtests"

Nit: server cert => rsa server cert

@@ +985,5 @@
>  CERTSCRIPT
>  
>    if [ "$RET" -ne 0 ]; then
>       echo "return value is $RET"
> +     Exit 6 "Fatal - failed to create ecdsa cert for ssl_gtests"

Nit: ecdsa cert => ecdsa server cert
Attachment #8588199 - Flags: review+
(In reply to Wan-Teh Chang from comment #60)
> ::: external_tests/ssl_gtest/ssl_loopback_unittest.cc
> > -TEST_P(TlsConnectGeneric, ConnectTLS_1_2_Only) {
> 
> Please confirm these two tests were not deleted by accident.

Yes, this is intentional.  These replicate TEST_P(TlsConnectGeneric, Connect)

> > +        std::cerr << "Dropping handshake: "
> > +                  << static_cast<unsigned>(handshake_type_) << std::endl;
> > +        changed_ = true;
> 
> IMPORTANT: it is strange that we set |changed_| to true if handshake_type ==
> handshake_type_. I would expect "changed" to mean handshake_type !=
> handshake_type_. This is worth a comment.

Done.  Note that if we don't find handshake_type_ we don't change the record at all.

> Does it need to be a member variable? Can it be a local variable in the
> FilterRecord() method?

I've added a comment explaining why.  You have to track this state across invocations (the filter is called multiple times).  You have to know if a message is dropped so that the DTLS sequence number can be adjusted.

> @@ +122,5 @@
> > +TEST_P(TlsSkipTest, SkipCertAndKeyExch) {
> > +  auto chain = new ChainedPacketFilter();
> > +  chain->Add(new TlsHandshakeSkipFilter(kTlsHandshakeCertificate));
> > +  chain->Add(new TlsHandshakeSkipFilter(kTlsHandshakeServerKeyExchange));
> > +  ServerSkipTest(chain);
> 
> Interesting ... is it easy to verify both handshake messages are skipped?
> Because just skipping ServerKeyExchange will also result in
> SSL_ERROR_RX_UNEXPECTED_HELLO_DONE.
>
> It seems that this will require us to record the handshake wait state at the
> time the handshake fails.

Yeah, I don't see much value in creating the API to expose that.  Maybe if this were new code, I'd design for test a little more, but this is NSS.

> Nit: EXPECT_EQ is more appropriate than ASSERT_EQ here.
> You don't need to change this now, but we probably should
> make a pass through this file in the future to change
> some of these ASSERT_EQs to EXPECT_EQs.
> 
> EXPECT_EQ will not abort the test if it fails. It is
> more appropriate when you are interested in seeing how
> badly the code is broken. ASSERT_EQ is appropriate if
> it is meaningless for the test to go on when the check
> fails, for example, if the rest of the test would deference
> a null pointer or access array elements outside the
> array bounds.

I didn't really know - I was largely copying what was already there.  I did a quick pass through and changed them all.  It's only the point at which we can't connect that we have a problem.

> IMPORTANT: The constructor should initialize the new error_code_ member.

Thanks.

> In Google's Style Guide, if the function needs to modify |parser|, |parser|
> should be declared as a pointer.

Ahh, Google Style.  Trying really hard to make C++ less bad.  That's a lost cause.
Attached patch bug1086145-1.patch — — Splinter Review
r=wtc, just uploading what I plan to land.
Attachment #8589306 - Attachment is obsolete: true
Attachment #8591024 - Flags: review+
Attached patch bug1086145-2.patch (obsolete) — — Splinter Review
r=wtc
Attachment #8588199 - Attachment is obsolete: true
Attachment #8588199 - Flags: review?(ekr)
Attachment #8591025 - Flags: review+
This is clearly not fixed in 3.17.3
Target Milestone: 3.17.3 → ---
Note: this was published some time ago:

https://www.smacktls.com/smack.pdf
Comment on attachment 8591025 [details] [diff] [review]
bug1086145-2.patch

Review of attachment 8591025 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. I suggest two small changes.

::: external_tests/ssl_gtest/ssl_skip_unittest.cc
@@ +65,5 @@
> +        output_offset += entire_length;
> +      } else {
> +        std::cerr << "Dropping handshake: "
> +                  << static_cast<unsigned>(handshake_type_) << std::endl;
> +        // We only need to report a that the output contains changed data if we

Nit: report a that => report that

::: external_tests/ssl_gtest/tls_connect.h
@@ +40,5 @@
>    // Initialize client and server.
>    void Init();
>    // Re-initialize client and server.
> +  void Reset(const std::string& server_name, SSLKEAType kea);
> +  void Reset(); // With the RSA server certificate.

Nit: the Google Style Guide also recommends against this kind of method overloading. Please revert to using default arguments until the NSS team decides which style guide we should follow in the C++ code. Sorry about this.
Attached patch bug1086145-2.patch — — Splinter Review
I made some bigger changes that I hope you will approve of.  SetupEcdsa is now ResetEcdsa() and ResetRsa() replaces Reset().  Now Reset() is private and there is less chance now that mismatched arguments can be passed to it.
Attachment #8591025 - Attachment is obsolete: true
Attachment #8591135 - Flags: review+
Comment on attachment 8591135 [details] [diff] [review]
bug1086145-2.patch

Review of attachment 8591135 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc. Thanks.
Attachment #8591135 - Flags: review+
Reminder to Ryan to check with Chrome release managers about the timing of this landing.
Flags: needinfo?(ryan.sleevi)
Discussed on the NSS call.  NSS release 3.19 (Beta in a few days).  Firefox release 39 (June 30).  Chrome release 43/44 (Ryan to provide information).
Flags: needinfo?(rlb)
https://hg.mozilla.org/projects/nss/rev/6b4770c76bc8
https://hg.mozilla.org/projects/nss/rev/1865635f5df5
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.19
[Tracking Requested - why for this release]: Support for ESR31 is dropped when Fx40 ships, so this needs 39+ tracking status AFAICT?
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #72)
> [Tracking Requested - why for this release]: Support for ESR31 is dropped
> when Fx40 ships, so this needs 39+ tracking status AFAICT?

38 is the next ESR
and ESR31 and ESR38 overlap for two releases.
For reference, here's a summary of current NSS versions across supported release branches. Not sure if we want to update to 3.19 across the board or create some combination of point releases.

esr38: NSS_3_18_RTM
b2g37: NSS_3_17_4_RTM
b2g34: NSS_3_17_2_RTM
b2g32: NSS_3_17_2_RTM
b2g30: NSS_3_16_2_3_RTM
esr31: NSS_3_16_2_3_RTM
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #75)
> For reference, here's a summary of current NSS versions across supported
> release branches. Not sure if we want to update to 3.19 across the board or
> create some combination of point releases.

There are other changes in 3.19 that will need some soak time before we can be confident that they are safe to take.  Once that resolves, this can be opened.  There is also another bug under embargo with a prepared patch that could land in 3.19 in the near future (though a backport is still a possibility).
Priority: -- → P1
Martin, any thoughts for how we want to handle backporting this fix to older branches? See comment 75 for the various NSS versions currently in use across our supported releases.
We might suggest that we simply upgrade those older releases to a newer version of NSS.  3.18 has proven to be quite stable so far, so the risk might be acceptable.  3.19 might be a little too new.

The tests are the problem for back ports.  We might be able to just land the code without the tests.

I'll bring it up on the NSS call this week and get back to you with an answer.
My proposed plan is to take 3.19 all the way back.  Bug 1138554 is coming on Monday and that will need to be ported too.  3.19 went RTM on April 30, and has been in Firefox since well before then.
Flags: needinfo?(martin.thomson)
Blocks: 1166031
What is 1166031?
Cancelling this.
Flags: needinfo?(ryan.sleevi)
:dveditz, can we mark this public?  It's in m-c, aurora, and soon beta.
Flags: needinfo?(dveditz)
Group: core-security
Flags: needinfo?(dveditz)
Note that NSS 3.19 only landed in Chrome today, and is not part of any of our Dev or Beta channels.

Opening this bug was a non-issue because the attack itself was described in the SMACK-TLS preprint (Comment #65), and NSS 3.19 was announced.

Few follow-up questions:
- Was there a CVE attached to this, dveditz? Comment 26 suggested it was only OpenSSL that got a CVE
- NSS 3.19 release notes didn't flag this as a security issue [other than this was hidden], except the SMACK-TLS paper describes how this could cause issues for cookie disclosure + false start (wouldn't that be a High-Severity issue? It is using Chrome's severity ranking)
Flags: needinfo?(dveditz)
Good questions, Ryan, thanks.

I should have made it clear when I opened the bug that I was doing so because the attack and paper were already public, not simply because the patch had landed (as implied by the request in comment 82).

You're right that the NSS implementation requires a separate CVE. We'll use CVE-2015-2721

> - NSS 3.19 release notes didn't flag this as a security issue [other than
> this was hidden], except the SMACK-TLS paper describes how this could cause
> issues for cookie disclosure + false start (wouldn't that be a High-Severity
> issue? It is using Chrome's severity ranking)

I don't do the NSS release notes so I can't help you there, but this bug is rated "sec-high" and will definitely get a Firefox security advisory.
Alias: CVE-2015-2721
Flags: needinfo?(dveditz)
(In reply to Ryan Sleevi from comment #83)
> - NSS 3.19 release notes didn't flag this as a security issue [other than
> this was hidden], except the SMACK-TLS paper describes how this could cause
> issues for cookie disclosure + false start (wouldn't that be a High-Severity
> issue? It is using Chrome's severity ranking)

The actual exploit here isn't that interesting.  The paper talks it up more than it deserves.  At best, a break in authentication can be turned into a break of the key exchange.  In retrospect, I'd downgrade this to sec-moderate.
The SMACK paper (section V-C) demonstrates that the bug disables forward secrecy for False Start data.
I hope we don't talk it up more than that!

In the past, lack of forward secrecy has been considered serious enough to disable false start altogether (see bug 920248). However, we do agree that the bug does not "break" the normal authentication and confidentiality guarantees of TLS, and it does require an active network attacker, and so sec-moderate would seem to be appropriate.
Blocks: 1173413
Whiteboard: [adv-main39+][adv-esr38.1+][adv-esr31.8+]
Whiteboard: [adv-main39+][adv-esr38.1+][adv-esr31.8+] → [adv-main39+][adv-esr38.1+][adv-esr31.8+][b2g-adv-main2.2+]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: