Closed Bug 542832 Opened 15 years ago Closed 13 years ago

SSL_RestartHandshakeAfterServerCert is broken

Categories

(NSS :: Libraries, enhancement, P1)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED
3.13.2

People

(Reporter: agl, Assigned: briansmith)

References

Details

Attachments

(8 files, 18 obsolete files)

37.29 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
13.36 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
21.47 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
1.42 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
1.05 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
1.55 KB, patch
wtc
: review+
rrelyea
: review+
Details | Diff | Splinter Review
12.55 KB, patch
briansmith
: review+
Details | Diff | Splinter Review
39.11 KB, patch
briansmith
: feedback+
Details | Diff | Splinter Review
Attached patch patch v1 (obsolete) — Splinter Review
SSL_RestartHandshakeAfterServerCert is broken:
  * It's in an internal header file
  * It takes an sslSocket* directly

Because of this, nobody can actually call it.

This patch makes it public and it appears to work ok, although I haven't actually added it to any tests.
Attachment #424068 - Attachment is patch: true
Attachment #424068 - Attachment mime type: application/octet-stream → text/plain
Attachment #424068 - Flags: review?(wtc)
Attachment #424068 - Attachment is obsolete: true
Attachment #424071 - Flags: review?(wtc)
Attachment #424068 - Flags: review?(wtc)
Blocks: 511393
Assignee: nobody → bsmith
No longer blocks: 511393
Severity: normal → enhancement
OS: Linux → All
Priority: -- → P1
Hardware: x86 → All
Whiteboard: [mozilla-SPDY]
Target Milestone: --- → 3.13.2
I think there is a better alternative solution:

When the auth callback returns SECWouldBlock, we can keep the handshake going until we get to the client certificate message and/or until we get to the Finished message (if there is no client cert). If the certificate auth is happening on another thread, then we will be able to do the ClientKeyExchange public key operations and send the ClientKeyExchange message to the server, concurrently with doing the certificate authentication. In theory, the server could process the ClientKeyExchange message and do its part of the key agreement PK operations while it waits for us to send the authenticate its certificate and send the Finished message, so that all the key agreement happens concurrently with the certificate authentication.
(In reply to Brian Smith (:bsmith) from comment #2)
> When the auth callback returns SECWouldBlock, we can keep the handshake
> going until we get to the client certificate message and/or until we get to
> the Finished message (if there is no client cert).

Previously, we have discsused whether it is OK to send the client certificate before authenticating the server's certificate. An attacker that is MiTMing a connection between the client and the real server (with a good certificate) can always get the client to disclose its certificate; the client has to send its certificate before it receives the server's Finished message, and we can only detect tampering (in particular, an attacker-inserted CertificateRequest) with the handshake.

However, checking the validity of server's certificate is useful for preventing (non-MiTM) attacks where the attacker does not have a valid certificate for the host that is being connected to. For this reason, I suppose there is some value in validating the server's certificate before sending the client certificate.
(In reply to Brian Smith (:bsmith) from comment #3)
> [...]; the client has to send its certificate before it receives the
> server's Finished message, and we can only detect tampering (in particular,
> an attacker-inserted CertificateRequest) with the handshake

... after having received the server's Finished message.
Comment on attachment 424071 [details] [diff] [review]
patch v1 (CVS diff format this time)

Review of attachment 424071 [details] [diff] [review]:
-----------------------------------------------------------------

In addition to the changes I mention below, we should also add support for the AuthCertificate callback to return SECWouldBlock, in addition to the support for the bad cert handler returning SECWouldBlock.

I will post an updated patch that makes the changes below.

::: mozilla/security/nss/cmd/lib/SSLerrs.h
@@ +397,5 @@
>  ER3(SSL_ERROR_RENEGOTIATION_NOT_ALLOWED,       (SSL_ERROR_BASE + 112),
>  "Renegotiation is not allowed on this SSL socket.")
> +
> +ER3(SSL_ERROR_HANDSHAKE_SUSPENDED,             (SSL_ERROR_BASE + 113),
> +"SSL handshake has been suspended by request.")

I have found that in Gecko, we rely on the PR_WOULD_BLOCK_ERROR and cannot rely on always seeing SECWouldBlock. So, I think it is better to NOT have this new error code at all

::: mozilla/security/nss/cmd/tstclnt/tstclnt.c
@@ +306,5 @@
>      fprintf(stderr, "Bad server certificate: %d, %s\n", err, 
>              SECU_Strerror(err));
> +
> +    if (pauseHandshake)
> +	return SECWouldBlock;

Needs PORT_SetError(PR_WOULD_BLOCK_ERROR);

@@ +1070,5 @@
>  	    FPRINTF(stderr, "%s: Read from server %d bytes\n", progName, nb);
>  	    if (nb < 0) {
> +		if (PR_GetError() == SSL_ERROR_HANDSHAKE_SUSPENDED) {
> +		    /* -P was used to pause the handshake. */
> +		    SSL_RestartHandshakeAfterServerCert(s);

If we remove the SSL_ERROR_HANDSHAKE_SUSPENDED error like I recommend, then we will need to set a flag in our badCertHandler indicating that we returned SECWouldBlock, and check that flag instead.

::: mozilla/security/nss/lib/ssl/sslsecur.c
@@ +153,5 @@
> +	/* See case 3, above. */
> +	if (rv == SECWouldBlock &&
> +	    PORT_GetError() == SSL_ERROR_HANDSHAKE_SUSPENDED) {
> +	    break;
> +	}

Here, we can change the condition to rv == SECWouldBlock && ss->handshake == AlwaysBlock. Also, we shouldn't remove the logic below that ensures that returning SECWouldBlock is always accompanied by PORT_SetError(PR_WOULD_BLOCK_ERROR), for the reason I mentioned above.

@@ +174,4 @@
>  static SECStatus
>  AlwaysBlock(sslSocket *ss)
>  {
> +    PORT_SetError(SSL_ERROR_HANDSHAKE_SUSPENDED);

We can revert this change.

@@ +1498,3 @@
>  
> +    if (!ss)
> +	return SECFailure;

Need to handle this in the same way we handle it in other functions.
Attachment #424071 - Flags: review-
Here is my modified version of Adam's patch, not quite ready for review, but I would appreciate any feedback that I can get.

Like Adam's patch, this includes a working tstclnt. You can use the -P flag to enable the asynchronous cert validation behavior. I made his patch so that cert auth failure override (-o) is controlled independently of the -P flag.

This patch extends the support for non-blocking mode to the AuthCertificate hook; previously, only the BadCertHandler could be non-blocking. The implemented semantics are this: If your AuthCertificate callback returns SECWouldBlock, then your BadCertHandler (if any) will never be called by libssl. The expectation is that when the asynchronous auth cert callback finishes with an error, it will either (a) close the connection or (b) call the bad cert handler directly (as we do in the modified Firefox below). 

The modified tstclnt continues to provide only a BadCertHandler. I think this is good enough, as far as code test coverage is concerned.

I should make sure that the async cert verification remains disabled for SSL2.
Do I need to update this patch to keep it disabled for servers? I don't have a great way of testing it for servers and we haven't updated selfserv to use it yet. I suggest we disable this functionality for servers and file a follow-up bug for server support.

I am also not sure about how to best integrate this in with the NSS automated test infrastructure. We should have a round of SSL tests that use the asynchronous cert auth feature, but I don't want to double the number of SSL tests that run, as they are already quite slow.

Firefox test builds that use the asynchronous behavior are currently running at https://tbpl.mozilla.org/?tree=Try&rev=8dbbf3bd423e and the builds are (will be) available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bsmith@mozilla.com-8dbbf3bd423e

I tried to implement the optimization that I mentioned above but I ran into some strange behavior. I hope that I just had a dumb mistake in my code that I couldn't find, but since it worked for some servers and not others, I fear that some servers may expect the ClientKeyExchange and/or ChangeCipherSpec and/or Finished messages to be in the same packet. I think that is something we should investigate later, after we have this basic support working.
Attachment #569876 - Flags: feedback?(wtc)
Attachment #569876 - Flags: feedback?(rrelyea)
Attachment #569876 - Flags: feedback?(agl)
Patrick found that there is a bug in this patch (see below). I am investigating it.

I also think that ssl3_HandleCertificate needs to be changed, because there is some code that gets "goto'd" over on SECWouldBlock that should be executed in that case too.

(In reply to Patrick McManus from Bug 674147 comment 15 and Bug 674147 comment 16)
> [...] I can no longer connect to https://www.cotendo.com/
> 
> I get a SSL_ERROR_RX_UNEXPECTED_CERTIFICATE error (see stack below) that I
> don't get without the patch set.
> 
> [...]
> 
> I tried other known SPDY/NPN hosts (e.g. mail.google.com) and was able to
> handshake fine.
> 
> Breakpoint 1, ssl3_HandleCertificate (ss=0x7fffd5c6a000, b=0x7fffd6bb6000
> "", length=3087) at ssl3con.c:7842
> 7842		errCode = SSL_ERROR_RX_UNEXPECTED_CERTIFICATE;
> (gdb) bt
> #0  ssl3_HandleCertificate (ss=0x7fffd5c6a000, b=0x7fffd6bb6000 "",
> length=3087) at ssl3con.c:7842
> #1  0x00007ffff6a273ca in ssl3_HandleHandshakeMessage (ss=0x7fffd5c6a000,
> b=0x7fffd6bb6000 "", length=3087) at ssl3con.c:8708
> #2  0x00007ffff6a27a58 in ssl3_HandleHandshake (ss=0x7fffd5c6a000,
> origBuf=0x7fffd5c6a390) at ssl3con.c:8871
> #3  0x00007ffff6a28576 in ssl3_HandleRecord (ss=0x7fffd5c6a000,
> cText=0x7fffe53fe780, databuf=0x7fffd5c6a390) at ssl3con.c:9171
> #4  0x00007ffff6a2975d in ssl3_GatherCompleteHandshake (ss=0x7fffd5c6a000,
> flags=0) at ssl3gthr.c:209
> #5  0x00007ffff6a2c5a8 in ssl_GatherRecord1stHandshake (ss=0x7fffd5c6a000)
> at sslcon.c:1258
> #6  0x00007ffff6a3917e in ssl_Do1stHandshake (ss=0x7fffd5c6a000) at
> sslsecur.c:155
> #7  0x00007ffff6a3bb1f in ssl_SecureSend (ss=0x7fffd5c6a000,
>
> [This] is a regression from mozilla-central.
Depends on: 651523
It is better to check in the patch for bug 651523 before this, because the code removed in bug 651523 is part of the code that gets skipped over, as I mentioned above.
This is no longer blocking the mozilla SPDY landing. We are going to land SPDY first and then do the SSL thread removal.
Whiteboard: [mozilla-SPDY]
Attachment #424071 - Attachment is obsolete: true
Attachment #569876 - Attachment is obsolete: true
Attachment #573009 - Flags: superreview?(rrelyea)
Attachment #573009 - Flags: review?(wtc)
Attachment #573009 - Flags: feedback?(agl)
Attachment #569876 - Flags: feedback?(wtc)
Attachment #569876 - Flags: feedback?(rrelyea)
Attachment #569876 - Flags: feedback?(agl)
Attachment #424071 - Flags: review?(wtc)
I took Wan-Teh's suggestion and modified tstclnt to use the asynchronous cert verification mechanism (returning SECWouldBlock from the cert auth hook and then using SSL_RestartHandshakeAfterServerCert) by default.

tstclnt can be forced to use the synchronous mechanism by using the new -P switch. The -P switch is required for SSL2 connections because asynchronous certificate validation cannot be used for SSL2. I have modified tests/ssl/ssl.sh to provide the -P switch when the test include "SSL2" and to disable SSL2 (using the -2 switch) otherwise.

This means that this feature is being tested by nearly every test in the SSL test suite.

The -P switch can be used with the -o switch because tstclnt's auth certificate hook (ownAuthCertificate) emulates the old bad cert handler behavior.
Attachment #573014 - Flags: superreview?(rrelyea)
Attachment #573014 - Flags: review?(wtc)
Comment on attachment 573009 [details] [diff] [review]
Implement SSL_RestartHandshakeAfterServerCert

Review of attachment 573009 [details] [diff] [review]:
-----------------------------------------------------------------

Note that the original implementation had comments saying that it did not work with renegotiations. This version *does* work with renegotiations, and this is tested thoroughly in all the sslauth.txt tests that use "-r_-r_-r" and "-r_-r_-r_-r" flags.

::: ../restart/mozilla/security/nss/lib/ssl/ssl3con.c
@@ +5751,5 @@
>  	PORT_SetError(SSL_ERROR_RX_UNEXPECTED_HELLO_DONE);
>  	return SECFailure;
>      }
>  
> +    ss->ssl3.hs.ws = wait_server_cert_auth;

By setting ss->ssl3.hs.ws to the new wait_server_cert_auth value, we ensure that we will not accept any handshake messages from the server, because every ssl3_Handle* function whitelists the handshake messages it accepts and none of them will accept wait_server_cert_auth.

@@ +8186,3 @@
>  
> +    ss->ssl3.hs.waitingForAuthCertificate = PR_FALSE;
> +    if (ss->ssl3.hs.ws == wait_server_cert_auth) {

Note that we have to test this condition, because it is possible that certificate authentication will finish before we have received the server's ServerHelloDone message.

::: ../restart/mozilla/security/nss/lib/ssl/sslimpl.h
@@ +790,5 @@
>  #ifdef NSS_ENABLE_ECC
>      PRUint32              negotiatedECCurves; /* bit mask */
>  #endif /* NSS_ENABLE_ECC */
> +
> +    PRBool                waitingForAuthCertificate;

This will be true from the time the auth certificate hook (or bad cert handler) returns SECWouldBlock until the time that SSL_RestartHandshakeAfterServerCert is called.
More notes to help reviewers:

1. To build and test the code, apply the patch for bug 651523, then the patch for bug 700516, then the two patches above.

2. This patch doesn't trigger bug 697910 any more because I have changed ssl3_HandleCertificate so that it never returns SECWouldBlock.
No longer depends on: 697910
Comment on attachment 573014 [details] [diff] [review]
Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default.

Review of attachment 573014 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good in general.  I suggested some changes below.

::: ../restart/mozilla/security/nss/cmd/tstclnt/tstclnt.c
@@ +225,5 @@
>      fprintf(stderr, "%-20s Renegotiate N times (resuming session if N>1).\n", "-r N");
>      fprintf(stderr, "%-20s Enable the session ticket extension.\n", "-u");
>      fprintf(stderr, "%-20s Enable compression.\n", "-z");
> +    fprintf(stderr, "%-20s Use synchronous certificate validation "
> +                    "(required for SSL2)\n", "-P");

Just curious why you chose -P for this option, because it's the opposite
of "pause".

@@ +294,5 @@
>  	}
>      }
>  }
>  
> +typedef struct ServerCertAuth

The NSS convention is to use 'ServerCertAuthStr' as the struct tag.
Alternatively, omit the struct tag.

Please explain what "pause" means here.

@@ +315,5 @@
>              SECU_Strerror(err));
>      return SECSuccess;	/* override, say it's OK. */
>  }
>  
> +static PR_CALLBACK SECStatus 

Don't use the obsolete PR_CALLBACK qualifier.

@@ +326,5 @@
> +
> +    PORT_Assert(serverCertAuth->shouldPause);
> +    PORT_Assert(!serverCertAuth->isPaused);
> +    serverCertAuth->isPaused = PR_TRUE;
> +    PORT_SetError(PR_WOULD_BLOCK_ERROR);

Please do not set an error code when returning SECWouldBlock, unless
it is necessary.  This code will become sample code for doing asynchronous
certificate validation in SSL, so we should try to omit unnecessary code.

@@ +537,5 @@
> +
> +    serverCertAuth->isPaused = PR_FALSE;
> +    rv = SSL_AuthCertificate(serverCertAuth->dbHandle, fd, PR_TRUE, PR_FALSE);
> +    if (rv != SECSuccess && override) {
> +        rv = ownBadCertHandler(NULL, fd);

I expected that SSL_RestartHandshakeAfterServerCert would be
responsible for calling ownBadCertHandler.  This would require
SSL_RestartHandshakeAfterServerCert to take an input argument
(of the certificate verification status).

But your design is fine by me, too (only call
SSL_RestartHandshakeAfterServerCert if the certificate is good
or the certificate error is overridden).

Note that ownBadCertHandler merely return SECSuccess, so we
don't really need to call it here.

@@ +544,5 @@
> +	return rv;
> +    }
> +    
> +    rv = SSL_RestartHandshakeAfterServerCert(fd);
> +    if (rv < 0 && PORT_GetError() == PR_WOULD_BLOCK_ERROR) {

This should be rv != SECSuccess, right?

It is strange for SSL_RestartHandshakeAfterServerCert to fail with PR_WOULD_BLOCK_ERROR.
I think the caller should be expected to call SSL_ForceHandshake to finish the handshake,
so SSL_RestartHandshakeAfterServerCert should return SECSuccess as long as it has
successfully restarted the handshake, even if the handshake is not finished yet.

@@ -659,2 @@
>      }
> -    handle = CERT_GetDefaultCertDB();

It seems better to set serverCertAuth.dbHandle right here.

@@ +1114,5 @@
> +
> +		    rv = restartHandshakeAfterServerCertIfNeeded(s,
> +				&serverCertAuth, override);
> +		    if (rv == SECFailure) {
> +			goto handshake_failed;

Can you avoid this goto by duplicating these three lines of code here?
    error = 254;
    SECU_PrintError(progName, "authentication of server cert failed");
    goto done;
Comment on attachment 573014 [details] [diff] [review]
Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default.

Review of attachment 573014 [details] [diff] [review]:
-----------------------------------------------------------------

::: ../restart/mozilla/security/nss/cmd/tstclnt/tstclnt.c
@@ +1043,5 @@
>      while (pollset[SSOCK_FD].in_flags | pollset[STDIN_FD].in_flags) {
>  	char buf[4000];	/* buffer for stdin */
>  	int nb;		/* num bytes read from stdin. */
>  
> +	rv = restartHandshakeAfterServerCertIfNeeded(s, &serverCertAuth,

The places you call restartHandshakeAfterServerCertIfNeeded
seem ad hoc.  I think the rule you used is to call
restartHandshakeAfterServerCertIfNeeded right before calling
PR_Poll, right?
(In reply to Wan-Teh Chang from comment #14)
> > +    fprintf(stderr, "%-20s Use synchronous certificate validation "
> > +                    "(required for SSL2)\n", "-P");
> 
> Just curious why you chose -P for this option, because it's the opposite
> of "pause".

Because Adam chose 'P' with the opposite meaning and I didn't change it when I made async the default. Do you have a different suggestion? The best alternative I can think of is making '-O' mean "sync validation" and '-o' mean "ignore validation failures"; i.e. "O/o" somehow means "override validation behavior."

> Please do not set an error code when returning SECWouldBlock, unless
> it is necessary.  This code will become sample code for doing asynchronous
> certificate validation in SSL, so we should try to omit unnecessary code.

OK. It is not necessary to set the error code. The code I added to libssl calls PORT_SetError(PR_WOULD_BLOCK_ERROR); when it receives the SECWouldBlock, precisely to make it unnecessary to do so here.

> But your design is fine by me, too (only call
> SSL_RestartHandshakeAfterServerCert if the certificate is good
> or the certificate error is overridden).

I think generally it would be suboptimal for an application to an application to use a badCertHandler called by SSL_RestartHandshakeAfterServerCert, especially considering likely threading models. For example, in Firefox, our cert validation must be done in one thread, cert override processing must be done (mostly) in another thread, and calls to libssl must be done on yet another thread. I think other applications that want to make use of the async mechanism will have similar requirements.

I will mention this in new documentation for SSL_BadCertHook in an updated patch.

> Note that ownBadCertHandler merely return SECSuccess, so we
> don't really need to call it here.

I would like to leave it like I wrote it, to demonstrate how an application can be converted to use this asynch mechanism if they've already split their validation into an auth certificate hook and a bad cert handler.

> @@ +544,5 @@
> > +	return rv;
> > +    }
> > +    
> > +    rv = SSL_RestartHandshakeAfterServerCert(fd);
> > +    if (rv < 0 && PORT_GetError() == PR_WOULD_BLOCK_ERROR) {
> 
> This should be rv != SECSuccess, right?
> 
> It is strange for SSL_RestartHandshakeAfterServerCert to fail with
> PR_WOULD_BLOCK_ERROR.

If SSL_RestartHandshakeAfterServerCert calls a function that sets PR_WOULD_BLOCK_ERROR, should it convert the SECFailure/SECWouldBlock to SECSuccess then? That makes sense to me.

> I think the caller should be expected to call SSL_ForceHandshake to finish
> the handshake, so SSL_RestartHandshakeAfterServerCert should return
> SECSuccess as long as it has successfully restarted the handshake, even if
> the handshake is not finished yet.

The application needs to call *some* function to force the handshake; it could be SSL_ForceHandshake or PR_Write or PR_Read too. I purposely avoided calling SSL_ForceHandshake in tstclnt to demonstrate this. I mentioned this in the documentation for SSL_RestartHandshakeAfterServerCert:

    * This function will not complete the entire handshake. The application
    * must call SSL_ForceHandshake, PR_Recv, PR_Send, etc. after calling
    * this function to force the handshake to complete.

> 
> @@ -659,2 @@
> >      }
> > -    handle = CERT_GetDefaultCertDB();
> 
> It seems better to set serverCertAuth.dbHandle right here.

I moved the call to CERT_GetDefaultCertDB() so that the code for controlling cert authentication is all together in the same block. Not a big deal either way, but it seems clearer to me.

> @@ +1114,5 @@
> > +
> > +		    rv = restartHandshakeAfterServerCertIfNeeded(s,
> > +				&serverCertAuth, override);
> > +		    if (rv == SECFailure) {
> > +			goto handshake_failed;
> 
> Can you avoid this goto by duplicating these three lines of code here?
>     error = 254;
>     SECU_PrintError(progName, "authentication of server cert failed");
>     goto done;

OK. I was hoping that the label "handshake_failed" was useful documentation to show that exit code 254 (typically?) means "handshake failure". I can change it to remove the goto.
(In reply to Wan-Teh Chang from comment #15)
> The places you call restartHandshakeAfterServerCertIfNeeded
> seem ad hoc.  I think the rule you used is to call
> restartHandshakeAfterServerCertIfNeeded right before calling
> PR_Poll, right?

Yes, for tstclnt; if you remove either of those calls to restartHandshakeAfterServerCertIfNeeded, you will end up waiting forever in PR_Poll, because PR_Poll only detects activity on the file descriptors, and we are waiting for something other than I/O.

If we removed the calls to PR_Poll so that tstclnt was busy looping until all content was written, we would still need one call to restartHandshakeAfterServerCertIfNeeded per loop.

Typically, an application must restart the handshake with SSL_RestartHandshakeAfterServerCert before doing any blocking I/O or I/O with large timeouts on the socket. This includes the calls to PR_Poll(...,PR_INTERVAL_NO_TIMEOUT) in tstclnt.

However, this isn't always true. An application could do a blocking (or long-timeout) PR_Poll on the socket that is waiting to be restarted *if* there is some other way to break the PR_Poll. In Gecko, we are always polling with a long timeout. But we poll on socket file descriptors AND a file descriptor for a pipe; when we write to the pipe, PR_Poll returns to the application. In Gecko, we write to that pipe whenever we dispatch an event to the socket transport thread. So, in the Gecko usage of this function, when cert validation completes, I dispatch an event to the socket transport thread, to cause PR_Poll to return. Then, when the event is run, the code for the event calls SSL_RestartHandshakeAfterServerCert which gets the handshake going again. (An alternative would be for the cert validation thread to call SSL_RestartHandshakeAfterServerCert and then send a dummy no-op event to the socket transport service.)

Note that in the Gecko usage, we never have to call SSL_ForceHandshake. In the future, we will use SSL_ForceHandshake for NPN, but that isn't related to our usage of this asynchronous cert validation mechanism.
Comment on attachment 573009 [details] [diff] [review]
Implement SSL_RestartHandshakeAfterServerCert

r+, though I do have some questions:

n ssl_sendClientSecondRound()

rather than branching around to 'before_finished' when ss->ssl3.hs.sentSecondRoundExceptFinished, why not make the code between before_finished it's own function which is called directly bot both ssl_sendClientSecondRound() and ssl_RestartHandshakeAfterServerCert(). This will allow you to get rid of ss->ssl3.hs.sendSecondRoundExceptFinished, or is there  some sort of race you are avoiding? Or are you dealing with the case where The clientchain is not null but we are in hs>waitingForAuthCertificate state? (in which case, keep the structure, but still make the code between before_finished it's own function.

Line 5800: shouldn't rv be SECWouldBlock?

bob
Attachment #573009 - Flags: superreview?(rrelyea) → superreview+
Comment on attachment 573014 [details] [diff] [review]
Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default.

r+ rrelyea
Attachment #573014 - Flags: superreview?(rrelyea) → superreview+
I believe there is a bug in my code regarding renegotiations. Before we send our change_cipher_spec, we write application data with the old cipher spec. But, after we send our change_cipher_spec, we must write application data with the new cipher spec. But, we must not write application data with the new cipher spec until after we've sent the Finished message. With my patch, we send the change_cipher_spec and then return if the cert hasn't been validated yet. But, I believe we will still allow application data to be written in the middle of the renegotiation handshake. (If we don't, then there is no bug). To fix this, I need to make sure that I always send the change_cipher_spec and Finished messages at the same time.

One way to fix that would be to avoid sending the change_cipher_spec message until cert authentication is complete. But, I think it is better (performance-wise) to do something different:

1. For the initial handshake with no client cert, send the entire second round (including the Finished message), but do not mark the handshake as idle until cert authentication is complete. This way, no application data will be sent/received with an invalid cert.

2. For any handshake (initial or renegotiation) with a client cert, do as my patch does, and avoid sending any part of the second round until cert authentication is complete.

3. For a renegotiation handshake (with our without a client cert), do not send any part of the second round until certificate validation is complete.

Doing things in this new way would have the following benefits:
1. In an initial handshake, the server can process our Finished message AND send its own Finished message while we wait for certificate validation. This is an even bigger latency win than what I was trying to accomplish with my initial approach.
2. There would be no change to the way the second round is grouped into a single packet. This will eliminate any potential compatibility problems with any (broken) servers that expect change_cipher_spec and Finished to be in the same packet.

I believe that, as long as we block the sending/receiving of any application data and any client certificates until cert authentication is complete, it is 100% safe to exchange the Finished messages, even for server-speaks-first protocols like IMAP. Previously, I had thought this might be dangerous for server-speaks-first protocols because the server would start sending application data before the client has validated its cert. But, if it is the correct server, then this doesn't matter; if it is a malicious server then it wouldn't matter as long as we make sure we don't add the session to the session cache, we don't accept renegotiation requests, and we don't read/write any data from that server.

Does this thinking seem correct to you guys?
Comment on attachment 573014 [details] [diff] [review]
Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default.

There are several problems with this patch. Some are mentioned above already.

Another problem was found by Bob during his review--returning SECSuccess instead of SECWouldBlock. It took me a while to understand how/why this could be a bug but now I understand why: If we were to return SECWouldBlock, we would end up permanentely ignoring any handshake messages that appear the current message the same handshake record.

It is not practical to modify tstclnt and selfserv to test everything that needs to be tested for this feature. This weekend, I built a new small(-ish) testing harness for testing it. See 702322.
Attachment #573014 - Flags: superreview+
Attachment #573014 - Flags: review?(wtc)
Attachment #573014 - Flags: review-
Comment on attachment 573009 [details] [diff] [review]
Implement SSL_RestartHandshakeAfterServerCert

Sorry, comment 21 was supposed to be on this attachment. (Also s/702322/Bug 702322/)
Attachment #573009 - Flags: superreview+
Attachment #573009 - Flags: review?(wtc)
Attachment #573009 - Flags: review-
Attachment #573009 - Flags: feedback?(agl)
I made the following changes:
1. I renamed the function to SSL_RestartHandshakeAfterAuthCertificate; the new implementation likely only needs minor changes to work for both clients and servers, so if/when that change is made, we should be able to support both with this one function.
2. The SSL_RestartHandshakeAfterServerCert function was previously already exported; I now provide an always-failing implementation with the same signature, to ensure binary compatibility with any application that was (wrongly) calling it.
3. SSL_RestartHandshakeAfterCertReq had the same issues as SSL_RestartHandshakeAfterServerCert had and its broken implementation would likely make code review confusing, so I also made it into an always-failing function.
4. Based on Bob's comments, I found that I was not handling some corner cases correctly. I have corrected that, and that required major changes to the implementation. As a consequence of that, the "goto" that Bob didn't like is gone.
5. The behavior should now be close to what Chrome does. The entire handshake can be completed before the cert is validated, but the application cannot send/receive data until the cert has been validated. Consequently, when an application switches from synchronous to asynchronous cert validation using this API, they should get an automatic performance boost.

Notes:
1. I did not try to do anything special for False Start. I think that, if cert validation is still pending, then ssl3_CanFalseStart should return false. However, that would mean that ssl3_CanFalseStart would start to return different results at different points in the handshake. But, others might disagree. I think we should handle the interaction between this feature and False Start in another bug.
2. We should have options to control whether we allow sending of the client cert and/or NPN message before cert validation is done. This implementation always waits for cert validation before sending the client certificate message but it doesn't want for sending the NPN message. (This is the behavior that I think Firefox should have.) However, in another patch, we should change the default to be stricter (not sending the NPN message), with an option to get the looser behavior. I think we should do that in another bug.

I switched to r?rrelyea and sr?wtc since Bob already reviewed the first attempt last week.
Attachment #573009 - Attachment is obsolete: true
Attachment #574828 - Flags: superreview?(wtc)
Attachment #574828 - Flags: review?(rrelyea)
Attachment #574828 - Flags: feedback?(agl)
Attachment #574828 - Attachment is obsolete: true
Attachment #574829 - Flags: superreview?(wtc)
Attachment #574829 - Flags: review?(rrelyea)
Attachment #574829 - Flags: feedback?(agl)
Attachment #574828 - Flags: superreview?(wtc)
Attachment #574828 - Flags: review?(rrelyea)
Attachment #574828 - Flags: feedback?(agl)
This addresses the review comments for the previous version. The switch is now -O instead of -P.
Attachment #573014 - Attachment is obsolete: true
Attachment #574831 - Flags: superreview?(wtc)
Attachment #574831 - Flags: review?(rrelyea)
Here are the tests I created using the mini framework I created in bug 702322 to demonstrate that this code actually works.
Attachment #574850 - Flags: feedback?(wtc)
Attachment #574850 - Flags: feedback?(rrelyea)
Comment on attachment 574829 [details] [diff] [review]
Implement SSL_RestartHandshakeAfterServerCert [v4]

Hmm...seems like there is a problem either in Firefox or with Bugzilla uploading patches recently. Even though I was very careful to select v4 of this patch last night (twice!), I still ended up uploading a different version.
Attachment #574829 - Attachment is obsolete: true
Attachment #574829 - Flags: superreview?(wtc)
Attachment #574829 - Flags: review?(rrelyea)
Attachment #574829 - Flags: feedback?(agl)
Attachment #575064 - Flags: review?(rrelyea)
Comment on attachment 574850 [details] [diff] [review]
SSL_RestartHandshakeAfterServerCert loopback tests

r+ rrelyea.
The restart function does make things a lot cleaner.

bob
Attachment #574850 - Flags: feedback?(rrelyea) → feedback+
Comment on attachment 575064 [details] [diff] [review]
Implement SSL_RestartHandshakeAfterAuthCertificate [v4] [checked in]

r+ rrelyea. oops This is the patch I meant to r+ reviewing the other one now..
Attachment #575064 - Flags: review?(rrelyea) → review+
Comment on attachment 574850 [details] [diff] [review]
SSL_RestartHandshakeAfterServerCert loopback tests

return to ? until I finish my review...
Attachment #574850 - Flags: feedback+ → feedback?
Comment on attachment 574850 [details] [diff] [review]
SSL_RestartHandshakeAfterServerCert loopback tests

There seems to be something missing here. There are a lot of MACROS which don't seem to have a definition. Is this a diff on top of some other patch?
Attachment #574850 - Flags: feedback? → feedback?(rrelyea)
Comment on attachment 574831 [details] [diff] [review]
Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default [v4]

r+ rrelyea
Attachment #574831 - Flags: review?(rrelyea) → review+
(In reply to Robert Relyea from comment #32)
> Comment on attachment 574850 [details] [diff] [review] [diff] [details] [review]
> SSL_RestartHandshakeAfterServerCert loopback tests
> 
> There seems to be something missing here. There are a lot of MACROS which
> don't seem to have a definition. Is this a diff on top of some other patch?

It should build correctly on top of the patches in bug 702322.
Brian intends to land the reviewed patches into mozilla-central for testing.

Wouldn't it have been simpler to check the already r+ed patches into NSS, and create a tag that includes it?
== Urgent reminder ==

We need to urgently get this bug reviewed, checked in to NSS and released in a new NSS release.

mozilla-central uses the current patches attached to this bug.

While this is acceptable for testing version, it's inacceptable for released versions of Mozilla software.

In two days this patch will be delivered as part of Mozilla Aurora, and it will be much more difficult to get an NSS update into Mozilla code.
Brian, since you have lobbied to get this patch landed early,
I see it as your responsibility to push this forward quickly.

I'm highly worried that there has been no activity in this bug during the previous 4 weeks.

Given that this patch introduces a new API, it is unacceptable to ship this unofficial mix of NSS.


Brian, if you are unable to get this bug closed in time, please back out the associated patches from mozilla-central (and mozilla-aurora).
Hi Kai, I am going to work on this today. As I noted in bug 360420, reviewing your OCSP stapling code made me realize that we need to have some way for the application to indicate that libssl should send the bad_certificate or bad_certificate_status_response alert to the peer. I am working on this now.
(In reply to Kai Engert (:kaie) from comment #36)
> In two days this patch will be delivered as part of Mozilla Aurora, and it
> will be much more difficult to get an NSS update into Mozilla code.

It will not be difficult to get the updated NSS into mozilla-aurora, especially because of this issue. Just like, for example, we took 3.13.1 near the end of Aurora.

I think it would be a good idea for Wan-Teh to provide feedback on some of the details, in particular the interaction between this code and False Start and the way this patch blocks the processing of client certs and the handshake callback until after the auth cert hook is complete.
At least two changes could/should be made to this:

1. There must be a way for the application to tell libssl whether to send the bad_certificate or bad_certificate_status_response alert.

2. We must document that the async mechanism only works for non-blocking sockets. Possibly, when the auth certificate callback returns SECWouldBlock, we should verify that the connection is non-blocking and fail if it isn't.
(In reply to Brian Smith (:bsmith) from comment #40)
> 1. There must be a way for the application to tell libssl whether to send
> the bad_certificate or bad_certificate_status_response alert

... or unsupported_certificate, certificate_revoked, certificate_expired,  certificate_unknown.
These tests are updated to use the new function name/signature. I also added two tests: one tests that False Start is disabled for the connection when the auth certificate hook returns SECWouldBlock; the other tests that we send an alert when SSL_AuthCertificateComplete is called with a non-zero (error) status.

Bug 713933 is for allowing False Start and async certificate validation to work together (safely).

The previous four patches should be applied on top of the first two (attachment 575064 [details] [diff] [review] and attachment 574831 [details] [diff] [review]).

This patch, for adding tests, depends on the patches in bug 702322.
Attachment #574850 - Attachment is obsolete: true
Attachment #584626 - Flags: superreview?(wtc)
Attachment #584626 - Flags: review?(rrelyea)
Attachment #574850 - Flags: feedback?(wtc)
Attachment #574850 - Flags: feedback?(rrelyea)
Attachment #575064 - Flags: superreview?(wtc)
Attachment #584595 - Flags: superreview?(wtc)
I asked Wan-Teh to review this because he is the one that wrote the documentation on the NSPR poll method.

Wan-Teh, this patch that I am attaching now is a very high priority for Mozilla--higher than the superreviews for the other patches in this bug. It would be great if we could get a review of this patch ASAP so we can check it into mozilla-central on top of the other private patches. It does not depend on the review of the other patches.

When we have finished the handshake, but the application hasn't called SSL_AuthCertificateComplete yet, PR_Poll on the socket will indicate that we can read/write to the socket, but every read/write will -1 with PR_WOULD_BLOCK_ERROR. This will cause some applications, including Firefox and Thunderbird, to spin in a PR_Poll/PR_Write loop, consuming 100% CPU. It is important that PR_Poll not consider the socket readable/writable in this state.
Attachment #588233 - Flags: superreview?(rrelyea)
Attachment #588233 - Flags: review?(wtc)
This patch must be applied on top of the patch "Loopback tests for SSL_AuthCertificateComplete [v2]".

This patch and the previous patch are git-formatted patches because it is tedious to create patches for CVS that can be applied on the top of other patches that haven't been checked in.
Attachment #588234 - Flags: superreview?(rrelyea)
Attachment #588234 - Flags: review?(wtc)
Comment on attachment 584593 [details] [diff] [review]
spaces -> tabs and remove trailing whitespace from previous patch [checked in]

r+ rrelyea
Attachment #584593 - Flags: review?(rrelyea) → review+
Comment on attachment 584595 [details] [diff] [review]
Rename SSL_RHAAC to SSL_AuthCertificateComplete; send alert when async cert. authentication fails [checked in]

r+

Note about initialization of rv at line 1493 of sslsecure.c:

While we can see from inspection the initialization isn't needed, it may have been added to silence a compilier warning (particularly from a compilier that wasn't smart enough to see all 3 branches of the if-then-else statement.

No change to the patch is required, just thought I'd note why you see this fairly frequently in NSS.

bob
Attachment #584595 - Flags: review?(rrelyea) → review+
Comment on attachment 584622 [details] [diff] [review]
Disable False Start when async cert authentication is used until they work safely together [checked in]

r+ the patch does do what it advertises. I'd like to get wtc's comments about turning off falsestart, however.

bob
Attachment #584622 - Flags: review?(rrelyea) → review+
Comment on attachment 588234 [details] [diff] [review]
Additions to the loopback tests to test that the poll method returns the correct results

Review of attachment 588234 [details] [diff] [review]:
-----------------------------------------------------------------

I am going to update this patch with more tests.
Attachment #588234 - Flags: superreview?(rrelyea)
Attachment #588234 - Flags: review?(wtc)
Comment on attachment 588233 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waitinf for an async callback to complete

Review of attachment 588233 [details] [diff] [review]:
-----------------------------------------------------------------

I am going to update this patch.

::: security/nss/lib/ssl/sslsock.c
@@ +1958,5 @@
>  	    }
>  	}
> +    } else if ((new_flags & PR_POLL_READ) &&
> +               (SSL_DataPending(fd) > 0) &&
> +               !ssl_IsWaitingForRestart(ss)) {

I think this change may not be necessary. At least, all my tests pass without this change.

I believe there is no way SSL_DataPending(fd) > 0 when we are blocked waiting for an async callback, because we would only have data pending if we've processed an application_data record and we'd only process an application_data record if we'd already completed the handshake, so there can't be any async callback waiting.. Perhaps we should document that here.

@@ +1973,5 @@
> +	/* We're blocked on an async calback and we have no data waiting to be
> +	** sent.
> +	*/
> +	*p_out_flags = 0;
> +	return 0; 

I do not think that this is quite correct. We DO NOT want to consider the socket writable, but we DO want to be able to poll for errors (connection closed). So, perhaps we should set *p_out_flags = (PR_POLL_READ | PR_POLL_EXCEPT) if (inflags & PR_POLL_EXCEPT).

This is somewhat related to bug 658138.
Attachment #588233 - Flags: superreview?(rrelyea)
Attachment #588233 - Flags: review?(wtc)
Attachment #588234 - Attachment is obsolete: true
Attachment #588358 - Flags: superreview?(rrelyea)
Attachment #588358 - Flags: review?(wtc)
(In reply to Brian Smith (:bsmith) from comment #54)
> @@ +1973,5 @@
> > +	/* We're blocked on an async calback and we have no data waiting to be
> > +	** sent.
> > +	*/
> > +	*p_out_flags = 0;
> > +	return 0; 
> 
> I do not think that this is quite correct. We DO NOT want to consider the
> socket writable, but we DO want to be able to poll for errors (connection
> closed). So, perhaps we should set *p_out_flags = (PR_POLL_READ |
> PR_POLL_EXCEPT) if (inflags & PR_POLL_EXCEPT).
> 
> This is somewhat related to bug 658138.

See http://hg.mozilla.org/mozilla-central/annotate/9b5f1ccdb021/nsprpub/pr/src/md/windows/w32poll.c#l221.

If the upper layer wants to poll for EXC then the socket is always polled for it regardless what socket's poll override returns.  It is OK to return 0.
Comment on attachment 588357 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v2]

Review of attachment 588357 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/nss/lib/ssl/sslsock.c
@@ +1997,5 @@
> +	/* We must select on read, not write, to get them.
> +	 */
> +	if (result & PR_POLL_WRITE) {
> +	    result &= ~PR_POLL_WRITE;
> +	    result |= PR_POLL_READ;

I think you should not set to poll for READ.  If the upper layer wants to detect graceful close of the socket, it should say so it self, explicitly.  

Could it happen, in terms of SSL nego, that server sent some data before we finished the cert verification?  In that case we loop again..
Yes, the peer can always send data on the connection, regardless of whether they are supposed to or not according to the specification. :(

This patch addresses Honza's comments. I am unsure as to whether we will actually be able to detect that the socket has been closed by the peer in this situation; I tried to write a test for it using the loopback framework I created but the test wasn't able to detect the socket being closed by the peer. Help appreciated.
Attachment #588357 - Attachment is obsolete: true
Attachment #588931 - Flags: superreview?(rrelyea)
Attachment #588931 - Flags: review?(wtc)
Attachment #588357 - Flags: superreview?(rrelyea)
Attachment #588357 - Flags: review?(wtc)
Attachment #588358 - Attachment is obsolete: true
Attachment #588934 - Flags: superreview?(rrelyea)
Attachment #588934 - Flags: review?(wtc)
Attachment #588358 - Flags: superreview?(rrelyea)
Attachment #588358 - Flags: review?(wtc)
Comment on attachment 588931 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unreadable/unwritable and unreadable when we are blocked waiting for an async callback to complete [v3]

Review of attachment 588931 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/nss/lib/ssl/sslsock.c
@@ +1994,5 @@
> +     */
> +    if (ss->version >= SSL_LIBRARY_VERSION_3_0 &&
> +	ss->ssl3.hs.restartTarget != NULL) {
> +	result       &= ~(PR_POLL_WRITE | PR_POLL_READ);
> +	*p_out_flags &= ~(PR_POLL_WRITE | PR_POLL_READ);

Hmm.. actually clearing READ form out_flags here might not be correct.  If ssl_PollInner indicates there are data to read from the (ssl) socket - i.e. some buffered app data are ready to read, you should allow the upper layer read it, if that wouldn't of course just lead to WOULD_BLOCK because of waiting cert auth.  Here you know more then me.
(In reply to Honza Bambas (:mayhemer) from comment #58)
> Comment on attachment 588357 [details] [diff] [review]
> Change ssl_Poll so that it it indicates the socket is unwritable and
> unreadable when we are blocked waiting for an async callback to complete [v2]
> 
> Review of attachment 588357 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: security/nss/lib/ssl/sslsock.c
> @@ +1997,5 @@
> > +	/* We must select on read, not write, to get them.
> > +	 */
> > +	if (result & PR_POLL_WRITE) {
> > +	    result &= ~PR_POLL_WRITE;
> > +	    result |= PR_POLL_READ;
> 
> I think you should not set to poll for READ.  If the upper layer wants to
> detect graceful close of the socket, it should say so it self, explicitly.  
> 
> Could it happen, in terms of SSL nego, that server sent some data before we
> finished the cert verification?  In that case we loop again..

Maybe, when waiting for cert verify, call available on the lower layer in read and remember the state.  Error means to propagate it to the upper layer immediately - probably already works since close of a socket is detected.  Presence of data means to stop polling for read while waiting for cert verification.

Or allow poll for read here only every 1000ms or so.
(In reply to Honza Bambas (:mayhemer) from comment #61)
> Hmm.. actually clearing READ form out_flags here might not be correct.  If
> ssl_PollInner indicates there are data to read from the (ssl) socket - i.e.
> some buffered app data are ready to read, you should allow the upper layer
> read it, if that wouldn't of course just lead to WOULD_BLOCK because of
> waiting cert auth.  Here you know more then me.

That buffered app data would need to be authenticated which means we need to wait for cert auth to complete before we can return it to the application, so PR_Recv and PR_Read return PR_WOULD_BLOCK_ERROR until cert auth completes. So, clearing PR_POLL_READ is mostly doing the correct thing.

However, I see that I am overzealous in clearing PR_POLL_WRITE. If we have data in the pending buf, it is probably a client cert or other handshake data that should be sent to the server. It isn't *wrong* to clear PR_POLL_WRITE here; the application would work correctly, but the handshake would take longer to complete. Still, I should update the patch to fix this inefficiency.

It is also the case that clearing PR_POLL_READ is not wrong but is sub-optimal. The case where the peer has shut down its side of the connection is quite complicated. If the peer purposefully shut down the connection, then it probably did so because the connection timed out, and it probably sent a fatal alert before closing the connection. In that case, ideally we would process the alert and close the connection without waiting for cert auth to complete. To do that, we would have to allow for decoding at least one record from the peer after the finished message, and process it *only* if it is an alert. I understand how to implement such a change. [1]

is this a worthwhile improvement? And, should we make this improvement in this bug? Remember that previously, the only alternative was synchronous cert validation. So, even before, the application could not detect that the connection had been severed in these situations until after it had completed cert validation. Also, the peer could send us the close_notify alert at *any* time, during or after the handshake. To catch all of those ASAP, ssl_Poll should be returning PR_POLL_READ unconditionally, AFAICT. So, the bug "libssl does not allow the application to detect soon enough that the server (securely) closed the connection" seems like a separate enhancement request, perhaps related to bug 462874. So, for now, I think it is a good idea to clear PR_POLL_READ unconditionally when server cert authentication is pending. Especially, it is late in the release cycle of Firefox (especially) to make such a change to the core of the implementation of the SSL_AuthCertificateComplete feature itself. (Some checks of restartTarget would need to be moved from the record gathering code to other places.)

[1] This would actually be an optimization for SMTP/POP/IMAP clients, because the server speaks first. That means, the first record after the server's finished message would almost always be an application data record. We would end up gathering/decrypting/MACing the first application data record from the server and have it ready for processing as soon as the certificate authentication completed. But, this doesn't help for HTTP, because the client speaks first in HTTP.
See also bug 455974.

Also, I noticed that ssl_Poll does do *any* locking. AFAICT, this means that it doesn't necessarily share the same view of important variables as other threads. I did not add any locking to the ssl_Poll method because I didn't know if this lack of locking was intentional. But, it means that, for example, SSL_AuthCertificateComplete must generally be called on the same thread that would call PR_Poll on the socket, in order to avoid race conditions and to ensure that ssl_Poll and the implementation of ssl_AuthCertificateComplete share a consistent view of the restartTarget variable (at least).
(In reply to Brian Smith (:bsmith) from comment #63)
> (In reply to Honza Bambas (:mayhemer) from comment #61)
> > Hmm.. actually clearing READ form out_flags here might not be correct.  If
> > ssl_PollInner indicates there are data to read from the (ssl) socket - i.e.
> > some buffered app data are ready to read, you should allow the upper layer
> > read it, if that wouldn't of course just lead to WOULD_BLOCK because of
> > waiting cert auth.  Here you know more then me.
> 
> That buffered app data would need to be authenticated which means we need to
> wait for cert auth to complete before we can return it to the application,
> so PR_Recv and PR_Read return PR_WOULD_BLOCK_ERROR until cert auth
> completes. So, clearing PR_POLL_READ is mostly doing the correct thing.

That was the information I was missing.

> However, I see that I am overzealous in clearing PR_POLL_WRITE. If we have
> data in the pending buf, it is probably a client cert or other handshake
> data that should be sent to the server. It isn't *wrong* to clear
> PR_POLL_WRITE here; the application would work correctly, but the handshake
> would take longer to complete. Still, I should update the patch to fix this
> inefficiency.

Good idea.

> Also, the peer could send us the close_notify alert at
> *any* time, during or after the handshake. To catch all of those ASAP,
> ssl_Poll should be returning PR_POLL_READ unconditionally, AFAICT. So, the
> bug "libssl does not allow the application to detect soon enough that the
> server (securely) closed the connection" seems like a separate enhancement
> request, perhaps related to bug 462874. So, for now, I think it is a good
> idea to clear PR_POLL_READ unconditionally when server cert authentication
> is pending. Especially, it is late in the release cycle of Firefox
> (especially) to make such a change to the core of the implementation of the
> SSL_AuthCertificateComplete feature itself. (Some checks of restartTarget
> would need to be moved from the record gathering code to other places.)

I agree with this paragraph.
Comment on attachment 584621 [details] [diff] [review]
Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] [checked in]

r+ rrelyea
Attachment #584621 - Flags: review?(rrelyea) → review+
Comment on attachment 588931 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unreadable/unwritable and unreadable when we are blocked waiting for an async callback to complete [v3]

Clearing review requests as I will update this patch with a new one.
Attachment #588931 - Flags: superreview?(rrelyea)
Attachment #588931 - Flags: review?(wtc)
Comment on attachment 589264 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v4]

r+ rrelyea
Attachment #589264 - Flags: superreview?(rrelyea) → superreview+
Comment on attachment 584621 [details] [diff] [review]
Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] [checked in]

Review of attachment 584621 [details] [diff] [review]:
-----------------------------------------------------------------

This patch needs work.  Please see my comments below.

::: mozilla/security/nss/cmd/tstclnt/tstclnt.c
@@ -530,3 +530,4 @@
> >                                          PRBool override)
> >  {
> >      SECStatus rv;
> > +    PRErrorCode status;

'status' is a bad name because it suggests the type is SECStatus.
Name this variable 'error'.

@@ -545,2 +543,3 @@
> >      if (rv != SECSuccess) {
> > -	return rv;
> > +        status = PR_GetError();
> > +        if (status == 0) {

Unless you call PR_SetError(0) before the SSL_AuthCertificate call
above, you can't use PR_GetError() == 0 as the indication that the
SSL_AuthCertificate call did not set error code.

Also, SSL_AuthCertificate is a short function and is easy to inspect.

@@ -546,1 +544,5 @@
> > -	return rv;
> > +        status = PR_GetError();
> > +        if (status == 0) {
> > +            PR_NOT_REACHED("SSL_AuthCertificate return SECFailure without "
> > +                           "setting error code.");
> > +            status = PR_INVALID_STATE_ERROR;

PR_INVALID_STATE_ERROR is not the right error for this condition.
If we don't yet have an error code for unknown certificate errors,
we can add SEC_ERROR_UNKNOWN_CERTIFICATE_ERROR.

Note that you check 'override' in an else-if.  This means if the
SSL_AuthCertificate call does not set the error code, you will skip
the ownBadCertHandler call.  Is this intentional?

@@ -548,2 +553,6 @@
> > -    
> > +    if (rv == SECSuccess) {
> > -    rv = SSL_RestartHandshakeAfterAuthCertificate(fd);
> > +        status = 0;
> > +    }
> > +
NaN more ...

It seems that you can just say
    rv = SSL_AuthCertificateComplete(fd, status);
(In reply to Wan-Teh Chang from comment #70)
> @@ -545,2 +543,3 @@
> > >      if (rv != SECSuccess) {
> > > -	return rv;
> > > +        status = PR_GetError();
> > > +        if (status == 0) {
> 
> Unless you call PR_SetError(0) before the SSL_AuthCertificate call
> above, you can't use PR_GetError() == 0 as the indication that the
> SSL_AuthCertificate call did not set error code.

I think you mean that if PR_GetError() == 0, we know that SSL_AuthCertificate didn't set the error code, but if PR_GetError() != 0, we don't know that SSL_AuthCertificate DID set the error code. I will fix this.

> Also, SSL_AuthCertificate is a short function and is easy to inspect.

SSL_AuthCertificate depends on CERT_VerifyCertNow (amongst other functions) to set the error code when it returns SECFailure, so there are actually a lot of code paths that would have to be inspected to prove that there is no bug that would cause SSL_AuthCertificate so it is non-trivial to prove that SSL_AuthCertificate always calls PR_SetError() with a non-zero error code when it returns SECFailure.

> 
> @@ -546,1 +544,5 @@
> > > -	return rv;
> > > +        status = PR_GetError();
> > > +        if (status == 0) {
> > > +            PR_NOT_REACHED("SSL_AuthCertificate return SECFailure without "
> > > +                           "setting error code.");
> > > +            status = PR_INVALID_STATE_ERROR;
> 
> PR_INVALID_STATE_ERROR is not the right error for this condition.
> If we don't yet have an error code for unknown certificate errors,
> we can add SEC_ERROR_UNKNOWN_CERTIFICATE_ERROR.

If this was in library code, I would agree. But, I am not sure it is worthwhile to define a new error code for this supposed-to-be-impossible error detected in testing code.

> Note that you check 'override' in an else-if.  This means if the
> SSL_AuthCertificate call does not set the error code, you will skip
> the ownBadCertHandler call.  Is this intentional?

Yes, because the override mechanism is supposed to override cases where we (correctly) validated a certificate and found that it shouldn't be trusted. But, the case where SSL_AuthCertificate didn't set the error code is a bug in SSL_AuthCertificate (or a function it calls), and the override mechanism isn't there to protect again such bugs. It is better for the error to be reported so that the bug can be found.

> @@ -548,2 +553,6 @@
> > > -    
> > > +    if (rv == SECSuccess) {
> > > -    rv = SSL_RestartHandshakeAfterAuthCertificate(fd);
> > > +        status = 0;
> > > +    }
> > > +
> NaN more ...
> 
> It seems that you can just say
>     rv = SSL_AuthCertificateComplete(fd, status);

The patch "Update tstclnt to use the new function name and to send alerts when cert authentication fails" is to be applied on top of this patch, and it changes this code to do exactly that.
Comment on attachment 589264 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v4]

Review of attachment 589264 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.  I suggest some comment changes below.

The location of the new code seems correct.  This function is
complicated, so I am not 100% sure.

Two other comments:

1. Before certificate verification is finished, we cannot report
a close_notify alert received after the Finished message.  It is
an authenticated close_notify alert, so it requires certificate
authentication, even though it is merely a graceful closure
notification.

2. I guess you want to leave the SSL socket in PR_Poll while
verifying the certificate so that the handshake can proceed in
parallel.  But this means the SSL_AuthCertificateComplete call
cannot unblock the PR_Poll call.  You need to use some other
mechanism, such as an NSPR pollable event, to unblock the PR_Poll
call, call SSL_AuthCertificateComplete, and then call PR_Poll
again.  Is this how it is done?

::: security/nss/lib/ssl/sslsock.c
@@ +1957,5 @@
>  	new_flags |=  PR_POLL_WRITE;   /* also select on write. */
> +    }
> +
> +    if (ss->version >= SSL_LIBRARY_VERSION_3_0 &&
> +	ss->ssl3.hs.restartTarget != NULL) {

Should we also test ss->firstHsDone here?  Or is that
implied by ss->ssl3.hs.restartTarget != NULL?

If !ss->firstHsDone, we need to report if the lower layer
is readable or writable to allow the handshake to proceed.
So, it seems that not testing ss->firstHsDone here will
suppress the optimization of verifying the certificate
while the handshake is in progress.

@@ -1958,1 +1958,5 @@
> >  	return new_flags;
> > +    }
> > +
> > +    if (ss->version >= SSL_LIBRARY_VERSION_3_0 &&
> > +	ss->ssl3.hs.restartTarget != NULL) {
> > +	/* PR_Recv and PR_Read will block until the callback completes. */

I suggest changing "PR_Recv and PR_Read" to just "Read"
and "PR_Send and PR_Write" to "Write".

"until the callback completes" is not clear.  I think
you mean "until SSL_AuthCertificateComplete is called".
Or you can say "until certificate authentication completes".

Please document the purpose of the
"!ss->lastWriteBlocked || ss->pendingBuf.len == 0" test
(to allow pending write data to be sent).  It is not
obvious.
Attachment #589264 - Flags: review?(wtc) → review+
(In reply to Brian Smith (:bsmith) from comment #71)
>
> I think you mean that if PR_GetError() == 0, we know that
> SSL_AuthCertificate didn't set the error code, but if PR_GetError() != 0, we
> don't know that SSL_AuthCertificate DID set the error code. I will fix this.

Functions that use NSPR's error reporting system are not required
to set the error code to 0 when they return successfully.  So most
of these functions simply set the error code before they return a
failure, and do not touch the error code when they return successfully.

So, if SSL_AuthCertificate does not set the error code before returning
a failure, PR_GetError() will return a stale error code set by the last
function that failed, rather than returning 0.

For your code to work, you have to call PR_SetError(0) before calling
SSL_Authenticate.

I would not worry about this.  Since people may use tstclnt as sample
code, this check may be misinterpreted as necessary and copied to
their code.

Re: PR_INVALID_STATE_ERROR: it's fine if you don't want to define a
new error code.  I just wanted to point out that PR_INVALID_STATE_ERROR
has a specific meaning, and it is not for this error condition.  You
can use PR_UNKNOWN_ERROR, SEC_ERROR_LIBRARY_FAILURE, or
SSL_ERROR_BAD_CERTIFICATE.
(In reply to Wan-Teh Chang from comment #72)
> 2. I guess you want to leave the SSL socket in PR_Poll while
> verifying the certificate so that the handshake can proceed in
> parallel.  But this means the SSL_AuthCertificateComplete call
> cannot unblock the PR_Poll call.  You need to use some other
> mechanism, such as an NSPR pollable event, to unblock the PR_Poll
> call, call SSL_AuthCertificateComplete, and then call PR_Poll
> again.  Is this how it is done?

Yes, that is exactly what Gecko does. We dispatch an event to the socket transport service thread. During the dispatching of the event, the socket transport service to set pollable event to bet set, which unblocks the PR_Poll. Then the event's Run() method is executed, which calls SSL_AuthCertificateComplete. Then, the socket transport thread will re-run its poll loop, calling PR_Poll again.

> 
> ::: security/nss/lib/ssl/sslsock.c
> @@ +1957,5 @@
> >  	new_flags |=  PR_POLL_WRITE;   /* also select on write. */
> > +    }
> > +
> > +    if (ss->version >= SSL_LIBRARY_VERSION_3_0 &&
> > +	ss->ssl3.hs.restartTarget != NULL) {
> 
> Should we also test ss->firstHsDone here?  Or is that
> implied by ss->ssl3.hs.restartTarget != NULL?

restartTarget will be non-NULL if and only if we are in a handshake that can not make any progress until an async completion function is called. That is exactly the situation in which we don't want to poll the socket. It doesn't matter if we're in the first or a subsequent handshake.

Right now, the only async completion function (that works) is SSL_AuthCertificateComplete. But, we will be able to use the same mechanism the client certificate callback too. I also have some ideas for other async callbacks but I don't know if they are worthwhile yet.

In the initial handshake, firstHsDone is only set after the handshake is complete AND the certificate has been authenticated. If the handshake is complete, but the cert hasn't been authenticated, then restartTarget will be the new ssl3_FinishHandshake function.

If a client certificate is to be sent, then the handshake will block before sending the client certificate, with restartTarget == ssl3_SendClientSecondRound.

In renegotiation handshakes, firstHsDone will already be set, of course. But, the handshake will block with restartTarget == ssl3_SendClientSecondRound. (So, really, a renegotiation handshake isn't as completely asynchronous as the first handshake. I did it this way because I was concerned libssl might process application data right after the change_cipher_spec message and before the finished message on renegotiation handshakes, which would be protected by the renegotiation cert's key, and we should authenticate the cert before using its key to protect any application data.)

> If !ss->firstHsDone, we need to report if the lower layer
> is readable or writable to allow the handshake to proceed.
> So, it seems that not testing ss->firstHsDone here will
> suppress the optimization of verifying the certificate
> while the handshake is in progress.

Once we get to a point where we really need the certificate to be validated, we will set restartTarget to non-null. Until that point, restartTarget will be null. When we get to that point, we will see that authCertificatePending is set, and we will set restartTarget to a non-null value.

> "until the callback completes" is not clear.  I think
> you mean "until SSL_AuthCertificateComplete is called".
> Or you can say "until certificate authentication completes".

Right now, this is true. But, if we add additional async callbacks in the future (the client cert callback being the most obvious pending one), then it will become inaccurate. That is why I wrote the comments to be vague about which callback is being completed.

However, if you would prefer, I can change the comments to reference SSL_AuthCertificateComplete directly; we could then change the comments in the future if/when we implement other async callbacks.
Re: to clarify my earlier comment
> "until the callback completes" is not clear.  I think
> you mean "until SSL_AuthCertificateComplete is called".
> Or you can say "until certificate authentication completes".

My complaint is about the use of "callback" here.  To me,
in this context, "callback" refers to a client function that
NSS calls.  I thought you're using "callback" to refer to
SSL_AuthCertificateComplete, which is an NSS function that
a client calls.  This is why my suggested rewording avoided
the use of "callback".

I can suggest other rewording that is vague:
    until the asynchronous operation we're waiting for completes

But now I believe I understand what you meant by "callback" here.
We can say
    until the asynchronous callback completes
where "asynchronous callback" is a callback that returns SECWouldBlock
and will arrange for an NSS function such as SSL_AuthCertificateComplete
to be called to report its completion.
This version addresses Wan-Teh's review comments and also makes a change.

In the prvious version of the patch, when the caller passed in PR_POLL_EXCEPT, we would return PR_POLL_EXCEPT. However, if PR_Poll were to return PR_POLL_EXCEPT for a socket, then the application is likely to try to read or write to the socket in order to get the error code for the error that occurred. This could cause the same kind of busy loop, AFAICT. So, we should clear even PR_POLL_EXCEPT. It makes sense to clear PR_POLL_EXCEPT in this case since the poll method is supposed to return the flags that will be passed to the poll() method, and we don't want the poll method to be called at all in this scenerio.
Attachment #589264 - Attachment is obsolete: true
Attachment #591695 - Flags: review?(wtc)
This version addresses the review comments.
Attachment #575064 - Attachment is obsolete: true
Attachment #591696 - Flags: review+
Attachment #575064 - Flags: superreview?(wtc)
Attachment #575064 - Attachment is obsolete: false
Attachment #574831 - Attachment is obsolete: true
Attachment #574831 - Flags: superreview?(wtc)
Attachment #591696 - Flags: superreview?(wtc)
If I understand correctly, during last Thursday's conference call, Wan-Teh gave permission to check in all of these patches, despite his pending sr.
Comment on attachment 575064 [details] [diff] [review]
Implement SSL_RestartHandshakeAfterAuthCertificate [v4] [checked in]

cvs commit: Examining .
Checking in SSLerrs.h;
/cvsroot/mozilla/security/nss/lib/ssl/SSLerrs.h,v  <--  SSLerrs.h
new revision: 1.18; previous revision: 1.17
done
Checking in ssl.def;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.def,v  <--  ssl.def
new revision: 1.28; previous revision: 1.27
done
Checking in ssl.h;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.h,v  <--  ssl.h
new revision: 1.46; previous revision: 1.45
done
Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.159; previous revision: 1.158
done
Checking in ssl3gthr.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3gthr.c,v  <--  ssl3gthr.c
new revision: 1.11; previous revision: 1.10
done
Checking in sslerr.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslerr.h,v  <--  sslerr.h
new revision: 1.19; previous revision: 1.18
done
Checking in sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.91; previous revision: 1.90
done
Checking in sslsecur.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsecur.c,v  <--  sslsecur.c
new revision: 1.55; previous revision: 1.54
done
Attachment #575064 - Attachment description: Implement SSL_RestartHandshakeAfterAuthCertificate [v4] → Implement SSL_RestartHandshakeAfterAuthCertificate [v4] [checked in]
Comment on attachment 584593 [details] [diff] [review]
spaces -> tabs and remove trailing whitespace from previous patch [checked in]

cvs commit: Examining .
Checking in ssl.h;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.h,v  <--  ssl.h
new revision: 1.47; previous revision: 1.46
done
Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.160; previous revision: 1.159
done
Checking in ssl3gthr.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3gthr.c,v  <--  ssl3gthr.c
new revision: 1.12; previous revision: 1.11
done
Checking in sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.92; previous revision: 1.91
done
Attachment #584593 - Attachment description: spaces -> tabs and remove trailing whitespace from previous patch → spaces -> tabs and remove trailing whitespace from previous patch [checked in]
Comment on attachment 584595 [details] [diff] [review]
Rename SSL_RHAAC to SSL_AuthCertificateComplete; send alert when async cert. authentication fails [checked in]

cvs commit: Examining .
Checking in ssl.def;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.def,v  <--  ssl.def
new revision: 1.29; previous revision: 1.28
done
Checking in ssl.h;
/cvsroot/mozilla/security/nss/lib/ssl/ssl.h,v  <--  ssl.h
new revision: 1.48; previous revision: 1.47
done
Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.161; previous revision: 1.160
done
Checking in sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.93; previous revision: 1.92
done
Checking in sslsecur.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsecur.c,v  <--  sslsecur.c
new revision: 1.56; previous revision: 1.55
done
Attachment #584595 - Attachment description: Rename SSL_RHAAC to SSL_AuthCertificateComplete; send alert when async cert. authentication fails → Rename SSL_RHAAC to SSL_AuthCertificateComplete; send alert when async cert. authentication fails [checked in]
Comment on attachment 584622 [details] [diff] [review]
Disable False Start when async cert authentication is used until they work safely together [checked in]

cvs commit: Examining .
Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.162; previous revision: 1.161
done
Attachment #584622 - Attachment description: Disable False Start when async cert authentication is used until they work safely together → Disable False Start when async cert authentication is used until they work safely together [checked in]
Comment on attachment 591695 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in]

An earlier version of this patch has r=rrelyea

The previous version of this patch has r=wtc

The most recent version of this patch has a change (described in comment 76) that has not yet been reviewed.
Comment on attachment 591695 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in]

Bob or Wan-Teh,

can you please review the change that Brian made (between reviewed v4 and not-yet-reviewed v5) ?

You can look at the change at this link:
https://bugzilla.mozilla.org/attachment.cgi?oldid=589264&action=interdiff&newid=591695&headers=1

I'm going to check in v4, because v5 is not yet reviewed.
Attachment #591695 - Flags: review?(rrelyea)
(In reply to Kai Engert (:kaie) from comment #84)
> 
> I'm going to check in v4, because v5 is not yet reviewed.

No, I'm going to check in v5 anyway, because Brian seems to describe a bad bus loop in v4.

I have emailed Bob and Wan-Teh and asked them to look the interdiff with a high priority.
We should get this reviewed before the final release.
Comment on attachment 591695 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in]

cvs commit: Examining .
Checking in sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.81; previous revision: 1.80
done
Attachment #591695 - Attachment description: Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] → Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in] [needs final review]
Comment on attachment 591696 [details] [diff] [review]
Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5] [checked in]

I've checked in both this patch, and the other patch named
 "Update tstclnt to use the new function name and to send alerts when
  cert authentication fails [v2]"

in a single commit, because both are needed to successfully build.


Checking in cmd/tstclnt/tstclnt.c;
/cvsroot/mozilla/security/nss/cmd/tstclnt/tstclnt.c,v  <--  tstclnt.c
new revision: 1.65; previous revision: 1.64
done
Checking in tests/ssl/ssl.sh;
/cvsroot/mozilla/security/nss/tests/ssl/ssl.sh,v  <--  ssl.sh
new revision: 1.107; previous revision: 1.106
done
Attachment #591696 - Attachment description: Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5] → Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5] [checked in]
Attachment #584621 - Attachment description: Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] → Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] [checked in]
With the latest checkins, we get a new build warning.

tstclnt.c: In function ‘ownAuthCertificate’:
tstclnt.c:328:5: warning: too many arguments for format [-Wformat-extra-args]

code:
    FPRINTF(stderr, "using asynchronous certificate validation\n", progName);

As the parameter is obviously "too much", I've checked in an update to remove the parameter.

cvs commit: Examining .
Checking in tstclnt.c;
/cvsroot/mozilla/security/nss/cmd/tstclnt/tstclnt.c,v  <--  tstclnt.c
new revision: 1.66; previous revision: 1.65
done
Brian, in case you're unhappy with my checkins, you can simply execute this series of commands to revert (or tell me to do it).

cvs update -j1.66 -j1.65 mozilla/security/nss/cmd/tstclnt/tstclnt.c
cvs update -j1.107 -j1.106 mozilla/security/nss/tests/ssl/ssl.sh
cvs update -j1.65 -j1.64 mozilla/security/nss/cmd/tstclnt/tstclnt.c
cvs update -j1.81 -j1.80 mozilla/security/nss/lib/ssl/sslsock.c
cvs update -j1.162 -j1.161 mozilla/security/nss/lib/ssl/ssl3con.c
cvs update -j1.56 -j1.55 mozilla/security/nss/lib/ssl/sslsecur.c
cvs update -j1.93 -j1.92 mozilla/security/nss/lib/ssl/sslimpl.h
cvs update -j1.161 -j1.160 mozilla/security/nss/lib/ssl/ssl3con.c
cvs update -j1.48 -j1.47 mozilla/security/nss/lib/ssl/ssl.h
cvs update -j1.29 -j1.28 mozilla/security/nss/lib/ssl/ssl.def
cvs update -j1.92 -j1.91 mozilla/security/nss/lib/ssl/sslimpl.h
cvs update -j1.12 -j1.11 mozilla/security/nss/lib/ssl/ssl3gthr.c
cvs update -j1.160 -j1.159 mozilla/security/nss/lib/ssl/ssl3con.c
cvs update -j1.47 -j1.46 mozilla/security/nss/lib/ssl/ssl.h
cvs update -j1.55 -j1.54 mozilla/security/nss/lib/ssl/sslsecur.c
cvs update -j1.91 -j1.90 mozilla/security/nss/lib/ssl/sslimpl.h
cvs update -j1.19 -j1.18 mozilla/security/nss/lib/ssl/sslerr.h
cvs update -j1.11 -j1.10 mozilla/security/nss/lib/ssl/ssl3gthr.c
cvs update -j1.159 -j1.158 mozilla/security/nss/lib/ssl/ssl3con.c
cvs update -j1.46 -j1.45 mozilla/security/nss/lib/ssl/ssl.h
cvs update -j1.28 -j1.27 mozilla/security/nss/lib/ssl/ssl.def
cvs update -j1.18 -j1.17 mozilla/security/nss/lib/ssl/SSLerrs.h
I've created tag NSS_3_13_2_BETA3 for testing purposes, after my previous checkins.
Brian,

in order to allow you to make an assessment of the differences between

  Firefox 11 beta (current mozilla-beta including patches)

and

  Firefox 11 with security/patches reverted and NSS updated to NSS_3_13_2_BETA3

I will attach a diff between these versions.
For easier reviewing, I have excluded the changes to the CA trust module.

Could you please check if everything inside libSSL has been applied according to your expecations?

I apologize for jumping in, but I'm getting nervous because of the approaching FF 11.
I hope my work was helpful.


As soon as you confirm ths diffs in libSSL are according to your expectations, I will create a 3.13.2 release candidate tag.
Attachment #596315 - Flags: feedback?(bsmith)
With this set of patches, Firefox fails to build, because it still calls SSL_RestartHandshakeAfterAuthCertificate, which you rename to SSL_AuthCertificateComplete.


Brian, please make it clear which of the patches in this bug you want to exlude for 3.13.2 (if any), or alternatively, please point me to the patch that fixes the call to SSL_RestartHandshakeAfterAuthCertificate


Thanks
Comment on attachment 591695 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in]

Review of attachment 591695 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.  I suggest some comment changes below.

::: security/nss/lib/ssl/sslsock.c
@@ +1959,5 @@
> +
> +    if (ss->version >= SSL_LIBRARY_VERSION_3_0 &&
> +	ss->ssl3.hs.restartTarget != NULL) {
> +	/* Read and write will block until the asynchronous callback completes
> +	 * (e.g. until SSL_AuthCertificateComplete is called), so don't tell

e.g. => i.e.

@@ +1963,5 @@
> +	 * (e.g. until SSL_AuthCertificateComplete is called), so don't tell
> +	 * the caller to poll the socket unless there is pending write data.
> +	 */
> +	if (ss->lastWriteBlocked && ss->pendingBuf.len != 0) {
> +	    new_flags &= (PR_POLL_WRITE | PR_POLL_EXCEPT);

It seems clearer to just clear the PR_POLL_READ flag.

The problem with PR_POLL_EXCEPT is worth a comment in the
source code.

If PR_POLL_EXCEPT is a problem, should we also clear the
PR_POLL_EXCEPT flag when there is pending write data?
I guess it's not necessary because PR_XXX will send the
pending write data, which will fail with a real error
(not PR_WOULD_BLOCK_ERROR).
Attachment #591695 - Flags: review?(wtc) → review+
The landing of this bug has caused the NSS tinderbox to turn orange (TESTFAILED).

On one of the Linux machines, we get multiple failures during SSL.
In all failures, the log reports the following:

using asynchronous certificate validation
Error opening input terminal for read
tstclnt: write to SSL socket failed: The security password entered is incorrect.
tstclnt: exiting with return code 254

I don't know if this the same failure as in bug 718469. However, those older errors were sporadic. Now we failures each cycle.
(In reply to Wan-Teh Chang from comment #93)
> r=wtc.  I suggest some comment changes below.


Wan-Teh, thanks for the r+

As your changes seem to be cleanup/comment/clarity changes, I propose to get them in a follow-up bug.

I've filed bug 726315 with your proposals and assigned it to Brian.
Because of the Tinderbox failure, I will attempt to backout both tstclnt patches (the change that makes tstclnt use the new function by default).

cvs update -j1.66 -j1.65 mozilla/security/nss/cmd/tstclnt/tstclnt.c
cvs update -j1.107 -j1.106 mozilla/security/nss/tests/ssl/ssl.sh
cvs update -j1.65 -j1.64 mozilla/security/nss/cmd/tstclnt/tstclnt.c


Checking in cmd/tstclnt/tstclnt.c;
/cvsroot/mozilla/security/nss/cmd/tstclnt/tstclnt.c,v  <--  tstclnt.c
new revision: 1.67; previous revision: 1.66
done
Checking in tests/ssl/ssl.sh;
/cvsroot/mozilla/security/nss/tests/ssl/ssl.sh,v  <--  ssl.sh
new revision: 1.108; previous revision: 1.107
done
Attachment #584621 - Attachment description: Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] [checked in] → Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2]
Attachment #591696 - Attachment description: Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5] [checked in] → Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5]
(In reply to Kai Engert (:kaie) from comment #94)
> The landing of this bug has caused the NSS tinderbox to turn orange
> (TESTFAILED).

I am investigating this now.

(In reply to Kai Engert (:kaie) from comment #92)
> Brian, please make it clear which of the patches in this bug you want to
> exlude for 3.13.2 (if any), or alternatively, please point me to the patch
> that fixes the call to SSL_RestartHandshakeAfterAuthCertificate

Bug 713934. (I am keeping track of these issues in the bug dependencies.)
(In reply to Kai Engert (:kaie) from comment #94)
> The landing of this bug has caused the NSS tinderbox to turn orange
> (TESTFAILED).
> 
> On one of the Linux machines, we get multiple failures during SSL.
> In all failures, the log reports the following:
> 
> using asynchronous certificate validation
> Error opening input terminal for read
> tstclnt: write to SSL socket failed: The security password entered is
> incorrect.
> tstclnt: exiting with return code 254

"Error opening input terminal for read" means that SECU_GetModulePassword is caling SECU_GetPasswordString and SECU_GetPasswordString is failing to open "/dev/tty" for reading. Consequently, tstclnt cannot even initialize the PKCS#11 module.

See bug 339675 that had a similar symptom. 

Several tstclnt-based tests passed before these failures started. Based on bug 339675, I suspect that maybe, somehow, something in the checkins are causing either tstclnt or selfserv processes to not exit properly and/or otherwise not get reaped properly. If the builds turn green then after Kai's backout of the tstclnt changes then we know it is a problem with the tests.

If the builds stay orange after that backout then it might either be a problem with the libssl changes themselves, or it may be some problem with the build machines themselves. Are the orange tests all running on the same physical and/or logical (virtual) machine? If so, we should reboot that machine if the orange continues, and see it turns green after the reboot, if Kai's backout of the test changes does not make things turn green.
Yes, all builds labeled buildnss02 are the same VM. I'll reboot it if the current cycle will still be orange.
Here is a wild guess:

The only failing machine was a Fedora machine, which has NSS system libraries globally installed, which do not contain ECC functionality.

All 20 failing tests on the Fedora machine involved a SSL connection that used ECDH.

Could the errors have been caused by a mixup of test-environment libraries and systemwide library?
(In reply to Kai Engert (:kaie) from comment #100)
> 
> Could the errors have been caused by a mixup of test-environment libraries
> and systemwide library?


No, I performed manual testing on the affected tinderbox build machine. I printed the link dependencies (ldd) of all NSS tools from within test script ssl.sh - all link to the libraries in the local build output.

Also, I checked which file handles are open. The tstclnt/selfserv tools have file handles to the freebl/softoken from the local build build.

Nothing appears to luse the global libraries.
(In reply to Kai Engert (:kaie) from comment #100)
> All 20 failing tests on the Fedora machine involved a SSL connection that
> used ECDH.

I have reproduced the problem on Windows. It seems to be the case that only the RHEL tinderboxes went orange because only the RHEL tinderboxes actually run these tests! I did not notice this before because I always skipped the ECC tests locally because I could previously not figure out how to get them to build & run!

It seems to be the case that the tests are dependent on the fact that certificate validation is done before the ephemeral ECDH key is generated, and that certificate validation will cause a login to the PKCS#11 module.
(In reply to Brian Smith (:bsmith) from comment #102)
> 
> I have reproduced the problem on Windows. It seems to be the case that only
> the RHEL tinderboxes went orange because only the RHEL tinderboxes actually
> run these tests!

s/RHEL/Fedora/

Thanks for this information. You're right, the tests "SSL Cipher Coverage" were not run on the other build machines while the test scripts were checked in.

I analyzed what's going on. The reason is that the build slaves operated in different testing modes.

Because Fedora buildnss02 is fast, I had configured it to run the full set of tests in each cycle.

Even thought the Windows buildnss03 is slow, it is also configured to run the full set of tests in each cycle. However, no run started with the tests checked in, so we didn't see the failure there.

Why didn't we see the failures on the macminis and the "SunOS communist" machine? Because they are configured to perform only 1 out of 4 tests variations in each run. I assume, we would have had to see 4 full cycles for each of the machines to see the failure.


Ok, I propose the following:
- I change the configuration of the macs to run the full set of tests in each run
- I check in the tests again
- then we should get the failures on buildnss02, buildnss03 and both Macs in each cycle

I would like to do this, to confirm that our tinderbox machines are configured correctly and each of them catches the failures.

In addition, we should see the failure in at least one out of 4 cycles on "SunOS communist".

(Note SunOS machine buildnss04 has all tests disabled, because that VM wasn't stable when running the tests.)
Now I'm able to reproduce the failure on my local Linux machine.

> using asynchronous certificate validation
> Error opening input terminal for read
> tstclnt: write to SSL socket failed: The security password entered is incorrect.

When running the tests locally, I get a PROMPT for a password!
The command being executed is:

    tstclnt -p 8443 -h localhost.localdomain -c :C001 -T -2 -f -d . -v -w nss \
            < /home/kaie/moz/nss/head/mozilla/security/nss/tests/ssl/sslreq.dat

The prompt being shown is:

    Enter Password or Pin for "NSS FIPS 140-2 Certificate DB":

In other words, despite the command line argument "-w nss" which is supposed to provide the passwords, the application prompts anyway.

I suspect the reason is the async verification. The provided password is probably not forwarded to the place where it will be needed later.
When adding the the -O parameter  (use synchronous certificate validation) to the command shown in comment 104, then there is no prompt for the password.
Depends on: 726588
The cause of the test failure is bug 726588.
My local tests succeed when using the fix attached to that bug.
(In reply to Kai Engert (:kaie) from comment #103)
> Ok, I propose the following:
> ...
> - I check in the tests again

mozilla/security/nss/cmd/tstclnt/tstclnt.c cvs version 1.68
mozilla/security/nss/tests/ssl/ssl.sh  cvs version 1.109
Attachment #584621 - Attachment description: Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] → Update tstclnt to use the new function name and to send alerts when cert authentication fails [v2] [checked in]
Attachment #591696 - Attachment description: Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5] → Test SSL_RestartHandshakeAfterServerCert by having tstclnt use it by default. [v5] [checked in]
Bug 726588 fixed. Orange gone.

Brian, are we ready to tag the release candidate?
Comment on attachment 596315 [details] [diff] [review]
Diff between today's mozilla-beta and NSS_3_13_2_BETA3

Review of attachment 596315 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, this is what we need for mozilla-centra/-aurora/-beta. Thanks for pushing this forward Kai.
Attachment #596315 - Flags: feedback?(bsmith) → feedback+
Can this be closed now?
(In reply to Kai Engert (:kaie) from comment #110)
> Can this be closed now?

We're still waiting to check in all the new tests. Should we file a new bug for the tests?
Sorry, I'm confused about the progress here.
I've just tried to build FF11b3 against 3.13.2 RTM and Firefox fails to build:
/usr/src/packages/BUILD/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp:976:64: error: 'SSL_RestartHandshakeAfterAuthCertificate' was not declared in this scope
(In reply to Wolfgang Rosenauer [:wolfiR] from comment #112)
> I've just tried to build FF11b3 against 3.13.2 RTM and Firefox fails to
> build:
> /usr/src/packages/BUILD/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp:
> 976:64: error: 'SSL_RestartHandshakeAfterAuthCertificate' was not declared
> in this scope

We are waiting for Brian to land bug 713934 into mozilla-beta (FF11).
(In reply to Kai Engert (:kaie) from comment #113)
> (In reply to Wolfgang Rosenauer [:wolfiR] from comment #112)
> > I've just tried to build FF11b3 against 3.13.2 RTM and Firefox fails to
> > build:
> > /usr/src/packages/BUILD/mozilla/security/manager/ssl/src/nsNSSIOLayer.cpp:
> > 976:64: error: 'SSL_RestartHandshakeAfterAuthCertificate' was not declared
> > in this scope
> 
> We are waiting for Brian to land bug 713934 into mozilla-beta (FF11).

Looks like this has happened : https://bugzilla.mozilla.org/show_bug.cgi?id=713934#c11
Comment on attachment 588934 [details] [diff] [review]
Update tests for SSL_AuthCertificateComplete to test changes to ssl_Poll

r+ but it may need updating based on comments about the overall frame work.
Attachment #588934 - Flags: superreview?(rrelyea) → superreview+
Comment on attachment 584626 [details] [diff] [review]
Loopback tests for SSL_AuthCertificateComplete [v2]

r+ with the following caveats:

1) please don't check in SSL_AuthCertComplete_send_alert.c.rej.
2) once the frame work is updated according to some of my earlier comments, this patch may need to be updated.
3) You define a CHECK_NOT() macro in the framework, but you pretty consistantly use CHECK(!condition) rather than CHECK_NOT(condition). The former is fine (and probably more readable), which begs the question why even bother with CHECK_NOT(condition).

bob
Attachment #584626 - Flags: review?(rrelyea) → review+
Comment on attachment 591695 [details] [diff] [review]
Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in]

r+ relyea
Attachment #591695 - Flags: review?(rrelyea) → review+
Attachment #591695 - Attachment description: Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in] [needs final review] → Change ssl_Poll so that it it indicates the socket is unwritable and unreadable when we are blocked waiting for an async callback to complete [v5] [checked in]
I think it's confusing when bugs are kept open for a long time, if the fix is only in, and the only remaining task are the tests.

I propose to mark this bug as "resolved fixed" with target milestone 3.13.3 and file a separate bug for the remaining work.
(In reply to Kai Engert (:kaie) from comment #118)
> I propose to mark this bug as "resolved fixed" with target milestone 3.13.3
> and file a separate bug for the remaining work.

I will make the new bug depend on this one.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attachment #584626 - Attachment is obsolete: true
Attachment #584626 - Flags: superreview?(wtc)
Attachment #588934 - Attachment is obsolete: true
Attachment #588934 - Flags: review?(wtc)
Attachment #591696 - Flags: superreview?(wtc)
Attachment #584595 - Flags: superreview?(wtc)
Attachment #584622 - Flags: superreview?(wtc)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: