Last Comment Bug 712363 - SSL connections do not work correctly when not managed by the socket transport service (including especially when the socket is not non-blocking)
: SSL connections do not work correctly when not managed by the socket transpor...
Status: RESOLVED FIXED
[inbound]
:
Product: Core
Classification: Components
Component: Security: PSM (show other bugs)
: Trunk
: All All
: P1 critical (vote)
: mozilla13
Assigned To: Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
:
Mentors:
: 704984 (view as bug list)
Depends on:
Blocks: 674147 675221 704984 705773 708813 711787
  Show dependency treegraph
 
Reported: 2011-12-20 10:30 PST by Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
Modified: 2013-09-16 06:08 PDT (History)
22 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-
verified


Attachments
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread (3.04 KB, patch)
2011-12-21 22:18 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
brian: review+
Details | Diff | Review
Part 2: Fix certificate verification for blocking sockets and sockets not managed by the socket transport service (15.62 KB, patch)
2011-12-21 22:19 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Part 3: Move cert override logic to SSLServerCertVerification.cpp, r?honzab (46.48 KB, patch)
2011-12-21 22:21 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
Details | Diff | Review
Part 4: Fix cert error overrides for blocking sockets and sockets not managed by the socket transport service (8.67 KB, patch)
2011-12-21 22:22 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Part 5: Move SSLServerCertVerificationResult to SSLServerCertVerification.cpp (3.94 KB, patch)
2011-12-22 02:51 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Part 5: Move SSLServerCertVerificationResult to SSLServerCertVerification.cpp [v2] (4.71 KB, patch)
2011-12-22 03:15 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
Details | Diff | Review
Part 2: Fix certificate verification for blocking sockets and sockets not managed by the socket transport service (15.62 KB, patch)
2011-12-22 15:32 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread (3.04 KB, patch)
2011-12-22 15:36 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
brian: review+
honzab.moz: feedback-
Details | Diff | Review
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread [v2] (3.04 KB, patch)
2012-01-20 00:32 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review
Part 2: Move CertErrorRunnable, SSLServerCertVerificationResult, and HandleBadCertificate to SSLServerCertVerification.cpp (36.97 KB, patch)
2012-01-21 17:05 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review
[NOT FOR CHECKIN] Script to help verify that there were no semantic changes in Part 2 (1.04 KB, text/plain)
2012-01-21 17:07 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details
Part 3: Correct the documentation of how threading works (4.77 KB, patch)
2012-01-21 17:12 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review
Part 4: Split the creation of CertErrorRunnable from the dispatching of it (10.28 KB, patch)
2012-01-21 17:35 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Part 4: Split the creation of CertErrorRunnable from the dispatching of it (15.24 KB, patch)
2012-01-21 18:07 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
Details | Diff | Review
Part 5: Part 5: Add back synchronous cert validation for Thunderbird (14.32 KB, patch)
2012-01-21 18:08 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
Details | Diff | Review
Part 4: Split the creation of CertErrorRunnable from the dispatching of it (15.25 KB, patch)
2012-01-22 16:54 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
brian: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review
Part 5: Add back synchronous cert validation for Thunderbird (14.47 KB, patch)
2012-01-22 16:56 PST, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
brian: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Review

Description Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-20 10:30:41 PST
+++ This bug was initially created as a clone of Bug #704984 +++

(In reply to Honza Bambas (:mayhemer) from Bug #704984 comment #54)
> - PSMRunnable (SyncRunnableBase) allowed to be dispatched also from the main
> thread (calls its operation method directly)
> - NS_IsMainThread() special handling in PK11PasswordPrompt no longer
> necessary

OK.

> - SSLServerCertVerificationJob remembers the thread it has to dispatch the
> result to (= the thread where the socket is being called on)
> - nsNSSSocketInfo::GetPreviousCert can now be called on the main thread too
> - when send operation on a blocking socket returns would-block then if we
> are waiting for cert verify, block by looping the event queue and then retry
> the send again

There is a better fix. When the current thread is not the socket transport thread, then do not use async. certificate path validation. That is, instead of dispatching a SSLServerCertVerificationJob and returning SECWouldBlock, just call AuthCertificate directly. This will work for blocking sockets too. Async. cert validation is only needed for the socket transport thread.

Yesterday (last night), I also investigated the technique you propose in this patch, having the certificate validation job remembering the thread it is dispatched from, and then returning to it. The problem is that there is another assumption that the async cert validation code makes: it assumes that it can interrupt any PR_Poll (any blocking socket I/O operation) by dispatching an event to that thread. It isn't clear to me that this is (always) going to work for anything except the socket transport thread.

> The true fix should be made in NSS (not for today): when the socket is set
> blocking, we must never return would-block since it will be handled as a
> fatal error by the calling layer.  No idea how to achieve this  This
> presumption might be wrong for the ssl socket, though..

The cert auth hook must never return SECWouldBlock for a blocking socket.
Comment 1 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-20 10:33:23 PST
I will submit two patches:

Patch #1 will make SSL connections work correctly in these cases, but cert overrides won't work.
Patch #2 will make the cert overrides work.

Patch #1 will be ready shortly. Patch #2 will take a little more time.
Comment 2 Honza Bambas (:mayhemer) 2011-12-20 14:05:42 PST
(In reply to Brian Smith (:bsmith) from comment #0)
> There is a better fix. When the current thread is not the socket transport
> thread, then do not use async. certificate path validation. That is, instead
> of dispatching a SSLServerCertVerificationJob and returning SECWouldBlock,
> just call AuthCertificate directly. This will work for blocking sockets too.
> Async. cert validation is only needed for the socket transport thread.

Are you sure about this?  When you will do this on the main thread, then how will cert verification process OCSP requests that must go though the main thread able to process events?
Comment 3 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-20 17:45:12 PST
(In reply to Honza Bambas (:mayhemer) from comment #2)
> (In reply to Brian Smith (:bsmith) from comment #0)
> > Async. cert validation is only needed for the socket transport thread.
> 
> Are you sure about this?  When you will do this on the main thread, then how
> will cert verification process OCSP requests that must go though the main
> thread able to process events?

http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsNSSCallbacks.cpp?rev=cf0b31ff2b6d#401

I am not sure yet. But, our code for fetching OCSP responses has special support for running on the main thread.
Comment 4 Honza Bambas (:mayhemer) 2011-12-21 02:02:36 PST
(In reply to Brian Smith (:bsmith) from comment #3)
> http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/
> nsNSSCallbacks.cpp?rev=cf0b31ff2b6d#401
> 
> I am not sure yet. But, our code for fetching OCSP responses has special
> support for running on the main thread.

Nice, but: http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsNSSCallbacks.cpp?rev=cf0b31ff2b6d#386
Comment 5 Honza Bambas (:mayhemer) 2011-12-21 02:03:56 PST
BTW, with my patch from bug 704984:

  >	xul.dll!nsHTTPDownloadEvent::Run()  Line 100	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x003a7277)  Line 625 + 0x19 bytes	C++
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x007d7d88, bool mayWait=true)  Line 245 + 0x17 bytes	C++
 	xul.dll!nsThread::Shutdown()  Line 483 + 0xb bytes	C++
 	xul.dll!NS_InvokeByIndex_P(nsISupports * that=0x062eb400, unsigned int methodIndex=6, unsigned int paramCount=0, nsXPTCVariant * params=0x00000000)  Line 103	C++
 	xul.dll!nsProxyObjectCallInfo::Run()  Line 182 + 0x2d bytes	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x003a7363)  Line 625 + 0x19 bytes	C++
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x007d7d88, bool mayWait=true)  Line 245 + 0x17 bytes	C++
 	xul.dll!nsNSSSocketInfo::BlockOnCertVerification(int bytesWritten=-1)  Line 1028 + 0xd bytes	C++
 	xul.dll!PSMSend(PRFileDesc * fd=0x008337e0, const void * buf=0x0b2b34bc, int amount=43, int flags=0, unsigned int timeout=10000)  Line 2267 + 0xc bytes	C++
 	nspr4.dll!PR_Send(PRFileDesc * fd=0x008337e0, const void * buf=0x0b2b34bc, int amount=43, int flags=0, unsigned int timeout=10000)  Line 226 + 0x1e bytes	C
 	nsldappr32v60.dll!prldap_write(int s=1, const void * buf=0x0b2b34bc, int len=43, lextiof_socket_private * socketarg=0x0b27bb88)  Line 218 + 0x1a bytes	C
 	nsldap32v60.dll!ber_flush(sockbuf * sb=0x0b1b8a48, berelement * ber=0x0b2b3390, int freeit=0)  Line 439 + 0x26 bytes	C
 	nsldap32v60.dll!nsldapi_send_ber_message(ldap * ld=0x0b1b8858, sockbuf * sb=0x0b1b8a48, berelement * ber=0x0b2b3390, int freeit=0, int epipe_handler=1)  Line 473 + 0x11 bytes	C
 	nsldap32v60.dll!nsldapi_send_server_request(ldap * ld=0x0b1b8858, berelement * ber=0x0b2b3390, int msgid=1, ldapreq * parentreq=0x00000000, ldap_server * srvlist=0x00000000, ldap_conn * lc=0x0b1cf098, char * bindreqdn=0x0b2c0778, int bind=0)  Line 342 + 0x17 bytes	C
 	nsldap32v60.dll!nsldapi_send_initial_request(ldap * ld=0x0b1b8858, int msgid=1, unsigned long msgtype=96, char * dn=0x0b1b0890, berelement * ber=0x0b2b3390)  Line 148 + 0x2b bytes	C
 	nsldap32v60.dll!simple_bind_nolock(ldap * ld=0x0b1b8858, const char * dn=0x0b1b0890, const char * passwd=0x0b1c99d0, int unlock_permitted=1)  Line 153 + 0x17 bytes	C
 	nsldap32v60.dll!ldap_simple_bind(ldap * ld=0x0b1b8858, const char * dn=0x0b1b0890, const char * passwd=0x0b1c99d0)  Line 82 + 0x13 bytes	C
 	xul.dll!nsLDAPOperation::SimpleBind(const nsACString_internal & passwd={...})  Line 322 + 0x30 bytes	C++
 	xul.dll!nsAbLDAPListenerBase::OnLDAPInit(nsILDAPConnection * aConn=0x0b1b0b70, unsigned int aStatus=0)  Line 302 + 0x35 bytes	C++
 	xul.dll!nsLDAPConnection::OnLookupComplete(nsICancelable * aRequest=0x0b1b7814, nsIDNSRecord * aRecord=0x0b0d4450, unsigned int aStatus=0)  Line 663	C++
Comment 6 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 09:11:14 PST
(In reply to Honza Bambas (:mayhemer) from comment #4)
> Nice, but:
> http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/
> nsNSSCallbacks.cpp?rev=cf0b31ff2b6d#386

Yes, that will queue an event on the main thread. Then, later, if we are on the main thread, we will run the event loop and then process that event (among others), right?
Comment 7 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 22:18:34 PST
Created attachment 583723 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread
Comment 8 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 22:19:15 PST
Created attachment 583724 [details] [diff] [review]
Part 2: Fix certificate verification for blocking sockets and sockets not managed by the socket transport service
Comment 9 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 22:21:37 PST
Created attachment 583725 [details] [diff] [review]
Part 3: Move cert override logic to SSLServerCertVerification.cpp, r?honzab

This patch simply moves the cert override handling from nsNSSIOLayer.cpp to SSLServerCertVerification.cpp. I did this because I thought I was going to need to make many changes to it. However, I realized later that I hardly had to change anything with the cert override handling. But this was a change that Honza had previously recommended anyway, so I left it in. So, if you want, I can redo the next patch so that it doesn't depend on this one.
Comment 10 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 22:22:27 PST
Created attachment 583726 [details] [diff] [review]
Part 4: Fix cert error overrides for blocking sockets and sockets not managed by the socket transport service
Comment 11 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 22:23:41 PST
Note that Part 1 is just the SyncRunnableBase changes from Honza's patch in bug 704984 attachment 583184 [details] [diff] [review].
Comment 12 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-21 22:35:53 PST
tested this in the following horrible way:

Horrible Test #1
----------------
This tests verifies that SSL server cert verification, including OCSP fetching, works correctly on the main thread.

1. I configured a new LDAP account with www.paypal.com as the server, on port 443, using SSL.
2. I set breakpoints in a couple of places in internal_send_receive_attempt.
3. I configured Thunderbird to use this "LDAP" server for autocompletion in Options -> Addressing -> Directory Server.
4. I started composing a new email.
5. I started typing an email address in the "To:" field.
6. My breakpoints in internal_send_receive_attempt were hit.
7. I stepped through the code to verify that everything was working as I expected.
8. Of course, later the bind fails, since www.paypal.com isn't really an LDAP server.

Horrible Test #2
----------------
1. I reconfigured the LDAP settings to use addressbook.mozilla.com with correct settings, using SSL.
2. I started composing a new email
3. I started typing an email address in the "To:" field.
4. I verified that address looks were working correctly.

Tomorrow, I will try to create some xpcshell tests that verify this in a less hacky, and automated way. It won't test LDAP, but it will test some other protocol on the main thread that uses OCSP.
Comment 13 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-22 02:51:33 PST
Created attachment 583748 [details] [diff] [review]
Part 5: Move SSLServerCertVerificationResult to SSLServerCertVerification.cpp

This should have been part of Part 3. Once the cert override logic is moved to SSLServerCertVerification.cpp, SSLServerCertVerificationResult doesn't need to be exposed outside of that file anymore.
Comment 14 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-22 03:15:05 PST
Created attachment 583751 [details] [diff] [review]
Part 5: Move SSLServerCertVerificationResult to SSLServerCertVerification.cpp [v2]
Comment 15 Honza Bambas (:mayhemer) 2011-12-22 15:13:15 PST
Brian, p2 patch could not be applied to comm-cetral/mozilla.  Please rebase the patches to what the current client.py from comm-cetral checks out.  I want to test the patches with my _working_ LDAPS OCSP'ed server.
Comment 16 Honza Bambas (:mayhemer) 2011-12-22 15:20:34 PST
Comment on attachment 583723 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread

Review of attachment 583723 [details] [diff] [review]:
-----------------------------------------------------------------

I can't superreview my own work.
Comment 17 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-22 15:32:58 PST
Created attachment 583955 [details] [diff] [review]
Part 2: Fix certificate verification for blocking sockets and sockets not managed by the socket transport service
Comment 18 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-22 15:36:24 PST
Created attachment 583957 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread
Comment 19 Honza Bambas (:mayhemer) 2011-12-22 15:57:02 PST
Patches now apply and fixes the LDAP issue for me.
Comment 20 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-12-22 16:35:12 PST
Comment on attachment 583957 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread

Honza, I originally asked you for SR for this patch just because I wanted you to look over the things I omitted from your patch. I don't think an SR is needed for this.
Comment 21 Honza Bambas (:mayhemer) 2012-01-05 03:51:49 PST
You are missing the changes to CertErrorRunnable::Run().  When the LDAP server has a bad certificate, you don't set the error on the socket, so you silently add an exception for all bad certificates when the cert verification doesn't happen on the socket thread, quit bad.  The reason is missing call to SSLServerCertVerificationResult::Run() that sets the result of verification on the socket (and cancels it when cert is bad).

Just be aware that you should prevent call to SSL_RestartHandshakeAfterAuthCertificate synchronously from the AuthCertificateHook as you could renter the locks, but you know this better then me.

During the review (not yet published) I proposed to have a static function that verifies and if bad, handles the bad cert since you, more or less, duplicate the code of SSLServerCertVerificationJob::Run() in AuthCertificateHook.  Not good since it may introduce bugs like described above.

Setting the verification result on the socket could be done in that method so you may leave CertErrorRunnable simpler.


Comm-beta displays the bad cert info dialog and fails the load.  Comm-central w/o the patches obviously deadlocks.
Comment 22 Honza Bambas (:mayhemer) 2012-01-05 03:52:59 PST
Comment on attachment 583955 [details] [diff] [review]
Part 2: Fix certificate verification for blocking sockets and sockets not managed by the socket transport service

Review of attachment 583955 [details] [diff] [review]:
-----------------------------------------------------------------

Dropping r flag, this is just a pre-review anyway.

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +455,5 @@
>  {
> +  // Runs on a certificate verification thread if the verifying a certificate 
> +  // for a connection on the socket transport thread; otherwise, runs on the
> +  // thread doing the network I/O, which may (but shouldn't) be the main
> +  // thread.

This comment needs a wording check.

The "but shouldn't" seems redundant to me here.  This patch is fixing exactly this issue.  Now any thread can call on cert verification.

@@ +734,4 @@
>  
>    nsNSSSocketInfo *socketInfo = static_cast<nsNSSSocketInfo*>(arg);
> +
> +  bool onSTSThread;

onNetworkThread ?

@@ +747,5 @@
> +    PR_SetError(PR_UNKNOWN_ERROR, 0);
> +    return SECFailure;
> +  }
> +  
> +  SECStatus rv;

s/rv/status/ ?

@@ +761,5 @@
> +    // thread doing the network I/O may not interrupt its network I/O on receipt
> +    // of our SSLServerCertVerificationResult event, and/or it might not even be
> +    // a non-blocking socket.
> +    rv = AuthCertificate(socketInfo, serverCert);
> +    // XXX: cert overrides are not handled!

You should add to this comment that we also block the read/write op until the cert verification is done regardless the socket is blocking or non-blocking.
Comment 23 Honza Bambas (:mayhemer) 2012-01-05 03:53:15 PST
Comment on attachment 583726 [details] [diff] [review]
Part 4: Fix cert error overrides for blocking sockets and sockets not managed by the socket transport service

Review of attachment 583726 [details] [diff] [review]:
-----------------------------------------------------------------

Same here.

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +281,5 @@
>  // SSLServerCertVerificationResult with the error code from PR_GetError().
>  SECStatus
>  HandleBadCertificate(PRErrorCode defaultErrorCodeToReport,
> +                    nsNSSSocketInfo * socketInfo, CERTCertificate * cert,
> +                    const void * fdForLogging, bool isAsync)

Indention

@@ +456,2 @@
>    }
>    if (NS_FAILED(nrv)) {

Blank line before this if () { line please.

@@ +1251,5 @@
> +      if (rv != SECSuccess && PR_GetError() == 0) {
> +        NS_ERROR("AuthCertificate or HandleBadCertificate did not set error code");
> +        PR_SetError(PR_UNKNOWN_ERROR, 0);
> +      }
> +    }

This code now duplicates in SSLServerCertVerificationJob::Run().  

You should create a static method containing this code that SSLServerCertVerificationJob::Run() would be using.
Comment 24 Honza Bambas (:mayhemer) 2012-01-05 03:53:59 PST
Comment on attachment 583725 [details] [diff] [review]
Part 3: Move cert override logic to SSLServerCertVerification.cpp, r?honzab

Review of attachment 583725 [details] [diff] [review]:
-----------------------------------------------------------------

I hate checking these patches.. :)

r=honzab.
Comment 25 Honza Bambas (:mayhemer) 2012-01-05 03:55:52 PST
Comment on attachment 583751 [details] [diff] [review]
Part 5: Move SSLServerCertVerificationResult to SSLServerCertVerification.cpp [v2]

Review of attachment 583751 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab
Comment 26 Honza Bambas (:mayhemer) 2012-01-11 11:56:34 PST
Comment on attachment 583957 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread

The final version of this patch will need to land on beta channel as well.  See bug 708813 comment 25.
Comment 27 Mark Banner (:standard8) 2012-01-16 03:42:31 PST
Are we close to landing this?

I'm especially concerned about part 1 landing on the beta channel in time for release - as otherwise some people will have broken LDAP in Thundebird. I'm not sure what the effects could be for Firefox users (if any).
Comment 28 Mike Conley (:mconley) - (Away until June 29th) 2012-01-19 10:47:32 PST
Ping
Comment 29 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-20 00:32:38 PST
Created attachment 590132 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread [v2]

This is the same patch as the first version but with a whitespace change. I understand that there needs to be a change to the way CertErrorRunnable::Run works for the non-STS case, but in other patches in this series I have changed how CertErrorRunnable works for other reasons.
Comment 30 Honza Bambas (:mayhemer) 2012-01-20 13:44:32 PST
Comment on attachment 590132 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread [v2]

I'm afraid I cannot review this until I also see the remaining changes to be made and have a complete set of patches to test.
Comment 31 Mark Banner (:standard8) 2012-01-20 14:41:13 PST
(In reply to Honza Bambas (:mayhemer) from comment #30)
> Comment on attachment 590132 [details] [diff] [review]
> Part 1: Make SyncRunnableBase work without deadlocking when run on the main
> thread [v2]
> 
> I'm afraid I cannot review this until I also see the remaining changes to be
> made and have a complete set of patches to test.

Doesn't bug 708813 just require this patch though?
Comment 32 Honza Bambas (:mayhemer) 2012-01-21 12:24:18 PST
(In reply to Mark Banner (:standard8) from comment #31)
> (In reply to Honza Bambas (:mayhemer) from comment #30)
> > Comment on attachment 590132 [details] [diff] [review]
> > Part 1: Make SyncRunnableBase work without deadlocking when run on the main
> Doesn't bug 708813 just require this patch though?

Yes, that's right.  This patch in particular should land only and only on beta.

I'll copy the patch to that bug and r+ it.

But I still don't think it is sufficient for this bug until the remaining patches get updated and I can test them as a complete set.
Comment 33 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 17:05:02 PST
Created attachment 590524 [details] [diff] [review]
Part 2: Move CertErrorRunnable, SSLServerCertVerificationResult, and HandleBadCertificate to SSLServerCertVerification.cpp

Honza, I wrote a script to help you review this, which I will attach next.
Comment 34 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 17:07:37 PST
Created attachment 590525 [details]
[NOT FOR CHECKIN] Script to help verify that there were no semantic changes in Part 2

Here is the script to help review Part 2. That is a big patch but it is just moving code and fiddling with #includes and minor declarations that support the moved code.
Comment 35 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 17:12:58 PST
Created attachment 590526 [details] [diff] [review]
Part 3: Correct the documentation of how threading works
Comment 36 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 17:35:39 PST
Created attachment 590528 [details] [diff] [review]
Part 4: Split the creation of CertErrorRunnable from the dispatching of it
Comment 37 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 18:07:38 PST
Created attachment 590529 [details] [diff] [review]
Part 4: Split the creation of CertErrorRunnable from the dispatching of it
Comment 38 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 18:08:57 PST
Created attachment 590530 [details] [diff] [review]
Part 5: Part 5: Add back synchronous cert validation for Thunderbird
Comment 39 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-21 20:20:39 PST
Tryserver for Firefox seems to be broken right now.

Thunderbird builds with these patches are currently building on cc-try:
http://build.mozillamessaging.com/tinderboxpushlog/?tree=ThunderbirdTry&rev=8f9793ec4860
Comment 40 Honza Bambas (:mayhemer) 2012-01-22 09:17:09 PST
Comment on attachment 590132 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread [v2]

Review of attachment 590132 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab for landing on m-c and r/m-a and actually as well as r/m-b (marked then in a different bug).
Comment 41 Honza Bambas (:mayhemer) 2012-01-22 09:17:20 PST
Comment on attachment 590524 [details] [diff] [review]
Part 2: Move CertErrorRunnable, SSLServerCertVerificationResult, and HandleBadCertificate to SSLServerCertVerification.cpp

Review of attachment 590524 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the script, Brian!  Very helpful.
r=honzab
Comment 42 Honza Bambas (:mayhemer) 2012-01-22 09:17:31 PST
Comment on attachment 590526 [details] [diff] [review]
Part 3: Correct the documentation of how threading works

Review of attachment 590526 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab
Comment 43 Honza Bambas (:mayhemer) 2012-01-22 09:17:44 PST
Comment on attachment 590529 [details] [diff] [review]
Part 4: Split the creation of CertErrorRunnable from the dispatching of it

Review of attachment 590529 [details] [diff] [review]:
-----------------------------------------------------------------

This really is nice.

r=honzab

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +358,3 @@
>  }
>  
> +// Returns null with the error code set if it does not create the

"with the PR error code" ?

@@ +508,5 @@
>    }
>  
>    socketInfo->SetStatusErrorBits(*nssCert, collected_errors);
> +  nsSSLIOLayerHelpers::mHostsWithCertErrors->RememberCertHasError(
> +    socketInfo, socketInfo->SSLStatus(), SECFailure);

You forgot to remove this call from SetStatusErrorBits if you still believe it is a good idea to call this separately.

@@ +526,5 @@
> +// which may call nsHttpConnection::GetInterface() through
> +// nsNSSSocketInfo::mCallbacks. nsHttpConnection::GetInterface must always
> +// execute on the main thread, with the socket transport service thread
> +// blocked.
> +class CertErrorRunnableRunnable : public nsRunnable

Interesting solution.

@@ +540,5 @@
> +    nsresult rv = mCertErrorRunnable->DispatchToMainThreadAndWait();
> +    // The result must run on the socket transport thread, which we are already
> +    // on, so we can just run it directly, instead of dispatching it.
> +    if (NS_SUCCEEDED(rv)) {
> +      rv = mCertErrorRunnable->mResult->Run();

Maybe worth to also check on presence of mResult?  Regardless of the MOZ_ASSERT in RunOnTargetThread.

You have a similar check in AuthCertificateHook in Part 5 patch.

Actually, there used to be a code to always create a result.  But it was just a corner case insurance..
Comment 44 Honza Bambas (:mayhemer) 2012-01-22 09:17:53 PST
Comment on attachment 590530 [details] [diff] [review]
Part 5: Part 5: Add back synchronous cert validation for Thunderbird

Review of attachment 590530 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ -1119,5 @@
>    // doing verification without checking signatures.
> -  NS_ASSERTION(checkSig, "AuthCertificateHook: checkSig unexpectedly false");
> -
> -  // PSM never causes libssl to call this function with PR_TRUE for isServer,
> -  // and many things in PSM assume that we are a client.

Shouldn't be this removal in a different bug?

@@ +1204,5 @@
> +    error = PR_UNKNOWN_ERROR;
> +  }
> +
> +  PR_SetError(error, 0);
> +  return SECFailure;

Again, this is a code very much similar to the code in SSLServerCertVerificationJob::Run().  We should have a followup to try to stick this code in one place.

For now, since it is that late, let's go with this, since it works well.
Comment 45 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-22 16:54:47 PST
Created attachment 590608 [details] [diff] [review]
Part 4: Split the creation of CertErrorRunnable from the dispatching of it

Addresses Honza's review comments.
Comment 46 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-22 16:56:17 PST
Created attachment 590610 [details] [diff] [review]
Part 5: Add back synchronous cert validation for Thunderbird

Addresses Honza's review comments.
Comment 47 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-23 12:04:16 PST
It seems people are overlooking that this bug also depends on bug 706955 being fixed. There is a patch in bug 706955 awaiting review. I have now just changed the reviewer.
Comment 48 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-01-23 22:36:16 PST
Here are the tryserver results for mozilla-central, for these patches and a bunch more related ones:
https://tbpl.mozilla.org/?tree=Try&rev=b63de8572b37
Comment 51 Mark Banner (:standard8) 2012-02-01 14:37:28 PST
Comment on attachment 590132 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread [v2]

[Approval Request Comment]
This has landed on mozilla-central, it narrowly missed landing on mozilla-aurora by a few hours, when we were expecting it to get into mozilla-central.

We could really do with this on mozilla-aurora so that LDAP is no longer broken there, and our testers (that include quite a few LDAP users) can use it again. Doing a one-off branch is not really viable for Thunderbird & SeaMonkey on mozilla-aurora.

Regression caused by (bug #): Various PSM changes
User impact if declined: LDAP over SSL hangs for Thunderbird & SeaMonkey 
Testing completed (on m-c, etc.): Landed on mozilla-central, now in nightlies, it is being released in Thunderbird 11 Beta via a special relbranches for each beta and then for final release. 
Risk to taking this patch (and alternatives if risky): Changes in this patch do affect Firefox, as said previously, this only missed the merge by a few hours. This will hopefully be somewhat mitigated by the extra testing received by the Betas.
String changes made by this patch: None.
Comment 52 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-02-01 16:25:52 PST
(In reply to Mark Banner (:standard8) from comment #51)
> We could really do with this on mozilla-aurora so that LDAP is no longer
> broken there, and our testers (that include quite a few LDAP users) can use
> it again. Doing a one-off branch is not really viable for Thunderbird &
> SeaMonkey on mozilla-aurora.

I agree. We should land this on mozilla-aurora ASAP.

> Regression caused by (bug #): Various PSM changes

bug 674147 and maybe bug 675221
Comment 53 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-02-01 17:04:50 PST
Did not intend to set target milestone.
Comment 54 Alex Keybl [:akeybl] 2012-02-02 07:07:24 PST
Comment on attachment 590132 [details] [diff] [review]
Part 1: Make SyncRunnableBase work without deadlocking when run on the main thread [v2]

[Triage Comment]
Missed the merge by a few hours - approved for Aurora 12.
Comment 55 Honza Bambas (:mayhemer) 2012-02-02 07:21:06 PST
*** Bug 704984 has been marked as a duplicate of this bug. ***
Comment 57 Henrik Skupin (:whimboo) 2012-02-27 03:38:20 PST
Thanks for fixing it for Firefox 12. I can no longer reproduce the freeze with Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0a2) Gecko/20120226 Thunderbird/12.0a2 ID:20120226030023

Note You need to log in before you can comment on or make changes to this bug.