Closed Bug 638821 Opened 13 years ago Closed 13 years ago

Possible lock order inconsistency (potential deadlock) in SSL machinery

Categories

(NSS :: Libraries, defect, P2)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.10

People

(Reporter: jseward, Assigned: wtc)

Details

Attachments

(1 file)

using Helgrinding setup described in bug 551155 comment 13 and later
(ignore preceding junk)

when loading http://blog.chromium.org/2011/01/html-video-codec-support-in-chrome.html

It appears there's a lock acquisition order inconsistency, leading to report below.

-------------------

From code inspection:

security/nss/lib/ssl/sslcon.c:3128:

3128	ssl_GetXmitBufLock(ss);    /***************************************/
3129	ssl_GetSSL3HandshakeLock(ss);

acquires XmitBufLock then SSL3HandshakeLock

vs

security/nss/lib/ssl/ssl3con.c:5700

5700    PORT_Assert( ss->opt.noLocks || ssl_HaveSSL3HandshakeLock(ss) );
5701
5702    if (ws != wait_hello_done  &&
5703        ws != wait_server_cert &&
5704	ws != wait_server_key  &&
5705	ws != wait_cert_request) {
5706	SSL3_SendAlert(ss, alert_fatal, unexpected_message);
5707	PORT_SetError(SSL_ERROR_RX_UNEXPECTED_HELLO_DONE);
5708	return SECFailure;
5709    }
5710
5711    ssl_GetXmitBufLock(ss);		/*******************************/

acquires SSL3HandshakeLock(implied) then XmitBufLock

hence potential deadlock

-------------------

Report:

Thread #21: lock order "0x25AB2D18 before 0x25AAFD48" violated
   at 0x4C2BC62: pthread_mutex_lock (/home/sewardj/VgTRUNK/trunk/helgrind/hg_intercepts.c:496)
   by 0x7A3E4F5: PR_Lock (nsprpub/pr/src/pthreads/ptsynch.c:208)
   by 0x7A3E60D: PR_EnterMonitor (nsprpub/pr/src/pthreads/ptsynch.c:591)
   by 0xBA36CB3: ssl3_HandleServerHelloDone (security/nss/lib/ssl/ssl3con.c:5711)
   by 0xBA3A9C6: ssl3_HandleHandshakeMessage (security/nss/lib/ssl/ssl3con.c:8632)
   by 0xBA3B6F6: ssl3_HandleRecord (security/nss/lib/ssl/ssl3con.c:8727)
   by 0xBA3BC86: ssl3_GatherCompleteHandshake (security/nss/lib/ssl/ssl3gthr.c:209)
   by 0xBA3DC96: ssl_GatherRecord1stHandshake (security/nss/lib/ssl/sslcon.c:1258)
   by 0xBA42A0A: ssl_Do1stHandshake (security/nss/lib/ssl/sslsecur.c:151)
   by 0xBA4369D: ssl_SecureSend (security/nss/lib/ssl/sslsecur.c:1213)
   by 0xBA468D4: ssl_Write (security/nss/lib/ssl/sslsock.c:1652)
   by 0x600FE95: nsSSLThread::Run() (security/manager/ssl/src/nsSSLThread.cpp:1045)

  Required order was established by acquisition of lock at 0x25AB2D18
   at 0x4C2BC62: pthread_mutex_lock (/home/sewardj/VgTRUNK/trunk/helgrind/hg_intercepts.c:496)
   by 0x7A3E4F5: PR_Lock (nsprpub/pr/src/pthreads/ptsynch.c:208)
   by 0x7A3E60D: PR_EnterMonitor (nsprpub/pr/src/pthreads/ptsynch.c:591)
   by 0xBA3D9FC: ssl2_BeginClientHandshake (security/nss/lib/ssl/sslcon.c:3128)
   by 0xBA42A0A: ssl_Do1stHandshake (security/nss/lib/ssl/sslsecur.c:151)
   by 0xBA4369D: ssl_SecureSend (security/nss/lib/ssl/sslsecur.c:1213)
   by 0xBA468D4: ssl_Write (security/nss/lib/ssl/sslsock.c:1652)
   by 0x600FE95: nsSSLThread::Run() (security/manager/ssl/src/nsSSLThread.cpp:1045)
   by 0x600F7D8: nsPSMBackgroundThread::nsThreadRunner(void*) (security/manager/ssl/src/nsPSMBackgroundThread.cpp:44)
   by 0x7A43A1C: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:189)
   by 0x4C2CBE7: mythread_wrapper (/home/sewardj/VgTRUNK/trunk/helgrind/hg_intercepts.c:221)
   by 0x4E369C9: start_thread (/build/buildd/eglibc-2.11.1/nptl/pthread_create.c:300)

  followed by a later acquisition of lock at 0x25AAFD48
   at 0x4C2BC62: pthread_mutex_lock (/home/sewardj/VgTRUNK/trunk/helgrind/hg_intercepts.c:496)
   by 0x7A3E4F5: PR_Lock (nsprpub/pr/src/pthreads/ptsynch.c:208)
   by 0x7A3E60D: PR_EnterMonitor (nsprpub/pr/src/pthreads/ptsynch.c:591)
   by 0xBA3DA0E: ssl2_BeginClientHandshake (security/nss/lib/ssl/sslcon.c:3129)
   by 0xBA42A0A: ssl_Do1stHandshake (security/nss/lib/ssl/sslsecur.c:151)
   by 0xBA4369D: ssl_SecureSend (security/nss/lib/ssl/sslsecur.c:1213)
   by 0xBA468D4: ssl_Write (security/nss/lib/ssl/sslsock.c:1652)
   by 0x600FE95: nsSSLThread::Run() (security/manager/ssl/src/nsSSLThread.cpp:1045)
   by 0x600F7D8: nsPSMBackgroundThread::nsThreadRunner(void*) (security/manager/ssl/src/nsPSMBackgroundThread.cpp:44)
   by 0x7A43A1C: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:189)
   by 0x4C2CBE7: mythread_wrapper (/home/sewardj/VgTRUNK/trunk/helgrind/hg_intercepts.c:221)
   by 0x4E369C9: start_thread (/build/buildd/eglibc-2.11.1/nptl/pthread_create.c:300)
Component: Networking: HTTP → Security
QA Contact: networking.http → toolkit
could this be a dup of 588698  ?
(In reply to comment #1)
> could this be a dup of bug 588698  ?

It's clearly very closely related, but I think it's different.

Bug 588698 comment 22 contains a patch which, amongst things,
documents the rank order of locks.  The relevant bit is

  Rank (order) of locks

  recvLock -> firstHandshake -> recvbuf -> ssl3Handshake -> xmitbuf -> "spec"
  sendLock ->

In this bug (comment 0), we have sslcon3.c acquiring locks in the order

   ssl3Handshake -> xmitBuf

which is in accordance with the documented order, but sslcon.c does

   xmitBufLock -> ssl3Handshake

so it seems (at least superficially) that sslcon.c is wrong.
Assignee: nobody → nobody
Component: Security → Libraries
Product: Core → NSS
QA Contact: toolkit → libraries
Version: Trunk → trunk
Julian: thank you for the bug report and analysis.

The bug in sslcon.c has been fixed on the NSS trunk (NSS 3.13)
in bug 588698 comment 22.

Before:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslcon.c&rev=1.40&mark=3128,3132#3128

After:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslcon.c&rev=1.41&mark=3129,3131#3128

For some reason (perhaps to minimize risk by only fixing the
bug that the customer reported), I didn't backport this fix
to the NSS_3_12_BRANCH for NSS 3.12.8.

Can you find out if you have SSL 2.0 enabled?  The code in sslcon.c
should only be used if SSL 2.0 is enabled.
Assignee: nobody → wtc
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → 3.12.10
(In reply to comment #3)
> Can you find out if you have SSL 2.0 enabled?  The code in sslcon.c
> should only be used if SSL 2.0 is enabled.

How do I do that?  I know about finding threading errors and stuff like
that, but nothing about Fx's SSL code.
What's the value of security.enable_ssl2 ?
PreferenceName=security.enable_ssl2  Status=default  Type=boolean  Value=false

and I'm pretty sure this is the same profile as used when the inconsistency
was detected.
(In reply to comment #3)
> The bug in sslcon.c has been fixed on the NSS trunk (NSS 3.13)
> in bug 588698 comment 22.

FTR, the inconsistency in summary is:

sslcon.c:
3128	ssl_GetXmitBufLock(ss);    /***************************************/
3129	ssl_GetSSL3HandshakeLock(ss);

vs

ssl3con.c:
9053:    ssl_GetSSL3HandshakeLock(ss);
5711:    ssl_GetXmitBufLock(ss);		/*******************************/



> Can you find out if you have SSL 2.0 enabled?  The code in sslcon.c
> should only be used if SSL 2.0 is enabled.

SSL 2.0 is disabled, but sslcon.c is obviously being used.
I have the following pref settings:

security.enable_ssl2   false
security.enable_ssl3   true
security.enable_tls    true
I found two more instances of the same thing:

sslcon.c:
3128	ssl_GetXmitBufLock(ss);    /***************************************/
3129	ssl_GetSSL3HandshakeLock(ss);

vs

ssl3con.c
9053    ssl_GetSSL3HandshakeLock(ss);
8391    ssl_GetXmitBufLock(ss);	/*************************************/


and

sslcon.c:
3128	ssl_GetXmitBufLock(ss);    /***************************************/
3129	ssl_GetSSL3HandshakeLock(ss);

vs

sslsock.c:
466    ssl_GetSSL3HandshakeLock(ss);
467    ssl_GetXmitBufLock(ss);


From this it is clear that sslcon.c:3128/9 are disagrees with at least
three other lock-pair acquisition sequences.  From which I'd guess the
fix mentioned in comment #3 is the right fix.
Attached patch patch, version 1Splinter Review
A copy of sslcon.c rev 1.41, that makes the sslcon.c lock order
agree with everyone else.  With this in place I can't detect any
lock order violations in the ssl machinery any more.
Attachment #522954 - Flags: review?(wtc)
Comment on attachment 522954 [details] [diff] [review]
patch, version 1

r=wtc.

Patch checked in on the NSS_3_12_BRANCH (NSS 3.12.10).

Checking in sslcon.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslcon.c,v  <--  sslcon.c
new revision: 1.40.2.1; previous revision: 1.40
done

Note: this patch is already in the NSS trunk (NSS 3.13)
as part of the patch (attachment 470102 [details] [diff] [review]) in bug 588698.
Attachment #522954 - Flags: review?(wtc)
Attachment #522954 - Flags: review+
Just FYI: I figured out why we always start the handshake
in ssl2_BeginClientHandshake.  That is set up by the
SSL_ResetHandshake call:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslsecur.c&rev=1.48&mark=220,243,247#217

In the first ssl2_BeginClientHandshake call, if SSL 2.0
is disabled, we switch to ssl_GatherRecord1stHandshake
as the handshake function:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/sslcon.c&rev=1.41&mark=3119-3120,3123,3126,3130#3119

and we use ssl_GatherRecord1stHandshake from that point on.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: