Open Bug 615342 Opened 9 years ago Updated 3 years ago

nsHttpConnection::GetInterface races against Socket Thread

Categories

(Core :: Networking: HTTP, defect, P3)

defect

Tracking

()

People

(Reporter: mcmanus, Unassigned)

References

Details

(Whiteboard: [necko-backlog])

originally commented on in bug 255141

nsHttpConnection::GetInterface is meant to be invoked from the UI thread via synchronous proxy from the transport thread. This stops the transport thread from executing during that time, avoiding race conditions in GetInterface() or GetSecurityCallbacks() which it invokes.

This is an existing comment from nsHttpConnection::GetInterface :

    // NOTE: This function is only called on the UI thread via sync proxy from
    //       the socket transport thread.  If that weren't the case, then we'd
    //       have to worry about the possibility of mTransaction going away
    //       part-way through this function call.  

However, that isn't what generally happens. nsHttpConnection::GetInterface() was indeed being executed on the UI thread, but the corresponding proxy call was being made from security/manager/ssl/src/nsNSSIOLayer.cpp:384 on the  nsPSMBackgroundThread which means the socket thread could still be doing something dangerous:

0  pthread_cond_wait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007ffff3c6ad2a in PR_WaitCondVar (cvar=0x7fffce8b0b80,
timeout=4294967295) at
/home/mcmanus/src/mozilla2/wd/pipemq/nsprpub/pr/src/pthreads/ptsynch.c:417
#2  0x00007ffff3c6b4bc in PR_Wait (mon=0x7fffcf199c80, timeout=4294967295) at
/home/mcmanus/src/mozilla2/wd/pipemq/nsprpub/pr/src/pthreads/ptsynch.c:614
#3  0x00007ffff4e916c4 in nsAutoMonitor::Wait (this=0x7fffe03f7ea0,
interval=4294967295) at ../../../dist/include/nsAutoLock.h:346
#4  0x00007ffff64f2ad9 in nsEventQueue::GetEvent (this=0x7fffd052c6b0,
mayWait=1, result=0x7fffe03f7f60) at
/home/mcmanus/src/mozilla2/wd/pipemq/xpcom/threads/nsEventQueue.cpp:85
[...]
#27 0x00007ffff5efa64b in nsSSLThread::Run (this=0x7fffd9bdbdc0) at
/home/mcmanus/src/mozilla2/wd/pipemq/security/manager/ssl/src/nsSSLThread.cpp:1045
#28 0x00007ffff5ef84d7 in nsPSMBackgroundThread::nsThreadRunner
(arg=0x7fffd9bdbdc0) at
/home/mcmanus/src/mozilla2/wd/pipemq/security/manager/ssl/src/nsPSMBackgroundThread.cpp:44
#29 0x00007ffff3c72517 in _pt_root (arg=0x7fffda6e99b0) at
/home/mcmanus/src/mozilla2/wd/pipemq/nsprpub/pr/src/pthreads/ptthread.c:228
#30 0x00007ffff7bc59ca in start_thread (arg=<value optimized out>) at
pthread_create.c:300

You can reproduce pretty much at will by just breaking on the httpConnection
code and pulling up an https url. This will be considerably more dangerous when pipelining because GetSecurityCallbacks() in that context needs to access an array that is manipulated only on the socket thread.
Blocks: 599164
Blocks: 603503
I believe this will be WONTFIX (or INVALID) after we remove the SSL thread.
In my SSL thread removal patches, I spent some effort to make sure that the main thread and the socket transport thread almost never have to sync up because of PSM. But, then I re-found this bug, and it seemed to match the symptoms of some races I have been fighting with my patches. So, I have changed my SSL thread removal patches to dispatch a proxy to the socket transport thread, which then blocks while it waits for an event on the main thread to complete.

However, even with the SSL thread removal patch, I think there is at least one remaining case where the callbacks are potentially accessed from PSM without blocking the socket transport thread: PK11PasswordPrompt/PK11PasswordPromptRunnable calls nsNSSSocketInfo::GetInterface(), perhaps blocking a cert verification thread instead of the socket transport thread. nsNSSSocketInfo::GetInterface() does access the callbacks.

But, if we change Pk11PasswordPrompt to block the socket transport thread, then we would not be able to do any network I/O while the master password prompt is active. Instead we should change things so that the various wincx/pinarg parameters in certificate validation and the SSL_SetPKCS11PinArg calls are changed to point to something other than the nsNSSSocketInfo for the connection. IMO, this will be easier to do after fixing bug 679144 and bug 703834.
Depends on: 679144, 703834
Whiteboard: [necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.