Closed Bug 430260 Opened 12 years ago Closed 9 years ago

Deadlock on main thread during exit in nsPSMBackgroundThread::requestExit

Categories

(Core :: XPCOM, defect, critical)

Platform: x86, Windows XP
Type: defect
Priority: Not set
Severity: critical

RESOLVED DUPLICATE of bug 468736

People

(Reporter: mayhemer, Unassigned)

Details

(Keywords: hang)

Attachments

(1 file)

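On the main thread, while observing PROFILE_CHANGE_NET_TEARDOWN_TOPIC, nsPSMBackgroundThread signals its background thread to stop and then joins it. At that moment the background thread is blocked in a synchronous proxy call to the main thread, so it never sees the condition-variable notification and never exits. The main thread, blocked in PR_JoinThread, can in turn never service the proxied call, so both threads wait on each other forever.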

PSM background thread backtrace:

>	nspr4.dll!_PR_MD_WAIT_CV(_MDCVar * cv=0x05a795cc, _MDLock * lock=0x03892344, unsigned int timeout=4294967295)  Line 280 + 0x14 bytes	C
 	nspr4.dll!_PR_WaitCondVar(PRThread * thread=0x0491b9c8, PRCondVar * cvar=0x05a79558, PRLock * lock=0x03892328, unsigned int timeout=4294967295)  Line 204 + 0x17 bytes	C
 	nspr4.dll!PR_Wait(PRMonitor * mon=0x044e2070, unsigned int ticks=4294967295)  Line 175 + 0x1d bytes	C
 	xpcom_core.dll!nsAutoMonitor::Wait(unsigned int interval=4294967295)  Line 340 + 0x11 bytes	C++
 	xpcom_core.dll!nsEventQueue::GetEvent(int mayWait=1, nsIRunnable * * result=0x050ef950)  Line 86	C++
 	xpcom_core.dll!nsThread::nsChainedEventQueue::GetEvent(int mayWait=1, nsIRunnable * * event=0x050ef950)  Line 113	C++
 	xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x050ef974)  Line 501 + 0x49 bytes	C++
 	xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x04a51c68, int mayWait=1)  Line 227 + 0x16 bytes	C++
 	xpcom_core.dll!nsProxyEventObject::CallMethod(unsigned short methodIndex=3, const XPTMethodDescriptor * methodInfo=0x00c15108, nsXPTCMiniVariant * params=0x050efa20)  Line 259 + 0xb bytes	C++
 	xpcom_core.dll!PrepareAndDispatch(nsXPTCStubBase * self=0x047b61c8, unsigned int methodIndex=3, unsigned int * args=0x050efae0, unsigned int * stackBytesToPop=0x050efad0)  Line 114 + 0x21 bytes	C++
 	xpcom_core.dll!SharedStub()  Line 142	C++
 	xpcom_core.dll!nsGetInterface::operator()(const nsID & aIID={...}, void * * aInstancePtr=0x047b61c8)  Line 52 + 0x21 bytes	C++
 	xpcom_core.dll!nsGetInterface::operator()(const nsID & aIID={...}, void * * aInstancePtr=0x050efb14)  Line 52 + 0x21 bytes	C++
 	pipnss.dll!nsCOMPtr<nsIBadCertListener2>::assign_from_helper(const nsCOMPtr_helper & helper={...}, const nsID & aIID={...})  Line 1335 + 0x13 bytes	C++
 	pipnss.dll!nsCOMPtr<nsIBadCertListener2>::nsCOMPtr<nsIBadCertListener2>(const nsCOMPtr_helper & helper={...})  Line 695	C++
 	pipnss.dll!nsNSSBadCertHandler(void * arg=0x05b51e48, PRFileDesc * sslSocket=0x05b5e7b0)  Line 3026	C++
 	ssl3.dll!ssl3_HandleCertificate(sslSocketStr * ss=0x05b51f20, unsigned char * b=0x05b5a6f3, unsigned int length=0)  Line 7266 + 0x1b bytes	C
 	ssl3.dll!ssl3_HandleHandshakeMessage(sslSocketStr * ss=0x05b51f20, unsigned char * b=0x05b59f1c, unsigned int length=2007)  Line 7938 + 0x11 bytes	C
 	ssl3.dll!ssl3_HandleHandshake(sslSocketStr * ss=0x05b51f20, sslBufferStr * origBuf=0x05b52168)  Line 8062 + 0x19 bytes	C
 	ssl3.dll!ssl3_HandleRecord(sslSocketStr * ss=0x05b51f20, SSL3Ciphertext * cText=0x050efe50, sslBufferStr * databuf=0x05b52168)  Line 8325 + 0xd bytes	C
 	ssl3.dll!ssl3_GatherCompleteHandshake(sslSocketStr * ss=0x05b51f20, int flags=0)  Line 206 + 0x17 bytes	C
 	ssl3.dll!ssl_GatherRecord1stHandshake(sslSocketStr * ss=0x05b51f20)  Line 1258 + 0xb bytes	C
 	ssl3.dll!ssl_Do1stHandshake(sslSocketStr * ss=0x05b51f20)  Line 151 + 0xf bytes	C
 	ssl3.dll!ssl_SecureSend(sslSocketStr * ss=0x05b51f20, const unsigned char * buf=0x05b72d98, int len=375, int flags=0)  Line 1152 + 0x9 bytes	C
 	ssl3.dll!ssl_SecureWrite(sslSocketStr * ss=0x05b51f20, const unsigned char * buf=0x05b72d98, int len=375)  Line 1197 + 0x13 bytes	C
 	ssl3.dll!ssl_Write(PRFileDesc * fd=0x05b5e7b0, const void * buf=0x05b72d98, int len=375)  Line 1487 + 0x17 bytes	C
 	pipnss.dll!nsSSLThread::Run()  Line 1029 + 0x1c bytes	C++
 	pipnss.dll!nsPSMBackgroundThread::nsThreadRunner(void * arg=0x0491b7e8)  Line 45	C++


Main thread backtrace:

>	nspr4.dll!_PR_MD_WAIT_CV(_MDCVar * cv=0x0491bc24, _MDLock * lock=0x00b936bc, unsigned int timeout=4294967295)  Line 280 + 0x14 bytes	C
 	nspr4.dll!_PR_WaitCondVar(PRThread * thread=0x00b93888, PRCondVar * cvar=0x0491bbb0, PRLock * lock=0x00b936a0, unsigned int timeout=4294967295)  Line 204 + 0x17 bytes	C
 	nspr4.dll!PR_WaitCondVar(PRCondVar * cvar=0x0491bbb0, unsigned int timeout=4294967295)  Line 551 + 0x17 bytes	C
 	nspr4.dll!PR_JoinThread(PRThread * thread=0x0491b9c8)  Line 1593 + 0xb bytes	C
 	pipnss.dll!nsPSMBackgroundThread::requestExit()  Line 97 + 0xd bytes	C++
 	pipnss.dll!nsNSSComponent::DoProfileChangeNetTeardown()  Line 2340	C++
 	pipnss.dll!nsNSSComponent::Observe(nsISupports * aSubject=0x0448f388, const char * aTopic=0x1003cdf4, const wchar_t * someData=0x1003dab4)  Line 2088 + 0xb bytes	C++
 	xpcom_core.dll!nsObserverList::NotifyObservers(nsISupports * aSubject=0x0448f388, const char * aTopic=0x1003cdf4, const wchar_t * someData=0x1003dab4)  Line 129	C++
 	xpcom_core.dll!nsObserverService::NotifyObservers(nsISupports * aSubject=0x0448f388, const char * aTopic=0x1003cdf4, const wchar_t * someData=0x1003dab4)  Line 184	C++
 	xul.dll!nsXREDirProvider::DoShutdown()  Line 838	C++
 	xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup()  Line 907	C++
 	xul.dll!XRE_main(int argc=3, char * * argv=0x00bc0428, const nsXREAppData * aAppData=0x00bc0a00)  Line 3216	C++
 	firefox.exe!NS_internal_main(int argc=3, char * * argv=0x00bc0428)  Line 158 + 0x12 bytes	C++
 	firefox.exe!wmain(int argc=3, wchar_t * * argv=0x00bc0340)  Line 87 + 0xd bytes	C++


Happened just once.
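
For readers less familiar with this pattern, here is a minimal, hypothetical sketch in standard C++ (std::thread and std::promise rather than NSPR, XPCOM proxies, or nsPSMBackgroundThread) of the cycle shown in the two backtraces: a worker blocks on a synchronous call that only the main thread can service, while the main thread blocks joining the worker. MainLoop, SpinUntilWorkerDone and RunScenario are invented names. The sketch shows one general way out: the main thread keeps servicing posted events until the worker reports that it has finished, and only then joins. It is an illustration under these assumptions, not the fix adopted in bug 468736.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the main thread's event queue (not an XPCOM API).
struct MainLoop {
  std::mutex mu;
  std::condition_variable cv;
  std::deque<std::function<void()>> events;
  bool workerDone = false;

  // Called from the worker to dispatch work to the main thread.
  void Post(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> lk(mu);
      events.push_back(std::move(fn));
    }
    cv.notify_all();
  }

  // Called by the worker as its very last action.
  void MarkWorkerDone() {
    {
      std::lock_guard<std::mutex> lk(mu);
      workerDone = true;
    }
    cv.notify_all();
  }

  // Keep servicing posted events until the worker reports it is finished,
  // instead of blocking in a plain join that cannot service them.
  void SpinUntilWorkerDone() {
    std::unique_lock<std::mutex> lk(mu);
    for (;;) {
      cv.wait(lk, [this] { return workerDone || !events.empty(); });
      if (!events.empty()) {
        std::function<void()> fn = std::move(events.front());
        events.pop_front();
        lk.unlock();
        fn();  // run the proxied call without holding the queue lock
        lk.lock();
      } else {
        return;  // workerDone is true and the queue is drained
      }
    }
  }
};

// Returns the value the worker obtained through its synchronous "proxy" call.
int RunScenario() {
  MainLoop mainLoop;
  int answer = 0;

  std::thread worker([&mainLoop, &answer] {
    // Synchronous call to the main thread, loosely analogous to the proxied
    // nsIBadCertListener2 lookup in the background-thread backtrace.
    auto reply = std::make_shared<std::promise<int>>();
    std::future<int> result = reply->get_future();
    mainLoop.Post([reply] { reply->set_value(42); });
    answer = result.get();  // blocks until the main thread runs the event
    mainLoop.MarkWorkerDone();
  });

  // Exit request: a plain worker.join() here would deadlock, because the
  // main thread would never run the event the worker is waiting on.
  mainLoop.SpinUntilWorkerDone();
  worker.join();  // safe now: the worker has finished its proxied call
  return answer;
}
```

RunScenario() returns 42 on every run. Replacing the SpinUntilWorkerDone() call with an immediate worker.join() reproduces the shape of this bug: the worker sits in result.get() and the main thread sits in join(). Other ways out of the same cycle include making the worker's call asynchronous or having it refuse proxied calls once exit has been requested.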
Possibly bug 429304?
(In reply to comment #1)
> Possibly bug 429304?

Yes, this could be a dup. Please also see bug 426404, where something similar to this bug happened on the socket thread. I've added you to the CC list of that bug.
Adding dep on bug 429304, to track the possible relation.
Depends on: 429304
So it looks to me that this deadlock could be resolved if we took a lock in this scope <http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsSSLThread.cpp#997> and made sure that mExitRequested cannot be set while this work is in progress, right?
Well, attachment 439893 [details] [diff] [review] sounds like a much better solution to me!
Probably good STR (steps to reproduce):
- use a Windows debug build of the trunk as of this date (63d439971e7e)
- apply the patch for bug 613977 (v3)
- do not change network.http.connection-retry-timeout (leave it at the default)
- go to a fast (ideally local) site and connect in a way that the certificate won't be considered valid
- hold F5 (refresh loop) until a dialog appears
- close the dialog with Cancel
- close Firefox
=> you should deadlock the way described here
I have a dump with the full heap of this hang. It's 600 MB, but I can get it to someone if it'll help debugging.

I've hit this multiple times on the last few nightlies, would be nice if we could fix this.
Attached file mac hang sample
Cheng and I have been hitting this on Mac a lot lately.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 468736