Closed Bug 404870 Opened 12 years ago Closed 12 years ago

DOMWindow leak and "ASSERTION: Non-empty event queue being destroyed; events being leaked"

Categories

(Core :: XPCOM, defect, P2, major)

x86
macOS
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

Details

(Keywords: memory-leak)

Attachments

(2 files)

On an AJAXy website a friend set up (sorry, login required), which uses persistent XMLHttpRequests, if I connect and then while connected choose "Work Offline" to go offline, I get an assertion "Non-empty event queue being destroyed; events being leaked" and then when I quit two DOMWindows are leaked. Here's the assertion stack:

#0  NS_DebugBreak_P (aSeverity=1, aStr=0x13ad440 "Non-empty event queue being destroyed; events being leaked.", aExpr=0x13ad434 "IsEmpty()", aFile=0x13ad3f0 "/Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsEventQueue.cpp", aLine=63) at /Users/roc/mozilla-checkin/mozilla/xpcom/base/nsDebugImpl.cpp:263
#1  0x0136834a in nsEventQueue::~nsEventQueue (this=0x3e9da784) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsEventQueue.cpp:63
#2  0x013d106c in nsThread::nsChainedEventQueue::~nsChainedEventQueue (this=0x3e9da77c) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsThread.h:101
#3  0x01369361 in nsThread::~nsThread (this=0x3e9da760) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsThread.cpp:285
#4  0x01369425 in nsThread::Release (this=0x3e9da760) at /Users/roc/mozilla-checkin/mozilla/xpcom/threads/nsThread.cpp:133
#5  0x139a026b in nsSocketTransportService::Shutdown (this=0x2043c00) at /Users/roc/mozilla-checkin/mozilla/netwerk/base/src/nsSocketTransportService2.cpp:419
#6  0x139810d0 in nsIOService::SetOffline (this=0x2923300, offline=1) at /Users/roc/mozilla-checkin/mozilla/netwerk/base/src/nsIOService.cpp:617
#7  0x0137bb38 in NS_InvokeByIndex_P (that=0x2923300, methodIndex=10, paramCount=1, params=0xbfffc9d4) at /Users/roc/mozilla-checkin/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
#8  0x12a45016 in XPCWrappedNative::CallMethod (ccx=@0xbfffcc40, mode=CALL_SETTER) at /Users/roc/mozilla-checkin/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2339
#9  0x12a80e09 in XPCWrappedNative::SetAttribute (ccx=@0xbfffcc40) at /Users/roc/mozilla-checkin/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:2120
#10 0x12a4cca5 in XPC_WN_GetterSetter (cx=0x3cfbd8a0, obj=0x34bcb3c0, argc=1, argv=0x27796dc, vp=0xbfffcd64) at /Users/roc/mozilla-checkin/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1491
#11 0x010635b5 in js_Invoke (cx=0x3cfbd8a0, argc=1, vp=0x27796d4, flags=2) at /Users/roc/mozilla-checkin/mozilla/js/src/jsinterp.c:1362
#12 0x01063903 in js_InternalInvoke (cx=0x3cfbd8a0, obj=0x34bcb3c0, fval=1037854336, flags=0, argc=1, argv=0xbfffd308, rval=0xbfffd308) at /Users/roc/mozilla-checkin/mozilla/js/src/jsinterp.c:1438
#13 0x01063bb9 in js_InternalGetOrSet (cx=0x3cfbd8a0, obj=0x34bcb3c0, id=42888644, fval=1037854336, mode=JSACC_WRITE, argc=1, argv=0xbfffd308, rval=0xbfffd308) at /Users/roc/mozilla-checkin/mozilla/js/src/jsinterp.c:1502

I'll dig, not sure whose fault this is yet.
Severity: normal → major
Keywords: mlk
OK I added an assertion that fires when an event is posted to a thread that isn't going to run NS_ProcessPendingEvents again. Here's the stack when that fires:

#0  NS_DebugBreak_P (aSeverity=1, aStr=0x13abb38 "This event will be leaked and never run!", aExpr=0x13abb24 "!mEventsAreDoomed", aFile=0x13ab968 "/Users/roc/mozilla-trunk/xpcom/threads/nsThread.cpp", aLine=341) at /Users/roc/mozilla-trunk/xpcom/base/nsDebugImpl.cpp:263
#1  0x013688a5 in nsThread::PutEvent (this=0x3ee66b90, event=0x3ee66be0) at /Users/roc/mozilla-trunk/xpcom/threads/nsThread.cpp:341
#2  0x01368f8d in nsThread::Dispatch (this=0x3ee66b90, event=0x3ee66be0, flags=0) at /Users/roc/mozilla-trunk/xpcom/threads/nsThread.cpp:387
#3  0x13885a1c in nsSocketTransportService::Dispatch (this=0x204c600, event=0x3ee66be0, flags=0) at /Users/roc/mozilla-trunk/netwerk/base/src/nsSocketTransportService2.cpp:106
#4  0x138f28d6 in nsHttpConnectionMgr::PostEvent (this=0x3d0b6130, handler={__pfn = &nsHttpConnectionMgr::OnMsgCancelTransaction(int, void*), __delta = 0}, iparam=-2142043148, vparam=0x3f01a830) at /Users/roc/mozilla-trunk/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp:175
#5  0x138f2cfb in nsHttpConnectionMgr::CancelTransaction (this=0x3d0b6130, trans=0x3f01a830, reason=2152924148) at /Users/roc/mozilla-trunk/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp:212
#6  0x1396ff38 in nsHttpHandler::CancelTransaction (this=0x3d0b4ec0, trans=0x3f01a830, reason=2152924148) at /Users/roc/mozilla-trunk/netwerk/protocol/http/src/nsHttpTransaction.h:148
#7  0x13908fc5 in nsHttpChannel::Cancel (this=0x3ee9b180, status=2152924148) at /Users/roc/mozilla-trunk/netwerk/protocol/http/src/nsHttpChannel.cpp:3392
#8  0x17dca9c3 in nsCrossSiteListenerProxy::ForwardRequest (this=0x3ee01df0, aFromStop=0) at /Users/roc/mozilla-trunk/content/base/src/nsCrossSiteListenerProxy.cpp:84
#9  0x17dcbb11 in nsCrossSiteListenerProxy::OnStartRequest (this=0x3ee01df0, aRequest=0x3ee9b1b0, aContext=0x0) at /Users/roc/mozilla-trunk/content/base/src/nsCrossSiteListenerProxy.cpp:126
#10 0x1390c4d4 in nsHttpChannel::CallOnStartRequest (this=0x3ee9b180) at /Users/roc/mozilla-trunk/netwerk/protocol/http/src/nsHttpChannel.cpp:757
#11 0x139135ab in nsHttpChannel::OnStartRequest (this=0x3ee9b180, request=0x3e8e17b0, ctxt=0x0) at /Users/roc/mozilla-trunk/netwerk/protocol/http/src/nsHttpChannel.cpp:4331
#12 0x138639ad in nsInputStreamPump::OnStateStart (this=0x3e8e17b0) at /Users/roc/mozilla-trunk/netwerk/base/src/nsInputStreamPump.cpp:439
#13 0x13864611 in nsInputStreamPump::OnInputStreamReady (this=0x3e8e17b0, stream=0x3f068f1c) at /Users/roc/mozilla-trunk/netwerk/base/src/nsInputStreamPump.cpp:395
#14 0x013c2c44 in nsInputStreamReadyEvent::Run (this=0x3e8d63f0) at /Users/roc/mozilla-trunk/xpcom/io/nsStreamUtils.cpp:111
#15 0x013684d1 in nsThread::ProcessNextEvent (this=0x2912f20, mayWait=1, result=0xbfffc68c) at /Users/roc/mozilla-trunk/xpcom/threads/nsThread.cpp:496
#16 0x0130d985 in NS_ProcessNextEvent_P (thread=0x2912f20, mayWait=1) at nsThreadUtils.cpp:227
#17 0x01368ae8 in nsThread::Shutdown (this=0x3ee66b90) at /Users/roc/mozilla-trunk/xpcom/threads/nsThread.cpp:451
#18 0x1388686f in nsSocketTransportService::Shutdown (this=0x204c600) at /Users/roc/mozilla-trunk/netwerk/base/src/nsSocketTransportService2.cpp:418
#19 0x138676f0 in nsIOService::SetOffline (this=0x29217b0, offline=1) at /Users/roc/mozilla-trunk/netwerk/base/src/nsIOService.cpp:617
This is timing-dependent in a nasty way, BTW. I can reproduce it most of the time but not always. Adding debug code involving locking made the bug go away.
Looking at that stack and poking around the netwerk code, it looks like we're cancelling an HTTP transaction and nsHttpConnectionMgr is calling nsSocketTransportService::Dispatch to do stuff on the socket transport thread. But the socket transport thread is dead dead dead.

nsSocketTransportService::Dispatch does check "NS_ENSURE_TRUE(mThread, NS_ERROR_NOT_INITIALIZED);", but nsSocketTransportService::Shutdown looks like this:

    // join with thread
    mThread->Shutdown();
    NS_RELEASE(mThread);

So this is actually a race condition where events are being posted to the socket transport service between the thread actually shutting down in mThread->Shutdown() and mThread being cleared.
Well, it's only sort-of a race condition. In this case events are happening on the same thread, the main thread; we're posting to mThread while the main thread runs another event while it waits for mThread's shutdown-ack event to arrive in nsThread::Shutdown:

  while (!context.shutdownAck)
    NS_ProcessNextEvent(context.joiningThread);

But I'm also a bit concerned about this code:

// event queue (any thread)
NS_IMETHODIMP
nsSocketTransportService::Dispatch(nsIRunnable *event, PRUint32 flags)
{
    LOG(("STS dispatch [%p]\n", event));
    NS_ENSURE_TRUE(mThread, NS_ERROR_NOT_INITIALIZED);
    return mThread->Dispatch(event, flags);

So a Dispatch from another thread could race with the code in nsSocketTransportService::Shutdown which sets mThread to null. Maybe the comment isn't true and Dispatch is always called on the main thread, but I wouldn't want to bet on it, it can be called along many paths.
I'm not sure what the right thing is to do here. The safest course might be to have a reliable, race-proof way for nsThread::Dispatch to detect that the thread is no longer going to be able to process the event, and return an error rv (NS_ERROR_NOT_INITIALIZED) in that case. That will prevent leaks and make it easier to catch errors early. Also, provide locking around reads of mThread in nsSocketTransportService to prevent races there.

Could also clear mThread early in nsSocketTransportService::Shutdown(), although I think that would not be necessary with the above two changes, and would add risk.
Attached patch nsThread fixSplinter Review
This is the first part of my previous comment. We simply don't allow PutEvent to add an event to the queue unless we're eventually going to process it. This should prevent us from ever leaking events. The caller can detect that this happened and deal appropriately.
Assignee: nobody → roc
Status: NEW → ASSIGNED
Attachment #289758 - Flags: superreview?(benjamin)
Attachment #289758 - Flags: review?(benjamin)
Here, we guard mThread reads from non-main-threads, and mThread writes from any thread, with mLock. Dispatch and IsOnCurrentThread grab a strong ref atomically so they won't crash in a race condition.

Also, we leverage the previous patch to detect when the thread has stopped accepting events; if it has, we treat it as if there was no thread.
Attachment #289760 - Flags: superreview?(cbiesinger)
Attachment #289760 - Flags: review?(cbiesinger)
This protects us against possible crashes and leaks. I guess there's some risk that bugs where we used to only leak may get worse, but probably not.
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Attachment #289758 - Flags: superreview?(benjamin)
Attachment #289758 - Flags: superreview+
Attachment #289758 - Flags: review?(benjamin)
Attachment #289758 - Flags: review+
Whiteboard: [needs review]
Can the nsThread fix go ahead and land, or does it require the nsSocketTransportService fix land at the same time?
They really should both land at the same time.
Attachment #289760 - Flags: superreview?(cbiesinger)
Attachment #289760 - Flags: superreview+
Attachment #289760 - Flags: review?(cbiesinger)
Attachment #289760 - Flags: review+
checked in.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [needs review]
Sounds like this is difficult enough to hit that we can't really test it...
Flags: in-testsuite-
You need to log in before you can comment on or make changes to this bug.