Closed Bug 1204784 Opened 9 years ago Closed 9 years ago

Intermittent leakcheck | default process: 64 bytes leaked (nsRunnable, nsThreadShutdownAckEvent)

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox43 --- wontfix
firefox44 --- wontfix
firefox45 --- fixed
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: nigelb, Assigned: khuey)

References

Details

(Keywords: intermittent-failure, memory-leak)

Attachments

(2 files)

No description provided.
Keywords: mlk
Regression from bug 1200922, maybe?
Flags: needinfo?(khuey)
Well that bug added this class, so yes, it must be.
Flags: needinfo?(khuey)
ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()'
*shrug*
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Kyle, this seems to be happening more often starting Jan 11. https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1204784&startday=2016-01-01&endday=2016-01-14&tree=all Any idea what caused this to trigger more often? Can we clean up the events?
Flags: needinfo?(khuey)
It could have been landing bug 1237371 and bug 1237369, which uses this a lot more. But idk why it would lag a few days. Not sure what's going on ... I'll think about it some more.
Assignee: nobody → khuey
Flags: needinfo?(khuey)
The only way I could see that we would put an event in a queue and not execute it is if there are nested event queues. Do you have any other ideas, Nathan? The Push/PopEventQueue stuff is only used by workers so it seems unlikely to be involved here ...
Flags: needinfo?(nfroyd)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #9) > The only way I could see that we would put an event in a queue and not > execute it is if there are nested event queues. Do you have any other > ideas, Nathan? The Push/PopEventQueue stuff is only used by workers so it > seems unlikely to be involved here ... But we "promote" events from a popped event queue: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#1176 so Push/PopEventQueue doesn't seem relevant here...? We also have: 16:23:53 INFO - [Child 1950] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file /builds/slave/m-in-l64-d-0000000000000000000/build/src/xpcom/threads/nsEventQueue.cpp, line 36 16:23:53 INFO - #01: nsThread::nsChainedEventQueue::~nsChainedEventQueue() [mfbt/RefPtr.h:55] 16:23:53 INFO - #02: nsThread::~nsThread() [xpcom/glue/nsTArray.h:2093] 16:23:53 INFO - #03: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:508] 16:23:53 INFO - #04: nsThread::Release() [xpcom/threads/nsThread.cpp:179] 16:23:53 INFO - #05: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:237] 16:23:53 INFO - #06: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:888] 16:23:53 INFO - #07: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:865] 16:23:53 INFO - #08: libpthread.so.0 + 0x7c83 16:23:53 INFO - #09: libpthread.so.0 + 0x7ea8 16:23:53 INFO - #10: libc.so.6 + 0xf42ed so the root event queue must be non-empty when the thread dies, because that's the only nsChainedEventQueue we're destroying. I also think that the ack event must be in the main thread's event queue, as that's where the shutdown ack event from bug 1237371 is going. The main thread is also where LSPAnnotater::Annotate is running, so the main thread would also be receiving a shutdown ack event there. (I don't really understand how the stack can be libc -> pthread -> nsThread::Release...does that mean we're not actually releasing the main thread until TLS is emptied?) In any event shutdown for the main thread looks a little strange: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#126 There's probably some other IPC event loop stuff involved, but AFAICT the main thread isn't going through the loop here: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#390 which means that we're never setting mEventsAreDoomed on the main thread, which means that we keep accepting events until...always? I think my bet is on the shutdown ack event coming here or after: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#161 So the main thread is going through nsThreadManager::Shutdown, but finishing the processing of events before the predictor thread or the annotator thread can send back the shutdown ack...and then once the shutdown ack gets sent there, we leak. Does that make any sense?
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #11) > I think my bet is on the shutdown ack event coming here or after: > > https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager. > cpp#161 Hmm, it does seem like we need to wait for all the pending shutdown contexts to clear first.
Attached patch PatchSplinter Review
If the above theory is correct, this is the fix. It does seems like something we need regardless.
Attachment #8708499 - Flags: review?(nfroyd)
Comment on attachment 8708499 [details] [diff] [review] Patch Review of attachment 8708499 [details] [diff] [review]: ----------------------------------------------------------------- This looks fine in principle, but...this is assuming that we can never get any events into the main thread's event queue between the first call to NS_ProcessPendingEvents and the WaitForAllAsyncShutdowns in nsThreadManager::Shutdown, right? Because if events can, then it's possible that some of those events are going to call AsyncShutdown, and there might not be outstanding AsyncShutdown threads. Then WaitForAllAsyncShutdowns does nothing, and we're back to where we started. How likely is that? I'm not really sure. I don't know when these AsyncShutdown-calling events are getting inserted into the queue, though... r=me, but I'd really like to know whether we can come up with something more bulletproof, or we can assert that the above doesn't happen.
Attachment #8708499 - Flags: review?(nfroyd) → review+
From irc [11:16:56] <khuey> froydnj: in your comment, do you mean the *last* call to NS_ProcessPendingEvents? [11:17:30] <froydnj> khuey: no, I mean the first call: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#126 [11:17:47] <khuey> froydnj: if an event gets inserted that wants to call AsyncShutdown at that point, it doesn't matter [11:18:03] <froydnj> khuey: why not? [11:18:07] <khuey> froydnj: because we're going to go through and call Shutdown on all outstanding threads just below that [11:18:34] <khuey> froydnj: and there will be at least one thread outstanding [11:18:43] <khuey> froydnj: (because there's something to call AsyncShutdown on) [11:18:51] <khuey> froydnj: so we'll spin the event loop to do the (sync) Shutdown [11:19:05] <khuey> froydnj: and that'll execute any pending runnables that want to call AsyncShutdown on other threads [11:19:42] <khuey> froydnj: similarly once we've called Shutdown on all outstanding threads any pending runnables that want to call AsyncShutdown don't matter, because all threads have been synchronously Shutdown [11:20:26] <froydnj> khuey: ok, I think that makes sense...worth a comment in nsThreadManager::Shutdown? [11:20:32] <khuey> froydnj: sure
Flags: needinfo?(khuey)
I was hoping the assertion would give us a minidump but apparently not :/
Flags: needinfo?(khuey)
Keywords: leave-open
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Per philor, This is very likely the cause for a very hard-to-star, frequent assertion (which is also the source of the bugs marked as dups of this in a chain) -- bug 1240672
Depends on: 1240672
Flags: needinfo?(khuey)
I can make the assertion non-fatal again (which presumably will turn the orange back into a leak). Is that easier to star?
Flags: needinfo?(rjesup)
Flags: needinfo?(philringnalda)
Flags: needinfo?(khuey)
Yep, leaks don't have tree-specific paths. I filed bug 1242178 on being able to star assertions, but I don't know how likely to be workable it is. The other thing you can do is to push the path out of the way of the bug search with a longer message - treeherder is searching for Assertion failure: IsEmpty(), at /builds/slave/try-lx-d-000000000000000000000/build/src/xpcom/thread so if you turn that into Assertion failure: IsEmpty() event queue should be empty while we are shutting down because we can't even then the bug summary can match what it searches for.
Flags: needinfo?(philringnalda)
Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/dadf5ab56adc (landed in time to the merge to Aurora)
Target Milestone: mozilla46 → ---
See Also: → 1240236
I have an idea on how to get us more information on this from automation. I'll try to whip it up this weekend.
Just an idle question - are we certain the "joiningThread" is MainThread? Too bad we can't hit this under rr..... if it's frequent enough, maybe we can? or maybe rr will make it *more* likely (often it makes things easier to hit, and often harder - depends on which way the timing gods got up that morning).
(In reply to Randell Jesup [:jesup] from comment #37) > Just an idle question - are we certain the "joiningThread" is MainThread? I don't think it is, judging by: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8569ba92a355 and the failing log: 11:38:41 INFO - [2032] ###!!! ASSERTION: shouldn't be waiting on other threads to shutdown: 'mRequestedShutdownContexts.IsEmpty()', file /builds/slave/try-l64-d-00000000000000000000/build/src/xpcom/threads/nsThread.cpp, line 522 11:38:41 INFO - #01: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:523] 11:38:41 INFO - #02: nsThread::Release() [xpcom/threads/nsThread.cpp:179] 11:38:42 INFO - #03: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:238] 11:38:42 INFO - #04: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:889] 11:38:42 INFO - #05: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:863] 11:38:42 INFO - #06: libpthread.so.0 + 0x7c83 11:38:42 INFO - #07: libpthread.so.0 + 0x7ea8 11:38:42 INFO - #08: libc.so.6 + 0xf42ed 11:38:42 INFO - [2032] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file /builds/slave/try-l64-d-00000000000000000000/build/src/xpcom/threads/nsEventQueue.cpp, line 36 11:38:42 INFO - #01: nsThread::nsChainedEventQueue::~nsChainedEventQueue() [mfbt/RefPtr.h:55] 11:38:42 INFO - #02: nsThread::~nsThread() [xpcom/glue/nsTArray.h:2095] 11:38:42 INFO - #03: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:523] 11:38:42 INFO - #04: nsThread::Release() [xpcom/threads/nsThread.cpp:179] 11:38:42 INFO - #05: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:238] 11:38:42 INFO - #06: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:889] 11:38:42 INFO - #07: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:863] 11:38:42 INFO - #08: libpthread.so.0 + 0x7c83 11:38:42 INFO - #09: libpthread.so.0 + 0x7ea8 11:38:42 INFO - #10: libc.so.6 + 0xf42ed I *think* that means this is a non-main thread. There's another variant of the non-empty event queue one that *is* on the main thread, but I think that's a separate bug...the extra assertions I added in that try push don't seem to be firing, though.
And jesup wins the prize for today: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3fa8023b527 14:01:43 INFO - 133 @0x7f06a8f979a0 (0 references; 0 from COMPtrs) 14:01:43 INFO - allocation stack: 14:01:43 INFO - #00: nsThread::ShutdownInternal(bool) [xpcom/base/nsAutoPtr.h:34] 14:01:43 INFO - #01: nsThread::AsyncShutdown() [xpcom/threads/nsThread.cpp:720] 14:01:43 INFO - #02: nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>&&) [xpcom/threads/nsThreadPool.cpp:125] 14:01:43 INFO - #03: nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>&&, unsigned int) [xpcom/threads/nsThreadPool.cpp:271] 14:01:43 INFO - #04: mozilla::TaskQueue::DispatchLocked(already_AddRefed<nsIRunnable>, mozilla::TaskQueue::DispatchMode, mozilla::AbstractThread::DispatchFailureHandling, mozilla::AbstractThread::DispatchReason) [xpcom/threads/TaskQueue.cpp:66] 14:01:43 INFO - #05: mozilla::TaskQueue::Dispatch(already_AddRefed<nsIRunnable>, mozilla::AbstractThread::DispatchFailureHandling, mozilla::AbstractThread::DispatchReason) [xpcom/threads/TaskQueue.h:47] 14:01:43 INFO - #06: mozilla::detail::EventTarget<mozilla::AbstractThread>::Dispatch(mozilla::AbstractThread*, already_AddRefed<nsIRunnable>&&) [dom/media/MediaEventSource.h:109] 14:01:43 INFO - #07: mozilla::detail::ListenerHelper<mozilla::AbstractThread, mozilla::MediaEventSourceImpl<<anonymous>, Ts>::ConnectInternal(Target*, This*, Method) [with Target = mozilla::AbstractThread; This = mozilla::MediaDecoderStateMachine; Method = void (mozilla::MediaDecoderStateMachine::*)(const RefPtr<mozilla::MediaData>&); mozilla::ListenerMode Mode = (mozilla::ListenerMode)1; Es = {RefPtr<mozilla::MediaData>}; typename mozilla::EnableIf<mozilla::detail::TakeArgs<Method>::value, mozilla::MediaEventListener>::Type = mozilla::MediaEventListener]::<lambda(mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::ArgType<RefPtr<mozilla::MediaData> >&&)> >::DispatchHelper<mozilla::MediaEventSourceImpl<<anonymous>, Ts>::ConnectInternal(Target*, This*, Method) [with Target = mozilla::AbstractThread; This = mozilla::MediaDecoderStateMachine; Method = void (mozilla::MediaDecoderStateMachine::*)(const RefPtr<mozilla::MediaData>&); mozilla::ListenerMode Mode = (mozilla::ListenerMode)1; Es = {RefPtr<mozilla::MediaData>}; typename mozilla::EnableIf<mozilla::detail::TakeArgs<Method>::value, mozilla::MediaEventListener>::Type = mozilla::MediaEventListener]::<lambda(mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::ArgType<RefPtr<mozilla::MediaData> >&&)>, const RefPtr<mozilla::MediaData>&> [dom/media/MediaEventSource.h:177] 14:01:43 INFO - #08: void mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::NotifyInternal<RefPtr<mozilla::MediaData>&>(RefPtr<mozilla::MediaData>&&&) [dom/media/MediaEventSource.h:468] 14:01:43 INFO - #09: mozilla::MediaQueue<mozilla::MediaData>::PopFront() [mfbt/RefPtr.h:206] 14:01:43 INFO - #10: mozilla::media::DecodedAudioDataSink::PopFrames(unsigned int) [mfbt/AlreadyAddRefed.h:116] 14:01:43 INFO - #11: mozilla::AudioStream::GetUnprocessed(mozilla::AudioBufferWriter&) [dom/media/AudioStream.cpp:583] 14:01:43 INFO - #12: mozilla::AudioStream::DataCallback(void*, long) [dom/media/AudioStream.cpp:658] 14:01:43 INFO - #13: trigger_user_callback [media/libcubeb/src/cubeb_pulse.c:215] 14:01:43 INFO - #14: libpulse.so.0 + 0x29ba7 14:01:43 INFO - #15: libpulsecommon-1.1.so + 0x2d9eb 14:01:43 INFO - #16: libpulse.so.0 + 0xebdd 14:01:43 INFO - #17: libpulsecommon-1.1.so + 0x322a9 14:01:43 INFO - #18: libpulse.so.0 + 0x21a1e 14:01:43 INFO - #19: libpulse.so.0 + 0x21de5 14:01:43 INFO - #20: libpulse.so.0 + 0x21e90 14:01:43 INFO - #21: libpulse.so.0 + 0x3030f 14:01:43 INFO - #22: libpulsecommon-1.1.so + 0x3fd18 14:01:43 INFO - #23: libpthread.so.0 + 0x7e9a 14:01:43 INFO - #24: libc.so.6 + 0xf42ed That is, we're asynchronously shutting down a thread in a thread pool that we don't need anymore, from a non-main-thread thread (one that I don't think we even control via the thread manager, since it looks like it was started from pulseaudio?). The source code: http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadPool.cpp#122 // We never dispatched any events to the thread, so we can shut it down // asynchronously without worrying about anything. MOZ_ALWAYS_TRUE(NS_SUCCEEDED(thread->AsyncShutdown())); I guess this would be fallout from bug 1200922, then, khuey?
Flags: needinfo?(khuey)
(In reply to Nathan Froyd [:froydnj] from comment #39) > That is, we're asynchronously shutting down a thread in a thread pool that > we don't need anymore, from a non-main-thread thread (one that I don't think > we even control via the thread manager, since it looks like it was started > from pulseaudio?). Ugh, jfc. On the bright side, it does explain why this only happens on Linux. I don't think things would have worked before my changes if we ended up in this code path. We would have tried to dispatch a runnable to the current thread that would never run ... Do we have a different intermittent runnable leak that just happened to go away when I landed my patch? I think the fix is to make this call nsThreadPool::ShutdownThread with this thread (and that can call AsyncShutdown, because why not), that way we shut down everything from a thread we control (the main thread). There's a more general problem here of how much of Gecko works when you start calling it from threads we don't control ... I hope this audio code doesn't do anything too crazy!
Flags: needinfo?(khuey)
Attached patch Take twoSplinter Review
Untested, but something like this.
Attachment #8714152 - Flags: review?(nfroyd)
Comment on attachment 8714152 [details] [diff] [review] Take two Review of attachment 8714152 [details] [diff] [review]: ----------------------------------------------------------------- This fix looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=81c83641606f I have some patches that helped in debugging this issue; I'll clean them up and post them in a different bug. ::: xpcom/threads/nsThreadPool.cpp @@ +136,5 @@ > LOG(("THRD-P(%p) shutdown async [%p]\n", this, aThread)); > > + // This is either called by a threadpool thread that is out of work, or > + // a thread that attempted to create a threadpool thread and raced in > + // such a way that the newly created thread is no longer necessary. Thank you for this comment.
Attachment #8714152 - Flags: review?(nfroyd) → review+
See Also: → 1239927
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8714152 [details] [diff] [review] Take two Approval Request Comment [Feature/regressing bug #]: Bug 1200922 [User impact if declined]: High frequency intermittent leaks [Describe test coverage new/current, TreeHerder]: This is well covered on TreeHerder, because the orange shows up all the time. [Risks and why]: Low risk [String/UUID change made/needed]: None
Attachment #8714152 - Flags: approval-mozilla-beta?
Attachment #8714152 - Flags: approval-mozilla-aurora?
Comment on attachment 8708499 [details] [diff] [review] Patch See above. Note that this landed in time to make it to Aurora, so we only need it on Beta.
Attachment #8708499 - Flags: approval-mozilla-beta?
Comment on attachment 8714152 [details] [diff] [review] Take two Improve the testsuite, taking it. Should be in 45 beta 3.
Attachment #8714152 - Flags: approval-mozilla-beta?
Attachment #8714152 - Flags: approval-mozilla-beta+
Attachment #8714152 - Flags: approval-mozilla-aurora?
Attachment #8714152 - Flags: approval-mozilla-aurora+
Attachment #8708499 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1307761
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: