Closed
Bug 1204784
Opened 8 years ago
Closed 7 years ago
Intermittent leakcheck | default process: 64 bytes leaked (nsRunnable, nsThreadShutdownAckEvent)
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla47
People
(Reporter: nigelb, Assigned: khuey)
References
Details
(Keywords: intermittent-failure, memory-leak)
Attachments
(2 files)
3.60 KB,
patch
|
froydnj
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
2.13 KB,
patch
|
froydnj
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
No description provided.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 3•8 years ago
|
||
Well that bug added this class, so yes, it must be.
Flags: needinfo?(khuey)
Assignee | ||
Comment 4•8 years ago
|
||
ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()'
Assignee | ||
Comment 5•7 years ago
|
||
*shrug*
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment hidden (Intermittent Failures Robot) |
Comment 7•7 years ago
|
||
Kyle, this seems to be happening more often starting Jan 11. https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1204784&startday=2016-01-01&endday=2016-01-14&tree=all Any idea what caused this to trigger more often? Can we clean up the events?
Flags: needinfo?(khuey)
Assignee | ||
Comment 8•7 years ago
|
||
It could have been landing bug 1237371 and bug 1237369, which uses this a lot more. But idk why it would lag a few days. Not sure what's going on ... I'll think about it some more.
Assignee: nobody → khuey
Flags: needinfo?(khuey)
Assignee | ||
Comment 9•7 years ago
|
||
The only way I could see that we would put an event in a queue and not execute it is if there are nested event queues. Do you have any other ideas, Nathan? The Push/PopEventQueue stuff is only used by workers so it seems unlikely to be involved here ...
Flags: needinfo?(nfroyd)
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 11•7 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #9) > The only way I could see that we would put an event in a queue and not > execute it is if there are nested event queues. Do you have any other > ideas, Nathan? The Push/PopEventQueue stuff is only used by workers so it > seems unlikely to be involved here ... But we "promote" events from a popped event queue: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#1176 so Push/PopEventQueue doesn't seem relevant here...? We also have: 16:23:53 INFO - [Child 1950] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file /builds/slave/m-in-l64-d-0000000000000000000/build/src/xpcom/threads/nsEventQueue.cpp, line 36 16:23:53 INFO - #01: nsThread::nsChainedEventQueue::~nsChainedEventQueue() [mfbt/RefPtr.h:55] 16:23:53 INFO - #02: nsThread::~nsThread() [xpcom/glue/nsTArray.h:2093] 16:23:53 INFO - #03: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:508] 16:23:53 INFO - #04: nsThread::Release() [xpcom/threads/nsThread.cpp:179] 16:23:53 INFO - #05: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:237] 16:23:53 INFO - #06: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:888] 16:23:53 INFO - #07: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:865] 16:23:53 INFO - #08: libpthread.so.0 + 0x7c83 16:23:53 INFO - #09: libpthread.so.0 + 0x7ea8 16:23:53 INFO - #10: libc.so.6 + 0xf42ed so the root event queue must be non-empty when the thread dies, because that's the only nsChainedEventQueue we're destroying. I also think that the ack event must be in the main thread's event queue, as that's where the shutdown ack event from bug 1237371 is going. The main thread is also where LSPAnnotater::Annotate is running, so the main thread would also be receiving a shutdown ack event there. (I don't really understand how the stack can be libc -> pthread -> nsThread::Release...does that mean we're not actually releasing the main thread until TLS is emptied?) In any event shutdown for the main thread looks a little strange: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#126 There's probably some other IPC event loop stuff involved, but AFAICT the main thread isn't going through the loop here: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#390 which means that we're never setting mEventsAreDoomed on the main thread, which means that we keep accepting events until...always? I think my bet is on the shutdown ack event coming here or after: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#161 So the main thread is going through nsThreadManager::Shutdown, but finishing the processing of events before the predictor thread or the annotator thread can send back the shutdown ack...and then once the shutdown ack gets sent there, we leak. Does that make any sense?
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #11) > I think my bet is on the shutdown ack event coming here or after: > > https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager. > cpp#161 Hmm, it does seem like we need to wait for all the pending shutdown contexts to clear first.
Assignee | ||
Updated•7 years ago
|
Keywords: leave-open
Assignee | ||
Comment 14•7 years ago
|
||
If the above theory is correct, this is the fix. It does seems like something we need regardless.
Attachment #8708499 -
Flags: review?(nfroyd)
![]() |
||
Comment 15•7 years ago
|
||
Comment on attachment 8708499 [details] [diff] [review] Patch Review of attachment 8708499 [details] [diff] [review]: ----------------------------------------------------------------- This looks fine in principle, but...this is assuming that we can never get any events into the main thread's event queue between the first call to NS_ProcessPendingEvents and the WaitForAllAsyncShutdowns in nsThreadManager::Shutdown, right? Because if events can, then it's possible that some of those events are going to call AsyncShutdown, and there might not be outstanding AsyncShutdown threads. Then WaitForAllAsyncShutdowns does nothing, and we're back to where we started. How likely is that? I'm not really sure. I don't know when these AsyncShutdown-calling events are getting inserted into the queue, though... r=me, but I'd really like to know whether we can come up with something more bulletproof, or we can assert that the above doesn't happen.
Attachment #8708499 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 16•7 years ago
|
||
From irc [11:16:56] <khuey> froydnj: in your comment, do you mean the *last* call to NS_ProcessPendingEvents? [11:17:30] <froydnj> khuey: no, I mean the first call: https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#126 [11:17:47] <khuey> froydnj: if an event gets inserted that wants to call AsyncShutdown at that point, it doesn't matter [11:18:03] <froydnj> khuey: why not? [11:18:07] <khuey> froydnj: because we're going to go through and call Shutdown on all outstanding threads just below that [11:18:34] <khuey> froydnj: and there will be at least one thread outstanding [11:18:43] <khuey> froydnj: (because there's something to call AsyncShutdown on) [11:18:51] <khuey> froydnj: so we'll spin the event loop to do the (sync) Shutdown [11:19:05] <khuey> froydnj: and that'll execute any pending runnables that want to call AsyncShutdown on other threads [11:19:42] <khuey> froydnj: similarly once we've called Shutdown on all outstanding threads any pending runnables that want to call AsyncShutdown don't matter, because all threads have been synchronously Shutdown [11:20:26] <froydnj> khuey: ok, I think that makes sense...worth a comment in nsThreadManager::Shutdown? [11:20:32] <khuey> froydnj: sure
Comment hidden (Intermittent Failures Robot) |
Comment 18•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/657e5d2b51de
Comment hidden (Intermittent Failures Robot) |
Comment 20•7 years ago
|
||
seems after this landing we got test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=3103586&repo=mozilla-central
Flags: needinfo?(khuey)
Assignee | ||
Comment 22•7 years ago
|
||
I was hoping the assertion would give us a minidump but apparently not :/
Flags: needinfo?(khuey)
Keywords: leave-open
Comment 23•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c395b4777d7d
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment 25•7 years ago
|
||
Per philor, This is very likely the cause for a very hard-to-star, frequent assertion (which is also the source of the bugs marked as dups of this in a chain) -- bug 1240672
Depends on: 1240672
Flags: needinfo?(khuey)
Assignee | ||
Comment 26•7 years ago
|
||
I can make the assertion non-fatal again (which presumably will turn the orange back into a leak). Is that easier to star?
Flags: needinfo?(rjesup)
Flags: needinfo?(philringnalda)
Flags: needinfo?(khuey)
Comment 27•7 years ago
|
||
Yep, leaks don't have tree-specific paths. I filed bug 1242178 on being able to star assertions, but I don't know how likely to be workable it is. The other thing you can do is to push the path out of the way of the bug search with a longer message - treeherder is searching for Assertion failure: IsEmpty(), at /builds/slave/try-lx-d-000000000000000000000/build/src/xpcom/thread so if you turn that into Assertion failure: IsEmpty() event queue should be empty while we are shutting down because we can't even then the bug summary can match what it searches for.
Flags: needinfo?(philringnalda)
Assignee | ||
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Resolution: FIXED → ---
Comment hidden (Intermittent Failures Robot) |
Comment 29•7 years ago
|
||
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/dadf5ab56adc
![]() |
||
Comment 30•7 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/dadf5ab56adc (landed in time to the merge to Aurora)
Target Milestone: mozilla46 → ---
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 36•7 years ago
|
||
I have an idea on how to get us more information on this from automation. I'll try to whip it up this weekend.
Comment 37•7 years ago
|
||
Just an idle question - are we certain the "joiningThread" is MainThread? Too bad we can't hit this under rr..... if it's frequent enough, maybe we can? or maybe rr will make it *more* likely (often it makes things easier to hit, and often harder - depends on which way the timing gods got up that morning).
![]() |
||
Comment 38•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #37) > Just an idle question - are we certain the "joiningThread" is MainThread? I don't think it is, judging by: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8569ba92a355 and the failing log: 11:38:41 INFO - [2032] ###!!! ASSERTION: shouldn't be waiting on other threads to shutdown: 'mRequestedShutdownContexts.IsEmpty()', file /builds/slave/try-l64-d-00000000000000000000/build/src/xpcom/threads/nsThread.cpp, line 522 11:38:41 INFO - #01: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:523] 11:38:41 INFO - #02: nsThread::Release() [xpcom/threads/nsThread.cpp:179] 11:38:42 INFO - #03: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:238] 11:38:42 INFO - #04: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:889] 11:38:42 INFO - #05: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:863] 11:38:42 INFO - #06: libpthread.so.0 + 0x7c83 11:38:42 INFO - #07: libpthread.so.0 + 0x7ea8 11:38:42 INFO - #08: libc.so.6 + 0xf42ed 11:38:42 INFO - [2032] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file /builds/slave/try-l64-d-00000000000000000000/build/src/xpcom/threads/nsEventQueue.cpp, line 36 11:38:42 INFO - #01: nsThread::nsChainedEventQueue::~nsChainedEventQueue() [mfbt/RefPtr.h:55] 11:38:42 INFO - #02: nsThread::~nsThread() [xpcom/glue/nsTArray.h:2095] 11:38:42 INFO - #03: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:523] 11:38:42 INFO - #04: nsThread::Release() [xpcom/threads/nsThread.cpp:179] 11:38:42 INFO - #05: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:238] 11:38:42 INFO - #06: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:889] 11:38:42 INFO - #07: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:863] 11:38:42 INFO - #08: libpthread.so.0 + 0x7c83 11:38:42 INFO - #09: libpthread.so.0 + 0x7ea8 11:38:42 INFO - #10: libc.so.6 + 0xf42ed I *think* that means this is a non-main thread. There's another variant of the non-empty event queue one that *is* on the main thread, but I think that's a separate bug...the extra assertions I added in that try push don't seem to be firing, though.
![]() |
||
Comment 39•7 years ago
|
||
And jesup wins the prize for today: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3fa8023b527 14:01:43 INFO - 133 @0x7f06a8f979a0 (0 references; 0 from COMPtrs) 14:01:43 INFO - allocation stack: 14:01:43 INFO - #00: nsThread::ShutdownInternal(bool) [xpcom/base/nsAutoPtr.h:34] 14:01:43 INFO - #01: nsThread::AsyncShutdown() [xpcom/threads/nsThread.cpp:720] 14:01:43 INFO - #02: nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>&&) [xpcom/threads/nsThreadPool.cpp:125] 14:01:43 INFO - #03: nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>&&, unsigned int) [xpcom/threads/nsThreadPool.cpp:271] 14:01:43 INFO - #04: mozilla::TaskQueue::DispatchLocked(already_AddRefed<nsIRunnable>, mozilla::TaskQueue::DispatchMode, mozilla::AbstractThread::DispatchFailureHandling, mozilla::AbstractThread::DispatchReason) [xpcom/threads/TaskQueue.cpp:66] 14:01:43 INFO - #05: mozilla::TaskQueue::Dispatch(already_AddRefed<nsIRunnable>, mozilla::AbstractThread::DispatchFailureHandling, mozilla::AbstractThread::DispatchReason) [xpcom/threads/TaskQueue.h:47] 14:01:43 INFO - #06: mozilla::detail::EventTarget<mozilla::AbstractThread>::Dispatch(mozilla::AbstractThread*, already_AddRefed<nsIRunnable>&&) [dom/media/MediaEventSource.h:109] 14:01:43 INFO - #07: mozilla::detail::ListenerHelper<mozilla::AbstractThread, mozilla::MediaEventSourceImpl<<anonymous>, Ts>::ConnectInternal(Target*, This*, Method) [with Target = mozilla::AbstractThread; This = mozilla::MediaDecoderStateMachine; Method = void (mozilla::MediaDecoderStateMachine::*)(const RefPtr<mozilla::MediaData>&); mozilla::ListenerMode Mode = (mozilla::ListenerMode)1; Es = {RefPtr<mozilla::MediaData>}; typename mozilla::EnableIf<mozilla::detail::TakeArgs<Method>::value, mozilla::MediaEventListener>::Type = mozilla::MediaEventListener]::<lambda(mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::ArgType<RefPtr<mozilla::MediaData> >&&)> >::DispatchHelper<mozilla::MediaEventSourceImpl<<anonymous>, Ts>::ConnectInternal(Target*, This*, Method) [with Target = mozilla::AbstractThread; This = mozilla::MediaDecoderStateMachine; Method = void (mozilla::MediaDecoderStateMachine::*)(const RefPtr<mozilla::MediaData>&); mozilla::ListenerMode Mode = (mozilla::ListenerMode)1; Es = {RefPtr<mozilla::MediaData>}; typename mozilla::EnableIf<mozilla::detail::TakeArgs<Method>::value, mozilla::MediaEventListener>::Type = mozilla::MediaEventListener]::<lambda(mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::ArgType<RefPtr<mozilla::MediaData> >&&)>, const RefPtr<mozilla::MediaData>&> [dom/media/MediaEventSource.h:177] 14:01:43 INFO - #08: void mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::NotifyInternal<RefPtr<mozilla::MediaData>&>(RefPtr<mozilla::MediaData>&&&) [dom/media/MediaEventSource.h:468] 14:01:43 INFO - #09: mozilla::MediaQueue<mozilla::MediaData>::PopFront() [mfbt/RefPtr.h:206] 14:01:43 INFO - #10: mozilla::media::DecodedAudioDataSink::PopFrames(unsigned int) [mfbt/AlreadyAddRefed.h:116] 14:01:43 INFO - #11: mozilla::AudioStream::GetUnprocessed(mozilla::AudioBufferWriter&) [dom/media/AudioStream.cpp:583] 14:01:43 INFO - #12: mozilla::AudioStream::DataCallback(void*, long) [dom/media/AudioStream.cpp:658] 14:01:43 INFO - #13: trigger_user_callback [media/libcubeb/src/cubeb_pulse.c:215] 14:01:43 INFO - #14: libpulse.so.0 + 0x29ba7 14:01:43 INFO - #15: libpulsecommon-1.1.so + 0x2d9eb 14:01:43 INFO - #16: libpulse.so.0 + 0xebdd 14:01:43 INFO - #17: libpulsecommon-1.1.so + 0x322a9 14:01:43 INFO - #18: libpulse.so.0 + 0x21a1e 14:01:43 INFO - #19: libpulse.so.0 + 0x21de5 14:01:43 INFO - #20: libpulse.so.0 + 0x21e90 14:01:43 INFO - #21: libpulse.so.0 + 0x3030f 14:01:43 INFO - #22: libpulsecommon-1.1.so + 0x3fd18 14:01:43 INFO - #23: libpthread.so.0 + 0x7e9a 14:01:43 INFO - #24: libc.so.6 + 0xf42ed That is, we're asynchronously shutting down a thread in a thread pool that we don't need anymore, from a non-main-thread thread (one that I don't think we even control via the thread manager, since it looks like it was started from pulseaudio?). The source code: http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadPool.cpp#122 // We never dispatched any events to the thread, so we can shut it down // asynchronously without worrying about anything. MOZ_ALWAYS_TRUE(NS_SUCCEEDED(thread->AsyncShutdown())); I guess this would be fallout from bug 1200922, then, khuey?
Flags: needinfo?(khuey)
Assignee | ||
Comment 40•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #39) > That is, we're asynchronously shutting down a thread in a thread pool that > we don't need anymore, from a non-main-thread thread (one that I don't think > we even control via the thread manager, since it looks like it was started > from pulseaudio?). Ugh, jfc. On the bright side, it does explain why this only happens on Linux. I don't think things would have worked before my changes if we ended up in this code path. We would have tried to dispatch a runnable to the current thread that would never run ... Do we have a different intermittent runnable leak that just happened to go away when I landed my patch? I think the fix is to make this call nsThreadPool::ShutdownThread with this thread (and that can call AsyncShutdown, because why not), that way we shut down everything from a thread we control (the main thread). There's a more general problem here of how much of Gecko works when you start calling it from threads we don't control ... I hope this audio code doesn't do anything too crazy!
Flags: needinfo?(khuey)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 42•7 years ago
|
||
Untested, but something like this.
Attachment #8714152 -
Flags: review?(nfroyd)
![]() |
||
Comment 43•7 years ago
|
||
Comment on attachment 8714152 [details] [diff] [review] Take two Review of attachment 8714152 [details] [diff] [review]: ----------------------------------------------------------------- This fix looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=81c83641606f I have some patches that helped in debugging this issue; I'll clean them up and post them in a different bug. ::: xpcom/threads/nsThreadPool.cpp @@ +136,5 @@ > LOG(("THRD-P(%p) shutdown async [%p]\n", this, aThread)); > > + // This is either called by a threadpool thread that is out of work, or > + // a thread that attempted to create a threadpool thread and raced in > + // such a way that the newly created thread is no longer necessary. Thank you for this comment.
Attachment #8714152 -
Flags: review?(nfroyd) → review+
Comment hidden (Intermittent Failures Robot) |
Comment 46•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/323f89d56ef3
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Assignee | ||
Comment 49•7 years ago
|
||
Comment on attachment 8714152 [details] [diff] [review] Take two Approval Request Comment [Feature/regressing bug #]: Bug 1200922 [User impact if declined]: High frequency intermittent leaks [Describe test coverage new/current, TreeHerder]: This is well covered on TreeHerder, because the orange shows up all the time. [Risks and why]: Low risk [String/UUID change made/needed]: None
Attachment #8714152 -
Flags: approval-mozilla-beta?
Attachment #8714152 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 50•7 years ago
|
||
Comment on attachment 8708499 [details] [diff] [review] Patch See above. Note that this landed in time to make it to Aurora, so we only need it on Beta.
Attachment #8708499 -
Flags: approval-mozilla-beta?
Updated•7 years ago
|
status-firefox44:
--- → wontfix
status-firefox45:
--- → affected
Comment 51•7 years ago
|
||
Comment on attachment 8714152 [details] [diff] [review] Take two Improve the testsuite, taking it. Should be in 45 beta 3.
Attachment #8714152 -
Flags: approval-mozilla-beta?
Attachment #8714152 -
Flags: approval-mozilla-beta+
Attachment #8714152 -
Flags: approval-mozilla-aurora?
Attachment #8714152 -
Flags: approval-mozilla-aurora+
Updated•7 years ago
|
Attachment #8708499 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 52•7 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/1a004c1da680
Comment 54•7 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/56262438ad7d for the change needs to land on beta only
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•