Closed
Bug 1204784
Opened 9 years ago
Closed 9 years ago
Intermittent leakcheck | default process: 64 bytes leaked (nsRunnable, nsThreadShutdownAckEvent)
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla47
People
(Reporter: nigelb, Assigned: khuey)
References
Details
(Keywords: intermittent-failure, memory-leak)
Attachments
(2 files)
3.60 KB,
patch
|
froydnj
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
2.13 KB,
patch
|
froydnj
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
No description provided.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 3•9 years ago
|
||
Well that bug added this class, so yes, it must be.
Flags: needinfo?(khuey)
Assignee | ||
Comment 4•9 years ago
|
||
ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()'
Assignee | ||
Comment 5•9 years ago
|
||
*shrug*
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Updated•9 years ago
|
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment hidden (Intermittent Failures Robot) |
Comment 7•9 years ago
|
||
Kyle, this seems to be happening more often starting Jan 11.
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1204784&startday=2016-01-01&endday=2016-01-14&tree=all
Any idea what caused this to trigger more often? Can we clean up the events?
Flags: needinfo?(khuey)
Assignee | ||
Comment 8•9 years ago
|
||
It could have been landing bug 1237371 and bug 1237369, which uses this a lot more. But idk why it would lag a few days.
Not sure what's going on ... I'll think about it some more.
Assignee: nobody → khuey
Flags: needinfo?(khuey)
Assignee | ||
Comment 9•9 years ago
|
||
The only way I could see that we would put an event in a queue and not execute it is if there are nested event queues. Do you have any other ideas, Nathan? The Push/PopEventQueue stuff is only used by workers so it seems unlikely to be involved here ...
Flags: needinfo?(nfroyd)
Comment hidden (Intermittent Failures Robot) |
Comment 11•9 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #9)
> The only way I could see that we would put an event in a queue and not
> execute it is if there are nested event queues. Do you have any other
> ideas, Nathan? The Push/PopEventQueue stuff is only used by workers so it
> seems unlikely to be involved here ...
But we "promote" events from a popped event queue:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#1176
so Push/PopEventQueue doesn't seem relevant here...? We also have:
16:23:53 INFO - [Child 1950] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file /builds/slave/m-in-l64-d-0000000000000000000/build/src/xpcom/threads/nsEventQueue.cpp, line 36
16:23:53 INFO - #01: nsThread::nsChainedEventQueue::~nsChainedEventQueue() [mfbt/RefPtr.h:55]
16:23:53 INFO - #02: nsThread::~nsThread() [xpcom/glue/nsTArray.h:2093]
16:23:53 INFO - #03: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:508]
16:23:53 INFO - #04: nsThread::Release() [xpcom/threads/nsThread.cpp:179]
16:23:53 INFO - #05: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:237]
16:23:53 INFO - #06: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:888]
16:23:53 INFO - #07: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:865]
16:23:53 INFO - #08: libpthread.so.0 + 0x7c83
16:23:53 INFO - #09: libpthread.so.0 + 0x7ea8
16:23:53 INFO - #10: libc.so.6 + 0xf42ed
so the root event queue must be non-empty when the thread dies, because that's the only nsChainedEventQueue we're destroying.
I also think that the ack event must be in the main thread's event queue, as that's where the shutdown ack event from bug 1237371 is going. The main thread is also where LSPAnnotater::Annotate is running, so the main thread would also be receiving a shutdown ack event there.
(I don't really understand how the stack can be libc -> pthread -> nsThread::Release...does that mean we're not actually releasing the main thread until TLS is emptied?)
In any event shutdown for the main thread looks a little strange:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#126
There's probably some other IPC event loop stuff involved, but AFAICT the main thread isn't going through the loop here:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThread.cpp#390
which means that we're never setting mEventsAreDoomed on the main thread, which means that we keep accepting events until...always?
I think my bet is on the shutdown ack event coming here or after:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#161
So the main thread is going through nsThreadManager::Shutdown, but finishing the processing of events before the predictor thread or the annotator thread can send back the shutdown ack...and then once the shutdown ack gets sent there, we leak.
Does that make any sense?
Flags: needinfo?(nfroyd)
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #11)
> I think my bet is on the shutdown ack event coming here or after:
>
> https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.
> cpp#161
Hmm, it does seem like we need to wait for all the pending shutdown contexts to clear first.
Comment 13•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Assignee | ||
Comment 14•9 years ago
|
||
If the above theory is correct, this is the fix. It does seems like something we need regardless.
Attachment #8708499 -
Flags: review?(nfroyd)
Comment 15•9 years ago
|
||
Comment on attachment 8708499 [details] [diff] [review]
Patch
Review of attachment 8708499 [details] [diff] [review]:
-----------------------------------------------------------------
This looks fine in principle, but...this is assuming that we can never get any events into the main thread's event queue between the first call to NS_ProcessPendingEvents and the WaitForAllAsyncShutdowns in nsThreadManager::Shutdown, right? Because if events can, then it's possible that some of those events are going to call AsyncShutdown, and there might not be outstanding AsyncShutdown threads. Then WaitForAllAsyncShutdowns does nothing, and we're back to where we started.
How likely is that? I'm not really sure. I don't know when these AsyncShutdown-calling events are getting inserted into the queue, though...
r=me, but I'd really like to know whether we can come up with something more bulletproof, or we can assert that the above doesn't happen.
Attachment #8708499 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 16•9 years ago
|
||
From irc
[11:16:56] <khuey> froydnj: in your comment, do you mean the *last* call to NS_ProcessPendingEvents?
[11:17:30] <froydnj> khuey: no, I mean the first call:
https://dxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadManager.cpp#126
[11:17:47] <khuey> froydnj: if an event gets inserted that wants to call AsyncShutdown at that
point, it doesn't matter
[11:18:03] <froydnj> khuey: why not?
[11:18:07] <khuey> froydnj: because we're going to go through and call Shutdown on all outstanding
threads just below that
[11:18:34] <khuey> froydnj: and there will be at least one thread outstanding
[11:18:43] <khuey> froydnj: (because there's something to call AsyncShutdown on)
[11:18:51] <khuey> froydnj: so we'll spin the event loop to do the (sync) Shutdown
[11:19:05] <khuey> froydnj: and that'll execute any pending runnables that want to call
AsyncShutdown on other threads
[11:19:42] <khuey> froydnj: similarly once we've called Shutdown on all outstanding threads any
pending runnables that want to call AsyncShutdown don't matter, because all
threads have been synchronously Shutdown
[11:20:26] <froydnj> khuey: ok, I think that makes sense...worth a comment in
nsThreadManager::Shutdown?
[11:20:32] <khuey> froydnj: sure
Comment hidden (Intermittent Failures Robot) |
Comment 18•9 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 20•9 years ago
|
||
seems after this landing we got test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=3103586&repo=mozilla-central
Flags: needinfo?(khuey)
Comment 21•9 years ago
|
||
Assignee | ||
Comment 22•9 years ago
|
||
I was hoping the assertion would give us a minidump but apparently not :/
Flags: needinfo?(khuey)
Keywords: leave-open
Comment 23•9 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment 25•9 years ago
|
||
Per philor, This is very likely the cause for a very hard-to-star, frequent assertion (which is also the source of the bugs marked as dups of this in a chain) -- bug 1240672
Depends on: 1240672
Flags: needinfo?(khuey)
Assignee | ||
Comment 26•9 years ago
|
||
I can make the assertion non-fatal again (which presumably will turn the orange back into a leak). Is that easier to star?
Flags: needinfo?(rjesup)
Flags: needinfo?(philringnalda)
Flags: needinfo?(khuey)
Comment 27•9 years ago
|
||
Yep, leaks don't have tree-specific paths. I filed bug 1242178 on being able to star assertions, but I don't know how likely to be workable it is. The other thing you can do is to push the path out of the way of the bug search with a longer message - treeherder is searching for
Assertion failure: IsEmpty(), at /builds/slave/try-lx-d-000000000000000000000/build/src/xpcom/thread
so if you turn that into
Assertion failure: IsEmpty() event queue should be empty while we are shutting down because we can't even
then the bug summary can match what it searches for.
Flags: needinfo?(philringnalda)
Assignee | ||
Updated•9 years ago
|
Status: RESOLVED → REOPENED
Flags: needinfo?(rjesup)
Resolution: FIXED → ---
Comment hidden (Intermittent Failures Robot) |
Comment 29•9 years ago
|
||
Comment 30•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/dadf5ab56adc (landed in time to the merge to Aurora)
Target Milestone: mozilla46 → ---
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 36•9 years ago
|
||
I have an idea on how to get us more information on this from automation. I'll try to whip it up this weekend.
Comment 37•9 years ago
|
||
Just an idle question - are we certain the "joiningThread" is MainThread? Too bad we can't hit this under rr..... if it's frequent enough, maybe we can? or maybe rr will make it *more* likely (often it makes things easier to hit, and often harder - depends on which way the timing gods got up that morning).
Comment 38•9 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #37)
> Just an idle question - are we certain the "joiningThread" is MainThread?
I don't think it is, judging by:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8569ba92a355
and the failing log:
11:38:41 INFO - [2032] ###!!! ASSERTION: shouldn't be waiting on other threads to shutdown: 'mRequestedShutdownContexts.IsEmpty()', file /builds/slave/try-l64-d-00000000000000000000/build/src/xpcom/threads/nsThread.cpp, line 522
11:38:41 INFO - #01: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:523]
11:38:41 INFO - #02: nsThread::Release() [xpcom/threads/nsThread.cpp:179]
11:38:42 INFO - #03: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:238]
11:38:42 INFO - #04: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:889]
11:38:42 INFO - #05: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:863]
11:38:42 INFO - #06: libpthread.so.0 + 0x7c83
11:38:42 INFO - #07: libpthread.so.0 + 0x7ea8
11:38:42 INFO - #08: libc.so.6 + 0xf42ed
11:38:42 INFO - [2032] ###!!! ASSERTION: Non-empty event queue being destroyed; events being leaked.: 'IsEmpty()', file /builds/slave/try-l64-d-00000000000000000000/build/src/xpcom/threads/nsEventQueue.cpp, line 36
11:38:42 INFO - #01: nsThread::nsChainedEventQueue::~nsChainedEventQueue() [mfbt/RefPtr.h:55]
11:38:42 INFO - #02: nsThread::~nsThread() [xpcom/glue/nsTArray.h:2095]
11:38:42 INFO - #03: nsThread::~nsThread() [xpcom/threads/nsThread.cpp:523]
11:38:42 INFO - #04: nsThread::Release() [xpcom/threads/nsThread.cpp:179]
11:38:42 INFO - #05: _PR_DestroyThreadPrivate [nsprpub/pr/src/threads/prtpd.c:238]
11:38:42 INFO - #06: _pt_thread_death_internal [nsprpub/pr/src/pthreads/ptthread.c:889]
11:38:42 INFO - #07: _pt_thread_death [nsprpub/pr/src/pthreads/ptthread.c:863]
11:38:42 INFO - #08: libpthread.so.0 + 0x7c83
11:38:42 INFO - #09: libpthread.so.0 + 0x7ea8
11:38:42 INFO - #10: libc.so.6 + 0xf42ed
I *think* that means this is a non-main thread. There's another variant of the non-empty event queue one that *is* on the main thread, but I think that's a separate bug...the extra assertions I added in that try push don't seem to be firing, though.
Comment 39•9 years ago
|
||
And jesup wins the prize for today:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3fa8023b527
14:01:43 INFO - 133 @0x7f06a8f979a0 (0 references; 0 from COMPtrs)
14:01:43 INFO - allocation stack:
14:01:43 INFO - #00: nsThread::ShutdownInternal(bool) [xpcom/base/nsAutoPtr.h:34]
14:01:43 INFO - #01: nsThread::AsyncShutdown() [xpcom/threads/nsThread.cpp:720]
14:01:43 INFO - #02: nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>&&) [xpcom/threads/nsThreadPool.cpp:125]
14:01:43 INFO - #03: nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>&&, unsigned int) [xpcom/threads/nsThreadPool.cpp:271]
14:01:43 INFO - #04: mozilla::TaskQueue::DispatchLocked(already_AddRefed<nsIRunnable>, mozilla::TaskQueue::DispatchMode, mozilla::AbstractThread::DispatchFailureHandling, mozilla::AbstractThread::DispatchReason) [xpcom/threads/TaskQueue.cpp:66]
14:01:43 INFO - #05: mozilla::TaskQueue::Dispatch(already_AddRefed<nsIRunnable>, mozilla::AbstractThread::DispatchFailureHandling, mozilla::AbstractThread::DispatchReason) [xpcom/threads/TaskQueue.h:47]
14:01:43 INFO - #06: mozilla::detail::EventTarget<mozilla::AbstractThread>::Dispatch(mozilla::AbstractThread*, already_AddRefed<nsIRunnable>&&) [dom/media/MediaEventSource.h:109]
14:01:43 INFO - #07: mozilla::detail::ListenerHelper<mozilla::AbstractThread, mozilla::MediaEventSourceImpl<<anonymous>, Ts>::ConnectInternal(Target*, This*, Method) [with Target = mozilla::AbstractThread; This = mozilla::MediaDecoderStateMachine; Method = void (mozilla::MediaDecoderStateMachine::*)(const RefPtr<mozilla::MediaData>&); mozilla::ListenerMode Mode = (mozilla::ListenerMode)1; Es = {RefPtr<mozilla::MediaData>}; typename mozilla::EnableIf<mozilla::detail::TakeArgs<Method>::value, mozilla::MediaEventListener>::Type = mozilla::MediaEventListener]::<lambda(mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::ArgType<RefPtr<mozilla::MediaData> >&&)> >::DispatchHelper<mozilla::MediaEventSourceImpl<<anonymous>, Ts>::ConnectInternal(Target*, This*, Method) [with Target = mozilla::AbstractThread; This = mozilla::MediaDecoderStateMachine; Method = void (mozilla::MediaDecoderStateMachine::*)(const RefPtr<mozilla::MediaData>&); mozilla::ListenerMode Mode = (mozilla::ListenerMode)1; Es = {RefPtr<mozilla::MediaData>}; typename mozilla::EnableIf<mozilla::detail::TakeArgs<Method>::value, mozilla::MediaEventListener>::Type = mozilla::MediaEventListener]::<lambda(mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::ArgType<RefPtr<mozilla::MediaData> >&&)>, const RefPtr<mozilla::MediaData>&> [dom/media/MediaEventSource.h:177]
14:01:43 INFO - #08: void mozilla::MediaEventSourceImpl<(mozilla::ListenerMode)1, RefPtr<mozilla::MediaData> >::NotifyInternal<RefPtr<mozilla::MediaData>&>(RefPtr<mozilla::MediaData>&&&) [dom/media/MediaEventSource.h:468]
14:01:43 INFO - #09: mozilla::MediaQueue<mozilla::MediaData>::PopFront() [mfbt/RefPtr.h:206]
14:01:43 INFO - #10: mozilla::media::DecodedAudioDataSink::PopFrames(unsigned int) [mfbt/AlreadyAddRefed.h:116]
14:01:43 INFO - #11: mozilla::AudioStream::GetUnprocessed(mozilla::AudioBufferWriter&) [dom/media/AudioStream.cpp:583]
14:01:43 INFO - #12: mozilla::AudioStream::DataCallback(void*, long) [dom/media/AudioStream.cpp:658]
14:01:43 INFO - #13: trigger_user_callback [media/libcubeb/src/cubeb_pulse.c:215]
14:01:43 INFO - #14: libpulse.so.0 + 0x29ba7
14:01:43 INFO - #15: libpulsecommon-1.1.so + 0x2d9eb
14:01:43 INFO - #16: libpulse.so.0 + 0xebdd
14:01:43 INFO - #17: libpulsecommon-1.1.so + 0x322a9
14:01:43 INFO - #18: libpulse.so.0 + 0x21a1e
14:01:43 INFO - #19: libpulse.so.0 + 0x21de5
14:01:43 INFO - #20: libpulse.so.0 + 0x21e90
14:01:43 INFO - #21: libpulse.so.0 + 0x3030f
14:01:43 INFO - #22: libpulsecommon-1.1.so + 0x3fd18
14:01:43 INFO - #23: libpthread.so.0 + 0x7e9a
14:01:43 INFO - #24: libc.so.6 + 0xf42ed
That is, we're asynchronously shutting down a thread in a thread pool that we don't need anymore, from a non-main-thread thread (one that I don't think we even control via the thread manager, since it looks like it was started from pulseaudio?). The source code:
http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsThreadPool.cpp#122
// We never dispatched any events to the thread, so we can shut it down
// asynchronously without worrying about anything.
MOZ_ALWAYS_TRUE(NS_SUCCEEDED(thread->AsyncShutdown()));
I guess this would be fallout from bug 1200922, then, khuey?
Flags: needinfo?(khuey)
Assignee | ||
Comment 40•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #39)
> That is, we're asynchronously shutting down a thread in a thread pool that
> we don't need anymore, from a non-main-thread thread (one that I don't think
> we even control via the thread manager, since it looks like it was started
> from pulseaudio?).
Ugh, jfc.
On the bright side, it does explain why this only happens on Linux. I don't think things would have worked before my changes if we ended up in this code path. We would have tried to dispatch a runnable to the current thread that would never run ... Do we have a different intermittent runnable leak that just happened to go away when I landed my patch?
I think the fix is to make this call nsThreadPool::ShutdownThread with this thread (and that can call AsyncShutdown, because why not), that way we shut down everything from a thread we control (the main thread). There's a more general problem here of how much of Gecko works when you start calling it from threads we don't control ... I hope this audio code doesn't do anything too crazy!
Flags: needinfo?(khuey)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 42•9 years ago
|
||
Untested, but something like this.
Attachment #8714152 -
Flags: review?(nfroyd)
Comment 43•9 years ago
|
||
Comment on attachment 8714152 [details] [diff] [review]
Take two
Review of attachment 8714152 [details] [diff] [review]:
-----------------------------------------------------------------
This fix looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=81c83641606f
I have some patches that helped in debugging this issue; I'll clean them up and post them in a different bug.
::: xpcom/threads/nsThreadPool.cpp
@@ +136,5 @@
> LOG(("THRD-P(%p) shutdown async [%p]\n", this, aThread));
>
> + // This is either called by a threadpool thread that is out of work, or
> + // a thread that attempted to create a threadpool thread and raced in
> + // such a way that the newly created thread is no longer necessary.
Thank you for this comment.
Attachment #8714152 -
Flags: review?(nfroyd) → review+
Comment 44•9 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment 46•9 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Assignee | ||
Comment 49•9 years ago
|
||
Comment on attachment 8714152 [details] [diff] [review]
Take two
Approval Request Comment
[Feature/regressing bug #]: Bug 1200922
[User impact if declined]: High frequency intermittent leaks
[Describe test coverage new/current, TreeHerder]: This is well covered on TreeHerder, because the orange shows up all the time.
[Risks and why]: Low risk
[String/UUID change made/needed]: None
Attachment #8714152 -
Flags: approval-mozilla-beta?
Attachment #8714152 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 50•9 years ago
|
||
Comment on attachment 8708499 [details] [diff] [review]
Patch
See above.
Note that this landed in time to make it to Aurora, so we only need it on Beta.
Attachment #8708499 -
Flags: approval-mozilla-beta?
Updated•9 years ago
|
status-firefox44:
--- → wontfix
status-firefox45:
--- → affected
Comment 51•9 years ago
|
||
Comment on attachment 8714152 [details] [diff] [review]
Take two
Improve the testsuite, taking it.
Should be in 45 beta 3.
Attachment #8714152 -
Flags: approval-mozilla-beta?
Attachment #8714152 -
Flags: approval-mozilla-beta+
Attachment #8714152 -
Flags: approval-mozilla-aurora?
Attachment #8714152 -
Flags: approval-mozilla-aurora+
Updated•9 years ago
|
Attachment #8708499 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 52•9 years ago
|
||
bugherder uplift |
Comment 53•9 years ago
|
||
Comment 54•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/56262438ad7d for the change needs to land on beta only
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•