Stack overflow through MediaTrackGraphShutDownRunnable
Categories
(Core :: Audio/Video: MediaStreamGraph, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox105 | --- | fixed |
People
(Reporter: jstutte, Assigned: karlt)
References
Details
Crash Data
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1691517 +++
At least some of the crashes still visible in bug 1691517 seem to be caused by an infinite recursion that goes through the MediaTrackGraphShutDownRunnable
.
Example: https://crash-stats.mozilla.org/report/index/ddaa54a7-555d-4fb0-a00b-204350220126
Updated•2 years ago
|
Assignee | ||
Comment 1•2 years ago
|
||
https://crash-stats.mozilla.org/report/index/92724d7e-f8ce-40ae-bf97-1a0e50220630#allthreads has 565 "MediaTrackGrph" threads.
477 of these have passed SHDRCV and SHDACK.
These seem to all be waiting in _PR_NotifyJoinWaiters()
implying that nsThreadShutdownAckEvent
has not run on the joining (main) thread.
There is one "GraphRunner" thread in a similar state.
There are 479 nsThread::Shutdown()
calls on the stack from MediaTrackGraphShutDownRunnable::Run()
.
Assignee | ||
Comment 2•2 years ago
|
||
Assignee | ||
Comment 3•2 years ago
|
||
This avoids putting a nested event loop on the stack, which may not unwind if
further threads are shutdown, leading to stack exhaustion.
GraphDriver::Shutdown() is called only after control of the graph has been
handed to the main thread so the graph is not running while this is called and
so MediaTrackGraphShutDownRunnable on the main thread should not need to wait
for thread shutdown.
Comment 5•2 years ago
|
||
Copying crash signatures from duplicate bugs.
Reporter | ||
Comment 6•2 years ago
|
||
(In reply to Karl Tomlinson (back July 25 :karlt) from comment #1)
https://crash-stats.mozilla.org/report/index/92724d7e-f8ce-40ae-bf97-1a0e50220630#allthreads has 565 "MediaTrackGrph" threads.
477 of these have passed SHDRCV and SHDACK.
These seem to all be waiting in_PR_NotifyJoinWaiters()
implying thatnsThreadShutdownAckEvent
has not run on the joining (main) thread.
Some nsThreadShutdownAckEvent
might have run actually. I assume what happened here is that we stack several SpinEventLoopUntil
for all those thread shutdowns, each waiting on a different nsThreadShutdownAckEvent
. So if the topmost SpinEventLoopUntil
did not receive its own nsThreadShutdownAckEvent
, it will continue to block the entire stack.
There is one "GraphRunner" thread in a similar state.
There are 479
nsThread::Shutdown()
calls on the stack fromMediaTrackGraphShutDownRunnable::Run()
.
Karl, is it normal to see so many threads? I'd rather understand from MediaTrackGraphInitThreadRunnable
that we want to re-use existing threads. So my assumption would be, that while the existing thread is waiting to join we already cleared our flags that signal a running thread and someone asks us to start a new thread - but we should not try to do so during shutdown (it would be torn down immediately anyways). Either MediaTrackGraphInitThreadRunnable
or ThreadedDriver::RunThread
should do some AppShutdown::IsInOrBeyond
check for an appropriate phase, I think.
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 7•2 years ago
|
||
(hu, I did not touch crash signatures?)
Assignee | ||
Comment 8•2 years ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #6)
(In reply to Karl Tomlinson (back July 25 :karlt) from comment #1)
https://crash-stats.mozilla.org/report/index/92724d7e-f8ce-40ae-bf97-1a0e50220630#allthreads has 565 "MediaTrackGrph" threads.
477 of these have passed SHDRCV and SHDACK.
These seem to all be waiting in_PR_NotifyJoinWaiters()
implying thatnsThreadShutdownAckEvent
has not run on the joining (main) thread.Some
nsThreadShutdownAckEvent
might have run actually. I assume what happened here is that we stack severalSpinEventLoopUntil
for all those thread shutdowns, each waiting on a differentnsThreadShutdownAckEvent
. So if the topmostSpinEventLoopUntil
did not receive its ownnsThreadShutdownAckEvent
, it will continue to block the entire stack.
Nested event loops are truly evil, the situation you describe being just one reason why, but the situation is slightly different here.
The non-main thread stacks in that report suggest that nsThreadShutdownAckEvent
has not Run()
for those threads. _PR_NotifyJoinWaiters()
waits for PR_JoinThread()
from nsThreadShutdownAckEvent::Run()
.
A different timing between thread shutdowns might give a chance for some nsThreadShutdownAckEvent
s to run, in which case perhaps their threads may have terminated and so they would no longer show in the crash report. However, the balance of the count of Shutdown()
calls on the stack and the count of threads in _PR_NotifyJoinWaiters()
indicates that this was not what happened.
This state where nsThreadShutdownAckEvent
s have not run suggests that the threads have all been shut down in succession, with MediaTrackGraphShutDownRunnable
s all queued before any nsThreadShutdownAckEvent
s are queued. Such a situation might occur on GC or page unload.
There are 479
nsThread::Shutdown()
calls on the stack fromMediaTrackGraphShutDownRunnable::Run()
.Karl, is it normal to see so many threads?
It is not normal to see so many threads, but sites can trigger this.
I'd rather understand from
MediaTrackGraphInitThreadRunnable
that we want to re-use existing threads.
There is one active thread per MediaTrackGraph
. The problem here is that we have multiple MediaTrackGraphs
.
So my assumption would be, that while the existing thread is waiting to join we already cleared our flags that signal a running thread and someone asks us to start a new thread - but we should not try to do so during shutdown (it would be torn down immediately anyways). Either
MediaTrackGraphInitThreadRunnable
orThreadedDriver::RunThread
should do someAppShutdown::IsInOrBeyond
check for an appropriate phase, I think.
The assumption is that creation of a new graph is triggered only from content JS, which will not run late in shutdown. Graphs are shut down when the window is destroyed. Realtime graphs are also shutdown during xpcom-will-shutdown, but I guess that is not necessary with shutdown on window destroy.
The main thread has not commenced XPCOM shutdown in the report here.
Reporter | ||
Comment 9•2 years ago
•
|
||
The assumption is that creation of a new graph is triggered only from content JS, which will not run late in shutdown.
Well, we know that running (and blocking) content JS can interfere at least with our parent process shutdown, bug 1777198 tries to improve things here in general. Here we are in the same content process, IIUC, but some additional resilience would probably not hurt?
Assignee | ||
Comment 10•2 years ago
•
|
||
(In reply to Karl Tomlinson (:karlt) from comment #8)
Realtime graphs are also shutdown during xpcom-will-shutdown, but I guess that is not necessary with shutdown on window destroy.
The first shutdown comes during discarding of a window, after which script can run but cannot create a new AudioContext
because the window is no longer fully active.
The shutdown on xpcom-will-shutdown would have been necessary prior to the inclusion of the fully-active check.
(In reply to Jens Stutte [:jstutte] from comment #9)
Here we are in the same content process, IIUC, but some additional resilience would probably not hurt?
I expect the fully-active check provides the is-beyond check that you are expecting.
Comment 11•2 years ago
|
||
Pushed by ktomlinson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2bbe4f4ee2c2 AsyncShutdown() MediaTrackGrph thread r=padenot
Comment 12•2 years ago
|
||
bugherder |
Description
•