Closed Bug 1752305 Opened 2 years ago Closed 2 years ago

Stack overflow through MediaTrackGraphShutDownRunnable

Categories

(Core :: Audio/Video: MediaStreamGraph, defect)

Tracking

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: jstutte, Assigned: karlt)

References

Details

Crash Data

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1691517 +++

At least some of the crashes still visible in bug 1691517 seem to be caused by unbounded recursion through MediaTrackGraphShutDownRunnable.

Example: https://crash-stats.mozilla.org/report/index/ddaa54a7-555d-4fb0-a00b-204350220126

Component: Audio/Video → Audio/Video: MediaStreamGraph
Severity: -- → S3
See Also: → 1742804
See Also: → 1775341

https://crash-stats.mozilla.org/report/index/92724d7e-f8ce-40ae-bf97-1a0e50220630#allthreads has 565 "MediaTrackGrph" threads.
477 of these have passed SHDRCV and SHDACK.
These all seem to be waiting in _PR_NotifyJoinWaiters(), implying that nsThreadShutdownAckEvent has not run on the joining (main) thread.
There is one "GraphRunner" thread in a similar state.

There are 479 nsThread::Shutdown() calls on the stack from MediaTrackGraphShutDownRunnable::Run().
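
For orientation, a minimal sketch, with assumed names (the real class is MediaTrackGraphShutDownRunnable), of the pattern behind those numbers: each runnable joins its graph thread synchronously on the main thread, and the nested event loop inside nsThread::Shutdown() can pick up the next graph's shutdown runnable before the first join completes.

    #include "nsCOMPtr.h"
    #include "nsIThread.h"
    #include "nsThreadUtils.h"

    // Sketch only; one such runnable is queued on the main thread per
    // MediaTrackGraph being shut down.
    class ShutDownRunnableSketch : public mozilla::Runnable {
     public:
      explicit ShutDownRunnableSketch(nsIThread* aGraphThread)
          : mozilla::Runnable("ShutDownRunnableSketch"),
            mGraphThread(aGraphThread) {}

      NS_IMETHOD Run() override {  // main thread
        // Spins a nested event loop inside nsThread::Shutdown() until this
        // thread's nsThreadShutdownAckEvent has run.  Another queued
        // shutdown runnable may be processed inside that loop, calling
        // Shutdown() again one frame deeper; the crash stack shows 479
        // such frames.
        mGraphThread->Shutdown();
        return NS_OK;
      }

     private:
      nsCOMPtr<nsIThread> mGraphThread;
    };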

Assignee: nobody → karlt

This avoids putting a nested event loop on the stack, which may not unwind if
further threads are shut down, leading to stack exhaustion.

GraphDriver::Shutdown() is called only after control of the graph has been
handed to the main thread, so the graph is not running while this is called,
and MediaTrackGraphShutDownRunnable on the main thread should not need to
wait for thread shutdown.
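
The shape of the fix, as a sketch (simplified; the actual patch changes the graph/driver shutdown path):

    #include "nsIThread.h"

    // Sketch only, assuming aGraphThread is the MediaTrackGrph thread.
    static void ShutdownGraphThread(nsIThread* aGraphThread) {
      // Before: aGraphThread->Shutdown();  // joins synchronously via a
      //                                    // nested main-thread event loop
      aGraphThread->AsyncShutdown();  // same handshake, but returns
                                      // immediately; the thread is joined
                                      // later from the regular event loop,
                                      // so nothing piles up on the stack
    }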


Copying crash signatures from duplicate bugs.

Crash Signature: [@ mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal] → [@ mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal] [@ mozilla::PerformanceCounterState::MaybeReportAccumulatedTime]

(In reply to Karl Tomlinson (back July 25 :karlt) from comment #1)

https://crash-stats.mozilla.org/report/index/92724d7e-f8ce-40ae-bf97-1a0e50220630#allthreads has 565 "MediaTrackGrph" threads.
477 of these have passed SHDRCV and SHDACK.
These all seem to be waiting in _PR_NotifyJoinWaiters(), implying that nsThreadShutdownAckEvent has not run on the joining (main) thread.

Some nsThreadShutdownAckEvents might actually have run. I assume what happened here is that we stack several SpinEventLoopUntil calls, one per thread shutdown, each waiting on a different nsThreadShutdownAckEvent. So if the topmost SpinEventLoopUntil has not received its own nsThreadShutdownAckEvent, it will continue to block the entire stack below it.
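
As a hypothetical illustration of that stacking (assumed names, not actual code): frame i can only return once its own ack has run, so the topmost loop pins every frame beneath it.

    #include <cstddef>
    #include "mozilla/SpinEventLoopUntil.h"
    #include "nsString.h"

    // Hypothetical illustration only: acked[i] is flipped when the i-th
    // thread's nsThreadShutdownAckEvent runs on the main thread.
    static bool acked[480] = {};

    static void SpinForAck(size_t aIndex) {
      mozilla::SpinEventLoopUntil("SpinForAck"_ns, [&] {
        // Events processed while spinning may call SpinForAck(aIndex + 1),
        // nesting deeper.  Even if acked[aIndex] becomes true meanwhile,
        // this frame cannot unwind until every deeper frame has returned.
        return acked[aIndex];
      });
    }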

There is one "GraphRunner" thread in a similar state.

There are 479 nsThread::Shutdown() calls on the stack from MediaTrackGraphShutDownRunnable::Run().

Karl, is it normal to see so many threads? My reading of MediaTrackGraphInitThreadRunnable is that we want to re-use existing threads. So my assumption would be that while the existing thread is waiting to be joined, we have already cleared the flags that signal a running thread, and someone asks us to start a new thread - but we should not try to do so during shutdown (it would be torn down immediately anyway). Either MediaTrackGraphInitThreadRunnable or ThreadedDriver::RunThread should do an AppShutdown::IsInOrBeyond check for an appropriate phase, I think.
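
A sketch of the guard suggested here; the phase named below is an assumption, chosen only for illustration:

    #include "mozilla/AppShutdown.h"

    // Hypothetical guard for MediaTrackGraphInitThreadRunnable or
    // ThreadedDriver::RunThread: skip creating a replacement graph thread
    // once shutdown has progressed far enough that the new thread would be
    // torn down immediately anyway.
    static bool ShouldSkipGraphThreadStart() {
      // AppShutdownConfirmed is an assumed phase; an appropriate one
      // would need to be chosen.
      return mozilla::AppShutdown::IsInOrBeyond(
          mozilla::ShutdownPhase::AppShutdownConfirmed);
    }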

Crash Signature: [@ mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal] [@ mozilla::PerformanceCounterState::MaybeReportAccumulatedTime] → [@ mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal] [@ mozilla::PerformanceCounterState::MaybeReportAccumulatedTime]
Flags: needinfo?(karlt)

(huh, I did not touch crash signatures?)

See Also: → 1781029

(In reply to Jens Stutte [:jstutte] from comment #6)

(In reply to Karl Tomlinson (back July 25 :karlt) from comment #1)

https://crash-stats.mozilla.org/report/index/92724d7e-f8ce-40ae-bf97-1a0e50220630#allthreads has 565 "MediaTrackGrph" threads.
477 of these have passed SHDRCV and SHDACK.
These all seem to be waiting in _PR_NotifyJoinWaiters(), implying that nsThreadShutdownAckEvent has not run on the joining (main) thread.

Some nsThreadShutdownAckEvents might actually have run. I assume what happened here is that we stack several SpinEventLoopUntil calls, one per thread shutdown, each waiting on a different nsThreadShutdownAckEvent. So if the topmost SpinEventLoopUntil has not received its own nsThreadShutdownAckEvent, it will continue to block the entire stack below it.

Nested event loops are truly evil, and the situation you describe is just one reason why, but what happened here is slightly different.

The non-main thread stacks in that report suggest that nsThreadShutdownAckEvent has not Run() for those threads. _PR_NotifyJoinWaiters() waits for PR_JoinThread() from nsThreadShutdownAckEvent::Run().

A different timing between thread shutdowns might have given some nsThreadShutdownAckEvents a chance to run, in which case their threads may have terminated and would no longer show in the crash report. However, the count of Shutdown() calls on the stack matching the count of threads in _PR_NotifyJoinWaiters() indicates that this is not what happened.

This state where nsThreadShutdownAckEvents have not run suggests that the threads have all been shut down in succession, with MediaTrackGraphShutDownRunnables all queued before any nsThreadShutdownAckEvents are queued. Such a situation might occur on GC or page unload.

There are 479 nsThread::Shutdown() calls on the stack from MediaTrackGraphShutDownRunnable::Run().

Karl, is it normal to see so many threads?

It is not normal to see so many threads, but sites can trigger this.

My reading of MediaTrackGraphInitThreadRunnable is that we want to re-use existing threads.

There is one active thread per MediaTrackGraph. The problem here is that we have multiple MediaTrackGraphs.

So my assumption would be that while the existing thread is waiting to be joined, we have already cleared the flags that signal a running thread, and someone asks us to start a new thread - but we should not try to do so during shutdown (it would be torn down immediately anyway). Either MediaTrackGraphInitThreadRunnable or ThreadedDriver::RunThread should do an AppShutdown::IsInOrBeyond check for an appropriate phase, I think.

The assumption is that creation of a new graph is triggered only from content JS, which will not run late in shutdown. Graphs are shut down when the window is destroyed. Realtime graphs are also shut down during xpcom-will-shutdown, but I guess that is not necessary with shutdown on window destroy.

The main thread has not commenced XPCOM shutdown in the report here.

Flags: needinfo?(karlt)

The assumption is that creation of a new graph is triggered only from content JS, which will not run late in shutdown.

Well, we know that running (and blocking) content JS can interfere at least with our parent-process shutdown; bug 1777198 tries to improve things here in general. Here we are in the same content process, IIUC, but some additional resilience would probably not hurt?

(In reply to Karl Tomlinson (:karlt) from comment #8)

Realtime graphs are also shut down during xpcom-will-shutdown, but I guess that is not necessary with shutdown on window destroy.

The first shutdown comes during discarding of a window, after which script can run but cannot create a new AudioContext because the window is no longer fully active.
The shutdown on xpcom-will-shutdown would have been necessary prior to the inclusion of the fully-active check.
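
For reference, a sketch of where that check bites (hypothetical helper name; the real check lives in the AudioContext constructor path): a discarded window's document is no longer fully active, so content JS can still run but can no longer create the AudioContext that would spin up a new graph.

    #include "mozilla/ErrorResult.h"
    #include "nsPIDOMWindow.h"

    // Assumed helper; stands in for however "fully active" is determined.
    extern bool WindowIsFullyActive(nsPIDOMWindowInner* aWindow);

    // Hypothetical sketch of the fully-active gate.
    static bool CanCreateAudioContext(nsPIDOMWindowInner* aWindow,
                                      mozilla::ErrorResult& aRv) {
      if (!WindowIsFullyActive(aWindow)) {
        aRv.ThrowInvalidStateError(
            "Cannot create an AudioContext in a window that is not fully "
            "active");
        return false;
      }
      return true;
    }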

(In reply to Jens Stutte [:jstutte] from comment #9)

Here we are in the same content process, IIUC, but some additional resilience would probably not hurt?

I expect the fully-active check provides the is-beyond check that you are asking for.

Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2bbe4f4ee2c2
AsyncShutdown() MediaTrackGrph thread r=padenot
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch