Closed Bug 1625372 Opened 1 year ago Closed 1 year ago

Interrupt worklet thread JS infinite loops on MediaTrackGraph shutdown

Categories

(Core :: Web Audio, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox-esr68 --- unaffected
firefox77 --- wontfix
firefox78 --- fixed

People

(Reporter: karlt, Assigned: karlt)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

JS_AddInterruptCallback() and JS_RequestInterruptCallback() are the functions available for this purpose.

Perhaps WatchdogManager in XPCJSContext.cpp may be helpful here. It looks like it is intended for multiple threads, but I only see it used for XPCJSContext and that only for main thread. Or perhaps this can be tied in with realtime priority code.

Blocks: 1625375

Linux is the real problem here, but incidentally makes it a bit easier to fix it I think.

Linux will call a callback when the soft-real-time limit is reached. In Gecko, we promote the callback thread like so. This in turns sets the following soft and hard-limit:

  • Soft limit at 50ms
  • Hard limit at 200ms

Considering that we process per block of 128 frames, with a typical minimum rate of 44.1kHz, this is a typical upper bound for the render quantum at 2.9ms. The limits are intentionally high to not be too aggressive when developers are in the process of writing their worklet, have sub-optimal code, and also to account for scheduling delays when the machine is under load.

When the soft-limit is reached, the kernel sends SIGXCPU, Gecko handles this and a process-global flag is set. This can be extended to get the JSContext and interrupt the script. It looks like this API is thread safe, I don't know whether it is callable from a signal handler. All in all this will probably work well, the goal is to always react in less than 200ms, otherwise the kernel sends SIGKILL to the process. The graph can then decide to demote the thread if the under-runs are frequent, for example.

On other platforms, no fatal signal signal is sent to the process, but on the flip side there is no callback that helps us. We need a real-time watchdog in Gecko. We don't really want to use the normal watchdog, because it's not running on a real-time thread, and so we can't really have any guarantee that it will be scheduled fast enough when an under-run happens. Having a real-time thread blocked on a monitor with timeout will probably work, like so:

  • When the graph starts an iteration, it dispatches a message to the real-time watchdog thread, that starts waiting on a monitor, that has a timeout
  • When the graph is finished with the iteration, it signals the monitor
  • The watchdog threads is unblocked, either via timeout or because it has been signaled
  • If it has been woken up because it has timed out, interrupt the JS (and maybe demote the thread, maybe after repeated under-runs)
  • Otherwise, return

On platforms where audio remoting is enabled (and this will be all of them in the long run), there is a single real-time audio callback thread per process, so a single watchdog thread is required, but we could make this per graph.

I'm a bit wary of sharing code there, the WatchdogManager seems to be doing lots of things that are not needed to solve the problem here, including dangerous things like system calls.

Thank you, Paul. I hadn't considered that the watchdog thread should also be realtime. GraphRunner::OneIteration() may be the best place to have a timed wait.

Assignee: nobody → karlt
Status: NEW → ASSIGNED

I've gone with the approach of stopping JS execution on shutdown of the MTG. This avoids having to make a choice on the time to allow JS to run before interrupting, which is particularly difficult for offline contexts or when a debugger might be attached.

In the future, a mechanism can be added to demote the thread, which need not necessarily stop JS execution. IIUC this is not a necessity right now because the graph driver thread blocks, resetting the CPU time count, when waiting for the GraphRunner thread, and the GraphRunner thread is not realtime when the content process is sandboxed.

See Also: → 1628198

The patches thus far are sufficient only for interrupting worklet JS from offline contexts. Worklet JS from realtime contexts is not interrupted until process shutdown.

I'll add a patch to trigger graph shutdown when the window is being destroyed. Although there may be other clients using the graph, the same graph is used only for clients in the same window, which is going away.

AudioContext::BindToOwner() is never called - see bug 1637159 and https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8bf48f3d351883a6a1741a580dd1771a85e8edf.

See Also: → 1637159

DestroyMediaTrack() is called only on Unlink or destruction of the AudioDestinationNode.
If there are no references to the AudioDestinationNode, then the graph's last stream
will be destroyed and the graph will shut down itself.

Depends on: 1637491
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ca93e9921c05
request interrupt of AudioWorklet JS on force shutdown r=padenot
https://hg.mozilla.org/integration/autoland/rev/6e9928fb37d1
remove unnecessary DestroyNonRealtimeInstance() call from AudioDestinationNode::DestroyMediaTrack() r=padenot
https://hg.mozilla.org/integration/autoland/rev/7a38398623f8
permit ForceShutDown() on a realtime graph r=padenot
https://hg.mozilla.org/integration/autoland/rev/d86f066bd68b
shut down MediaTrackGraph from AudioContext when window will be destroyed r=padenot
https://hg.mozilla.org/integration/autoland/rev/b9feee4f447c
add crashtest for JS infinite loops on AudioWorklet thread r=padenot
Regressions: 1637551

Backed out for assertion failures on GraphDriver.cpp.

backout: https://hg.mozilla.org/integration/autoland/rev/bba0d6f3cd111d0eee1ec5cbe035303ad03d48b1

push: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=os%2Cx%2C10.14%2Cdebug%2Cmochitests%2Cwith%2Csocket%2Cprocess%2Ctest-macosx1014-64%2Fdebug-mochitest-media-spi-e10s-2%2Cm-spi%28mda2%29&tochange=7fa9bdd554dda2441c633a2bf8718a27f36f9c42&fromchange=c31fffdb52a74cc35387d8112c7519cd1cb0e482&selectedTaskRun=QSngcSD7SbWxqHfo9JQ8Zg-0

failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302075028&repo=autoland&lineNumber=1314

[task 2020-05-13T09:52:38.132Z] 09:52:38 INFO - TEST-START | dom/media/test/test_bug448534.html
[task 2020-05-13T09:52:38.196Z] 09:52:38 INFO - GECKO(1852) | Assertion failure: promises->IsEmpty(), at /builds/worker/checkouts/gecko/dom/media/GraphDriver.cpp:537
[task 2020-05-13T09:52:38.201Z] 09:52:38 INFO - Initializing stack-fixing for the first stack frame, this may take a while...
[task 2020-05-13T09:52:54.823Z] 09:52:54 INFO - GECKO(1852) | #01: mozilla::AudioCallbackDriver::~AudioCallbackDriver() [dom/media/GraphDriver.cpp:533]
[task 2020-05-13T09:52:54.824Z] 09:52:54 INFO - GECKO(1852) | #02: mozilla::AudioCallbackDriver::Release() [dom/media/GraphDriver.h:560]
[task 2020-05-13T09:52:54.825Z] 09:52:54 INFO - GECKO(1852) | #03: mozilla::MediaTrackGraphImpl::OneIterationImpl(long long, long long, mozilla::AudioMixer*) [dom/media/MediaTrackGraph.cpp:1430]
[task 2020-05-13T09:52:54.825Z] 09:52:54 INFO - GECKO(1852) | #04: mozilla::GraphRunner::Run() [dom/media/GraphRunner.cpp:114]
[task 2020-05-13T09:52:54.825Z] 09:52:54 INFO - GECKO(1852) | #05: nsThread::ProcessNextEvent(bool, bool*) [mfbt/RefPtr.h:314]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #06: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:501]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #07: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:332]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #08: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #09: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:446]
[task 2020-05-13T09:52:55.185Z] 09:52:55 INFO - GECKO(1852) | #10: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2020-05-13T09:52:55.185Z] 09:52:55 INFO - fix-stacks error: failed to read breakpad symbols dir /Users/cltbld/tasks/task_1589362135/build/symbols/libsystem_pthread.dylib for /usr/lib/system/libsystem_pthread.dylib
[task 2020-05-13T09:52:55.185Z] 09:52:55 INFO - fix-stacks note: this is expected and harmless for system libraries on debug automation runs

Also failing:

Flags: needinfo?(karlt)
Depends on: 1637837
Keywords: leave-open
Depends on: 1638243
Attachment #9142324 - Attachment description: Bug 1625372 request interrupt of AudioWorklet JS on force shutdown r?padenot → Bug 1625372 request interrupt of AudioWorklet JS on force shutdown r=padenot
Attachment #9147531 - Attachment description: Bug 1625372 permit ForceShutDown() on a realtime graph r?padenot → Bug 1625372 permit ForceShutDown() on a realtime graph r=padenot
Pushed by ktomlinson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/152617fc95a0
request interrupt of AudioWorklet JS on force shutdown r=padenot
https://hg.mozilla.org/integration/autoland/rev/9488c1c33caf
remove unnecessary DestroyNonRealtimeInstance() call from AudioDestinationNode::DestroyMediaTrack() r=padenot
https://hg.mozilla.org/integration/autoland/rev/8c9fb10d84ed
permit ForceShutDown() on a realtime graph r=padenot
Keywords: leave-open
Depends on: 1633493
Blocks: 1642849

Bug 1642849 tracks remaining work, but I'll close this to track what landed in 78.

No longer blocks: 1625375
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
No longer depends on: 1637491, 1638243, 1633493, 1637837
Resolution: --- → FIXED
See Also: 1637159
Target Milestone: --- → mozilla78
Summary: Interrupt JS infinite loops on worklet thread → Interrupt worklet thread JS infinite loops on MediaTrackGraph shutdown

Comment on attachment 9147532 [details]
Bug 1625372 shut down MediaTrackGraph from AudioContext when window will be destroyed r?padenot

Revision D74814 was moved to bug 1642849. Setting attachment 9147532 [details] to obsolete.

Attachment #9147532 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.