Interrupt worklet thread JS infinite loops on MediaTrackGraph shutdown
Categories
(Core :: Web Audio, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox77 | --- | wontfix |
firefox78 | --- | fixed |
People
(Reporter: karlt, Assigned: karlt)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 1 obsolete file)
JS_AddInterruptCallback()
and JS_RequestInterruptCallback()
are the functions available for this purpose.
Perhaps WatchdogManager
in XPCJSContext.cpp may be helpful here. It looks like it is intended for multiple threads, but I only see it used for XPCJSContext
and that only for main thread. Or perhaps this can be tied in with realtime priority code.
Comment 1•5 years ago
|
||
Linux is the real problem here, but incidentally makes it a bit easier to fix it I think.
Linux will call a callback when the soft-real-time limit is reached. In Gecko, we promote the callback thread like so. This in turns sets the following soft and hard-limit:
- Soft limit at 50ms
- Hard limit at 200ms
Considering that we process per block of 128 frames, with a typical minimum rate of 44.1kHz, this is a typical upper bound for the render quantum at 2.9ms. The limits are intentionally high to not be too aggressive when developers are in the process of writing their worklet, have sub-optimal code, and also to account for scheduling delays when the machine is under load.
When the soft-limit is reached, the kernel sends SIGXCPU
, Gecko handles this and a process-global flag is set. This can be extended to get the JSContext
and interrupt the script. It looks like this API is thread safe, I don't know whether it is callable from a signal handler. All in all this will probably work well, the goal is to always react in less than 200ms, otherwise the kernel sends SIGKILL
to the process. The graph can then decide to demote the thread if the under-runs are frequent, for example.
On other platforms, no fatal signal signal is sent to the process, but on the flip side there is no callback that helps us. We need a real-time watchdog in Gecko. We don't really want to use the normal watchdog, because it's not running on a real-time thread, and so we can't really have any guarantee that it will be scheduled fast enough when an under-run happens. Having a real-time thread blocked on a monitor with timeout will probably work, like so:
- When the graph starts an iteration, it dispatches a message to the real-time watchdog thread, that starts waiting on a monitor, that has a timeout
- When the graph is finished with the iteration, it signals the monitor
- The watchdog threads is unblocked, either via timeout or because it has been signaled
- If it has been woken up because it has timed out, interrupt the JS (and maybe demote the thread, maybe after repeated under-runs)
- Otherwise, return
On platforms where audio remoting is enabled (and this will be all of them in the long run), there is a single real-time audio callback thread per process, so a single watchdog thread is required, but we could make this per graph.
I'm a bit wary of sharing code there, the WatchdogManager
seems to be doing lots of things that are not needed to solve the problem here, including dangerous things like system calls.
Assignee | ||
Comment 2•5 years ago
|
||
Thank you, Paul. I hadn't considered that the watchdog thread should also be realtime. GraphRunner::OneIteration()
may be the best place to have a timed wait.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
I've gone with the approach of stopping JS execution on shutdown of the MTG. This avoids having to make a choice on the time to allow JS to run before interrupting, which is particularly difficult for offline contexts or when a debugger might be attached.
In the future, a mechanism can be added to demote the thread, which need not necessarily stop JS execution. IIUC this is not a necessity right now because the graph driver thread blocks, resetting the CPU time count, when waiting for the GraphRunner thread, and the GraphRunner thread is not realtime when the content process is sandboxed.
Assignee | ||
Comment 4•5 years ago
|
||
Assignee | ||
Comment 5•5 years ago
|
||
Depends on D71915
Assignee | ||
Comment 6•4 years ago
•
|
||
The patches thus far are sufficient only for interrupting worklet JS from offline contexts. Worklet JS from realtime contexts is not interrupted until process shutdown.
I'll add a patch to trigger graph shutdown when the window is being destroyed. Although there may be other clients using the graph, the same graph is used only for clients in the same window, which is going away.
AudioContext::BindToOwner()
is never called - see bug 1637159 and https://treeherder.mozilla.org/#/jobs?repo=try&revision=c8bf48f3d351883a6a1741a580dd1771a85e8edf.
Assignee | ||
Comment 7•4 years ago
|
||
DestroyMediaTrack() is called only on Unlink or destruction of the AudioDestinationNode.
If there are no references to the AudioDestinationNode, then the graph's last stream
will be destroyed and the graph will shut down itself.
Assignee | ||
Comment 8•4 years ago
|
||
Depends on D74812
Assignee | ||
Comment 9•4 years ago
|
||
Depends on D74813
Comment 10•4 years ago
|
||
Comment 11•4 years ago
|
||
Backed out for assertion failures on GraphDriver.cpp.
backout: https://hg.mozilla.org/integration/autoland/rev/bba0d6f3cd111d0eee1ec5cbe035303ad03d48b1
failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302075028&repo=autoland&lineNumber=1314
[task 2020-05-13T09:52:38.132Z] 09:52:38 INFO - TEST-START | dom/media/test/test_bug448534.html
[task 2020-05-13T09:52:38.196Z] 09:52:38 INFO - GECKO(1852) | Assertion failure: promises->IsEmpty(), at /builds/worker/checkouts/gecko/dom/media/GraphDriver.cpp:537
[task 2020-05-13T09:52:38.201Z] 09:52:38 INFO - Initializing stack-fixing for the first stack frame, this may take a while...
[task 2020-05-13T09:52:54.823Z] 09:52:54 INFO - GECKO(1852) | #01: mozilla::AudioCallbackDriver::~AudioCallbackDriver() [dom/media/GraphDriver.cpp:533]
[task 2020-05-13T09:52:54.824Z] 09:52:54 INFO - GECKO(1852) | #02: mozilla::AudioCallbackDriver::Release() [dom/media/GraphDriver.h:560]
[task 2020-05-13T09:52:54.825Z] 09:52:54 INFO - GECKO(1852) | #03: mozilla::MediaTrackGraphImpl::OneIterationImpl(long long, long long, mozilla::AudioMixer*) [dom/media/MediaTrackGraph.cpp:1430]
[task 2020-05-13T09:52:54.825Z] 09:52:54 INFO - GECKO(1852) | #04: mozilla::GraphRunner::Run() [dom/media/GraphRunner.cpp:114]
[task 2020-05-13T09:52:54.825Z] 09:52:54 INFO - GECKO(1852) | #05: nsThread::ProcessNextEvent(bool, bool*) [mfbt/RefPtr.h:314]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #06: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:501]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #07: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:332]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #08: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
[task 2020-05-13T09:52:54.827Z] 09:52:54 INFO - GECKO(1852) | #09: nsThread::ThreadFunc(void*) [xpcom/threads/nsThread.cpp:446]
[task 2020-05-13T09:52:55.185Z] 09:52:55 INFO - GECKO(1852) | #10: _pt_root [nsprpub/pr/src/pthreads/ptthread.c:204]
[task 2020-05-13T09:52:55.185Z] 09:52:55 INFO - fix-stacks error: failed to read breakpad symbols dir/Users/cltbld/tasks/task_1589362135/build/symbols/libsystem_pthread.dylib
for/usr/lib/system/libsystem_pthread.dylib
[task 2020-05-13T09:52:55.185Z] 09:52:55 INFO - fix-stacks note: this is expected and harmless for system libraries on debug automation runs
Also failing:
- ThreadSanitizer: data race /builds/worker/checkouts/gecko/dom/media/MediaTrackGraph.cpp:1477:58 in mozilla::MediaTrackGraphImpl::ForceShutDown()::Message::Run() with log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302074294&repo=autoland&lineNumber=2212
- dom/media/test/crashtests/audioworklet-iloop-1.html | application timed out after 370 seconds with no output with log https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=302067267&repo=autoland&lineNumber=1600
Assignee | ||
Comment 12•4 years ago
•
|
||
The crash stack is slightly more intuitive than the assertion stack:
~AudioCallbackDriver()
https://hg.mozilla.org/mozilla-central/annotate/b9feee4f447c/dom/media/GraphDriver.cpp#l537
https://hg.mozilla.org/mozilla-central/annotate/b9feee4f447c/dom/media/GraphDriver.cpp#l533
AudioCallbackDriver::Release()
https://hg.mozilla.org/mozilla-central/annotate/b9feee4f447c/dom/media/GraphDriver.h#l560
MediaTrackGraphImpl::OneIterationImpl() calling SwitchAtNextIteration(nullptr)
https://hg.mozilla.org/mozilla-central/annotate/b9feee4f447c/dom/media/MediaTrackGraph.cpp#l1427
GraphRunner::Run()
https://hg.mozilla.org/mozilla-central/annotate/b9feee4f447c/dom/media/GraphRunner.cpp#l114
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 13•4 years ago
|
||
Comment 14•4 years ago
|
||
bugherder |
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 15•4 years ago
|
||
Bug 1642849 tracks remaining work, but I'll close this to track what landed in 78.
Assignee | ||
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Comment on attachment 9147532 [details]
Bug 1625372 shut down MediaTrackGraph from AudioContext when window will be destroyed r?padenot
Revision D74814 was moved to bug 1642849. Setting attachment 9147532 [details] to obsolete.
Description
•