Zoom tab crashes frequently with sandbox disabled, but leaves no crash report.
Categories
(Core :: Audio/Video: MediaStreamGraph, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: emilio, Assigned: emilio)
References
Details
Attachments
(4 files)
Sorry for not having STR, but I started using zoom for the web today, and my tab crashed three or four times. However, there's no crash report or anything of that sort.
I tried to coredumpctl
some of them, and in all of them it says the following:
Program terminated with signal SIGXCPU, CPU time limit exceeded.
And they all seem to be in GC / CC code.
Program terminated with signal SIGXCPU, CPU time limit exceeded.
#0 0x00007fe03b2a8cba in CCGraphBuilder::NoteJSChild(JS::GCCellPtr const&) () from /home/emilio/firefox/libxul.so
[Current thread is 1 (Thread 0x7fe0200bc700 (LWP 118609))]
(gdb) btQuit
(gdb) bt
#0 0x00007fe03b2a8cba in CCGraphBuilder::NoteJSChild(JS::GCCellPtr const&) () at /home/emilio/firefox/libxul.so
#1 0x00007fe03b29ec46 in TraversalTracer::onChild(JS::GCCellPtr const&) () at /home/emilio/firefox/libxul.so
#2 0x00007fe03b7f0422 in bool DoCallback<JS::Value>(JS::CallbackTracer*, JS::Value*, char const*) ()
at /home/emilio/firefox/libxul.so
#3 0x00007fe03b7d2a43 in bool js::gc::TraceEdgeInternal<JS::Value>(JSTracer*, JS::Value*, char const*) ()
at /home/emilio/firefox/libxul.so
#4 0x00007fe03b6aaaf1 in JSObject::traceChildren(JSTracer*) () at /home/emilio/firefox/libxul.so
#5 0x00007fe03b7f0a79 in JS::TraceChildren(JSTracer*, JS::GCCellPtr) () at /home/emilio/firefox/libxul.so
#6 0x00007fe03b29e243 in mozilla::JSGCThingParticipant::TraverseNative(void*, nsCycleCollectionTraversalCallback&)
() at /home/emilio/firefox/libxul.so
#7 0x00007fe03b2a6e04 in CCGraphBuilder::BuildGraph(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#8 0x00007fe03ba1b652 in nsCycleCollector::MarkRoots(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#9 0x00007fe03b2add61 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) ()
at /home/emilio/firefox/libxul.so
#10 0x00007fe03ba1cbc3 in nsCycleCollector_collect(nsICycleCollectorListener*) () at /home/emilio/firefox/libxul.so
#11 0x00007fe03ba12390 in mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) ()
at /home/emilio/firefox/libxul.so
#12 0x00007fe03cb3fc7b in js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) ()
at /home/emilio/firefox/libxul.so
#13 0x00007fe03cb402c3 in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#14 0x00007fe03cb40958 in js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#15 0x00007fe03cb34f66 in js::gc::GCRuntime::startGC(JSGCInvocationKind, JS::GCReason, long) ()
at /home/emilio/firefox/libxul.so
#16 0x00007fe03cb34ea8 in js::gc::GCRuntime::gcIfRequested() () at /home/emilio/firefox/libxul.so
#17 0x00007fe03ca55b5c in JSContext::handleInterrupt() () at /home/emilio/firefox/libxul.so
#18 0x00002ebffd7b5b02 in ()
#19 0x00002717828e5f10 in ()
#20 0x00007fe0200b97b0 in ()
#21 0x00007fe03f8188b0 in _ZN2js3jitL11vmFunctionsE.llvm.4211891828026429874 () at /home/emilio/firefox/libxul.so
#22 0x00002ebffe3693c2 in ()
#23 0x000000000000a820 in ()
#24 0x0000000000000002 in ()
#25 0x0000000000000000 in ()
A separate one:
#0 0x00007fd96470739c in bool DoCallback<JS::Value>(JS::CallbackTracer*, JS::Value*, char const*) ()
at /home/emilio/firefox/libxul.so
#1 0x00007fd9646e9a43 in bool js::gc::TraceEdgeInternal<JS::Value>(JSTracer*, JS::Value*, char const*) ()
at /home/emilio/firefox/libxul.so
#2 0x00007fd9645c1af1 in JSObject::traceChildren(JSTracer*) () at /home/emilio/firefox/libxul.so
#3 0x00007fd964707a79 in JS::TraceChildren(JSTracer*, JS::GCCellPtr) () at /home/emilio/firefox/libxul.so
#4 0x00007fd9641b5243 in mozilla::JSGCThingParticipant::TraverseNative(void*, nsCycleCollectionTraversalCallback&)
() at /home/emilio/firefox/libxul.so
#5 0x00007fd9641bde04 in CCGraphBuilder::BuildGraph(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#6 0x00007fd964932652 in nsCycleCollector::MarkRoots(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#7 0x00007fd9641c4d61 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) ()
at /home/emilio/firefox/libxul.so
#8 0x00007fd964933bc3 in nsCycleCollector_collect(nsICycleCollectorListener*) () at /home/emilio/firefox/libxul.so
#9 0x00007fd964929390 in mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) ()
at /home/emilio/firefox/libxul.so
#10 0x00007fd965a56c7b in js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) ()
at /home/emilio/firefox/libxul.so
#11 0x00007fd965a572c3 in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#12 0x00007fd965a57958 in js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#13 0x00007fd965a4bf66 in js::gc::GCRuntime::startGC(JSGCInvocationKind, JS::GCReason, long) ()
at /home/emilio/firefox/libxul.so
#14 0x00007fd9646d0ebe in JSString* js::AllocateStringImpl<JSString, (js::AllowGC)1>(JSContext*, js::gc::InitialHeap)
() at /home/emilio/firefox/libxul.so
#15 0x00007fd9646263b8 in JSStructuredCloneReader::startRead(JS::MutableHandle<JS::Value>) ()
at /home/emilio/firefox/libxul.so
#16 0x00007fd964623242 in ReadStructuredClone(JSContext*, JSStructuredCloneData const&, JS::StructuredCloneScope, JS::MutableHandle<JS::Value>, JS::CloneDataPolicy const&, JSStructuredCloneCallbacks const*, void*) ()
at /home/emilio/firefox/libxul.so
#17 0x00007fd96598da49 in JS_ReadStructuredClone(JSContext*, JSStructuredCloneData const&, unsigned int, JS::StructuredCloneScope, JS::MutableHandle<JS::Value>, JS::CloneDataPolicy const&, JSStructuredCloneCallbacks const*, void*) ()
at /home/emilio/firefox/libxul.so
#18 0x00007fd964f30565 in mozilla::dom::StructuredCloneHolder::ReadFromBuffer(nsIGlobalObject*, JSContext*, JSStructuredCloneData&, unsigned int, JS::MutableHandle<JS::Value>, JS::CloneDataPolicy const&, mozilla::ErrorResult&) ()
at /home/emilio/firefox/libxul.so
#19 0x00007fd965278f7c in mozilla::dom::SharedMessageBody::Read(JSContext*, JS::MutableHandle<JS::Value>, mozilla::dom::RefMessageBodyService*, mozilla::dom::SharedMessageBody::ReadMethod, mozilla::ErrorResult&) ()
at /home/emilio/firefox/libxul.so
#20 0x00007fd96529e306 in mozilla::dom::PostMessageRunnable::DispatchMessage() const ()
at /home/emilio/firefox/libxul.so
#21 0x00007fd96529e125 in mozilla::dom::PostMessageRunnable::Run() () at /home/emilio/firefox/libxul.so
#22 0x00007fd9641df974 in nsThread::ProcessNextEvent(bool, bool*) () at /home/emilio/firefox/libxul.so
#23 0x00007fd9641dd80e in NS_ProcessPendingEvents(nsIThread*, unsigned int) () at /home/emilio/firefox/libxul.so
#24 0x00007fd964377178 in mozilla::MediaTrackGraphImpl::OneIterationImpl(long, long, mozilla::AudioMixer*) ()
at /home/emilio/firefox/libxul.so
#25 0x00007fd96516eff2 in mozilla::GraphRunner::Run() () at /home/emilio/firefox/libxul.so
#26 0x00007fd9641df974 in nsThread::ProcessNextEvent(bool, bool*) () at /home/emilio/firefox/libxul.so
#27 0x00007fd964b7885e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ()
at /home/emilio/firefox/libxul.so
#28 0x00007fd964b538cf in MessageLoop::Run() () at /home/emilio/firefox/libxul.so
#29 0x00007fd96497cdae in nsThread::ThreadFunc(void*) () at /home/emilio/firefox/libxul.so
#30 0x00007fd96a84ab6e in _pt_root () at /home/emilio/firefox/libnspr4.so
Also, interestingly, they seem to be pretty close to each 10 minutes:
$ coredumpctl -r
Thu 2020-05-28 17:31:55 CEST 118563 1000 1000 24 present /home/emilio/firefox/firefox-bin
Thu 2020-05-28 17:21:49 CEST 117869 1000 1000 24 present /home/emilio/firefox/firefox-bin
Thu 2020-05-28 17:11:38 CEST 116880 1000 1000 24 present /home/emilio/firefox/firefox-bin
Thu 2020-05-28 17:01:26 CEST 116246 1000 1000 24 present /home/emilio/firefox/firefox-bin
Thu 2020-05-28 16:51:16 CEST 115532 1000 1000 24 present /home/emilio/firefox/firefox-bin
Thu 2020-05-28 16:41:07 CEST 111855 1000 1000 24 present /home/emilio/firefox/firefox-bin
(Probably the other 10 seconds or such on top are the time it took me to realize I had crashed and refresh the tab)
Any idea?
Comment 2•5 years ago
|
||
It looks like we're running low on memory, so we run a GC, but there's already a CC in progress, so we try to finish it synchronously. I don't know why that would crash, but if we're low on memory, and leaking lots of DOM stuff, then the CC's giant hash table is often a thing we crash while manipulating.
Maybe gsvelto has an idea why there's no crash report.
Assignee | ||
Comment 3•5 years ago
|
||
I suspect SIGXCPU is not really catchable, but ICBW
Assignee | ||
Comment 4•5 years ago
|
||
Here's a memory report taken a few seconds before crashing... Do you see anything interesting here Andrew?
Assignee | ||
Comment 5•5 years ago
|
||
Also I took a profile and the main thread seems pretty idle, I was about to take a profile at workers but then I crashed.
Assignee | ||
Comment 6•5 years ago
|
||
Hmm, maybe SIGXCPU can be caught, see bug 1575883...
Assignee | ||
Comment 7•5 years ago
|
||
(A bit closer to the crash)
Memory usage does seem to continuously increase.
Assignee | ||
Comment 8•5 years ago
|
||
https://share.firefox.dev/2XbwqN2 is a profile a bit before the crash too, in case it is useful.
Assignee | ||
Comment 9•5 years ago
|
||
https://share.firefox.dev/3gyqh59 is a profile of the host of the meeting (also running Linux), but which hasn't crashed yet. I'll post a memory report of that ASAP.
Assignee | ||
Comment 10•5 years ago
|
||
Memory report of the aforementioned machine.
Comment 11•5 years ago
|
||
SIGXCPU can be caught, but breakpad doesn't support it so it won't work ATM. Crashpad's exception handler supports it, so maybe I can quickly back-port that support into breakpad. I'll file an issue so I won't forget.
Comment 12•5 years ago
|
||
Oh, is the crash here from not the main thread? I'd assumed it was, but maybe not? We don't have incremental GC or CC off the main thread, so it can't be that. I guess the mozilla::ipc::MessagePumpForNonMainThreads::Run indicates it isn't the main thread. Oh, right, we must be calling the CC from CustomGCCallback(). We always run the CC from the GC on workers. Maybe if the worker is very active, it'll GC a lot of random points? Maybe we need to not CC every time we GC.
There are 7 different Zoom worker threads, which is probably more worker threads than I've ever seen. It looks like a lot of arrays.
Comment 13•5 years ago
|
||
Maybe there are a lot of objects in the CC graph on the worker thread, which causes bad performance in general, and between that and the worker threads being lower priority, it just takes a really long time to run an entire CC, which causes the SIGXCPU. I think the next step here would be to figure out what is going on with the CC. Maybe javascript.options.mem.log will give you some stats in the browser console. Failing that, setting the command line options to get CC logs for workers might help.
Andrew, have you seen anything like this before, where a process gets killed with SIGXCPU due to heavy activity in workers?
Comment 14•5 years ago
|
||
I believe these are (audio) worklet threads, not worker threads. That's why there's no WorkerThreadPrimaryRunnable in the fully symbolicated second backtrace in comment 0. Worklet threads seem to inherit the default nsThread priority of NS_NORMAL. Aside: Worker threads were bumped to normal thread priority in bug 1563950 in https://hg.mozilla.org/mozilla-central/rev/38729643d8c4.
Presumably the SIGXCPU stuff is related to https://searchfox.org/mozilla-central/source/dom/media/UnderrunHandlerLinux.cpp which is intentionally using it? It seems like that shouldn't be fatal then?
Comment 15•5 years ago
|
||
Oh, hm, it looks like the Worklet implementation also has its own PrimaryRunnable that should be on that stack too. It seems concerning that that isn't on the stack.
I'm hoping to learn more about the implementation of worklets soon, but in the meantime, I believe :baku is the (only) expert. NI'ing :baku.
Comment 16•5 years ago
|
||
Karl wrote a lot of the audio worklet code. Karl, do you have time to work on this?
Comment 17•5 years ago
|
||
This looks like a failure to install the SIGXCPU
handler.
Paul, the SIGXCPU handler is not installed when CanSandboxContent()
returns false, but I haven't found a similar condition when choosing whether to use audioipc or when promoting the GraphRunner thread.
Should that be always installed?
Thank you for reporting, Emilio. Can you paste the "Sandbox" section from your about:support, please?
WorkletThread::PrimaryRunnable
exists only for PaintWorklet
because that is not yet hooked into the painting thread(s).
I don't know what to make of the consistent 10 minute intervals.
Updated•5 years ago
|
Comment 18•5 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #17)
This looks like a failure to install the
SIGXCPU
handler.Paul, the SIGXCPU handler is not installed when
CanSandboxContent()
returns false, but I haven't found a similar condition when choosing whether to use audioipc or when promoting the GraphRunner thread.
Should that be always installed?
I think so, that's a mistake.
Assignee | ||
Comment 19•5 years ago
|
||
I had disabled sandboxing to workaround bug 1639181 (in order to use DMD), so I had security.sandbox.content.level=0
. So that makes sense and probably the handler should be installed regardless.
Assignee | ||
Comment 21•5 years ago
|
||
Comment 22•5 years ago
|
||
Updated•5 years ago
|
Comment 23•5 years ago
|
||
bugherder |
Description
•