Closed Bug 1641605 Opened 1 month ago Closed 1 month ago

Zoom tab crashes frequently with sandbox disabled, but leaves no crash report.

Categories

(Core :: Audio/Video: MediaStreamGraph, defect)

Unspecified
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: emilio, Assigned: emilio)

References

Details

Attachments

(4 files)

Sorry for not having STR, but I started using zoom for the web today, and my tab crashed three or four times. However, there's no crash report or anything of that sort.

I tried to coredumpctl some of them, and in all of them it says the following:

Program terminated with signal SIGXCPU, CPU time limit exceeded.

And they all seem to be in GC / CC code.

Program terminated with signal SIGXCPU, CPU time limit exceeded.
#0  0x00007fe03b2a8cba in CCGraphBuilder::NoteJSChild(JS::GCCellPtr const&) () from /home/emilio/firefox/libxul.so
[Current thread is 1 (Thread 0x7fe0200bc700 (LWP 118609))]
(gdb) btQuit
(gdb) bt
#0  0x00007fe03b2a8cba in CCGraphBuilder::NoteJSChild(JS::GCCellPtr const&) () at /home/emilio/firefox/libxul.so
#1  0x00007fe03b29ec46 in TraversalTracer::onChild(JS::GCCellPtr const&) () at /home/emilio/firefox/libxul.so
#2  0x00007fe03b7f0422 in bool DoCallback<JS::Value>(JS::CallbackTracer*, JS::Value*, char const*) ()
    at /home/emilio/firefox/libxul.so
#3  0x00007fe03b7d2a43 in bool js::gc::TraceEdgeInternal<JS::Value>(JSTracer*, JS::Value*, char const*) ()
    at /home/emilio/firefox/libxul.so
#4  0x00007fe03b6aaaf1 in JSObject::traceChildren(JSTracer*) () at /home/emilio/firefox/libxul.so
#5  0x00007fe03b7f0a79 in JS::TraceChildren(JSTracer*, JS::GCCellPtr) () at /home/emilio/firefox/libxul.so
#6  0x00007fe03b29e243 in mozilla::JSGCThingParticipant::TraverseNative(void*, nsCycleCollectionTraversalCallback&)
    () at /home/emilio/firefox/libxul.so
#7  0x00007fe03b2a6e04 in CCGraphBuilder::BuildGraph(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#8  0x00007fe03ba1b652 in nsCycleCollector::MarkRoots(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#9  0x00007fe03b2add61 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) ()
    at /home/emilio/firefox/libxul.so
#10 0x00007fe03ba1cbc3 in nsCycleCollector_collect(nsICycleCollectorListener*) () at /home/emilio/firefox/libxul.so
#11 0x00007fe03ba12390 in mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) ()
    at /home/emilio/firefox/libxul.so
#12 0x00007fe03cb3fc7b in js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) ()
    at /home/emilio/firefox/libxul.so
#13 0x00007fe03cb402c3 in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#14 0x00007fe03cb40958 in js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#15 0x00007fe03cb34f66 in js::gc::GCRuntime::startGC(JSGCInvocationKind, JS::GCReason, long) ()
    at /home/emilio/firefox/libxul.so
#16 0x00007fe03cb34ea8 in js::gc::GCRuntime::gcIfRequested() () at /home/emilio/firefox/libxul.so
#17 0x00007fe03ca55b5c in JSContext::handleInterrupt() () at /home/emilio/firefox/libxul.so
#18 0x00002ebffd7b5b02 in  ()
#19 0x00002717828e5f10 in  ()
#20 0x00007fe0200b97b0 in  ()
#21 0x00007fe03f8188b0 in _ZN2js3jitL11vmFunctionsE.llvm.4211891828026429874 () at /home/emilio/firefox/libxul.so
#22 0x00002ebffe3693c2 in  ()
#23 0x000000000000a820 in  ()
#24 0x0000000000000002 in  ()
#25 0x0000000000000000 in  ()

A separate one:

#0  0x00007fd96470739c in bool DoCallback<JS::Value>(JS::CallbackTracer*, JS::Value*, char const*) ()
    at /home/emilio/firefox/libxul.so
#1  0x00007fd9646e9a43 in bool js::gc::TraceEdgeInternal<JS::Value>(JSTracer*, JS::Value*, char const*) ()
    at /home/emilio/firefox/libxul.so
#2  0x00007fd9645c1af1 in JSObject::traceChildren(JSTracer*) () at /home/emilio/firefox/libxul.so
#3  0x00007fd964707a79 in JS::TraceChildren(JSTracer*, JS::GCCellPtr) () at /home/emilio/firefox/libxul.so
#4  0x00007fd9641b5243 in mozilla::JSGCThingParticipant::TraverseNative(void*, nsCycleCollectionTraversalCallback&)
    () at /home/emilio/firefox/libxul.so
#5  0x00007fd9641bde04 in CCGraphBuilder::BuildGraph(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#6  0x00007fd964932652 in nsCycleCollector::MarkRoots(js::SliceBudget&) () at /home/emilio/firefox/libxul.so
#7  0x00007fd9641c4d61 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) ()
    at /home/emilio/firefox/libxul.so
#8  0x00007fd964933bc3 in nsCycleCollector_collect(nsICycleCollectorListener*) () at /home/emilio/firefox/libxul.so
#9  0x00007fd964929390 in mozilla::CycleCollectedJSRuntime::OnGC(JSContext*, JSGCStatus, JS::GCReason) ()
    at /home/emilio/firefox/libxul.so
#10 0x00007fd965a56c7b in js::gc::GCRuntime::maybeCallGCCallback(JSGCStatus, JS::GCReason) ()
    at /home/emilio/firefox/libxul.so
#11 0x00007fd965a572c3 in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#12 0x00007fd965a57958 in js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) () at /home/emilio/firefox/libxul.so
#13 0x00007fd965a4bf66 in js::gc::GCRuntime::startGC(JSGCInvocationKind, JS::GCReason, long) ()
    at /home/emilio/firefox/libxul.so
#14 0x00007fd9646d0ebe in JSString* js::AllocateStringImpl<JSString, (js::AllowGC)1>(JSContext*, js::gc::InitialHeap)
    () at /home/emilio/firefox/libxul.so
#15 0x00007fd9646263b8 in JSStructuredCloneReader::startRead(JS::MutableHandle<JS::Value>) ()
    at /home/emilio/firefox/libxul.so
#16 0x00007fd964623242 in ReadStructuredClone(JSContext*, JSStructuredCloneData const&, JS::StructuredCloneScope, JS::MutableHandle<JS::Value>, JS::CloneDataPolicy const&, JSStructuredCloneCallbacks const*, void*) ()
    at /home/emilio/firefox/libxul.so
#17 0x00007fd96598da49 in JS_ReadStructuredClone(JSContext*, JSStructuredCloneData const&, unsigned int, JS::StructuredCloneScope, JS::MutableHandle<JS::Value>, JS::CloneDataPolicy const&, JSStructuredCloneCallbacks const*, void*) ()
    at /home/emilio/firefox/libxul.so
#18 0x00007fd964f30565 in mozilla::dom::StructuredCloneHolder::ReadFromBuffer(nsIGlobalObject*, JSContext*, JSStructuredCloneData&, unsigned int, JS::MutableHandle<JS::Value>, JS::CloneDataPolicy const&, mozilla::ErrorResult&) ()
    at /home/emilio/firefox/libxul.so
#19 0x00007fd965278f7c in mozilla::dom::SharedMessageBody::Read(JSContext*, JS::MutableHandle<JS::Value>, mozilla::dom::RefMessageBodyService*, mozilla::dom::SharedMessageBody::ReadMethod, mozilla::ErrorResult&) ()
    at /home/emilio/firefox/libxul.so
#20 0x00007fd96529e306 in mozilla::dom::PostMessageRunnable::DispatchMessage() const ()
    at /home/emilio/firefox/libxul.so
#21 0x00007fd96529e125 in mozilla::dom::PostMessageRunnable::Run() () at /home/emilio/firefox/libxul.so
#22 0x00007fd9641df974 in nsThread::ProcessNextEvent(bool, bool*) () at /home/emilio/firefox/libxul.so
#23 0x00007fd9641dd80e in NS_ProcessPendingEvents(nsIThread*, unsigned int) () at /home/emilio/firefox/libxul.so
#24 0x00007fd964377178 in mozilla::MediaTrackGraphImpl::OneIterationImpl(long, long, mozilla::AudioMixer*) ()
    at /home/emilio/firefox/libxul.so
#25 0x00007fd96516eff2 in mozilla::GraphRunner::Run() () at /home/emilio/firefox/libxul.so
#26 0x00007fd9641df974 in nsThread::ProcessNextEvent(bool, bool*) () at /home/emilio/firefox/libxul.so
#27 0x00007fd964b7885e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ()
    at /home/emilio/firefox/libxul.so
#28 0x00007fd964b538cf in MessageLoop::Run() () at /home/emilio/firefox/libxul.so
#29 0x00007fd96497cdae in nsThread::ThreadFunc(void*) () at /home/emilio/firefox/libxul.so
#30 0x00007fd96a84ab6e in _pt_root () at /home/emilio/firefox/libnspr4.so

Also, interestingly, they seem to be pretty close to each 10 minutes:

$ coredumpctl -r
Thu 2020-05-28 17:31:55 CEST 118563  1000  1000  24 present   /home/emilio/firefox/firefox-bin
Thu 2020-05-28 17:21:49 CEST 117869  1000  1000  24 present   /home/emilio/firefox/firefox-bin
Thu 2020-05-28 17:11:38 CEST 116880  1000  1000  24 present   /home/emilio/firefox/firefox-bin
Thu 2020-05-28 17:01:26 CEST 116246  1000  1000  24 present   /home/emilio/firefox/firefox-bin
Thu 2020-05-28 16:51:16 CEST 115532  1000  1000  24 present   /home/emilio/firefox/firefox-bin
Thu 2020-05-28 16:41:07 CEST 111855  1000  1000  24 present   /home/emilio/firefox/firefox-bin

(Probably the other 10 seconds or such on top are the time it took me to realize I had crashed and refresh the tab)

Any idea?

This is more like the CC.

Component: JavaScript Engine → XPCOM

It looks like we're running low on memory, so we run a GC, but there's already a CC in progress, so we try to finish it synchronously. I don't know why that would crash, but if we're low on memory, and leaking lots of DOM stuff, then the CC's giant hash table is often a thing we crash while manipulating.

Maybe gsvelto has an idea why there's no crash report.

Flags: needinfo?(gsvelto)

I suspect SIGXCPU is not really catchable, but ICBW

Attached file memory-report2.json.gz

Here's a memory report taken a few seconds before crashing... Do you see anything interesting here Andrew?

Flags: needinfo?(continuation)

Also I took a profile and the main thread seems pretty idle, I was about to take a profile at workers but then I crashed.

Hmm, maybe SIGXCPU can be caught, see bug 1575883...

Attached file memory-report5.json.gz

(A bit closer to the crash)

Memory usage does seem to continuously increase.

https://share.firefox.dev/2XbwqN2 is a profile a bit before the crash too, in case it is useful.

https://share.firefox.dev/3gyqh59 is a profile of the host of the meeting (also running Linux), but which hasn't crashed yet. I'll post a memory report of that ASAP.

Memory report of the aforementioned machine.

SIGXCPU can be caught, but breakpad doesn't support it so it won't work ATM. Crashpad's exception handler supports it, so maybe I can quickly back-port that support into breakpad. I'll file an issue so I won't forget.

Flags: needinfo?(gsvelto)

Oh, is the crash here from not the main thread? I'd assumed it was, but maybe not? We don't have incremental GC or CC off the main thread, so it can't be that. I guess the mozilla::ipc::MessagePumpForNonMainThreads::Run indicates it isn't the main thread. Oh, right, we must be calling the CC from CustomGCCallback(). We always run the CC from the GC on workers. Maybe if the worker is very active, it'll GC a lot of random points? Maybe we need to not CC every time we GC.

There are 7 different Zoom worker threads, which is probably more worker threads than I've ever seen. It looks like a lot of arrays.

See Also: → 1641761

Maybe there are a lot of objects in the CC graph on the worker thread, which causes bad performance in general, and between that and the worker threads being lower priority, it just takes a really long time to run an entire CC, which causes the SIGXCPU. I think the next step here would be to figure out what is going on with the CC. Maybe javascript.options.mem.log will give you some stats in the browser console. Failing that, setting the command line options to get CC logs for workers might help.

Andrew, have you seen anything like this before, where a process gets killed with SIGXCPU due to heavy activity in workers?

Flags: needinfo?(continuation) → needinfo?(bugmail)

I believe these are (audio) worklet threads, not worker threads. That's why there's no WorkerThreadPrimaryRunnable in the fully symbolicated second backtrace in comment 0. Worklet threads seem to inherit the default nsThread priority of NS_NORMAL. Aside: Worker threads were bumped to normal thread priority in bug 1563950 in https://hg.mozilla.org/mozilla-central/rev/38729643d8c4.

Presumably the SIGXCPU stuff is related to https://searchfox.org/mozilla-central/source/dom/media/UnderrunHandlerLinux.cpp which is intentionally using it? It seems like that shouldn't be fatal then?

Flags: needinfo?(bugmail)

Oh, hm, it looks like the Worklet implementation also has its own PrimaryRunnable that should be on that stack too. It seems concerning that that isn't on the stack.

I'm hoping to learn more about the implementation of worklets soon, but in the meantime, I believe :baku is the (only) expert. NI'ing :baku.

Flags: needinfo?(amarchesini)

Karl wrote a lot of the audio worklet code. Karl, do you have time to work on this?

Flags: needinfo?(amarchesini) → needinfo?(karlt)

This looks like a failure to install the SIGXCPU handler.

Paul, the SIGXCPU handler is not installed when CanSandboxContent() returns false, but I haven't found a similar condition when choosing whether to use audioipc or when promoting the GraphRunner thread.
Should that be always installed?

Thank you for reporting, Emilio. Can you paste the "Sandbox" section from your about:support, please?

WorkletThread::PrimaryRunnable exists only for PaintWorklet because that is not yet hooked into the painting thread(s).

I don't know what to make of the consistent 10 minute intervals.

Blocks: 1575883
Component: XPCOM → Audio/Video: MediaStreamGraph
Flags: needinfo?(padenot)
Flags: needinfo?(karlt)
Flags: needinfo?(emilio)
OS: Unspecified → Linux

(In reply to Karl Tomlinson (:karlt) from comment #17)

This looks like a failure to install the SIGXCPU handler.

Paul, the SIGXCPU handler is not installed when CanSandboxContent() returns false, but I haven't found a similar condition when choosing whether to use audioipc or when promoting the GraphRunner thread.
Should that be always installed?

I think so, that's a mistake.

Flags: needinfo?(padenot)

I had disabled sandboxing to workaround bug 1639181 (in order to use DMD), so I had security.sandbox.content.level=0. So that makes sense and probably the handler should be installed regardless.

Flags: needinfo?(emilio)

I can send a patch :)

Assignee: nobody → emilio
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/06a51605468f
Install SIGXCPU handler even if sandbox is disabled. r=padenot
Summary: Zoom tab crashes frequently, but leaves no crash report. → Zoom tab crashes frequently with sandbox disabled, but leaves no crash report.
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
You need to log in before you can comment on or make changes to this bug.