Deadlock involving profiler mutex while stopping the profiler
Categories
(Core :: Gecko Profiler, defect, P1)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox128 | --- | fixed |
People
(Reporter: jonco, Assigned: jonco)
References
Details
Attachments
(1 file)
As described in https://phabricator.services.mozilla.com/D209919, the patches in bug 1888429 produce a deadlock involving the Gecko Profiler mutex.
The stacks involved are below, with the first thread waiting on a JS mutex while holding the profiler mutex and the second thread waiting on profiler mutex:
PID 5404 | #01: js::GlobalHelperThreadState::wait(js::AutoLockHelperThreadState&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [js/src/vm/HelperThreads.cpp:924]
PID 5404 | #02: js::GlobalHelperThreadState::cancelOffThreadIonCompile(mozilla::Variant<JSScript*, JS::Zone*, js::ZonesInState, JSRuntime*> const&) [js/src/vm/HelperThreads.cpp:456]
PID 5404 | #03: js::ReleaseAllJITCode(JS::GCContext*) [js/src/gc/GCAPI.cpp:97]
PID 5404 | #04: js::GeckoProfilerRuntime::enable(bool) [js/src/vm/GeckoProfiler.cpp:89]
PID 5404 | #05: mozilla::profiler::ThreadRegistrationLockedRWOnThread::PollJSSampling() [tools/profiler/core/ProfilerThreadRegistrationData.cpp:241]
PID 5404 | #06: locked_profiler_stop(PSAutoLock const&) [tools/profiler/core/platform.cpp:0]
PID 5404 | #07: profiler_stop() [tools/profiler/core/platform.cpp:6647]
PID 5404 | #08: nsProfiler::StopProfiler(JSContext*, mozilla::dom::Promise**) [tools/profiler/gecko/nsProfiler.cpp:181]
PID 5404 | #09: ??? [/builds/worker/workspace/build/application/firefox/libxul.so + 0x383ab76]
PID 5404 | #10: CallMethodHelper::Call() [js/xpconnect/src/XPCWrappedNative.cpp:1174]
PID 5404 | #11: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1120]
PID 5404 | #12: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:966]
PID 5404 | #13: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) [js/src/vm/Interpreter.cpp:480]
PID 5404 | #14: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:574]
PID 5404 | #15: js::Interpret(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:0]
PID 5404 | #16: js::RunScript(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:452]
PID 5404 | #17: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:606]
PID 5404 | #18: js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) [js/src/jit/BaselineIC
And for the second thread:
PSAutoLock::PSAutoLock
ThreadRegistry::Register
ThreadRegistration::ThreadRegistration
ThreadRegistration::RegisterThread
profiler_register_thread
AutoProfilerRegisterThread::AutoProfilerRegisterThread
TaskController::RunPoolThread
This can be reproduced on try with the patches in bug 1888429 applied:
mach try fuzzy -q "'debug 'linux 'xpcshell" tools/profiler/tests/xpcshell
Comment 1•1 year ago
|
||
Additional discussion happened in https://phabricator.services.mozilla.com/D209919
| Assignee | ||
Comment 2•1 year ago
|
||
Based on mstange's comment in https://phabricator.services.mozilla.com/D209919:
I agree that it's a bad idea to call PollJSSampling() while the profiler
lock is being held. I think we should move all calls to PollJSSampling() out
of any locked_profiler_* functions.
I removed the calls to PollJSSampling() for the current thread from the places
this happened while the lock was held and moved them after the locked
region (replacing them with PollJSSamplingForCurrentThread()).
Mostly this was straightforward except for profiler_clear_js_context() since
PollJSSampling() needs to happen before ClearJSContext() clears mJSContext.
I confirmed that this fixes the deadlock.
| Assignee | ||
Updated•1 year ago
|
Comment 4•1 year ago
|
||
The severity field is not set for this bug.
:canova, could you have a look please?
For more information, please visit BugBot documentation.
Description
•