Closed Bug 1896068 Opened 1 year ago Closed 1 year ago

Deadlock involving profiler mutex while stopping the profiler

Categories

(Core :: Gecko Profiler, defect, P1)

defect

Tracking

()

RESOLVED FIXED
128 Branch
Tracking Status
firefox128 --- fixed

People

(Reporter: jonco, Assigned: jonco)

References

Details

Attachments

(1 file)

As described in https://phabricator.services.mozilla.com/D209919, the patches in bug 1888429 produce a deadlock involving the Gecko Profiler mutex.

The stacks involved are below, with the first thread waiting on a JS mutex while holding the profiler mutex and the second thread waiting on profiler mutex:

PID 5404 | #01: js::GlobalHelperThreadState::wait(js::AutoLockHelperThreadState&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [js/src/vm/HelperThreads.cpp:924]
PID 5404 | #02: js::GlobalHelperThreadState::cancelOffThreadIonCompile(mozilla::Variant<JSScript*, JS::Zone*, js::ZonesInState, JSRuntime*> const&) [js/src/vm/HelperThreads.cpp:456]
PID 5404 | #03: js::ReleaseAllJITCode(JS::GCContext*) [js/src/gc/GCAPI.cpp:97]
PID 5404 | #04: js::GeckoProfilerRuntime::enable(bool) [js/src/vm/GeckoProfiler.cpp:89]
PID 5404 | #05: mozilla::profiler::ThreadRegistrationLockedRWOnThread::PollJSSampling() [tools/profiler/core/ProfilerThreadRegistrationData.cpp:241]
PID 5404 | #06: locked_profiler_stop(PSAutoLock const&) [tools/profiler/core/platform.cpp:0]
PID 5404 | #07: profiler_stop() [tools/profiler/core/platform.cpp:6647]
PID 5404 | #08: nsProfiler::StopProfiler(JSContext*, mozilla::dom::Promise**) [tools/profiler/gecko/nsProfiler.cpp:181]
PID 5404 | #09: ??? [/builds/worker/workspace/build/application/firefox/libxul.so + 0x383ab76]
PID 5404 | #10: CallMethodHelper::Call() [js/xpconnect/src/XPCWrappedNative.cpp:1174]
PID 5404 | #11: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1120]
PID 5404 | #12: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:966]
PID 5404 | #13: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) [js/src/vm/Interpreter.cpp:480]
PID 5404 | #14: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:574]
PID 5404 | #15: js::Interpret(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:0]
PID 5404 | #16: js::RunScript(JSContext*, js::RunState&) [js/src/vm/Interpreter.cpp:452]
PID 5404 | #17: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [js/src/vm/Interpreter.cpp:606]
PID 5404 | #18: js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) [js/src/jit/BaselineIC

And for the second thread:

PSAutoLock::PSAutoLock
ThreadRegistry::Register
ThreadRegistration::ThreadRegistration
ThreadRegistration::RegisterThread
profiler_register_thread
AutoProfilerRegisterThread::AutoProfilerRegisterThread
TaskController::RunPoolThread

This can be reproduced on try with the patches in bug 1888429 applied:

mach try fuzzy -q "'debug 'linux 'xpcshell" tools/profiler/tests/xpcshell

Based on mstange's comment in https://phabricator.services.mozilla.com/D209919:

I agree that it's a bad idea to call PollJSSampling() while the profiler
lock is being held. I think we should move all calls to PollJSSampling() out
of any locked_profiler_* functions.

I removed the calls to PollJSSampling() for the current thread from the places
this happened while the lock was held and moved them after the locked
region (replacing them with PollJSSamplingForCurrentThread()).

Mostly this was straightforward except for profiler_clear_js_context() since
PollJSSampling() needs to happen before ClearJSContext() clears mJSContext.

I confirmed that this fixes the deadlock.

Assignee: nobody → jcoppeard
Pushed by jcoppeard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1f67c438e6b5 Don't call into the JS engine with gecko profiler mutex held to avoid deadlock r=profiler-reviewers,mstange

The severity field is not set for this bug.
:canova, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(canaltinova)
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch
Severity: -- → S3
Flags: needinfo?(canaltinova)
Priority: -- → P1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: