Open Bug 1804062 Opened 1 year ago Updated 1 year ago

Deadlock when profiling and recording power metrics

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

People

(Reporter: padenot, Unassigned)

Details

There are four mutex at play here:

M0 is the thread registry lock from the Firefox Profiler
M1 is the LabeledMirrorLock from Glean
M2 is the lock to record scalars from Telemetry
M3 is the lock for a particular nsTimer

With the notation x -> y with locks x and y meaning x was held and y is now acquired, we have:

M0 -> M1 -> M2 -> M3 -> M0

The stacks are the following:

Mutex M0 acquired by main thread here:
#0 pthread_rwlock_rdlock ??:?
#1 mozilla::detail::RWLockImpl::readLock() ??:?
#2 profiler_record_wakeup_count(nsTSubstring<char> const&) /objdir-ff-tsan/dist/include/mozilla/BaseProfilerDetail.h:224
#3 BaseProfilerAutoLockShared /objdir-ff-tsan/dist/include/mozilla/BaseProfilerDetail.h:266
#4 LockedRegistry /objdir-ff-tsan/dist/include/mozilla/ProfilerThreadRegistry.h:217
#5 profiler_record_wakeup_count(nsTSubstring<char> const&) /tools/profiler/core/platform.cpp:6646
#6 mozilla::glean::RecordPowerMetrics() /toolkit/components/glean/ipc/FOGIPC.cpp:357
#7 mozilla::dom::ContentChild::RecvNotifyProcessPriorityChanged(mozilla::hal::ProcessPriority const&) /dom/ipc/ContentChild.cpp:2802

Mutex M1 acquired here while holding mutex M0 in main thread:
#0 pthread_mutex_lock ??:?
#1 mozilla::detail::MutexImpl::lock() ??:?
#2 Lock /objdir-ff-tsan/dist/include/mozilla/Mutex.h:65
#3 Lock /objdir-ff-tsan/dist/include/mozilla/StaticMutex.h:39
#4 AutoLockBase /objdir-ff-tsan/dist/include/mozilla/DataMutex.h:82
#5 Lock /objdir-ff-tsan/dist/include/mozilla/DataMutex.h:96
#6 GetLabeledMirrorLock /objdir-ff-tsan/dist/include/mozilla/glean/bindings/ScalarGIFFTMap.h:33
#7 mozilla::glean::impl::CounterMetric::Add(int) const /toolkit/components/glean/bindings/private/Counter.cpp:26
#8 mozilla::glean::RecordThreadCpuUse(nsTSubstring<char> const&, unsigned long, unsigned long) /toolkit/components/glean/ipc/FOGIPC.cpp:165
#9 mozilla::profiler::ThreadRegistrationUnlockedConstReaderAndAtomicRW::RecordWakeCount() const /tools/profiler/core/ProfilerThreadRegistrationData.cpp:290
#10 profiler_record_wakeup_count(nsTSubstring<char> const&) /tools/profiler/core/platform.cpp:6650
#11 mozilla::glean::RecordPowerMetrics() /toolkit/components/glean/ipc/FOGIPC.cpp:357
#12 mozilla::dom::ContentChild::RecvNotifyProcessPriorityChanged(mozilla::hal::ProcessPriority const&) /dom/ipc/ContentChild.cpp:2802

Mutex M2 acquired here while holding mutex M1 in main thread:
#0 pthread_mutex_lock ??:?
#1 mozilla::detail::MutexImpl::lock() ??:?
#2 Lock /objdir-ff-tsan/dist/include/mozilla/Mutex.h:65
#3 Lock /objdir-ff-tsan/dist/include/mozilla/StaticMutex.h:39
#4 BaseAutoLock /objdir-ff-tsan/dist/include/mozilla/Mutex.h:236
#5 Add /toolkit/components/telemetry/core/TelemetryScalar.cpp:2632
#6 mozilla::Telemetry::ScalarAdd(mozilla::Telemetry::ScalarID, nsTSubstring<char16_t> const&, unsigned int) /toolkit/components/telemetry/core/Telemetry.cpp:2040
#7 operator()<mozilla::DataMutexBase<mozilla::UniquePtr<nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > >, mozilla::DefaultDelete<nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > > > >, mozilla::StaticMutexNameless>::AutoLockBase<mozilla::UniquePtr<nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > >, mozilla::DefaultDelete<nsBaseHashtable<nsIntegralHashKey<unsigned int, 0>, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, nsDefaultConverter<mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> >, mozilla::Tuple<mozilla::Telemetry::ScalarID, nsTString<char16_t> > > > > > > > /toolkit/components/glean/bindings/private/Counter.cpp:29
#8 apply<(lambda at /toolkit/components/glean/bindings/private/Counter.cpp:26:36)> /objdir-ff-tsan/dist/include/mozilla/Maybe.h:602
#9 mozilla::glean::impl::CounterMetric::Add(int) const /toolkit/components/glean/bindings/private/Counter.cpp:26
#10 mozilla::glean::RecordPowerMetrics() /toolkit/components/glean/ipc/FOGIPC.cpp:332
#11 mozilla::dom::ContentChild::RecvNotifyProcessPriorityChanged(mozilla::hal::ProcessPriority const&) /dom/ipc/ContentChild.cpp:2802

Mutex M3 acquired here while holding mutex M2 in main thread:
#0 pthread_mutex_lock ??:?
#1 mozilla::detail::MutexImpl::lock() ??:?
#2 Lock /objdir-ff-tsan/dist/include/mozilla/Mutex.h:65
#3 Lock /objdir-ff-tsan/dist/include/mozilla/StaticMutex.h:39
#4 BaseAutoLock /objdir-ff-tsan/dist/include/mozilla/Mutex.h:236
#5 RemoveTimer /xpcom/threads/nsTimerImpl.cpp:107
#6 InitCommon /xpcom/threads/nsTimerImpl.cpp:395
#7 InitHighResolutionWithNamedFuncCallback /xpcom/threads/nsTimerImpl.cpp:427
#8 InitWithNamedFuncCallback /xpcom/threads/nsTimerImpl.cpp:413
#9 InitWithNamedFuncCallback /xpcom/threads/nsTimerImpl.h:200
#10 DoArmIPCTimerMainThread /toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:88
#11 ArmIPCTimer /toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:103
#12 mozilla::TelemetryIPCAccumulator::RecordChildScalarAction(unsigned int, bool, mozilla::Telemetry::ScalarActionType, mozilla::Variant<unsigned int, bool, nsTString<char16_t> > const&) /toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:186
#13 TelemetryScalar::Add(mozilla::Telemetry::ScalarID, unsigned int) /toolkit/components/telemetry/core/TelemetryScalar.cpp:2595
#14 mozilla::Telemetry::ScalarAdd(mozilla::Telemetry::ScalarID, unsigned int) /toolkit/components/telemetry/core/Telemetry.cpp:2019
#15 mozilla::glean::impl::CounterMetric::Add(int) const /toolkit/components/glean/bindings/private/Counter.cpp:24
#16 mozilla::glean::RecordPowerMetrics() /toolkit/components/glean/ipc/FOGIPC.cpp:331
#17 mozilla::dom::ContentChild::RecvNotifyProcessPriorityChanged(mozilla::hal::ProcessPriority const&) /dom/ipc/ContentChild.cpp:2802

Mutex M0 acquired here while holding mutex M3 in main thread:
#0 pthread_rwlock_rdlock ??:?
#1 mozilla::detail::RWLockImpl::readLock() ??:?
#2 LockShared /objdir-ff-tsan/dist/include/mozilla/BaseProfilerDetail.h:224
#3 BaseProfilerAutoLockShared /objdir-ff-tsan/dist/include/mozilla/BaseProfilerDetail.h:266
#4 LockedRegistry /objdir-ff-tsan/dist/include/mozilla/ProfilerThreadRegistry.h:217
#5 WithOffThreadRefOr<(lambda at /objdir-ff-tsan/dist/include/mozilla/ProfilerThreadState.h:69:7), bool> /objdir-ff-tsan/dist/include/mozilla/ProfilerThreadRegistry.h:270
#6 profiler_thread_is_being_profiled /objdir-ff-tsan/dist/include/mozilla/ProfilerThreadState.h:67
#7 profiler_thread_is_being_profiled_for_markers /objdir-ff-tsan/dist/include/mozilla/ProfilerMarkers.h:129
#8 AddTimer /xpcom/threads/TimerThread.cpp:635
#9 TimerThreadWrapper::AddTimer(nsTimerImpl*, mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /xpcom/threads/nsTimerImpl.cpp:100
#10 InitCommon /xpcom/threads/nsTimerImpl.cpp:406
#11 InitHighResolutionWithNamedFuncCallback /xpcom/threads/nsTimerImpl.cpp:427
#12 InitWithNamedFuncCallback /xpcom/threads/nsTimerImpl.cpp:413
#13 InitWithNamedFuncCallback /xpcom/threads/nsTimerImpl.h:200
#14 SetTimerInternal /xpcom/threads/IdleTaskRunner.cpp:274
#15 Schedule /xpcom/threads/IdleTaskRunner.cpp:213
#16 mozilla::ScheduleTimedOut(nsITimer*, void*) /xpcom/threads/IdleTaskRunner.cpp:172
#17 operator() /xpcom/threads/nsTimerImpl.cpp:661
#18 matchN<mozilla::Variant<nsTimerImpl::UnknownCallback, nsCOMPtr<nsITimerCallback>, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback> &, (lambda at /xpcom/threads/nsTimerImpl.cpp:661:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:662:7)> /objdir-ff-tsan/dist/include/mozilla/Variant.h:309
#19 matchN<mozilla::Variant<nsTimerImpl::UnknownCallback, nsCOMPtr<nsITimerCallback>, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback> &, (lambda at /xpcom/threads/nsTimerImpl.cpp:658:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:661:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:662:7)> /objdir-ff-tsan/dist/include/mozilla/Variant.h:318
#20 matchN<mozilla::Variant<nsTimerImpl::UnknownCallback, nsCOMPtr<nsITimerCallback>, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback> &, (lambda at /xpcom/threads/nsTimerImpl.cpp:657:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:658:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:661:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:662:7)> /objdir-ff-tsan/dist/include/mozilla/Variant.h:318
#21 matchN<mozilla::Variant<nsTimerImpl::UnknownCallback, nsCOMPtr<nsITimerCallback>, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback> &, (lambda at /xpcom/threads/nsTimerImpl.cpp:656:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:657:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:658:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:661:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:662:7)> /objdir-ff-tsan/dist/include/mozilla/Variant.h:318
#22 matchN<mozilla::Variant<nsTimerImpl::UnknownCallback, nsCOMPtr<nsITimerCallback>, nsCOMPtr<nsIObserver>, nsTimerImpl::FuncCallback, nsTimerImpl::ClosureCallback> &, (lambda at /xpcom/threads/nsTimerImpl.cpp:656:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:657:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:658:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:661:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:662:7)> /objdir-ff-tsan/dist/include/mozilla/Variant.h:902
#23 match<(lambda at /xpcom/threads/nsTimerImpl.cpp:656:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:657:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:658:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:661:7), (lambda at /xpcom/threads/nsTimerImpl.cpp:662:7)> /objdir-ff-tsan/dist/include/mozilla/Variant.h:857
#24 Fire /xpcom/threads/nsTimerImpl.cpp:655
#25 nsTimerEvent::Run() /xpcom/threads/TimerThread.cpp:365

The severity field is not set for this bug.
:florian, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(florian)
Component: Performance Monitoring → Gecko Profiler
Product: Toolkit → Core
Flags: needinfo?(florian) → needinfo?(canaltinova)

Interesting, I will take a closer look at it soon but how often do you encounter this?
If I'm not mistaken, it looks like glean and profiler try to access the profiler thread registery mutex at the same time. We could maybe return early on glean if the profiler is running?

Severity: -- → S3
Flags: needinfo?(canaltinova)
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.