Open Bug 1873170 Opened 8 months ago Updated 6 months ago

Reentrancy deadlock in TimerThreadWrapper / nsTimerImpl on a profiler-only path

Categories

(Core :: Gecko Profiler, defect, P2)

Firefox 123
Desktop
Windows
defect

Tracking

()

Tracking Status
firefox123 --- affected

People

(Reporter: yannis, Unassigned)

References

Details

Attachments

(1 file)

Experienced in today's Nightly 123. See attachment for all threads' stacks but I think the main thread stack is enough to explain the deadlock.

TimerThreadWrapper::AddTimer takes sMutex and calls TimerThread::AddTimer which contains this code:

  /* NOTE: RacyFeatures::sActiveAndFeatures has value 0x801f5102 in my deadlocked process */
  if (profiler_thread_is_being_profiled_for_markers(mProfilerThreadId)) {
    nsAutoCString name;
    aTimer->GetName(name, aProofOfLock);
    /* ... */
  }

But nsTimerImpl::GetName can run a JavaScript getter and reach xul!TimerThreadWrapper::RemoveTimer through garbage collection:

05 (Inline Function) --------`--------     xul!TimerThreadWrapper::RemoveTimer+0x1b [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.cpp @ 109] 
06 (Inline Function) --------`--------     xul!nsTimerImpl::CancelImpl+0x30 [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.cpp @ 506] 
07 (Inline Function) --------`--------     xul!nsTimerImpl::Cancel+0x30 [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.cpp @ 496] 
08 0000002e`2e5fa750 00007ffd`8c4fbaf4     xul!nsTimer::Cancel+0x62 [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.h @ 199] 
09 (Inline Function) --------`--------     xul!nsCOMPtr<nsITimer>::operator bool+0xd [/builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h @ 767] 
0a (Inline Function) --------`--------     xul!mozilla::IdleTaskRunner::ResetTimer+0x27 [/builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp @ 274] 
0b (Inline Function) --------`--------     xul!mozilla::CCGCScheduler::EnsureOrResetGCRunner+0x8a [/builds/worker/checkouts/gecko/dom/base/CCGCScheduler.cpp @ 654] 
0c 0000002e`2e5fa960 00007ffd`8d7ffe7a     xul!DOMGCSliceCallback+0x2d4 [/builds/worker/checkouts/gecko/dom/base/nsJSEnvironment.cpp @ 1889] 
0d (Inline Function) --------`--------     xul!js::gcstats::Statistics::endSlice+0x4d9 [/builds/worker/checkouts/gecko/js/src/gc/Statistics.cpp @ 1260] 
0e (Inline Function) --------`--------     xul!js::gcstats::AutoGCSlice::~AutoGCSlice+0x4d9 [/builds/worker/checkouts/gecko/js/src/gc/Statistics.h @ 511] 
0f 0000002e`2e5fabb0 00007ffd`8bad0ebf     xul!js::gc::GCRuntime::gcCycle+0xe3a [/builds/worker/checkouts/gecko/js/src/gc/GC.cpp @ 4254] 
10 0000002e`2e5fb9b0 00007ffd`8ba09516     xul!js::gc::GCRuntime::collect+0x13f [/builds/worker/checkouts/gecko/js/src/gc/GC.cpp @ 4438] 
11 (Inline Function) --------`--------     xul!js::gc::GCRuntime::gcSlice+0x12 [/builds/worker/checkouts/gecko/js/src/gc/GC.cpp @ 4539] 
12 0000002e`2e5fba40 00007ffd`8ba08e9c     xul!js::gc::GCRuntime::gcIfRequestedImpl+0xb6 [/builds/worker/checkouts/gecko/js/src/gc/GC.cpp @ 4718] 
13 (Inline Function) --------`--------     xul!js::gc::GCRuntime::gcIfRequested+0x7 [/builds/worker/checkouts/gecko/js/src/gc/GCRuntime.h @ 327] 
14 (Inline Function) --------`--------     xul!HandleInterrupt+0xe [/builds/worker/checkouts/gecko/js/src/vm/Runtime.cpp @ 390] 
15 0000002e`2e5fbaf0 00007ffd`8d4d43f3     xul!JSContext::handleInterrupt+0x7c [/builds/worker/checkouts/gecko/js/src/vm/Runtime.cpp @ 483] 
16 0000002e`2e5fbda0 00007ffd`8d4b9f89     xul!js::Interpret+0x1e53 [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 1992] 
17 (Inline Function) --------`--------     xul!MaybeEnterInterpreterTrampoline+0x20 [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 393] 
18 (Inline Function) --------`--------     xul!js::RunScript+0xf02 [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 451] 
19 (Inline Function) --------`--------     xul!js::InternalCallOrConstruct+0x104e [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 605] 
1a (Inline Function) --------`--------     xul!InternalCall+0x104e [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 640] 
1b (Inline Function) --------`--------     xul!js::Call+0x108f [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 672] 
1c (Inline Function) --------`--------     xul!js::CallGetter+0x10ea [/builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp @ 794] 
1d (Inline Function) --------`--------     xul!CallGetter+0x1148 [/builds/worker/checkouts/gecko/js/src/vm/NativeObject.cpp @ 2069] 
1e (Inline Function) --------`--------     xul!GetExistingProperty+0x1256 [/builds/worker/checkouts/gecko/js/src/vm/NativeObject.cpp @ 2097] 
1f (Inline Function) --------`--------     xul!NativeGetPropertyInline+0x16b6 [/builds/worker/checkouts/gecko/js/src/vm/NativeObject.cpp @ 2245] 
20 0000002e`2e5fc280 00007ffd`8d31f053     xul!js::NativeGetProperty+0x16f9 [/builds/worker/checkouts/gecko/js/src/vm/NativeObject.cpp @ 2276] 
21 (Inline Function) --------`--------     xul!js::GetProperty+0x124 [/builds/worker/checkouts/gecko/js/src/vm/ObjectOperations-inl.h @ 117] 
22 0000002e`2e5fc5f0 00007ffd`8b9d8df9     xul!JS_ForwardGetPropertyTo+0x153 [/builds/worker/checkouts/gecko/js/src/vm/PropertyAndElement.cpp @ 598] 
23 (Inline Function) --------`--------     xul!JS_GetPropertyById+0x32 [/builds/worker/checkouts/gecko/js/src/vm/PropertyAndElement.cpp @ 617] 
24 0000002e`2e5fc650 00007ffd`8db80027     xul!JS_GetProperty+0xd9 [/builds/worker/checkouts/gecko/js/src/vm/PropertyAndElement.cpp @ 628] 
25 0000002e`2e5fc6f0 00007ffd`8db7e9e6     xul!nsXPCWrappedJS::CallMethod+0x14e7 [/builds/worker/checkouts/gecko/js/xpconnect/src/XPCWrappedJSClass.cpp @ 912] 
26 0000002e`2e5fcaf0 00007ffd`8dde1569     xul!PrepareAndDispatch+0x166 [/builds/worker/checkouts/gecko/xpcom/reflect/xptcall/md/win32/xptcstubs_x86_64.cpp @ 168] 
27 0000002e`2e5fcbd0 00007ffd`8bb4da5a     xul!SharedStub+0x49
28 (Inline Function) --------`--------     xul!nsTimerImpl::GetName::<lambda_22>::operator()+0x75 [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.cpp @ 779] 
29 (Inline Function) --------`--------     xul!mozilla::detail::VariantImplementation<unsigned char,1,nsCOMPtr<nsITimerCallback>,nsCOMPtr<nsIObserver>,nsTimerImpl::FuncCallback,nsTimerImpl::ClosureCallback>::matchN+0xc6 [/builds/worker/workspace/obj-build/dist/include/mozilla/Variant.h @ 309] 
2a (Inline Function) --------`--------     xul!mozilla::detail::VariantImplementation<unsigned char,0,nsTimerImpl::UnknownCallback,nsCOMPtr<nsITimerCallback>,nsCOMPtr<nsIObserver>,nsTimerImpl::FuncCallback,nsTimerImpl::ClosureCallback>::matchN+0xd2 [/builds/worker/workspace/obj-build/dist/include/mozilla/Variant.h @ 318] 
2b (Inline Function) --------`--------     xul!mozilla::Variant<nsTimerImpl::UnknownCallback,nsCOMPtr<nsITimerCallback>,nsCOMPtr<nsIObserver>,nsTimerImpl::FuncCallback,nsTimerImpl::ClosureCallback>::matchN+0xd2 [/builds/worker/workspace/obj-build/dist/include/mozilla/Variant.h @ 902] 
2c (Inline Function) --------`--------     xul!mozilla::Variant<nsTimerImpl::UnknownCallback,nsCOMPtr<nsITimerCallback>,nsCOMPtr<nsIObserver>,nsTimerImpl::FuncCallback,nsTimerImpl::ClosureCallback>::match+0xd2 [/builds/worker/workspace/obj-build/dist/include/mozilla/Variant.h @ 857] 
2d 0000002e`2e5fcc40 00007ffd`8db1826b     xul!nsTimerImpl::GetName+0xea [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.cpp @ 794] 
2e 0000002e`2e5fcc90 00007ffd`8db16c2c     xul!TimerThread::AddTimer+0xabb [/builds/worker/checkouts/gecko/xpcom/threads/TimerThread.cpp @ 1059] 
2f (Inline Function) --------`--------     xul!TimerThreadWrapper::AddTimer+0x33 [/builds/worker/checkouts/gecko/xpcom/threads/nsTimerImpl.cpp @ 102] 

When we reach TimerThreadWrapper::RemoveTimer we'll wait for sMutex but we are the thread holding it.

Interesting. :florian, I think you are more familiar with this area since you helped Gerald with the timer work. Do you have any pointers here?

Severity: -- → S3
Flags: needinfo?(florian)
Priority: -- → P2

Some observations from looking at this for a little bit:

  • As stated already, the immediate cause of the deadlock is TimerThreadWrapper (TTW) locking sMutex in TTW::AddTimer() and then blocking itself out when that cascades down into a TTW::RemoveTimer() call. From what I can tell, TTW needs that sMutex to protect against mThread going away while any of its functions are still using mThread. It is enforcing something stronger, though - no simultaneous calls to TimerThread (TT) functions. Switching to a different locking mechanism (like a multi-access lock that only prevents TTW::Init()/Shutdown() from being executed at the time of other TTW methods) would avoid this immediate cause of the deadlock and would be more correct.
  • However ... that would not really solve the problem because now it would just push the deadlock into TT itself, because TT:AddTimer()/RemoveTimer() each necessarily need exclusive access to the internal data structures of TT. However ... the point at which TT:AddTimer() is requesting the name from the timer is really after the point at which the internal data structures of TT are being accessed, so it should be able to release its lock before that point. (A similar statement applies to TT:RemoveTimer().)

I think that the above two changes combined would probably avoid the deadlock here. It's probably not too much work but changing locking behavior can be a bit worrisome because you can introduce a new deadlock (perhaps one that would show up when not using the profiler) if an assumption is incorrect. It would also be very difficult to know if the changes do resolve the deadlock due to the rarity of this deadlock.

It's a bit unnerving to me that GetName() can be such a landmine like this. (Where, by "landmine", I mean "triggers way more things to happen than one would probably expect".) I'm also wondering if there are other possible places where problems like this could occur. (At a quick glance, nsTimerEvent::Run() uses GetName() and it's not immediately clear to me in what conditions that occurs.)

I will create a bug to make the above-mentioned changes and will link it to this one.

See Also: → 1875116

(In reply to Nazım Can Altınova [:canova][:canaltinova on phabricator] from comment #1)

Interesting. :florian, I think you are more familiar with this area since you helped Gerald with the timer work. Do you have any pointers here?

Honestly, I would just redirect this to Justin, who has actually already commented.

Flags: needinfo?(florian)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: