Closed Bug 1540340 Opened 5 years ago Closed 5 years ago

Re-locking non-recursive mutex in profiler_shutdown -> MOZ_LOG -> WriteFile -> IOInterposer -> profiler_get_backtrace

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox68 --- wontfix
firefox72 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

Details

Attachments

(2 files, 1 obsolete file)

Found when closing Firefox that was started with MOZ_LOG=prof:3 MOZ_PROFILER_STARTUP=1 ./mach run (which now effectively enables "mainthreadio" since bug 1539839) :

###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : StaticMutex (currently acquired)
calling context
#01: mozilla::BaseStaticMutex<mozilla::recordreplay::Behavior::Preserve>::Lock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\StaticMutex.h:39)
#02: mozilla::BaseAutoLock<PSMutex &>::BaseAutoLock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\Mutex.h:165)
#03: profiler_get_backtrace (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3730)
#04: mozilla::ProfilerIOInterposeObserver::Observe (c:\mozilla-source\mozilla-central\tools\profiler\gecko\ProfilerIOInterposeObserver.cpp:19)
#05: anonymous namespace'::PerThreadData::CallObservers (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:129) #06: mozilla::IOInterposer::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:474) #07: mozilla::IOInterposeObserver::Observation::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:382) #08:anonymous namespace'::WinIOAutoObservation::~WinIOAutoObservation (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:148)
#09: `anonymous namespace'::InterposedNtWriteFile (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:269)
#10: WriteFile[C:\WINDOWS\System32\KERNELBASE.dll +0x1ebda]
#11: SetLoadLibraryCallback[C:\Program Files\ConEmu\ConEmu\ConEmuHk64.dll +0x28a03]
#12: fflush[C:\WINDOWS\System32\ucrtbase.dll +0x16db8]
#13: write[C:\WINDOWS\System32\ucrtbase.dll +0x15df5]
#14: write[C:\WINDOWS\System32\ucrtbase.dll +0x15c7f]
#15: o_iswalpha[C:\WINDOWS\System32\ucrtbase.dll +0x158b7]
#16: fclose_nolock[C:\WINDOWS\System32\ucrtbase.dll +0x172ee]
#17: fclose[C:\WINDOWS\System32\ucrtbase.dll +0x176f0]
#18: vprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:262)

#19: fprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:287)
#20: mozilla::LogModuleManager::Print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:460)
#21: mozilla::LogModule::Printv (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:602)
#22: mozilla::detail::log_print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:71)
#23: locked_profiler_stop (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3392)
#24: profiler_shutdown (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:2918)

=== Cycle completed at
--- Mutex : StaticMutex (currently acquired)
calling context
#01: mozilla::BaseStaticMutex<mozilla::recordreplay::Behavior::Preserve>::Lock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\StaticMutex.h:39)
#02: mozilla::BaseAutoLock<PSMutex &>::BaseAutoLock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\Mutex.h:165)
#03: profiler_get_backtrace (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3730)
#04: mozilla::ProfilerIOInterposeObserver::Observe (c:\mozilla-source\mozilla-central\tools\profiler\gecko\ProfilerIOInterposeObserver.cpp:19)
#05: anonymous namespace'::PerThreadData::CallObservers (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:129) #06: mozilla::IOInterposer::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:474) #07: mozilla::IOInterposeObserver::Observation::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:382) #08:anonymous namespace'::WinIOAutoObservation::~WinIOAutoObservation (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:148)
#09: `anonymous namespace'::InterposedNtWriteFile (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:269)
#10: WriteFile[C:\WINDOWS\System32\KERNELBASE.dll +0x1ebda]
#11: SetLoadLibraryCallback[C:\Program Files\ConEmu\ConEmu\ConEmuHk64.dll +0x28a03]
#12: fflush[C:\WINDOWS\System32\ucrtbase.dll +0x16db8]
#13: write[C:\WINDOWS\System32\ucrtbase.dll +0x15df5]
#14: write[C:\WINDOWS\System32\ucrtbase.dll +0x15c7f]
#15: o_iswalpha[C:\WINDOWS\System32\ucrtbase.dll +0x158b7]
#16: fclose_nolock[C:\WINDOWS\System32\ucrtbase.dll +0x172ee]
#17: fclose[C:\WINDOWS\System32\ucrtbase.dll +0x176f0]
#18: vprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:262)
#19: fprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:287)
#20: mozilla::LogModuleManager::Print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:460)
#21: mozilla::LogModule::Printv (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:602)
#22: mozilla::detail::log_print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:71)
#23: locked_profiler_stop (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3392)
#24: profiler_shutdown (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:2918)

We've had a few similar issues with disk operations initiated from the profiler.

Maybe we need a more systematic way to disable the IO interposer when the profiler lock is taken?

Alternatively, maybe profiler_get_backtrace shouldn't use the lock? (Or use a different lock if stack-walking itself is not reentrant?)

Note: StaticMutex stacktraces gotten by commenting out MOZ_CALLSTACK_DISABLED in BlockingResourceBase.h:21.

Might also be what I was hitting when attempting to work on bug 1524574.

PSMutex can store the id of the thread owning the lock, and it's safe to check
whether the current thread owns that lock (either it does and no-one else can
change it, or it's another id).

Comments related to memory hooks have been moved to memory_hooks.cpp, because
profiler_is_locked_on_current_thread() could be used for other things.

Logging could be intercepted by the I/O interposer, which takes the lock, this
could deadlock if the mutex had already been taken in the same thread (e.g., by
a "locked" profiler function that wants to log something).

Now, functions that hold the lock must use LOG_LOCKED(lock, ...), which will
skip logging when the I/O interposer is running.
In DEBUG builds, LOG() checks that the lock is not owned, so find possible
misuses.

LOG()s outside of platform.cpp were changed to NS_WARNINGs, because they
could not access gPSMutex that's defined in platform.cpp; and they're really
warnings anyway.

Depends on D49895

I want to modify the PSMutex that Greg wrote in bug 1564475, to check if the current thread really owns the mutex or not.
Then those MOZ_LOGs in locked profiler functions/scopes could be rewritten to skip logging when the I/O interposer is running, thus avoiding a deadlock (at the cost of some lost logging).

Assignee: nobody → gsquelart
Depends on: 1564475
No longer depends on: 1545187
Attachment #9102922 - Attachment description: Bug 1540340 - PSMutex can more accurately if the current thread owns the lock - r?njn!,gregtatum → Bug 1540340 - PSMutex can more accurately determine if the current thread owns the lock - r?njn!,gregtatum
Attachment #9102923 - Attachment description: Bug 1540340 - Prevent logging when mainthredio is on, relocated some interposer pauses/resumes - r?gregtatum → Bug 1540340 - Prevent logging when mainthredio is enabled - r?gregtatum
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/be6a82fd75e9
PSMutex can more accurately determine if the current thread owns the lock - r=njn,gregtatum
https://hg.mozilla.org/integration/autoland/rev/4ee4cde55851
Prevent logging when mainthredio is enabled - r=gregtatum

Thank you Alexandru.

Note to self: NS_WARNING is not printf!
I see now that my Try didn't include Android. 😅

Flags: needinfo?(gsquelart)

While working on these issues, I realized that in the first patch I introduced just the right tool to solve this issue at the most common points:
Instead of trying to cleverly pause&resume the interposer around potential I/Os in locked sections of the profiler, and instead of trying not to produce some I/Os (mostly from logging), we could just have the interposer itself check if the mutex is already locked (with profiler_is_locked_on_current_thread()), in which case it just gives up and won't use potentially-locking profiler functions!

Hopefully this should handle all sources of profiler->IO->interposer->profiler deadlocks, so we won't have to revisit this again.

The source of deadlock is due to profiler->IO->interposer->profiler loops, where
the second profiler call tries to lock the profiler mutex again.
Thanks to profiler_is_locked_on_current_thread the interposer can now check
whether the mutex is already locked, in which case it just won't call profiler
functions anymore, breaking the loop.

Depends on D49895

Attachment #9102923 - Attachment is obsolete: true
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f0aabdb3bdb4
PSMutex can more accurately determine if the current thread owns the lock - r=njn,gregtatum
https://hg.mozilla.org/integration/autoland/rev/7e72dcc08ffb
Prevent ProfilerIOInterposeObserver recursive locking - r=gregtatum
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: