Closed Bug 1649056 Opened 4 years ago Closed 4 years ago

Move `do_GetService` out of profiler-locked code to avoid deadlock

Categories

(Core :: Gecko Profiler, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox79 --- fixed
firefox80 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

Details

Attachments

(1 file)

Setting low priority&severity because I've only experienced this locally, I cannot reproduce it on Try.

While running profiler gtests, I'm getting the following deadlock:

  1. Start profiling, capture profile.
  2. On main thread (MT), the profiler locks its mutex.
  3. MT: StreamMetaJSCustomObject() does do_GetService(NS_NETWORK_PROTOCOL_CONTRACTID_PREFIX "http", &res);
  4. MT: In that do_GetService, mozilla::ipc::GetIPCLauncher() creates a new thread "IPC Launch".
  5. MT: nsThread::Init() blocks until the new thread completes its initialization (which is when an initial startup event is handled).
  6. In IPC Launch (IL), nsThread::ThreadFunc() calls profiler_register_thread() before starting the event loop that would handle the startup event.
  7. IL: profiler_register_thread() tries to lock the profiler thread, which was taken on the MT at step 2, but won't be released because the MT is waiting for IL here to finish its startup.

I think the solution here is to move all do_GetService calls before the profiler-locked scope.

Main thread stack:

ReentrantMonitorAutoEnter::Wait(unsigned int aInterval)
nsThreadStartupEvent::Wait()
nsThread::Init(const nsTSubstring<char> & aName)
nsThreadManager::NewNamedThread(const nsTSubstring<char> & aName, unsigned int aStackSize, nsIThread * * aResult)
NS_NewNamedThread(const nsTSubstring<char> & aName, nsIThread * * aResult, already_AddRefed<nsIRunnable> aInitialEvent, unsigned int aStackSize)
NS_NewNamedThread(const nsTSubstring<char> & aName, nsIThread * * aResult, nsIRunnable * aInitialEvent, unsigned int aStackSize)
mozilla::ipc::GetIPCLauncher()
mozilla::ipc::BaseProcessLauncher::BaseProcessLauncher(mozilla::ipc::GeckoChildProcessHost * aHost, std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > && aExtraOpts)
mozilla::ipc::WindowsProcessLauncher::WindowsProcessLauncher(mozilla::ipc::GeckoChildProcessHost * aHost, std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > && aExtraOpts)
mozilla::ipc::GeckoChildProcessHost::AsyncLaunch(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > aExtraOpts)
mozilla::ipc::GeckoChildProcessHost::LaunchAndWaitForProcessHandle(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > aExtraOpts)
mozilla::net::SocketProcessHost::Launch() Line 76 (c:\mozilla-source\mozilla-central\netwerk\ipc\SocketProcessHost.cpp:76)
mozilla::net::nsIOService::LaunchSocketProcess()
mozilla::net::nsHttpHandler::Init()
mozilla::net::nsHttpHandler::GetInstance()
mozCreateComponent<mozilla::net::nsHttpHandler>()
mozilla::xpcom::CreateInstanceImpl(mozilla::xpcom::ModuleID aID, nsISupports * aOuter, const nsID & aIID, void * * aResult)
`anonymous namespace'::EntryWrapper::CreateInstance(nsISupports * aOuter, const nsID & aIID, void * * aResult)
nsComponentManagerImpl::GetServiceLocked(`anonymous namespace'::MutexLock & aLock, `anonymous namespace'::EntryWrapper & aEntry, const nsID & aIID, void * * aResult)
nsComponentManagerImpl::GetServiceByContractID(const char * aContractID, const nsID & aIID, void * * aResult)
CallGetService(const char * aContractID, const nsID & aIID, void * * aResult)
nsGetServiceByContractIDWithError::operator()(const nsID & aIID, void * * aInstancePtr)
xul.dll!nsCOMPtr<nsIHttpProtocolHandler>::assign_from_gs_contractid_with_error(const nsGetServiceByContractIDWithError & aGS, const nsID & aIID)
nsCOMPtr<nsIHttpProtocolHandler>::nsCOMPtr(const nsGetServiceByContractIDWithError & aGS)
StreamMetaJSCustomObject(const PSAutoLock & aLock, SpliceableJSONWriter & aWriter, bool aIsShuttingDown)
locked_profiler_stream_json_for_this_process(const PSAutoLock & aLock, SpliceableJSONWriter & aWriter, double aSinceTime, bool aIsShuttingDown, ProfilerCodeAddressService * aService)
profiler_stream_json_for_this_process(SpliceableJSONWriter & aWriter, double aSinceTime, bool aIsShuttingDown, ProfilerCodeAddressService * aService)
xul.dll!GeckoProfiler_Markers_Test::TestBody()

IPC Launch stack:

PSMutex::Lock()
PSAutoLock::PSAutoLock(PSMutex & aMutex)
profiler_register_thread(const char * aName, void * aGuessStackTop)
nsThread::ThreadFunc(void * aArg)

This could explain some intermittent timeouts. So I'll work on it now, to hopefully fix these or at least remove this potential cause from future consideration.

Assignee: nobody → gsquelart
Priority: P3 → P1

Some profile meta information can be gathered before the profiler mutex must be locked.
This reduces the main-thread locking when capturing.
Most importantly, it prevents deadlocking in case any of the data-gathering operation would itself rely on profiler-locking functions (e.g., starting a thread, which requires the lock to register the new thread with the profiler).

Depends on D81451

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6a8b446a95a0
Pre-record some meta information before locking the profiler mutex - r=canaltinova
Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/39d32b376a20
Pre-record some meta information before locking the profiler mutex - r=canaltinova
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80
Flags: needinfo?(gsquelart)

Comment on attachment 9159999 [details]
Bug 1649056 - Pre-record some meta information before locking the profiler mutex - r?canaltinova

Beta/Release Uplift Approval Request

  • User impact if declined: This would primarily appear as a gtest failure when building beta. Not sure if this failure happens in Firefox itself (no reports so far).
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It's only moving some existing code a bit earlier during a Profiler capture.
    And it only touches Profiler code, which is used by a small proportion of users.
  • String changes made/needed:
Attachment #9159999 - Flags: approval-mozilla-beta?

Comment on attachment 9159999 [details]
Bug 1649056 - Pre-record some meta information before locking the profiler mutex - r?canaltinova

Approved for 79.0b5.

Attachment #9159999 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: