Move `do_GetService` out of profiler-locked code to avoid deadlock
Categories
(Core :: Gecko Profiler, defect, P1)
Tracking
()
People
(Reporter: mozbugz, Assigned: mozbugz)
References
Details
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
Setting low priority&severity because I've only experienced this locally, I cannot reproduce it on Try.
While running profiler gtests, I'm getting the following deadlock:
- Start profiling, capture profile.
- On main thread (MT), the profiler locks its mutex.
- MT:
StreamMetaJSCustomObject()
doesdo_GetService(NS_NETWORK_PROTOCOL_CONTRACTID_PREFIX "http", &res);
- MT: In that
do_GetService
,mozilla::ipc::GetIPCLauncher()
creates a new thread "IPC Launch". - MT:
nsThread::Init()
blocks until the new thread completes its initialization (which is when an initialstartup
event is handled). - In IPC Launch (IL),
nsThread::ThreadFunc()
callsprofiler_register_thread()
before starting the event loop that would handle the startup event. - IL:
profiler_register_thread()
tries to lock the profiler thread, which was taken on the MT at step 2, but won't be released because the MT is waiting for IL here to finish its startup.
I think the solution here is to move all do_GetService
calls before the profiler-locked scope.
Assignee | ||
Comment 1•4 years ago
|
||
Main thread stack:
ReentrantMonitorAutoEnter::Wait(unsigned int aInterval)
nsThreadStartupEvent::Wait()
nsThread::Init(const nsTSubstring<char> & aName)
nsThreadManager::NewNamedThread(const nsTSubstring<char> & aName, unsigned int aStackSize, nsIThread * * aResult)
NS_NewNamedThread(const nsTSubstring<char> & aName, nsIThread * * aResult, already_AddRefed<nsIRunnable> aInitialEvent, unsigned int aStackSize)
NS_NewNamedThread(const nsTSubstring<char> & aName, nsIThread * * aResult, nsIRunnable * aInitialEvent, unsigned int aStackSize)
mozilla::ipc::GetIPCLauncher()
mozilla::ipc::BaseProcessLauncher::BaseProcessLauncher(mozilla::ipc::GeckoChildProcessHost * aHost, std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > && aExtraOpts)
mozilla::ipc::WindowsProcessLauncher::WindowsProcessLauncher(mozilla::ipc::GeckoChildProcessHost * aHost, std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > && aExtraOpts)
mozilla::ipc::GeckoChildProcessHost::AsyncLaunch(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > aExtraOpts)
mozilla::ipc::GeckoChildProcessHost::LaunchAndWaitForProcessHandle(std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > aExtraOpts)
mozilla::net::SocketProcessHost::Launch() Line 76 (c:\mozilla-source\mozilla-central\netwerk\ipc\SocketProcessHost.cpp:76)
mozilla::net::nsIOService::LaunchSocketProcess()
mozilla::net::nsHttpHandler::Init()
mozilla::net::nsHttpHandler::GetInstance()
mozCreateComponent<mozilla::net::nsHttpHandler>()
mozilla::xpcom::CreateInstanceImpl(mozilla::xpcom::ModuleID aID, nsISupports * aOuter, const nsID & aIID, void * * aResult)
`anonymous namespace'::EntryWrapper::CreateInstance(nsISupports * aOuter, const nsID & aIID, void * * aResult)
nsComponentManagerImpl::GetServiceLocked(`anonymous namespace'::MutexLock & aLock, `anonymous namespace'::EntryWrapper & aEntry, const nsID & aIID, void * * aResult)
nsComponentManagerImpl::GetServiceByContractID(const char * aContractID, const nsID & aIID, void * * aResult)
CallGetService(const char * aContractID, const nsID & aIID, void * * aResult)
nsGetServiceByContractIDWithError::operator()(const nsID & aIID, void * * aInstancePtr)
xul.dll!nsCOMPtr<nsIHttpProtocolHandler>::assign_from_gs_contractid_with_error(const nsGetServiceByContractIDWithError & aGS, const nsID & aIID)
nsCOMPtr<nsIHttpProtocolHandler>::nsCOMPtr(const nsGetServiceByContractIDWithError & aGS)
StreamMetaJSCustomObject(const PSAutoLock & aLock, SpliceableJSONWriter & aWriter, bool aIsShuttingDown)
locked_profiler_stream_json_for_this_process(const PSAutoLock & aLock, SpliceableJSONWriter & aWriter, double aSinceTime, bool aIsShuttingDown, ProfilerCodeAddressService * aService)
profiler_stream_json_for_this_process(SpliceableJSONWriter & aWriter, double aSinceTime, bool aIsShuttingDown, ProfilerCodeAddressService * aService)
xul.dll!GeckoProfiler_Markers_Test::TestBody()
IPC Launch stack:
PSMutex::Lock()
PSAutoLock::PSAutoLock(PSMutex & aMutex)
profiler_register_thread(const char * aName, void * aGuessStackTop)
nsThread::ThreadFunc(void * aArg)
Assignee | ||
Comment 2•4 years ago
|
||
This could explain some intermittent timeouts. So I'll work on it now, to hopefully fix these or at least remove this potential cause from future consideration.
Assignee | ||
Comment 3•4 years ago
|
||
Some profile meta information can be gathered before the profiler mutex must be locked.
This reduces the main-thread locking when capturing.
Most importantly, it prevents deadlocking in case any of the data-gathering operation would itself rely on profiler-locking functions (e.g., starting a thread, which requires the lock to register the new thread with the profiler).
Depends on D81451
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6a8b446a95a0 Pre-record some meta information before locking the profiler mutex - r=canaltinova
Comment 5•4 years ago
|
||
Backed out for assertion failure on ExtensionPolicyService.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/fcb6e5236bd628dcb6dca78d52eb5df528822a61
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307978493&repo=autoland&lineNumber=3788
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/39d32b376a20 Pre-record some meta information before locking the profiler mutex - r=canaltinova
Comment 7•4 years ago
|
||
bugherder |
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 8•4 years ago
|
||
Comment on attachment 9159999 [details]
Bug 1649056 - Pre-record some meta information before locking the profiler mutex - r?canaltinova
Beta/Release Uplift Approval Request
- User impact if declined: This would primarily appear as a gtest failure when building beta. Not sure if this failure happens in Firefox itself (no reports so far).
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It's only moving some existing code a bit earlier during a Profiler capture.
And it only touches Profiler code, which is used by a small proportion of users. - String changes made/needed:
Comment 9•4 years ago
|
||
Comment on attachment 9159999 [details]
Bug 1649056 - Pre-record some meta information before locking the profiler mutex - r?canaltinova
Approved for 79.0b5.
Comment 10•4 years ago
|
||
bugherder uplift |
Description
•