Closed
Bug 1346356
Opened 4 years ago
Closed 4 years ago
Deadlock in the profiler because locked_profiler_stop notifies observers
Categories
(Core :: Gecko Profiler, defect, P1)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla55
| Tracking | Status | |
|---|---|---|
| firefox52 | --- | unaffected |
| firefox-esr52 | --- | unaffected |
| firefox53 | --- | unaffected |
| firefox54 | --- | unaffected |
| firefox55 | --- | fixed |
People
(Reporter: mstange, Assigned: njn)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(4 files, 1 obsolete file)
|
2.38 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
|
8.98 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
|
15.13 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
|
7.98 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
Example call stacks: > [...] > NS_InvokeByIndex > nsProfiler::StopProfiler > profiler_stop > locked_profiler_stop <-- !! > nsObserverService::NotifyObservers > SharedStub > PrepareAndDispatch > nsXPCWrappedJSClass::CallMethod > JS_CallFunctionValue > js::Call > [... js stuff] > mozilla::dom::Element::SetAttr > mozilla::dom::Element::SetAttrAndNotify > nsNodeUtils::AttributeChanged > mozilla::PresShell::AttributeChanged > mozilla::GeckoRestyleManager::AttributeChanged > nsImageBoxFrame::AttributeChanged > mozilla::PresShell::FrameNeedsReflow > nsRefreshDriver::AddLayoutFlushObserver > profiler_get_backtrace > PR_Lock > [...] > mozilla::dom::ContentChild::RecvStopProfiler > profiler_stop > locked_profiler_stop <-- !! > nsObserverService::NotifyObservers > SharedStub > PrepareAndDispatch > nsXPCWrappedJSClass::CallMethod > JS_CallFunctionValue > js::Call > js::InternalCallOrConstruct > js::RunScript > [... js stuff] > js::jit::Bailout > js::jit::BailoutIonToBaseline > ProfilerJSEventMarker > PR_Lock
| Comment hidden (mozreview-request) |
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 4•4 years ago
|
||
Oh dear. I assigned this to myself and said I would look at it on Monday, but I ended up looking at it yesterday and writing some patches. They are quite similar to yours, but do a bit more refactoring as well, and I fixed a leak I found along the way. Also, I think yours misses the fact that profiler_init() calls profiler_start(). How about I upload them now for your consideration? They're a little under-tested, but I might as well show you them now given the circumstances.
| Assignee | ||
Comment 5•4 years ago
|
||
profiler_start() can call locked_profiler_stop(). In that case it must then call SamplerThread::Join() in order to free the SamplerThread, but it currently doesn't. This patch adds such a call. It also marks locked_profiler_start() with MOZ_MUST_USE to make this mistake less likely in the future.
Attachment #8846385 -
Flags: review?(mstange)
| Assignee | ||
Comment 6•4 years ago
|
||
SamplerThread::Join() is always called just before the SamplerThread object is deleted. Might as well combine them.
Attachment #8846386 -
Flags: review?(mstange)
| Assignee | ||
Comment 7•4 years ago
|
||
Calling NotifyObserver() with gPSMutex locked is a bad idea; arbitrary code can run in observers, which can easily include code that calls other profiler functions that themselves lock gPSMutex, causing deadlock. This has been seen in practise with locked_profiler_stop(). This patch moves all but one of the NotifyObserver() calls in platform.cpp to after the sections where gPSMutex is locked. The remaining call (for the "profiler-subprocess") is harmless, because it just calls a simple callback implemented within platform.cpp, and hard to move. In the future we plan to allow profiler_start() and profiler_stop() to be called from different threads. When that happens, it will be possible for the "profiler-start" and "profiler-stop" notifications to arrive out of order. Unfortunately this seems difficult to avoid. (Well, recursive mutexes would make this problem much easier, but we don't have those...)
Attachment #8846387 -
Flags: review?(mstange)
| Assignee | ||
Comment 8•4 years ago
|
||
This patch does the following. - Introduces NotifyObservers() for the simple notification cases in platform.cpp. - Removes profiler_lock() and profiler_unlock() because they do notifications that the profiler add-on no longer listens for.
Attachment #8846388 -
Flags: review?(mstange)
| Assignee | ||
Comment 9•4 years ago
|
||
It sounds like you have confirmed that this deadlock is the cause of bug 1273422? Good to know.
| Reporter | ||
Comment 10•4 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #4) Oh, oops. Well, I didn't spend very much time on it, and I trust your patches more than I trust mine. :) (In reply to Nicholas Nethercote [:njn] from comment #9) > It sounds like you have confirmed that this deadlock is the cause of bug > 1273422? Good to know. I'm not 100% certain but the log does say "###!!! ERROR: Potential deadlock detected:" so I think it's fairly likely. Thanks for uploading your patches. On the other hand now I can't promise I'll look at them before Monday (your Tuesday)...
| Assignee | ||
Comment 11•4 years ago
|
||
| mozreview-review | ||
Comment on attachment 8846352 [details] Bug 1346356 - Do not notify observers while gPSMutex is locked. https://reviewboard.mozilla.org/r/119408/#review121304 I will r- because my changes subsume yours, and ultimately I think leave the code in a better state overall. Sorry for the overlap!
Attachment #8846352 -
Flags: review?(n.nethercote) → review-
| Assignee | ||
Comment 12•4 years ago
|
||
> Thanks for uploading your patches. On the other hand now I can't promise
> I'll look at them before Monday (your Tuesday)...
It's a public holiday here on Monday anyway :)| Reporter | ||
Comment 13•4 years ago
|
||
Comment on attachment 8846385 [details] [diff] [review] (part 1) - Fix SamplerThread leak in profiler_start() Review of attachment 8846385 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +2359,4 @@ > LOG("END profiler_start"); > } > > +static MOZ_MUST_USE SamplerThread* Nice, I was about to suggest adding a MOZ_MUST_USE here.
Attachment #8846385 -
Flags: review?(mstange) → review+
| Reporter | ||
Updated•4 years ago
|
Attachment #8846386 -
Flags: review?(mstange) → review+
| Reporter | ||
Comment 14•4 years ago
|
||
Comment on attachment 8846387 [details] [diff] [review] (part 3) - Do most NotifyObserver() calls when gPSMutex is unlocked Review of attachment 8846387 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +2394,1 @@ > if (samplerThread) { Using samplerThread is much better than the separate wasActive variable that I was using.
Attachment #8846387 -
Flags: review?(mstange) → review+
| Reporter | ||
Updated•4 years ago
|
Attachment #8846388 -
Flags: review?(mstange) → review+
| Reporter | ||
Updated•4 years ago
|
Attachment #8846352 -
Attachment is obsolete: true
| Assignee | ||
Comment 15•4 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ec4e9dbdcb3f2a47e0618b6c97a3f753f92f1c9 Bug 1346356 (part 1) - Fix SamplerThread leak in profiler_start(). r=mstange. https://hg.mozilla.org/integration/mozilla-inbound/rev/8b0bfd19d13b861440f99b973249ac70f53cd457 Bug 1346356 (part 2) - Merge SamplerThread::Join() into ~SamplerThread(). r=mstange. https://hg.mozilla.org/integration/mozilla-inbound/rev/cbfe870385c09f0a63c46b586eeadb113cbde08a Bug 1346356 (part 3) - Do most NotifyObserver() calls when gPSMutex is unlocked. r=mstange. https://hg.mozilla.org/integration/mozilla-inbound/rev/e4725643ead636a7cb9b8f2103f68a31fe278dbe Bug 1346356 (part 4) - Clean up notification in the profiler. r=mstange.
| Assignee | ||
Comment 16•4 years ago
|
||
mstange, how did you encounter this? Just running the browser manually?
Comment 17•4 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/3ec4e9dbdcb3 https://hg.mozilla.org/mozilla-central/rev/8b0bfd19d13b https://hg.mozilla.org/mozilla-central/rev/cbfe870385c0 https://hg.mozilla.org/mozilla-central/rev/e4725643ead6
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
| Reporter | ||
Comment 18•4 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #16) > mstange, how did you encounter this? Just running the browser manually? Yes, and starting and stopping the profiler. I didn't do anything special.
Updated•4 years ago
|
status-firefox52:
--- → unaffected
status-firefox53:
--- → unaffected
status-firefox54:
--- → unaffected
status-firefox-esr52:
--- → unaffected
You need to log in
before you can comment on or make changes to this bug.
Description
•