Closed
Bug 1346356
Opened 8 years ago
Closed 8 years ago
Deadlock in the profiler because locked_profiler_stop notifies observers
Categories
(Core :: Gecko Profiler, defect, P1)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox52 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | fixed |
People
(Reporter: mstange, Assigned: n.nethercote)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(4 files, 1 obsolete file)
2.38 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
8.98 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
15.13 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
7.98 KB,
patch
|
mstange
:
review+
|
Details | Diff | Splinter Review |
Example call stacks:
> [...]
> NS_InvokeByIndex
> nsProfiler::StopProfiler
> profiler_stop
> locked_profiler_stop <-- !!
> nsObserverService::NotifyObservers
> SharedStub
> PrepareAndDispatch
> nsXPCWrappedJSClass::CallMethod
> JS_CallFunctionValue
> js::Call
> [... js stuff]
> mozilla::dom::Element::SetAttr
> mozilla::dom::Element::SetAttrAndNotify
> nsNodeUtils::AttributeChanged
> mozilla::PresShell::AttributeChanged
> mozilla::GeckoRestyleManager::AttributeChanged
> nsImageBoxFrame::AttributeChanged
> mozilla::PresShell::FrameNeedsReflow
> nsRefreshDriver::AddLayoutFlushObserver
> profiler_get_backtrace
> PR_Lock
> [...]
> mozilla::dom::ContentChild::RecvStopProfiler
> profiler_stop
> locked_profiler_stop <-- !!
> nsObserverService::NotifyObservers
> SharedStub
> PrepareAndDispatch
> nsXPCWrappedJSClass::CallMethod
> JS_CallFunctionValue
> js::Call
> js::InternalCallOrConstruct
> js::RunScript
> [... js stuff]
> js::jit::Bailout
> js::jit::BailoutIonToBaseline
> ProfilerJSEventMarker
> PR_Lock
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•8 years ago
|
||
Oh dear. I assigned this to myself and said I would look at it on Monday, but I ended up looking at it yesterday and writing some patches. They are quite similar to yours, but do a bit more refactoring as well, and I fixed a leak I found along the way.
Also, I think yours misses the fact that profiler_init() calls profiler_start().
How about I upload them now for your consideration? They're a little under-tested, but I might as well show you them now given the circumstances.
Assignee | ||
Comment 5•8 years ago
|
||
profiler_start() can call locked_profiler_stop(). In that case it must then
call SamplerThread::Join() in order to free the SamplerThread, but it currently
doesn't.
This patch adds such a call. It also marks locked_profiler_start() with
MOZ_MUST_USE to make this mistake less likely in the future.
Attachment #8846385 -
Flags: review?(mstange)
Assignee | ||
Comment 6•8 years ago
|
||
SamplerThread::Join() is always called just before the SamplerThread object is
deleted. Might as well combine them.
Attachment #8846386 -
Flags: review?(mstange)
Assignee | ||
Comment 7•8 years ago
|
||
Calling NotifyObserver() with gPSMutex locked is a bad idea; arbitrary code can
run in observers, which can easily include code that calls other profiler
functions that themselves lock gPSMutex, causing deadlock. This has been seen
in practise with locked_profiler_stop().
This patch moves all but one of the NotifyObserver() calls in platform.cpp to
after the sections where gPSMutex is locked. The remaining call (for the
"profiler-subprocess") is harmless, because it just calls a simple callback
implemented within platform.cpp, and hard to move.
In the future we plan to allow profiler_start() and profiler_stop() to be
called from different threads. When that happens, it will be possible for the
"profiler-start" and "profiler-stop" notifications to arrive out of order.
Unfortunately this seems difficult to avoid. (Well, recursive mutexes would
make this problem much easier, but we don't have those...)
Attachment #8846387 -
Flags: review?(mstange)
Assignee | ||
Comment 8•8 years ago
|
||
This patch does the following.
- Introduces NotifyObservers() for the simple notification cases in
platform.cpp.
- Removes profiler_lock() and profiler_unlock() because they do notifications
that the profiler add-on no longer listens for.
Attachment #8846388 -
Flags: review?(mstange)
Assignee | ||
Comment 9•8 years ago
|
||
It sounds like you have confirmed that this deadlock is the cause of bug 1273422? Good to know.
Reporter | ||
Comment 10•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #4)
Oh, oops. Well, I didn't spend very much time on it, and I trust your patches more than I trust mine. :)
(In reply to Nicholas Nethercote [:njn] from comment #9)
> It sounds like you have confirmed that this deadlock is the cause of bug
> 1273422? Good to know.
I'm not 100% certain but the log does say "###!!! ERROR: Potential deadlock detected:" so I think it's fairly likely.
Thanks for uploading your patches. On the other hand now I can't promise I'll look at them before Monday (your Tuesday)...
Assignee | ||
Comment 11•8 years ago
|
||
mozreview-review |
Comment on attachment 8846352 [details]
Bug 1346356 - Do not notify observers while gPSMutex is locked.
https://reviewboard.mozilla.org/r/119408/#review121304
I will r- because my changes subsume yours, and ultimately I think leave the code in a better state overall. Sorry for the overlap!
Attachment #8846352 -
Flags: review?(n.nethercote) → review-
Assignee | ||
Comment 12•8 years ago
|
||
> Thanks for uploading your patches. On the other hand now I can't promise
> I'll look at them before Monday (your Tuesday)...
It's a public holiday here on Monday anyway :)
Reporter | ||
Comment 13•8 years ago
|
||
Comment on attachment 8846385 [details] [diff] [review]
(part 1) - Fix SamplerThread leak in profiler_start()
Review of attachment 8846385 [details] [diff] [review]:
-----------------------------------------------------------------
::: tools/profiler/core/platform.cpp
@@ +2359,4 @@
> LOG("END profiler_start");
> }
>
> +static MOZ_MUST_USE SamplerThread*
Nice, I was about to suggest adding a MOZ_MUST_USE here.
Attachment #8846385 -
Flags: review?(mstange) → review+
Reporter | ||
Updated•8 years ago
|
Attachment #8846386 -
Flags: review?(mstange) → review+
Reporter | ||
Comment 14•8 years ago
|
||
Comment on attachment 8846387 [details] [diff] [review]
(part 3) - Do most NotifyObserver() calls when gPSMutex is unlocked
Review of attachment 8846387 [details] [diff] [review]:
-----------------------------------------------------------------
::: tools/profiler/core/platform.cpp
@@ +2394,1 @@
> if (samplerThread) {
Using samplerThread is much better than the separate wasActive variable that I was using.
Attachment #8846387 -
Flags: review?(mstange) → review+
Reporter | ||
Updated•8 years ago
|
Attachment #8846388 -
Flags: review?(mstange) → review+
Reporter | ||
Updated•8 years ago
|
Attachment #8846352 -
Attachment is obsolete: true
Assignee | ||
Comment 15•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ec4e9dbdcb3f2a47e0618b6c97a3f753f92f1c9
Bug 1346356 (part 1) - Fix SamplerThread leak in profiler_start(). r=mstange.
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b0bfd19d13b861440f99b973249ac70f53cd457
Bug 1346356 (part 2) - Merge SamplerThread::Join() into ~SamplerThread(). r=mstange.
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbfe870385c09f0a63c46b586eeadb113cbde08a
Bug 1346356 (part 3) - Do most NotifyObserver() calls when gPSMutex is unlocked. r=mstange.
https://hg.mozilla.org/integration/mozilla-inbound/rev/e4725643ead636a7cb9b8f2103f68a31fe278dbe
Bug 1346356 (part 4) - Clean up notification in the profiler. r=mstange.
Assignee | ||
Comment 16•8 years ago
|
||
mstange, how did you encounter this? Just running the browser manually?
Comment 17•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3ec4e9dbdcb3
https://hg.mozilla.org/mozilla-central/rev/8b0bfd19d13b
https://hg.mozilla.org/mozilla-central/rev/cbfe870385c0
https://hg.mozilla.org/mozilla-central/rev/e4725643ead6
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Reporter | ||
Comment 18•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #16)
> mstange, how did you encounter this? Just running the browser manually?
Yes, and starting and stopping the profiler. I didn't do anything special.
Updated•8 years ago
|
status-firefox52:
--- → unaffected
status-firefox53:
--- → unaffected
status-firefox54:
--- → unaffected
status-firefox-esr52:
--- → unaffected
You need to log in
before you can comment on or make changes to this bug.
Description
•