Closed
Bug 1382928
Opened 7 years ago
Closed 7 years ago
Intermittent deadlocks in GeckoProfiler gtest
Categories
(Core :: Gecko Profiler, enhancement)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: mstange, Assigned: mstange)
Details
Attachments
(2 files, 1 obsolete file)
The test is using the pattern thread->Dispatch(runnable, NS_DISPATCH_SYNC) in order to run code synchronously on another thread. This spins the event loop on the main thread while waiting for the runnable to complete. While I hadn't planned for this, there's also not a good reason why spinning the event loop should break anything, and in this case I think it has uncovered a real issue. The deadlock is a circular dependency between CheckResponsivenessTask::mMutex, the timer thread's thread startup mutex, and the profiler state lock. The main thread is running a CheckResponsivenessTask. That task is trying to launch a timer thread while holding mMutex. The separate thread that the test launches is currently executing profiler_stop, which is holding the profiler state lock. Inside profiler_lock, locked_profiler_stop is calling ThreadInfo::StopProfiling() which calls CheckResponsivenessTask::Terminate, which wants to grab the CheckResponsivenessTask's mMutex, which is currently being held by the main thread. In order for the main thread to make progress, the timer thread needs to finish launching. But the timer thread is currently trying to register itself with the profiler, and doing that is blocked on the profiler state lock.
Comment 1•7 years ago
|
||
Yikes! Nice diagnosis. I will be interested to see what the fix looks like :)
Assignee | ||
Comment 2•7 years ago
|
||
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•7 years ago
|
||
I think making mStop an atomic should be all that's needed. That's the only variable that's protected by the mutex.
Comment 5•7 years ago
|
||
mozreview-review |
Comment on attachment 8888948 [details] Bug 1382928 - Make CheckResponsivenessEvent::mStop an atomic. https://reviewboard.mozilla.org/r/159968/#review165426 I've never got my head around this file. Are you sure mMutex only protects mStop? What about mTimer and mLastTracerTime? ::: tools/profiler/gecko/ThreadResponsiveness.cpp:93 (Diff revision 1) > > NS_DECL_ISUPPORTS_INHERITED > > private: > TimeStamp mLastTracerTime; > Mutex mMutex; If this lands as is, you can remove: - mMutex; - the #include "mozilla/Mutex.h"; - the "using" lines involving Mutex.
Attachment #8888948 -
Flags: review?(n.nethercote)
Comment 6•7 years ago
|
||
Apologies for the slow review; I did it 3 days ago but apparently forgot to hit submit :(
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8888948 -
Attachment is obsolete: true
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #5) > I've never got my head around this file. Are you sure mMutex only protects > mStop? What about mTimer and mLastTracerTime? mTimer is only accessed on the main thread, and mLastTracerTime is explicitly documented as being unprotected. So in that way, the patch I had here didn't make things worse. But that's not a great state of affairs. So I've now gone ahead and tried to make mLastTracerTime an atomic as well, so that it's not subject to data races any more. Unfortunately, you can't store TimeStamp in an atomic, and you can't store a double in an atomic either, so now I'm storing uint64_t(profiler_time() * 1000.0). Microseconds should be accurate enough, and 2^64us ≈ 585000 years, so the range should be acceptable, too.
Comment hidden (mozreview-request) |
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8892269 [details] Bug 1382928 - Use atomics for thread-shared pieces of state in ThreadResponsiveness.cpp. https://reviewboard.mozilla.org/r/163206/#review168576 I have numerous comments about names below. Each one reflects my train of thought as I was looking at each piece of code. Having now finished the review, I suggest these renamings: aNow --> aStartToNow_ms mLastTracerTime --> aStartToPrevTracer_us (or aStartToPrevResp_us?) GetLastTracerTime() --> GetStartToLastTracer_ms (or "_us" suffix?) A bit clunky, but the AToB form makes the start and end points clear, and the suffix makes the units clear. What do you think? r=me with some improvement of the names. ::: tools/profiler/gecko/ThreadResponsiveness.h:27 (Diff revision 2) > > void Update(); > > - mozilla::TimeDuration GetUnresponsiveDuration(const mozilla::TimeStamp& now) const { > - return now - mLastTracerTime; > + // The number of milliseconds that elapsed since the last > + // CheckResponsivenessTask was processed. > + double GetUnresponsiveDuration(double aNow) const { It's confusing that the call to this function above has the expression "aNow - CorePS::ProcessStartTime()", but this argument is also called "aNow". Maybe "aNowDuration"? Or "aMSSinceProcessStart"? Not sure, but it helps if the differing units are clear in the name. ::: tools/profiler/gecko/ThreadResponsiveness.h:28 (Diff revision 2) > void Update(); > > - mozilla::TimeDuration GetUnresponsiveDuration(const mozilla::TimeStamp& now) const { > - return now - mLastTracerTime; > + // The number of milliseconds that elapsed since the last > + // CheckResponsivenessTask was processed. > + double GetUnresponsiveDuration(double aNow) const { > + return aNow - *mLastTracerTime; I also have a pet peeve with "Last" in variable names -- does it mean "previous" or "final"? :) "Prev" or "Previous" would be better here, I think. ::: tools/profiler/gecko/ThreadResponsiveness.cpp:83 (Diff revision 2) > mStop = true; > } > > - const TimeStamp& GetLastTracerTime() const { > - return mLastTracerTime; > + // Can be called on any thread. > + double GetLastTracerTime() const { > + return mLastTracerTime / 1000.0; Are you sure you want to return milliseconds here? ::: tools/profiler/gecko/ThreadResponsiveness.cpp:128 (Diff revision 2) > > void > ThreadResponsiveness::Update() > { > mActiveTracerEvent->DoFirstDispatchIfNeeded(); > - mLastTracerTime = mActiveTracerEvent->GetLastTracerTime(); > + mLastTracerTime = Some(mActiveTracerEvent->GetLastTracerTime()); Is this a unit error? mLastTracerTime is measured in microseconds, but GetLastTracerTime() returns milliseconds, as far as I can tell. I suggest adding suffixes ("_ms" and "_us" or "MS" and "US") to these time identifiers to make the units clearer.
Attachment #8892269 -
Flags: review?(n.nethercote) → review+
Assignee | ||
Comment 11•7 years ago
|
||
I like your name suggestions. Thanks!
Comment hidden (mozreview-request) |
Assignee | ||
Comment 13•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8892269 [details] Bug 1382928 - Use atomics for thread-shared pieces of state in ThreadResponsiveness.cpp. https://reviewboard.mozilla.org/r/163206/#review168576 > Is this a unit error? mLastTracerTime is measured in microseconds, but GetLastTracerTime() returns milliseconds, as far as I can tell. > > I suggest adding suffixes ("_ms" and "_us" or "MS" and "US") to these time identifiers to make the units clearer. This was not an error, but it was extremely confusing: We have two members called mLastTracerTime, one is on CheckResponsivenessEvent, and one on ThreadResponsiveness. And their units are different. Now with the suffix that's hopefully less confusing.
Comment 14•7 years ago
|
||
Pushed by mstange@themasta.com: https://hg.mozilla.org/integration/autoland/rev/3b7945df033d Use atomics for thread-shared pieces of state in ThreadResponsiveness.cpp. r=njn
Comment 15•7 years ago
|
||
> We have two members
> called mLastTracerTime, one is on CheckResponsivenessEvent, and one on
> ThreadResponsiveness. And their units are different.
lol
Comment 16•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3b7945df033d
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in
before you can comment on or make changes to this bug.
Description
•