Closed
Bug 1443499
Opened 5 years ago
Closed 5 years ago
Background Hang Reporter: use only one thread for unwinding and reporting
Categories
(Core :: Gecko Profiler, enhancement)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla61
People
(Reporter: jseward, Assigned: jseward)
References
Details
Attachments
(1 file, 2 obsolete files)
6.67 KB,
patch
|
nika
:
review+
RyanVM
:
approval-mozilla-esr60+
|
Details | Diff | Splinter Review |
When the BHR wants to report a tardy thread, it hands off the work of getting a stacktrace to a helper thread. The helper threads used are in the Stream Transport Service threadpool, which has a default limit of 25 threads. This has a bad effect when we are in a severely compute-resource constrained situation. Then, many threads will be late, and up to 25 STS worker threads will be employed to do unwinding, a potentially expensive operation. This further restricts the compute resources available to progress the rest of the system. Another effect is that the unwinder work will compete against the "real" STS work for the 25 workers, potentially further slowing forward progress. This effects are clearly visible when running on Valgrind -- it is possible to observe 20+ STS threads running for even simple-ish web pages. This is possibly related to the automation timeouts seen in bug 1296819 and therefore some cause for concern.
Assignee | ||
Comment 1•5 years ago
|
||
(WIP) replaces the use of the STS thread pool with a single nsThread dedicated to unwinding/reporting hangs. This makes Firefox on Valgrind markedly more reponsive. :mystor, is something that you would consider (after cleaning up, proper error handling, etc); or perhaps the same but using a small dedicated nsThreadPool instead?
Assignee | ||
Updated•5 years ago
|
Flags: needinfo?(nika)
Comment 2•5 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #1) > (WIP) replaces the use of the STS thread pool with a single > nsThread dedicated to unwinding/reporting hangs. This makes > Firefox on Valgrind markedly more reponsive. > > :mystor, is something that you would consider (after cleaning > up, proper error handling, etc); or perhaps the same but using > a small dedicated nsThreadPool instead? Yeah, I think this is a good idea. We don't really want to be processing multiple hangs in parallel anyway.
Flags: needinfo?(nika)
Assignee | ||
Updated•5 years ago
|
Summary: Background Hang Reporter: limit worst case resource consumption → Background Hang Reporter: use only one thread for unwinding and reporting
Assignee | ||
Comment 3•5 years ago
|
||
Attachment #8956556 -
Attachment is obsolete: true
Assignee | ||
Comment 4•5 years ago
|
||
Attachment #8957062 -
Attachment is obsolete: true
Assignee | ||
Updated•5 years ago
|
Attachment #8957063 -
Flags: review?(nika)
Comment 5•5 years ago
|
||
Comment on attachment 8957063 [details] [diff] [review] bug1443499-3.diff Review of attachment 8957063 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. ::: toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp @@ +103,5 @@ > + > + // Unwinding and reporting of hangs is despatched to this thread. We also > + // keep a bool indicating whether the thread was successfully created. > + nsCOMPtr<nsIThread> mHangProcessingThread; > + bool mHangProcessingThreadExists; Why not just check if mHangProcessingThread is nullptr instead? @@ +261,2 @@ > { > + // Lock so we don't race against the new monitor or processing threads We will never race against the processing thread, so leave this comment as-is.
Attachment #8957063 -
Flags: review?(nika) → review+
Updated•5 years ago
|
Assignee: nobody → jseward
Pushed by jseward@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/177fdd3c0a75 Background Hang Reporter: use only one thread for unwinding and reporting. r=mystor.
Comment 7•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/177fdd3c0a75
Status: NEW → RESOLVED
Closed: 5 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Assignee | ||
Comment 8•5 years ago
|
||
Comment on attachment 8957063 [details] [diff] [review] bug1443499-3.diff [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: For automation runs on Valgrind, Valgrind slows down progress so much that the Background Hang Reporter kills the process unnecessarily, leading the test to fail. The problem is exacerbated because the BHR actually uses the Service Transport Stream (STS) thread pool to report hangs, containing 25 threads, further reducing the progress of threads doing "real" work. The patch replaces the use of the STS thread pool with a single nsThread dedicated to unwinding/reporting hangs, which limits the overall resource damage that the BHR can inflict. The case for including this is only in order to reduce failures of Valgrind automation runs for esr60. There is no other purpose. User impact if declined: Zero, because AFAIK end-user (non-Valgrind) runs do not suffer the above-described slowdown scenario w.r.t the Background Hang Reporter. Fix Landed on Version: 61 Risk to taking this patch (and alternatives if risky): Risks that I can think of (I'm not sure if any of them are actually real, or likely): * if for whatever reason the BHR needs to make a lot of reports in a short timeframe. In that case it all reports would get processed only by a single thread rather than potentially in parallel by a group of up to 25 threads. * In early March there was some evidence that this patch had caused a leak, although we only have 4 crash reports for that in total, and it has not re-occurred since then. * The change to a single-thread scheme is somehow not correct in some other way. I'm not aware of any such problem, though. We could skip it, but I fear that would significantly reduce our ability to avoid timeouts on Valgrind runs on automation for esr60. There are two other patches involved in trying to mitigate this -- bug 1442603 and bug 1296819 -- and those two alone might improve the situation, but by how much I don't know. String or UUID changes made by this patch: See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8957063 -
Flags: approval-mozilla-esr60?
Comment 9•5 years ago
|
||
Comment on attachment 8957063 [details] [diff] [review] bug1443499-3.diff The Valgrind jobs on ESR60 are currently failing frequently due to these issues. Risk seems low enough to warrant taking these.
Attachment #8957063 -
Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Comment 10•5 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-esr60/rev/0e72e4c798bb
status-firefox-esr60:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•