Closed
Bug 1443499
Opened 8 years ago
Closed 8 years ago
Background Hang Reporter: use only one thread for unwinding and reporting
Categories
(Core :: Gecko Profiler, enhancement)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla61
People
(Reporter: jseward, Assigned: jseward)
References
Details
Attachments
(1 file, 2 obsolete files)
|
6.67 KB,
patch
|
nika
:
review+
RyanVM
:
approval-mozilla-esr60+
|
Details | Diff | Splinter Review |
When the BHR wants to report a tardy thread, it hands off the work of
getting a stacktrace to a helper thread. The helper threads used are in the
Stream Transport Service threadpool, which has a default limit of 25
threads.
This has a bad effect when we are in a severely compute-resource constrained
situation. Then, many threads will be late, and up to 25 STS worker threads
will be employed to do unwinding, a potentially expensive operation. This
further restricts the compute resources available to progress the rest of
the system.
Another effect is that the unwinder work will compete against the "real" STS
work for the 25 workers, potentially further slowing forward progress.
This effects are clearly visible when running on Valgrind -- it is possible
to observe 20+ STS threads running for even simple-ish web pages. This is
possibly related to the automation timeouts seen in bug 1296819 and
therefore some cause for concern.
| Assignee | ||
Comment 1•8 years ago
|
||
(WIP) replaces the use of the STS thread pool with a single
nsThread dedicated to unwinding/reporting hangs. This makes
Firefox on Valgrind markedly more reponsive.
:mystor, is something that you would consider (after cleaning
up, proper error handling, etc); or perhaps the same but using
a small dedicated nsThreadPool instead?
| Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(nika)
Comment 2•8 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #1)
> (WIP) replaces the use of the STS thread pool with a single
> nsThread dedicated to unwinding/reporting hangs. This makes
> Firefox on Valgrind markedly more reponsive.
>
> :mystor, is something that you would consider (after cleaning
> up, proper error handling, etc); or perhaps the same but using
> a small dedicated nsThreadPool instead?
Yeah, I think this is a good idea. We don't really want to be processing multiple hangs in parallel anyway.
Flags: needinfo?(nika)
| Assignee | ||
Updated•8 years ago
|
Summary: Background Hang Reporter: limit worst case resource consumption → Background Hang Reporter: use only one thread for unwinding and reporting
| Assignee | ||
Comment 3•8 years ago
|
||
Attachment #8956556 -
Attachment is obsolete: true
| Assignee | ||
Comment 4•8 years ago
|
||
Attachment #8957062 -
Attachment is obsolete: true
| Assignee | ||
Updated•8 years ago
|
Attachment #8957063 -
Flags: review?(nika)
Comment 5•8 years ago
|
||
Comment on attachment 8957063 [details] [diff] [review]
bug1443499-3.diff
Review of attachment 8957063 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks.
::: toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp
@@ +103,5 @@
> +
> + // Unwinding and reporting of hangs is despatched to this thread. We also
> + // keep a bool indicating whether the thread was successfully created.
> + nsCOMPtr<nsIThread> mHangProcessingThread;
> + bool mHangProcessingThreadExists;
Why not just check if mHangProcessingThread is nullptr instead?
@@ +261,2 @@
> {
> + // Lock so we don't race against the new monitor or processing threads
We will never race against the processing thread, so leave this comment as-is.
Attachment #8957063 -
Flags: review?(nika) → review+
Updated•8 years ago
|
Assignee: nobody → jseward
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/177fdd3c0a75
Background Hang Reporter: use only one thread for unwinding and reporting. r=mystor.
Comment 7•8 years ago
|
||
| bugherder | ||
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
| Assignee | ||
Comment 8•7 years ago
|
||
Comment on attachment 8957063 [details] [diff] [review]
bug1443499-3.diff
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
For automation runs on Valgrind, Valgrind slows down progress so much
that the Background Hang Reporter kills the process unnecessarily,
leading the test to fail. The problem is exacerbated because the BHR
actually uses the Service Transport Stream (STS) thread pool to report
hangs, containing 25 threads, further reducing the progress of threads
doing "real" work.
The patch replaces the use of the STS thread pool with a single nsThread
dedicated to unwinding/reporting hangs, which limits the overall resource
damage that the BHR can inflict.
The case for including this is only in order to reduce failures of Valgrind
automation runs for esr60. There is no other purpose.
User impact if declined:
Zero, because AFAIK end-user (non-Valgrind) runs do not suffer the
above-described slowdown scenario w.r.t the Background Hang Reporter.
Fix Landed on Version:
61
Risk to taking this patch (and alternatives if risky):
Risks that I can think of (I'm not sure if any of them are actually
real, or likely):
* if for whatever reason the BHR needs to make a lot of reports in a short
timeframe. In that case it all reports would get processed only by a
single thread rather than potentially in parallel by a group of up to 25
threads.
* In early March there was some evidence that this patch had caused a leak,
although we only have 4 crash reports for that in total, and it has not
re-occurred since then.
* The change to a single-thread scheme is somehow not correct in some other
way. I'm not aware of any such problem, though.
We could skip it, but I fear that would significantly reduce our ability
to avoid timeouts on Valgrind runs on automation for esr60. There are
two other patches involved in trying to mitigate this -- bug 1442603 and
bug 1296819 -- and those two alone might improve the situation, but by
how much I don't know.
String or UUID changes made by this patch:
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8957063 -
Flags: approval-mozilla-esr60?
Comment 9•7 years ago
|
||
Comment on attachment 8957063 [details] [diff] [review]
bug1443499-3.diff
The Valgrind jobs on ESR60 are currently failing frequently due to these issues. Risk seems low enough to warrant taking these.
Attachment #8957063 -
Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Comment 10•7 years ago
|
||
| bugherder uplift | ||
status-firefox-esr60:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•