Closed Bug 1443499 Opened 8 years ago Closed 8 years ago

Background Hang Reporter: use only one thread for unwinding and reporting

Categories

(Core :: Gecko Profiler, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr60 --- fixed
firefox61 --- fixed

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(1 file, 2 obsolete files)

When the BHR wants to report a tardy thread, it hands off the work of getting a stacktrace to a helper thread. The helper threads used are in the Stream Transport Service threadpool, which has a default limit of 25 threads. This has a bad effect when we are in a severely compute-resource constrained situation. Then, many threads will be late, and up to 25 STS worker threads will be employed to do unwinding, a potentially expensive operation. This further restricts the compute resources available to progress the rest of the system. Another effect is that the unwinder work will compete against the "real" STS work for the 25 workers, potentially further slowing forward progress. This effects are clearly visible when running on Valgrind -- it is possible to observe 20+ STS threads running for even simple-ish web pages. This is possibly related to the automation timeouts seen in bug 1296819 and therefore some cause for concern.
Attached patch WIP patch (obsolete) — Splinter Review
(WIP) replaces the use of the STS thread pool with a single nsThread dedicated to unwinding/reporting hangs. This makes Firefox on Valgrind markedly more reponsive. :mystor, is something that you would consider (after cleaning up, proper error handling, etc); or perhaps the same but using a small dedicated nsThreadPool instead?
Flags: needinfo?(nika)
(In reply to Julian Seward [:jseward] from comment #1) > (WIP) replaces the use of the STS thread pool with a single > nsThread dedicated to unwinding/reporting hangs. This makes > Firefox on Valgrind markedly more reponsive. > > :mystor, is something that you would consider (after cleaning > up, proper error handling, etc); or perhaps the same but using > a small dedicated nsThreadPool instead? Yeah, I think this is a good idea. We don't really want to be processing multiple hangs in parallel anyway.
Flags: needinfo?(nika)
Summary: Background Hang Reporter: limit worst case resource consumption → Background Hang Reporter: use only one thread for unwinding and reporting
Attached patch bug1443499-2t.diff (obsolete) — Splinter Review
Attachment #8956556 - Attachment is obsolete: true
Attachment #8957062 - Attachment is obsolete: true
Attachment #8957063 - Flags: review?(nika)
Comment on attachment 8957063 [details] [diff] [review] bug1443499-3.diff Review of attachment 8957063 [details] [diff] [review]: ----------------------------------------------------------------- Thanks. ::: toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp @@ +103,5 @@ > + > + // Unwinding and reporting of hangs is despatched to this thread. We also > + // keep a bool indicating whether the thread was successfully created. > + nsCOMPtr<nsIThread> mHangProcessingThread; > + bool mHangProcessingThreadExists; Why not just check if mHangProcessingThread is nullptr instead? @@ +261,2 @@ > { > + // Lock so we don't race against the new monitor or processing threads We will never race against the processing thread, so leave this comment as-is.
Attachment #8957063 - Flags: review?(nika) → review+
Assignee: nobody → jseward
Pushed by jseward@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/177fdd3c0a75 Background Hang Reporter: use only one thread for unwinding and reporting. r=mystor.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Depends on: 1445215
Comment on attachment 8957063 [details] [diff] [review] bug1443499-3.diff [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: For automation runs on Valgrind, Valgrind slows down progress so much that the Background Hang Reporter kills the process unnecessarily, leading the test to fail. The problem is exacerbated because the BHR actually uses the Service Transport Stream (STS) thread pool to report hangs, containing 25 threads, further reducing the progress of threads doing "real" work. The patch replaces the use of the STS thread pool with a single nsThread dedicated to unwinding/reporting hangs, which limits the overall resource damage that the BHR can inflict. The case for including this is only in order to reduce failures of Valgrind automation runs for esr60. There is no other purpose. User impact if declined: Zero, because AFAIK end-user (non-Valgrind) runs do not suffer the above-described slowdown scenario w.r.t the Background Hang Reporter. Fix Landed on Version: 61 Risk to taking this patch (and alternatives if risky): Risks that I can think of (I'm not sure if any of them are actually real, or likely): * if for whatever reason the BHR needs to make a lot of reports in a short timeframe. In that case it all reports would get processed only by a single thread rather than potentially in parallel by a group of up to 25 threads. * In early March there was some evidence that this patch had caused a leak, although we only have 4 crash reports for that in total, and it has not re-occurred since then. * The change to a single-thread scheme is somehow not correct in some other way. I'm not aware of any such problem, though. We could skip it, but I fear that would significantly reduce our ability to avoid timeouts on Valgrind runs on automation for esr60. There are two other patches involved in trying to mitigate this -- bug 1442603 and bug 1296819 -- and those two alone might improve the situation, but by how much I don't know. String or UUID changes made by this patch: See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8957063 - Flags: approval-mozilla-esr60?
Comment on attachment 8957063 [details] [diff] [review] bug1443499-3.diff The Valgrind jobs on ESR60 are currently failing frequently due to these issues. Risk seems low enough to warrant taking these.
Attachment #8957063 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: