Closed
Bug 951431
Opened 10 years ago
Closed 10 years ago
Monitor main thread hangs using BackgroundHangMonitor
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla29
People
(Reporter: jchen, Assigned: jchen)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
4.51 KB,
patch
|
vladan
:
review+
|
Details | Diff | Splinter Review |
2.36 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
865 bytes,
patch
|
BenWa
:
review+
lsblakk
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•10 years ago
|
||
This patch sets up a BackgroundHangMonitor for the Gecko thread that spans XPCOM lifetime. For now it piggybacks on the existing HangMonitor for activity and wait notifications. Also for now, this is for all platforms, but I can make it Android only if needed (only on Android is the Gecko thread a "background thread").
Attachment #8349797 -
Flags: review?(benjamin)
Assignee | ||
Comment 2•10 years ago
|
||
This patch simply makes us report permanent hangs as long transient hangs for now, until we get better analysis (e.g. stack unwinding) for permanent hangs. Not an essential patch and can be reviewed/landed whenever.
Attachment #8349798 -
Flags: review?(nfroyd)
Assignee | ||
Comment 3•10 years ago
|
||
ThreadStackHelper's signal handling can interfere with the profiler's signal handling, resulting in intermittent xpcshell timeouts/failures. I think the safest thing for now is to avoid getting the stack when the profiler is active, which is acceptable because in practice, the profiler is rarely running.
Attachment #8349799 -
Flags: review?(bgirard)
Comment 4•10 years ago
|
||
Comment on attachment 8349799 [details] [diff] [review] Don't get stacks during profiler runs on Linux (v1) Review of attachment 8349799 [details] [diff] [review]: ----------------------------------------------------------------- This feels like a silent failure that is going to cause people to wonder why things aren't working.
Attachment #8349799 -
Flags: review?(bgirard) → review+
![]() |
||
Comment 5•10 years ago
|
||
Comment on attachment 8349798 [details] [diff] [review] Report permanent hangs as transient hangs for now (v1) Review of attachment 8349798 [details] [diff] [review]: ----------------------------------------------------------------- r=me with the question below answered satisfactorily. ::: xpcom/threads/BackgroundHangMonitor.cpp @@ +264,5 @@ > if (MOZ_UNLIKELY(hangTime >= currentThread->mMaxTimeout)) { > // A permahang started > // Skip subsequent iterations and tolerate a race on mWaiting here > currentThread->mWaiting = true; > + currentThread->mHanging = false; What's the rationale for this change? (I don't think it's necessarily incorrect, I'm just surprised to find it along with the other changes.)
Attachment #8349798 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 6•10 years ago
|
||
(In reply to Nathan Froyd (:froydnj) (AFKish through 3 January 2014) from comment #5) > Comment on attachment 8349798 [details] [diff] [review] > Report permanent hangs as transient hangs for now (v1) > > ::: xpcom/threads/BackgroundHangMonitor.cpp > @@ +264,5 @@ > > if (MOZ_UNLIKELY(hangTime >= currentThread->mMaxTimeout)) { > > // A permahang started > > // Skip subsequent iterations and tolerate a race on mWaiting here > > currentThread->mWaiting = true; > > + currentThread->mHanging = false; > > What's the rationale for this change? (I don't think it's necessarily > incorrect, I'm just surprised to find it along with the other changes.) When mHanging is true, we're looking for either the end of the transient hang or its transitioning into a permanent hang. When either case occurs, we need to reset mHanging to false so that we don't continue to look for those two conditions. And this line fixes the previous bug that mHanging was not reset when entering a permanent hang.
Comment 7•10 years ago
|
||
I don't understand what this change actually does. We already have a main-thread hang detector, and this one seems to duplicate that using the new mechanism. Why bother?
Assignee | ||
Comment 8•10 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > I don't understand what this change actually does. We already have a > main-thread hang detector, and this one seems to duplicate that using the > new mechanism. Why bother? The current hang monitor is only enabled on Windows, whereas the new mechanism is cross-platform. Especially for Android, we want main-thread hang monitoring because the Gecko "main thread" is actually a background thread and we're seeing performance issues related to that.
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(benjamin)
Comment 9•10 years ago
|
||
> The current hang monitor is only enabled on Windows What does that mean? We certainly build it on all platforms, and appear to initialize it on all platforms: http://mxr.mozilla.org/mozilla-central/source/xpcom/build/nsXPComInit.cpp#610
Flags: needinfo?(benjamin)
Assignee | ||
Comment 10•10 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #9) > > The current hang monitor is only enabled on Windows > > What does that mean? We certainly build it on all platforms, and appear to > initialize it on all platforms: > http://mxr.mozilla.org/mozilla-central/source/xpcom/build/nsXPComInit.cpp#610 Currently the hang monitor is disabled through pref in bug 705748, http://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#1879 It's only enabled on Windows nightlies because of the Telemetry chrome hangs feature, which overrides that pref. Background hang reporting is similar to chrome hangs but is designed to support multiple platforms and multiple threads.
Flags: needinfo?(benjamin)
Comment 11•10 years ago
|
||
Comment on attachment 8349797 [details] [diff] [review] Monitor main thread hangs using BackgroundHangMonitor (v1) As I understand it, that pref merely controls whether we *crash* on hang. It does not control anything else about the measurement of hangs. I'm disappointed that we seem to have two independent hang monitoring systems and we're trying to glue them together like this. If this is expediently necessary, that's fine, but we should file and assign followups to get this back into a relatively sane single set of code.
Attachment #8349797 -
Flags: review?(benjamin) → review?(vdjeric)
Flags: needinfo?(benjamin)
Updated•10 years ago
|
Attachment #8349797 -
Flags: review?(vdjeric) → review+
Assignee | ||
Comment 12•10 years ago
|
||
Last try run: https://tbpl.mozilla.org/?tree=Try&rev=e21d470dbc38 https://hg.mozilla.org/integration/mozilla-inbound/rev/f3a6bb4975a8 https://hg.mozilla.org/integration/mozilla-inbound/rev/134244c3d863 https://hg.mozilla.org/integration/mozilla-inbound/rev/b6ac5d30404d
Comment 13•10 years ago
|
||
(In reply to Jim Chen [:jchen :nchen] from comment #3) > Created attachment 8349799 [details] [diff] [review] > Don't get stacks during profiler runs on Linux (v1) I don't know whether this is an intended or unintended side effect, but without this patch in place, I get segfaults every run, from the sigaction() call that is protected by the one-and-only-hunk in this patch. I don't know why calling sigaction would cause Gecko to jump off to a completely nonsense address, but it does. I was unable to determine whether it is the calling thread or some other thread, that jumps into outer space at the sigaction(). This is when running on arm-android, using native unwind, with the new unwinder library from bug 938157.
https://hg.mozilla.org/mozilla-central/rev/f3a6bb4975a8 https://hg.mozilla.org/mozilla-central/rev/134244c3d863 https://hg.mozilla.org/mozilla-central/rev/b6ac5d30404d
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Assignee | ||
Comment 15•10 years ago
|
||
Comment on attachment 8349799 [details] [diff] [review] Don't get stacks during profiler runs on Linux (v1) [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 932865 / Bug 935092 User impact if declined: Random deadlocks when using the profiler tool Testing completed (on m-c, etc.): Locally, m-c Risk to taking this patch (and alternatives if risky): None; patch only avoids hang condition. String or IDL/UUID changes made by this patch: None
Attachment #8349799 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8349799 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 16•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/990a21978a2d https://hg.mozilla.org/releases/mozilla-aurora/rev/73f3ce609c23 https://hg.mozilla.org/releases/mozilla-aurora/rev/36dfb491674d
status-firefox28:
--- → fixed
status-firefox29:
--- → fixed
Comment 17•10 years ago
|
||
Backed out the first two changesets because only the third was intended for uplift. https://hg.mozilla.org/releases/mozilla-aurora/rev/6332a66fff72
Comment 18•10 years ago
|
||
Does this have a reproducible case to verify this is working as expected?
Flags: needinfo?(nchen)
Assignee | ||
Comment 19•10 years ago
|
||
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #18) > Does this have a reproducible case to verify this is working as expected? There's now a "Thread Hangs" section in about:telemetry. On startup, there should be a "Gecko" subsection with a histogram called "Gecko-Activity". When a hang occurs (for example by running "for(var a=0;a<1e7;a++);" in the web console), there should be a new "Gecko-Hang-1" histogram under the "Gecko" subsection. I'll look into possibly writing an automated test for this.
Flags: needinfo?(nchen)
Hi Jim, I'm seeing two threads for hang monitoring in my firefox nightlies now (one for the main thread, one for non-main threads). As bsmedberg requested in comment 11, have there been followups filed to unify these threads?
Flags: needinfo?(nchen)
Assignee | ||
Comment 21•9 years ago
|
||
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #20) > Hi Jim, > > I'm seeing two threads for hang monitoring in my firefox nightlies now (one > for the main thread, one for non-main threads). As bsmedberg requested in > comment 11, have there been followups filed to unify these threads? Thanks for the reminder. I filed bug 951431 for a start.
Flags: needinfo?(nchen)
Assignee | ||
Comment 22•9 years ago
|
||
Ugh. Although this bug is a start, the bug I just filed was bug 982337.
You need to log in
before you can comment on or make changes to this bug.
Description
•