Try to record BHR hangs which precede forced shutdowns
Categories
(Core :: XPCOM, enhancement)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox72 | --- | fixed |
People
(Reporter: alexical, Assigned: alexical)
References
(Blocks 1 open bug)
Attachments
(2 files)
47 bytes, text/x-phabricator-request
2.30 KB, text/plain (chutten: data-review+)
Currently, if a user forcibly terminates the main process because it's unresponsive, we won't collect the BHR hang. I think we can remedy this by writing hangs out to disk once they pass a certain threshold of, say, eight seconds (the current definition of a "permahang" in BHR terms). At that point the overhead of persisting the stack to disk should be a drop in the bucket, and it could provide some very valuable data about egregious hangs that users experience.
Comment 1•6 years ago
In short - if a user forcibly terminates the browser because it seems
to be permanently hung, we currently do not get a chance to record the
hang. This is unfortunate, because these likely represent the most
egregious hangs in terms of user frustration. This patch seeks to
address that.
If a hang exceeds 8192ms (the current definition of a "permahang" in
existing BHR terms), then we decide to immediately persist it to disk,
in case we never get a chance to return to the main thread and
submit it. On the next start of the browser, we read the file from
disk on a background thread, and just submit it using the normal
mechanism.
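The flow described above can be sketched roughly as follows. This is only an illustration in standard C++, not the patch itself: `HangRecord`, `MaybePersistHang`, `TakePersistedHang`, and the text-based file layout are all hypothetical names and choices, and the real code would hand the loaded hang to BHR's normal submission path from a background thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical stand-ins; none of these names come from the actual patch.
constexpr uint32_t kPermahangThresholdMs = 8192;

struct HangRecord {
  uint32_t durationMs;
  std::string stack;
};

// Called while the hang is still in progress. Writes the hang to `path`
// only if it exceeds the permahang threshold; returns true if it was written.
bool MaybePersistHang(const HangRecord& hang, const std::string& path) {
  if (hang.durationMs <= kPermahangThresholdMs) {
    return false;
  }
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  out << hang.durationMs << '\n' << hang.stack;
  out.flush();
  return out.good();
}

// Called on the next startup. Loads a previously persisted hang, if any,
// and deletes the file so the same hang is not submitted twice.
bool TakePersistedHang(const std::string& path, HangRecord* result) {
  assert(result != nullptr);
  bool ok = false;
  {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
      return false;  // Nothing was persisted by the previous session.
    }
    uint32_t duration = 0;
    if (in >> duration && in.get() == '\n') {
      result->durationMs = duration;
      result->stack.assign(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
      ok = true;
    }
  }
  std::remove(path.c_str());
  return ok;
}
```

If the browser survives the hang, the normal in-memory submission still happens; the file only matters when the process never gets back to the main thread.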
Regarding the handling of the file itself, I tried to do the simplest
thing I could - as far as I can tell there is no standard simple
serialization mechanism available directly to C++ in Gecko, so I just
serialized it by hand. I didn't take any special care with endianness
or anything as I can't think of a situation in which we really care
at all about these files being transferable between architectures. I
directly used PR_Write / PR_Read instead of doing something fancy
like memory mapping the file, because I don't think performance is a
critical concern here and it offers a simple protection against
reading out of bounds.
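As a rough illustration of the hand-rolled approach, the sketch below writes fields in native byte order with a length prefix and treats any short read as failure. It is an assumption-laden stand-in, not the patch: `std::fwrite` / `std::fread` take the place of PR_Write / PR_Read so the example builds without NSPR, and the helper names and the `kMaxStringLen` cap are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sanity cap so a corrupt length field cannot trigger a
// huge allocation.
constexpr uint32_t kMaxStringLen = 1u << 20;

// Writes a 32-bit value in native byte order (no endianness handling).
bool WriteU32(std::FILE* f, uint32_t v) {
  return std::fwrite(&v, sizeof(v), 1, f) == 1;
}

// Reads a 32-bit value; fails if the file ends early.
bool ReadU32(std::FILE* f, uint32_t* v) {
  assert(v != nullptr);
  return std::fread(v, sizeof(*v), 1, f) == 1;
}

// Writes a string as a 32-bit length followed by the raw bytes.
bool WriteString(std::FILE* f, const std::string& s) {
  if (s.size() > kMaxStringLen) {
    return false;
  }
  if (!WriteU32(f, static_cast<uint32_t>(s.size()))) {
    return false;
  }
  return s.empty() || std::fwrite(s.data(), 1, s.size(), f) == s.size();
}

// Reads a length-prefixed string. Because every byte comes from an explicit
// read call, a truncated or corrupt file yields a short read and a clean
// failure rather than an out-of-bounds access.
bool ReadString(std::FILE* f, std::string* s) {
  assert(s != nullptr);
  uint32_t len = 0;
  if (!ReadU32(f, &len) || len > kMaxStringLen) {
    return false;
  }
  s->resize(len);
  return len == 0 || std::fread(&(*s)[0], 1, len, f) == len;
}
```

The trade-off against memory mapping is the one the comment names: explicit reads are slower, but the bounds check comes for free from the return value.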
Comment 6•6 years ago
Backed out 2 changesets (Bug 1594577) for build bustages at HangDetails.cpp.
Backout: https://hg.mozilla.org/integration/autoland/rev/d7e41714afd172c1a027bca721a2e3bc319d0257
Push that started the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&revision=bd42216f7b6309c683bcc8d9d63c26a834d08d04&selectedJob=276505424
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276505424&repo=autoland&lineNumber=33868
Please note that there is also an ESlint failure:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=bd42216f7b6309c683bcc8d9d63c26a834d08d04&selectedJob=276505391
Comment 7•6 years ago
and some mochitest failures: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=276511743&repo=autoland&lineNumber=1160
Comment 10•6 years ago
Backed out for xpcshell failures on test_watchdog_hibernate.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/08dc799f5943e645d811b0c4ab37713a01ebe94b
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277301048&repo=autoland&lineNumber=2757
Comment 11•6 years ago
There were also GTest failures like https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277300473&repo=autoland&lineNumber=1877
Comment 12•6 years ago
Also dt failures on browser_dbg-toolbox-unselected-pause.js
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=277320157&repo=autoland&lineNumber=13834