Closed
Bug 1234618
Opened 10 years ago
Closed 10 years ago
BHR doesn't seem to handle the idle case correctly
Categories
(Toolkit :: Telemetry, defect)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla46
People
(Reporter: mconley, Unassigned)
References
Details
Attachments
(1 file)
|
1.16 KB,
patch
|
jchen
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Noticed this while investigating bug 1223780. It seems that, regardless of whether or not e10s is enabled, we report microhangs for the main process if a caret is blinking and the process is idle.
STR:
1) Open up a Firefox profile, and install chutten's statuser add-on[1] - this will add a toolbar item that will display how many chrome hangs have been reported to Telemetry.
2) Open about:home, and select the search input so that the caret is blinking.
3) Don't touch the mouse or keyboard. Just let the process go idle.
ER:
No hangs should be repoted.
AR:
Hangs are reported and the count goes up in the toolbar item pretty rapidly.
jrmuizel has been working with me on this. We suspect the BHR is not handling the idle case correctly. I'll let him elaborate, since I think he's got more of the problem swapped into his head.
[1]: https://github.com/chutten/statuser
| Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(jmuizelaar)
Comment 1•10 years ago
|
||
It looks when we call NotifyActivity after waiting for a long time we'll store mIntervalNow in currentThread->mInterval.
Then we wake the Hang Monitor.
It updates the mIntervalNow to the current time. Let's say 150 ms in the future
We then compare the new mIntervalNow to currentThread->mInterval and make the assumption that we've been hanging for 150ms.
It's not clear to me how this is supposed to be working.
Flags: needinfo?(nchen)
Comment 2•10 years ago
|
||
Does this happen on all platforms?
In a nutshell, the hang monitor thread is supposed to keep track of updates to currentThread->mInterval, and expects mInterval to be updated regularly.
When it sees that mInterval hasn't been updated for longer than the timeout value, it records the thread as hanging. When mInterval is finally updated again and the thread was hanging before, it calculates and records the hang duration.
This mechanism is disabled if the thread called NotifyWait before the NotifyActivity call.
Also, mIntervalNow is not the absolute current time. We add a time delta to mIntervalNow each time the hang monitor runs, provided the time delta is smaller than a certain value. However, if the system was sleeping, or if the hang monitor thread didn't get run frequently enough, we keep mIntervalNow the same, and let it get out-of-sync with the absolute time.
Flags: needinfo?(nchen)
Comment 3•10 years ago
|
||
(In reply to Jim Chen [:jchen] [:darchons] from comment #2)
> Does this happen on all platforms?
> This mechanism is disabled if the thread called NotifyWait before the
> NotifyActivity call.
Good point. If it is limited to specific platforms, we should audit their message pumps to ensure that we're not missing any of these calls. I previously found a missing call on the Mac in bug 1132305.
Comment 4•10 years ago
|
||
(In reply to Jim Chen [:jchen] [:darchons] from comment #2)
> In a nutshell, the hang monitor thread is supposed to keep track of updates
> to currentThread->mInterval, and expects mInterval to be updated regularly.
Except it's not updated regularly. A bunch of threads have their wait time set to PR_INTERVAL_NO_TIMEOUT and so we divide this by 4 and set waitTime to that. This causes us to wait a very long but not PR_INTERVAL_NO_TIMEOUT amount of time and so we still update mIntervalNow and think that we've been hung for a long time.
Flags: needinfo?(jmuizelaar)
Comment 5•10 years ago
|
||
Attachment #8701195 -
Flags: review?(nchen)
Comment 6•10 years ago
|
||
Comment on attachment 8701195 [details] [diff] [review]
Fix misdetecting hangs
Review of attachment 8701195 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM
Attachment #8701195 -
Flags: review?(nchen) → review+
Updated•10 years ago
|
Flags: needinfo?(vladan.bugzilla)
Comment 8•10 years ago
|
||
Comment on attachment 8701195 [details] [diff] [review]
Fix misdetecting hangs
Approval Request Comment
[Feature/regressing bug #]: The original BHR implementation
[User impact if declined]: We have more inaccurate BHR reports. Not immediately visible.
[Describe test coverage new/current, TreeHerder]: None
[Risks and why]: Should be pretty low. The safety of the patch is pretty obvious from inspection.
Attachment #8701195 -
Flags: approval-mozilla-beta?
Attachment #8701195 -
Flags: approval-mozilla-aurora?
Comment 9•10 years ago
|
||
| bugherder | ||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment 10•10 years ago
|
||
Thanks for fixing this, let's get it uplifted :)
Flags: needinfo?(vladan.bugzilla)
Comment 11•10 years ago
|
||
Comment on attachment 8701195 [details] [diff] [review]
Fix misdetecting hangs
Helps get more accurate telemetry data, seems like a good idea. Beta44+, Aurora45+
Attachment #8701195 -
Flags: approval-mozilla-beta?
Attachment #8701195 -
Flags: approval-mozilla-beta+
Attachment #8701195 -
Flags: approval-mozilla-aurora?
Attachment #8701195 -
Flags: approval-mozilla-aurora+
status-firefox44:
--- → affected
status-firefox45:
--- → affected
Comment 12•10 years ago
|
||
Setting status 44 and 45 to fixed for comment 12.
Wes, I think something might have gone wrong with the landing on beta. The patch was landed on the branch GECKO430b7_2015112614_RELBRANCH, which I don't think is in use. Can you confirm this is true and re-land if so?
Flags: needinfo?(wkocher)
Here's a non-relbranch beta version: https://hg.mozilla.org/releases/mozilla-beta/rev/2e553b97f22f
Flags: needinfo?(wkocher)
Comment 16•10 years ago
|
||
| bugherder uplift | ||
status-b2g-v2.5:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•