Closed Bug 1353440 Opened 7 years ago Closed 7 years ago

Add a Background Hang Reporter for user interactivity periods

Categories

(Core :: DOM: Core & HTML, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact high
Tracking Status
firefox55 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: nika)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Right now Gecko has no notion of whether the user is interacting with the browser or not, and we are in dire need of that.  I'm planning to start doing some simple work to track whether the user is "using" the browser with the following two components initially:

1. Whether the user is typing.
2. Whether the user is clicking.

These are in the human sense of the word, in other words, we would like to teach Gecko to try to understand "typing" and "clicking" beyond the individual keyboard and mouse events.

The initial goal behind this is to have telemetry around the jank that we have when the user is interacting with the browser.  I'm planning to start with a very simple idea based on setting a timer when such an event arrives and keep resetting the timer as new events come in to simulate a period of time of the user actively interacting with the browser.
Whiteboard: [qf:p1 → [qf:p1]
What is this different with the idle observer? The current idle observer observes whole system idleness but we can change it to do both browser-only and system wide idleness tracking. See also bug 1307538
How is this different to user-interaction-active/user-interaction-inactive notifications?
ehsan, is there anything to do here? Like, how is this different to the stuff we already have?
Flags: needinfo?(ehsan)
And based on comment 0 some telemetry code would use this, but which one?
This bug is rather unclear. Why typing and clicking, why not mousemove or wheel scrolling?
But anyhow, Gecko definitely has the notion of user activity.
Sorry for my delay...  I think this is basically a dupe of bug 1307538.  With that we could add a BHR reporter using an idle observer with a short timeout value (1000ms or something) that could give us data about hangs happening when the user is interacting with the browser.  Instead of duping, let's morph this bug to make this about the BHR reporter.  We'd need to fix bug 1307538 first.

Realistically I won't have cycles to work on this any time soon unfortunately, Michael, since you've been working on BHR stuff, are you interested to take this?
Assignee: ehsan → nobody
Depends on: 1307538
Flags: needinfo?(ehsan)
Summary: Add a user activity tracker to Gecko → Add a Background Hang Reporter for user interactivity periods
ok, so this morphed into something else.

Why is this [qf:p1]?
(it is getting rather mysterious to me what is p1)
(In reply to :Ehsan Akhgari (super long backlog, slow to respond, not reviewing for a while) from comment #6)
> Sorry for my delay...  I think this is basically a dupe of bug 1307538. 
> With that we could add a BHR reporter using an idle observer with a short
> timeout value (1000ms or something) that could give us data about hangs
> happening when the user is interacting with the browser.  Instead of duping,
> let's morph this bug to make this about the BHR reporter.  We'd need to fix
> bug 1307538 first.
> 
> Realistically I won't have cycles to work on this any time soon
> unfortunately, Michael, since you've been working on BHR stuff, are you
> interested to take this?

I don't think I understand what the idea is here, is the idea to add a bit to bhr reports which is whether or not the user is currently interacting with the page during the hang?
(In reply to Olli Pettay [:smaug] from comment #7)
> ok, so this morphed into something else.
> 
> Why is this [qf:p1]?
> (it is getting rather mysterious to me what is p1)

I guess we can retriage.

(In reply to Michael Layzell [:mystor] from comment #8)
> I don't think I understand what the idea is here, is the idea to add a bit
> to bhr reports which is whether or not the user is currently interacting
> with the page during the hang?

Well, if that would enable us to look at the hang reports coming from periods of time when the user was interacting with the browser in a separate bucket, then sure.  The basic goal is to be able to have hang reports only for the period of the time that the user was actively using the browser somehow triaged separately from the pool of all hang reports.  But if it's easier to use a separate BHR reporter and gather those hang reports in an entirely separate bucket that seems fine too.  (The former solution seems superior as it doesn't force us to collect more hang report data than we are currently collecting.)
Whiteboard: [qf:p1] → [qf]
I talked to ehsan in person to get a better idea of what he was looking for in this bug. This patch adds annotations to hang reports identifying the ones which occur during user interactions. Ehsan hopes to be able to use the dashboard which we're creating in bug 1344003 and refine his search to exclude hangs which did not occur during user input. This will allow him to prioritize hangs which have a greater impact on the user.

This is the first part of this work, the second part will be part of bug 1344003 where we will have to add UI to exclude hangs based on this annotation.

MozReview-Commit-ID: CSO0K5wiK5H
Attachment #8861161 - Flags: review?(benjamin)
Assignee: nobody → michael
This would not cause a person to leave. It is probably a worthwhile telemetry bug but it is not a qf:p1
Whiteboard: [qf] → [qf:p3]
After discussing this with :bsmedburg upgrading to p1
Whiteboard: [qf:p3] → [qf:p1]
Comment on attachment 8861161 [details] [diff] [review]
Annotate background hang reports which occur while the user is interacting with the browser

Clearing review after discussion on IRC.
Attachment #8861161 - Flags: review?(benjamin)
Attachment #8861161 - Attachment is obsolete: true
Comment on attachment 8863916 [details] [diff] [review]
Annotate background hang reports which occur while the user is interacting with the browser

Review of attachment 8863916 [details] [diff] [review]:
-----------------------------------------------------------------

I'm happy with this but I think it's good for Olli to also have a look at it.

One thing to double check is to ensure that user-interaction-active and user-interaction-inactive are dispatched in both the parent and content processes.  I think they are but we should make sure.

::: dom/base/nsContentUtils.h
@@ +2899,5 @@
> +   * Determine whether or not the user is currently interacting with the web
> +   * browser. This method is safe to call from off of the main thread.
> +   */
> +  static bool
> +  GetUserIsInteracting();

Nit: It would be nice if you make this inline.

@@ +3033,5 @@
>    static int32_t sPrivacyMaxInnerWidth;
>    static int32_t sPrivacyMaxInnerHeight;
>  
> +  class UserInteractionObserver;
> +  static UserInteractionObserver* sUserInteractionObserver;

Nit: please use StaticRefPtr and ClearOnShutdown() for this.
Attachment #8863916 - Flags: review?(ehsan)
Attachment #8863916 - Flags: review?(bugs)
Attachment #8863916 - Flags: review+
The notifications do run in all the processes, based on whether user input has been handled by that process.
Attachment #8863916 - Flags: review?(bugs) → review+
It looks like we fire the active event when we get a UI event. Every five seconds after that, we check if any more UI events triggered in the 5 second interval. If they did, we fire the active event again. Otherwise we fire inactive and cancel the timer.

Five seconds seems like a long time to me. Some of our GC timers are for 4 seconds. Maybe the GC timers are wrong, but I just wanted to point this out.
(In reply to Bill McCloskey (:billm) from comment #17)
> It looks like we fire the active event when we get a UI event. Every five
> seconds after that, we check if any more UI events triggered in the 5 second
> interval. If they did, we fire the active event again. Otherwise we fire
> inactive and cancel the timer.
> 
> Five seconds seems like a long time to me. Some of our GC timers are for 4
> seconds. Maybe the GC timers are wrong, but I just wanted to point this out.

I would be open to changing this timer to make the definition of user activity tighter as well. I attached some printf()s to the observer callbacks in nsContentUtils and tried interacting with the browser, and it definitely seemed like the interaction was considered to be active for longer than I would expect.

I think this should be done in a follow-up, however.
(In reply to Bill McCloskey (:billm) from comment #17)
> It looks like we fire the active event when we get a UI event. Every five
> seconds after that, we check if any more UI events triggered in the 5 second
> interval. If they did, we fire the active event again. Otherwise we fire
> inactive and cancel the timer.

I'm not 100% convinced of our interactivity detection too for similar reasons but Olli seems quite confident that this works well, and I haven't tested this myself, so I was just trusting that.  Olli, do you know if someone has tested what this 5 second timer really maps to?  Detecting "user activity" in the way that humans would define it isn't easy of course, but while a 5 second timer would work really well to deal ahort gaps in between user input events, the fact that we can run GC timers in this period of activity seems weird...

(Improving this in a follow-up sounds fine to me, FWIW.  For now for our BHR purposes this shouldn't matter too much...)
What has GC timers to do with this? GC timers, nor CC timers don't depend on user activity at all.

In the old days we actually did try to trigger GC/CC timers based on these user activity notifications (among other things), and it was wrong. We ended up never running GC, and once user was idle there was some huge pauses.
GC needs to run when it needs to run basically and split to small enough slices if needed or run longer slices if possible.

These days nsJSEnvironment uses user activity just to tweak interslice budget in parent process so that
if user is interacting with the child process area of the and nothing is animating in parent process, we don't want to use longer GC slices in parent process, since those would slow down user input.

5s is random number, as would anything. 1s sound very very short to me. I can easily be typing something and have a 1s pauses, yet I consider I'm definitely still interacting.
Well, fine, we do trigger shrinking GC on user-interaction-active, but that is 30s timer.
er, user-interaction-inactive
I was talking about the NS_GC_DELAY time used for PokeGC. For example, someone could click on a link that loads quickly (LOAD_END) and we would GC within the 5 second user activity period.
(In reply to Bill McCloskey (:billm) from comment #23)
> I was talking about the NS_GC_DELAY time used for PokeGC. For example,
> someone could click on a link that loads quickly (LOAD_END) and we would GC
> within the 5 second user activity period.

I think this may potentially run the risk of making GCs appear to be too much of a hit, but since we're planning scheduling changes for GCs, I don't know if we're going to rely on this data that much for GCs anyways.  And if we do at least that issue won't affect the validity of the rest of the data.

I think we should go ahead and land this patch, I really don't see any reason not to do that.
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1beb6571832c
Annotate background hang reports which occur while the user is interacting with the browser, r=ehsan
https://hg.mozilla.org/mozilla-central/rev/1beb6571832c
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
I'm not seeing any annotated results appearing in telemetry. Re-opening to track making sure we actually get these reports.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This should make it so we actually get the annotations.

As before the HangMonitor wasn't instantiated, we just threw out the annotator
when we tried to attach it silently. This should make it so we don't throw it
out anymore.

MozReview-Commit-ID: 2QJMRAUQraL
Attachment #8866931 - Flags: review?(ehsan)
Attachment #8866931 - Flags: review?(ehsan) → review+
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a4c22172c7b6
Part 2: Don't register the annotator until after the HangMonitor has started, r=ehsan
https://hg.mozilla.org/mozilla-central/rev/a4c22172c7b6
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Component: DOM → DOM: Core & HTML
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: