Closed Bug 1652613 Opened 3 months ago Closed 2 months ago

Add event telemetry for slow script warnings

Categories

(Firefox :: General, defect, P1)

RESOLVED FIXED
Firefox 80
Tracking Status
firefox80 --- fixed

People

(Reporter: Gijs, Assigned: Gijs)

References

(Blocks 2 open bugs)

Attachments

(5 files)

To start with, the event should consist of:

  • category: "slow_script_warning"
  • method: "shown"
  • object: either "content" or "chrome" to indicate which process this happened in
  • value/extra: indicate how long the script ran, how long the warning was up, why/how the warning was dismissed (tab/browser closed, interaction with the warning, automatically when things started behaving better), and whether the URI of the hung script is chrome/resource or content-based
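
As a rough illustration of the shape listed above, here is a minimal Python sketch of such an event. It is not Firefox code: the class name, the `process` field, and the extra keys (`hang_duration`, `notification_duration`, `end_reason`, `uri_type`) are hypothetical names chosen for the example. The only things taken from this bug are the category, the method, and the two allowed object values. Extra values are stringified, on the understanding that event telemetry stores extras as string-to-string maps.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Object values proposed in this bug.
ALLOWED_OBJECTS = {"content", "chrome"}


@dataclass
class SlowScriptEvent:
    """Hypothetical model of the proposed event, for illustration only."""

    process: str  # "content" or "chrome"
    value: Optional[str] = None
    extra: Dict[str, object] = field(default_factory=dict)
    category: str = "slow_script_warning"
    method: str = "shown"

    def __post_init__(self) -> None:
        if self.process not in ALLOWED_OBJECTS:
            raise ValueError(f"unexpected object: {self.process!r}")
        # Stringify every extra value so the payload is a str -> str map.
        self.extra = {key: str(val) for key, val in self.extra.items()}

    def as_tuple(self) -> Tuple[str, str, str, Optional[str], Dict[str, str]]:
        """Return (category, method, object, value, extra)."""
        return (self.category, self.method, self.process, self.value, self.extra)


# Example usage with hypothetical extra keys and made-up values.
event = SlowScriptEvent(
    process="content",
    extra={
        "hang_duration": 12500,
        "notification_duration": 4000,
        "end_reason": "user-aborted",
        "uri_type": "content",
    },
)
```

Keeping the variable details in extra rather than in value fits the plan below of adding further keys in follow-ups.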

There are other things we want, like:

  • whether there was user interaction with the content process while it was hung;
  • our best guesses as to causes of the hang (system is slow, website is doing bad things, browser code being silly, ...)
  • whether the tabs on which the warning was displayed were selected while things were hung (i.e. how often this happens in the background)
  • whether user interaction started the thing that was slow

but I think we should file follow-ups for those once we have a basic implementation. AIUI event telemetry supports adding keys to an event's extra field after the initial implementation, and I wouldn't want to block getting any info here on getting all of this additional info.

Assignee: nobody → gijskruitbosch+bugs
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P1

Comment on attachment 9164294 [details]
Bug 1652613 - record a telemetry event when we show the slow script notification warning, r?mconley

  1. What questions will you answer with this data?

How many web content hangs users are seeing, what causes them, how they respond, and how long those hangs last.

  2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?

We want to improve our handling of hangs based on how they affect users.

  3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

There aren't really alternatives to measuring this in-product if we want representative data.

  4. Can current instrumentation answer these questions?

No, it accumulates hang times in a histogram, but doesn't indicate how common these were or how users and Firefox dealt with them.

  5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Tracking bug for all of these is this bug.

Measurement Description | Data Collection Category
how long we were hung | 1 or maybe 2 (if user stopped the hang)
which process hung (currently always content, may include 'parent' in future) | 1
if/how long we showed a notification | 2
why the notification was dismissed (by user or by Firefox) | 1 / 2
the type of script that was hung (browser, web content, extension) | 1
how often the user requested to wait for the script to finish | 2
how often the user switched away from a tab that was hung | 2

This collection is documented in its definitions files Histograms.json, Scalars.yaml, and/or Events.yaml and in the Probe Dictionary at https://probes.telemetry.mozilla.org.

  6. How long will this data be collected? Choose one of the following:

I want this data to be collected for 6 months initially (potentially renewable).

  7. What populations / release channels / countries / locales will you measure?

All of them.

  8. Any other filters? Please describe in detail below.

The collection is only run on desktop (not mobile).

  9. If this data collection is default on, what is the opt-out mechanism for users?

Usual telemetry opt-out measures.

  10. Please provide a general description of how you will analyze this data.

Likely to be custom analysis of event pings.

  11. Where do you intend to share the results of your analysis?

Unsure right now, likely to be in this bug or other reports, potentially publicly if we find something significant enough to warrant publishing more general data about what users do in the face of delays/slowness.

  12. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection?

No.

Attachment #9164294 - Flags: data-review?(chutten)
Attached file data collection review

Data Review Requests should be attached to the bug to integrate more seamlessly with the process. Lemme just do that here.

Attachment #9165051 - Flags: data-review?(chutten)
Attachment #9164294 - Flags: data-review?(chutten)
Comment on attachment 9165051 [details]
data collection review

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry, so it can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

No. This collection will expire in Firefox 85.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :Gijs is responsible for renewing or removing the collection before it expires in Firefox 85.

---
Result: datareview+
Attachment #9165051 - Flags: data-review?(chutten) → data-review+

The hang duration numbers were bogus. It seems that the code in
XPCJSContext::InterruptCallback establishes the duration by continuously
comparing 'now' with the last timestamp in mSlowScriptCheckpoint - but it
stops writing to the latter after the second time the interrupt callback
fires. So if the slow script runtime limit is N, the timer fires every N/2
seconds, but we increment the duration by N the third time it fires, 1.5N the
fourth time, 2N the fifth time, etc.

This patch fixes the issue by always resetting the timestamp against which we
compare when establishing the duration and incrementing mSlowScriptActualWait.
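
The drift can be reproduced with a small, simplified Python model. This is a sketch, not the actual XPCJSContext code. It assumes the callback fires at exact multiples of N/2, and that in the buggy case the checkpoint is last written on the first firing; that second assumption is chosen so the increments match the N, 1.5N, 2N sequence described above. The function and parameter names are made up for the example.

```python
def simulate_increments(limit, firings, reset_every_time):
    """Return the amount added to the accumulated wait at each firing.

    limit: slow script runtime limit N, in seconds.
    firings: number of times the interrupt callback fires.
    reset_every_time: True models the fix (checkpoint reset on every
        firing); False models the bug (checkpoint frozen after the
        first firing, an assumption of this sketch).
    """
    interval = limit / 2      # the callback fires every N/2 seconds
    checkpoint = 0.0          # timestamp we compare 'now' against
    increments = []
    for k in range(1, firings + 1):
        now = k * interval
        increments.append(now - checkpoint)
        if reset_every_time or k == 1:
            checkpoint = now
    return increments


# With N = 10 seconds and five firings (25 seconds of real hang time):
buggy = simulate_increments(10, 5, reset_every_time=False)
fixed = simulate_increments(10, 5, reset_every_time=True)
print(buggy, sum(buggy))  # [5.0, 5.0, 10.0, 15.0, 20.0] 55.0
print(fixed, sum(fixed))  # [5.0, 5.0, 5.0, 5.0, 5.0] 25.0
```

In this model the buggy accumulation reports 55 seconds for a 25-second hang, while resetting the checkpoint on every firing makes the sum equal the real elapsed time.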

Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/bdf59854c900
correct hang duration event measuring for slow script hangs, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/1255237ce2e7
report slow script hang durations and only clear hang data after the observer notification, r=mconley
https://hg.mozilla.org/integration/autoland/rev/6f98c9b01920
record a telemetry event when we show the slow script notification warning, r=mconley
https://hg.mozilla.org/integration/autoland/rev/76b5a5d243d1
record whether the user switched away from the tab while it was hung, r=mconley
Flags: needinfo?(gijskruitbosch+bugs)
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/77c85c1ef3f0
correct hang duration event measuring for slow script hangs, r=mccr8
https://hg.mozilla.org/integration/autoland/rev/6e90fed2ff8a
report slow script hang durations and only clear hang data after the observer notification, r=mconley
https://hg.mozilla.org/integration/autoland/rev/5bbd29a7db14
record a telemetry event when we show the slow script notification warning, r=mconley
https://hg.mozilla.org/integration/autoland/rev/74d2da543877
record whether the user switched away from the tab while it was hung, r=mconley
Blocks: 1667245