Add new telemetry probe for excessively long GPU process startup time

RESOLVED FIXED in Firefox 53



3 years ago
3 years ago


(Reporter: gw280, Assigned: gw280)


Dependency tree / graph

Firefox Tracking Flags

(firefox53 fixed)



(2 attachments)

This is our current data for GPU_PROCESS_LAUNCH_TIME_MS:!cumulative=0&end_date=2016-11-14&keys=__none__!__none__!__none__&max_channel_version=nightly%252F53&measure=GPU_PROCESS_LAUNCH_TIME_MS&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-11-14&table=0&trim=1&use_submission_date=0

We're going to want to have some more insight into that highest bucket. Upon consulting people who know more about this than I do, it seems our best option is to add a GPU_PROCESS_LAUNCH_TIME_MS_LONG that specifically breaks down the last bucket.
Assignee: nobody → gwright
I'm not sure this needs to be an eternal probe, so I've set it to expire in fx 55 here. We can always renew it later.
It might also be worth putting more telemetry markers in to break down the startup time - ie, how long it takes to actually do the process launch vs. how long to initialise whatever needs to be initialised.
Comment on attachment 8811238 [details] [diff] [review]

Review of attachment 8811238 [details] [diff] [review]:

Will this automatically exclude samples < 1000ms? If not, would be good to do that in GPUChild.
Attachment #8811238 - Flags: review?(dvander) → review+
I believe so. My understanding is that the "low: 1000" field in the histogram definition should cover that.

Comment 6

3 years ago
It will not exclude them entirely: they will be included in the "low" bucket of the histogram, just as any measurements greater than 64000 would be in the "high" bucket. Which seems like the right thing to me.

Comment 7

3 years ago
Why is this better than modifying the current exponential histogram with another 10 buckets and increasing it's max to 64000?
Flags: needinfo?(gwright)
I was told that can't be done? Chris?
Flags: needinfo?(gwright) → needinfo?(chutten)
If I don't miss my guess, what bsmedberg is suggesting is the "Rebucket and rename" approach (see GC_REASON_2) as opposed to the "Add a second one for the long pauses" (see FX_TAB_SWITCH_SPINNER_VISIBLE_LONG_MS)

To my knowledge, neither is better than the other in implementation, ease-of-use, bandwidth cost, or analysis. (Since we've done both in similar cases)
Flags: needinfo?(chutten)
I would suspect then that this would be better, as we should be able to get the GPU process down to < 1000ms launch times for the vast majority of launches, and so the existing one should be useful long-term, and we can use this second one for now to try and get more data on how bad the situation is.
Well, it's possible we can't do anything even if we learn that stuff is slow :)

But either way we should be able to turn one of the histograms off at that point.
Would you prefer the rebucket and rename then? I think both are equally good for our needs.
Attachment #8816312 - Flags: review?(dvander)
Attachment #8816312 - Flags: feedback?(benjamin)
Comment on attachment 8816312 [details] [diff] [review]

Yeah I slightly prefer one histogram if it solves your problems equally well.
Attachment #8816312 - Flags: feedback?(benjamin) → feedback+


3 years ago
Attachment #8811238 - Flags: feedback?(benjamin)
Attachment #8816312 - Flags: review?(dvander) → review+

Comment 14

3 years ago
Pushed by
Rebucket and rename GPU_PROCESS_LAUNCH_TIME_MS to cater for launch times up to 64000ms r=dvander data-review=bsmedberg

Comment 15

3 years ago
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.