Write a dashboard to track the number of tab spinners

RESOLVED FIXED

Status

()

Toolkit
Telemetry
P3
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: chutten, Assigned: chutten)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

2 years ago
I'm thinking telemetry.js + metricsgraphics.js, pull down the count() of Nightly submissions per day and graph it for TAB_SWITCH_SPINNER_VISIBLE_LONG_MS ...and maybe also TAB_SWITCH_TOTAL_MS for comparison?

Off-the-cuff concerns:
Will this need to be normalized... spinners per tabswitch? normalizing could be difficult.

This is in support of the wider "E10s Spinners are bad" tree near bug 1300411

:blassey, anything in particular you want to see? Questions you want to be able to answer?
Flags: needinfo?(blassey.bugs)
Normalizing by total tab switches or by usage hours would probably be ideal. What makes that difficult?
Flags: needinfo?(blassey.bugs)
(Assignee)

Comment 2

2 years ago
Getting usage hours is not possible via aggregates.telemetry.mozilla.org, which is where I was planning on getting the figures.

Total tab switches I can get by counting the reports of TAB_SWITCH_TOTAL_MS, which ought to do the trick.
(Assignee)

Comment 3

2 years ago
Here's a first-blush attempt. Is this what you're after? https://chutten.github.io/bug1310250/
Flags: needinfo?(blassey.bugs)
(In reply to Chris H-C :chutten from comment #3)
> Here's a first-blush attempt. Is this what you're after?
> https://chutten.github.io/bug1310250/

Yes, that looks great. You can clearly see Bill's landing on 9/17 reducing the number of spinners as well as his one day landing on 8/22. 

The other thing I think would be useful would be time spent in a spinner normalized with non-spinner tab switches at 0ms. With that any movement in median, 75th percentile etc would be meaningful.
Flags: needinfo?(blassey.bugs)
(Assignee)

Comment 5

2 years ago
I'm not sure I quite understand what you mean, but I understand that there's a lack of consideration for the time spent in a spinner. I've updated https://chutten.github.io/bug1310250/ to include a plot showing the proportion of total tab switch time spent in spinners.

I'm not sure what's going on when there's more time spent in spinners for a given build than there was time spent switching tabs in total (the > 100% points in August)...

Is this close to what you had in mind?
Flags: needinfo?(blassey.bugs)
What I had in mind was calculating an average spinner time per tab-switch per user and being able to plot that out (such that we could get a median, average, 75th percentile etc). I realize now that my description was not clear before. That said, this plot might get the job done. Thanks.

The noise in the data is a bit concerning. For example most of the plots show a spike on a 7 day period (9/22, 9/29, 10/5). That's probably not something to solver for here, but something to solve for in telemetry overall. 

As for the >100% of tab switch time being spinner time, that does seem wrong. Mike, any ideas?
Flags: needinfo?(blassey.bugs) → needinfo?(mconley)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #6)
> As for the >100% of tab switch time being spinner time, that does seem
> wrong. Mike, any ideas?

Hm, looking at the probe again:

http://searchfox.org/mozilla-central/rev/d96317a351af8aa78ab9847e7feed964bbaac7d7/browser/base/content/tabbrowser.xml#4048

We cancel the stopwatch if one exists when the user initiates a tab switch.

I believe this means that the following scenario is possible:

1. User has tabs A, B and C. User is viewing tab A.
2. Content process hangs.
3. User switches to tab B (start FX_TAB_SWITCH_TOTAL_E10S_MS stopwatch), and then sees the spinner (start FX_TAB_SWITCH_SPINNER_VISIBLE_MS stopwatch)
4. User switches to tab C (cancel original FX_TAB_SWITCH_TOTAL_E10S_MS, start new FX_TAB_SWITCH_TOTAL_E10S_MS stopwatch) but spinner is still visible (FX_TAB_SWITCH_SPINNER_VISIBLE_MS continues to run)
5. Content process hang ends, and paint occurs.
6. FX_TAB_SWITCH_TOTAL_E10S_MS started in step 4 (and not step 3) is recorded, FX_TAB_SWITCH_SPINNER_VISIBLE_MS started in step 3 is recorded.

So in this scenario, the FX_TAB_SWITCH_SPINNER_VISIBLE_MS is roughly equal to the (canceled, unrecorded) time to switch to tab B, and the (recorded and finished) switch to tab C.

I'm not certain that this is the cause for the spikes in August, but it's one possible explanation.
Flags: needinfo?(mconley)
(Assignee)

Comment 8

2 years ago
Well that calls into question the use of FX_TAB_SWITCH_TOTAL_E10S_MS as a normalizer for spinners. It's a loose lower-bound of how often and how long our users switch tabs, so it's still useful in a broad sense (I mean, it correctly shows the improvements brought on by :billm's first patch)... but we may need to manufacture a new probe if we need better control.

For tab switch counts, maybe I should use TAB_SWITCH_CACHE_POSITION? or TAB_SWITCH_UDATE_MS? Which would you prefer, :mconley? Or do all current probes suffer the same problem as FX_TAB_SWITCH_TOTAL_E10S_MS?


As for the periodic spikes, they're not _quite_ periodic.

09-22, 09-29, 10-05, 10-10, 10-16... 09-22 could just be a correction from 09-21's trough, and the 10-10 is a two-day peak from 10-09...

It is a bit ridiculous that it can sway by tens of percents for a single day, even on Nightly. However, if the usual Nightly Noise Winds are pushing one way and get a boost from not being able to necessarily trust tabswitch_total... maybe that's enough to explain it.
Flags: needinfo?(mconley)
Perhaps FX_TAB_SWITCH_UPDATE_MS makes the most sense. Reading tabbrowser.xml, it looks like this will be recorded once per switch, and doesn't include other uses of update (for example, after a docShell / frameloader swap when we tear tab tear out / tear in).
Flags: needinfo?(mconley)
Priority: -- → P3
(Assignee)

Comment 10

2 years ago
FX_TAB_SWITCH_UPDATE_MS is... a little unstable and weird for normalization efforts.

I think it's probably best to stick with FX_TAB_SWITCH_TOTAL_E10S_MS.

So I guess we're done here?
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
I've forked and modded the dashboard to help show the percentage of affected clients, and the severity of the spinners that those affected clients see.

https://mikeconley.github.io/bug1310250/
You need to log in before you can comment on or make changes to this bug.