Bug 1504863 (Closed): Opened 6 years ago, Closed 5 years ago

Find a meaningful presentation for the CHECKERBOARD probe family

Categories

(Data Science :: Investigation, task, P1)

Type: task
Points: 2

Tracking

Status: RESOLVED FIXED
data-science-status: Evaluation & interpretation

People

(Reporter: tdsmith, Assigned: tdsmith)

Details

Brief description of the request:

The distribution of CHECKERBOARD_SEVERITY is strongly bimodal, well modeled as a mixture of two widely separated log-normal components.

Because these populations are widely separated and the proportion of pings we receive from users in each group varies stochastically over time as builds roll out, summary metrics are very unstable for small n.

Further, because the mild and severe peaks are so widely separated, an arithmetic mean of the per-user experience amounts to asking whether a user has ever experienced a severe checkerboarding event, which is not the intent.
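To make that failure mode concrete, here is a minimal sketch with made-up parameters (the real mild and severe modes are only described as widely separated log-normals, so the numbers below are assumptions): even a small severe component pulls the arithmetic mean orders of magnitude above the typical experience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters only; the real mild/severe modes are just
# described as widely separated log-normal components.
mild = rng.lognormal(mean=3.0, sigma=0.8, size=9_000)
severe = rng.lognormal(mean=10.0, sigma=1.0, size=1_000)
severity = np.concatenate([mild, severe])

# The mean is dominated by the rare severe component, so it mostly
# answers "did this sample include a severe event?" rather than
# describing the typical experience.
print(f"median severity: {np.median(severity):,.0f}")
print(f"mean severity:   {severity.mean():,.0f}")
```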

Because checkerboarding events degrade the user experience, we would like to propose a visualization that accurately reflects WebRender's impact on that experience.

Link to any assets:

WebRender dashboard review: https://bugzilla.mozilla.org/show_bug.cgi?id=1501470
Points: --- → 2
Priority: -- → P3
data-science-status: --- → Modeling
Priority: P3 → P2

Some more iteration in https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/86542/command/86586.

I think it makes sense to essentially treat these like crash rates, since they are individually rare-ish (most user-days have zero events, and we only have one or two days of activity per user for each nightly build) and depend on active use of the browser (so active_ticks models exposure to the risk).
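As a sketch of the crash-rate analogy (the frame, column names, and values below are made up, not the real telemetry schema; active_ticks counts 5-second intervals of user activity and plays the role that usage hours play for crash rates):

```python
import pandas as pd

# Hypothetical per-user-day rows; column names are assumptions.
df = pd.DataFrame({
    "build_id":     ["20190101", "20190101", "20190102", "20190102"],
    "events":       [0, 3, 1, 0],           # checkerboarding events that day
    "active_ticks": [500, 1200, 800, 300],  # exposure term
})

# Pool events and exposure per build, then take the ratio; rare
# per-user counts aggregate into a stable population rate.
per_build = df.groupby("build_id").agg(
    events=("events", "sum"),
    ticks=("active_ticks", "sum"),
)
per_build["rate"] = per_build["events"] / per_build["ticks"]
print(per_build)
```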

Discarding the least severe events (severity < 500) and plotting the population event-count/active_ticks ratio (after truncating each quantity at its 99th percentile per user-day) looks stable over time, and comparable for WR vs. control.
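A sketch of that filtering and truncation step, again with illustrative frames and column names standing in for the real tables:

```python
import pandas as pd

# Illustrative stand-ins for the per-event and per-user-day tables.
events = pd.DataFrame({
    "client_day": ["a", "a", "b", "c", "c", "c"],
    "severity":   [120, 900, 2500, 80, 40_000, 600],
})
exposure = pd.DataFrame(
    {"active_ticks": [1000, 400, 2500]},
    index=pd.Index(["a", "b", "c"], name="client_day"),
)

# Drop the least severe events, then count the rest per user-day.
counts = (
    events[events["severity"] >= 500]
    .groupby("client_day")
    .size()
    .reindex(exposure.index, fill_value=0)
)

# Truncate counts and exposure at their 99th percentiles so a handful
# of extreme user-days cannot dominate the ratio.
counts = counts.clip(upper=counts.quantile(0.99))
ticks = exposure["active_ticks"].clip(upper=exposure["active_ticks"].quantile(0.99))

# Population rate: truncated events per active tick.
print(counts.sum() / ticks.sum())
```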

Plotting active_ticks-scaled "badness" (the sum of log10(severity)) over the population shows identical-looking trends. It would be nice to capture that some events are worse than others, but then we lose the ability to treat the metric as a Poisson process.
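And a sketch of the badness variant under the same assumed schema; the result is a sum of logs rather than an event count, which is why the Poisson treatment no longer applies:

```python
import numpy as np
import pandas as pd

# Same assumed schema as the previous sketch.
events = pd.DataFrame({
    "client_day": ["a", "a", "b"],
    "severity":   [900, 2500, 40_000],
})
active_ticks = pd.Series([1000, 2500], index=["a", "b"])

# Weight each event by log10(severity) so worse events count for more,
# then scale the population total by exposure. This is no longer a
# count, so Poisson-rate machinery no longer fits.
badness = np.log10(events["severity"]).groupby(events["client_day"]).sum()
print(badness.sum() / active_ticks.sum())
```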

Next step is to add this to the WebRender dashboard.

data-science-status: Modeling → Evaluation & interpretation
Priority: P2 → P1

Added this to the 67 release monitoring dashboard. This should go on the continuous monitoring dashboard as well.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED