Find a meaningful presentation for the CHECKERBOARD probe family
Categories
(Data Science :: Investigation, task, P1)
Tracking
(data-science-status Evaluation & interpretation)
People
(Reporter: tdsmith, Assigned: tdsmith)
References
Details
Brief description of the request:

The distribution of CHECKERBOARD_SEVERITY is strongly bimodal, structured as the sum of two widely separated log-normal distributions. Because these populations are widely separated, and the proportion of pings we receive from users in each group varies stochastically over time as builds roll out, summary metrics are very unstable for small n. Further, because the mild and severe peaks are so widely separated, an arithmetic mean of the per-user experience amounts to asking whether a user has ever experienced a severe checkerboarding event, which is not the intent.

Because we believe checkerboarding events are undesirable and affect the user experience negatively, we would like to propose a visualization that accurately reflects the impact of WebRender on the user experience.

Link to any assets: WebRender dashboard review: https://bugzilla.mozilla.org/show_bug.cgi?id=1501470
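For intuition, here is a minimal simulation of why small-n arithmetic means are unstable for such a mixture. All parameters are hypothetical, chosen only to produce two widely separated log-normal modes; they are not fit to the real CHECKERBOARD_SEVERITY histogram:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_severity(n, p_severe=0.05):
    """Draw n events from a hypothetical two-mode log-normal mixture."""
    severe = rng.random(n) < p_severe
    mild = rng.lognormal(mean=2.0, sigma=0.5, size=n)  # "mild" peak
    bad = rng.lognormal(mean=9.0, sigma=0.5, size=n)   # "severe" peak
    return np.where(severe, bad, mild)

# With small samples, the mean is dominated by how many severe events
# happened to land in the sample, so it swings wildly between draws.
small_means = [sample_severity(50).mean() for _ in range(200)]
large_means = [sample_severity(5000).mean() for _ in range(200)]
```

Because the severe mode sits orders of magnitude above the mild one, a small sample's mean is effectively an indicator for "did this sample contain a severe event," which matches the concern described above.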
Comment 1•6 years ago
Some early iteration in https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/44825/command/44830
Comment 2•5 years ago
Some more iteration in https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/86542/command/86586.
I think it makes sense to treat these essentially like crash rates, since they are individually rare-ish (most user-days have zero events, and we only have one or two days of activity per user for each nightly build) and depend on active use of the browser (so active_ticks models exposure to the risk).
Discarding the least severe events (severity < 500) and plotting the population events/active_ticks ratio (after truncating each quantity to its 99th percentile per user-day) looks stable over time, and comparable for WebRender vs. control.
Plotting active_ticks-scaled "badness" (the sum of log10(severity)) over the population shows identical-looking trends. It would be nice to capture that some events are worse than others, but then we lose the ability to treat the events as a Poisson process.
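A sketch of the two summaries described above, on toy data. The column names, threshold, and values are illustrative assumptions, not the real telemetry schema (the actual analysis lives in the Databricks notebook linked above):

```python
import numpy as np
import pandas as pd

# Hypothetical per-event and per-user-day frames (illustrative only).
events = pd.DataFrame({
    "user_day": ["u1", "u1", "u2", "u2", "u3"],
    "severity": [120, 800, 2500, 50000, 300],
})
usage = pd.DataFrame({
    "user_day": ["u1", "u2", "u3"],
    "active_ticks": [400, 900, 150],
})

SEVERITY_FLOOR = 500  # discard the least severe events

kept = events[events["severity"] >= SEVERITY_FLOOR]
per_day = (
    kept.groupby("user_day")
        .agg(n_events=("severity", "size"),
             badness=("severity", lambda s: np.log10(s).sum()))
        .reset_index()
)
# User-days with no qualifying events still contribute exposure.
merged = usage.merge(per_day, on="user_day", how="left").fillna(0)

# Truncate numerator and denominator at their 99th percentiles to limit
# the influence of extreme user-days.
for col in ("n_events", "active_ticks"):
    merged[col] = merged[col].clip(upper=merged[col].quantile(0.99))

event_rate = merged["n_events"].sum() / merged["active_ticks"].sum()
badness_rate = merged["badness"].sum() / merged["active_ticks"].sum()
```

The event_rate summary treats checkerboarding like a crash rate (events per unit of exposure); badness_rate weights each event by its log severity, which captures that some events are worse than others at the cost of the Poisson interpretation.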
Next step is to add this to the WebRender dashboard.
Comment 3•5 years ago
Added this to the Firefox 67 release monitoring dashboard. This should go on the continuous monitoring dashboard as well.
Comment 4•5 years ago
Done in https://github.com/tdsmith/webrender-dashboard/commit/24840237e9e0071c224940ea96e72c42415c761f.