Experiment design review checklist:
What is the goal of the effort the experiment is supporting?
The use of WebRender as a rendering solution for Firefox. WebRender has many desirable qualities and has been validated by two previous experiments. This experiment adds further validation as the feature is rolled out in Firefox 67.
Is an experiment a useful next step towards this goal?
Yes, because it estimates, ahead of feature rollout, whether WebRender is performant and stable relative to the existing Firefox rendering solution.
What is the hypothesis or research question? Are the consequences for the top-level goal clear if the hypothesis is confirmed or rejected?
- To validate the results of two previous experiments on the Release and Beta channels showing that WebRender is a stable and performant rendering solution.
- Yes, the consequences are clear: the experiment will either validate or invalidate WebRender as having acceptable performance and stability for feature rollout.
Which measurements will be taken, and how do they support the hypothesis and goal? Are these measurements available in the targeted release channels? Has there been data steward review of the collection?
- The measurements being taken are as follows:
No more than a 5% increase in overall crash reports
No more than a 5% increase in OOM crash reports
No more than a 5% increase in shutdown crashes
CANVAS_WEBGL_SUCCESS - no more than 5% regression in "True" value
DEVICE_RESET_REASON - no more than 5% regression in number of submissions
CHECKERBOARD_DURATION - no more than 5% regression in distribution
CHECKERBOARD_PEAK - no more than 5% regression in distribution
CHECKERBOARD_SEVERITY - no more than 5% regression in distribution
CONTENT_LARGE_PAINT_PHASE_WEIGHT - no more than 5% regression in number of submissions
CONTENT_PAINT_TIME - no more than 5% regression in distribution
FX_PAGE_LOAD_MS - no more than 5% regression in distribution
FX_TAB_CLICK_MS - no more than 5% regression in distribution
COMPOSITE_TIME - no more than 10% regression in distribution
CONTENT_FRAME_TIME - no more than 10% regression in distribution
COMPOSITE_FRAME_ROUNDTRIP_TIME - expect to see an improvement here
- These metrics measure rendering performance and stability, thereby supporting the hypothesis; a sketch of how one of the regression criteria might be checked follows this answer.
- These measurements are all available in the release channel (though verifying this for each probe individually was judged not worth the effort).
- At the time of writing, there has not been an (optional) data steward review of the collection.
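The exact comparison method for each criterion is defined by the analysis plan; purely as an illustration, the sketch below shows one way a "no more than 5% regression in distribution" check might be run on a timing metric such as CONTENT_PAINT_TIME. The use of the median as the summary statistic, the bootstrap confidence bound, and the synthetic data are all assumptions for illustration, not part of the plan.

```python
# Illustrative sketch only: checks a "no more than 5% regression" criterion on
# a timing metric such as CONTENT_PAINT_TIME. The median summary statistic and
# the bootstrap bound are assumptions; the real comparison is defined by the
# experiment's analysis plan.
import numpy as np

def regression_exceeds_threshold(control, treatment, threshold=0.05,
                                 n_boot=10_000, seed=0):
    """Return True if the treatment median regresses by more than `threshold`
    relative to the control median, using a bootstrap lower bound."""
    rng = np.random.default_rng(seed)
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)

    # Bootstrap the relative change in medians (higher times = worse).
    rel_changes = np.empty(n_boot)
    for i in range(n_boot):
        c = rng.choice(control, size=control.size, replace=True)
        t = rng.choice(treatment, size=treatment.size, replace=True)
        rel_changes[i] = (np.median(t) - np.median(c)) / np.median(c)

    # Flag a regression only if even the 2.5th percentile of the bootstrapped
    # relative change is above the allowed threshold.
    return np.percentile(rel_changes, 2.5) > threshold

# Example usage with synthetic data (milliseconds per paint):
control_ms = np.random.default_rng(1).lognormal(mean=3.0, sigma=0.5, size=5000)
treatment_ms = control_ms * 1.02  # a 2% slowdown, within the 5% allowance
print(regression_exceeds_threshold(control_ms, treatment_ms))  # expected: False
```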
Is the experiment design supported by an analysis plan? Is it adequate to answer the experimental questions?
Yes, the analysis plan follows that of the previous experiments. In addition, the plan increases the sample size relative to the previous Release experiments, based on the deployment behavior and sample sizes observed there.
Is the requested sample size supported by a power analysis that includes the core product metrics?
Yes. The approach is the same as in the previous study, and statistics acquired from that study are used to calculate the requisite sample size.
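For reference, a rough sketch of how a sample size for one of the crash-rate criteria could be derived is shown below, using statsmodels. The 1% baseline crash rate is an illustrative placeholder, not a figure from the previous study; only the 5% relative increase comes from the success criteria above.

```python
# Illustrative power-analysis sketch, not the actual calculation from the
# previous study. Assumptions: a 1% baseline crash rate (placeholder), the 5%
# relative increase named in the success criteria, and conventional alpha/power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.01                      # assumed baseline crash rate (placeholder)
worst_acceptable = baseline_rate * 1.05   # the "no more than 5% increase" bound

# Cohen's h effect size for the difference between the two proportions.
effect_size = proportion_effectsize(worst_acceptable, baseline_rate)

# Clients needed per branch to detect that increase with 80% power at alpha = 0.05.
n_per_branch = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.8,
    ratio=1.0,
    alternative="larger",
)
print(f"Approximately {int(round(n_per_branch)):,} clients per branch")
```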
If the experiment is deployed to channels other than release, is it acceptable that the results will not be representative of the release population?
Not applicable - the experiment is deployed on the release channel.