Closed Bug 1474484 Opened 6 years ago Closed 6 years ago

[Shield] WebRender V1 Experiment

Categories

(Shield :: Shield Study, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox63 affected)

Status: RESOLVED FIXED
Tracking Status: firefox63 --- affected

People

(Reporter: relaas, Unassigned)

References

(Depends on 1 open bug)

Details

Basic description of experiment: Enable WebRender by default on qualified hardware by setting the gfx.webrender.all.qualified pref to true for the test population.

What are the branches of the study? Half the experiment population would have WebRender enabled and half would not; there is no third branch.

What percentage of users do you want in each branch? 50/50 is fine.

What channels and locales do you intend to ship to? Nightly only.

What is your intended go-live date, and how long will the study run? July 16; 2 weeks.

Are there specific criteria for participants? Windows 10, Nvidia GPU only.

What is the main effect you are looking for and what data will you use to make these decisions?
No more than a 5% increase in overall crash reports
No more than a 5% increase in OOM crash reports
No more than a 5% increase in shutdown crashes

Telemetry probes:
CANVAS_WEBGL_SUCCESS - no more than 5% regression in "True" value
DEVICE_RESET_REASON - no more than 5% regression in number of submissions
CHECKERBOARD_DURATION - no more than 5% regression in distribution
CHECKERBOARD_PEAK - no more than 5% regression in distribution
CHECKERBOARD_SEVERITY - no more than 5% regression in distribution
CONTENT_LARGE_PAINT_PHASE_WEIGHT - no more than 5% regression in number of submissions
CONTENT_PAINT_TIME - no more than 5% regression in distribution
FX_PAGE_LOAD_MS - no more than 5% regression in distribution
FX_TAB_CLICK_MS - no more than 5% regression in distribution
COMPOSITE_TIME - no more than 10% regression in distribution
CONTENT_FRAME_TIME - no more than 10% regression in distribution
COMPOSITE_FRAME_ROUNDTRIP_TIME - expect to see an improvement here
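To make the pass/fail criteria above concrete, here is a minimal Python sketch that checks one probe against its allowed regression. It assumes each branch is summarized by a list of per-user means and that "regression" means the relative change of the treatment mean vs the control mean in the worse direction; the real analysis may have compared full distributions instead. The names THRESHOLDS, relative_regression, and within_threshold are hypothetical.

```python
from statistics import mean

# Allowed relative regression per probe, copied from the criteria above
# (0.05 = 5%). Only a few probes are listed here for illustration.
THRESHOLDS = {
    "CANVAS_WEBGL_SUCCESS": 0.05,
    "FX_PAGE_LOAD_MS": 0.05,
    "COMPOSITE_TIME": 0.10,
    "CONTENT_FRAME_TIME": 0.10,
}


def relative_regression(control, treatment, higher_is_worse=True):
    """Relative change of the treatment mean vs the control mean, signed so
    that a positive result means the treatment branch got worse."""
    c, t = mean(control), mean(treatment)
    change = (t - c) / c
    return change if higher_is_worse else -change


def within_threshold(probe, control, treatment, higher_is_worse=True):
    """True if the probe's regression stays at or below its allowed limit."""
    return relative_regression(control, treatment, higher_is_worse) <= THRESHOLDS[probe]


# Hypothetical per-user means (ms) for timing probes, where higher is worse.
control_ms = [10, 12, 14]
treatment_ms = [12, 13, 14]
print(within_threshold("COMPOSITE_TIME", control_ms, treatment_ms))   # True (~8.3% <= 10%)
print(within_threshold("FX_PAGE_LOAD_MS", control_ms, treatment_ms))  # False (~8.3% > 5%)
```

For a success-rate probe such as CANVAS_WEBGL_SUCCESS, a drop is the bad direction, so it would be checked with higher_is_worse=False.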

Who is the owner of the data analysis for this study? David Bolter, Tim Smith

Who will have access to the data? David Bolter, Thomas Elin, Kartikaya Gupta, William (Chris) Beard, Tim Smith 

Do you plan on surveying users at the end of the study? No

User facing title of the experiment: WebRender

User facing description of the experiment: New generation graphics rendering engine

Link to any relevant Google Docs / Drive files that describe the project:
[PHD] https://docs.google.com/document/d/1mo76Ub0l5cNIII0oKqoVn25Dop4nLKKf0CVhvK4_Wps/edit
[Release Criteria] https://docs.google.com/document/d/1zs5b-hAXnIxvl_acGUjibSSeT4ftI_3TIqAoePgCtBI
Summary: WebRender V1 Experiment → [Shield] WebRender V1 Experiment
Science review: R+
Dave, can we get your R+ for the peer review? Thanks.
Flags: needinfo?(dtownsend)
Peer review: R+
Flags: needinfo?(dtownsend)
Sign Off for WebRender - (YELLOW)

WebRender
Targeted: Firefox Nightly 63.0a1

We have finished testing the WebRender experiment.

We have found the following issues:
- Bug 1474583 - [WebRender Shield Study] Higher CPU usage with WebRender enabled on YouTube
- Bug 1474595 - [WebRender shield study] FPS drop with WebRender enabled on webgl demo websites
- Bug 1474294 - [WebRender Shield Study] Specific images entirely coded in HTML & CSS are not correctly rendered with WebRender enabled

QA’s recommendation: YELLOW - SHIP IT, CONDITIONALLY

Reasoning:
- We tested the try build and saw a slight improvement, but overall the results remained the same: enabling WebRender increases FPS on some of the websites tested but decreases it on others (Bug 1474595, P1).
- Even though CPU usage and battery life are not priorities for V1 (Bug 1474583 is also P1), there is still a concern that they could negatively affect users.

Testing summary:
- Full functional test suite: TestRail (https://goo.gl/EpbvZb);
- Verified that the Telemetry probes are sent correctly;
- Tested loading time on Alexa top sites, CPU usage, and FPS, and benchmarked with Motion Mark: Testing results (https://goo.gl/2PwRxz).

Tested Platforms:
- Windows 10 x64

Tested Firefox versions:
- Firefox Nightly 63.0a1
Thanks Carmen! From the developer side, we still want to ship the experiment. Enabling it will give us more data on whether the FPS drop is a widespread issue (i.e. one affecting many users/sites) or restricted to a subset of hardware or pages. Based on QA testing, it seems to affect WebGL "demo" sites; this class of websites is only a small fraction of the sites users visit, so it doesn't need to block the experiment.
(In reply to Carmen Fat [:carmenf] - Experiments QA from comment #4)
> Motion Mark: Testing results (https://goo.gl/2PwRxz).

Your Motion Mark numbers look wrong.
Look how awesome it is: https://docs.google.com/spreadsheets/d/e/2PACX-1vQolBzSivIh_pZlciaAmZECPjoo5O3T_O0esg2bMF0mhgbKDFFyO-h-ueeR3cl4PLYCpvRjKIXHGrUb/pubhtml
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #6)
> (In reply to Carmen Fat [:carmenf] - Experiments QA from comment #4)
> > Motion Mark: Testing results (https://goo.gl/2PwRxz).
> 
> Your Motion Mark numbers look wrong.
> Look how awesome it is:
> https://docs.google.com/spreadsheets/d/e/2PACX-1vQolBzSivIh_pZlciaAmZECPjoo5O3T_O0esg2bMF0mhgbKDFFyO-h-ueeR3cl4PLYCpvRjKIXHGrUb/pubhtml

It depends on the hardware being used, as well as the prefs. The sheet you linked to has prefs set to disable the performance.now mitigations and to turn on ASAP mode, which produces much better results but is also a non-default configuration.
Andreas, can you R+ this experiment from Product's perspective and confirm that you understand the potential risks?
Flags: needinfo?(abovens)
Product: R+
Flags: needinfo?(abovens)
We're live after resolving some confusion around GPU targeting.

The final targeting is:
* Nightly 63+
* Windows 10
* Has an NVidia GPU (can't rely on isActive since that changes dynamically AFAIK; vendorID = '0x10de')
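The targeting rule above can be sketched as a plain predicate. This is a hypothetical Python illustration, not the Normandy filter expression that was actually deployed; the Client and Adapter record shapes and their field names are assumptions. What it shows is matching on any installed adapter with vendor ID 0x10de rather than on the currently active one.

```python
from dataclasses import dataclass, field
from typing import List

NVIDIA_VENDOR_ID = "0x10de"


@dataclass
class Adapter:
    vendor_id: str
    is_active: bool  # may flip at runtime on dual-GPU machines


@dataclass
class Client:
    channel: str     # e.g. "nightly"
    version: str     # e.g. "63.0a1"
    os_name: str     # e.g. "Windows_NT"
    os_version: str  # e.g. "10.0" for Windows 10
    adapters: List[Adapter] = field(default_factory=list)


def major_version(version: str) -> int:
    """Extract the major version, e.g. "63.0a1" -> 63."""
    return int(version.split(".")[0])


def is_eligible(client: Client) -> bool:
    return (
        client.channel == "nightly"
        and major_version(client.version) >= 63
        and client.os_name == "Windows_NT"
        and client.os_version == "10.0"
        # Match any installed NVidia adapter, ignoring is_active,
        # because the active GPU can change dynamically.
        and any(a.vendor_id == NVIDIA_VENDOR_ID for a in client.adapters)
    )


# A laptop whose Intel GPU is active but which also has an NVidia GPU.
dual_gpu = Client("nightly", "63.0a1", "Windows_NT", "10.0",
                  [Adapter("0x8086", True), Adapter("0x10de", False)])
print(is_eligible(dual_gpu))  # True
```

Note that this predicate deliberately enrolls dual-GPU machines even when Firefox may end up rendering on the other GPU, which is exactly the ambiguity raised later in this thread.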
Depends on: 1477156
For cases where there are two GPUs, is there any way to tell from the results whether Firefox is actually using the NVidia GPU?
Depends on: 1477380
(Thomas Elin [:relaas] from comment #0)
> User facing title of the experiment: WebRender
> User facing description of the experiment: New generation graphics rendering engine
Depends on: 1447499
We've fixed our recipe issue for the latest Nightly and relaunched this. It will only target the most recent Nightly and newer, so our fulfillment will be lower.

As for the analysis question, I don't know the answer. Unfortunately, dynamic GPU switching makes the results very hard to analyze unambiguously. I'm not an expert here, though.

Thanks all
Per Thomas I've ended this study. We can close this bug after Tim has a chance to finish his analysis.
Thanks, Matt.

Here's a first look [1] at the distributions of per-user averages for the probes mentioned in the PHD; apologies for the lack of polish. Fewer users in the treatment arm than in the control arm submitted qualifying telemetry, for reasons that are still unclear [2]. Sample size was not a concern for powering the comparisons we wanted to make, but if whatever caused fewer users to land in the treatment arm is associated with some biasing factor (hardware age, for example), it could severely distort the results.

Many of the probes showed improvements but there appears to have been a marked regression in COMPOSITE_TIME, and the raw fraction of users experiencing a crash looks higher in the treatment branch (unadjusted for activity).
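A minimal sketch of the two summaries described above: per-user averages of a probe, and the raw fraction of users with a crash. It assumes the telemetry has already been flattened into (client_id, value) pairs and per-client crash counts; the function names are hypothetical, and the actual analysis lives in the linked Databricks notebook [1].

```python
from collections import defaultdict
from statistics import mean


def per_user_averages(samples):
    """Collapse (client_id, value) samples into one mean per client, so that
    heavy users contribute one data point each to the branch distribution."""
    by_client = defaultdict(list)
    for client_id, value in samples:
        by_client[client_id].append(value)
    return {client_id: mean(values) for client_id, values in by_client.items()}


def fraction_with_crash(crashes_by_client):
    """Raw share of clients with at least one crash (not adjusted for activity)."""
    if not crashes_by_client:
        return 0.0
    crashed = sum(1 for count in crashes_by_client.values() if count > 0)
    return crashed / len(crashes_by_client)


# Hypothetical COMPOSITE_TIME samples (ms) and crash counts for one branch.
samples = [("a", 10), ("a", 20), ("b", 5)]
print(per_user_averages(samples))                              # {'a': 15, 'b': 5}
print(fraction_with_crash({"a": 0, "b": 2, "c": 1, "d": 0}))   # 0.5
```

Averaging per user first keeps a handful of very active clients from dominating a branch, at the cost of ignoring how much each user actually used the browser.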

Remaining work includes comparing activity metrics between the branches and packaging the report for presentation; this may take a while, since I'll be at onboarding next week.

[1] https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/26331/command/26332
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1480242
Here's a look at some activity metrics between the branches. After filtering down to the users who actually received WebRender, usage hours (measured by the activeTicks simpleMeasurement) were ~20% lower and the number of URIs visited was 11% lower in the treatment branch: https://dbc-caf9527b-e073.cloud.databricks.com/#notebook/26915/resultsOnly

Since usage was lower in the treatment branch, the study may have underestimated both the fraction of users who would experience a crash and the number of crashes per user with WebRender enabled (compared with a scenario where usage was equal between the branches). Both of those metrics were already higher in the WebRender branch than in the control branch.
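To illustrate why lower usage can hide a higher crash rate, here is a small Python sketch that normalizes crashes per 1,000 usage hours. The numbers are made up, the helper names are hypothetical, and the activeTicks conversion assumes one tick per 5 seconds of activity.

```python
def active_ticks_to_hours(active_ticks, seconds_per_tick=5):
    """Convert an activeTicks count to hours (assumes 5 s of activity per tick)."""
    return active_ticks * seconds_per_tick / 3600


def crashes_per_khour(crashes, usage_hours):
    """Crash rate normalized per 1,000 hours of usage."""
    return 1000.0 * crashes / usage_hours


def projected_crashes(crashes, usage_hours, target_hours):
    """Scale observed crashes to a different amount of usage, assuming the
    per-hour crash rate stays constant (a simplifying assumption)."""
    return crashes_per_khour(crashes, usage_hours) * target_hours / 1000.0


# Hypothetical totals: treatment logs 20% fewer usage hours than control
# but already has more raw crashes.
control_rate = crashes_per_khour(100, 10_000)    # 10.0 per 1,000 h
treatment_rate = crashes_per_khour(110, 8_000)   # 13.75 per 1,000 h
print(projected_crashes(110, 8_000, 10_000))     # 137.5 at control-level usage
```

In this made-up example the raw gap is 10% (110 vs 100 crashes), but at equal usage it would be 37.5%, which is the direction of bias described above.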

I'll close out the bug here; please let me know if I can help with anything else.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED