Bug 1381147 (stylo-pref-study)

[Shield] Pref Flip Study: Quantum CSS (Stylo)

NEW
Assigned to

Status

Shield
Shield Study
P5
normal
a month ago
5 days ago

People

(Reporter: cpeterson, Assigned: cpeterson)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

(Assignee)

Description

a month ago
Quantum CSS (Stylo) Product Hypothesis Doc:
https://docs.google.com/document/d/1sFHmpnqgTTjsNhOSU4qd3wiqgErtjxM7vsl9kCvB5qQ/

Details:

> Basic description of experiment:

Compare the stability and performance of Stylo and Gecko.

> What is the preference we will be changing?

layout.css.servo.enabled

> What are the branches of the study and what values should each branch be set to?

Gecko (control group): layout.css.servo.enabled = false

Stylo (experiment group): layout.css.servo.enabled = true

> What percentage of users do you want in each branch?

Start at 1% of Nightly with uneven branches: 4 to 1 treatment/control. After a week move to 20% with the same ratio. After two weeks move to 50% with the same ratio.
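
The staged rollout above (1%, then 20%, then 50%, always 4 to 1) can be sketched in a few lines. This is only an illustration, not how Shield actually enrolls clients: the `client_id` strings, the salts, and the `assign_branch` helper are all hypothetical. Hashing the client ID keeps the assignment stable, so a client enrolled at 1% stays enrolled when the sample grows.

```python
import hashlib

# Pref and branch values taken from the study description.
PREF = "layout.css.servo.enabled"
BRANCH_PREF_VALUE = {"stylo": True, "gecko": False}


def _bucket(client_id: str, salt: str) -> float:
    """Map a client ID to a stable pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{client_id}".encode("utf-8")).hexdigest()
    return int(digest[:12], 16) / 16**12


def assign_branch(client_id, sample_rate, treatment_weight=4, control_weight=1):
    """Return "stylo", "gecko", or None if the client is not sampled.

    Hypothetical helper: sample_rate is the enrolled fraction of the
    channel (0.01, 0.20, 0.50), and the weights give the 4 to 1 split.
    """
    if _bucket(client_id, "enroll") >= sample_rate:
        return None
    treatment_share = treatment_weight / (treatment_weight + control_weight)
    if _bucket(client_id, "branch") < treatment_share:
        return "stylo"
    return "gecko"
```

With a 4 to 1 split, about 80% of enrolled clients land in the Stylo branch and about 20% in the Gecko branch, whatever the sample rate.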

> What Channels/locales do you intend to ship to?

We would like to run the Stylo experiment in Nightly 56 and again in Beta 56 for all locales.

> What is your intended go live date and how long will the study run?

Nightly 56 experiment: July 17 through August 2. 

Beta 56 experiment: mid-August, depending on results of Nightly 56 experiment.

> Are there specific criteria for participants?

OS = Mac, Win32, Win64, or Linux64. Stylo does not work on Linux32 or Android yet!
Flags: needinfo?(mgrimes)
Looks good. We have reviewed the PHD and given the stamp of approval. Send your release-drivers email and we'll ship on Monday.
Flags: needinfo?(mgrimes)
(Assignee)

Comment 2

a month ago
> What is the main effect you are looking for and what data will you use to make these decisions?

Stylo usage should not be less than Gecko usage. If Stylo’s usage is less than Gecko’s, then Stylo users are probably having a bad experience due to Stylo bugs.

Compare:

* Total usage hours: bigger is better, but we don’t expect any changes during this experiment.
* Total pages visited: bigger is better, but we don’t expect any changes during this experiment.
* Total URIs visited: bigger is better, but we don’t expect any changes during this experiment.

Stylo crash rate should be no worse than Gecko. We should compare crash rate (per kilohour usage?) instead of total crash count in case Stylo’s usage is less than Gecko’s.

Compare:

* Browser process crashes per kilohour usage: smaller is better, but we don’t expect to see any changes.
* Content process crashes per kilohour usage: smaller is better, but we will likely see slightly more crashes with Stylo.
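
A minimal sketch of why the rate is the fairer comparison with uneven branches. The function name and the crash and usage numbers are invented for illustration; they are not study data.

```python
def crashes_per_kilohour(crash_count: int, usage_hours: float) -> float:
    """Normalize a crash count by usage, in crashes per 1,000 hours."""
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return crash_count / (usage_hours / 1000.0)


# Invented numbers: the Stylo branch has 4x the usage of the Gecko branch,
# so it also has 4x the raw crashes, yet the normalized rates are identical.
stylo_rate = crashes_per_kilohour(crash_count=120, usage_hours=80_000)
gecko_rate = crashes_per_kilohour(crash_count=30, usage_hours=20_000)
print(stylo_rate, gecko_rate)  # 1.5 1.5
```

Comparing raw counts here would make Stylo look four times worse, which is only an artifact of the 4 to 1 branch sizes.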

Stylo page load performance should not be slower than Gecko.

Telemetry probes:

* TIME_TO_DOM_LOADING_MS: smaller is better, but we don’t expect Stylo to affect this probe at all. Thus, this “DOM loading” time is a baseline.
* TIME_TO_DOM_INTERACTIVE_MS: smaller is better, but we don’t expect Stylo to affect this probe during this first experiment. We expect Stylo to improve (reduce) “DOM interactive” time in future experiments.
* TIME_TO_DOM_COMPLETE_MS: smaller is better, but we don’t expect Stylo to affect this probe during this first experiment. We expect Stylo to improve (reduce) “DOM complete” time in future experiments.

We do not expect Stylo to improve page load performance yet because the team is still fixing some known performance issues before we enable it by default. Ideally, we would be able to compare performance against the number of CPU cores, since Stylo performance should increase linearly with the number of cores.

Stylo should not use more than 20% more memory than the control group. 20% is an arbitrary target. We know Stylo currently uses more memory than Gecko, but we plan to improve Stylo’s memory usage before we enable it by default.

Telemetry probes:

* MEMORY_TOTAL (total memory across all processes): smaller is better, but we expect Stylo to use about 20% more memory than Gecko during this first experiment.
* MEMORY_UNIQUE (unique set size): smaller is better, but we expect Stylo to use about 20% more memory than Gecko during this first experiment.
* MEMORY_VSIZE (virtual memory size): smaller is better, but we expect Stylo to use about 20% more memory than Gecko during this first experiment.
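
The 20% memory budget could be checked per probe roughly as follows. This is a sketch under assumptions: the study does not say which summary statistic to compare, so the median is used here as one reasonable choice, and the sample values and helper names are made up.

```python
from statistics import median


def relative_increase(treatment_samples, control_samples) -> float:
    """Fractional increase of the treatment median over the control median."""
    return median(treatment_samples) / median(control_samples) - 1.0


def within_budget(treatment_samples, control_samples, budget=0.20) -> bool:
    """True if the treatment group uses at most `budget` more memory."""
    return relative_increase(treatment_samples, control_samples) <= budget


# Made-up MEMORY_TOTAL readings in MB, not real telemetry.
gecko_mb = [400, 500, 600]
stylo_mb = [480, 590, 700]
print(within_budget(stylo_mb, gecko_mb))  # True: about 18% over control
```

The same check would apply unchanged to MEMORY_UNIQUE and MEMORY_VSIZE.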

> Who is the owner of the data analysis for this study?

Bobby Holley is the Stylo tech lead.

> Will this experiment require uplift?

No

> Do you plan on surveying users at the end of the study?

No

> Link to any relevant google docs / Drive files that describe the project. Links to prior art if it exists:

Stylo project wiki: https://wiki.mozilla.org/Quantum/Stylo
(Assignee)

Updated

a month ago
Priority: -- → P5
Live for 1% of Nightly. 4 to 1 Ratio of Stylo/Treatment to Gecko/Control. Let us know when you are ready to go to a wider audience.
Great! We've fixed a lot of issues over the past week and the feedback channels seem pretty quiet, so I'm increasingly confident that Stylo should be a smooth experience.

Ideally, we'd get some baseline reading that nothing disastrous is happening and then bump the population up to 20%. How long will it take us to establish that nothing's on fire with this experiment?
Flags: needinfo?(mgrimes)
You should have data in experiments viewer in the next 24 hours. Anything not represented there can be pulled from Telemetry manually by one of my folks. I think Ilana was looking at this already.
Flags: needinfo?(mgrimes)
Bumped the sample to 20% of Nightly. Still using 4 to 1 ratio.
(Assignee)

Comment 7

23 days ago
Firefox 56 is riding from the Nightly to Beta channel this week. We would like to expand our Stylo experiment to include:

* 50% of Nightly 56 and 57
* 20% of Beta 56
(Assignee)

Comment 8

23 days ago
(In reply to Chris Peterson [:cpeterson] from comment #7)
> Firefox 56 is riding from the Nightly to Beta channel this week. We would
> like to expand our Stylo experiment to include:
> 
> * 50% of Nightly 56 and 57
> * 20% of Beta 56

On second thought, we should probably wait for the Stylo top crash (bug 1384824) to be fixed before we increase our sample size. I will follow up when the crash is fixed.
Depends on: 1384824
Thanks Chris. I'll wait for your confirmation and then we can make the changes you requested.
I believe we're ready to increase the experiment to 50%. Chris, let me know if you're aware of anything else blocking us.
Flags: needinfo?(mgrimes)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #10)
> I believe we're ready to increase the experiment to 50%.

(To be clear, I mean on Nightly. We're still uplifting those two crash fixes to beta).
(Assignee)

Comment 12

17 days ago
I notified the release-drivers list:

The Stylo team has fixed its top crash (bug 1384824) so we'd like to increase our pref study's sample size from 20% to 50% of Nightly 57.

We should disable the study for Nightly *56* so stragglers are not hitting Stylo bugs that have been fixed in Nightly 57.

We are not ready to start the study for *any* Beta 56 users yet. We need to uplift a couple crash fixes first (bug 1384824 and bug 1382568).
Depends on: 1382568
Depends on: 1388031
I've bumped the sample to 50% of Nightly with the same proportions of control/treatment. We are excluding anyone not on 57.
Flags: needinfo?(mgrimes)
No longer depends on: 1388031
(Assignee)

Updated

15 days ago
Depends on: 1386915
(Assignee)

Updated

15 days ago
Depends on: 1381851
(Assignee)

Updated

15 days ago
Alias: stylo-pref-study
(Assignee)

Updated

15 days ago
Depends on: 1387116
(Assignee)

Updated

15 days ago
Depends on: 1381821
(Assignee)

Updated

15 days ago
Depends on: 1292609
(Assignee)

Updated

13 days ago
Depends on: 1388031
(Assignee)

Comment 14

13 days ago
Matt, we'd like to increase our Stylo experiment to 100% of Nightly 57 users. We want to keep the 4:1 treatment/control ratio so 20% of Nightly users are in the Gecko/Control group.

Our Stylo experiment is currently enabled for 50% of Nightly 57 users. We have 19,875 users in the Stylo/Treatment group and 4,933 in the Gecko/Control group. We have not seen any new Stylo crash spikes or serious bug reports this week.
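
As a quick sanity check, the enrollment counts above can be compared against the intended 4:1 split. Only the two user counts come from this comment; the variable names are illustrative.

```python
treatment_users = 19_875  # Stylo/Treatment group
control_users = 4_933     # Gecko/Control group

ratio = treatment_users / control_users
treatment_share = treatment_users / (treatment_users + control_users)

print(f"{ratio:.2f}:1")          # 4.03:1
print(f"{treatment_share:.1%}")  # 80.1%
```

Both figures sit very close to the 4:1 (80%/20%) target.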
Flags: needinfo?(mgrimes)
Sample updated to include 100% of Nightly 57 with a 4:1 treatment/control ratio.
Flags: needinfo?(mgrimes)
(Assignee)

Updated

5 days ago
Depends on: 1391577