Closed Bug 1381147 (stylo-pref-study) Opened 3 years ago Closed 3 years ago
[Shield] Pref Flip Study: Quantum CSS (Stylo)
Quantum CSS (Stylo) Product Hypothesis Doc: https://docs.google.com/document/d/1sFHmpnqgTTjsNhOSU4qd3wiqgErtjxM7vsl9kCvB5qQ/
Details:
> Basic description of experiment:
Compare the stability and performance of Stylo and Gecko.
> What is the preference we will be changing?
layout.css.servo.enabled
> What are the branches of the study and what values should each branch be set to?
Gecko (control group): layout.css.servo.enabled = false
Stylo (experiment group): layout.css.servo.enabled = true
> What percentage of users do you want in each branch?
Start at 1% of Nightly with uneven branches: 4 to 1 treatment/control. After a week move to 20% with the same ratio. After two weeks move to 50% with the same ratio.
> What Channels/locales do you intend to ship to?
We would like to run the Stylo experiment in Nightly 56 and again in Beta 56 for all locales.
> What is your intended go live date and how long will the study run?
Nightly 56 experiment: July 17 through August 2.
Beta 56 experiment: mid-August, depending on results of Nightly 56 experiment.
> Are there specific criteria for participants?
OS = Mac, Win32, Win64, or Linux64. Stylo does not work on Linux32 or Android yet!
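As a rough illustration of the sampling described above (a percentage of the channel is enrolled, then enrolled users are split 4 to 1 between treatment and control), here is a minimal Python sketch. It is not how Shield actually assigns branches: the function name `assign_branch`, the salt string, and the hashing scheme are assumptions made purely for illustration. Only the pref name and the two branch values come from the study description.

```python
import hashlib
from typing import Optional

# Pref value per branch, taken from the study description.
BRANCH_PREFS = {
    "stylo": {"layout.css.servo.enabled": True},   # experiment group
    "gecko": {"layout.css.servo.enabled": False},  # control group
}


def assign_branch(
    client_id: str,
    sample_rate: float,
    treatment_weight: int = 4,
    control_weight: int = 1,
    salt: str = "stylo-pref-study",  # hypothetical salt, not a real Shield value
) -> Optional[str]:
    """Return "stylo", "gecko", or None if the client is not enrolled."""
    digest = hashlib.sha256(f"{salt}:{client_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a stable number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u >= sample_rate:
        return None  # outside the enrolled slice (e.g. 1%, 20%, 50%)
    # Rescale the position inside the enrolled slice back to [0, 1).
    v = u / sample_rate
    treatment_share = treatment_weight / (treatment_weight + control_weight)
    return "stylo" if v < treatment_share else "gecko"
```

Because the assignment depends only on the hash of the client ID, a given client gets the same answer on every call, and raising `sample_rate` from 0.01 to 0.20 keeps already-enrolled clients enrolled (although, in this sketch, not necessarily in the same branch).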
Looks good. We have reviewed the PHD and given the stamp of approval. Send your release-drivers email and we'll ship on Monday.
> What is the main effect you are looking for and what data will you use to make these decisions?
Stylo usage should not be less than Gecko usage. If Stylo’s usage is less than Gecko’s, then Stylo users are probably having a bad experience due to Stylo bugs. Compare:
* Total usage hours: bigger is better, but we don’t expect any changes during this experiment.
* Total pages visited: bigger is better, but we don’t expect any changes during this experiment.
* Total URIs visited: bigger is better, but we don’t expect any changes during this experiment.
Stylo crash rate should be no worse than Gecko. We should compare crash rate (per kilohour usage?) instead of total crash count in case Stylo’s usage is less than Gecko’s. Compare:
* Browser process crashes per kilohour usage: smaller is better, but we don’t expect to see any changes.
* Content process crashes per kilohour usage: smaller is better, but we will likely see slightly more crashes with Stylo.
Stylo page load performance should not be slower than Gecko. Telemetry probes:
* TIME_TO_DOM_LOADING_MS: smaller is better, but we don’t expect Stylo to affect this probe at all. Thus, this “DOM loading” time is a baseline.
* TIME_TO_DOM_INTERACTIVE_MS: smaller is better, but we don’t expect Stylo to affect this probe during this first experiment. We expect Stylo to improve (reduce) “DOM interactive” time in future experiments.
* TIME_TO_DOM_COMPLETE_MS: smaller is better, but we don’t expect Stylo to affect this probe during this first experiment. We expect Stylo to improve (reduce) “DOM complete” time in future experiments.
We do not expect Stylo to improve page load performance yet because the team is still fixing some known performance issues before we enable it by default.
Ideally we would be able to compare performance and number of CPU cores. Stylo performance should increase linearly with the number of cores.
Stylo should not use more than 20% more memory than the control group. 20% is an arbitrary target. We know Stylo currently uses more memory than Gecko, but we plan to improve Stylo’s memory usage before we enable it by default. Telemetry probes:
* MEMORY_TOTAL (total memory across all processes): smaller is better, but we expect Stylo to use about 20% more memory than Gecko during this first experiment.
* MEMORY_UNIQUE (unique set size): smaller is better, but we expect Stylo to use about 20% more memory than Gecko during this first experiment.
* MEMORY_VSIZE (virtual memory size): smaller is better, but we expect Stylo to use about 20% more memory than Gecko during this first experiment.
> Who is the owner of the data analysis for this study?
Bobby Holley is the Stylo tech lead.
> Will this experiment require uplift?
No
> Do you plan on surveying users at the end of the study?
No
> Link to any relevant google docs / Drive files that describe the project. Links to prior art if it exists:
Stylo project wiki: https://wiki.mozilla.org/Quantum/Stylo
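The two normalized comparisons in this comment (crashes per kilohour of usage rather than raw crash counts, and the 20% memory budget relative to control) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the figures in the usage lines are made up, not study data.

```python
def crashes_per_kilohour(crash_count: int, usage_hours: float) -> float:
    """Normalize a crash count by usage so branches of different size compare fairly."""
    if usage_hours <= 0:
        raise ValueError("usage_hours must be positive")
    return crash_count / (usage_hours / 1000.0)


def within_memory_budget(
    treatment_mb: float, control_mb: float, max_increase: float = 0.20
) -> bool:
    """True if treatment memory is at most (1 + max_increase) times control memory."""
    return treatment_mb <= control_mb * (1.0 + max_increase)


# Made-up example figures, for illustration only.
stylo_rate = crashes_per_kilohour(crash_count=50, usage_hours=25_000)
gecko_rate = crashes_per_kilohour(crash_count=10, usage_hours=6_000)
print(f"Stylo: {stylo_rate:.2f}/kh, Gecko: {gecko_rate:.2f}/kh")
print(within_memory_budget(treatment_mb=1150, control_mb=1000))
```

With a 4 to 1 branch split, the treatment group will accumulate roughly four times the usage hours of control, which is why a raw crash count would be misleading here.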
Live for 1% of Nightly. 4 to 1 Ratio of Stylo/Treatment to Gecko/Control. Let us know when you are ready to go to a wider audience.
Great! We've fixed a lot of issues over the past week and the feedback channels seem pretty quiet, so I'm increasingly confident that stylo should be a smooth experience. Ideally, we'd get some baseline reading that nothing disastrous is happening and then bump the population up to 20%. How long will it take us to establish that nothing's on fire with this experiment?
You should have data in experiments viewer in the next 24 hours. Anything not represented there can be pulled from Telemetry manually by one of my folks. I think Ilana was looking at this already.
Bumped the sample to 20% of Nightly. Still using 4 to 1 ratio.
Firefox 56 is riding from the Nightly to Beta channel this week. We would like to expand our Stylo experiment to include: * 50% of Nightly 56 and 57 * 20% of Beta 56
(In reply to Chris Peterson [:cpeterson] from comment #7) > Firefox 56 is riding from the Nightly to Beta channel this week. We would > like to expand our Stylo experiment to include: > > * 50% of Nightly 56 and 57 > * 20% of Beta 56 On second thought, we should probably wait for Stylo top crash (bug 1384824) to be fixed before we increase our sample size. I will follow up when the crash is fixed.
Depends on: 1384824
Thanks Chris. I'll wait for your confirmation and then we can make the changes you requested.
I believe we're ready to increase the experiment to 50%. Chris, let me know if you're aware of anything else blocking us.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #10) > I believe we're ready to increase the experiment to 50%. (To be clear, I mean on Nightly. We're still uplifting those two crash fixes to beta).
I notified the release-drivers list: The Stylo team has fixed its top crash (bug 1384824) so we'd like to increase our pref study's sample size from 20% to 50% of Nightly 57. We should disable the study for Nightly *56* so stragglers are not hitting Stylo bugs that have been fixed in Nightly 57. We are not ready to start the study for *any* Beta 56 users yet. We need to uplift a couple of crash fixes first (bug 1384824 and bug 1382568).
Depends on: 1382568
I've bumped the sample to 50% of Nightly with the same proportions of control/treatment. We are excluding anyone not on 57.
Matt, we'd like to increase our Stylo experiment to 100% of Nightly 57 users. We want to keep the 4:1 treatment/control ratio so 20% of Nightly users are in the Gecko/Control group. Our Stylo experiment is currently enabled for 50% of Nightly 57 users. We have 19,875 users in the Stylo/Treatment group and 4,933 in the Gecko/Control group. We have not seen any new Stylo crash spikes or serious bug reports this week.
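As a quick sanity check on the enrollment counts quoted above, the observed split can be compared with the intended 4:1 ratio. The helper name `observed_split` is hypothetical; the two counts are the ones given in this comment.

```python
def observed_split(treatment: int, control: int) -> tuple[float, float]:
    """Return (treatment:control ratio, treatment share of all enrolled users)."""
    total = treatment + control
    return treatment / control, treatment / total


ratio, share = observed_split(treatment=19_875, control=4_933)
print(f"ratio = {ratio:.2f}:1, treatment share = {share:.1%}")
# prints "ratio = 4.03:1, treatment share = 80.1%"
```

The observed ratio is very close to the intended 4:1 (an 80% treatment share).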
Sample updated to include 100% of Nightly 57 with a 4:1 treatment/control ratio.
:cpeterson, just a reminder that we should be able to expand this to Linux32. Not sure if emails need to be sent, etc.
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #16) > :cpeterson, just a reminder that we should be able to expand this to > Linux32. Not sure if emails need to be sent, etc. I don't think it is worth the trouble of adding Linux32 to our Nightly 57 experiment. There are very few Linux32 Nightly users and we will be enabling Stylo by default soon. Also, I am asking to enable the experiment for Beta 56 users, where we don't support Linux32, and I don't want to risk configuration mix-ups by having different experiment parameters for Nightly 57 and Beta 56.
Because users on the Beta channel have different hardware configurations and browsing behavior than Nightly users, we'd like to run a tiny experiment (1%) on the Beta 56 channel. Our primary motivation is seeing if Beta users are hitting more or different types of Stylo crashes than Nightly users. We don't need to make any statistical comparisons of Stylo vs Gecko telemetry for Beta 56. We've uplifted most of the crash fixes from Nightly 57 to Beta 56, so stability should not be a big risk.
Stylo experiment history:
V1 - On July 24, we shipped to 1% of Nightly 56. (comment 3)
V2 - On July 26, we shipped to 20% of Nightly 56. (comment 6)
V3 - On August 7, we shipped to 50% of Nightly 57. (comment 13)
V4 - On August 10, we shipped to 100% of Nightly 57. (comment 15)
Proposed V5 - Ship to 1% of Beta 56.
Note that the Nightly 57 and proposed Beta 56 experiments are only for Win32, Win64, Mac, and Linux64. Linux32 and Android are excluded for now.
The study is now live on Beta for the populations you requested. Data should start populating in the next 24 hours.
Stylo is now enabled by default in Nightly 57! (bug 1330412)
Unfortunately, this messes up our Nightly experiment because the default control group will suddenly get Stylo by default. To remedy this, we would like to stop the current Nightly experiment ("pref-flip-quantum-css-style-r1-1381147") and start a new Nightly experiment where we force *disable* Stylo for some Nightly users:
* 100% of Nightly 57 users with a 1:4 treatment/control ratio:
  1. "stylo-disabled" treatment group with pref "layout.css.servo.enabled" force set to *false*. (20% of Nightly 57 users)
  2. "stylo-default" control group with no pref changes. (80% of Nightly 57 users)
* Platforms: Win32, Win64, Mac, Linux32, and Linux64. Linux32 support is new this time. Android is still excluded.
The current Beta 56 experiment ("pref-flip-quantum-css-stylo-beta-1381147") can remain unchanged.
Looks like I got placed into the treatment group today (although I don't see a study in about:studies); verified that layout.css.servo.enabled is set to false (and it says that this is the default). Finding that my laptop is a *lot* warmer while browsing than it has in a long time, so while I'll suffer through this until it flips itself back, my feeling is that Stylo really improved my CPU utilization vs. Gecko based styles. Obviously however, it could be another, unrelated regression in Nightly.
(In reply to Asif Youssuff from comment #21) > Looks like I got placed into the treatment group today (although I don't see > a study in about:studies); verified that layout.css.servo.enabled is set to > false (and it says that this is the default). > > Finding that my laptop is a *lot* warmer while browsing than it has in a > long time, so while I'll suffer through this until it flips itself back, my > feeling is that Stylo really improved my CPU utilization vs. Gecko based > styles. Obviously however, it could be another, unrelated regression in > Nightly. That's good to hear! No need to suffer through being in the control group if you don't want to be. It's mostly just a backstop at this point to make sure that the old style system still works, so feel free to opt back in to stylo if you want. :-)
Matt, Firefox 57 is riding the trains from Nightly to Beta and Dev Edition this week. We'd like to make the following changes to our Stylo experiments. Does the experiment system treat Beta and Dev Edition users differently? If so, then please consider all references to "Beta" as "Beta and Dev Edition".
1. Stop the experiment on Beta *56*.
2. Start an experiment on Beta *57* for 100% of Beta 57 users with a 5:95 treatment/control ratio. (This is different from the previous Beta 56 and Nightly 57 ratios!)
   * 5% Stylo-disabled (branch "gecko") treatment
   * 95% Stylo-default (branch "stylo") control
3. No change to our Nightly experiment. We want to continue our Nightly 57 experiment in Nightly 58 with the existing 1:4 treatment/control ratio:
   * 20% Stylo-disabled (branch "gecko") treatment
   * 80% Stylo-default (branch "stylo") control
1, 2: See PROPOSAL AND IMPLICATIONS.
3: Recipe 203, NOT AFFECTED. NO CHANGE.
To solve 1 and 2, we might strand / change user branches.
Proposal:
- Close recipe 223.
- Create a new recipe.
Specific individual users will NOT BE GUARANTEED to be on the same branch as they are now. Since you ask for a ratio change, this will also mean that some users are going to have to change branches. Is this okay?
Flags: needinfo?(mgrimes) → needinfo?(cpeterson)
(cpeterson said it was okay out of band). 223 rev 4 does this and can be enabled on 27 september.
(In reply to Gregg Lind (Fx Strategy and Insights - Shield - Heartbeat ) from comment #24) > Since you ask for a ratio change, this will also mean that some users are > going to have to change branches. > > Is this okay? Yes. That is fine. We are more interested in the metrics for each branch overall than tracking changes in an individual user's metrics over time.
The changes to the Beta study have been made and are now live. As Gregg mentioned, the nightly study should continue running without need for alteration.
status-firefox57=wontfix unless someone thinks this bug should block 57
Matt: now that Beta has moved to 58 and Nightly to 59, are the Stylo experiments still running on Beta and Nightly? I see in the Pref Flip Experiment Client Counts table [1] that the "pref-flip-quantum-css-style-r1-1381147" (started for Nightly 58) and "pref-flip-quantum-css-stylo-beta-1381147" (started for Beta 57) have "last seen" dates of 2017-11-01. We'd like to keep running the experiments in Beta 58 and Nightly 59 with the same client ratios as we had for Beta 57 and Nightly 58. Once we ship Stylo for Android (hopefully in Firefox 59), we can stop all these Stylo experiments.
Beta:
* 5% Stylo-disabled (branch "gecko") treatment
* 95% Stylo-default (branch "stylo") control
Nightly:
* 20% Stylo-disabled (branch "gecko") treatment
* 80% Stylo-default (branch "stylo") control
[1] https://sql.telemetry.mozilla.org/queries/6154#table
They are not currently, but that's an easy fix. We can keep them live on Beta and Nightly until further notice if relman is ok with that. + Liz for the thumbs up.
Flags: needinfo?(mgrimes) → needinfo?(lhenry)
That sounds fine, thanks Matt.
These studies are both live again. Please let us know when you would like them disabled. They are now gated only on channel, not version number, so they should persist across trains.
(In reply to Matt Grimes [:Matt_G] from comment #32) > These studies are both live again. Please let us know when you would like > them disabled. They are now gated only on Channel, not version number so they > should persist across trains. Thanks! Gating them on channel instead of version number is good for Stylo's testing needs.
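To make the difference between channel gating and version gating concrete, here is a minimal Python sketch of an eligibility filter. The `Client` type and `eligible` function are hypothetical and do not reflect the real recipe filter syntax; they only illustrate why a channel-only filter keeps matching after a merge day.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Client:
    channel: str  # e.g. "nightly" or "beta"
    version: int  # major Firefox version, e.g. 58


def eligible(client: Client, channel: str, version: Optional[int] = None) -> bool:
    """Match on channel; also match on version only when one is given."""
    if client.channel != channel:
        return False
    return version is None or client.version == version


beta_58 = Client(channel="beta", version=58)
print(eligible(beta_58, "beta", version=57))  # version-gated recipe: False
print(eligible(beta_58, "beta"))              # channel-gated recipe: True
```

A recipe gated on Beta 57 stops matching once Beta moves to 58, which is consistent with the studies having gone quiet after the train moved; the channel-only form needs no update.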
Matt, we can stop the Stylo pref studies whenever is convenient for you. We've been using these pref studies to continue testing Gecko's legacy style system in case we needed to disable Stylo. We haven't seen any major compatibility bugs since we shipped Stylo in Firefox 57, so we no longer need to keep running these pref studies. Few users are affected by these pref studies, so there is no rush to stop them. Here are the Stylo experiment IDs: pref-flip-quantum-css-stylo-beta-1381147 pref-flip-quantum-css-style-r1-1381147
Both studies have been disabled. Before we close the bug, can we get a quick recap here of the outcome?
Thanks! We started the Stylo pref study in Nightly 56 to compare Stylo-enabled (treatment group) vs Stylo-disabled (Gecko control group). Ilana prepared a report [1] for the telemetry probes requested in comment 2. At that time, it showed Stylo used slightly more memory than Gecko (as expected) and Stylo's initial page load time was slightly slower. We confirmed that in other page load benchmarks. Stylo does more work up front than Gecko to save more time later for dynamic page elements. We did not have good telemetry probes to measure dynamic page performance. In our Stylo project retrospective, we acknowledged that we should have designed better telemetry probes and benchmarks to measure dynamic page performance. After we enabled Stylo by default in Nightly 57, we used the pref study to keep some Nightly (20%) and Beta (5%) users testing Gecko's old style system. If we found a serious Stylo bug in 57 and had to disable Stylo when we shipped 57, we wanted to ensure that Gecko's old style system was still well-tested. This Stylo/Gecko testing was very helpful because it gave us confidence that both Stylo and Gecko were working until we shipped 57. Crash reports were annotated with a Stylo experiment flag to help find Stylo crash reports.
[1] https://gist.github.com/ilanasegall/a422dcdbaec8b0c44b984567a9a04a42