Closed Bug 1251052 Opened 8 years ago Closed 8 years ago

[checkerboard-experiment] Make a telemetry experiment to reduce displayport size

Categories: Core :: Panning and Zooming, defect
Platform: Unspecified / All
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Tracking Status: firefox47 --- fixed

People

(Reporter: kats, Assigned: kats)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [gfx-noted])

Attachments

(3 files, 1 obsolete file)

We can use the telemetry experiments machinery [1] to try different displayport sizes and see what the impact is on checkerboarding. We already have various telemetry probes in the tree to measure checkerboarding metrics, so the experiment would just need to flip the displayport size prefs and see what the result is. Ideally we want to reduce the displayport to as small as possible without increasing checkerboarding.

[1] https://wiki.mozilla.org/Telemetry/Experiments
bsmedberg, I'd like to put together a telemetry experiment that aims to find the smallest displayport size that doesn't significantly increase checkerboarding. Reducing the displayport size will save memory and reduce main-thread paint time with APZ enabled. In particular this should help reduce the tp5o_scroll regression seen in e10s vs non-e10s talos testing. Reducing the displayport past a certain point, however, will also increase checkerboarding when scrolling rapidly. The experiment aims to find a better "sweet spot" than the current values, which were mostly a guess based on anecdotal data.

We already have telemetry probes landed (CHECKERBOARD_SEVERITY, CHECKERBOARD_PEAK, CHECKERBOARD_DURATION, CHECKERBOARD_POTENTIAL_DURATION) that record various aspects of checkerboard events. The experiment code just needs to divide the population into groups and set different displayport-sizing prefs for each group (see the sketch after the group list below). All the prefs that need to be modified are "live" and don't require a browser restart. The groups I'd like to make are:

Group 1: Control group
- no pref changes

Group 2: Small horizontal displayport, unchanged vertical displayport
- apz.x_skate_size_multiplier set to 0.5
- apz.x_stationary_size_multiplier set to 0.5

Group 3: Unchanged horizontal displayport, smaller vertical displayport
- apz.y_skate_size_multiplier set to 2.0
- apz.y_stationary_size_multiplier set to 2.0

Group 4: Unchanged horizontal displayport, even smaller vertical displayport
- apz.y_skate_size_multiplier set to 1.0
- apz.y_stationary_size_multiplier set to 1.0

Group 5: Unchanged horizontal displayport, smaller vertical displayport (skate only)
- apz.y_skate_size_multiplier set to 2.0

Group 6: Unchanged horizontal displayport, even smaller vertical displayport (skate only)
- apz.y_skate_size_multiplier set to 1.0
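
As a rough sketch of what the experiment's bootstrap.js needs to do per group (illustrative only: it assumes Preferences.jsm, made-up group names, and that these multiplier prefs are float prefs stored as string values):

const { Preferences } = Components.utils.import(
  "resource://gre/modules/Preferences.jsm", {});
// Write to the default branch so nothing persists in the user's profile.
const defaultPrefs = new Preferences({ defaultBranch: true });

// Prefs to set per group; the group names here are illustrative.
const GROUP_PREFS = {
  group1: {}, // control: no pref changes
  group2: { "apz.x_skate_size_multiplier": "0.5",
            "apz.x_stationary_size_multiplier": "0.5" },
  group3: { "apz.y_skate_size_multiplier": "2.0",
            "apz.y_stationary_size_multiplier": "2.0" },
  group4: { "apz.y_skate_size_multiplier": "1.0",
            "apz.y_stationary_size_multiplier": "1.0" },
  group5: { "apz.y_skate_size_multiplier": "2.0" },
  group6: { "apz.y_skate_size_multiplier": "1.0" },
};

function applyGroup(group) {
  for (let pref of Object.keys(GROUP_PREFS[group])) {
    defaultPrefs.set(pref, GROUP_PREFS[group][pref]);
  }
}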

Based on recent historical telemetry data, we get reports of about 1 million checkerboard events per 7 days' worth of builds. Even split into 6 groups, I'm hoping that running this experiment on Nightly for 7 days will be sufficient to give us some useful data.

Please let me know if you have any questions or concerns. ni? also to :botond if he has any comments or improvements to suggest.
Flags: needinfo?(botond)
Flags: needinfo?(benjamin)
I'm also in the process of running tp5o_scroll locally on OS X and Windows for each of these groups to see what kind of improvement it yields. I'm expecting more improvement on Windows than on OS X, because Windows doesn't have tiling enabled, so the displayport isn't getting rounded up beyond the specified multipliers.
What information do you need from me? https://wiki.mozilla.org/Telemetry/Experiments has guidelines: if you want somebody to review the basic framework, please ask felipe.

An example of an experiment which just sets prefs can be found at http://hg.mozilla.org/webtools/telemetry-experiment-server/file/bb8b48cf0418/experiments/flash-protectedmode-beta

You should do some testing to make sure that the prefs set by the experiment take effect properly. A lot of graphics prefs are only read at startup before experiments are launched, which kinda sucks. Are all these prefs "live"?
Flags: needinfo?(benjamin)
I was doing the "Before building an experiment" step at https://wiki.mozilla.org/Telemetry/Experiments#Data_Collection, but maybe that's not needed in this case. I'll just go ahead and build it.

And yes, all the prefs are live.
Ah ok, if this is all telemetry that already exists there's really nothing additional I need to review.
Attached patch Experiment (obsolete) — Splinter Review
I'm assuming that the XPI doesn't need signing since it's a nightly-only experiment - please correct me if I'm wrong. Here's the patch to the telemetry-experiment-server repo that adds the experiment.
Attachment #8723811 - Flags: review?(felipc)
Comment on attachment 8723811 [details] [diff] [review]
Experiment

Review of attachment 8723811 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I don't think you need to sign if this is gonna stay nightly-only.

::: experiments/displayport-tuning-nightly/code/bootstrap.js
@@ +68,5 @@
> +      Cu.reportError("Got error during bootstrap startup: " + e);
> +    });
> +}
> +
> +function shutdown() {

shutdown can receive a reason parameter, and you can condition this to 
reason != APP_SHUTDOWN

so that the clean-up only happens when the add-on is being disabled/uninstalled, but not during the normal shutdown process

@@ +74,5 @@
> +  defaultPrefs.reset("apz.x_skate_size_multiplier");
> +  defaultPrefs.reset("apz.y_skate_size_multiplier");
> +  defaultPrefs.reset("apz.x_stationary_size_multiplier");
> +  defaultPrefs.reset("apz.y_stationary_size_multiplier");
> +}

can you define empty install() and uninstall() functions? Otherwise the add-ons manager will spew warnings about them being undefined
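
For example, a minimal sketch of both suggestions (assuming the same defaultPrefs object used in the patch):

// Standard bootstrapped add-on entry points; APP_SHUTDOWN is provided by
// the bootstrap scope.
function shutdown(data, reason) {
  // Only undo the pref changes when the experiment is disabled or
  // uninstalled, not during normal application shutdown.
  if (reason == APP_SHUTDOWN) {
    return;
  }
  defaultPrefs.reset("apz.x_skate_size_multiplier");
  defaultPrefs.reset("apz.y_skate_size_multiplier");
  defaultPrefs.reset("apz.x_stationary_size_multiplier");
  defaultPrefs.reset("apz.y_stationary_size_multiplier");
}

// Empty stubs so the add-ons manager doesn't warn about them being undefined.
function install() {}
function uninstall() {}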

::: experiments/displayport-tuning-nightly/manifest.json
@@ +7,5 @@
> +  "manifest"    : {
> +    "id"               : "displayport-tuning-nightly@experiments.mozilla.org",
> +    "startTime"        : 1456365600,
> +    "endTime"          : 1458007200,
> +    "maxActiveSeconds" : 6048000,

70 days is too many. Please use a more realistic value (since this ends on Mar 14 anyway). maxActiveSeconds limits the maximum active time for users who are enrolled early on (i.e., so they can finish earlier than endTime).

@@ +12,5 @@
> +    "appName"          : ["Firefox"],
> +    "channel"          : ["nightly"],
> +    "minVersion"       : "47.0a1",
> +    "maxVersion"       : "48.*",
> +    "sample"           : 1.0

Figure out with bsmedberg which sample you want here. We have avoided 100% in the past
Attachment #8723811 - Flags: review?(felipc) → feedback+
(In reply to :Felipe Gomes (needinfo me!) from comment #7)
> 70 days is too many. please use a more realistic value (since this ends in
> Mar 14 anyway).  This can be used to limit a maximum time for users who are
> activated early on (i.e., to finish earlier than endTime).

Whoops, I meant to make it 7 days. Got an extra 0 in there. I'll update the rest as you suggested. I might also extend the end date by a week to give enough time for QA and deployment; I'm not sure how long that process usually takes.

> > +    "sample"           : 1.0
> 
> Figure out with bsmedberg which sample you want here. We have avoided 100%
> in the past

Given that we're splitting the sample population into 6 branches I'd like to start with as large a sample as I can get away with, to maximize the amount of data for each branch. (Note that one of the branches is the control group, so if you prefer I can use an 83% sample rate and just 5 branches.) Is there a reason to avoid 100% sampling, though? I saw a few experiments listed at https://wiki.mozilla.org/QA/Telemetry that did it.
ni? to bsmedberg for guidance on the sample rate (see last paragraphs of comment 7 and comment 8)
Flags: needinfo?(benjamin)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> I'm also in the process of running tp5o_scroll locally on OS X and Windows
> for each of these groups to see what kind of improvement it yields. I'm
> expecting more improvement on Windows than on OS X, because Windows doesn't
> have tiling enabled, so the displayport isn't getting rounded up beyond the
> specified multipliers.

tp5o_scroll results from my local runs:

            OS X            Windows
No e10s     4.5274102       8.8103926
            (baseline)      (baseline)

e10s        6.0060741       13.0563724
            (+32.66%)       (+48.19%)

Group 2     5.9652480       12.5385882
            (+31.76%)       (+42.32%)

Group 3     5.4796194       9.7264680
            (+21.03%)       (+10.40%)

Group 4     4.7459634       8.7287164
            (+4.82%)        (-0.93%)

Group 5     5.7147330       12.6219963
            (+26.22%)       (+43.26%)

Group 6     5.9374061       12.5258583
            (+31.14%)       (+42.17%)

Group 4 did the best, significantly better than all the other groups. Groups 2 and 6 barely moved the needle. These results are more or less expected since reducing the y_stationary multiplier produces the greatest impact. The telemetry experiment will give us a view on how much checkerboarding results from these as well, and then we can pick some multipliers that make sense.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> ni? also to :botond if he has any comments or improvements to suggest.

Sounds like a good idea to me!

One question: would it make sense to include a group that has the displayport size reduced in both dimensions at once?
Flags: needinfo?(botond)
release-drivers needs a clean (no experiments) beta population to validate releases, so we must never use more than 50% of the beta population. We're also running into the fact that we're splitting beta in half between e10s and non-e10s starting in 46 (probably). The couple of experiments that have run on a 100% sample were run on very restrictive populations (only the Polish locale, for example).

What is the size of the population you need to have a representative measurement? The goal is to have the sample be as small as will produce the statistical power you need. If you're not sure, please talk to rvitillo about the statistics.
Flags: needinfo?(benjamin)
Also, do you want to run this experiment on both e10s and non-e10s users?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> release-drivers needs a clean (no experiments) beta population to validate
> releases, so we must never use more than 50% of the beta population. We're
> also running into the fact that we're splitting beta in half e10s/non-e10s
> starting in 46 (probably). The couple of experiments that have run on a 100%
> sample were run on very restrictive populations (only Polish locale, for
> example).

This experiment is nightly-only, so the beta concerns shouldn't apply.

> Also, do you want to run this experiment on both e10s and non-e10s users?

e10s is the relevant population; the non-e10s users won't be generating the telemetry data I care about.
ah, ok. Please do 50% of nightly.

I'm concerned that nightly won't produce interesting or representative data, because the hardware is so skewed compared to the rest of our populations. So I will recommend that you repeat this experiment on aurora/beta in order to validate/tune the defaults for the largest population.

You may want to use an experiment filter to explicitly exclude people from the experiment who are not running e10s: you can write a filter on environment.settings.e10sEnabled
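
For illustration, the filter might look roughly like this (the "jsfilter" manifest field and the context shape are assumptions; check them against the experiment server docs):

// Assumed to be embedded as a string in the manifest's "jsfilter" field.
function filter(c) {
  let settings = c.telemetryEnvironment.settings;
  // Only enroll profiles that have e10s (and telemetry submission) enabled.
  return settings.e10sEnabled && settings.telemetryEnabled;
}
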
(In reply to Botond Ballo [:botond] from comment #11)
> One question: would it make sense to include a group that has the
> displayport size reduced in both dimensions at once?

I thought about that, but I figured that we could probably derive an approximation of that by combining the values from the individual dimensions. I also don't want to increase the number of groups too much. Perhaps when we run the experiment again in the future we can try combinations of both dimensions.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #15)
> ah, ok. Please do 50% of nightly.

Sounds good.

> I'm concerned that nightly won't produce interesting or representative data,
> because the hardware is so skewed compared to the rest of our populations.
> So I will recommend that you repeat this experiment on aurora/beta in order
> to validate/tune the defaults for the largest population.

Fair enough. For now I'm happy with a first-approximation on this data, and we can run further experiments in the future to tune it more.

> You may want to use an experiment filter to explicitly exclude people from
> the experiment who are not running e10s: you can write a filter on
> environment.settings.e10sEnabled

Will do. I'll also exclude people who don't have telemetry enabled since we won't get data from them either.
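
For reference, with the fixes above the relevant manifest fields would look something like this (7 days is 7 * 86400 = 604800 seconds; the other fields stay as in the patch):

"maxActiveSeconds" : 604800,
"sample"           : 0.5
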
When setting the initial branch, you might also want to first check for pre-existing non-default values and put users in that condition into a "not part of experiment" group.
Attached patch Experiment (v2) — Splinter Review
With all the comments addressed. I retested it in a fresh Nightly profile and it seems to be working as expected.
Attachment #8724184 - Flags: review?(felipc)
(In reply to :Felipe Gomes (needinfo me!) from comment #17)
> When setting the initial branch, you might also want to first check for
> pre-existing non-default values and put users in that condition into a
> "not part of experiment" group.

Oh, sorry. Didn't see this. Would I just create a new branch to indicate the "not part of experiment" group?
Actually in practice I think the number of people who have manually set these prefs is going to be ~0 so I'm not sure if it's worth worrying about.
Comment on attachment 8724184 [details] [diff] [review]
Experiment (v2)

Review of attachment 8724184 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> (In reply to :Felipe Gomes (needinfo me!) from comment #17)
> > When setting the initial branch, you might also want to first check for
> > pre-existing non-default values and put users in that condition into a
> > "not part of experiment" group.
> 
> Oh, sorry. Didn't see this. Would I just create a new branch to indicate the
> "not part of experiment" group?

Yeah. Just as a way to have them not skew your data. Up to you to figure out if that would be important or not
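
If it does turn out to be worth handling, a minimal sketch of the check (hypothetical helper for bootstrap.js; branch names are illustrative):

const { Services } = Components.utils.import(
  "resource://gre/modules/Services.jsm", {});

const EXPERIMENT_PREFS = [
  "apz.x_skate_size_multiplier",
  "apz.y_skate_size_multiplier",
  "apz.x_stationary_size_multiplier",
  "apz.y_stationary_size_multiplier",
];

// If the user has manually overridden any of the prefs the experiment
// touches, assign them to a branch that changes nothing so they don't
// skew the data.
function pickBranch(assignedBranch) {
  if (EXPERIMENT_PREFS.some(p => Services.prefs.prefHasUserValue(p))) {
    return "notInExperiment";
  }
  return assignedBranch;
}
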
Attachment #8724184 - Flags: review?(felipc) → review+
I wrote up a test plan at https://public.etherpad-mozilla.org/p/bug_1251052_experiment - it's pretty basic since there's not much that the add-on really does. ni? to RyanVM for QA testing on it as per IRC discussion.
Flags: needinfo?(ryanvm)
After reading a recent dev-platform thread I realized I didn't get the xpi signed which I probably need for installation on default-configuration Nightly builds. Looks like I should be able to sign it with jpm, will do that tomorrow or Monday morning and push the updated xpi to the repo.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #23)
> I wrote up a test plan at
> https://public.etherpad-mozilla.org/p/bug_1251052_experiment - it's pretty
> basic since there's not much that the add-on really does. ni? to RyanVM for
> QA testing on it as per IRC discussion.
Flags: needinfo?(ryanvm) → needinfo?(rares.bologa)
Attachment #8723811 - Attachment is obsolete: true
Attached file experiment.xpi
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #24)
> After reading a recent dev-platform thread I realized I didn't get the xpi
> signed which I probably need for installation on default-configuration
> Nightly builds. Looks like I should be able to sign it with jpm, will do
> that tomorrow or Monday morning and push the updated xpi to the repo.

Looks like I can't use the automation signing key to sign experiment xpi's (I get a "You cannot submit this type of add-on" error). On IRC :felipe said I should ask :jason to get it signed - Jason, could you sign this XPI for a telemetry experiment? Thanks!
Flags: needinfo?(jthomas)
Flags: needinfo?(rares.bologa) → needinfo?(mfunches)
Preliminary Detailed Test Cases for Checkerboard Experiment
https://docs.google.com/spreadsheets/d/1NADaSY5pqFc8EBYk2hOAzoh4-OuAjQK-XCID7mnukLk/edit#gid=280492042
a) will await Signed XPI
b) will I need to adjust the "experiments.manifest.uri" pref?
(In reply to Michelle Funches - QA from comment #27)
> Preliminary Detailed Test Cases for Checkerboard Experiment
> a) will await Signed XPI
> b) will I need to adjust the "experiments.manifest.uri" pref?

The experiments.manifest.uri that you should use is "https://telemetry-experiment-dev.allizom.org/firefox-manifest.json"

I also suggest creating a new String pref named "experiments.force-sample-value" with a value of "0.1"
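
In other words, something like this in the test profile's user.js (values taken from this comment; adjust as needed):

// Point the Experiments service at the dev manifest and force this profile
// into the sample.
user_pref("experiments.manifest.uri", "https://telemetry-experiment-dev.allizom.org/firefox-manifest.json");
user_pref("experiments.force-sample-value", "0.1");
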
Attached file signed experiment.xpi
Please see signed experiment.xpi
Flags: needinfo?(jthomas)
Thanks!

I landed the signed experiment.xpi into the repo at https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/db30cb1c7ff5 and verified it's updated at the telemetry-experiment-dev.allizom.org URL.
Testing complete; sign-off sent, wiki pages have been updated.
Flags: needinfo?(mfunches)
I sent an email to release-drivers requesting approval to ship the experiment.
In the next 7 days, the Nightly channel will transition from 47 -> 48, and I am told that this should not have an impact on the effectiveness of this telemetry experiment. With that, it's a GO!
Thanks! I updated the release_tag in the repo, and will file a bug for pushing it to production.

https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/b00d24d4799d
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
It's only been a few days of the experiment being live, but I ran an analysis on the telemetry data that's accumulated so far:

https://github.com/staktrace/telemetry-analysis/blob/b31ba0e8e21c2e3d11740da7abbeed255320167a/checkerboarding.ipynb

It looks like groups 4 and 6 have a significantly higher number of checkerboarding events. I need to slice and dice the data some more to see if we just ended up replacing a few large checkerboarding events with a lot of small ones, or something along those lines. Just eyeballing the graphs though it looks like groups 4 and 6 are worse across the board which is bad news :(
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #35)
> Just eyeballing the graphs
> though it looks like groups 4 and 6 are worse across the board which is bad
> news :(

Isn't it expected that reducing one or both of the vertical multipliers to 1.0 will cause more checkerboarding?
(In reply to Botond Ballo [:botond] from comment #36)
> Isn't it expected that reducing one or both of the vertical multipliers to
> 1.0 will cause more checkerboarding?

Not really - given that we can paint much faster with a smaller displayport, it wasn't clear whether this would be a net win or loss for checkerboarding. And even assuming it would cause more checkerboarding, it wasn't clear by how much.
Also, to be clear, the reason I said it's "bad news" is that it makes it much less likely that we can use "shrink the displayport" as a magic bullet for all of our talos woes.
I updated the ipynb with some more stuff. The new version is at https://github.com/staktrace/telemetry-analysis/blob/0c1af2aab4a268b21e296346d8568ed6da2eeeb8/checkerboarding.ipynb

Roberto, do you mind taking a quick look at this notebook and reviewing the bits where I'm pulling the data out of telemetry? Just as a sanity check to make sure I'm not doing anything stupid. Thanks!
Flags: needinfo?(rvitillo)
It looks generally good. To compute the sample sum you are using the lower bound of the buckets, which yields an approximation; the raw histogram representation has a "sum" field which is accurate, but to access it you would have to retrieve that data without "get_pings_properties", which as you know returns only a pandas series for a histogram, e.g.:

pings.map(lambda p: p["payload"]["histograms"].get("CHECKERBOARD_DURATION", {}).get("sum", 0)).sum()

That said, I wouldn't expect big surprises using the accurate sums.
Flags: needinfo?(rvitillo)
Updated the analysis now that the experiment has been running a while longer and there's more data. I also looked at some other relevant probes (composite/paint time probes) to see if there was a significant difference there. Nothing too unexpected there.

As far as I can tell the control group is the best of the different experiment groups in terms of checkerboarding. group2 is also decent but the sample sum of severity is still (very slightly) worse.

https://github.com/staktrace/telemetry-analysis/blob/5552e29600ea6603eb6368f4a3fba03e5ea1a022/checkerboarding.ipynb
What's interesting is that in the experiment, almost all the groups were worse than the control group. So you'd expect that over the entire population, the net effect of the experiment would be to make things worse. However, looking at the telemetry evolution dashboard I see a marked improvement over the time that the experiment was running [1] (starting March 4th/5th, going for a week from there). I'm not sure what's going on here...

[1] http://mzl.la/1VdeVnF