Run a Telemetry experiment to vet SKIA content on Windows

RESOLVED FIXED

Status

()

Core
Graphics
RESOLVED FIXED
8 months ago
7 months ago

People

(Reporter: ashughes, Assigned: ashughes)

Tracking

52 Branch
Unspecified
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 affected)

Details

(Whiteboard: [gfx-noted])

Attachments

(3 attachments, 2 obsolete attachments)

This bug is to be used to track running a Telemetry experiment to A/B test SKIA content on Windows (bug 1007702).  The goal of this experiment is to test whether SKIA introduces regressions for Windows Nightly users who do not have hardware acceleration for whatever reason.

Proposed Criteria:
>      Product: Firefox Nightly
>      Version: 51.0a1
>           OS: Windows (all)
>      Channel: nightly
>     Build ID: 2016-08-26+
>     Language: en-US
> Sample ratio: 50%

We would like this experiment to run for one week starting this weekend (if possible) and go out to a random half of all Nightly users on Windows.

A few questions came up in conversation with Milan:
1) Are crash reports automatically annotated if the user is part of an experiment cohort? If not, can we annotate them?
2) Is it possible to disqualify users from the experiment if they have ClearType enabled?
3) Is it possible to vet if there is a bias in the cohort? ie. We want to be sure that the cohort is of similar OS/Vendor proportions to that of our existing Nightly population.

I have cc'd Milan and Mason on this bug to provide more details as required.
Felipe, please let us know what other information you need. Thanks.
Flags: needinfo?(felipc)
It'd be good to have an answer to #2, but we will probably have bug 1298484 fixed in time to not need the answer in the context of this test.
Depends on: 1298484
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #0)
> 1) Are crash reports automatically annotated if the user is part of an
> experiment cohort? If not, can we annotate them?

Yeah, they are automatically annotated

> 2) Is it possible to disqualify users from the experiment if they have
> ClearType enabled?

Is it possible to tell if they have ClearType enabled from privileged JS? If yes, then yes

> 3) Is it possible to vet if there is a bias in the cohort? ie. We want to be
> sure that the cohort is of similar OS/Vendor proportions to that of our
> existing Nightly population.

I didn't understand this point very well, but if we our sampling of users is at random, the proportions will be the same.


I'm not sure if there's someone lined up to do this work or if the intention was for me to take it, but I really can't take it for this week. Creating an experiment add-on like this one is not complicated and I can guide someone willing to do so.
Flags: needinfo?(felipc)

Comment 4

8 months ago
Can we hold off on this for a bit? We have enough skia windows problems on nightly that we can fix those issues before doing another telemetry experiment.

Updated

8 months ago
Depends on: 1299903
(In reply to Mason Chang [:mchang] from comment #4)
> Can we hold off on this for a bit? We have enough skia windows problems on
> nightly that we can fix those issues before doing another telemetry
> experiment.

I'm okay with that. To clarify does this mean we'd target a Telemetry experiment at Aurora 51, assuming the issues in Nightly 51 get fixed in time for migration?

Comment 6

8 months ago
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #5)
> (In reply to Mason Chang [:mchang] from comment #4)
> > Can we hold off on this for a bit? We have enough skia windows problems on
> > nightly that we can fix those issues before doing another telemetry
> > experiment.
> 
> I'm okay with that. To clarify does this mean we'd target a Telemetry
> experiment at Aurora 51, assuming the issues in Nightly 51 get fixed in time
> for migration?

No, I'm thinking maybe nightly 52 and that this won't ride the trains this release.
(In reply to Mason Chang [:mchang] from comment #6)
> No, I'm thinking maybe nightly 52 and that this won't ride the trains this release.

Following this up since we just had a meeting, here is what we want to do going forward:

Run an experiment on 50% of unaccelerated Nightly 52 users after it merges next week. By default all unaccelerated Nightly users should fall back to SKIA content on Windows (ie. they should have "gfx.content.azure.backends" set to "direct2d1.1, skia". For 50% of these users we'd like to change this pref to "direct2d1.1, cairo". We would then watch crash stats and our usual feedback channels to see if any bugs come in.

Before we move forward with this we need the following:
1) Determine how many users on Nightly run unaccelerated to ensure a large enough sample size. (Anthony)
2) Land a Telemetry probe so we know which backend users fall back to when D2D fails. (Mason)
3) Determine the current crash rate before/after SKIA landed on August 26th. (Anthony)
4) Determine if we can run a Telemetry experiment by simply changing a string pref. (Anthony)

Mason, please add anything you think I forgot.
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #7)
> 4) Determine if we can run a Telemetry experiment by simply changing a string pref. (Anthony)

Felipe, can you answer if this is possible? Basically, we already have a pref to turn on/off SKIA for unaccelerated users (gfx.content.azure.backends). Is it possible to opt 50% in to the experiment simply by changing this pref?
Flags: needinfo?(felipc)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #7)
> 1) Determine how many users on Nightly run unaccelerated to ensure a large enough sample size.

Based on Telemetry data analysis provided by David Anderson, it looks like ~11% of Nightly users are using Software rendering which roughly translates to 3000 users based on current ADIs. If we did a 50-50 split that would give us roughly 1500 users in the experimental cohort and 1500 users in the control cohort.
The experiments are add-ons, and it should be fairly simple to have one that just changes this pref.  (Specially good if this pref can be changed on the fly without requiring a restart to take effect)

The important thing to keep in mind is this:

The experiment system allow you to specify the % of users that will receive the experiment.

If you will analyze the results through your own telemetry probes, you could just ship the experiment to the 50% of users who will get the pref change. But in this case only these users will be marked as "part of the experiment" in crash-stats, for example.

Otherwise, you can ship it to 100% of users on Nightly, and then your own code will choose with 50%/50% whether to choose to keep skia or not, and then mark them as two separate cohorts.

In other words:

If you think that your second point "2) Land a Telemetry probe so we know which backend users fall back to when D2D fails" is enough to analyze the data (without looking at the activeExperiment and activeExperimentBranch info), you can go with the 50% exposure. Otherwise go with 100%.

This might be a bit confusing.. Let me know if it's not clear and I can provide more info or talk on vidyo.
Flags: needinfo?(felipc)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #7)
> 3) Determine the current crash rate before/after SKIA landed on August 26th.

I've come up with a prototype to measure crash rates:
1. http://ashughes1.github.io/metrics-graphics-gfx/#v2-crashes-nightly
2. http://ashughes1.github.io/metrics-graphics-gfx/#v2-crashes-nightly-unaccel

[1] shows the crash volume and rate for Nightly 51.0a1 as a whole by build date.
[2] shows the crash volume and rate for Nightly 51.0a1 users with "Layers-" in AppNotes by build date.

Mason, can you take a look and tell me if these seem correct to you?
Flags: needinfo?(mchang)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #11)
> (In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #7)
> > 3) Determine the current crash rate before/after SKIA landed on August 26th.
> 
> I've come up with a prototype to measure crash rates:
> 1. http://ashughes1.github.io/metrics-graphics-gfx/#v2-crashes-nightly
> 2.
> http://ashughes1.github.io/metrics-graphics-gfx/#v2-crashes-nightly-unaccel
> 
> [1] shows the crash volume and rate for Nightly 51.0a1 as a whole by build
> date.
> [2] shows the crash volume and rate for Nightly 51.0a1 users with "Layers-"
> in AppNotes by build date.
> 
> Mason, can you take a look and tell me if these seem correct to you?

I'm not entirely sure what Layers- does exactly. Why not wait for bug 1302240 to land and just look for that telemetry probe instead?
Flags: needinfo?(mchang) → needinfo?(anthony.s.hughes)
(In reply to Mason Chang [:mchang] from comment #12)
> I'm not entirely sure what Layers- does exactly. Why not wait for bug
> 1302240 to land and just look for that telemetry probe instead?

My understanding was that "Layers-" indicates a crash where a user session failed to get D3D (not that that's the reason of the crash). Will the Telemetry probe show up in crash-stats.mozilla.org reports in a queryable way? If not, how do I correlate the probe to crash data?
Flags: needinfo?(anthony.s.hughes)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #13)
> (In reply to Mason Chang [:mchang] from comment #12)
> > I'm not entirely sure what Layers- does exactly. Why not wait for bug
> > 1302240 to land and just look for that telemetry probe instead?
> 
> My understanding was that "Layers-" indicates a crash where a user session
> failed to get D3D (not that that's the reason of the crash). Will the
> Telemetry probe show up in crash-stats.mozilla.org reports in a queryable
> way? If not, how do I correlate the probe to crash data?

I don't know, I'm not familiar with querying things in telemetry. Do all telemetry probes show up in crash-stats already or what makes it explicit so that it is?
(In reply to Mason Chang [:mchang] from comment #14)
> (In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #13)
> > (In reply to Mason Chang [:mchang] from comment #12)
> > > I'm not entirely sure what Layers- does exactly. Why not wait for bug
> > > 1302240 to land and just look for that telemetry probe instead?
> > 
> > My understanding was that "Layers-" indicates a crash where a user session
> > failed to get D3D (not that that's the reason of the crash). Will the
> > Telemetry probe show up in crash-stats.mozilla.org reports in a queryable
> > way? If not, how do I correlate the probe to crash data?
> 
> I don't know, I'm not familiar with querying things in telemetry. Do all
> telemetry probes show up in crash-stats already or what makes it explicit so
> that it is?

Felipe, can you help enlighten us here?
Flags: needinfo?(felipc)
I don't know too much about querying telemetry and crash stats. Perhaps Chris can answer this for you?
Flags: needinfo?(felipc) → needinfo?(chutten)

Comment 17

8 months ago
crash stats and telemetry have nothing to do with each other at present. Telemetry does have its own crash information ("crash" pings for parent processes, SUBPROCESS_ABNORMAL_ABORT for child processes), but its crash information is far less complete than crash-stats (no stacks, no signature).

I would recommend looking at crash rates per experiment cohort. If the crash rates of all or just certain signatures are wildly different between cohorts, that would be a result.
Flags: needinfo?(chutten)
Thanks Felipe and Chris, so how do we proceed with getting this kicked off? Does an add-on need to be written and if so is there a recipe we can follow?

Thanks
Flags: needinfo?(felipc)
Yep, you need to write an add-on against this repository: http://hg.mozilla.org/webtools/telemetry-experiment-server/

Take a look at one example commit here: http://hg.mozilla.org/webtools/telemetry-experiment-server/rev/844bf183109a

Basically you need:
 - create a new folder
 - set up the manifest.json, bootstrap.js and install.rdf files following the examples

After you have your basics done, you can build the telemetry server locally and it will serve a page that looks like this: https://telemetry-experiment.cdn.mozilla.net/  (useful to check the parameters of your own experiment)

The process is documented in details here: https://wiki.mozilla.org/QA/Telemetry/Developing_a_Telemetry_Experiment

I suggest targeting 100% of Windows users and do the 50% split in your own code (similar to how Kats did in that example commit). That way it will be easier to do the crash-stats analysis afterwards
Flags: needinfo?(felipc)
Tip: in this section of the wiki there's a link to a Makefile that you can drop in your folder and will help will build the experiment xpi: https://wiki.mozilla.org/QA/Telemetry/Developing_a_Telemetry_Experiment#Building_the_experiment
Thanks Felipe!

Mason, are you able to take care of the Telemetry add-on based on comment 19 and 20?
Flags: needinfo?(mchang)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #21)
> Thanks Felipe!
> 
> Mason, are you able to take care of the Telemetry add-on based on comment 19
> and 20?

I won't have time for another 2-3 weeks at least. Do you have time?
Flags: needinfo?(mchang) → needinfo?(anthony.s.hughes)
Otherwise if it's easier, we can just try to put the information in crash-stats. Or if the Layers- thing works, then let's just do that.
I think I got this working. I'll have to update the manifest with the dates we want before I submit my patch. I am thinking Wednesday September 28 through Tuesday October 4 as this gives users a week to get on Nightly 52 and start giving us baseline stability numbers before kicking off the experiment.

Milan, what do you think?
Flags: needinfo?(anthony.s.hughes) → needinfo?(milan)
I suggest at least two weeks so that you can accumulate more data! If early on you detect a problem it's easy to stop the experiment earlier.
We can start it even earlier, say Sep 21, and go for 10-14 days as Felipe recommends, watch it, see what happens.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #26)
> We can start it even earlier, say Sep 21, and go for 10-14 days as Felipe
> recommends, watch it, see what happens.

Follow up question -- do we want this to just run on 52.0a1 builds or do we want to also allow users on 51.0a1 builds that have not updated yet to be included?
Flags: needinfo?(milan)
Experiment on 52 only.
Flags: needinfo?(milan)
Okay, I think I'm ready now. The instructions state to file a bug against FHR to get a review. Should I go ahead and do that or can I just submit my patch for review in *this* bug?
Flags: needinfo?(felipc)
On this bug is fine. Before shipping you'll need to send an Intent to Ship e-mail to release-drivers too. Once the patch is committed, it is automatically pushed to a staging server and we can perform some more tests there. And once it's all good you file a separate bug asking to push it to production
Flags: needinfo?(felipc)
Created attachment 8791706 [details] [diff] [review]
skia-windows-nightly.patch

Initial patch to run a Telemetry experiment A/B testing SKIA for software rendering content on Windows. By default Nightly 52 will have SKIA enabled -- this experiment will flip a pref for 50% of users to revert to Cairo instead for two weeks beginning on September 21. We'll monitor stability for each cohort and watch for reports of rendering bugs in Nightly during this period.
Assignee: nobody → anthony.s.hughes
Status: NEW → ASSIGNED
Attachment #8791706 - Flags: review?(felipc)
Comment on attachment 8791706 [details] [diff] [review]
skia-windows-nightly.patch

Review of attachment 8791706 [details] [diff] [review]:
-----------------------------------------------------------------

::: experiments/skia-windows-nightly/code/bootstrap.js
@@ +27,5 @@
> +      }
> +      yield Experiments.instance().setExperimentBranch(kSELF_ID, branch);
> +    }
> +
> +    let defaultPrefs = new Preferences({defaultBranch: true});

The "defaultBranch" is the version of prefs that don't survive a restart. So while it may seem that this is working by looking in about:config and seeing this pref configured correctly every time, this startup() function might be running too late for it to take effect.

I suggest doing the following:

- Set the pref (using `let prefs = new Preferences();`) right next to where you set "group1". Btw this is a free-form string. Since you only have one group, you could change it to "test", or "cairo", etc. to make it nicer for you.

- where you have (if branch == null), have an `else`, where you check if the value of the pref is what you expected. (for "test" check if it's "direct2d1.1,cairo", for control "direct2d1.1,skia"). If not, put the user in another branch called "user-changed" and don't do anything else anymore with this user.

That way you can filter out users who have manually changed the value of the pref during the experiment.

@@ +49,5 @@
> +function shutdown(reason) {
> +  if (reason != APP_SHUTDOWN) {
> +    // when the add-on is being disabled/uninstalled
> +    let defaultPrefs = new Preferences({defaultBranch: true});
> +    defaultPrefs.reset("gfx.content.azure.backends");

remove the {defaultBranch: true} part and also copy this into the uninstall function   (without the surrounding `if`).  When the experiment ends due to the time expiring (i.e. if it was not ended early), only the uninstall() function will run

::: experiments/skia-windows-nightly/filter.js
@@ +1,3 @@
> +function filter(c) {
> +  return c.telemetryEnvironment.settings.telemetryEnabled &&
> +         c.telemetryEnvironment.settings.e10sEnabled;

do you need to only target users with e10sEnabled? If not, you can remove this entire file to make it not have any filters. I think this one isn't necessary: even if a user doesn't have telemetry enabled, you'll still get value out of them from looking at crash stats.

::: experiments/skia-windows-nightly/manifest.json
@@ +13,5 @@
> +    "channel"          : ["nightly"],
> +    "minVersion"       : "52.0a1",
> +    "maxVersion"       : "52.0a1",
> +    "os"               : ["WINNT"],
> +    "sample"           : 0.5

This means that the experiment will only be shipped to 50% of the Nightly users. Then in your code you divide those users into two groups, meaning that you will actually test only 25% of users.

You can just remove this line to have sample = 100%. For Nightly that's fine.
Attachment #8791706 - Flags: review?(felipc) → feedback+
Created attachment 8791777 [details] [diff] [review]
skia-windows-nightly.patch

Revisions made based on previous feedback.
Attachment #8791706 - Attachment is obsolete: true
Attachment #8791777 - Flags: review?(felipc)
Comment on attachment 8791777 [details] [diff] [review]
skia-windows-nightly.patch

Review of attachment 8791777 [details] [diff] [review]:
-----------------------------------------------------------------

Close but not quite the ideal. See comments below

::: experiments/skia-windows-nightly/code/bootstrap.js
@@ +5,5 @@
> +Cu.import("resource://gre/modules/Preferences.jsm");
> +
> +var gStarted = false;
> +
> +const kSELF_ID = "skia-windows-nightly@experiments.mozilla.org";

tip: you can do a
`const prefs = new Preferences();`
here so that you don't need to do it at various different places

@@ +22,5 @@
> +      let r = Math.random() * 2;
> +      if (r < 1) {
> +        branch = "control";
> +      } else {
> +        branch = "cairo";

set the pref here so that it's only set once (when there was no branch assigned yet) and not every startup

@@ +25,5 @@
> +      } else {
> +        branch = "cairo";
> +      }
> +      yield Experiments.instance().setExperimentBranch(kSELF_ID, branch);
> +    } else {

here the idea is this: if branch is not null, we have already selected either test or control for this user. so what you should do is:

let correctBranch = false;

if (branch == "cairo") {
  correctBranch = (pref == "direct2d1.1,cairo");
} else if (branch == "control") {
  correctBranch = (pref == "direct2d1.1,skia");
}

if (!correctBranch) {
  branch = "user-changed";
  yield setExperimentBranch ...  again
}

@@ +34,5 @@
> +        branch = "user-changed";
> +      }
> +    }
> +
> +    switch (branch) {

then this entire switch can go away
Attachment #8791777 - Flags: review?(felipc)
Created attachment 8792059 [details] [diff] [review]
skia-windows-nightly.patch

Implement revisions suggested in comment 34.
Attachment #8791777 - Attachment is obsolete: true
Attachment #8792059 - Flags: review?(felipc)
Comment on attachment 8792059 [details] [diff] [review]
skia-windows-nightly.patch

Review of attachment 8792059 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great now, thanks! Just take a look at the nit below. Don't commit this yet to the repo. After you re-test this final code, please post the unsigned XPI here and ask :jason to sign it. Then you replace it in your folder with the signed one and push your commit to hg.m.o/webtools/telemetry-experiment-server.

::: experiments/skia-windows-nightly/code/bootstrap.js
@@ +30,5 @@
> +      yield Experiments.instance().setExperimentBranch(kSELF_ID, branch);
> +    } else {
> +      // Exclude the user if they've manually changed the pref
> +      let pref = prefs.get("gfx.content.azure.backends");
> +      let isBranchValid = false;

I just noticed that in the case of the "user-changed" branch, the  `if (!isBranchValid)` check below will run again doing unnecessary work. So the simple solution is to just change this initial value to true.
Attachment #8792059 - Flags: review?(felipc) → review+
Created attachment 8792070 [details]
experiment.xpi

Jason, can you please sign my .xpi as per comment 36?
Attachment #8792070 - Flags: review?(jthomas)

Updated

7 months ago
Attachment #8792070 - Flags: review?(jthomas) → review+
Created attachment 8792612 [details]
experiment.xpi signed

Please see attached.
https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/51653907b88b
Status: ASSIGNED → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → FIXED
As an FYI, I've just sent an email to drivers to request approval to ship this experiment. Milan, for this experiment to work we'll need to re-enable SKIA by default on the Nightly channel. Has that been done yet?
Flags: needinfo?(milan)
status-firefox51: affected → ---
status-firefox52: --- → affected
Version: 51 Branch → 52 Branch
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #40)
> As an FYI, I've just sent an email to drivers to request approval to ship
> this experiment. Milan, for this experiment to work we'll need to re-enable
> SKIA by default on the Nightly channel. Has that been done yet?

Milan, can you also add a comment in this bug identifying the Telemetry probe Mason added to track the backend in use?
ContentBackend (see bug 1302240).  Working on the Skia as the default.
Flags: needinfo?(milan)
I pushed the endDate forward 1 week, since the experiment hasn't started yet:
https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/b65f23017c42

And tagged the tip of the tree with release_tag:
https://hg.mozilla.org/webtools/telemetry-experiment-server/rev/82bcba19e75d

Now you just need to file a bug like this one asking for it to be deployed to the cdn: bug 1247822
Flags: needinfo?(anthony.s.hughes)
Blocks: 1305586
(In reply to :Felipe Gomes (needinfo me!) from comment #43)
> Now you just need to file a bug like this one asking for it to be deployed

Done. See bug 1305586.
Flags: needinfo?(anthony.s.hughes)
(In reply to Anthony Hughes (:ashughes) [GFX][QA][Mentor] from comment #44)
> Done. See bug 1305586.

This is now deployed and tested. Now we wait...
Depends on: 1305898
You need to log in before you can comment on or make changes to this bug.