Firefox 46 beta/release split the audience and offer e10s to a configurable subset

RESOLVED FIXED in Firefox 46

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: benjamin, Assigned: Felipe)

Tracking

unspecified
mozilla47
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10sm8+, firefox46+ fixed, firefox47+ fixed)

Details

(Whiteboard: btpp-active)

User Story

Starting in Firefox 47, we want to be able to configurably split our beta and release populations:

* N% of the population will stay in a no-e10s configuration
* the other segment will have e10s turned on according to the current release rules (currently, only with addons disabled, no a11y, and no-RTL)

I believe we should not do this with an experiment, but should instead do this with a system addon.

The addon will be hard-configured with a set of mappings, for example:

beta channel: 50% e10s
release channel: 20% e10s

The system addon should, at first launch, generate and store a random number. If the random number is greater than the cutoff, at every launch the system addon should enable e10s if appropriate.

The e10s status should be reflected into the telemetry ping environment in a manner that let's us answer the following questions:

* is the addon working correctly to enable/disable e10s
* correlate e10s enabled against other metrics

NOTE: prefs should not be set by this addon to configure e10s on or off: the decision should be programmatic on each startup. We don't want permanent pref values to leak into other channels.

In order to satisfy the no-pref requirement, it may be necessary to make code changes on the trains to programmatically enable e10s instead of just reading prefs.

We will need to verify that system addons consistently start early enough that they can enable e10s, but I'm fairly confident that this is the case.

Attachments

(3 attachments)

Reporter

Description

3 years ago
See the user story for how we should run a 50/50 split in beta and potentially roll e10s out to a subset of release in 46.

System addons are a good solution for this because it allows us to rapidly turn off e10s if issues are discovered, or change the release ratio gradually (e.g. start with 5% and ramp to 50 or even 100% over the course of a few weeks).

I'd like feedback from Felipe and Brad that this is reasonable, and setting a NEEDINFO on Sylvestre and Lawrence to keep you informed.
Flags: needinfo?(sledru)
Flags: needinfo?(felipc)
Flags: needinfo?(blassey.bugs)
This sounds good to me. One change I'd make is to set the rate to 10% in release to start. I assume we'd adjust the rate by updating the add-on.
Flags: needinfo?(blassey.bugs)
OK, thanks.
Flags: needinfo?(sledru)
Sounds good to me too. Two things I'd like to mention:

- make it simple to opt-in in a way that is understandable and easy to track. There will always be ways for people to force e10s on, and I want to discourage people from having to change a combination of prefs and leaving things in a bad or unrecognizable state that we can't track.

- I just don't see why the system add-on shouldn't use prefs to do it. The e10s decision code runs early on startup and the add-on could easily manage a pref to keep the startup path simple. It's also easy to track as we have been managing prefs successfully throughout the e10s development.
Flags: needinfo?(felipc)
Reporter

Comment 4

3 years ago
I'm skeptical about letting people opt in or opt out, but as long as it's completely separate from the random sampling I won't make a big deal of it.

Just as with experiments, you really shouldn't be making persistent state changes within a system addon. On upgrade (or experiment termination), the addon may be removed without cleaning up the persistent prefs.

You can set default prefs at runtime ( getDefaultBranch().setBoolPref() ) to achieve the same goal with no persistence.
Reporter

Updated

3 years ago
Assignee: nobody → felipc

Comment 5

3 years ago
Having an identifiably control group of equal size and population is helpful in comparing signatures.

Release channel starting with same split as finishing beta is good for assessing crash rates. Option may be to dial down beta rate when close to release.
Whiteboard: btpp-active
We haven't actually tested our system addon issues very well yet. They are about to test with Hello in 46 beta 1. So I have some concerns depending on a not-yet-released mechanism. I wouldn't want that to complicate or block release (or release of the feature)

10% sounds fine for beta if we do this, but for release we may also want to rollout a bit more conservatively to check for startup crashes, for a couple of days.
Reporter

Updated

3 years ago
Blocks: 1247497
Comment on attachment 8723662 [details]
MozReview Request: Bug 1249845 - Set up structure for e10srollout system add-on. r=Standard8,glandium

https://reviewboard.mozilla.org/r/36657/#review33327
Attachment #8723662 - Flags: review?(mh+mozilla) → review+
Attachment #8723662 - Flags: review?(standard8) → review+
Comment on attachment 8723662 [details]
MozReview Request: Bug 1249845 - Set up structure for e10srollout system add-on. r=Standard8,glandium

https://reviewboard.mozilla.org/r/36657/#review33655

The structure looks fine here, so r=Standard8, but obviously you'll want to add the bootstrap.js or do that in another patch before shipping this.

::: browser/extensions/e10srollout/moz.build:8
(Diff revision 1)
>    'bootstrap.js'

You'll obviously need to create the bootstrap.js file before landing. Its unclear where you're planning on doing that.
OK.  So as I understand it so far, this staged rollout plan will happen both on beta and on release, only if the criteria are met from here: https://wiki.mozilla.org/Electrolysis/Release_Criteria.

Do we plan to do a staged rollout to some subset of beta users for 46?  What would we consider to be blocking that staged rollout to beta?   I'd like to know that since 46 is going to release on the beta channel next week.

Comment 11

3 years ago
My outsider view is this bug is the block for beta 46 having some e10s.
https://wiki.mozilla.org/E10s/Status/m8 (maybe outdated) are other bugs that ideally get fixed for start of beta 46. The criteria for release then is assessed analysing the beta population.

45 beta started with about 20% e10s (with-addons) then switched to about 10% e10s (without-addons).
This system addon set to a 50/50 split would give about 20% e10s (without-addons). To me seems good as similar to 45 start, hopefully fewer bugs too.

Maybe a v0.0 system addon is best added quick that does nothing/little rather than fully complete?
Comment on attachment 8723662 [details]
MozReview Request: Bug 1249845 - Set up structure for e10srollout system add-on. r=Standard8,glandium

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36657/diff/1-2/
This code decides on a per-branch basis the proportion of users to be part of the e10s rollout, and sets the pref browser.tabs.remote.autostart.2 to true for users part of the test group. It also sets a name for the assigned group to be stored in the telemetry environment.

The code attempts to not store any prefs in the user's profile (besides the initial random() value), in order to avoid any state being stored or requiring clean-up later.

Review commit: https://reviewboard.mozilla.org/r/37599/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/37599/
Attachment #8725714 - Flags: review?(dtownsend)
FWIW, final list of possible cohorts. I made it very detailed so we can understand how rollout is working. test/control remains the statistically significant groups.

- "unsupportedChannel"
  for any channel other than beta/release

- "pastStartup"
  in case the system add-on code ran too late in the startup process and wasn't able to configure e10s properly

- "optedIn"
  users who opted-in through the opt-in or force-enable pref

- "optedOut"
  users who opted-out through the force-disable pref

- "test"
  random() < threshold for this channel

- "control"
  random() > threshold for this channel

- "unknown"
  in case something goes wrong and the add-on code didn't run at all. shouldn't happen in the wild
Attachment #8725714 - Flags: review?(dtownsend)
Comment on attachment 8725714 [details]
MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged rollout. r=Mossop

https://reviewboard.mozilla.org/r/37599/#review34163

I'm more dubious than Benjamin that this code runs early enough. When I last asked when BrowserTabsRemoteAutostart is called on startup I was shown a stack trace that implicated a restartless add-on's startup code, possibly one of the other system add-ons. We don't start restartless add-ons in any defined order, chances are it is stable for a given configuration but I couldn't guarantee more than that. Have we tested this in a live build with a few other restartless add-ons installed that might do things on startup, like adblock?

::: browser/extensions/e10srollout/bootstrap.js:76
(Diff revision 1)
> +    let existingVal = Preferences.get(PREF_COHORT_SAMPLE, 100);

You might as well skip the isSet check and instead just pass undefined as a default.

::: toolkit/xre/nsAppRunner.cpp:4722
(Diff revision 1)
> +  if (Preferences::GetType(kE10sTrialPref) == nsIPrefBranch::PREF_INVALID) {

It's not clear to me why we care about this pref value. Don't we only care that we've already passed through this function?
(In reply to Dave Townsend [:mossop] from comment #16)
> Comment on attachment 8725714 [details]
> MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged
> rollout. r=Mossop
> 
> https://reviewboard.mozilla.org/r/37599/#review34163
> 
> I'm more dubious than Benjamin that this code runs early enough. When I last
> asked when BrowserTabsRemoteAutostart is called on startup I was shown a
> stack trace that implicated a restartless add-on's startup code, possibly
> one of the other system add-ons. We don't start restartless add-ons in any
> defined order, chances are it is stable for a given configuration but I
> couldn't guarantee more than that. Have we tested this in a live build with
> a few other restartless add-ons installed that might do things on startup,
> like adblock?

I'm running some tests with more add-ons now to check. FWIW I had checked the existing system add-ons code and none in the tree checks for Services.appinfo.browserTabsRemoteAutostart. But yeah any other add-on out there that checks that getter can trigger this if it loads first. There doesn't appear to be many on mxr that does that, so we'll see how that goes.

> 
> ::: browser/extensions/e10srollout/bootstrap.js:76
> (Diff revision 1)
> > +    let existingVal = Preferences.get(PREF_COHORT_SAMPLE, 100);
> 
> You might as well skip the isSet check and instead just pass undefined as a
> default.
> 
> ::: toolkit/xre/nsAppRunner.cpp:4722
> (Diff revision 1)
> > +  if (Preferences::GetType(kE10sTrialPref) == nsIPrefBranch::PREF_INVALID) {
> 
> It's not clear to me why we care about this pref value. Don't we only care
> that we've already passed through this function?

It's just the way I found to see if we've passed through this function before the startup() from our add-on ran.
(In reply to :Felipe Gomes (needinfo me!) from comment #17)
> (In reply to Dave Townsend [:mossop] from comment #16)
> > Comment on attachment 8725714 [details]
> > MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged
> > rollout. r=Mossop
> > 
> > https://reviewboard.mozilla.org/r/37599/#review34163
> > 
> > I'm more dubious than Benjamin that this code runs early enough. When I last
> > asked when BrowserTabsRemoteAutostart is called on startup I was shown a
> > stack trace that implicated a restartless add-on's startup code, possibly
> > one of the other system add-ons. We don't start restartless add-ons in any
> > defined order, chances are it is stable for a given configuration but I
> > couldn't guarantee more than that. Have we tested this in a live build with
> > a few other restartless add-ons installed that might do things on startup,
> > like adblock?
> 
> I'm running some tests with more add-ons now to check. FWIW I had checked
> the existing system add-ons code and none in the tree checks for
> Services.appinfo.browserTabsRemoteAutostart. But yeah any other add-on out
> there that checks that getter can trigger this if it loads first. There
> doesn't appear to be many on mxr that does that, so we'll see how that goes.

As I recall the stack I saw went through CustomizableUI or something before reaching the check, it doesn't have to be direct calls.

> > ::: browser/extensions/e10srollout/bootstrap.js:76
> > (Diff revision 1)
> > > +    let existingVal = Preferences.get(PREF_COHORT_SAMPLE, 100);
> > 
> > You might as well skip the isSet check and instead just pass undefined as a
> > default.
> > 
> > ::: toolkit/xre/nsAppRunner.cpp:4722
> > (Diff revision 1)
> > > +  if (Preferences::GetType(kE10sTrialPref) == nsIPrefBranch::PREF_INVALID) {
> > 
> > It's not clear to me why we care about this pref value. Don't we only care
> > that we've already passed through this function?
> 
> It's just the way I found to see if we've passed through this function
> before the startup() from our add-on ran.

Sure, but you can just set the pref to indicate that regardless. If it gets set after startup() is called then it doesn't make any difference
(In reply to Dave Townsend [:mossop] from comment #18)
> Sure, but you can just set the pref to indicate that regardless. If it gets
> set after startup() is called then it doesn't make any difference

Yeah, it was not to avoid setting the pref itself, but as a way to be able to mark the "pastStartup" cohort properly.

In any case, that will have to go anyway as I did some further testing and this situation occurs often, so we'll need a different solution.

I think it would be fine to just set the normal branch of the pref to true and clean it up in a later release, but I'll try to see if I can get something else to work before.
Comment on attachment 8725713 [details]
MozReview Request: Bug 1249845 - Store the e10s rollout cohort in the telemetry environment. r=gfritzsche

https://reviewboard.mozilla.org/r/37597/#review34199

This looks good.
You also need to request data collection review on this:
https://wiki.mozilla.org/Firefox/Data_Collection

::: toolkit/components/telemetry/docs/environment.rst:46
(Diff revision 1)
> +        e10sCohort: <string>, // which e10s cohort was assigned for this user

Please open an issue (or a PR) to get our "main" ping schema updated:
https://github.com/mozilla-services/mozilla-pipeline-schemas/
Attachment #8725713 - Flags: review?(gfritzsche) → review+
Attachment #8725714 - Flags: review?(dtownsend)
Comment on attachment 8725714 [details]
MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged rollout. r=Mossop

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/37599/diff/1-2/
Comment on attachment 8725714 [details]
MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged rollout. r=Mossop

https://reviewboard.mozilla.org/r/37599/#review34253

Looks fine assuming browser.tabs.remote.autostart.2 doesn't have a default value on the branch we use this on. Is it worth considering a cohort for those that opt out after the test began? At the moment they are lumped in with those that had already lumped out already. Maybe that latter group is so small that it doesn't matter.
Attachment #8725714 - Flags: review?(dtownsend) → review+
(In reply to Dave Townsend [:mossop] from comment #22)
> Comment on attachment 8725714 [details]
> MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged
> rollout. r=Mossop
> 
> https://reviewboard.mozilla.org/r/37599/#review34253
> 
> Looks fine assuming browser.tabs.remote.autostart.2 doesn't have a default
> value on the branch we use this on. Is it worth considering a cohort for
> those that opt out after the test began? At the moment they are lumped in
> with those that had already lumped out already. Maybe that latter group is
> so small that it doesn't matter.

Yep, that pref will remain undefined on the branches where this is used.

About existing opt-out users, that's probably minimal on beta/release since this never existed there outside of the experiment, so I believe they don't need special handling.
Reporter

Updated

3 years ago
User Story: (updated)
Hi Felipe, if a user X opts into e10s manually, and we rollout an update via this system add-on to turn e10s off completely, will e10s also be turned off for that User X? Have we tested this scenario?
Flags: needinfo?(felipc)
Yeah, the system add-on has the ability to do that if necessary. If we just turn down the number to 0% of users it won't (as that only deals with the non-optins), but it's also possible to ship an update through the system add-on that will turn off e10s completely.
Flags: needinfo?(felipc)
Blocks: 1255013
Felipe, does this fix need to be uplifted to Beta 46 for our e10s experiment?
Flags: needinfo?(felipc)
(In reply to :Felipe Gomes (needinfo me!) from comment #30)
> Yeah, the system add-on has the ability to do that if necessary. If we just
> turn down the number to 0% of users it won't (as that only deals with the
> non-optins), but it's also possible to ship an update through the system
> add-on that will turn off e10s completely.

Gotcha. Thanks!
(In reply to Chris Peterson [:cpeterson] from comment #31)
> Felipe, does this fix need to be uplifted to Beta 46 for our e10s experiment?

Nope, not for the experiment. If we do a second phase of e10s testing on beta46 I'll ask to use this instead of the experiment system, and if that's agreed I'll ask for uplift then.
Flags: needinfo?(felipc)
Blocks: 1257265
Comment on attachment 8725714 [details]
MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged rollout. r=Mossop

Approval Request Comment
[Feature/regressing bug #]: Main code for the system add-on for e10s rollout
[User impact if declined]: needed to test e10s on beta46 as planned
[Describe test coverage new/current, TreeHerder]: landed on central/aurora, QA'ed by Softvision
[Risks and why]: the risk is contained in it not working properly and thus not yielding the results for this experiment phase. But that's what we want to test.
[String/UUID change made/needed]: none
Attachment #8725714 - Flags: approval-mozilla-beta?
Comment on attachment 8725713 [details]
MozReview Request: Bug 1249845 - Store the e10s rollout cohort in the telemetry environment. r=gfritzsche

Approval Request Comment
Same as comment 34, this patch stores the data in telemetry which will be used to analyze the data (data analysis after this ends will be tracked on bug 1251259)
Attachment #8725713 - Flags: approval-mozilla-beta?
Comment on attachment 8723662 [details]
MozReview Request: Bug 1249845 - Set up structure for e10srollout system add-on. r=Standard8,glandium

Approval Request Comment
Same as comment 34, this is just the boilerplate to build and ship the system add-on
Attachment #8723662 - Flags: approval-mozilla-beta?

Comment 37

3 years ago
I am on 46 beta channel. All of my active add-ons work well with e10s on, but due to this testing program my FireFox runs with e10s de-facto off. How I can force e10s to be actually used (I have big session so it is too slow in non-e10s mode). 

Thank you in advance.
(In reply to User Dderss from comment #37)
> I am on 46 beta channel. All of my active add-ons work well with e10s on,
> but due to this testing program my FireFox runs with e10s de-facto off. How
> I can force e10s to be actually used (I have big session so it is too slow
> in non-e10s mode). 
> 
> Thank you in advance.

this is not encouraged but you can use the pref "browser.tabs.remote.force-enable"
Comment on attachment 8725714 [details]
MozReview Request: Bug 1249845 - bootstrap.js code to manage the e10s staged rollout. r=Mossop

Needed for the e10s rollout for beta 4 today.
Attachment #8725714 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8725713 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8723662 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Updated

3 years ago
Blocks: 1261362
Blocks: 1261387
Blocks: 1261422

Updated

3 years ago
See Also: → 1281281
You need to log in before you can comment on or make changes to this bug.