Closed Bug 1200276 Opened 5 years ago Closed 4 years ago

Understand how (e10s) electrolysis in Firefox desktop impacts product retention and usage

Categories

(Core Graveyard :: Tracking, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: cmore, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxgrowth])

From talking to Doug Turner, there is discussion about how ready e10s is for a more general-audience release. While the meta tracking bug 516752 for e10s is far from complete and there are known issues, we should consider trying a small-scale A/B test to understand how e10s impacts product retention and usage.

We could create a cohort test using funnelcakes as we've done previously with other product and onboarding tests.

Here's a proposed approach:

* Pick a specific future version of Firefox to test on
* Decide on the OS and lang to test
* Decide on the channel (requires the e10s code and flag to be in the specific channel)
* Decide on sample size and duration of test (can be anywhere from 1% to 100% sample size that matches the criteria above, but it really should be 5% or less)
* Define goal criteria (retention, usage, add-ons installed, etc.)

Cohorts:

* A: Funnelcake with out-of-the-box Firefox with e10s disabled
* B: Funnelcake with Firefox with e10s enabled

Let's discuss!
Blocks: 1198541
:dougt: can you review this and let me know how you want to proceed with this test? We have a lot of retention tests going on right now and we'll just have to schedule this test at an appropriate time. 

Also, can you CC the right folks on this bug given our previous discussion?
Whiteboard: [fxgrowth]
Blocks: e10s-rc
Do we normally run these tests on Release or Beta?

(In reply to Chris More [:cmore] from comment #0)
> From talking to Doug Turner, there is discussion about how ready e10s is for
> a more general-audience release. While the meta tracking bug 516752 for e10s
> is far from complete and there are known issues, we should consider trying a
> small-scale A/B test to understand how e10s impacts product retention and
> usage.
> 
> We could create a cohort test using funnelcakes as we've done previously
> with other product and onboarding tests.
> 
> Here's a proposed approach:
> 
> * Pick a specific future version of Firefox to test on
I'd suggest 43 beta modulo my question above
> * Decide on the OS and lang to test
Do we normally restrict to one OS and one locale? If so I'd say Windows and en-us. If not, I'd say all of them.
> * Decide on the channel (requires the e10s code and flag to be in the
> specific channel)
Beta
> * Decide on sample size and duration of test (can be anywhere from 1% to
> 100% sample size that matches the criteria above, but it really should be 5%
> or less)
I'd think a bigger sample size would be better, so let's go with 5%.
> * Define goal criteria (retention, usage, add-ons installed, etc.)
retention and usage.
> 
> Cohorts:
> 
> * A: Funnelcake with out-of-the-box Firefox with e10s disabled
> * B: Funnelcake with Firefox with e10s enabled
> 
> Let's discuss!
Good questions.

(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #2)
> Do we normally run these tests on Release or Beta?
> 

All of the funnelcakes I have done in the past 3 years have been on the release channel, because release users are a bigger sample and behave quite a bit differently than pre-release users. For this specific test, that may not be feasible or even make sense. We can do any channel that we think is best suited for the test.

> (In reply to Chris More [:cmore] from comment #0)
> > From talking to Doug Turner, there is discussion about how ready e10s is for
> > a more general-audience release. While the meta tracking bug 516752 for e10s
> > is far from complete and there are known issues, we should consider trying a
> > small-scale A/B test to understand how e10s impacts product retention and
> > usage.
> > 
> > We could create a cohort test using funnelcakes as we've done previously
> > with other product and onboarding tests.
> > 
> > Here's a proposed approach:
> > 
> > * Pick a specific future version of Firefox to test on
> I'd suggest 43 beta modulo my question above

Firefox 43 sounds fine as long as we are fine with waiting a few more months for a better test. I'm fine with it.

> > * Decide on the OS and lang to test
> Do we normally restrict to one OS and one locale? If so I'd say Windows and
> en-us. If not, I'd say all of them.
> > * Decide on the channel (requires the e10s code and flag to be in the
> > specific channel)
> Beta

Beta seems to make sense and should be enough of a sample. 

> > * Decide on sample size and duration of test (can be anywhere from 1% to
> > 100% sample size that matches the criteria above, but it really should be 5%
> > or less)
> I'd think a bigger sample size would be better, so let's go with 5%.

Yeah, 5% should be fine since we are on the beta channel, which is much smaller than release. We can just see how many samples we are getting over time and adjust the sample up if we need more numbers. We'll probably need more than 7 days of acquisition.

> > * Define goal criteria (retention, usage, add-ons installed, etc.)
> retention and usage.

+1

> > 
> > Cohorts:
> > 
> > * A: Funnelcake with out-of-the-box Firefox with e10s disabled
> > * B: Funnelcake with Firefox with e10s enabled
> > 
> > Let's discuss!
Summary: Understand how (e10s) electrolysis in Firefox desktop impacts churn and retention rates → Understand how (e10s) electrolysis in Firefox desktop impacts product retention and usage
Let's do a heartbeat survey for these funnelcakes to understand how users feel about e10s.
Absolutely. Adding Gregg and Rob as well. Let me know when you want to meet so we can talk details.
Francesco: can you help here get this rolling? needed:

* create mana test details page
* create release engineering bug and builds
* create optimizely test bug
* add mana page to growth backlog
* needinfo cmore to review

Then

cmore will needinfo matt and team to review
Flags: needinfo?(francescosapolizzi)
Need to confirm these are still the right prefs:

Enable e10s in Nightly by setting the browser.tabs.remote and browser.tabs.remote.autostart preferences to true.

Need to also confirm that e10s is *only* in Nightly and no other channel. A channel closer to release would be better from a volume perspective, but I'm not sure if e10s has moved beyond Nightly.
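If those are still the right prefs, the funnelcake builds' prefs fragments might look like this (a sketch only; the exact pref set is precisely what needs confirming):

```js
// Cohort B (e10s enabled) -- pref names taken from the comment above;
// confirm before building.
pref("browser.tabs.remote", true);
pref("browser.tabs.remote.autostart", true);

// Cohort A (control) ships with both prefs at their default (false) values.
```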
(In reply to Chris More [:cmore] from comment #7)
> Need to confirm these are still the right prefs: 
> 
> enable e10s in Nightly by setting values of browser.tabs.remote and
> browser.tabs.remote.autostart preferences to true.
> 

Confirmed that is the correct pref. There are also browser.tabs.remote.autostart.1 & browser.tabs.remote.autostart.2 prefs, which seem to be Aurora specific. I will follow up on whether or not this means anything to us.

> Need to also confirm that e10s is *only* in nightly and no other channel. A
> channel closer to release would be better from a volume perspective, but not
> sure if e10s is out of nightly.

Confirmed that e10s is in Aurora. https://wiki.mozilla.org/Electrolysis#Schedule

Next Steps:
* Create mana documentation
* Create release engineering bug and builds
* create optimizely test and bug
* add mana page to growth backlog
* needinfo cmore to review
[1] browser.tabs.remote.autostart

[2] browser.tabs.remote.autostart.2

These are the only two preferences we need to worry about currently.

[1] https://dxr.mozilla.org/mozilla-central/source/addon-sdk/source/python-lib/cuddlefish/runner.py#420

[2] https://dxr.mozilla.org/mozilla-central/source/browser/app/profile/firefox.js#1883
Flags: needinfo?(francescosapolizzi)
Depends on: 1213449
Depends on: 1213460
E10s is on by default in Developer Edition (fka Aurora); if you test there, you are testing an audience that has not been chased away by 5 months of already experiencing e10s. I don't know if that gives any valid results.
Testing on Beta makes more sense, as that channel is non-e10s - but some larger stability fixes to e10s landed in 43, so I'm not sure it makes sense to do anything before 43 hits beta.
This would be the first time we've modified Firefox's functionality in a funnelcake build, or used Aurora. What do we plan to do over the longer term with the e10s-disabled group? Will we remove the config so they go back to the default prefs? IIRC we don't have any experience with doing that - in theory we can remove <install_dir>/distribution/ with a special update, but we should check what is incorporated into the profile and whether we need extra cleanup steps. Will we need any messaging to those users at that time?

As funnelcakes are distributed as new installs, have we looked at how many downloads we get, and how long it will take to get a good sample size ?
Also, why do we need to do Funnelcake for this and not just a Telemetry Experiment?
I think I see why we would want to use Funnelcake: to compare similar cohorts who are downloading Firefox as a new install. As I understand it, we can pick a locale and OS, and then offer some percentage of those new users an e10s-enabled beta 43 build. We can then compare whether non-e10s betas retain more users than e10s-enabled betas. Would this be limited to a specific beta build? I.e., are we going to offer it to beta 2 downloaders only, or keep the experiment running for beta 3 or 4 downloaders?

And, I agree with Nick, we have to consider whether we then update the e10s-enabled Funnelcake testers to non-e10s, and how the UI will let users know what's happening. I think it will be best to disable it and inform the user (thanking them, even).  

Though this will stay limited to a fairly small number of users, I'd like us to keep this also limited to very early beta, so that mid to late beta we can stay focused on shipping 43 in as stable and non-crashy a state as possible.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #13)
> Also, why do we need to do Funnelcake for this and not just a Telemetry
> Experiment?

Good question.

Telemetry experiments are opt-in for existing users, while funnelcakes are a separate cohort of new Firefox users who aren't skewed by opting in to an experiment. All the tests I perform with funnelcakes and changes to the product, pages, or funnels are neither opt-in nor opt-out; they are just given to a specific small percentage of users who are downloading Firefox off of our websites.

Though, if we all believe a telemetry experiment is better, we can do that. We just need to do a real A/B and control for all variables so that we can analyze retention.

Should I be able to segment FHR payloads by people who have a specific telemetry experiment enabled? Funnelcakes change the channel and build ID so that I can then bucket users cleanly to measure retention.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #14)
> I think I see why we would want to use Funnelcake; to compare similar
> cohorts who are downloading Firefox as a new install. As I understand it we
> can pick a locale and OS, and then offer some percentage of those new users
> an e10s-enabled beta 43 build. We then can compare whether non-e10s betas
> retain more users than e10s-enabled betas.   Would this be limited to a
> specific beta build? ie, are we going to offer it to beta 2 downloaders
> only, or keep the experiment running for beta 3 or 4 downloaders?

Yeah, all funnelcakes need to be created from a specific version (including sub-versions), OS, and language, and then I can distribute them to any percentage of users on any specific web page who are attempting to download that version. Normally we do these funnelcake tests on the release channel, but this will be one of the first in recent times that we are doing on pre-release.

Should we do this on beta or the release channel?

> 
> And, I agree with Nick, we have to consider whether we then update the
> e10s-enabled Funnelcake testers to non-e10s, and how the UI will let users
> know what's happening. I think it will be best to disable it and inform the
> user (thanking them, even). 

It is possible to target hotfixes on these funnelcakes to go back to non-e10s. Mike Connor and I did that before with a cohort of funnelcake users. Since they have unique build and channel IDs, we can push an update out to them to turn it off. We probably only want to turn it off if the retention and usage is massively worse than the control (non e10s). If they are similar or better, we can monitor those cohorts over time.
 
> 
> Though this will stay limited to a fairly small amount of users, I'd like us
> to keep this also limited to very early beta, so that mid to late beta we
> can stay focused on shipping 43 in a stable and non-crashy of a state as
> possible.

My only concern about telling users that they are running e10s is that it may (or may not) skew results. Kind of like how the best experiments are blind tests. If we were going to ask for feedback on e10s, then having them know about it would probably make sense. We *could* do a Firefox Heartbeat feedback survey on the two cohorts to see if their 5-star rating and feedback on Firefox is statistically different.
What if we were to add a second tab (we can do that now) to the e10s cohort to teach them about what e10s is all about?

Do we have a good page that would let users understand what it is all about? Since these are pre-release folks, do we think a technical page would suffice?
(In reply to Chris More [:cmore] from comment #15)
> Telemetry experiments are opt-in of existing users

I thought they were opt-out (on pre-release channels that have Telemetry on by default, which includes Beta) - but yes, they are a slice of existing users.
(In reply to Chris More [:cmore] from comment #15)
> Should I be able to segment FHR payloads by people who have a specific
> telemetry experiment enabled? Funnelcakes change the channel and build ID so
> that I can then bucket users cleanly to measure retention.

We have Unified Telemetry on 43 anyhow, FHR is dead (or actually, now just a subset of Telemetry that has a higher degree of anonymity and is opt-out on all channels, while the full Telemetry is opt-out on prerelease channels including beta and opt-in on release).

That said, enabled experiments are visible in that data easily, from all I know - it's not as crude as channel or build ID, it's actually listed in separate fields about experiments.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #19)
> (In reply to Chris More [:cmore] from comment #15)
> > Should I be able to segment FHR payloads by people who have a specific
> > telemetry experiment enabled? Funnelcakes change the channel and build ID so
> > that I can then bucket users cleanly to measure retention.
> 
> We have Unified Telemetry on 43 anyhow, FHR is dead (or actually, now just a
> subset of Telemetry that has a higher degree of anonymity and is opt-out on
> all channels, while the full Telemetry is opt-out on prerelease channels
> including beta and opt-in on release).

Yes, I meant the data formerly known as FHR in post-Firefox-42 versions. :) Need another short acronym. UT?

> 
> That said, enabled experiments are visible in that data easily, from all I
> know - it's not as crude as channel or build ID, it's actually listed in
> separate fields about experiments.

Ahh, interesting. So have we done any telemetry experiments with e10s cohorts already? If not, why haven't we?
I don't know of any telemetry experiments with e10s so far. I think we should look into trying that because it's much easier to do than Funnelcake. If a telemetry experiment doesn't give us the data we need, we can still try Funnelcake, right?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #21)
> I don't know of any telemetry experiments with e10s so far. I think we
> should look into trying that because it's much easier to do than Funnelcake.
> If a telemetry experiment doesn't give us the data we need, we can still try
> Funnelcake, right?

Yup. Let's try telemetry experiments first and if that doesn't work, we can do a funnelcake. Whatever helps us understand how some population of people react (retention, hours, feedback, etc.) to e10s when compared to a similar population of people without e10s.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #21)
> I don't know of any telemetry experiments with e10s so far. I think we
> should look into trying that because it's much easier to do than Funnelcake.
> If a telemetry experiment doesn't give us the data we need, we can still try
> Funnelcake, right?

:kairo: can you let me know if we can do this test via telemetry experiments and when we can kick off a two cohort test? Thanks!
Flags: needinfo?(kairo)
Isn't this a straight-up duplicate of bug 1193089 which has patches?
Sounds like the other bug is for measuring perf, but I'd hope we can do both with this.
Flags: needinfo?(kairo)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #24)
> Isn't this a straight-up duplicate of bug 1193089 which has patches?

Looks to be a duplicate and I didn't know the other was filed. Let's see if we can do a combo measurement of performance and a cohort test with telemetry experiments and only do the funnelcake if we need to do it.
I seriously doubt that we have the statistical power to do any "retention"-style analysis on the beta population.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #27)
> I seriously doubt that we have the statistical power to do any
> "retention"-style analysis on the beta population.

That's another issue, because the sample sizes on non-release channels are smaller than on release. You really don't know how much sample you need until you do the post-analysis and see if there is any variation between the cohorts. Depending on the size of the variation, you can then calculate whether you have enough sample to get above the noise level and reach statistical power. If this were strictly conversion-rate optimization, I could calculate the sample needed up front, but with these tests you really don't know until you run them. For any of my past retention tests, I needed at least 20k users per variation.
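As a back-of-the-envelope illustration (my own example numbers, not figures from this bug), the standard two-proportion sample-size approximation shows why retention tests need cohorts in the tens of thousands once the expected difference shrinks to a point or two:

```javascript
// Approximate users needed per cohort to detect a retention difference
// of p1 vs. p2 at ~5% significance and ~80% power, using the standard
// two-proportion normal approximation. The retention rates are made up.
function requiredNPerArm(p1, p2, zAlpha = 1.96, zBeta = 0.8416) {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p1 - p2) ** 2);
}

// A 2-point drop (50% -> 48%) needs roughly 10k users per cohort;
// halving the detectable difference roughly quadruples the requirement.
console.log(requiredNPerArm(0.50, 0.48));
console.log(requiredNPerArm(0.50, 0.49));
```

This is only a rough guide for the planning discussion above; the real analysis would depend on which retention metric is chosen and its observed variance.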
jgriffiths: Is there enough e10s data to ensure we are good to ship e10s? Should we reconsider doing a funnelcake retention test for a new cohort of users on release channel? Regardless of the KPIs we are measuring, I want to make sure it has a neutral to positive impact to new user retention.
Flags: needinfo?(jgriffiths)
jgriffiths:  you mentioned some concerns around RTL locales. is there a tracking bug(s) on those concerns?
Let's also figure out a testing plan and community engagement that will help us evaluate whether e10s is ready for any locales where there is the possibility of regression.
We don't need a funnelcake, since we'll be deploying to a random population subset directly. We will be measuring/comparing engagement ratio for the e10s and non-e10s cohorts in bug 1251259.
(In reply to chris hofmann from comment #30)
> jgriffiths:  you mentioned some concerns around RTL locales. is there a
> tracking bug(s) on those concerns?

We have open issues related to RTL that hold up rolling out to these users - 
http://is.gd/cKUiNa
(In reply to Chris More [:cmore] from comment #29)
> jgriffiths: Is there enough e10s data to ensure we are good to ship e10s?
> Should we reconsider doing a funnelcake retention test for a new cohort of
> users on release channel? Regardless of the KPIs we are measuring, I want to
> make sure it has a neutral to positive impact to new user retention.

That's a really interesting suggestion as part of the picture! What do we need to do to get this rolling?

Aside: my expectation is that retention would be better with e10s, unless users run into e10s-specific bugs that we need to fix.
Flags: needinfo?(jgriffiths)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #31)
> We don't need a funnelcake, since we'll be deploying to a random population
> subset directly. We will be measuring/comparing engagement ratio for the
> e10s and non-e10s cohorts in bug 1251259.

Good. That's what I hoped. Will you try new Firefox cohorts on top of existing users? Those two cohorts have different tolerance levels to product performance.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #31)
> We don't need a funnelcake, since we'll be deploying to a random population
> subset directly. We will be measuring/comparing engagement ratio for the
> e10s and non-e10s cohorts in bug 1251259.

Let's keep this bug open for now - I think there are distinctly different cases here. What a funnelcake test gets us is people coming into the download pages - by definition, people on one side of our churn problem.

In my mind, the earliest we could possibly do this is during the 46 release cycle.
we're not sure if we'll ever do this. keeping now as a placeholder.
Priority: -- → P3
Jeff, would an e10s retention funnelcake test provide different information than the user mortality rate analysis (bug 1249665) for the e10s experiment with Beta 46 users?
Depends on: 1249665
Flags: needinfo?(jgriffiths)
(In reply to Chris Peterson [:cpeterson] from comment #37)
> Jeff, would an e10s retention funnelcake test provide different information
> than the user mortality rate analysis (bug 1249665) for the e10s experiment
> with Beta 46 users?

As Jim says:

(In reply to Jim Mathies [:jimm] from comment #36)
> we're not sure if we'll ever do this. keeping now as a placeholder.

Given this, tracking + and P3 are appropriate.
Flags: needinfo?(jgriffiths)
Tracking new user churn is not an e10s release blocker.
No longer blocks: e10s-rc
cmore, unless your team is going to do this work we should resolve it. What do you think?
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(chrismore.bugzilla)
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
(In reply to Benjamin Smedberg AWAY UNTIL 2-AUG-2016 [:bsmedberg] from comment #40)
> cmore, unless your team is going to do this work we should resolve it. What
> do you think?

no plans to do anything here as others didn't feel it was required.
tracking-e10s: + → ---
Flags: needinfo?(chrismore.bugzilla)
:mbest: per our conversation a few minutes ago. What do you want to do about a potential e10s funnelcake now?
Flags: needinfo?(mbest)
:elan: can you help drive internally if we can do a cohort test of new users to understand retention rates of e10s?

For everyone:

If we do a cohort test with a funnelcake, we have to make sure that both the control and experiment cohorts stay consistent over time.

Cohort A: control (no e10s from the start and should not get e10s for at least 6 weeks after profile creation date)

Cohort B: experiment (should have e10s from the start and *always* have e10s)

Funnelcakes are now denoted in the distribution.id in telemetry, and the e10s add-on can just check the distribution.id to ensure we are not mixing up the cohorts above.
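A sketch of the cohort gating this implies (the distribution.id values and function name here are hypothetical placeholders; the real add-on would read distribution.id from the profile's prefs):

```javascript
// Hypothetical sketch of the cohort check described above. The two
// distribution.id values are made-up placeholders, not real funnelcake IDs.
const CONTROL_IDS = new Set(["funnelcake-e10s-control"]);
const EXPERIMENT_IDS = new Set(["funnelcake-e10s-enabled"]);

// Returns false to keep e10s off (cohort A), true to keep it on (cohort B),
// or null for profiles outside the experiment (normal rollout rules apply).
function e10sOverrideFor(distributionId) {
  if (CONTROL_IDS.has(distributionId)) return false;
  if (EXPERIMENT_IDS.has(distributionId)) return true;
  return null;
}
```

The point of the tri-state return is the consistency requirement above: cohort membership, not the rollout schedule, decides whether e10s is on for funnelcake users.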
Flags: needinfo?(elancaster)
:cmore, yes. I'll be back from PTO tomorrow.
Quick update about this: the team agrees we want to do this. Martin Best is working on designing the parameters of the funnelcake. Felipe is on point to confirm whether what is stated in comment 43 is true (or not).
Flags: needinfo?(elancaster) → needinfo?(felipc)
Does a funnelcake build have a different channel name?

(In reply to Chris More [:cmore] from comment #43)
> :elan: can you help drive internally if we can do a cohort test of new users
> to understand retention rates of e10s?
> 
> For everyone:
> 
> If we do a cohort test with a funnelcake, we have to make sure that both the
> control and experiment cohorts stay consistent over time.
> 
> Cohort A: control (no e10s from the start and should not get e10s for at
> least 6 weeks after profile creation date)
> 
> Cohort B: experiment (should have e10s from the start and *always* have e10s)


We can kinda do this, but it's not possible to guarantee that a user will always remain in a cohort, especially in the experiment cohort, because they might install an add-on or fall under any other rule that deactivates e10s.

But this is no problem because the system properly tags anyone who changes or gets disqualified, so it's easy to filter these people out from the others who remained in the same cohort all the time.
Flags: needinfo?(felipc)
Just to give my bit: it would be great to do a funnelcake sooner rather than later to make sure we have some baseline data that we can use later to see how things are evolving. E10s vs. non-e10s gives us an opportunity to see if our expectations match the data we see. Right now, with content separation, we hope to just not have things regress. Another opportunity will be multiple content processes, where we would like to know if we see an improvement in retention.
Flags: needinfo?(mbest)
(In reply to Martin Best (:mbest) from comment #47)
> Just to give my bit: it would be great to do a funnelcake sooner rather than
> later to make sure we have some baseline data that we can use later to see
> how things are evolving. E10s vs. non-e10s gives us an opportunity to see if
> our expectations match the data we see. Right now, with content separation,
> we hope to just not have things regress. Another opportunity will be
> multiple content processes, where we would like to know if we see an
> improvement in retention.

+1! My sense is that we should see retention stay flat on higher-spec systems and be higher for e10s on lower-spec systems, and I'd really like to see how things improve with multi and subsequent features like the compositor process work.