Bug 1223895 (Closed) - Opened 10 years ago, Closed 8 years ago
Deploy a Heartbeat survey on e10s A/B experiment participants
Categories: Toolkit :: Telemetry, defect
People: Reporter: vladan; Assigned: elan
I would like to deploy a Heartbeat survey asking e10s & non-e10s users to rate their satisfaction with Firefox performance and analyze their results.
The data from this survey could also be used to develop a "session responsiveness" scoring formula.
Updated•10 years ago
tracking-e10s: --- → +
As keeper of SurveyGizmo, I approve this idea.
Design proposal for the study.
Claims:
1. e10s has several mechanisms for both harming and aiding users.
2. For NEW users, faster video performance should be GOOD (all upside).
3. For EXISTING users, e10s can break addons, degrading the experience.
I would like to see it as part of a FORCED SWITCH (e10s on/off randomly assigned) for a subset of NEW and EXISTING Firefox users.
Starting from users with e10s OFF, 4 cohorts:
- new users, e10s forced on.
- new users, e10s forced off.
- existing users (with addons?), e10s forced on.
- existing users (with addons?), e10s forced off.
I would like to see these outcomes tracked (by cohort):
- retention
- total usage
Covariates to track:
- channel
- addons
Comment 2•10 years ago
Adding Jeff since we just spoke about this yesterday.
Updated•9 years ago
Flags: needinfo?(elancaster)
Assignee
Comment 3•9 years ago
Confirming we'd like to do this for the Firefox 46 Beta experiment (Phase 2, March 21; if we could do it sooner, that would be even better). We should compare the Heartbeat scores of the e10s and non-e10s cohorts. I'll follow up on getting Bug 1193535 uplifted.
Comment 4•9 years ago
(In reply to Erin Lancaster [:elan] from comment #3)
> I'll follow up on getting Bug 1193535 uplifted.
It should already be in 46.
Update: We are indeed seeing data coming in (we confused merge date and deploy date for our first check). Volume by date, for any interested parties:
[(u'20160316', 2689),
(u'20160315', 2751),
(u'20160314', 2167),
(u'20160313', 1309),
(u'20160312', 637),
(u'20160311', 10)]
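For anyone reproducing these counts, here is a minimal sketch of a volume-by-date tally. It assumes "pings" is an iterable of Heartbeat ping dicts carrying a "submissionDate" field in YYYYMMDD form; this is an illustration, not the pipeline actually used for the numbers above.

# Hypothetical sketch: count Heartbeat pings per submission date.
from collections import Counter

def volume_by_date(pings):
    counts = Counter(p["submissionDate"] for p in pings)
    # Most recent date first, matching the listing above.
    return sorted(counts.items(), reverse=True)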
Assignee
Comment 6•9 years ago
:Ilana, how are the ratings looking? Is it possible to do any comparisons at this point? Thank you.
Flags: needinfo?(elancaster) → needinfo?(isegall)
Felipe,
I need a little more info to make sure I'm doing a valid comparison across populations. Here's what I've understood from my conversations with elan so far, and some points that need clarification:
- Only consider users who were ELIGIBLE for experiment 3, which had two rollout phases. Compare ratings from the e10s enabled group to the same population that was not exposed to e10s.
Eligibility for experiment 3 is:
- Beta 46
- rating submitted on 3/23 or later
- no addons (I assume besides system addons - pocket and hello)
- no right-to-left (RTL) languages (is this in a pref?)
- no accessibility issues (is this in a pref?)
Implementation:
- In phase 1, APZ was disabled, and in phase 2, it was enabled. Were these done at the same or different times?
- Phase 1 subjects can be identified by having an xpi - what is it called?
- Phase 2 subjects can be identified by having the e10srollout@mozilla.org addon.
- 20% of eligible users should be in either phase 1 or phase 2
Additionally:
- participants in experiments 1 and 2 should be discarded.
--> Can we identify the users who saw exps 1 and 2 by a flipped e10s pref? We want to remove them from consideration. If so, which pref is this? If not, how can we identify these users?
If any of this is incorrect, please let me know, and let me know if you can answer the questions above.
Flags: needinfo?(isegall) → needinfo?(felipc)
Comment hidden (obsolete)
Comment 9•9 years ago
Hmm, so maybe scratch that entire comment out. MattN has explained to me how Heartbeat works. I thought from the bug description that we were still going to deploy a survey and would need to identify users post-experiments.
My understanding now is that this survey already ran while the e10s A/B experiments were running, and the ratings are tied to telemetry pings, and now we want to analyze the ratings with the data that came in. Please let me know if this understanding is now correct.
Assuming this is true, here's how to find the users who were part of these experiments:
==== Phase 1 =====
Look for activeExperiment { id, branch } here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/docs/environment.rst?rev=b9ad1239e4b2#238
you should look for an id of: "e10s-beta46-noapz@experiments.mozilla.org"
The branch will tell you whether users were part of test/control groups. As mentioned above, only consider the users with these branches:
- "control-no-addons": users who were part of the control group
- "experiment-no-addons": users who were part of the test group
and ignore everyone else.
This should be enough to fully identify users who were part of this experiment, so you don't need date checks there. If you still need to know, this phase ran from Mar 9 - Mar 21.
==== Phase 2 =====
For this one, you'll need to look for e10sCohort here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/docs/environment.rst?rev=b9ad1239e4b2#46
Ignore users who don't have that set, as that was before the phase started. Here you need to consider only users with values "test" or "control".
In this case, you'll have to look at submissions only up to Apr 3, because the value will continue to exist afterwards, but ~100% of users will become "control" after that.
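To make the rules above concrete, here is a rough Python sketch of the filtering. The field paths come from the environment docs linked above; the function names and the exact ping dict shape are assumptions, not code from the actual analysis.

# Hypothetical sketch of the cohort identification described in this comment.
PHASE1_ID = "e10s-beta46-noapz@experiments.mozilla.org"
PHASE1_BRANCHES = {"control-no-addons", "experiment-no-addons"}
PHASE2_COHORTS = {"test", "control"}
PHASE2_CUTOFF = "20160403"  # per above, later pings drift toward ~100% "control"

def phase1_branch(ping):
    # Phase 1: environment.addons.activeExperiment {id, branch}
    exp = ping.get("environment", {}).get("addons", {}).get("activeExperiment", {})
    if exp.get("id") == PHASE1_ID and exp.get("branch") in PHASE1_BRANCHES:
        return exp["branch"]
    return None  # not a phase 1 participant we care about

def phase2_cohort(ping):
    # Phase 2: environment.settings.e10sCohort, submissions up to Apr 3 only
    cohort = ping.get("environment", {}).get("settings", {}).get("e10sCohort")
    if cohort in PHASE2_COHORTS and ping.get("submissionDate", "") <= PHASE2_CUTOFF:
        return cohort
    return None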
Comment 10•9 years ago
This is exactly what I needed. Thank you for being so fast. And good job for including a control group, team!!!
I imagine that users who are in control and test are not conflated with users who may have e10s from another source (like one of the first two experiments, or manual pref switch), correct?
Comment 11•9 years ago
(In reply to Ilana from comment #10)
> I imagine that users who are in control and test are not conflated with
> users who may have e10s from another source (like one of the first two
> experiments, or manual pref switch), correct?
Right, manual pref switch users are in different groups not mentioned here. In the previous experiments, the test/control groups had their prefs cleared back to the default settings when the experiment finished, so there's no effect of a past experiment on the distribution of the newer ones.
Comment 12•9 years ago
Erin is driving the Heartbeat survey.
Assignee: nobody → elancaster
status-firefox46: --- → affected
status-firefox47: --- → affected
status-firefox48: --- → affected
Comment 13•9 years ago
Do we know how long the experiment was deployed to users before the HB survey was popped? Additionally, for those in the user-disabled, etc., categories, do we have any way of knowing which branch they were in before the disable?
I assume the answer to both questions is no, but I want to check before presenting results.
Flags: needinfo?(fgomes)
Comment 14•9 years ago
(In reply to Ilana from comment #13)
> Do we know how long the experiment was deployed to users before HB survey
> was popped? Additionally, for those in the user-disabled, etc, category, do
> we have any way of knowing which branch they were in before the disable?
I don't know how Heartbeat pops its survey. The first experiment ran from Mar 9 to Mar 21, and the second from Mar 22 to Apr 4. Perhaps Heartbeat also stores the date that the user answered it, and you can compare against that? Other than that, I don't know.
About the users in the user-disabled branch, it's probably possible to do that with a longitudinal analysis that looks for other telemetry pings from the same user and checks the previous values. But I think that's not worth it, as the number of users in those branches is small, and maybe they already started the experiment on those branches due to pre-existing prefs.
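For completeness, a small sketch of that longitudinal idea, assuming all pings from one client have already been grouped together and carry "submissionDate" and the e10sCohort setting; these names are illustrative assumptions, not the actual analysis code.

def cohort_history(client_pings):
    # Return the client's e10sCohort values in submission-date order, so a
    # "user-disabled"/"disqualified" client's earlier branch can be inspected.
    ordered = sorted(client_pings, key=lambda p: p.get("submissionDate", ""))
    return [p.get("environment", {}).get("settings", {}).get("e10sCohort")
            for p in ordered]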
Flags: needinfo?(fgomes)
Assignee
Comment 15•9 years ago
Are we good with HB working in Beta 48? Confirming we are extending the survey through this beta cycle.
Flags: needinfo?(isegall)
Comment 16•9 years ago
We do have the standard Heartbeat survey running on Beta right now. The general satisfaction score is running at all times. @ilana - how reusable is the analysis code you wrote for the first beta test? If I remember correctly, it should be fairly easy to duplicate.
Comment 17•9 years ago
(In reply to Matt Grimes [:Matt_G] from comment #16)
> How reusable is the analysis code you wrote for the first beta test?
100% reusable. Are we looking for the same metric - no significant loss of satisfaction? If so, do you want that as "overall," or channel-by-channel?
Flags: needinfo?(isegall)
Comment 19•9 years ago
We want to continuously compare the "test" and "control" branches of the e10s experiment on beta, and starting with the 48 release we want to compare the "test" and "control" branches on the release channel as well (separately by channel, since comparing across channels is unlikely to be useful).
Flags: needinfo?(benjamin)
Assignee
Comment 20•9 years ago
Ilana, can we obtain an updated report soon?
Flags: needinfo?(isegall)
Comment 21•9 years ago
Just as an FYI, we are working on getting a dashboard up that will allow you to follow along in real time as e10s moves through the trains and hits various milestones. I don't have a firm ETA, but Ilana should be able to provide ad hoc reports until it is up.
Comment 22•9 years ago
I can definitely run an updated report.
Are the experiment arms named the same thing? Also, are you interested in looking at all data aggregated since the last time the experiment was run (4/6), or only the most recent version? (The question that we addressed was "Is satisfaction the same for e10s users and non-e10s users?", so that question can be answered for "all time," "recently," or by version.)
Current report is at https://gist.github.com/ilanasegall/ec88e71e24e04d4c3f94ce2d6c58777d for reference.
Flags: needinfo?(isegall) → needinfo?(elancaster)
Assignee
Comment 23•9 years ago
By version would be good. We want Firefox 48 at this point.
Flags: needinfo?(elancaster) → needinfo?
Assignee
Updated•9 years ago
Flags: needinfo? → needinfo?(felipc)
Comment 24•9 years ago
Hi, this time finding the branches is a bit different. The data to look for is no longer in environment.addons.activeExperiment, but rather in environment.settings.e10sCohort.
That will contain the branch name, and you should compare users between the branches "test" and "control", and ignore any other name.
Flags: needinfo?(felipc)
Assignee
Comment 25•9 years ago
Hopefully, this is all the information you need. Thank you!
Flags: needinfo?(isegall)
Comment 26•9 years ago
Outstanding. And you are interested in Firefox 48 across only beta from 6/7 onwards?
Flags: needinfo?(isegall) → needinfo?(felipc)
Assignee
Comment 27•9 years ago
If the sample size is big enough for the data to be meaningful, that's perfect. Beta 6 + sounds good.
Updated•9 years ago
Flags: needinfo?(felipc)
Comment 28•9 years ago
Updated report available at https://gist.github.com/ilanasegall/1dea80ff88647d8d98001bc46ee5f354
Bottom line: e10s cohort NOT statistically less satisfied, though we have a much smaller sample than we'd like.
Note: the original report can be seen here: https://gist.github.com/ilanasegall/717ac5fc9229b1c1548c1930fe644452/f356ef875a768ed4ac85c99af7fff768c8d69bdc#file-hb_explore-ipynb
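The linked gists contain the actual analysis. As a rough illustration of the kind of check behind that bottom line, a one-sided comparison of the two cohorts' ratings could look like the sketch below; it assumes scipy and two lists of 1-5 Heartbeat scores, and the test used in the gist may differ.

from scipy.stats import mannwhitneyu

def e10s_not_less_satisfied(test_scores, control_scores, alpha=0.05):
    # One-sided Mann-Whitney U test: are the e10s ("test") cohort's ratings
    # stochastically lower than control's? Returning True means we fail to
    # reject "no loss of satisfaction" at the given significance level.
    _, p_value = mannwhitneyu(test_scores, control_scores, alternative="less")
    return p_value >= alpha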
Comment 29•8 years ago
e10s shipped, this is no longer needed. Closing it off!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX