Closed Bug 1491379 Opened 6 years ago Closed 6 years ago

[Shield] Opt-out Study: Activity Stream Contextual Feature Recommender, beta 63

Categories

(Shield :: Shield Study, defect, P1)

Tracking

(firefox63+ fixed)

RESOLVED FIXED

People

(Reporter: tspurway, Assigned: tspurway)

References

(Blocks 1 open bug)

Details

What is the preference we will be changing?
(note: this is an AS preference and requires the user branch)
browser.newtabpage.activity-stream.asrouter.messageProviders

JSON Values for different cohorts:

Control: 
[]

Cohort 1: 
[{"id":"cfr", "cohort": "one_per_day", "frequency": {"custom": [{"period": "daily", "cap": 1}]}, "type":"local","localProvider":"CFRMessageProvider","enabled":true}]

Cohort 2: 
[{"id":"cfr", "cohort": "three_per_day", "frequency": {"custom": [{"period": "daily", "cap": 1}]}, "type":"local","localProvider":"CFRMessageProvider","enabled":true}]

(The following cohorts are awaiting data/legal/product approval - see note below)

Cohort 3:
[{"id":"cfr", "cohort": "one_per_day_amazon", "frequency": {"custom": [{"period": "daily", "cap": 1}]}, "type":"local","localProvider":"CFRMessageProvider","enabled":true}]

Cohort 4:
[{"id":"cfr", "cohort": "three_per_day_amazon", "frequency": {"custom": [{"period": "daily", "cap": 1}]}, "type":"local","localProvider":"CFRMessageProvider","enabled":true}]
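As a rough illustration of how one of the cohort values above could be read, the following Python sketch parses the JSON string and pulls out the cohort name, the daily cap, and the enabled flag. The helper name `summarize_provider` is hypothetical and is not part of Activity Stream; only the JSON keys come from the values listed above.

```python
import json

# Cohort 1 value, copied from the list above; the control value is an empty list.
COHORT_1 = '[{"id":"cfr", "cohort": "one_per_day", "frequency": {"custom": [{"period": "daily", "cap": 1}]}, "type":"local","localProvider":"CFRMessageProvider","enabled":true}]'
CONTROL = '[]'


def summarize_provider(pref_value):
    """Return (cohort, daily_cap, enabled) for the first provider, or None for an empty list."""
    providers = json.loads(pref_value)
    if not providers:
        return None
    provider = providers[0]
    daily_cap = None
    # Look for a rule whose period is "daily" inside frequency.custom.
    for rule in provider.get("frequency", {}).get("custom", []):
        if rule.get("period") == "daily":
            daily_cap = rule.get("cap")
    return provider.get("cohort"), daily_cap, provider.get("enabled", False)
```

For the control value the helper returns `None`, which matches the intent that no CFR provider is configured in that branch.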


What independent variable(s) (IVs) are you manipulating to affect measurements of your DV(s)? What different levels (values) can each IV assume?
CFR enabled (binary): users either have CFR enabled or they don’t.

What percentage of users do you want in each branch?
Branches (Existing users only) - 50% of Beta users with 10% in each cohort:
Control: People who meet the criteria (so similar type of user) - but we don't show CFR.

Experiment 1 (incl. notifications for 5 add-ons: Facebook Container, Reddit Enhancement Suite, Wikipedia Context Menu Search, Enhancer for YouTube, To Google Translate).  Show all recommendations once (if the threshold is hit).  Max one per day across all add-ons.

Experiment 2 (incl. notifications for 5 add-ons: Facebook Container, Reddit Enhancement Suite, Wikipedia Context Menu Search, Enhancer for YouTube, To Google Translate).  Show each recommendation three times (if the threshold is hit).  Max one per day across all add-ons.

These additional cohorts are pending legal / data review and approval from Marshal and Kev.  Note that this approval does not block this experiment.  If we can’t get the approvals in time, let’s launch the experiment with the above three cohorts.

Experiment 3 (incl. notifications for 6 add-ons: Facebook Container, Reddit Enhancement Suite, Wikipedia Context Menu Search, Enhancer for YouTube, To Google Translate, Amazon Assistant).  Show all recommendations once (if the threshold is hit).  Max one per day across all add-ons.

Experiment 4 (incl. notifications for 6 add-ons: Facebook Container, Reddit Enhancement Suite, Wikipedia Context Menu Search, Enhancer for YouTube, To Google Translate, Amazon Assistant).  Show each recommendation three times (if the threshold is hit).  Max one per day across all add-ons.
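
The two limits described in the branches above (a per-recommendation limit of one or three impressions, and at most one recommendation per day across all add-ons) can be sketched as follows. This is a minimal Python model for reasoning about the branch definitions, not the AS Router implementation; `CapTracker`, `can_show`, and `record` are hypothetical names, and days are modeled as plain integers.

```python
from collections import defaultdict


class CapTracker:
    """Toy model: a lifetime cap per add-on plus a daily cap across all add-ons."""

    def __init__(self, lifetime_cap, daily_cap=1):
        self.lifetime_cap = lifetime_cap  # 1 for Experiments 1/3, 3 for Experiments 2/4
        self.daily_cap = daily_cap        # "max one per day across all add-ons"
        self.shown_per_addon = defaultdict(int)
        self.shown_per_day = defaultdict(int)

    def can_show(self, addon, day):
        """True if neither the add-on's lifetime cap nor the day's cap is exhausted."""
        return (self.shown_per_addon[addon] < self.lifetime_cap
                and self.shown_per_day[day] < self.daily_cap)

    def record(self, addon, day):
        """Record an impression if allowed; return whether it was shown."""
        if not self.can_show(addon, day):
            return False
        self.shown_per_addon[addon] += 1
        self.shown_per_day[day] += 1
        return True
```

Under this model, a three-impression branch with five add-ons needs at least 15 distinct days to exhaust every recommendation (5 × 3 impressions at one per day).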


What Channels and locales do you intend to ship to? 
Channel: Firefox 63 Beta
Locales: en-US only

What is your intended go live date and how long will the study run? 
Begin Monday, 9/24 in Beta
Run for the entire Beta period.
Begin analysis Monday, 10/8.


Are there specific criteria for participants?  
Locale: en-US
Release channel: Beta
Release version: 63
Profile Age:  Depending on the topsites that are used to trigger the recommendations.

What is the main effect you are looking for and what data will you use to make these decisions? 

Main effect - neutral to positive engagement with recommendations

KPIs:
Engagement with recommendation 
Neutral to positive adoption of recommendations 

Sentiment analysis 
Positive experience from the recommendation

3 week Retention (likely NOT used for the Beta experiment)
Neutral or positive gain.  It’s unlikely that we move much with a single recommendation, but we want to make sure we don’t see anything negative. 
 
As CFR grows and we have more helpful suggestions, we hope to positively move the retention needle.

Telemetry Probes
(see CFR Data Collection Proposal - https://docs.google.com/document/d/1tEsm8lgyHvn6lLHYsXPHi3kUM670n-JwPP0zeS8_3X0/edit#heading=h.7o9r1d8rkheb)

We have worked with Policy and Legal to have different data collection requirements for Shield vs. release.  client_id will not be collected in the release channel (but will be collected in Beta and any Shield studies).

Who is the owner of the data analysis for this study? 
Ben Miroglio

Will this experiment require uplift? 
No 

QA Status of your code: 
This is regular feature code, tested by QA using the existing development process.

Do you plan on surveying users at the end of the study?
Yes, will run a survey at the end of experiment (Survey design still TBD)


Post experiment plan?
Roll out CFR in Beta (based on data results)
Launch a similar experiment in Release (covered by a separate PHD)

Link to any relevant google docs / Drive files that describe the project. Links to prior art if it exists:
Contextual Feature Recommender PRD - https://docs.google.com/document/d/1gUu5j0nZuT2OrigXNW73JgkWFcKGa62s4C0fMFS1BA4/edit?ts=5b7c7a70#

CFR Recipe Reference - https://docs.google.com/document/d/1hpzsQB9gL93HSxzKOjBBVvf45-ATGxwgl1eIzMIBYvA/edit#

List of bugs - https://as-bugzy.herokuapp.com/feature/1471328
Kamyar, I set you as the Shield Study owner for your signoff.

:fmarier and :mika have already signed off on the privacy and legal in bug #1484035, so I am not 'needinfo'ing them for anything.

All of the code in CFR and AS Router has been reviewed and r+'d by a Firefox peer in the normal course of development of this feature, so no additional needinfos are needed by a peer.

:mcoman - I believe you are assigned QA for this study, so you have been needinfo'd for your signoff.

:osunick - this bug has been tagged as 'High Risk' because we are recommending 3rd party functionality, which increases brand risk, so we need VP-level signoff for this study to commence.
Flags: needinfo?(nnguyen)
Flags: needinfo?(marius.coman)
Flags: needinfo?(kardekani)
:kev and :merwin, I am needinfo'ing you explicitly for the Amazon Assistant part of the study.  We have split the five cohorts into two groups.  

The first group is a control cohort and two experiment cohorts that have the five *non-Amazon* add-ons.  These are already 'approved' and are in the study and do not need signoff to run the study on these cohorts. 

The second, conditional group is a set of two cohorts that have six add-ons, one of which is the Amazon add-on.  Because of privacy and legal concerns raised by :merwin, I have separated these cohorts into their own group.  The signoff here is explicitly for this group.

In the event we cannot get signoff for the second group, the study will launch on the targeted date *without* the Amazon cohorts.
Flags: needinfo?(merwin)
Flags: needinfo?(kev)
Ack'd and understood. An updated version addressing locale breakage will be required regardless, so we'll need to hold review and auth for the extension until we have it. Working with :merwin to address non-code review. Will leave needinfos in place, and agree they will not gate other tests.
[Tracking Requested - why for this release]:
Summary: [Shield] Opt-out Study: Activity Stream Contextual Feature Recommender → [Shield] Opt-out Study: Activity Stream Contextual Feature Recommender, beta 63
Changing experiment design sign-off to Ilana.
Flags: needinfo?(kardekani) → needinfo?(isegall)
Blocks: 1471328
Priority: -- → P1
Depends on: 1492174, 1488758
Blocks: 1452724
I am clearing :merwin and :kev needinfos for the Amazon Assistant add-on approval.  We are getting ready to launch the study and there is not enough time to incorporate the add-on even if we were to get approval today.
Flags: needinfo?(merwin)
Flags: needinfo?(kev)
Since we are scaling this back to 3 cohorts (control, 5 addons w/ 1 freq cap, 5 addons w/ 3 freq cap), let's scale back our overall percentage of users in Beta to 30% of total population, with 10% for each cohort.
Due to QA concerns about being able to adequately test CFR functionality, the release date of this study has been delayed to October 1st.
Depends on: 1492454
Depends on: 1493140
Depends on: 1494275
We have finalized testing the Contextual Feature Recommender Shield Study experiment.

QA’s recommendation: YELLOW - SHIP CONDITIONALLY

Reasoning:
- All issues found have been fixed and verified.
- Given the way that CFR is set up to work, we could not fully verify the Experiment 2 branch of the study, because only one recommendation is shown per day, up to a maximum of 3 times per website for 5 websites. We could not verify in a single profile that all recommendations stop showing once these conditions are met, because websites no longer load after the system time is moved more than 7 days into the future. We did verify this behavior in the Experiment 1 branch of the study, but we can't be 100% sure that the Experiment 2 branch will perform the same.

Testing Summary:
- Test suite: TestRail

Tested Platforms:
- Windows 10 x64
- Mac 10.13.3
- Arch Linux 4.13 x64

Tested Firefox versions:
- Firefox Beta build v63.0b9
Flags: needinfo?(marius.coman) → shield-qa+
All patches requested for uplift for this study are now in 63 beta 10, so this has release management approval.
approved
Flags: needinfo?(nnguyen)
Science: R+
Flags: needinfo?(isegall)
This study is live
Thanks, :jgaunt! 

We made a mistake in naming the study's 'slug' id, and aren't getting data into the Tiles pipeline.  Unfortunately, we need to close the existing recipe, clone it with the following changes, and relaunch it:

Recipe Slug:

pref-flip-activity-stream-cfr-1491379
Flags: needinfo?(jgaunt)
Flags: needinfo?(jgaunt) → needinfo?(mgrimes)
Depends on: 1496181
Disabled previous recipe, cloned with new slug, re-deployed to a sampling block that doesn't overlap with the previous.
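To illustrate what a non-overlapping sampling block means, here is a minimal Python sketch of hash-based bucket sampling. It is a model built on assumptions, not the actual recipe server logic: the bucket count, the hash choice, the block boundaries, and the names `bucket_for` and `in_block` are all hypothetical.

```python
import hashlib

TOTAL_BUCKETS = 10000  # hypothetical bucket count


def bucket_for(client_id, total=TOTAL_BUCKETS):
    """Map an ID to a stable bucket in [0, total)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % total


def in_block(client_id, start, count, total=TOTAL_BUCKETS):
    """True if the ID's bucket falls in [start, start + count)."""
    return start <= bucket_for(client_id, total) < start + count


# Hypothetical (start, count) blocks: the relaunched recipe samples a range
# that begins where the disabled recipe's range ended, so no ID is in both.
OLD_BLOCK = (0, 3000)
NEW_BLOCK = (3000, 3000)
```

Because each ID always hashes to the same bucket, a profile enrolled through the old block cannot also match the new block, which keeps the relaunched study's population separate from the first one.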
Flags: needinfo?(mgrimes)
This study is now live
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
This study has ended.