[Experimentation Support] Research Variance Shift Pre/Post Enrollment
Categories
(Data Science :: Investigation, task)
Tracking
(Not tracked)
People
(Reporter: shong, Assigned: shong)
Details
Brief description of the request:
I've noticed a phenomenon with some of my experiments (Pocket 4 Rows experiment, Pocket SPOCs experiment) where the variance for various usage metrics seems to shift massively from the pre-enrollment period to the post-enrollment period. For example:
- For each profile enrolled in the experiment, get the total number of searches in the 28 days before enrollment (pre) and in the 28 days after enrollment (post). If a profile didn't appear before enrollment, count it as 0 for this metric.
- Get the variance of this metric for the group.
- The pre-period variance seems to be massively larger than the post-period variance.
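The steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the query used for the experiments: the input shapes (`enrollments`, `daily_searches`), the function name `pre_post_variance`, the choice to count the enrollment day as part of the post period, and the use of sample variance are all hypothetical.

```python
from datetime import date, timedelta
from statistics import variance

WINDOW = timedelta(days=28)  # length of the pre and post periods


def pre_post_variance(enrollments, daily_searches):
    """Return (pre_variance, post_variance) of per-profile search totals.

    enrollments: dict mapping client id -> enrollment date (assumed shape)
    daily_searches: iterable of (client id, date, search count) rows
    """
    # Every enrolled profile starts at 0, so profiles with no activity
    # in a period still contribute a 0 to that period's variance.
    pre = {cid: 0 for cid in enrollments}
    post = {cid: 0 for cid in enrollments}
    for cid, day, count in daily_searches:
        enrolled = enrollments.get(cid)
        if enrolled is None:
            continue  # row belongs to a profile outside the experiment
        if enrolled - WINDOW <= day < enrolled:
            pre[cid] += count
        elif enrolled <= day < enrolled + WINDOW:
            # Assumption: the enrollment day itself counts as "post".
            post[cid] += count
    return variance(list(pre.values())), variance(list(post.values()))


# Made-up rows, for illustration only.
enrollments = {
    "a": date(2019, 6, 1),
    "b": date(2019, 6, 1),
    "c": date(2019, 6, 1),
}
daily_searches = [
    ("a", date(2019, 5, 20), 10),
    ("a", date(2019, 6, 5), 4),
    ("a", date(2019, 4, 1), 100),  # outside both windows, ignored
    ("b", date(2019, 5, 25), 2),
    ("b", date(2019, 6, 10), 3),
    ("c", date(2019, 6, 2), 5),    # no pre-period activity, pre total is 0
    ("z", date(2019, 5, 30), 7),   # not enrolled, ignored
]
pre_var, post_var = pre_post_variance(enrollments, daily_searches)
```

With these made-up rows the pre totals are 10, 2, 0 and the post totals are 4, 3, 5, so the pre-period variance is larger even though both periods have the same mean.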
This is suspicious and needs investigation. These are my theories for why this is happening:
- 1: Cloned profiles are causing large outliers in the pre-group. For the post-group, since there are filter criteria, only a subset of the cloned profiles are getting sampled into the experiment and thus the magnitude of these outliers is smaller.
- 2: Unenrollment is causing us to only see a subset of some profiles' usage history in the post period. This is causing fewer / smaller outliers.
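Both theories rest on the same mechanism: a handful of extreme totals can dominate the group variance, so anything that shrinks those totals in the post period shrinks the variance. A minimal sketch with synthetic numbers (not taken from either experiment) shows the effect; `variance_with_outlier`, `observed_fraction`, and all values below are hypothetical.

```python
from statistics import variance

# Synthetic 28-day search totals for ordinary profiles (illustrative only).
typical_totals = [5, 8, 12, 7, 9, 11, 6, 10]


def variance_with_outlier(totals, outlier_total, observed_fraction=1.0):
    """Sample variance after adding one heavy profile to the group.

    observed_fraction models how much of the heavy profile's usage is
    actually counted, e.g. 0.25 if only a quarter of its activity is
    attributed to the enrolled profile or it unenrolls after 7 of 28 days.
    """
    return variance(totals + [outlier_total * observed_fraction])


baseline = variance(typical_totals)                        # no outlier
fully_observed = variance_with_outlier(typical_totals, 2000)
partly_observed = variance_with_outlier(typical_totals, 2000, 0.25)

# Ratio of the two outlier scenarios; a large value means the variance
# is driven almost entirely by the single heavy profile.
shrink_ratio = fully_observed / partly_observed
```

In this toy setup the ordering is `fully_observed > partly_observed > baseline`, which is the pattern either theory would produce between the pre and post periods. It does not distinguish between the two theories.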
Goals/Deliverables:
milestone 1: Find a way to explain this phenomenon. Produce a report.
milestone 2: Understand how this might affect experiment results. Include in report.
milestone 3: Recommendations for how to deal with and mitigate the effects found in milestone 2. Include in report.
Unresolved issue: where should these findings "live" and how should they be disseminated?
Link to any assets:
Is there a specific data scientist you would like or someone who has helped to triage this request:
- assigned by self to self
Assignee
Comment 1•6 years ago
Tim brought up a good point: the Normandy CID, which determines enrollment, might not map directly to the Telemetry CID. For example, in the case of cloned profiles, multiple profiles might share a Telemetry CID without necessarily sharing a Normandy CID (which would result in lower variance).
Action: Research and investigate how Normandy CIDs work.
Assignee
Comment 2•5 years ago
Investigation in progress. Work is being captured in this document.
Comment 3•5 years ago
Work for the DS team is now tracked in Jira. You can search within the Data Science Jira project for the corresponding ticket.