Sanity check binarized search metrics across various experiments
Categories
(Data Science :: Investigation, task)
Tracking
(Not tracked)
People
(Reporter: flawrence, Assigned: flawrence)
Details
Brief Description of the request (required):
In the analysis of the third experiment associated with bug 1522309, flawrence observed a significant change (at a 99.5% CI) in the number of very heavy searchers, and possibly a significant difference between the two experiment branches in the weeks before enrollment (i.e. before clients were exposed to the experiment). Since the arrow of time rules out causation, a pre-enrollment difference at this confidence level should be roughly a 1-in-200 event, yet bmiroglio and harter report frequently seeing similar effects. We need to work out whether these "frequently seen" effects also appear when using the should-be-robust binarized metrics; if they do, we need to identify either an error in the stats (or in flawrence's understanding of the stats) or a bias in our experimentation framework.
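A minimal sketch of the kind of pre-enrollment sanity check this asks for, assuming per-client pre-enrollment search counts are already available in a pandas DataFrame. The column names ('branch', 'pre_search_count'), the 300-search cutoff, and the function names are hypothetical placeholders, not our actual experiment tooling:

import math
import pandas as pd

VERY_HEAVY_THRESHOLD = 300  # hypothetical cutoff for "very heavy searcher"

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (successes_a / n_a - successes_b / n_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # (z statistic, two-sided p-value)

def pre_enrollment_check(df: pd.DataFrame):
    """df: one row per client, with 'branch' and 'pre_search_count' columns.

    Binarizes the metric and tests whether the branches already differ in the
    weeks before enrollment, when no causal effect is possible."""
    df = df.assign(very_heavy=df["pre_search_count"] > VERY_HEAVY_THRESHOLD)
    a = df[df["branch"] == "control"]
    b = df[df["branch"] == "treatment"]
    return two_proportion_z_test(
        a["very_heavy"].sum(), len(a),
        b["very_heavy"].sum(), len(b),
    )

If branch allocation is fair, a p-value below 0.005 from this check should turn up in only about 1 of every 200 such comparisons.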
Business purpose for this request (required):
We need our A/B tests to be reliable.
Requested timelines for the request or how this fits into roadmaps or critical decisions (required):
We're constantly running and analysing experiments and need them to be reliable.
Links to any assets (e.g. start of a PHD, BRD; any document that helps describe the project):
Name of Data Scientist (If Applicable):
Please note: if it is found that not enough information has been given, triage of this request will be delayed.
Comment 1•6 years ago
Another example: in the second version of the strict list experiment, we observed a significant increase in the number of clients clicking more than 10 ads during the first week of the experiment, but we did not observe a similar increase in the third version. Was the result in the second version an artifact of unfair branch allocation?
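One way to test whether these "frequently seen" pre-enrollment and week-1 effects exceed what chance allows is to count significant differences across many re-run binarized analyses and compare that count to the nominal false-positive rate, treating the comparisons as roughly independent. A rough standard-library sketch; the counts fed in are hypothetical and would come from the re-analysis described in this bug:

import math

def prob_at_least_k_significant(k: int, m: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(m, alpha): the chance of seeing k or more
    spuriously significant pre-enrollment differences out of m independent
    comparisons if branch allocation is unbiased."""
    return sum(
        math.comb(m, i) * alpha ** i * (1 - alpha) ** (m - i)
        for i in range(k, m + 1)
    )

# Hypothetical example: 3 significant hits out of 20 pre-enrollment
# comparisons at alpha = 0.005 would be very unlikely (roughly 1e-4)
# under fair allocation, pointing to a stats error or a framework bias.
print(prob_at_least_k_significant(3, 20, 0.005))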
Comment 2•5 years ago
Work for the DS team is now tracked in Jira. You can search the Data Science Jira project for the corresponding ticket.