Data science support for bootstrap process pref flip study
Categories: Data Science :: Experiment Collaboration (task)
Tracking: Not tracked
People: Reporter: RT; Assigned: wbeard
Attachments: 1 file
Reporter
Comment 3•6 years ago
Update: this is now tracking Firefox 67, given delays in the feature implementation.
Updated plan:
- User acquisition period: 1 week, starting May 20th
- Data collection period: 3 weeks, starting May 27th
- Data analysis period: 1 week, starting June 17th
Comment 4•6 years ago
Hi Romain,
When would you like someone to start making sure your approach to the probes is right? We don't want to assign the bug until someone will be actively working on it.
Reporter
Comment 5•6 years ago
(In reply to Jess Mokrzecki [:jmok] from comment #4)
> Hi Romain,
> When would you like someone to start making sure your approach to the probes is right? We don't want to assign the bug until someone will be actively working on it.
Firefox 67 is now on Nightly, so we need data science support now: we want to get the Experimenter process started and address any suggestions data science may bring while we still have the flexibility to act on them on Nightly. The main thing that comes to mind is validating that all the necessary probes are available as expected, but I'm sure other things will come up through data science review.
Comment 8•6 years ago
Mea culpa; I hadn't seen that it was already in Experimenter.
This looks good to me, but if we expect certain kinds of breakage, we might want to look for, or implement, probes that reflect that breakage more directly -- for example, maybe crash rates or counts of TLS errors, since browser usage and retention are relatively insensitive to inconveniences.
I also noticed that the old PHD described a list of populations to examine separately; it would be good to make sure that you're powered to run all of those comparisons in each of those populations, if they are important.
It would be good to double-check with Normandy engineers that the default branch will work for this pref, since it sounds like this involves early initialization -- I'm not totally sure how it works but I know that WebRender needed a user-branch pref in order to enable itself correctly.
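For what it's worth, here is a minimal, hypothetical sketch of the relevant recipe fragment, written as a Python dict. The field names (preferenceBranchType, etc.), the pref name, and the slugs are my assumptions for illustration, not copied from the actual Experimenter recipe, so verify them against the real Normandy schema:

    # Hypothetical pref-flip recipe arguments (all names below are assumptions).
    recipe_arguments = {
        "slug": "pref-flip-launcher-process",                  # hypothetical experiment slug
        "preferenceName": "browser.launcherProcess.enabled",   # hypothetical pref name
        "preferenceType": "boolean",
        "preferenceBranchType": "default",   # the question above: default vs. user branch
        "branches": [
            {"slug": "enabled",  "value": True,  "ratio": 1},
            {"slug": "disabled", "value": False, "ratio": 1},
        ],
    }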
What is the goal of the effort the experiment is supporting?
To improve DLL injection blocking by using a stub launcher process on Windows.
Is an experiment a useful next step towards this goal?
Yes; this deploys the stub launcher to a subset of Windows users.
What is the hypothesis or research question? Are the consequences for the top-level goal clear if the hypothesis is confirmed or rejected?
The stub launcher should have no negative impact on the user experience; a detectable negative effect would require diagnosis.
Which measurements will be taken, and how do they support the hypothesis and goal? Are these measurements available in the targeted release channels? Has there been data steward review of the collection?
URI count, usage hours, and retention are the proxies for the user experience. They are available and reviewed. launcherProcessState in the telemetry environment landed in 66.
A couple comments:
- You could throw in active ticks. The more the merrier; you'll pull it for core product metrics anyway, and it could help illuminate any change in usage hours.
- You could ask the data pipeline team to aggregate launcherProcessState into main_summary if you like; see the sketch below.
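To make that concrete, here is a rough sketch of the kind of per-ping pull this would enable. The column names (active_ticks, subsession_length, scalar_parent_browser_engagement_total_uri_count, the experiments map) are my assumptions about the main_summary schema, and launcher_process_state is shown only as the hypothetical column that the aggregation above would add:

    # Hypothetical per-ping query; run via spark.sql(QUERY) or the usual query tooling.
    # All column names, dates, and the experiment slug are assumptions for illustration.
    QUERY = """
    SELECT
      client_id,
      experiments['pref-flip-launcher-process'] AS branch,          -- hypothetical slug
      subsession_length / 3600.0                AS usage_hours,
      active_ticks * 5 / 3600.0                 AS active_hours,    -- each tick ~= 5 seconds
      scalar_parent_browser_engagement_total_uri_count AS uri_count,
      launcher_process_state                                        -- hypothetical aggregated column
    FROM main_summary
    WHERE submission_date_s3 BETWEEN '20190527' AND '20190617'
    """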
Is the experiment design supported by an analysis plan? Is it adequate to answer the experimental questions?
Uh, I assert "yes, and yes" based on the existence of a power analysis; the analysis seems straightforward. How do you plan to aggregate the usage metrics? Will you compute per-user sums of usage hours and URIs over the course of the experiment?
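For instance, here is a minimal pandas sketch of the per-user aggregation I have in mind, using a toy stand-in DataFrame (the column names are assumptions, not the real schema):

    import pandas as pd

    # Toy stand-in for per-ping data pulled from main_summary.
    pings = pd.DataFrame({
        "client_id":   ["a", "a", "b", "b", "b"],
        "branch":      ["enabled", "enabled", "disabled", "disabled", "disabled"],
        "usage_hours": [1.5, 0.5, 2.0, 0.25, 1.0],
        "uri_count":   [40, 10, 80, 5, 30],
    })

    # One row per client: sum usage over the whole experiment window, then
    # compare the per-user distributions between branches.
    per_user = (
        pings.groupby(["client_id", "branch"], as_index=False)
             .agg(usage_hours=("usage_hours", "sum"),
                  uri_count=("uri_count", "sum"))
    )
    print(per_user)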
Is the requested sample size supported by a power analysis that includes the core product metrics?
Yes. A 2% change in retention seems subjectively large to me; ¯\_(ツ)_/¯. If Romain's happy, I'm happy :)
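As a back-of-the-envelope illustration of what that 2% figure implies, here is a sketch of a two-proportion power calculation with statsmodels; the 60% baseline retention rate is an assumed number for illustration, not taken from the PHD:

    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    baseline  = 0.60   # assumed baseline retention rate (illustrative only)
    treatment = 0.58   # a 2% absolute drop

    effect_size = proportion_effectsize(baseline, treatment)  # Cohen's h
    n_per_branch = NormalIndPower().solve_power(
        effect_size=effect_size,
        alpha=0.05,
        power=0.80,
        ratio=1.0,
        alternative="two-sided",
    )
    print(round(n_per_branch))  # roughly 4,800 clients per branch under these assumptions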