[Growth] New Profile Definitions Metrics Comparisons Investigation
Categories
(Data Science :: Investigation, task)
Tracking
(Not tracked)
People
(Reporter: jmccrosky, Assigned: shong)
Details
Brief description of the request:
We have a number of different ways to measure new profiles or new users of Firefox. We should develop standardized metrics for these concepts.
Link to any assets:
Is there a specific data scientist you would like or someone who has helped to triage this request:
Please assign to Su for now, as he's working on a proposal related to this.
Comment 1 (Reporter) • 7 years ago
Rosanne Scholl made the excellent suggestion:
We could ask, in a Heartbeat survey targeted to new profiles, whether the user is a new user or not. That would give us an estimate of the proportion of new profiles that are new installs, new profiles in existing installs, new downloads on a computer that previously had Firefox, new computers of existing users, users returning after some time with the competition, and brand-new users who have never used Firefox before.
This shouldn't block Su's current proposal work, but we should definitely do this at some point.
Comment 2 • 7 years ago
+1
Comment 3 (Assignee) • 7 years ago
Update:
Analysis of New Profile Definitions has been completed in this notebook:
Next Steps:
- 1: Write up a non-technical summary of findings (Google Doc). Include:
  - Advantages of each definition
  - Disadvantages of each definition
  - What profiles we "lose" with each definition
  - Open questions regarding them
- 2: Do an in-depth analysis of "lost" profiles
  - Using the 7-day PCD definition, we are "missing" around 25% of the profiles that the first-appearance definition counts as new in 2018.
  - What do we "lose" by ignoring those new profiles?
    - Define the comparison subset as the new profiles by first appearance whose PCD is more than 7 days away (this should map to the difference from the 7-day PCD definition).
    - Find the contribution to the following metrics from that subset and from the profiles outside it:
      - DAU
      - Searches
      - URI
      - hours
      - active hours
- 3: Make a recommendation for an "official definition for new profiles" using the above data
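The comparison in step 2 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the record layout, the sample rows, and the helper names are all hypothetical, and it assumes "more than 7 days away" means the absolute gap between PCD and first appearance exceeds 7 days.

```python
from datetime import date

# Hypothetical rows: (client_id, profile_creation_date, first_appearance, searches).
# The values are made up for illustration only.
PROFILES = [
    ("a", date(2018, 3, 1), date(2018, 3, 1), 12),
    ("b", date(2018, 3, 1), date(2018, 3, 5), 4),
    ("c", date(2017, 11, 20), date(2018, 3, 2), 9),
    ("d", date(2018, 2, 1), date(2018, 3, 3), 0),
]

WINDOW_DAYS = 7  # the "7-day PCD" window


def is_new_by_pcd_window(pcd, first_seen, window_days=WINDOW_DAYS):
    """Assumed rule: new if first appearance is within the window of the PCD."""
    return abs((first_seen - pcd).days) <= window_days


def split_by_definition(profiles):
    """Split first-appearance profiles into those the PCD window keeps or drops."""
    included, excluded = [], []
    for row in profiles:
        _, pcd, first_seen, _ = row
        target = included if is_new_by_pcd_window(pcd, first_seen) else excluded
        target.append(row)
    return included, excluded


def metric_share(subset, everyone):
    """Fraction of the total metric (searches here) contributed by the subset."""
    total = sum(row[3] for row in everyone)
    return sum(row[3] for row in subset) / total if total else 0.0


included, excluded = split_by_definition(PROFILES)
lost_fraction = len(excluded) / len(PROFILES)
lost_search_share = metric_share(excluded, PROFILES)
print(f"lost profiles: {lost_fraction:.0%}, lost searches: {lost_search_share:.0%}")
```

The same `metric_share` calculation would be repeated for each metric in the list (DAU, URI, hours, active hours) by swapping the column being summed.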
Comment 4 (Assignee) • 6 years ago
Update: 4/26/2019
I've completed Part 2 of the investigation (next step #2 above) here:
And I've written up a non-technical summary of findings (next step #1 above), along with an official recommendation (next step #3 above), here:
Next steps:
- 1: share with Jesse
- 2: SMOOT team + data science team for review and feedback
- 3: ?
Comment 5 (Reporter) • 6 years ago
This looks great! Thanks :) Let's bring this to the team. Will discuss strategy with you offline.
Comment 6 (Assignee) • 6 years ago
Feedback: Improvements for Proposal
Contextualize when we want to use a new profile definition
We might want to include use cases for the "new profile" definition:
- a time series of profile flux
- observing differences in the behaviors of new-profile cohorts who started using Firefox at different times
- targeting profiles for experiments
- classifying profiles who appear in an experiment
Be clear about what use cases this covers, and what it doesn't. Note that this won't cover targeting profiles in experiments.
Thought: maybe include this in appendix?
Background info on PCD (and First Use Date)
Add descriptions of what PCD and firstUseDate actually physically measure. Include summary and documentation for known issues with PCD. Include any analysis (dependent on this bug) comparing the two.
Add analysis comparison for more new profile definitions
- PCD == 1 (no reset)
- receive a new-profile ping
Be explicit about why these are being rejected
Footnotes: Explaining PCD Windowing and Technical/Performance Comparison
Add a footnote on why we're defining PCD-based definitions with a window.
Add a footnote on the technical / performance search-space tradeoffs for each definition, or put it in its own section.
Additional Analysis between included and excluded PCD_7 Group
To strengthen the case that these populations are similar and their usage patterns overlap, maybe include some comparisons of the distributions of different metrics across the groups.
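One possible way to make such a comparison concrete is a two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs. This is only a sketch of one option, not something the proposal specifies: the function names and the sample active-hours values below are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values less than or equal to x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs (0 = identical)."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


# Hypothetical per-profile active hours for the two groups.
included_hours = [0.5, 1.0, 1.5, 2.0, 4.0]
excluded_hours = [0.4, 1.1, 1.6, 2.2, 3.9]

d = ks_statistic(included_hours, excluded_hours)
print(f"KS statistic for active hours: {d:.2f}")
```

A value near 0 would support the claim that the included and excluded PCD_7 groups behave similarly on that metric; a value near 1 would indicate the distributions barely overlap.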
[Important!] Investigate client_id rolling
Tim brought this up: apparently, when a profile turns Telemetry on and off, its client_id gets reset. This could have a major impact on the interpretation of the proposed definition (FA).
References:
- Telemetry scalar for Telemetry re-enable
- Fix "optout" ping
- Check comment 3 by Chutten
- code which resets subsession and profile subsession counter
- Look into opt-out pings, and any other data around this behavior
Actions:
- Test behavior locally
Comment 7 (Assignee) • 6 years ago
I'll be on PTO / leave for the next 5 weeks, so feel free to reassign if this becomes a priority. Otherwise, I'll pick this up again when I'm back / have bandwidth.
Comment 8 • 5 years ago
Work for the DS team is now tracked in Jira. You can search within the Data Science Jira project for the corresponding ticket.