Closed Bug 1539762 Opened 7 years ago Closed 5 years ago

[Growth] New Profile Definitions Metrics Comparisons Investigation

Categories

(Data Science :: Investigation, task)

x86_64
Unspecified
task
Not set
normal
Points:
3

Tracking

(Not tracked)

RESOLVED INACTIVE

People

(Reporter: jmccrosky, Assigned: shong)

Details

Brief description of the request:

We have a number of different ways to measure new profiles or new users of Firefox. We should develop standardized metrics for these concepts.

Link to any assets:

Is there a specific data scientist you would like or someone who has helped to triage this request:

Please assign to Su for now, as he's working on a proposal related to this.

Rosanne Scholl made the excellent suggestion:

We could ask users whether they are new to Firefox, in a Heartbeat survey targeted at new profiles. That would give us an estimate of the proportion of new profiles that are new installs, new profiles in existing installs, new downloads on a computer that previously had fx, new computers of existing users, users returning after some time with the competition, and brand-new, never-before fx users.

This shouldn't block Su's current proposal work, but we should definitely do this at some point.

Points: --- → 3

+1

Assignee: nobody → shong
Summary: Investigate new user metrics → [Growth] New Profile Definitions Metrics Comparisons Investigation

Update:

Analysis of New Profile Definitions has been completed in this notebook:

Next Steps:

  • 1: Write up non-technical summary of findings (google doc). Include:

    • Advantages of each definition
    • Disadvantages of each definition
    • What profiles we "lose" with each definition.
    • Open questions regarding them
  • 2: Do in depth analysis of "lost" profiles

    • Using the 7 day PCD definition, we are "missing" around 25% of the profiles that the first appearance definition counts as new in 2018.
    • What do we "lose" by ignoring those new profiles?
      • define the comparison subset as the new profiles by first appearance whose PCD is more than 7 days away (this should map to the difference from the 7 day PCD definition)
      • find the contribution to the following metrics from that subset and from the remaining new profiles
        • DAU
        • Searches
        • URI
        • hours
        • active hours
  • 3: Make a recommendation of "official definition for new profiles" using the above data
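
A minimal sketch of the step 2 comparison, in plain Python. It assumes one record per profile holding a profile creation date and a first-appearance date; the field names (`pcd`, `first_seen`, `searches`), the sample values, and the helper names are all hypothetical and are not taken from the notebook. It also assumes "more than 7 days away" means an absolute gap between PCD and first appearance, which may not match the definition actually used in the analysis.

```python
from datetime import date

# Hypothetical per-profile records; the field names and values are illustrative only.
profiles = [
    {"client_id": "a", "pcd": date(2018, 3, 1), "first_seen": date(2018, 3, 1), "searches": 10},
    {"client_id": "b", "pcd": date(2018, 3, 1), "first_seen": date(2018, 3, 5), "searches": 20},
    {"client_id": "c", "pcd": date(2018, 5, 10), "first_seen": date(2018, 5, 17), "searches": 30},
    {"client_id": "d", "pcd": date(2017, 1, 15), "first_seen": date(2018, 6, 1), "searches": 40},
]


def within_pcd_window(profile, window_days=7):
    """True if the first appearance is within window_days of the PCD (assumed absolute gap)."""
    gap_days = abs((profile["first_seen"] - profile["pcd"]).days)
    return gap_days <= window_days


def split_by_pcd_window(records, window_days=7):
    """Split first-appearance profiles into those the PCD window keeps and those it drops."""
    included = [p for p in records if within_pcd_window(p, window_days)]
    excluded = [p for p in records if not within_pcd_window(p, window_days)]
    return included, excluded


def metric_share(subset, everyone, metric):
    """Fraction of a metric's total that comes from the given subset."""
    total = sum(p[metric] for p in everyone)
    return sum(p[metric] for p in subset) / total if total else 0.0


included, excluded = split_by_pcd_window(profiles)
print("share of profiles excluded:", len(excluded) / len(profiles))
print("share of searches excluded:", metric_share(excluded, profiles, "searches"))
```

The same `metric_share` call could be repeated for the other metrics listed in step 2 (DAU, URI, hours, active hours), given matching per-profile fields.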

Status: NEW → ASSIGNED

Update: 4/26/2019

I've completed Part2 of the investigation (next step #2 above) here:

And I've written up a non-technical summary of findings (next step #1 above) along with official recommendation (step #3 above) here:

Next steps:

  • 1: share with Jesse
  • 2: SMOOT team + data science team for review and feedback
  • 3: ?

This looks great! Thanks :) Let's bring this to the team. Will discuss strategy with you offline.

Feedback: Improvements for Proposal

Contextualize when we want to use a new profile definition

Comment

We might want to include use cases for "new profile" definition

  • a time series of profile flux
  • observing differences in the behaviors of new-profile cohorts who started using Firefox at different times
  • targeting profiles for experiments
  • classifying profiles who appear in an experiment

Be clear about what use cases this covers, and what it doesn't. Note that this won't cover targeting profiles in experiments.

Thought: maybe include this in appendix?

Background info on PCD (and First Use Date)

Comment

Add descriptions of what PCD and firstUseDate actually physically measure. Include summary and documentation for known issues with PCD. Include any analysis (dependent on this bug) comparing the two.

Add analysis comparison for more new profile definitions

Comment1

  • PCD == 1 (no reset)
  • receive new profile ping

Be explicit about why these are being rejected

Footnotes: Explaining PCD Windowing and Technical/Performance Comparison

Add footnote why we're defining PCD based definitions with a window.

Add footnote on technical / performance search-space tradeoffs for each definition. Or put it in its own section.

Additional Analysis between included and excluded PCD_7 Group

To strengthen the case that these populations are similar and their usage patterns overlap, maybe include some comparisons of distributions for the different groups (for different metrics).
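
One possible way to compare the distributions is a two-sample Kolmogorov-Smirnov statistic, the largest gap between the two groups' empirical CDFs. This is only a sketch: the feedback above does not name a method, and the function names and sample values below are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # The gap can only change at observed values, so checking those is sufficient.
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Hypothetical per-profile search counts for the included and excluded PCD_7 groups.
included_searches = [1, 2, 3, 4]
excluded_searches = [3, 4, 5, 6]
print(ks_statistic(included_searches, excluded_searches))
```

A small statistic for each metric would support the claim that the two groups behave similarly. A significance test or a plot of the two CDFs would still be needed to judge how much weight to give it.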

[Important!] Investigate client_id rolling

Tim brought this up: apparently, when a profile turns Telemetry off and back on, its client_id gets reset. This could have a major impact on the interpretation of the proposed definition (FA).

References:

Actions:

  • Test behavior locally

I'll be on PTO / leave for the next 5 weeks, so feel free to reassign if this becomes a priority. Otherwise, I'll pick this up again when I'm back / have bandwidth.

Work for the DS team is now tracked in Jira. You can search within the Data Science Jira project for the corresponding ticket.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → INACTIVE