The ideal source of first session profiles would be a shutdown ping sender. This would affect current engagement metrics, and is therefore somewhat risky. The new profile ping captures new profiles created during their first session, and is completely separate from the main ping. Some preliminary analysis has been done to validate that new profile pings behave as expected on the beta channel (bug 1386609). The ping can be added to the churn dataset in order to capture first session profiles. However, there are a few problems that make this less than straightforward:
* The new profile ping's impact has not been measured on release (bug 1381487).
* The new profile ping does not capture the entire set of usage metrics (subsession length, URI counts) because it is sent within the first 30 minutes of browsing.
* The current ETL code does not support reusable data cleaning for a separate data source.
* Historical data will be very different from data going forward.
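To illustrate what reusable data cleaning across two sources might look like, here is a minimal sketch that assumes both pings are first flattened into a common record shape. The field names (`client_id`, `channel`, `subsession_length`, `uri_count`) and the helpers `clean_record`, `from_main_ping`, and `from_new_profile_ping` are hypothetical and do not reflect the actual churn ETL schema. Since the new profile ping is sent within the first 30 minutes, it has no subsession length or URI count, so those stay `None` (unknown) rather than being reported as zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChurnRecord:
    # Hypothetical common schema shared by both data sources.
    client_id: str
    channel: str
    subsession_length: Optional[int]  # seconds; unknown for new profile pings
    uri_count: Optional[int]          # unknown for new profile pings


def _non_negative(value: Optional[int]) -> Optional[int]:
    # Treat missing or negative usage values as unknown.
    return value if value is not None and value >= 0 else None


def clean_record(client_id: str, channel: str,
                 subsession_length: Optional[int] = None,
                 uri_count: Optional[int] = None) -> ChurnRecord:
    """Shared cleaning step applied regardless of which ping the row came from."""
    return ChurnRecord(
        client_id=client_id,
        channel=channel.lower(),
        subsession_length=_non_negative(subsession_length),
        uri_count=_non_negative(uri_count),
    )


def from_main_ping(ping: dict) -> ChurnRecord:
    # Hypothetical flattened main ping; the real payload is nested.
    return clean_record(ping["client_id"], ping["channel"],
                        ping.get("subsession_length"), ping.get("uri_count"))


def from_new_profile_ping(ping: dict) -> ChurnRecord:
    # The new profile ping carries no usage totals, so they remain unknown.
    return clean_record(ping["client_id"], ping["channel"])
```

Keeping the cleaning logic in one function means both sources get identical normalization, and only the per-source adapters need to know about payload differences.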
Flyby comment: do we need to care about "bot" profiles here? Do we need to filter them out?
I don't think we should or can filter out "bot" profiles, but instead have an easy way to query the data that is as close to the truth as possible. Filtering down by things like attribution should remove a significant amount of noise relative to the entire release population.
Assignee: nobody → amiyaguchi
Points: 3 → 2
Priority: -- → P1
This change will result in a version bump of the dataset because it significantly changes the data. However, the set of fields is not going to change much, aside from the addition of an "is_new_profile" field. It should be feasible to join the two datasets together.
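One possible way to combine main-ping rows with new-profile-ping rows while adding the "is_new_profile" flag is sketched below. It assumes, hypothetically, that both sides are keyed by a `client_id` column and that a client seen in the new profile ping should be flagged as new even if it also appears in main-ping data. `combine_sources` is an illustrative helper, not part of the existing ETL code. Note that `False` here only means "no new profile ping was seen", which would be misleading for historical data collected before the ping existed.

```python
from typing import Dict, Iterable, List


def combine_sources(main_rows: Iterable[dict],
                    new_profile_rows: Iterable[dict]) -> List[dict]:
    """Merge two row sources on a hypothetical `client_id` key.

    Main-ping rows keep their usage fields; clients that also (or only)
    appear in the new profile ping get is_new_profile=True.
    """
    combined: Dict[str, dict] = {}
    for row in main_rows:
        combined[row["client_id"]] = {**row, "is_new_profile": False}
    for row in new_profile_rows:
        existing = combined.get(row["client_id"])
        if existing is not None:
            # Client present in both sources: keep main-ping data, set the flag.
            existing["is_new_profile"] = True
        else:
            # First-session-only client: the only record is the new profile ping.
            combined[row["client_id"]] = {**row, "is_new_profile": True}
    # Sort by client_id for deterministic output.
    return [combined[key] for key in sorted(combined)]


rows = combine_sources(
    [{"client_id": "a", "uri_count": 10}, {"client_id": "c", "uri_count": 3}],
    [{"client_id": "a"}, {"client_id": "b"}],
)
```

Here client `a` keeps its URI count and is flagged as new, `b` exists only through the new profile ping, and `c` is flagged `False`.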
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED