Closed Bug 1557811 Opened 4 years ago Closed 3 years ago

Add startup.profile_selection_reason to new profile ping to enable better identification of new installations

Categories

(Data Platform and Tools :: General, enhancement)

Desktop
All
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gkabbz, Unassigned)

References

Details

Attachments

(1 file, 2 obsolete files)

In main_summary there's a probe, startup.profile_selection_reason, that shows the reason a profile was selected at startup. This same probe would be extremely valuable in the new-profile table as well, since it would let us segment new-profile pings by the reason a profile was created. That would allow us to differentiate new profiles resulting from a freshly installed version from those created for other reasons, giving us a clearer picture of our acquisition funnel and its conversion rates.

The bug that added this telemetry to main_summary can be found here: https://bugzilla.mozilla.org/show_bug.cgi?id=1522934

We can't easily add a field from one ping (main) to the summary table for another ping (new-profile), but this is something we could potentially add to the clients_daily or clients_last_seen datasets. Would that help?

Flags: needinfo?(gkaberere)

Perhaps the simplest solution is just to have a table that accumulates a list of profiles that were created due to profile-per-install. Then this can easily be joined with whatever data source a person is using.
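The accumulating-table idea above can be sketched in miniature. This is illustrative only: the set of profile-per-install client ids would in practice be accumulated from main_summary over time, and all names and values here are made up.

```python
# Hypothetical accumulated set of client_ids whose profiles came from
# profile-per-install (in practice built incrementally from main_summary).
ppi_clients = {"client-a", "client-c"}

# Stand-in for rows from whatever dataset a person is using.
new_profile_rows = [
    {"client_id": "client-a"},
    {"client_id": "client-b"},
]

# A left join in miniature: tag each row with whether it is a PPI profile.
tagged = [dict(row, ppi=row["client_id"] in ppi_clients) for row in new_profile_rows]
print(tagged[0]["ppi"], tagged[1]["ppi"])  # True False
```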

Note this is also being discussed here: https://github.com/mozilla/bigquery-etl/issues/212

I get the challenge we'd have adding it to the summary table. My question is more about what we would need to do to get profile_selection_reason sent as part of the new-profile ping itself. I think incorporating it there would be the cleanest long-term solution, as it would:

  • Simplify querying for new profiles. We could simply select the specific profile selection reasons we want to see.
  • Reduce the impact of retention on acquisition / new-profiles reporting. We don't always see all our new profiles send a first-shutdown or main ping, so filtering on those pings means we may still be overstating our new profiles / installs.
  • Allow us to segment the new-profiles data better. That dimension can identify the reason a new ping came about, i.e. we could separate new profiles from an install / first run from new profiles created by other activities, e.g. a user-initiated command-line argument. This would be really useful from an acquisition / retention perspective, as it would give us a much crisper acquisition number.
  • Reduce the need for join operations and for scanning main_summary and first_shutdown_summary for specific clients, keeping query costs low.
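To make the first two bullets concrete: if the selection reason rode along in the new-profile ping, segmenting would be a plain filter instead of a join. The record shape below is an assumption, not the actual ping schema; the two reason strings are the ones used in the main_summary query later in this bug.

```python
# Hand-built stand-in records for new-profile pings (not real data).
pings = [
    {"client_id": "a", "reason": "firstrun-skipped-default"},
    {"client_id": "b", "reason": "restart-skipped-default"},
    {"client_id": "c", "reason": "other"},
]

# Reasons taken to indicate a profile created by profile-per-install.
PPI_REASONS = {"firstrun-skipped-default", "restart-skipped-default"}

# With the field in the ping itself, "new installs" is a one-line filter.
new_installs = [p for p in pings if p["reason"] in PPI_REASONS]
print(len(new_installs))  # 2
```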
Flags: needinfo?(gkaberere) → needinfo?(mreid)

As a data point for future reference: my query on main_summary below found 99.96% of the profiles found when also looking at first_shutdown_summary:

from pyspark.sql.functions import col, lit

ppi_profiles = spark_instance.table("main_summary").select(
    col("client_id").alias("id"),
    lit(1).alias("ppi")
).filter(
    """submission_date_s3 >= '20190121'
    AND scalar_parent_startup_profile_selection_reason IN (
        'firstrun-skipped-default', 'restart-skipped-default'
    )"""
)

Jan-Erik, do you know what would be involved in adding this info to the new-profile ping?

Flags: needinfo?(mreid) → needinfo?(jrediger)

Thanks to the multi-store this should be rather simple:

  1. Add a record_into_store field to the definition file (Scalars.yaml) (don't forget to keep it in the main store as well)
  2. In sendNewProfilePing fetch the data from the new-profile store and merge it into the payload [^1]
  3. Change the pipeline schema to optionally accept scalar data

[^1]: We might want to always clear out the new-profile store, so that we don't keep data around that will never be used when no new-profile ping is needed.
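Step 1 above might look roughly like this in Scalars.yaml. This is a sketch, not the actual patch: only the added record_into_store field is shown, and the surrounding fields of the existing scalar definition are elided.

```yaml
startup:
  profile_selection_reason:
    # ...existing fields (description, kind, expires, etc.) unchanged...
    record_into_store:
      - 'main'
      - 'new-profile'
```

With the main store kept in the list, the scalar continues to appear in main pings as before.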

I guess technically the Telemetry team owns the new-profile ping, so we could work on / help with the implementation if needed.

Flags: needinfo?(jrediger)

Thanks, Jan-Erik. Sounds like we have a possible path forward. Is this a change that can make it into Fx69 for the September 3rd release? If so, how can we get it prioritized, and what are the next steps? We are going to have to set up Skyline reporting, and this change would make that reporting a lot more accurate.

Flags: needinfo?(mreid)
Flags: needinfo?(jrediger)

I'll file a bug in our client component to track the implementation.
I should be able to prioritize implementation for early next week. Once that lands and we get at least some data in, we would need to request that the patch be uplifted into Fx69 beta. We should be fine if we get that done in the next week.

Flags: needinfo?(jrediger)
Depends on: 1570652
Attached file GitHub Pull Request
Assignee: nobody → jrediger

(Of course I managed to attach the m-c changes to the wrong bug.)

Assignee: jrediger → nobody
Attachment #9082946 - Attachment is obsolete: true
Attachment #9082945 - Attachment is obsolete: true

(In reply to Jan-Erik Rediger [:janerik] from comment #6)

  3. Change the pipeline schema to optionally accept scalar data

Jan-Erik took care of this over in a PR and a follow-up in the schemas repo.

I think we're good to resolve this bug now; I was able to query the data using:

SELECT
  json_extract(additional_properties, '$.payload.processes.parent.scalars[\'startup.profile_selection_reason\']') AS reason,
  count(*)
FROM
  telemetry_live.new_profile_v4
WHERE
  DATE(submission_timestamp) = '2019-08-25'
  AND additional_properties LIKE '%"scalars":%'
GROUP BY
  reason
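The json_extract in the query above can be replayed in plain Python to show what it pulls out of the additional_properties column. The payload below is a hand-built stand-in for a real ping, not actual data.

```python
import json

# Stand-in for one row's additional_properties column (illustrative only).
additional_properties = json.dumps({
    "payload": {"processes": {"parent": {"scalars": {
        "startup.profile_selection_reason": "firstrun-skipped-default"
    }}}}
})

# Equivalent of json_extract(..., '$.payload.processes.parent.scalars[...]').
scalars = json.loads(additional_properties)["payload"]["processes"]["parent"]["scalars"]
reason = scalars["startup.profile_selection_reason"]
print(reason)  # firstrun-skipped-default
```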
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(mreid)
Resolution: --- → FIXED

Just to make you aware: the code change landed in Firefox 70. Trees go into soft code freeze today, meaning I can no longer uplift this to 69 (currently beta, going to release next week).

Flags: needinfo?(gkaberere)
Flags: needinfo?(gkaberere)
Component: Datasets: General → General