Closed Bug 1634468 Opened 5 years ago Closed 5 years ago

Proposal for adding a new Glean metric type JweMetricType (e.g. ecosystem_anon_id)

Product/Component: Data Platform and Tools Graveyard :: Glean Metric Types
Type: enhancement
Priority: P1
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: klukas
Assignee: Unassigned
Attachments: 1 file

Proposal for changing an existing or adding a new Glean metric type

Who is the individual/team requesting this change?

Jeff Klukas, Data Platform Team, as a representative of the cross-discipline Account Ecosystem Telemetry working group.

Is this about changing an existing metric type or creating a new one?

Creating a new metric type

Can you describe the data that needs to be recorded?

For Account Ecosystem Telemetry, we need clients to be able to send "account ecosystem" pings with various metrics. The defining feature of an account ecosystem ping is that it includes a sensitive ecosystem_anon_id value that must be decrypted by the data pipeline and replaced with the decrypted ecosystem_user_id value before the ping is uploaded to BigQuery.

Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)?

The client receives or generates an ecosystem_anon_id value during the FxA login flow. The ecosystem_anon_id value is a JOSE JWE object in compact serialization form, consisting of several hundred characters. It carries the ciphertext to be decrypted along with metadata describing the public key used to encrypt, the encryption type, etc. An example value with whitespace added:

eyJhbGciOiJFQ0RILUVTIiwia2lkIjoiMFZFRTdmT0txbFdHVGZrY0taRUJ2W
Wl3dkpMYTRUUGlJVGxXMGJOcDdqVSIsImVwayI6eyJrdHkiOiJFQyIsImNydi
I6IlAtMjU2IiwieCI6InY3Q1FlRWtVQjMwUGwxV0tPMUZUZ25OQlNQdlFyNlh
0UnZxT2kzSWdzNHciLCJ5IjoiNDBKVEpaQlMwOXpWNHpxb0hHZDI5NGFDeHRq
cGU5a09reGhELVctUEZsSSJ9LCJlbmMiOiJBMjU2R0NNIn0.
.
A_wzJya943vlHKFH.
yq0JhkGZiZd6UiZK6goTcEf6i4gbbBeXxvq8QV5_nC4.
Knl_sYSBrrP-aa54z6B6gA
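
For readers unfamiliar with the compact serialization, here is a minimal Python sketch that splits the sample above into its five dot-separated segments and decodes the protected header. It only inspects the structure and does not decrypt anything. The names SAMPLE, b64url_decode and parse_compact_jwe are made up for this illustration and are not part of Glean or the pipeline.

```python
import base64
import json

# The sample value from this bug, with the added whitespace left in place.
SAMPLE = """
eyJhbGciOiJFQ0RILUVTIiwia2lkIjoiMFZFRTdmT0txbFdHVGZrY0taRUJ2W
Wl3dkpMYTRUUGlJVGxXMGJOcDdqVSIsImVwayI6eyJrdHkiOiJFQyIsImNydi
I6IlAtMjU2IiwieCI6InY3Q1FlRWtVQjMwUGwxV0tPMUZUZ25OQlNQdlFyNlh
0UnZxT2kzSWdzNHciLCJ5IjoiNDBKVEpaQlMwOXpWNHpxb0hHZDI5NGFDeHRq
cGU5a09reGhELVctUEZsSSJ9LCJlbmMiOiJBMjU2R0NNIn0.
.
A_wzJya943vlHKFH.
yq0JhkGZiZd6UiZK6goTcEf6i4gbbBeXxvq8QV5_nC4.
Knl_sYSBrrP-aa54z6B6gA
"""


def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url, as used by JOSE."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def parse_compact_jwe(value: str) -> dict:
    """Split a compact JWE into its five segments (hypothetical helper)."""
    compact = "".join(value.split())  # drop the whitespace added for display
    parts = compact.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 segments, got {len(parts)}")
    header, encrypted_key, iv, ciphertext, tag = parts
    return {
        "header": json.loads(b64url_decode(header)),
        "encrypted_key": b64url_decode(encrypted_key),
        "iv": b64url_decode(iv),
        "ciphertext": b64url_decode(ciphertext),
        "tag": b64url_decode(tag),
    }


if __name__ == "__main__":
    parsed = parse_compact_jwe(SAMPLE)
    print(parsed["header"]["alg"], parsed["header"]["enc"])
```

The second segment (the encrypted key) is empty in this sample, which is why two dots appear back to back once the whitespace is removed.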

What is the business question/use-case that requires the data to be recorded?

The overall business case for Account Ecosystem Telemetry is nuanced. See the Product Ecosystem Metrics Proposal.

How would the data be consumed?

The ecosystem_user_id values present in BigQuery would allow us to correlate usage of a single FxA user across multiple products without knowing who that user is. Various aggregations would be performed across products, grouping by the shared ecosystem_user_id values.

Why are existing metric types not enough?

There are several details here that bump up against limitations of the current Glean SDK.

First, ecosystem_anon_id values are larger than the current limits of 100 characters for a string metric or 50 characters for a string list metric.

Second, this value is subject to a new preprocessing step that has not previously existed in the pipeline. The client would be sending a metric with one name, and the pipeline would remove that field, decrypt it, and place a new ecosystem_user_id field into the ping before sending it to the normal Decoder step where schema validation, etc. is performed.
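
To make the shape of that preprocessing step concrete, here is a rough Python sketch. It is not the pipeline's actual code: the flat dict payload, the function name replace_anon_id and the injected decrypt callable are all assumptions for illustration. Real decryption would need the private key and a JWE library, so it is passed in here rather than implemented.

```python
from typing import Callable

# Field names taken from this proposal.
ANON_ID_FIELD = "ecosystem_anon_id"
USER_ID_FIELD = "ecosystem_user_id"


def replace_anon_id(payload: dict, decrypt: Callable[[str], str]) -> dict:
    """Return a copy of the payload with the anon id swapped for the user id.

    Hypothetical helper: `payload` is assumed to be a flat dict and
    `decrypt` stands in for whatever JWE decryption the pipeline uses.
    """
    if ANON_ID_FIELD not in payload:
        raise KeyError(f"payload has no {ANON_ID_FIELD} field")
    result = dict(payload)  # leave the caller's payload untouched
    anon_id = result.pop(ANON_ID_FIELD)  # remove the encrypted field
    result[USER_ID_FIELD] = decrypt(anon_id)  # add the decrypted field
    return result


if __name__ == "__main__":
    # Stand-in decrypt function for demonstration only; it does no crypto.
    fake_decrypt = lambda jwe: "user-id-for-" + jwe
    ping = {"ecosystem_anon_id": "abc", "some_metric": 1}
    print(replace_anon_id(ping, fake_decrypt))
```

Only the output of this step, carrying ecosystem_user_id and no longer carrying ecosystem_anon_id, would reach the Decoder and its schema validation.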

What is the timeline by which the data needs to be collected?

We are targeting desktop first for Account Ecosystem Telemetry, but want to start testing Glean apps as early as Q3 2020.

See some relevant discussion with :chutten in the #account-ecosystem-telemetry channel on Slack that led me to cast this as a new metric proposal.

May be relevant for the design phase... :klukas, is this id something that will need to be included in "deletion-request" pings?

Flags: needinfo?(jklukas)

(In reply to Chris H-C :chutten from comment #2)

May be relevant for the design phase... :klukas, is this id something that will need to be included in "deletion-request" pings?

My understanding is that we don't need to include this identifier in the deletion request ping when the client opts out of telemetry. We instead have to send this identifier from FxA when an FxA user requests data deletion at the account level. This is based on :chutten's own comments in :rfkelly's Google doc about comparing active user definitions:

Had a conversation in the Shredder meeting and the ruling is: On Telemetry opt-out we request the deletion of client-based data only.

Group: mozilla-employee-confidential
Flags: needinfo?(jklukas)

Hey Jeff, why is this employee-confidential?

Flags: needinfo?(jklukas)

(In reply to Alessio Placitelli [:Dexter] from comment #4)

Hey Jeff, why is this employee-confidential?

I was probably being overly cautious due to the discussion of data deletion policy. I want to be careful not to make any explicit statements about compliance with any legislation, but I suppose it's fine to discuss data deletion policies publicly outside the context of compliance.

Flags: needinfo?(jklukas)

Ah, yes, "the ruling is". That's me being verbose and broad about technical guidelines, not relaying legislative/legal opinion from people who actually know those things. We're clear, but thank you for your sensitivity.

Shows what I get when I ask a question I should've already known the answer to : )

Group: mozilla-employee-confidential
Blocks: aet-pipeline
Priority: -- → P1

This is the discussion document for this proposal.

Mike, can you designate the group of people who should be working on the initial design for this?

Flags: needinfo?(mdroettboom)

:chutten, :dexter : Are you available for the design work on this one?

Flags: needinfo?(mdroettboom)
Flags: needinfo?(chutten)
Flags: needinfo?(alessio.placitelli)

I can make time.

Flags: needinfo?(chutten)

(In reply to Michael Droettboom [:mdroettboom] from comment #8)

:chutten, :dexter : Are you available for the design work on this one?

Yup, we'll take care of that

Flags: needinfo?(alessio.placitelli)
Summary: Proposal for adding a new metric type for ecosystem_anon_id → Proposal for adding a new Glean metric type for ecosystem_anon_id

Hey Mike,

Chris and I finalized the work on this proposal. I believe this is good to move to the comment stage and have others chime in. What do you think?

I'll flag others if that's the case.

Flags: needinfo?(mdroettboom)

I've moved the document to the comment phase. I had a couple of minor nits in the document that I don't think need to hold anything up.

Flags: needinfo?(mdroettboom)

Hey folks,

the document moved from design to comment stage. It is ready for one final look. Final feedback due by June 29th, 2020.

If that looks good to you, please sign off at the top of the document.

Flags: needinfo?(msamuel)
Flags: needinfo?(fbertsch)
Flags: needinfo?(brizental)
Attached file proposal

Hi Teon,

we need data-steward review for the attached proposal. Please check the related Data-Steward section at the top of the document. More information about this process here.

Attachment #9158243 - Flags: data-review?(teon)
Summary: Proposal for adding a new Glean metric type for ecosystem_anon_id → Proposal for adding a new Glean metric type JWEMetricType (e.g. ecosystem_anon_id)
Flags: needinfo?(brizental)
Summary: Proposal for adding a new Glean metric type JWEMetricType (e.g. ecosystem_anon_id) → Proposal for adding a new Glean metric type JweMetricType (e.g. ecosystem_anon_id)
See Also: → 1648208
Flags: needinfo?(fbertsch)

Hey Mike,

looks like the majority of folks signed off on the proposal. Is now the time to make a call on this (when you're back)?

Flags: needinfo?(mdroettboom)

Approved. Bug 1650787 is opened to track the implementation.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(mdroettboom)
Resolution: --- → FIXED

Hey :Dexter, I'm very sorry but I'm currently overloaded with work. Also, I am away at a virtual conference this week. Would it be possible to see if someone in #data-stewards on Matrix might be able to pick this up?

Flags: needinfo?(alessio.placitelli)

(In reply to Teon Brooks [:teon] from comment #17)

Hey :Dexter, I'm very sorry but I'm currently overloaded with work. Also, I am away at a virtual conference this week. Would it be possible to see if someone in #data-stewards on Matrix might be able to pick this up?

Hey Teon, no worries, this got approved already by Mike (see comment 16). No need to review anymore!

Flags: needinfo?(alessio.placitelli)
Flags: needinfo?(msamuel)
Attachment #9158243 - Flags: data-review?(teon)
Product: Data Platform and Tools → Data Platform and Tools Graveyard