Proposal for adding a new Glean metric type JweMetricType (e.g. ecosystem_anon_id)
Categories
(Data Platform and Tools Graveyard :: Glean Metric Types, enhancement, P1)
Tracking
(Not tracked)
People
(Reporter: klukas, Unassigned)
Proposal for changing an existing or adding a new Glean metric type
Who is the individual/team requesting this change?
Jeff Klukas, Data Platform Team as a representative of the cross-discipline Account Ecosystem Telemetry working group.
Is this about changing an existing metric type or creating a new one?
Creating a new metric type
Can you describe the data that needs to be recorded?
For Account Ecosystem Telemetry, we need clients to be able to send "account ecosystem" pings with various metrics. The defining feature of an account ecosystem ping is that it includes a sensitive ecosystem_anon_id value that must be decrypted by the data pipeline and replaced with the decrypted ecosystem_user_id value before the ping is uploaded to BigQuery.
Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)?
The client receives or generates an ecosystem_anon_id value during the FxA login flow. The ecosystem_anon_id value is a JOSE JWE object in compact serialization form, consisting of several hundred characters. That value includes the value to be decrypted along with various metadata describing the public key used to encrypt, encryption type, etc. An example value with whitespace added:
eyJhbGciOiJFQ0RILUVTIiwia2lkIjoiMFZFRTdmT0txbFdHVGZrY0taRUJ2W
Wl3dkpMYTRUUGlJVGxXMGJOcDdqVSIsImVwayI6eyJrdHkiOiJFQyIsImNydi
I6IlAtMjU2IiwieCI6InY3Q1FlRWtVQjMwUGwxV0tPMUZUZ25OQlNQdlFyNlh
0UnZxT2kzSWdzNHciLCJ5IjoiNDBKVEpaQlMwOXpWNHpxb0hHZDI5NGFDeHRq
cGU5a09reGhELVctUEZsSSJ9LCJlbmMiOiJBMjU2R0NNIn0.
.
A_wzJya943vlHKFH.
yq0JhkGZiZd6UiZK6goTcEf6i4gbbBeXxvq8QV5_nC4.
Knl_sYSBrrP-aa54z6B6gA
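To make the structure of the example value concrete, here is a minimal Python sketch that splits a compact-serialization JWE into its five dot-separated segments and decodes the protected header. It only inspects the unencrypted metadata and does not attempt decryption. The function names (`b64url_decode`, `split_compact_jwe`, `protected_header`) are made up for illustration and are not part of Glean or the pipeline.

```python
import base64
import json

# Example value from this proposal, with the added whitespace removed.
SAMPLE_ECOSYSTEM_ANON_ID = (
    "eyJhbGciOiJFQ0RILUVTIiwia2lkIjoiMFZFRTdmT0txbFdHVGZrY0taRUJ2W"
    "Wl3dkpMYTRUUGlJVGxXMGJOcDdqVSIsImVwayI6eyJrdHkiOiJFQyIsImNydi"
    "I6IlAtMjU2IiwieCI6InY3Q1FlRWtVQjMwUGwxV0tPMUZUZ25OQlNQdlFyNlh"
    "0UnZxT2kzSWdzNHciLCJ5IjoiNDBKVEpaQlMwOXpWNHpxb0hHZDI5NGFDeHRq"
    "cGU5a09reGhELVctUEZsSSJ9LCJlbmMiOiJBMjU2R0NNIn0."
    "."
    "A_wzJya943vlHKFH."
    "yq0JhkGZiZd6UiZK6goTcEf6i4gbbBeXxvq8QV5_nC4."
    "Knl_sYSBrrP-aa54z6B6gA"
)

# Segment names in the order defined for JWE compact serialization.
SEGMENT_NAMES = ("protected", "encrypted_key", "iv", "ciphertext", "tag")


def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url text by restoring the '=' padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def split_compact_jwe(value: str) -> dict:
    """Split a compact JWE into its five named segments."""
    parts = value.split(".")
    if len(parts) != len(SEGMENT_NAMES):
        raise ValueError(f"expected 5 segments, got {len(parts)}")
    return dict(zip(SEGMENT_NAMES, parts))


def protected_header(value: str) -> dict:
    """Return the JSON protected header (key id, algorithms, etc.)."""
    return json.loads(b64url_decode(split_compact_jwe(value)["protected"]))


segments = split_compact_jwe(SAMPLE_ECOSYSTEM_ANON_ID)
header = protected_header(SAMPLE_ECOSYSTEM_ANON_ID)
print(header["alg"], header["enc"], len(SAMPLE_ECOSYSTEM_ANON_ID))
```

For the sample above, the second segment is empty, which is why two dots appear back to back in the value, and the header carries the `alg`, `kid`, `epk` and `enc` fields describing how the payload was encrypted.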
What is the business question/use-case that requires the data to be recorded?
The overall business case for Account Ecosystem Telemetry is nuanced. See the Product Ecosystem Metrics Proposal.
How would the data be consumed?
The ecosystem_user_id values present in BigQuery would allow us to correlate usage of a single FxA user across multiple products without knowing who that user is. Various aggregations would be performed across products, grouping by the shared ecosystem_anon_id values.
Why are existing metric types not enough?
There are several details here that bump up against limitations of the current Glean SDK.
First, ecosystem_anon_id values are larger than the current limits of 100 characters for a string metric or 50 characters for a string list metric.
Second, this value is subject to a new preprocessing step that has not previously existed in the pipeline. The client would be sending a metric with one name, and the pipeline would remove that field, decrypt it, and place a new ecosystem_user_id field into the ping before sending it to the normal Decoder step where schema validation, etc. is performed.
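The preprocessing step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the ping is modeled as a flat dictionary, `replace_anon_id` is a hypothetical helper name, and `decrypt` is a placeholder callable standing in for whatever JWE decryption the pipeline would actually use. None of this reflects the real pipeline code or payload layout.

```python
from typing import Any, Callable, Dict


def replace_anon_id(
    ping: Dict[str, Any], decrypt: Callable[[str], str]
) -> Dict[str, Any]:
    """Return a copy of the ping with ecosystem_anon_id replaced.

    The encrypted field is removed and the decrypted value is stored
    under ecosystem_user_id, so later steps never see the original name.
    """
    result = dict(ping)  # shallow copy; the caller's ping is left untouched
    anon_id = result.pop("ecosystem_anon_id", None)
    if anon_id is None:
        raise ValueError("ping has no ecosystem_anon_id field")
    result["ecosystem_user_id"] = decrypt(anon_id)
    return result


def fake_decrypt(jwe: str) -> str:
    """Stand-in for real decryption; always returns a fixed placeholder."""
    return "example-user-id"


# Hypothetical flat ping; the real payload structure is not specified here.
ping = {"ecosystem_anon_id": "header..iv.ciphertext.tag", "some_metric": 1}
processed = replace_anon_id(ping, fake_decrypt)
print(processed)
```

Running the replacement before schema validation means the schema only ever needs to describe ecosystem_user_id, not the encrypted field the client sends.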
What is the timeline by which the data needs to be collected?
We are targeting desktop first for Account Ecosystem Telemetry, but want to start testing Glean apps as early as Q3 2020.
Comment 1 (Reporter)•5 years ago
See some relevant discussion with :chutten in the #account-ecosystem-telemetry channel on Slack that led me to casting this as a new metric proposal.
Comment 2•5 years ago
May be relevant for the design phase... :klukas, is this id something that will need to be included in "deletion-request" pings?
Comment 3 (Reporter)•5 years ago
(In reply to Chris H-C :chutten from comment #2)
May be relevant for the design phase... :klukas, is this id something that will need to be included in "deletion-request" pings?
My understanding is that we don't need to include this identifier in the deletion request ping when the client opts out of telemetry. We instead have to send this identifier from FxA when an FxA user requests data deletion at the account level. This is based on :chutten's own comments in :rfkelly's Google doc about comparing active user definitions:
Had a conversation in the Shredder meeting and the ruling is: On Telemetry opt-out we request the deletion of client-based data only.
Comment 5 (Reporter)•5 years ago
(In reply to Alessio Placitelli [:Dexter] from comment #4)
Hey Jeff, why is this employee-confidential?
I was probably being overly cautious due to the discussion of data deletion policy. I want to be careful not to make any explicit statements about compliance with any legislation, but I suppose it's fine to discuss data deletion policies publicly outside the context of compliance.
Comment 6•5 years ago
Ah, yes, "the ruling is". That's me being verbose and broad about technical guidelines, not relaying legislative/legal opinion from people who actually know those things. We're clear, but thank you for your sensitivity.
Shows what I get when I ask a question I should've already known the answer to : )
Comment 7•5 years ago
This is the discussion document for this proposal.
Mike, can you designate the group of people who should be working on the initial design for this?
Comment 8•5 years ago
:chutten, :dexter : Are you available for the design work on this one?
Comment 10•5 years ago
(In reply to Michael Droettboom [:mdroettboom] from comment #8)
:chutten, :dexter : Are you available for the design work on this one?
Yup, we'll take care of that
Comment 11•5 years ago
Hey Mike,
Chris and I finalized the work on this proposal. I believe this is good to move to the comment stage and have others chime in. What do you think?
I'll flag others if that's the case.
Comment 12•5 years ago
I've moved the document to the comment phase. I had a couple of minor nits in the document that I don't think need to hold anything up.
Comment 13•5 years ago
Hey folks,
the document moved from design to comment stage. It is ready for one final look. Final feedback due by June 29th, 2020.
If that looks good to you, please sign off at the top of the document.
Comment 14•5 years ago
Hi Teon,
we need data-steward review for the attached proposal. Please check the related Data-Steward section at the top of the document. More information about this process here.
Comment 15•5 years ago
Hey Mike,
looks like the majority of folks signed off on the proposal. Is now the time to make a call on this (when you're back!)?
Comment 16•5 years ago
Approved. Bug 1650787 is opened to track the implementation.
Comment 17•5 years ago
Hey :Dexter, I'm very sorry, but I'm currently overloaded with work. Also, I am away at a virtual conference this week. Would it be possible to see if someone in #data-stewards on Matrix might be able to pick this up?
Comment 18•5 years ago
(In reply to Teon Brooks [:teon] from comment #17)
Hey :Dexter, I'm very sorry, but I'm currently overloaded with work. Also, I am away at a virtual conference this week. Would it be possible to see if someone in #data-stewards on Matrix might be able to pick this up?
Hey Teon, no worries, this got approved already by Mike (see comment 16). No need to review anymore!