Glean Event extra keys could be easily misused
Categories
(Data Platform and Tools Graveyard :: Glean Metric Types, task, P1)
Tracking
(Not tracked)
People
(Reporter: chutten, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 3 obsolete files)
84 bytes,
text/x-google-doc
|
Details |
Glean's top-level metrics are mostly quite good at preventing misuse. Events are one place where that isn't the case as the extra keys are a bit loose in what they allow. From the Book of Glean:
- The keys in the extra_keys list must be in dotted snake case, with a maximum length of 40 bytes in UTF-8.
- The values in the extras object have a maximum length of 50 in UTF-8.
Essentially it's an unbounded dictionary of known key names to arbitrary key values, encoded as strings.
We can do better than that.
I propose we extend the event format so that each extra key can be specified as its own high-level metric type. Maybe like so:
views:
login_opened:
type: event
description: Recorded when the login view is opened.
...
extra_keys:
source_of_login:
type: label (or whatever we call that categorical/enumerated scalar type)
labels: [toolbar, dialog, popup, doorhanger]
description: The source from which the login view was opened.
speed:
type: timespan
time_unit: millisecond
description: How long it took the login view to be opened
- Migration will be simplified by transforming existing keys to "string" metrics (which are next on my list of "We could do better").
- This does not appear to be an onerous requirement of our users, but I want to call out that we are going to be asking them to think more, which may affect Glean Events adoption
- Not all metric types seem to make sense as extra keys.
- Schema-wise I have no idea what this means. Pretty sure the schema only tolerates strings in this position, and having it support multiple datatypes might be tricky?
- Maintenance-burden-wise, adding new metric types might require additional cognitive burden around "how does this work with events?"
- Should chat with GLAM about how these could be aggregated
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Hey Chris, would you kindly reframe this bug using the provided template for this component?
Reporter | ||
Comment 2•5 years ago
|
||
(( ... maybe Jan-Erik should be the one to do it since he's the one who moved it : P ))
Proposal for changing an existing or adding a new Glean metric type
Who is the individual/team requesting this change?
:chutten
Is this about changing an existing metric type or creating a new one?
Changing
Can you describe the data that needs to be recorded?
"extra_keys": A dictionary of names to values, presently values are strings.
Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)
[timestamp, views, login_opened, {source_of_login: toolbar, speed: 43ms}]
What is the business question/use-case that requires the data to be recorded?
Glean Data Principles: extra_keys are not a "high-level datatype"
How would the data be consumed?
Presently the sql dataset represents "extra_keys" as a map of string->string (repeated structs of pairs of nullable strings). This would have to change.
I presume it'd be consumed in the way that events are consumed, but in a much wider sql table (column per extra key). This would probably aid projects like GLAM in their aggregation/reporting on event data as it would gain structure.
Why existing metric types are not enough?
Existing "extra_keys" data could be misused in ways that other Glean SDK metric types are specifically designed to prevent. Negative counts, timespans with mixed units...
What is the timeline by which the data needs to be collected?
None. This is an enhancement.
Comment 3•5 years ago
|
||
While the process promises to handle request within a week, I'm putting this on hold until we complete the first test of our process in bug 1595723.
Reporter | ||
Comment 4•5 years ago
|
||
This is fine by me. There is no immediate need for a decision on this, so it can wait for a good time.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 5•5 years ago
|
||
Hey Mike, this is the design document. Would you kindly designate who is going to work on this for the initial design?
Comment 6•5 years ago
|
||
Updated•5 years ago
|
Comment 7•5 years ago
|
||
Travis will take the design phase on this one.
Updated•5 years ago
|
Comment 8•5 years ago
|
||
Here is the proposal, untaking and a ni? for [:mdroettboom] to kick off the next step in the process.
Updated•5 years ago
|
Comment 9•5 years ago
|
||
Hey Travis, the proposal should be added to the relevant section of the design doc from comment 5. Can you kindly do that?
Comment 11•5 years ago
|
||
Hey folks,
the proposal document moved from design to comment stage. It is ready for one final look. Final feedback due by November 20th, 2020.
If that looks good to you, please sign off at the top of the document.
Comment 12•5 years ago
|
||
Hey Leif, you seem to be the one assigned in the JIRA ticket. Please kindly review the proposal from comment 11 (it's the same from JIRA!).
Comment 13•5 years ago
|
||
I reviewed, but keeping the ni so I remember to check back in tomorrow with comments.
Reporter | ||
Comment 14•5 years ago
•
|
||
Reviewed with my Data Steward hat on, r+, signed off.
Comment 15•5 years ago
|
||
Bugzilla is now bugging me about this ni, so I'm turning it off :) Travis, let me know if you want me to write some stuff for that document, or are ready for a next review. In its current state, DE has not signed off on this proposal.
Comment 16•5 years ago
|
||
(In reply to Frank Bertsch [:frank] from comment #15)
Bugzilla is now bugging me about this ni, so I'm turning it off :) Travis, let me know if you want me to write some stuff for that document, or are ready for a next review. In its current state, DE has not signed off on this proposal.
Thanks frank, I'll let you know and happy to incorporate the suggestions from DE.
Comment 17•4 years ago
|
||
(In reply to Frank Bertsch [:frank] from comment #15)
Bugzilla is now bugging me about this ni, so I'm turning it off :) Travis, let me know if you want me to write some stuff for that document, or are ready for a next review. In its current state, DE has not signed off on this proposal.
Frank, I've incorporated the suggestion about changing the format to "event_properties" and ready for you to take another look. As for the types with high cardinality that you were concerned with (DateTime
and Uuid
), I'd be okay with dropping those types from the initial specs for this and adding them back if a definite use case arises. I should know more about Nimbus' use-case for UUID by then end of this week.
Comment 18•4 years ago
|
||
Finally got to a re-read.
I'm still worried about a few API pieces, with respect to their implementation, and how to handle legacy code.
With the payload format settled though I think these should be solvable and can be split into what needs to be in the proposal (definitely the backwards-compatibility/migration) and what can be defered to implementation time (exact API internals).
Holding back sign-off until then.
Comment 19•4 years ago
|
||
Updated a few things on the proposal taking :janerik's comments into consideration.
Comment 20•4 years ago
|
||
Hey folks, it's now time to sign-off on this (or push back :) )!
Comment 21•4 years ago
|
||
Comment 22•4 years ago
|
||
Updated•4 years ago
|
Comment 23•4 years ago
|
||
(Sorry for the spam, I mistakenly added the wrong link first)
Updated•4 years ago
|
Comment 24•4 years ago
|
||
Comment 25•4 years ago
|
||
Added some comments and an updated description of what the schema change will need to be.
The only piece I think we should change is dropping DateTime.
Updated•4 years ago
|
Updated•9 days ago
|
Description
•