Glean SDK might benefit from a "struct" metric type
Categories
(Data Platform and Tools :: Glean Metric Types, enhancement, P2)
Tracking
(Not tracked)
People
(Reporter: chutten, Assigned: chutten)
References
Details
Attachments
(1 obsolete file)
Proposal for changing an existing or adding a new Glean metric type
Who is the individual/team requesting this change?
:chutten, Firefox Telemetry
Is this about changing an existing metric type or creating a new one?
Creating a new one
Can you describe the data that needs to be recorded?
Data that comes in a structured format. For example: the customized layout state of Firefox Desktop's UI, the size and speed of the fastest GC run.
Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)
library_button: { state: "pinned", location: "right", parent_item: "none" },
fastest_gc: { speed: 4250ns, size: 412KB }
What is the business question/use-case that requires the data to be recorded?
General structured data ingestion reasons.
How would the data be consumed?
The data would be available for analysis in the usual ad hoc tooling. GLAM's aggregation could aggregate the sub-metrics individually (like showing a distribution of the fastest GCs' speeds and a distribution of the fastest GCs' sizes), but the relationship between the structured datapoints would be lost without a lot of work.
Why existing metric types are not enough?
We could do this today by flattening the structs like so:
library_button_state: "pinned",
library_button_location: "right",
library_button_parent_item: "none",
fastest_gc_speed: 4250ns,
fastest_gc_size: 412KB,
This is suboptimal for organizational reasons and may make instrumentation mistakes easier.
Also, Events could be adopted to report this via its extra dictionary, but I have other issues with that.
What is the timeline by which the data needs to be collected?
No strict timeline.
Assignee | ||
Comment 1•5 years ago
|
||
Further requirement:
In discussions with the team we highlighted a potential need to use this metric type to support a library of "Named Structs". For example, perhaps a named struct of struct_search_metrics
would have the list of search engines, the current default engine, the number of searches per engine, and assorted other search-y things in it.
Assignee | ||
Comment 2•3 years ago
|
||
For tackling first thing in the new year.
Comment 3•3 years ago
|
||
chutten took the time and prepared the initial proposal: https://docs.google.com/document/d/1JNyJ_WFexoQAA8TTT9xR8X_yrBzp5uUdSO_38I7jYS4/edit#
I selected the relevant stakeholders. Please take a look at the proposal and help shape the design, especially with respect to your designated area of expertise.
As per the process I hope this will take no more than 2 weeks (2022-02-14).
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Note to those being consulted: thanks to the very helpful input thus far provided, I've had to rejigger things slightly but comprehensively.
Specifically, :klukas pointed out an excellent reason that the client should pay the complexity cost of assembling the struct into the appropriate structure that makes it easier to ingest. In retrospect this seems obvious, but at the time I couldn't see it, so thank you Jeff!
You may wish to re-review the doc as this has ramifications and there's some stuff that wasn't there when the doc went out:
- There's a fully-fledged Purpose section explaining why this complexity is worth the effort.
- The fully-qualified field names will no longer be limited to the metric name length. Instead each field can have the full use of the 30B of characters that a metric name has. Yes,
my_category.my_metric.my_field.my_nested_field.my_double_nested_field.my_triple_nested_field.my_quadruple_nested_field
is valid. (I think) - Payload assembly has gotten complicated enough that it has its own section separate from Storage. Give it a gander.
- Max nesting limit has been reduced to 5, for everyone's sanity
- Duplication of
struct
metrics andstruct_list
metrics and their fields is far less likely because field position within the structure is now an identifying quality.a.b_c.d
will no longer collide witha.b.c_d
- There's an optional API improvement to make commits a little easier. Still not happy with it, so please poke holes in it at "Option: Commit Full Subtrees"
- There's a new section on Threading and IPC showing that, by wrapping a metric in a
struct_list
, you can make a previously-forbidden-from-using-IPC metric into one that can be used across processes. Which is a neat trick. - There's a new section on Documentation, Use, and Education. Learn some Latin!
Unresolved/ongoing discussions include
- Should storage pay the complexity price instead of payload assembly?
- Should
struct
even have a transaction API since there can only be one of them? - Do we have to drop support for
event
metrics because of a quirk of the Glean ping payload schema?
Things unchanged (for now):
- APIs, SQL, how it looks in metrics.yaml... basically all the touch-points where a Instrumentor, Consumer, or Analyst would interact with these remain unchanged from the initial version.
If any of this is meaningful to your review, you may wish to go through again and weigh in again. Sorry for the shift in bearing, but it was a necessary course correction.
Comment 5•3 years ago
•
|
||
Left my decision summary on the proposal. Although discussions around implementation details are still ongoing, I don't feel they are blocking that.
I will be active on the discussions while they are unresolved and I can contribute to them. Once that is over then I will add my sign-off to the proposal.
Comment 6•3 years ago
|
||
Updated•3 years ago
|
Comment 7•3 years ago
|
||
Thanks :chutten for absorbing suggestions and updating the design. I've left feedback and signed off. I'm tagging in :relud, though, for checking on what complexity this might add to probe-scraper.
Comment 8•3 years ago
•
|
||
(In reply to Jeff Klukas [:klukas] (UTC-4) from comment #7)
I'm tagging in :relud, though, for checking on what complexity this might add to probe-scraper.
I agree with :chutten's assessment that "it will largely be some new abstractions that call into all the existing functionality"
Updated•3 years ago
|
Assignee | ||
Comment 10•3 years ago
|
||
Tagging Jan-Erik for questions in comments on Storage, and for where we go from here. We've got a lot of sign-offs, so we might be close to accepting this and starting to think about scheduling.
Updated•3 years ago
|
Comment 11•3 years ago
|
||
This proposal has been accepted.
Published proposal doc: https://docs.google.com/document/d/e/2PACX-1vTL4YVxNkUQx-yJGjc_DTc9Dpeyf2LcJx0z2W31JkBCOHIuS9dB3kEA57DjkFbuDFztzRggT_O2m3IF/pub
I'm going to file bugs for implementation.
Description
•