Labeled Metrics with staticly-defined Labels should be encouraged
Categories
(Data Platform and Tools :: Glean Metric Types, task, P2)
Tracking
(firefox114 fixed)
Tracking | Status | |
---|---|---|
firefox114 | --- | fixed |
People
(Reporter: chutten, Assigned: chutten)
References
(Blocks 2 open bugs)
Details
Attachments
(5 files)
Proposal for changing an existing or adding a new Glean metric type
Who is the individual/team requesting this change?
:chutten
Is this about changing an existing metric type or creating a new one?
Changing labeled metric types (e.g. Labeled Booleans)
Can you describe the data that needs to be recorded?
No change to the data being recorded.
Can you provide a raw sample of the data that needs to be recorded (this is in the abstract, and not any particular implementation details about its representation in the payload or the database)
No change to the data being recorded.
What is the business question/use-case that requires the data to be recorded?
No change to the business question/use-case.
How would the data be consumed?
No change to how the data is consumed.
Why existing metric types are not enough?
Right now there is no incentive besides (perhaps) Data Hygiene to induce an instrumentor to use statically-defined (as in, defined in the labels
property in a metrics.yaml
) labels for labeled types.
We want to encourage the use of statically-defined labels for reasons including Data Hygiene ( : D ).
So thus we should probably provide a mechanism that makes it worthwhile to go to the effort of statically-defining your labels and ensuring that the list stays up-to-date.
I propose that, like Categorical Histograms, we provide an enum for each label (plus one for __other__
) that can be used instead of strings. If we can come up with other inducements (perhaps by discouraging the use of the non-enum API somehow), that'd be nice as well or instead.
What is the timeline by which the data needs to be collected?
The sooner, the better, as established habits are harder to break.
Assignee | ||
Comment 1•4 years ago
|
||
There's some helpful context over here and in the comments following, which I will summarize here:
- The use case of "error code" is one we've heard before and will hear again.
- Being able to specify a partial list of known values of interest to label them would be nice
- The metric should accept any number of that width, but map anything not in the list to e.g.
__other__
- So for an image decoding library, we could sparsely label some likely values (success, file too small) but if we get an error we didn't care much about (retroencabulator is fumbling) we still get a count for that, but it gets bucketed with other errors we don't care about (baffle spindling overwound, etc).
- The metric should accept any number of that width, but map anything not in the list to e.g.
- Generating an enum for these values would be nice, so users could e.g.
Glean.myCategory.myErrorCode.add(Glean.myCategory.myErrorCode.codes.lp0_on_fire);
- Using an existing
enum
that is already in the codebase would be ideal, so users could e.g.nsresult rv = SomeGeckoOperation(); glean::my_gecko_category::my_error_code.Add(rv);
- But it might be frighteningly difficult. For example, a specific class of "error code" collections would be Gecko's
nsresult
codes. They are bothenum class
values and also constants. We could write an API to take thensresult
enum class
(ErrorCodeMetric::Add(nsresult aCode, uint32_t aErrorCount = 1)
, but what we get on the inside of this API is something like the number2147500037
. At some point we need to figure out that this meansNS_ERROR_FAILURE
and then perhaps translate to that to Glean's label format (ns_error_failure
maybe) so we can store and transmit and analyze and display it. - Getting this right across languages will be interesting as well as I think the JS manifestation of nsresult is a thrown error. But this is very Firefox-specific so maybe I shouldn't overrotate here.
- The important thing is that for C++, especially with the X Macro pattern, you basically need to run a full compiler frontend in order to be able to codegen a map from a given
enum
orenum class
's values to their symbol names as strings. Other languages may make this easier
- But it might be frighteningly difficult. For example, a specific class of "error code" collections would be Gecko's
- All these challenges being challenging doesn't change that having a single source of truth for what values mean what strings is crucial. If we force humans to keep these in-sync through
static_assert
or linters since the tools can only generate transformation functions, then it's still worth it.
Assignee | ||
Comment 2•4 years ago
|
||
(Oh, and some context for the initial submission: I originally made this submission because you couldn't have more than 16 static labels (the same number as dynamic labels). That was a bug. The original design was to allow unlimited static labels. In resolving the bug we fixed the maximum number of static labels to 100. So there is an incentive to use static labels and in fact they are the only choice for categorical counts that have more than 16 categories.)
Comment 4•3 years ago
|
||
When trying to record something with a given label, it would be useful to have a way to know if the label is part of the static list of labels that are accepted by the probe, or if the values will be counted as __other__
. In the latter case, I would like to report the missing label name using another probe (likely a string_list).
Assignee | ||
Comment 5•3 years ago
|
||
Oh, and keeping this up-to-date, we've lifted the limit on the count of static labels in labeled_*
metrics to 4096 in bug 1772163
Assignee | ||
Comment 6•2 years ago
|
||
bug 1790872 will be adding a data collection that would have used Glean if it could.
Assignee | ||
Comment 7•2 years ago
|
||
I intend to draft a proposal for exploring and discussing how to solve this specifically for FOG C++ consumers (and, if I'm lucky, in general for all consumers) next week.
Assignee | ||
Comment 8•2 years ago
|
||
Alrighty, I've come up with a couple of (additive) proposals for consideration. I thought about making this a full Change Proposal, but these are so small that it didn't seem right. So let's do this the BMO+ni way.
Proposal 1: Remove the label regex
Labeled metrics' labels are not and have not ever been bigquery-column-name safe. The regex is causing friction for use that is not balanced by any (intentional) benefit. (I mean, maybe it accidentally makes it harder for a PII leak, but only because it makes everything harder).
Retain a length limit (expressed as bytes of UTF-8 as is tradition) and reserve __other__
, but otherwise permit everything. Or at least everything printable.
Proposal 2: enums for labels
For static labels we can generate an enum with a variant for each label plus one for __other__
. This helps by:
- Providing a cheap (not string) and accurate (no typos) way for API consumers to map external values (like IPC message names or nsresult values) to labels
- It becomes possible to iterate over all possible labels (helpful for
about:glean
improvements in bug 1762281 and has been requested for testing) - Comment #4 via e.g.
MyCategoryMyMetricLabel EnumForLabel(const nsACString& aLabel)
returningMyCategoryMyMetricLabel::other__
(can't start a C++ identifier with__
as that's reserved) for anything not recognized - Retains all the benefits of the current system by augmenting and not replacing it
It should be performant to implement as we could e.g. opt to implement the enum variants to carry the values of the indices of the string label in the metric's labels
vector.
This can be implemented in the language binding layer, leaving glean-core's CowString
-based Labeled<T>::get
alone. This is much the same way that we did things when we added types to event extras.
Travis, Jan-Erik, Bruno please let me know what you think. Feedback deadline is Jan 17, 2023.
(and of course I'm happy to get input from other folks not named. Did I miss a big use case? Is this the worst idea? Let me know.)
Comment 9•2 years ago
|
||
Re Proposal 1: Yes (though maybe restricted to ascii). It's bitten us more often than it solved things. We should check in with pipeline/data-engineering though just to make sure.
Re Proposal 2: It's surprisingly easy to implement in Rust!. Other languages will require some additional work, but I think that shouldn't be too hard either.
Comment 10•2 years ago
|
||
I agree with this, the label restrictions have mostly been a pain or caused confusion and I cannot point to a single specific thing that it has helped or prevented.
On the second proposal, I'm also in favor of enumerations. I am fairly certain that both Swift and Kotlin enums shouldn't make this too difficult to do, and Jan-Erik already points out that it's simple in Rust.
r+ from me on both counts.
Comment 11•2 years ago
|
||
Proposal 1: Based on everything mentioned, this makes sense. I am not as familiar with the pain points described, but the solution sounds like the way to go.
Proposal 2: Enums sound great - this should be fine for Glean.js. Enums on the Typescript side work similar to other languages and we can generate the same sort of object in JS too.
r+ from me as well
Comment 12•2 years ago
|
||
Comment 13•2 years ago
|
||
Assignee | ||
Comment 14•2 years ago
|
||
Assignee | ||
Comment 15•2 years ago
|
||
chutten merged PR #2364: "Bug 1672273 - Remove label regex, replace with '71-bytes of printable ASCII'" in 1b0be12.
Proposal 1 is now implemented, but not released, as "labels (both static and dynamic) must be at most 71 characters of printable ASCII ( to
~
, inclusive)". As always this still requires release and vendoring of both glean_parser
and Glean SDK before anyone can take advantage of it.
Next up: "Proposal 2: enums". Which is proving a mite tricky.
Assignee | ||
Comment 16•2 years ago
|
||
We are changing the type of impl::Labeled<T> to impl::Labeled<T, E> making the
current macro use of CPU use metrics within FOGIPC impossible.
Thus we introduce a ProcessType enum to replace storing the metrics and remove
the macros to acheive the same end.
Assignee | ||
Comment 17•2 years ago
|
||
First we have to replace the venerable metric hashmaps with metric match
statements (like events). This is because, by adding the enum to the Labeled<>
type, we've made it impossible to use HashMap (since its type requires a known
Value type V
, and Labeled<Counter, $SomeEnumType> is incomplete).
Then we had to generate meaningful enums. The order of the values is important,
as I foresee a future in which we may wish to use the enum variant value
(its "discriminant") to index into the glean-core labels
list.
The enums can't have naming collisions. In Rust this is easy, so long as we
stay away from reserved keywords (bug 1814767). In C++ we are using enum class
so the variants' identifiers are scoped. However, on my system at least,
this is still susceptible to collision with preprocessor defines. So we prefix
the variants with the letter e
. So far so good.
C++ Labeled metric implementations is regrettably moved from Labeled.cpp to
Labeled.h, as the template specialization is no longer full.
No decision about JS is made or implied by its exclusion from this commit.
This includes support for runtime registration (via JOG).
The use of strings for labeled_* metrics is preserved mostly unchanged.
The transformation from enum back to string label is performed in Rust during
enum_get
. This is an improvement over the current situation as the strings
in use are guaranteed to not be allocated. However, I suspect we can improve
even more by moving this translation lower (into the SDK, ideally).
GIFFT support requires getting the label string from the enum, which is exposed
on the FFI.
Depends on D170108
Assignee | ||
Comment 18•2 years ago
|
||
I wouldn't mind some feedback on these. They're landable, but I'm not sure they're 100%. And I'm not 100% sold on the ergonomics of it. I'm especially open to suggestions on how to make the C++ API more pleasant, and whether switching to strings as soon as we hit Rust in FOG is worth the effort (or whether I should try for the Full Solution and make the enums literally refer to the indices in the labels: Vec<String>
in glean-core
's labeled.rs
)
Putting these up now for discussion (no ni?
yet. Not gonna compel feedback until I have a more concrete request) and so I can switch gears a bit.
Assignee | ||
Comment 19•2 years ago
|
||
(Alrighty, I've figured out what concrete questions I have)
:nika, :florian - As two of the folks who've in the past expressed interest in using static-labeled_*
metrics for enum-type-things, what are your thoughts about the ergonomics I've settled on? (e.g. the enum is in the metric category's namespace, the prefixing and casing of the variants, the existence of the __other__
variant and its position at the end of the enum, the naming of EnumGet
, etc etc). Are there pieces missing that'd make this a pain for you to use? Feel free to use the patchsets as a guide, but don't feel you need to perform code review.
:Travis - As Glean Tech Lead, what's your thoughts about this? We discussed synchronously about this FOG-first approach, but what do you think about using the enum discriminant as an index into the labels
array? Should I push ahead on running these ideas down into the Glean SDK before landing, or will these changes be enough of an improvement (and forward-looking) to land as-is?
As always, feel free to bring in or suggest others who might wish to have a seat at this table.
Comment 20•2 years ago
|
||
The part that worries me the most about indexing into the labels
array using the enum is something you already stated:
"The order of the values is important"
How can we guarantee a deterministic ordering of the enums? That feels like it could be a deceptively complicated problem considering that sorting the labels in any way could change the order if/when new labels are added, as well as we don't control the order in the metrics.yaml file that static labels are listed.
I do really like the concept of using the discriminate as an index, it cuts down on the amount of strings crossing the FFI which should always be good, but I'm worried about guaranteeing the stability of the discriminate
Assignee | ||
Comment 21•2 years ago
|
||
Luckily, so long as we control both the SDK and the parser, the order of the labels can be stabilized. glean_parser
's object model has ordered_labels
which embedders who care about ordering can be sure is the order of the Vec
that gets passed to the LabeledXMetric
's ctor. Then the order of the labels: []
in the definition matches the order of the enum
matches the order of the labels: Vec<CowString>
in glean-core
, and everyone's happy.
And so long as we keep storing strings, this ordering need only be stable within a build.
Comment 22•2 years ago
|
||
e__Other__
is a bit awkward as a name, but I suppose it's fine. I'm a bit uncertain about how well this will work with IPC code because all labels still need to be written in the metrics.yaml file, but it's still definitely an improvement over the old approach. We'll probably need to have IPDL consume the metrics.yaml file as an input to make sure we don't try to name an enum which isn't in the set.
Overall though I think that having explicit enums is definitely a big improvement.
Assignee | ||
Comment 23•2 years ago
|
||
(In reply to Nika Layzell [:nika] (ni? for response) from comment #22)
all labels still need to be written in the metrics.yaml file
We need some sort of mapping from enum value to string label someplace (since enum values aren't stable, though their meanings are intended to be), and it's the SDK that needs to know it. Alas, the only way we currently have to tell the SDK anything is via metrics.yaml
, so that's where it all goes. Everything else I could come up with was worse : (
We'll probably need to have IPDL consume the metrics.yaml file as an input to make sure we don't try to name an enum which isn't in the set
glean_parser
is in-tree and ready to parse files for you whenever you need it.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 24•2 years ago
|
||
Comment 25•2 years ago
|
||
Backed out 3 changesets (Bug 1672273) for causing OSX build bustages CLOSED TREE
Log: https://treeherder.mozilla.org/logviewer?job_id=413887957&repo=autoland&lineNumber=77965
Backout: https://hg.mozilla.org/integration/autoland/rev/1badd2603e050467d1f37b90244f4bad75d87f00
Assignee | ||
Comment 26•2 years ago
|
||
[task 2023-04-26T20:43:53.168Z] 20:43:53 ERROR - error[E0412]: cannot find type `_Pred` in this scope
[task 2023-04-26T20:43:53.168Z] 20:43:53 INFO - --> /builds/worker/workspace/obj-build/x86_64-apple-darwin/release/build/style-bcd2002c8774f423/out/gecko/structs.rs:460:40
I confess a degree of confusion about how this has anything to do with my patch, but I'm looking.
Assignee | ||
Comment 27•2 years ago
|
||
Not sure why this only pops up as part of this patchset. I was forced to move
some std::tuple-using code into a widely-included header. Maybe that's it?
Depends on D170109
Comment 28•2 years ago
|
||
Comment 29•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d671c627fba3
https://hg.mozilla.org/mozilla-central/rev/349d6fa3f929
https://hg.mozilla.org/mozilla-central/rev/7c551c5379b4
Updated•2 years ago
|
Description
•