Language Bindings have difficulty recording errors
Categories
(Data Platform and Tools :: Glean: SDK, enhancement, P3)
Tracking
(Not tracked)
People
(Reporter: chutten, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [telemetry:glean-rs:backlog])
Attachments
(2 files)
The Glean SDK has a lovely Error Reporting mechanism for metrics. Our specific metrics APIs trie to forbid any and all misuse, but for the ones that sneak in despite our best efforts we need to record and report them.
Some errors of usage are only known and checked deep in the core. What is the length for an event extra value? What is the regex for valid labels? What is the maximum value we permit in a Memory Distribution?
Some errors are known at the Language Binding layer, but are passed deep into the core anyway so that we can record errors about them. Counters cannot count backwards, so negative values are an error. We could forbid them at the API by using an unsigned type... but not all of our language bindings support unsigned types. So we weaken our specific metrics API to accept a signed type just so negative values can reach the error reporting machinery in glean-core.
And then there's errors known at the Language Binding layer that we can't pass down. FOG forbids certain operations over IPC. It would be nice to be able to report errors in the same place that our users expect to find them, and in the same ways.
LBs could manufacture their own labeled_counter
-based error system, but that would have some notable flaws:
- It'd live in a different place from glean-core error reporting, making life difficult for testing, analysis, and tooling.
- It has a Ping Problem: errors in glean-core are reported on all pings in which the offending metric is sent (and the "metrics" ping). That is straight up impossible to do with
labeled_counter
s. - It'd be limited to sixteen erroring metrics, which is a limit that glean-core errors doesn't seem to share.
I suggest maybe we should open up core error reporting to the Language Binding layer.
Updated•3 years ago
|
Updated•3 years ago
|
Comment 1•3 years ago
|
||
Reporter | ||
Comment 2•3 years ago
|
||
ni?me to fill in FOG's use case to move towards a proposed design
Reporter | ||
Comment 3•3 years ago
|
||
FOG would like to:
- Add counts to existing Glean errors.
- In the event someone hands me a negative number for a counter addition, I don't want to have to pass it all the way to the core for it to be recorded as an error. I'd like the MLA counter_add FFI to be unsigned and to be able to record the error so it shows up in tests.
- Add new errors
- The most common example of this is IPC. If someone tries to
set
a boolean metric in a non-parent process, that is a non-commutative operation and is thus forbidden (at least for now). I'd like to be able to add an error that would be returned fromtest_get_num_recorded_errors
to aid in testing and added to that metric's pings in case the error manifests in the wild.
- The most common example of this is IPC. If someone tries to
(( and while we're at it, maybe we should tackle this suggestion of mine where we consider the ergonomics of using test_get_num_recorded_errors
from our users who might not know which specific errors they're looking for. ))
The way I see this taking shape is basically an exported version of record_error
. Maybe it doesn't bother with the impl Display
param, or maybe it does. Since CommonMetricData
and ErrorType
are necessarily exposed by RLB (possibly other LBs?), we should treat this carefully as it becomes a public-facing API even if we ask people not to use it.
Comment 4•3 years ago
|
||
I think we'll still want to do this, and it remains valid, but it is not actively being worked on.
Updated•3 years ago
|
Comment 5•4 months ago
|
||
Coming back to this now:
Back then we didn't follow up with this because it would have meant a whole lot of extra code manually added to each and every metric.
These days we have UniFFI and might be able to keep the additional code to a minimum inside glean-core.
We have the trait MetricType
that each metric type implements.
We can add a function record_error(error_type, count)
there with a default implementation that calls error_recording::record_error
.
Then we add that record_error(error_type, count)
method to each metric type in glean-core/src/glean.udl
and moments later we can record errors without much hassle from foreign languages.
That then still needs wiring into FOG, but that might ultimately be easier then.
Comment 6•2 months ago
|
||
Updated•1 month ago
|
Description
•