Overhaul sync ping telemetry error reporting
Categories
(Firefox :: Sync, defect, P3)
People
(Reporter: markh, Unassigned)
Details
(Whiteboard: SACI)
Error reporting in sync pings is, frankly, useless:
- Many "reasons" include strings that vary per user (e.g., they include timestamps or GUIDs), so dashboards see thousands of low-rate errors instead of a single high-rate error. See, e.g., normalizeFailure in https://gist.github.com/mhammond/749d59be7357eeedce31a1df5ac370b1
- They are implementation specific. E.g., a DNS failure on desktop is reported as nserror with an obscure hex string, while a DNS failure on Android is reported differently (but I've no idea how off the top of my head). It does not make sense to report nserror; it makes sense to report some nserror codes as "network error", others as other categories, etc.
- etc.
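To make the first point concrete, here is a minimal TypeScript sketch of stripping per-user variation out of a reason string before it is reported. It is not the normalizeFailure function from the gist: the name normalizeReason, the placeholder tokens, and both patterns are hypothetical. It assumes Sync GUIDs are 12-character URL-safe base64 strings and that timestamps appear as 10-digit seconds (optionally with a fraction) or 13-digit milliseconds. Both patterns are heuristics that would need tuning against real failure strings.

```typescript
// Hypothetical placeholders; these are not part of the sync ping schema.
const TIMESTAMP = "<timestamp>";
const GUID = "<guid>";

// Heuristic: 10-digit Unix seconds, optionally followed by a short fraction
// (e.g. "1573562731.21"), or a 13-digit millisecond value.
const TIMESTAMP_RE = /\b\d{10}(?:\.\d{1,3}|\d{3})?\b/g;

// Heuristic (assumption): a GUID is a run of exactly 12 URL-safe base64
// characters. The first group captures the preceding character (or start
// of string) so it can be put back; the lookahead checks the run ends.
const GUID_RE = /(^|[^A-Za-z0-9_-])([A-Za-z0-9_-]{12})(?=[^A-Za-z0-9_-]|$)/g;

// Only treat a candidate as a GUID if it contains a digit, "-" or "_",
// so ordinary 12-letter words such as "successfully" are left alone.
function looksLikeGuid(token: string): boolean {
  return /[0-9_-]/.test(token);
}

// Replace timestamps first, then GUID-like tokens, leaving the rest of the
// reason string untouched.
function normalizeReason(reason: string): string {
  return reason
    .replace(TIMESTAMP_RE, TIMESTAMP)
    .replace(GUID_RE, (match: string, prefix: string, token: string) =>
      looksLikeGuid(token) ? prefix + GUID : match
    );
}
```

With this, "Record aB3dE_fGh1Jk not found at 1573562731.21" and "Record Zx9_yW8-vU7t not found at 1600000000123" both become "Record <guid> not found at <timestamp>", so a dashboard counts one error instead of two. The digit/"-"/"_" check is a deliberate trade-off: it keeps ordinary words intact at the cost of missing a GUID that happens to be all letters.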
The error reporting should be designed to be analyzed.
(This is going to be trickier than it sounds. We will still need some way of handling truly unknown or unexpected errors.)
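For the nserror point and the "truly unknown" caveat, one possible shape is to map known codes onto a small fixed set of categories and route everything else into a single catch-all bucket that still carries a low-cardinality detail. This is a hypothetical sketch: the FailureReason and ClassifiedFailure types, the category names, and classifyFailure are illustrative, not the current sync ping schema. The four nsresult values are Gecko network error codes as defined in xpcom/base/ErrorList.py, to the best of our knowledge; verify them against the tree before relying on them.

```typescript
// Hypothetical input shape; only "nserror" and a numeric code come from
// the bug text, the rest is illustrative.
interface FailureReason {
  name: string;
  code?: number;
}

// Hypothetical analysis-friendly output: a fixed set of categories plus a
// detail string whose set of possible values is small and not per-user.
interface ClassifiedFailure {
  category: "network" | "unexpected";
  detail: string;
}

// A few Gecko network nsresult values (see xpcom/base/ErrorList.py).
const NETWORK_NSRESULTS: ReadonlyMap<number, string> = new Map<number, string>([
  [0x804b000d, "NS_ERROR_CONNECTION_REFUSED"],
  [0x804b000e, "NS_ERROR_NET_TIMEOUT"],
  [0x804b0010, "NS_ERROR_OFFLINE"],
  [0x804b001e, "NS_ERROR_UNKNOWN_HOST"],
]);

function classifyFailure(reason: FailureReason): ClassifiedFailure {
  if (reason.name === "nserror" && typeof reason.code === "number") {
    // Coerce to unsigned 32-bit so signed and unsigned forms compare equal.
    const code = reason.code >>> 0;
    const known = NETWORK_NSRESULTS.get(code);
    if (known !== undefined) {
      return { category: "network", detail: known };
    }
    // Unrecognised nsresult: keep the hex code, since the set of codes is
    // finite and does not vary per user.
    return { category: "unexpected", detail: "nserror:0x" + code.toString(16) };
  }
  // Truly unknown failure: report only its name, never free-form text.
  return { category: "unexpected", detail: reason.name };
}
```

An unrecognised code such as NS_ERROR_FAILURE (0x80004005) comes out as category "unexpected" with detail "nserror:0x80004005": still countable, but no longer mixed in with network failures. Android would need its own mapping into the same categories for the two platforms to be comparable.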
Comment 1 • 4 years ago
> Many "reasons" include strings which are variable per user - eg, they include timestamps or guids. Thus, dashboards see thousands of low-rate errors instead of a single high-rate error.
Yes, I had to write something similar to your normalizeFailure function as a BigQuery JS function, and that was just for bookmarks! That level of detail is useful in trace logs, where we can see the GUID in context (in a tree, in the record cleartext, or both), but it's not really helpful for telemetry, where we're looking at errors in aggregate.
Updated • 4 years ago