Closed Bug 1637798 Opened 4 years ago Closed 4 years ago

Overhaul sync ping telemetry error reporting

Categories

(Firefox :: Sync, defect, P3)

RESOLVED WONTFIX

People

(Reporter: markh, Unassigned)

Details

(Whiteboard: SACI)

Error reporting in sync pings is, frankly, useless:

  • Many "reasons" include strings that vary per user, e.g., they include timestamps or GUIDs. Thus, dashboards see thousands of low-rate errors instead of a single high-rate error. See, e.g., normalizeFailure in https://gist.github.com/mhammond/749d59be7357eeedce31a1df5ac370b1

  • They are implementation-specific: e.g., a DNS failure on desktop is reported as an nserror with an obscure hex string. A DNS failure on Android is reported differently (though I don't know how off the top of my head). It makes no sense to report the raw nserror; it makes sense to report some nserror codes as 'network error', others as other categories, etc.

  • etc.
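As a rough illustration of the normalization the first bullet calls for, here is a minimal Python sketch (the linked gist is JavaScript and is not reproduced here). The function name `normalize_failure` and all of its patterns are hypothetical: UUID-style GUIDs and standalone 12-character tokens containing a digit (a heuristic assumption for Sync record IDs) become `<guid>`, and decimal numbers such as timestamps become `<num>`. Hex constants like nserror codes are deliberately left alone, since the code itself distinguishes one failure from another.

```python
import re

# Hypothetical patterns for the per-user parts of a failure "reason".
# They run in order, so the GUID rules fire before the generic number rule.
_PATTERNS = [
    # UUID-style GUIDs, optionally wrapped in braces.
    (re.compile(r"\{?[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\}?"), "<guid>"),
    # Heuristic (assumption): a standalone 12-character base64url token that
    # contains at least one digit is treated as a Sync record ID.
    (re.compile(r"(?<![A-Za-z0-9_-])(?=[A-Za-z0-9_-]{0,11}[0-9])[A-Za-z0-9_-]{12}(?![A-Za-z0-9_-])"), "<guid>"),
    # Whole decimal numbers such as timestamps or counts. The lookahead
    # skips the leading "0" of hex constants (e.g. nserror codes), which
    # are kept because they identify the failure.
    (re.compile(r"\b\d+(?:\.\d+)?(?![xX\d])"), "<num>"),
]


def normalize_failure(reason: str) -> str:
    """Replace per-user details so identical failures share one key."""
    for pattern, placeholder in _PATTERNS:
        reason = pattern.sub(placeholder, reason)
    return reason
```

With this, "record abcDEF123456 failed at 1589512345678" and "record Zyx987wvu654 failed at 1589599999999" both collapse to "record <guid> failed at <num>", so a dashboard counts them as one error. A Sync ID with no digits would slip past the heuristic; stopping such values from entering the reason in the first place avoids that guesswork.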

The error reporting should be designed to be analyzed.

(This is going to be trickier than it sounds. We will still need some way of handling truly unknown or unexpected errors.)
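The second bullet and the closing caveat suggest reporting a small, fixed set of categories, with an explicit bucket for anything unrecognized. A minimal Python sketch of that idea follows; the `Category` values, the `(platform, code)` keys, `ReportedError` and `classify` are all hypothetical. The nsresult names and Java exception classes are real, but which of them Sync actually sees (and how Android currently reports them) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """A small, fixed vocabulary that dashboards can group on."""
    NETWORK = "network"
    UNKNOWN = "unknown"


# Hypothetical mapping from (platform, platform-specific code) to a shared
# category. The nsresult names and Java exception classes are real, but
# whether Sync reports them in this form is an assumption.
_CATEGORIES = {
    ("desktop", "NS_ERROR_UNKNOWN_HOST"): Category.NETWORK,
    ("desktop", "NS_ERROR_NET_TIMEOUT"): Category.NETWORK,
    ("desktop", "NS_ERROR_CONNECTION_REFUSED"): Category.NETWORK,
    ("desktop", "NS_ERROR_OFFLINE"): Category.NETWORK,
    ("android", "java.net.UnknownHostException"): Category.NETWORK,
    ("android", "java.net.SocketTimeoutException"): Category.NETWORK,
}


@dataclass(frozen=True)
class ReportedError:
    category: Category
    # Only set for UNKNOWN, so the long tail stays visible for triage
    # without fragmenting the known categories.
    detail: Optional[str] = None


def classify(platform: str, code: str) -> ReportedError:
    """Map a platform-specific error code onto the shared categories."""
    category = _CATEGORIES.get((platform, code))
    if category is None:
        return ReportedError(Category.UNKNOWN, detail=f"{platform}:{code}")
    return ReportedError(category)
```

Known codes from either platform land in the same `network` bucket, so a dashboard can group by category alone; unknown codes keep a `detail` string for triage, which would itself still need any per-user parts stripped before submission.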

> Many "reasons" include strings which are variable per user - eg, they include timestamps or guids. Thus, dashboards see thousands of low-rate errors instead of a single high-rate error.

Yes, I had to write something similar to your normalizeFailure function as a BigQuery JS function, and that was just for bookmarks! That level of detail is useful in trace logs, where we can see the GUID in context (in a tree, in the record cleartext, or both), but it's not really helpful for telemetry, where we're looking at errors in aggregate.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX