Should we ignore expected errors in sync success rates?
Categories
(Application Services :: Places, enhancement)
Tracking
(Not tracked)
People
(Reporter: bdk, Unassigned)
References
Details
(Whiteboard: [fxsync-])
Our sync errors are currently dominated by things like httperror, Network error, shutdownerror, etc. These are errors that we expect to see in normal operation. Maybe they're more noise than signal and we shouldn't include them in the sync success rate calculation (they wouldn't count as success or failure). We should still consider counting and tracking them, but in a different visualization.
Comment 1•13 hours ago
I'm a little torn here. I think different visualizations are problematic because at some point we simply don't check them - if we can squeeze them onto a single dashboard it might be fine, but that only scales so far.
Also, I guess this is a kind of philosophical question: what are our dashboards actually for? If they are only to track bugs in the code etc, then ignoring these errors makes sense. If we want a true measure of success as seen by our users, ignoring them makes less sense. Are elevated 401s a problem? Is a spike in ShutdownErrors a problem? I'd say they are and we should know about them.
That said, I agree it does make our stuff noisy. Is there a middle ground? Can we split into "failure" and "error", where the latter are the expected ones, but keep them on the same graph?
Reporter
Comment 2•2 hours ago
Yeah, I feel torn too. I like the idea of trying to display both on the same graph; maybe we could lean into the idea of "engine success" vs "general sync success". If we see a network error, then we count it as a failure to sync, but not a failure for the current engine. Then maybe we aim for 99.9% success rates for our engines, and accept that sync in general will be closer to 90% (I'm kind of guessing at these numbers). We could graph both on the same chart, maybe with a logarithmic scale.
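To make the proposed split concrete, here's a minimal sketch of how the two rates could be computed. The error names, the record shape, and the helper function are all hypothetical, invented for illustration; the actual telemetry schema would differ:

```python
# Errors we expect during normal operation (network flakiness, shutdown
# mid-sync, auth failures). Hypothetical names, for illustration only.
EXPECTED_ERRORS = {"httperror", "networkerror", "shutdownerror"}

def success_rates(sync_records):
    """sync_records: list of dicts like {"error": None} for a success
    or {"error": "networkerror"} for a failure.
    Returns (engine_rate, overall_rate)."""
    total = len(sync_records)
    failures = [r["error"] for r in sync_records if r["error"] is not None]
    successes = total - len(failures)

    # Overall sync rate: every error counts as a failure.
    overall_rate = successes / total if total else 1.0

    # Engine rate: expected errors are excluded from the denominator
    # entirely, so they count as neither success nor failure.
    expected = sum(1 for e in failures if e in EXPECTED_ERRORS)
    engine_total = total - expected
    engine_rate = successes / engine_total if engine_total else 1.0

    return engine_rate, overall_rate
```

With 9 clean syncs and 1 network error, this yields an engine rate of 1.0 but an overall rate of 0.9, which matches the intuition above that engines should sit near 99.9% while sync as a whole sits lower.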