MozReview Request: Bug 1232356 - Remove uninteresting Sync probes; bump versions for the others. r?bsmedberg
* FXA_UNVERIFIED_ACCOUNT_ERRORS, FXA_SERVER_ERRORS, and WEAVE_HMAC_ERRORS had no submissions for 43-45.

* WEAVE_ENGINE_APPLY_NEW_FAILURES only recorded failures for add-ons in 43-45. That's interesting, but not particularly useful. Maybe we should add a counter keyed on the add-on name as a follow-up?

* WEAVE_ENGINE_SYNC_ERRORS isn't very interesting. History had the highest multiple-error rate, followed by bookmarks. Most submissions showed one failure for history, add-ons, bookmarks, and tabs, but nothing really stands out. The percentages are close for each engine type.
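The keyed-counter follow-up suggested above might look roughly like this as a Histograms.json entry. This is a hypothetical sketch, not an actual patch: the probe name, alert address, and expiry version are all invented for illustration.

```json
"WEAVE_ENGINE_APPLY_NEW_FAILURES_BY_ADDON": {
  "alert_emails": ["sync-dev@mozilla.org"],
  "expires_in_version": "50",
  "kind": "count",
  "keyed": true,
  "description": "Number of new records the add-ons engine failed to apply, keyed on the name of the add-on involved. Recorded at the end of each add-ons engine sync."
}
```

A keyed count histogram would let us see *which* add-ons fail to apply, rather than only that some add-on did, while keeping the submission volume proportional to the number of distinct failing add-ons.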
Created attachment 8698090 [details]
MozReview Request: Bug 1232356 - Remove uninteresting Sync probes; bump versions for the others. r?bsmedberg
I'm going to mark data-feedback+ on this, but I am not a reviewer for this in general. From a broader quality perspective, I encourage teams to keep permanent telemetry on known error cases and to have dashboards that monitor and alert on them quickly, or even in real time. So it feels funny to be removing some probes just because the current error rate is small.
Comment on attachment 8698090 [details]
MozReview Request: Bug 1232356 - Remove uninteresting Sync probes; bump versions for the others. r?bsmedberg

https://reviewboard.mozilla.org/r/27791/#review24991

::: toolkit/components/telemetry/Histograms.json:9836
(Diff revision 1)
> -    "expires_in_version": "46",
> +    "expires_in_version": "50",

A bunch of these are bumping the version without much explanation. I can think of two reasons to keep this data:

A. We've found that the data is correct and helpful, and we have production monitoring on it. If this is the case, we should make it expires_in_version: never.

B. We still don't have confidence in the data, or haven't produced a dashboard to monitor it effectively, but we still think it's going to be valuable, and we just want an extension to finish testing/reporting.

Which of these seems more true, or is there something else going on?

::: toolkit/components/telemetry/Histograms.json:9839
(Diff revision 1)
> "description": "If the user is signed in to a Firefox Account on this device"

While you're here, can you document *when* this histogram is recorded? The current doc is pretty unclear about whether this happens once at startup, or can happen at other times during the run.
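To make the review's two asks concrete, option A plus the documentation request would together amount to an entry shaped something like the following. This is a sketch only: the probe name is hypothetical (the actual histogram at that line isn't named in this thread), and the recording point in the description is an assumed example of the kind of detail being asked for.

```json
"FXA_CONFIGURED": {
  "expires_in_version": "never",
  "kind": "flag",
  "description": "Whether the user is signed in to a Firefox Account on this device. Recorded once per session, at startup, when the Sync service initializes."
}
```

The point of spelling out "once per session, at startup" in the description is that consumers of the data can then tell whether a submission count reflects users or events.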
Sorry, I should've left some context. Most of these probes were added because we weren't sure what was causing Sync authentication errors, with the intent to revisit once we had some initial data. To that end, some of these don't seem actionable (WEAVE_HMAC_ERRORS, for example), and some are better monitored by the server (FXA_UNVERIFIED_ACCOUNT_ERRORS, FXA_SERVER_ERRORS, TOKENSERVER_AUTH_ERRORS). There may be value in keeping WEAVE_ENGINE_APPLY_NEW_FAILURES and WEAVE_ENGINE_SYNC_ERRORS, but, if I'm reading the dashboards correctly, I don't think they tell us much beyond "most samples have one bad record." For the others, I bumped the version because I think we need more information to decide whether the probes are valuable.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1236383