Closed
Bug 1311910
Opened 8 years ago
Closed 7 years ago
Add telemetry to measure update error rate for V2 and V4
Categories
(Toolkit :: Safe Browsing, defect, P2)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: dlee, Assigned: tnguyen)
References
Details
(Whiteboard: #sbv4-m3)
Attachments
(1 file, 2 obsolete files)
As discussed in WW, we should record the update fail rate for v2 and v4 to get better understanding about if we could ship v4.
Reporter | ||
Updated•8 years ago
|
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Reporter | ||
Updated•8 years ago
|
Priority: -- → P2
Reporter | ||
Updated•8 years ago
|
Summary: Add telemetry to measure update error rate for V2 and V4 → Add telemetry to measure update error and update timeout rate for V2 and V4
Updated•8 years ago
|
Blocks: SafeBrowsingv4Telemetry
Comment hidden (mozreview-request) |
Reporter | ||
Comment 2•8 years ago
|
||
Comment on attachment 8805032 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4. Hi Henry, I still can't set you as a reviewer by mozreview, i am not sure if the reviewer "henry" is you.
Attachment #8805032 -
Flags: review?(hchang)
Updated•8 years ago
|
Attachment #8805032 -
Flags: review?(hchang)
Reporter | ||
Updated•8 years ago
|
Attachment #8805032 -
Flags: review?(hchang)
Comment 3•8 years ago
|
||
mozreview-review |
Comment on attachment 8805032 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4. https://reviewboard.mozilla.org/r/88862/#review88256 The change should be very useful to v4 development :) BTW, the suffix matching ('-proto') seems to have scattered everywhere. Maybe it's now worth having a utility function for v2/v4. I would file a bug for this.
Attachment #8805032 -
Flags: review?(hchang) → review+
Reporter | ||
Comment 4•8 years ago
|
||
(In reply to Henry Chang [:henry][:hchang] from comment #3) > > The change should be very useful to v4 development :) BTW, the suffix > matching ('-proto') seems to have scattered everywhere. Maybe it's now worth > having a utility function for v2/v4. I would file a bug for this. Agree! Thanks for doing that :)
Reporter | ||
Updated•8 years ago
|
Attachment #8805032 -
Flags: review?(francois)
Comment 5•8 years ago
|
||
mozreview-review |
Comment on attachment 8805032 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4. https://reviewboard.mozilla.org/r/88862/#review89076 While thinking more about this patch and what we are trying to measure with this patch, I realized that we're not currently bucketing our existing server-related telemetry in the right way. I think the data should be keyed by provider instead of protocol. That would allow us, for example, to separate the TP updates from the Google SB ones, as well as separating google from google4 all with a single `URLCLASSIFIER_UPDATE_REMOTE_STATUSCODE` telemetry probe (we can't change existing probes, we need to add new ones). So we probably need a keyed histogram like this: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-10-30&keys=All!AV%252C241-480!AV%252C481-720!V%252C241-480&max_channel_version=nightly%252F52&measure=VIDEO_SUSPEND_RECOVERY_TIME_MS&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-09-19&table=0&trim=1&use_submission_date=0 We should also replace these histograms with keyed ones: - `URLCLASSIFIER_COMPLETE_TIMEOUT` - `URLCLASSIFIER_COMPLETE_REMOTE_STATUS` The other thing I'm thinking about is that this telemetry we already have isn't as useful as it could be: ``` "URLCLASSIFIER_UPDATE_ERROR_TYPE": { "alert_emails": ["safebrowsing-telemetry@mozilla.org"], "expires_in_version": "58", "kind": "enumerated", "n_values": 10, "bug_numbers": [1305801], "description": "An error was encountered while parsing a partial update returned by a Safe Browsing V4 server (0 = addition of an already existing prefix, 1 = parser got into an infinite loop, 2 = removal index out of bounds)" } ``` since the denominator is unclear. If we kept track of successes as well, we would get the error/success rate here instead of just an error count with nothing to compare it against. So we could replace `URLCLASSIFIER_UPDATE_V2_SUCCESS_RATE` and `URLCLASSIFIER_UPDATE_V4_SUCCESS_RATE` with `URLCLASSIFIER_UPDATE_ERROR`, a keyed version of `URLCLASSIFIER_UPDATE_ERROR_TYPE` that would keep track of sucessess too. This would allow us to remove `URLCLASSIFIER_UPDATE_ERROR_TYPE` since we just added it and never actually used it in a release. Do you think the above changes will allow us to answer the questions we are interested in?
Attachment #8805032 -
Flags: review?(francois) → review-
Reporter | ||
Comment 6•8 years ago
|
||
(In reply to François Marier [:francois] from comment #5) > Comment on attachment 8805032 [details] > Bug 1311910 - Add telemetry to measure update error and update timeout rate > for V2 and V4. > > https://reviewboard.mozilla.org/r/88862/#review89076 > > Do you think the above changes will allow us to answer the questions we are > interested in? Yes, i think keyed by provider instead of protocol version is a good idea. I'll update my patch, thanks for suggestion !
Reporter | ||
Comment 7•8 years ago
|
||
(In reply to François Marier [:francois] from comment #5) > > So we could replace `URLCLASSIFIER_UPDATE_V2_SUCCESS_RATE` and > `URLCLASSIFIER_UPDATE_V4_SUCCESS_RATE` with `URLCLASSIFIER_UPDATE_ERROR`, a > keyed version of `URLCLASSIFIER_UPDATE_ERROR_TYPE` that would keep track of > sucessess too. > > This would allow us to remove `URLCLASSIFIER_UPDATE_ERROR_TYPE` since we > just added it and never actually used it in a release. > Right now URLCLASSIFIER_UPDATE_ERROR_TYPE is an enum and it is used to record different type of Full/Partial update "algorithm" error(not include update network error, protocol parser error ... etc). So if we want to merge it with keyed URLCLASSIFIER_UPDATE_SUCCESS_RATE here, then how should we treat all the other update errors ? do we just set those errors with a fixed value in histogram ? or we should expand our enumeration to include all different kind of errors ?
Flags: needinfo?(francois)
Comment 8•8 years ago
|
||
(In reply to Dimi Lee[:dimi][:dlee] from comment #7) > Right now URLCLASSIFIER_UPDATE_ERROR_TYPE is an enum and it is used to > record different type of Full/Partial update "algorithm" error(not include > update network error, protocol parser error ... etc). So if we want to merge > it with keyed URLCLASSIFIER_UPDATE_SUCCESS_RATE here, then how should we > treat all the other update errors ? do we just set those errors with a fixed > value in histogram ? or we should expand our enumeration to include all > different kind of errors ? Using an enum is fine, but we only need to keep track of the errors we care about. I would suggest using 0 for "success", 1 for "unspecified error" and then have 2, 3, 4, etc. for the specific errors we already have. Of course, if you see other errors that we probably want to know about separately, feel free to move them out of the "1" bucket and give them their own error number. One thing to note if you don't already know is that we can't bump the "n_values" in Histograms.json once we set it. So maybe we should make that 16 to give ourselves some room.
Flags: needinfo?(francois)
Assignee | ||
Updated•8 years ago
|
Assignee: dlee → tnguyen
Assignee | ||
Comment 9•8 years ago
|
||
MozReview-Commit-ID: 5xFbV33l8Q7
Assignee | ||
Updated•8 years ago
|
Attachment #8805032 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8813112 -
Attachment is obsolete: true
Comment 11•8 years ago
|
||
mozreview-review |
Comment on attachment 8814789 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4 https://reviewboard.mozilla.org/r/94628/#review96952 This patch looks good and it seems like you've updated everything correctly. I also think we can avoid adding a new member variable to hold the update errors. See my inline comments. ::: toolkit/components/telemetry/Histograms.json:4138 (Diff revision 1) > "kind": "exponential", > "high": 200, > "n_buckets": 10, > "description": "Size of the completion cache in entries" > }, > - "URLCLASSIFIER_UPDATE_REMOTE_STATUS": { > + "URLCLASSIFIER_UPDATE_REMOTE_KEYED_STATUS": { nit: `URLCLASSIFIER_UPDATE_REMOTE_STATUS2` seems to be the common pattern when we just want to change the detaisl of a probe ::: toolkit/components/url-classifier/Classifier.cpp:610 (Diff revision 1) > LOG(("update took %dms\n", > PR_IntervalToMilliseconds(clockEnd - clockStart))); > } > > + mLastUpdateResult = UPDATE_SUCCESS; > + nit: we can remove this blank line ::: toolkit/components/url-classifier/Classifier.cpp:1086 (Diff revision 1) > NS_ENSURE_TRUE(updateV4, NS_ERROR_FAILURE); > > if (updateV4->IsFullUpdate()) { > input->Clear(); > output->Clear(); > - rv = lookupCache->ApplyUpdate(updateV4, *input, *output); > + rv = lookupCache->ApplyUpdate(updateV4, *input, *output, mLastUpdateResult); This is a bit weird. We return two different error codes. Maybe what we need is simply to defined a new `NS_ERROR_MODULE_URL_CLASSIFIER` in `xpcom/base/ErrorList.h` just like http://searchfox.org/mozilla-central/rev/ef3cede9a5b050fffade9055ca274654f2bc719e/xpcom/base/ErrorList.h#684-718 That way, we can simply return `nsresult`s like `NS_ERROR_UC_UPDATE_SUCCESS` and `NS_ERROR_UC_UPDATE_DUPLICATE_PREFIX`. ::: toolkit/components/url-classifier/SafeBrowsingError.h:6 (Diff revision 1) > +//* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * License, v. 2.0. If a copy of the MPL was not distributed with this > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +#ifndef SafeBrowsingError_h__ The old code we have doesn't following the coding guidelines, but let's make sure we name the guards properly: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#CC_practices i.e. `SafeBrowsingError_h`
Attachment #8814789 -
Flags: review?(francois) → review-
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 14•8 years ago
|
||
mozreview-review |
Comment on attachment 8814789 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4 https://reviewboard.mozilla.org/r/94628/#review98254 ::: toolkit/components/telemetry/Histograms.json:4169 (Diff revision 3) > - "expires_in_version": "58", > + "expires_in_version": "59", > "kind": "enumerated", > - "n_values": 10, > - "bug_numbers": [1305801], > - "description": "An error was encountered while parsing a partial update returned by a Safe Browsing V4 server (0 = addition of an already existing prefix, 1 = parser got into an infinite loop, 2 = removal index out of bounds, 3 = checksum mismatch, 4 = missing checksum)" > + "keyed": true, > + "n_values": 16, > + "bug_numbers": [1311910], > + "description": "An error was encountered while update (0 = success, 1 = unspecified error, 2 = addition of an already existing prefix, 3 = parser got into an infinite loop, 4 = removal index out of bounds, 5 = checksum mismatch, 6 = missing checksum). Keyed by provider" "Whether or not an error was encountered while processing a Safe Browsing update (0 = success..." ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:630 (Diff revision 3) > + nsCString provider; > + // Assume that all the tables in update should have the same provider. > + urlUtil->GetTelemetryProvider(mUpdateTables.SafeElementAt(0, EmptyCString()), provider); > + > + nsresult updateStatus = > + NS_ERROR_GET_MODULE(mUpdateStatus) == NS_ERROR_MODULE_URL_CLASSIFIER ? mUpdateStatus : It's a very good idea to check this! ::: xpcom/base/ErrorList.h:995 (Diff revision 3) > > /* ======================================================================= */ > + /* 42: NS_ERROR_MODULE_URL_CLASSIFIER */ > + /* ======================================================================= */ > +#define MODULE NS_ERROR_MODULE_URL_CLASSIFIER > + /* Update-related codes */ "Update-related error codes" ::: xpcom/base/ErrorList.h:996 (Diff revision 3) > /* ======================================================================= */ > + /* 42: NS_ERROR_MODULE_URL_CLASSIFIER */ > + /* ======================================================================= */ > +#define MODULE NS_ERROR_MODULE_URL_CLASSIFIER > + /* Update-related codes */ > + /* A success code that indicates the update is successfully applied */ I think this comment is redundant. ::: xpcom/base/ErrorList.h:997 (Diff revision 3) > + /* 42: NS_ERROR_MODULE_URL_CLASSIFIER */ > + /* ======================================================================= */ > +#define MODULE NS_ERROR_MODULE_URL_CLASSIFIER > + /* Update-related codes */ > + /* A success code that indicates the update is successfully applied */ > + ERROR(NS_SUCCESS_UC_UPDATE, SUCCESS(0)), Let's rely on `NS_OK` for this case so that we don't have to define a new return code for the case we don't care about. ::: xpcom/base/ErrorList.h:1000 (Diff revision 3) > + /* Update-related codes */ > + /* A success code that indicates the update is successfully applied */ > + ERROR(NS_SUCCESS_UC_UPDATE, SUCCESS(0)), > + > + ERROR(NS_ERROR_UC_UPDATE_UNKNOWN, FAILURE(1)), > + /* Error codes of applying V4 update. */ We could also remove this one.
Attachment #8814789 -
Flags: review?(francois) → review-
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 17•7 years ago
|
||
mozreview-review |
Comment on attachment 8814789 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4 https://reviewboard.mozilla.org/r/94628/#review98262 A few minor things to fix and then this is good to go. ::: toolkit/components/telemetry/Histograms.json:4145 (Diff revisions 4 - 5) > "expires_in_version": "never", > "keyed": true, > "kind": "enumerated", > "n_values": 16, > "bug_numbers": [1311910], > - "description": "Server HTTP status code from SafeBrowsing database updates. (0=1xx, 1=200, 2=2xx, 3=204, 4=3xx, 5=400, 6=4xx, 7=403, 8=404, 9=408, 10=413, 11=5xx, 12=502|504|511, 13=503, 14=505, 15=Other). Keyed by provider" > + "description": "Whether or not an error was encountered while processing a Safe Browsing update. (0=1xx, 1=200, 2=2xx, 3=204, 4=3xx, 5=400, 6=4xx, 7=403, 8=404, 9=408, 10=413, 11=5xx, 12=502|504|511, 13=503, 14=505, 15=Other). Keyed by provider" That's the wording I suggested for `URLCLASSIFIER_UPDATE_ERROR`, not `URLCLASSIFIER_UPDATE_REMOTE_STATUS2`. ::: toolkit/components/telemetry/Histograms.json:4169 (Diff revision 4) > - "expires_in_version": "58", > + "expires_in_version": "59", > "kind": "enumerated", > - "n_values": 10, > - "bug_numbers": [1305801], > - "description": "An error was encountered while parsing a partial update returned by a Safe Browsing V4 server (0 = addition of an already existing prefix, 1 = parser got into an infinite loop, 2 = removal index out of bounds, 3 = checksum mismatch, 4 = missing checksum)" > + "keyed": true, > + "n_values": 16, > + "bug_numbers": [1311910], > + "description": "An error was encountered while update (0 = success, 1 = unspecified error, 2 = addition of an already existing prefix, 3 = parser got into an infinite loop, 4 = removal index out of bounds, 5 = checksum mismatch, 6 = missing checksum). Keyed by provider" "during a Safe Browsing update" ::: toolkit/components/url-classifier/nsUrlClassifierDBService.h:144 (Diff revision 4) > // Thread that we do the updates on. > static nsIThread* gDbBackgroundThread; > }; > > -class nsUrlClassifierDBServiceWorker final : public nsIUrlClassifierDBService > +class nsUrlClassifierDBServiceWorker final : public nsIUrlClassifierDBService, > + public nsIURIClassifierCallback nit: indentation ::: toolkit/components/url-classifier/nsUrlClassifierDBService.h:151 (Diff revision 4) > public: > nsUrlClassifierDBServiceWorker(); > > NS_DECL_THREADSAFE_ISUPPORTS > NS_DECL_NSIURLCLASSIFIERDBSERVICE > +NS_DECL_NSIURICLASSIFIERCALLBACK nit: indentation ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:359 (Diff revision 4) > } > - > +NS_IMETHODIMP > +nsUrlClassifierDBServiceWorker::OnClassifyComplete(nsresult aErrorCode) > +{ > + return NS_OK; > + nit: unnecessary blank line ::: toolkit/components/url-classifier/LookupCacheV4.cpp:295 (Diff revision 5) > // checksum in .metadata > std::string stdChecksum(checksum.BeginReading(), checksum.Length()); > aTableUpdate->NewChecksum(stdChecksum); > - > } else if (aTableUpdate->Checksum() != checksum){ > - LOG(("Checksum mismatch after applying partial update")); > + return NS_ERROR_UC_UPDATE_CHECKSUM_MISMATCH; We should retain this LOG line.
Attachment #8814789 -
Flags: review?(francois) → review-
Reporter | ||
Comment 18•7 years ago
|
||
since thomas is PTO today, i will help fix nits addressed in Comment 17
Comment hidden (mozreview-request) |
Reporter | ||
Comment 20•7 years ago
|
||
ah sorry i should push the last commit by thomas first then push fixes of Comment 17 again so we won't loss diff...
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8805032 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4. https://reviewboard.mozilla.org/r/88862/#review99276 ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:1859 (Diff revision 2) > gFreshnessGuarantee = Preferences::GetInt(CONFIRM_AGE_PREF, > CONFIRM_AGE_DEFAULT_SEC); > } > } else if (!strcmp(aTopic, "quit-application")) { > Shutdown(); > + nsresult rv; I thought this was an unrelated change displayed because of a rebase, but this is not on mozilla-central yet: http://searchfox.org/mozilla-central/rev/09ae31fddfbc229ef1b8f1a3ccd87ad05680e695/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#1834 Any idea what this is?
Attachment #8805032 -
Flags: review?(francois) → review-
Reporter | ||
Comment 22•7 years ago
|
||
(In reply to François Marier [:francois] from comment #21) > > I thought this was an unrelated change displayed because of a rebase, but > this is not on mozilla-central yet: > http://searchfox.org/mozilla-central/rev/ > 09ae31fddfbc229ef1b8f1a3ccd87ad05680e695/toolkit/components/url-classifier/ > nsUrlClassifierDBService.cpp#1834 > > Any idea what this is? Just checked the review history of this bug and recently changes in nsUrlClassifierDBService.cpp. I think this is just accidentally added during rebase and maybe messed with local changes since thomas was working on shutdown-aware bug? I don't think we want this and will remove this in next patch. Thanks for finding this!
Comment hidden (mozreview-request) |
Comment 24•7 years ago
|
||
mozreview-review |
Comment on attachment 8805032 [details] Bug 1311910 - Add telemetry to measure update error and update timeout rate for V2 and V4. https://reviewboard.mozilla.org/r/88862/#review99552 Ok thanks for removing the stray change. I've also flagged a couple of other suspicious changes here. Can you double-check whether or not they are necessary? If they are, then feel free to land this without further review. If they're not necessary, just take them out and consider it already r+. datareview+ ::: toolkit/components/url-classifier/nsUrlClassifierDBService.h:144 (Diff revision 3) > // Thread that we do the updates on. > static nsIThread* gDbBackgroundThread; > }; > > -class nsUrlClassifierDBServiceWorker final : public nsIUrlClassifierDBService > +class nsUrlClassifierDBServiceWorker final : public nsIUrlClassifierDBService, > + public nsIURIClassifierCallback Is this necessary? ::: toolkit/components/url-classifier/nsUrlClassifierDBService.h:151 (Diff revision 3) > public: > nsUrlClassifierDBServiceWorker(); > > NS_DECL_THREADSAFE_ISUPPORTS > NS_DECL_NSIURLCLASSIFIERDBSERVICE > + NS_DECL_NSIURICLASSIFIERCALLBACK Is this necessary? ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:140 (Diff revision 3) > > static mozilla::Atomic<int32_t> gFreshnessGuarantee(CONFIRM_AGE_DEFAULT_SEC); > > NS_IMPL_ISUPPORTS(nsUrlClassifierDBServiceWorker, > - nsIUrlClassifierDBService) > + nsIUrlClassifierDBService, > + nsIURIClassifierCallback) Is this necessary? ::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp:356 (Diff revision 3) > } > > return NS_OK; > } > > +NS_IMETHODIMP Is this necessary?
Attachment #8805032 -
Flags: review?(francois) → review+
Comment hidden (mozreview-request) |
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to Dimi Lee[:dimi][:dlee] from comment #22) > (In reply to François Marier [:francois] from comment #21) > > > > I thought this was an unrelated change displayed because of a rebase, but > > this is not on mozilla-central yet: > > http://searchfox.org/mozilla-central/rev/ > > 09ae31fddfbc229ef1b8f1a3ccd87ad05680e695/toolkit/components/url-classifier/ > > nsUrlClassifierDBService.cpp#1834 > > > > Any idea what this is? > > Just checked the review history of this bug and recently changes in > nsUrlClassifierDBService.cpp. > I think this is just accidentally added during rebase and maybe messed with > local changes since thomas was working on shutdown-aware bug? > > I don't think we want this and will remove this in next patch. Thanks for > finding this! Oops, the my hg code base was corrupted when rebasing and I did not notice that. Thanks for helping me to fix that :(
Reporter | ||
Comment 27•7 years ago
|
||
(In reply to François Marier [:francois] from comment #24) > I've also flagged a couple of other suspicious changes here. Can you > double-check whether or not they are necessary? > Checked with thomas, those changes should be removed.
Reporter | ||
Updated•7 years ago
|
Attachment #8814789 -
Attachment is obsolete: true
Reporter | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 28•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/77b7f79852e0 Add telemetry to measure update error and update timeout rate for V2 and V4. r=francois,henry
Keywords: checkin-needed
Comment 29•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/77b7f79852e0
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Updated•7 years ago
|
Summary: Add telemetry to measure update error and update timeout rate for V2 and V4 → Add telemetry to measure update error rate for V2 and V4
You need to log in
before you can comment on or make changes to this bug.
Description
•