Closed
Bug 1332780
Opened 7 years ago
Closed 7 years ago
Telemetry probes not recognizing 4xx and 5xx server status codes
Categories
(Toolkit :: Safe Browsing, defect, P2)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: francois, Assigned: dimi)
References
Details
(Whiteboard: #sbv4-m5)
Attachments
(1 file)
There are about 3% of update requests that return a status code we don't understand: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-01-19&keys=google!other!mozilla!__none__&max_channel_version=nightly%252F53&measure=URLCLASSIFIER_UPDATE_REMOTE_STATUS2&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-12-21&table=0&trim=1&use_submission_date=0 This is unusual and we should investigate why we can't get the real status code for these. At the very least, these failed requests should end up in the 4xx or 5xx buckets. On the shavar.services.mozilla.com side, we see a lot of 502s but there are none in telemetry.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Assignee | ||
Comment 1•7 years ago
|
||
Right now the "other" category of URLCLASSIFIER_UPDATE_REMOTE_STATUS2 actually contains the result of v2 update(both google and mozilla) and v4 update (See Bug 1332770 Comment 1). I see the old telemetry(URLCLASSIFIER_UPDATE_REMOTE_STATUS) also has unrecognized status code, so to fully understand the unrecognized status code, we must fix Bug 1332770 first so we can know which provider return those.
Depends on: 1332770
Assignee | ||
Comment 2•7 years ago
|
||
(In reply to François Marier [:francois] from comment #0) > On the shavar.services.mozilla.com side, we see a lot of 502s but there are > none in telemetry. This is because lots of errors from shavar.services.mozilla.com were recorded to "other" category.
Reporter | ||
Comment 3•7 years ago
|
||
Something else that ckolos found is that there's a known bug in the Amazon ELB that says if the client aborts a connection when the Transfer-Encoding header is set, it will count as a 4xx.
Assignee | ||
Updated•7 years ago
|
Whiteboard: #sbv4-m6
Assignee | ||
Comment 4•7 years ago
|
||
Just checked the telemetry dashboard after bug 1332770 was landed, there is still about 3% unknown error code: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2017-02-14&keys=google!google4!mozilla!__none__&max_channel_version=nightly%252F54&measure=URLCLASSIFIER_UPDATE_REMOTE_STATUS2&min_channel_version=null&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-02-02&table=0&trim=1&use_submission_date=0 so I will dig into this.
Assignee | ||
Comment 5•7 years ago
|
||
Hi francois, I suspect that the error may because of connection error to server, for example, tcp timeout, which we will treat it as unknown error[1]. So i am thinking if we should add another telemetry to record these[2], how do you think ? [1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp#653 [2] https://dxr.mozilla.org/mozilla-central/source/xpcom/base/ErrorList.h#205
Flags: needinfo?(francois)
Reporter | ||
Comment 6•7 years ago
|
||
(In reply to Dimi Lee[:dimi][:dlee] from comment #5) > I suspect that the error may because of connection error to server, for > example, tcp timeout, which we will treat it as unknown error[1]. > So i am thinking if we should add another telemetry to record these[2], how > do you think ? I think you're right, it would be good to see any kind of connection error reported separately from genuine "unknown errrors". We can't do much about the connection errors, but they could be hiding logic errors in our code.
Flags: needinfo?(francois)
Assignee | ||
Updated•7 years ago
|
Whiteboard: #sbv4-m6 → #sbv4-m5
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8839381 -
Flags: review?(francois)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•7 years ago
|
||
An alternative approach will be something like uint16_t errorCode = NS_ERROR_GET_MODULE(status) == NS_ERROR_MODULE_NETWORK ? NS_ERROR_GET_CODE(status) : 0; mozilla::Telemetry::Accumulate( mozilla::Telemetry::URLCLASSIFIER_UPDATE_REMOTE_NETWORK_ERROR, mTelemetryProvider, errorCode); This way we will get complete result but it also requires very large bucket for telemetry. I saw error message "Histograms with large numbers of buckets use disproportionately high amounts of resources." when trying to add a large bucket to telemetry, so I just check connection errors and DNS errors in the patch.
Reporter | ||
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8839381 [details] Bug 1332780 - Add telemetry to record network error while SafeBrowsing update. https://reviewboard.mozilla.org/r/114064/#review116510 I like your patch and your selection of network errors to report. The only thing that's missing IMHO is the "no error" case. I'd like to be able to know the error rate per provider. So let's make `0 = NS_OK` then `1 = unknown`, and then the rest of what you have. This way, we can see the % of updates that don't have network errors (the `0` case) and therefore the relative importance of these errors. datareview+ ::: toolkit/components/telemetry/Histograms.json:4121 (Diff revision 2) > + "expires_in_version": "60", > + "kind": "enumerated", > + "keyed": true, > + "n_values": 30, > + "bug_numbers": [1332780], > + "description": "Network error from SafeBrowsing database updates. (0=others, 1=already connected, 2=not connected, 3=connection refused,4=net timeout, 5=offline, 6=port access not allowed, 7=net reset, 8=net interrupt, 9=proxy connection refused,10=partial transfer,11=inadequate security,12=unknown host,13=dns lookup queue full,14=unknown proxy host" nit: "other" instead of "others"
Attachment #8839381 -
Flags: review?(francois) → review+
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 13•7 years ago
|
||
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9d135a51856f Add telemetry to record network error while SafeBrowsing update. r=francois
Keywords: checkin-needed
Comment 14•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9d135a51856f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in
before you can comment on or make changes to this bug.
Description
•