Closed Bug 1479901 Opened 7 years ago Closed 7 years ago

Investigate the 5% failure rate in application remote lookups

Categories

(Toolkit :: Safe Browsing, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: francois, Assigned: dimi)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Right now we are seeing about a 5% failure rate for connecting to the application reputation server: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-07-26&keys=__none__!__none__!__none__&max_channel_version=beta%252F62&measure=APPLICATION_REPUTATION_SERVER&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-06-25&table=1&trim=1&use_submission_date=0 We should investigate further by splitting this generic error bucket into multiple error types so that we can identify the source of errors more easily.
Dimi, if you have time to land this in 63, that could help out with the rest of the download protection work since we'd be getting better data as soon as Beta 63.
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Attached patch WIP Patch (obsolete) — Splinter Review
Hi Francois, Would you mind taking a quick look at this patch? thanks!
Attachment #8997064 - Flags: feedback?(francois)
Comment on attachment 8997064 [details] [diff] [review] WIP Patch Review of attachment 8997064 [details] [diff] [review]: ----------------------------------------------------------------- A few suggestions to improve the patch, but that's exactly the approach I was thinking of. In fact, that's even more details that I was thinking of :) ::: toolkit/components/reputationservice/ApplicationReputationTelemetryUtils.cpp @@ +1,2 @@ > +/* This Source Code Form is subject to the terms of the Mozilla Public > + * * License, v. 2.0. If a copy of the MPL was not distributed with this nit: No need for the two `*` at the start of the line. @@ +6,5 @@ > +#include "mozilla/Assertions.h" > + > +namespace mozilla { > + > +using ServerLabel = Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION; `MEDIA_PLAY`? Is that a copy/paste error or are you re-using an existing list? It may be dangerous to reuse an existing list since the Media team could change their scheme at any time and that would impact our probes too. ::: toolkit/components/reputationservice/ApplicationReputationTelemetryUtils.h @@ +7,5 @@ > +#define ApplicationReputationTelemetryUtils_h__ > + > +#include "mozilla/Telemetry.h" > + > +Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION `MEDIA_PLAY`? @@ +10,5 @@ > + > +Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION > +NSErrorToServerLabel(nsresult rv); > + > +Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION ditto ::: toolkit/components/telemetry/Histograms.json @@ +114,4 @@ > "n_values": 3, > "description": "Status of the application reputation remote lookup (0=OK, 1=failed, 2=invalid protobuf response)" > }, > + "APPLICATION_REPUTATION_SERVER2": { nit: I think the `_2` suffix is more common than just `2`. @@ +115,5 @@ > "description": "Status of the application reputation remote lookup (0=OK, 1=failed, 2=invalid protobuf response)" > }, > + "APPLICATION_REPUTATION_SERVER2": { > + "record_in_processes": ["main", "content"], > + "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"], You may as well put your email address here since you're more likely to be keeping an eye on these :) @@ +118,5 @@ > + "record_in_processes": ["main", "content"], > + "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"], > + "bug_numbers": [1479901], > + "releaseChannelCollection": "opt-out", > + "expires_in_version": "66", Let's make this "never" since I think we'll want to continue to monitor this. @@ +121,5 @@ > + "releaseChannelCollection": "opt-out", > + "expires_in_version": "66", > + "kind": "categorical", > + "labels": ["ResponseValid", "FailGetChannel", "FailGetResponse", "HTTP1xx", "HTTP2xx", "HTTP204", "HTTP3xx", "HTTP400", "HTTP4xx", "HTTP403", "HTTP404", "HTTP408", "HTTP413", "HTTP5xx", "HTTP502_504_511", "HTTP513", "HTTP505", "HTTPOthers", "ErrAlreadyConnected", "ErrNotConnected", "ErrConnectionRefused", "ErrTimeout", "ErrOffline", "ErrPortAccess", "ErrNetReset", "ErrNetInterrupt", "ErrProxyConnection", "ErrNetPartial", "ErrNetInadequate", "ErrUnknownProxy", "ErrDNSLookupQueue", "ErrUnkownProxyHost", "ErrOthers"], > + "description": "Status of the application reputation remote lookup" Maybe "Network status" instead of just "Status"
Attachment #8997064 - Flags: feedback?(francois) → feedback-
Thanks for the feedback! > @@ +6,5 @@ > > +#include "mozilla/Assertions.h" > > + > > +namespace mozilla { > > + > > +using ServerLabel = Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION; > > `MEDIA_PLAY`? Is that a copy/paste error or are you re-using an existing > list? > > It may be dangerous to reuse an existing list since the Media team could > change their scheme at any time and that would impact our probes too. > Yes, that is a copy/paste error. I wanted to submit this patch before leaving office, didn't check carefully :(
Attached file data-review-request.md
Attachment #8998151 - Flags: review?(francois)
Attachment #8997064 - Attachment is obsolete: true
We only recorded "fail" or "invalid protobuf response" when we received an error while sending remote lookup. However, this is not enough for us to analyze the problem if the error rate is high. This patch introduced a more fine-grained telemetry probe that could help us understand the result better.
Comment on attachment 8998151 [details] data-review-request.md 1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate? Yes, in Histograms.json. 2) Is there a control mechanism that allows the user to turn the data collection on and off? Yes, telemetry setting. 3) If the request is for permanent data collection, is there someone who will monitor the data over time?** Yes, Dimi Lee. 4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under? ** Category 1. 5) Is the data collection request for default-on or default-off? Default ON, all channels. 6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)? No. 7) Is the data collection covered by the existing Firefox privacy notice? Yes. 8) Does there need to be a check-in in the future to determine whether to renew the data? No, permanent.
Attachment #8998151 - Flags: review?(francois) → review+
(In reply to Dimi Lee[:dimi][:dlee] from comment #5) > Created attachment 8998151 [details] > data-review-request.md BTW, if you rename the file to .txt, then it's easier to review because it gets displayed inline in Bugzilla. This is a work-around until bug 1421032 is fixed.
Comment on attachment 8998152 [details] Bug 1479901 - Add telemetry probe for measuring more detailed network status of download protection remote lookup. r?francois François Marier [:francois] has approved the revision. https://phabricator.services.mozilla.com/D2844
Attachment #8998152 - Flags: review+
Pushed by dlee@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/71e7d262b066 Add telemetry probe for measuring more detailed network status of download protection remote lookup. r=francois
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: