Closed
Bug 1479901
Opened 7 years ago
Closed 7 years ago
Investigate the 5% failure rate in application remote lookups
Categories
(Toolkit :: Safe Browsing, enhancement, P3)
Toolkit
Safe Browsing
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: francois, Assigned: dimi)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
Right now we are seeing about a 5% failure rate for connecting to the application reputation server: https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-07-26&keys=__none__!__none__!__none__&max_channel_version=beta%252F62&measure=APPLICATION_REPUTATION_SERVER&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-06-25&table=1&trim=1&use_submission_date=0
We should investigate further by splitting this generic error bucket into multiple error types so that we can identify the source of errors more easily.
Reporter | ||
Comment 1•7 years ago
|
||
Dimi, if you have time to land this in 63, that could help out with the rest of the download protection work since we'd be getting better data as soon as Beta 63.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•7 years ago
|
||
Hi Francois,
Would you mind taking a quick look at this patch? thanks!
Attachment #8997064 -
Flags: feedback?(francois)
Reporter | ||
Comment 3•7 years ago
|
||
Comment on attachment 8997064 [details] [diff] [review]
WIP Patch
Review of attachment 8997064 [details] [diff] [review]:
-----------------------------------------------------------------
A few suggestions to improve the patch, but that's exactly the approach I was thinking of. In fact, that's even more details that I was thinking of :)
::: toolkit/components/reputationservice/ApplicationReputationTelemetryUtils.cpp
@@ +1,2 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * * License, v. 2.0. If a copy of the MPL was not distributed with this
nit: No need for the two `*` at the start of the line.
@@ +6,5 @@
> +#include "mozilla/Assertions.h"
> +
> +namespace mozilla {
> +
> +using ServerLabel = Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION;
`MEDIA_PLAY`? Is that a copy/paste error or are you re-using an existing list?
It may be dangerous to reuse an existing list since the Media team could change their scheme at any time and that would impact our probes too.
::: toolkit/components/reputationservice/ApplicationReputationTelemetryUtils.h
@@ +7,5 @@
> +#define ApplicationReputationTelemetryUtils_h__
> +
> +#include "mozilla/Telemetry.h"
> +
> +Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION
`MEDIA_PLAY`?
@@ +10,5 @@
> +
> +Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION
> +NSErrorToServerLabel(nsresult rv);
> +
> +Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION
ditto
::: toolkit/components/telemetry/Histograms.json
@@ +114,4 @@
> "n_values": 3,
> "description": "Status of the application reputation remote lookup (0=OK, 1=failed, 2=invalid protobuf response)"
> },
> + "APPLICATION_REPUTATION_SERVER2": {
nit: I think the `_2` suffix is more common than just `2`.
@@ +115,5 @@
> "description": "Status of the application reputation remote lookup (0=OK, 1=failed, 2=invalid protobuf response)"
> },
> + "APPLICATION_REPUTATION_SERVER2": {
> + "record_in_processes": ["main", "content"],
> + "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"],
You may as well put your email address here since you're more likely to be keeping an eye on these :)
@@ +118,5 @@
> + "record_in_processes": ["main", "content"],
> + "alert_emails": ["francois@mozilla.com", "safebrowsing-telemetry@mozilla.org"],
> + "bug_numbers": [1479901],
> + "releaseChannelCollection": "opt-out",
> + "expires_in_version": "66",
Let's make this "never" since I think we'll want to continue to monitor this.
@@ +121,5 @@
> + "releaseChannelCollection": "opt-out",
> + "expires_in_version": "66",
> + "kind": "categorical",
> + "labels": ["ResponseValid", "FailGetChannel", "FailGetResponse", "HTTP1xx", "HTTP2xx", "HTTP204", "HTTP3xx", "HTTP400", "HTTP4xx", "HTTP403", "HTTP404", "HTTP408", "HTTP413", "HTTP5xx", "HTTP502_504_511", "HTTP513", "HTTP505", "HTTPOthers", "ErrAlreadyConnected", "ErrNotConnected", "ErrConnectionRefused", "ErrTimeout", "ErrOffline", "ErrPortAccess", "ErrNetReset", "ErrNetInterrupt", "ErrProxyConnection", "ErrNetPartial", "ErrNetInadequate", "ErrUnknownProxy", "ErrDNSLookupQueue", "ErrUnkownProxyHost", "ErrOthers"],
> + "description": "Status of the application reputation remote lookup"
Maybe "Network status" instead of just "Status"
Attachment #8997064 -
Flags: feedback?(francois) → feedback-
Assignee | ||
Comment 4•7 years ago
|
||
Thanks for the feedback!
> @@ +6,5 @@
> > +#include "mozilla/Assertions.h"
> > +
> > +namespace mozilla {
> > +
> > +using ServerLabel = Telemetry::LABELS_MEDIA_PLAY_PROMISE_RESOLUTION;
>
> `MEDIA_PLAY`? Is that a copy/paste error or are you re-using an existing
> list?
>
> It may be dangerous to reuse an existing list since the Media team could
> change their scheme at any time and that would impact our probes too.
>
Yes, that is a copy/paste error.
I wanted to submit this patch before leaving office, didn't check carefully :(
Assignee | ||
Comment 5•7 years ago
|
||
Attachment #8998151 -
Flags: review?(francois)
Assignee | ||
Updated•7 years ago
|
Attachment #8997064 -
Attachment is obsolete: true
Assignee | ||
Comment 6•7 years ago
|
||
We only recorded "fail" or "invalid protobuf response" when we received an
error while sending remote lookup. However, this is not enough for us to
analyze the problem if the error rate is high.
This patch introduced a more fine-grained telemetry probe that could help us
understand the result better.
Reporter | ||
Comment 7•7 years ago
|
||
Comment on attachment 8998151 [details]
data-review-request.md
1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes, in Histograms.json.
2) Is there a control mechanism that allows the user to turn the data collection on and off?
Yes, telemetry setting.
3) If the request is for permanent data collection, is there someone who will monitor the data over time?**
Yes, Dimi Lee.
4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under? **
Category 1.
5) Is the data collection request for default-on or default-off?
Default ON, all channels.
6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc. See the appendix for more details)?
No.
7) Is the data collection covered by the existing Firefox privacy notice?
Yes.
8) Does there need to be a check-in in the future to determine whether to renew the data?
No, permanent.
Attachment #8998151 -
Flags: review?(francois) → review+
Reporter | ||
Comment 8•7 years ago
|
||
(In reply to Dimi Lee[:dimi][:dlee] from comment #5)
> Created attachment 8998151 [details]
> data-review-request.md
BTW, if you rename the file to .txt, then it's easier to review because it gets displayed inline in Bugzilla. This is a work-around until bug 1421032 is fixed.
Comment 9•7 years ago
|
||
Comment on attachment 8998152 [details]
Bug 1479901 - Add telemetry probe for measuring more detailed network status of download protection remote lookup. r?francois
François Marier [:francois] has approved the revision.
https://phabricator.services.mozilla.com/D2844
Attachment #8998152 -
Flags: review+
Comment 10•7 years ago
|
||
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/71e7d262b066
Add telemetry probe for measuring more detailed network status of download protection remote lookup. r=francois
Comment 11•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in
before you can comment on or make changes to this bug.
Description
•