Closed Bug 1150921 Opened 10 years ago Closed 10 years ago

Add telemetry for response codes to SafeBrowsing requests

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla47

Tracking Flags:

Tracking

Status

firefox40

---

affected

firefox47

---

fixed

People

(Reporter: gcp, Assigned: gcp)

References

Details

Attachments

(1 file, 1 obsolete file)

Add telemetry for response codes to SafeBrowsing requests 10 years ago Gian-Carlo Pascutto [:gcp] 7.17 KB, patch	francois : review-	Details \| Diff \| Splinter Review
Add telemetry for response codes to SafeBrowsing requests 10 years ago Gian-Carlo Pascutto [:gcp] 10.17 KB, patch	francois : review+ benjamin : feedback+	Details \| Diff \| Splinter Review

Gian-Carlo Pascutto [:gcp]

Assignee

Description

•

10 years ago

We currently don't have any telemetry that can spot things like: https://bugzilla.mozilla.org/show_bug.cgi?id=1150334 Checking for strange things in server response codes to our requests sounds pretty useful.

François Marier [:francois]

Updated

•

10 years ago

Blocks: 1149867

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 1

•

10 years ago

Attached patch Add telemetry for response codes to SafeBrowsing requests (obsolete) — Details — Splinter Review

Add telemetry for the reponses from the remote server. I grouped the status code into buckets, a bit of a judgement call which distinctions might be useful and which not.

Attachment #8715350 - Flags: review?(francois)

François Marier [:francois]

Comment 2

•

10 years ago

Comment on attachment 8715350 [details] [diff] [review] Add telemetry for response codes to SafeBrowsing requests Review of attachment 8715350 [details] [diff] [review]: ----------------------------------------------------------------- According to the spec these return codes are valid for list updates: 200: OK — Data is available in the HTTP response body. 400: Bad Request — The HTTP request was not correctly formed. The client did not provide all required CGI parameters or the body did not contain any meaningful entries. 403: Forbidden — The client id is invalid. 503: Service Unavailable — The server cannot handle the request. Clients MUST follow the backoff behavior specified in the Request Frequency section. 505: HTTP Version Not Supported — The server CANNOT handle the requested protocol major version. and for gethash: 200: OK — Data is available in the HTTP response body. 204: No Content — There are no full-length hashes with the requested prefix. 400: Bad Request — The HTTP request was not correctly formed. The client did not provide all required CGI parameters. 403: Forbidden — The client id is invalid. 503: Service Unavailable — The server cannot handle the request. Clients MUST follow the backoff behavior specified in the Request Frequency section. 505: HTTP Version Not Supported — The server CANNOT handle the requested protocol major version. So in other words, updates is missing: 505 and gethash is missing: 204, 505. I would suggest having more buckets: - one bucket per documented return code - one bucket for all 1xx undocumented error codes, one for 2xx, etc. - one bucket for return codes we don't care about - one bucket for anything else we'd like to single out So, here's a concrete proposal for gethash: case 100: case 101: // Unexpected 1xx return code statusBucket = TODO; break; case 200: // OK - Data is available in the HTTP response body. statusBucket = TODO; break; case 201: case 202: case 203: case 205: case 206: // Unexpected 2xx return code statusBucket = TODO; break; case 204: // No Content - There are no full-length hashes with the requested prefix. statusBucket = TODO; break; case 300: case 301: case 302: case 303: case 304: case 305: case 307: case 308: // Unexpected 3xx return code statusBucket = TODO; break; case 400: // Bad Request - The HTTP request was not correctly formed. The client did not provide all required CGI parameters. statusBucket = TODO; break; case 401: case 402: case 405: case 406: case 407: case 409: case 410: case 411: case 412: case 413: case 414: case 415: case 416: case 417: case 421: case 426: case 428: case 429: case 431: case 451: // Unexpected 4xx return code statusBucket = TODO; break; case 403: // Forbidden - The client id is invalid. statusBucket = TODO; break; case 404: // Not Found statusBucket = TODO; break; case 408: // Request Timeout statusBucket = TODO; break; case 500: case 501: case 510: // Unexpected 5xx return code statusBucket = TODO; break; case 502: case 504: case 511: // Local network errors, we'll ignore these. statusBucket = TODO; break; case 503: // Service Unavailable - The server cannot handle the request. Clients MUST follow the backoff behavior specified in the Request Frequency section. statusBucket = TODO; break; case 505: // HTTP Version Not Supported - The server CANNOT handle the requested protocol major version. statusBucket = TODO; break; List updates would be the same except with these changes: case 201: case 202: case 203: case 204: <-- no longer in a separate bucket case 205: case 206: // Unexpected 2xx return code statusBucket = TODO; break; case 400: <-- different comment to match SB spec // Bad Request - The HTTP request was not correctly formed. The client did not provide all required CGI parameters or the body did not contain any meaningful entries. statusBucket = TODO; break;

Attachment #8715350 - Flags: review?(francois) → review-

François Marier [:francois]

Updated

•

10 years ago

Assignee: nobody → gpascutto

Status: NEW → ASSIGNED

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 3

•

10 years ago

Attached patch Add telemetry for response codes to SafeBrowsing requests — Details — Splinter Review

Implemented your suggestion, but with same code->bucket map for both cases, and taking 413 as a seperate error.

Attachment #8716277 - Flags: review?(francois)

Gian-Carlo Pascutto [:gcp]

Assignee

Updated

•

10 years ago

Attachment #8715350 - Attachment is obsolete: true

François Marier [:francois]

Comment 4

•

10 years ago

Comment on attachment 8716277 [details] [diff] [review] Add telemetry for response codes to SafeBrowsing requests Review of attachment 8716277 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for making these changes, that should cover all of the possible failure cases. ::: toolkit/components/telemetry/Histograms.json @@ +3242,5 @@ > + "expires_in_version":"never", > + "kind": "enumerated", > + "n_values": 16, > + "bug_numbers": [1150921], > + "description": "Server response code from remote SafeBrowsing lookups" Can you change this to "remote SafeBrowsing gethash lookups"? Just so it's clearly totally clear it's not talking about Application Reputation remote lookups. ::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js @@ +123,5 @@ > + // Request Timeout > + statusBucket = 9; > + break; > + case 413: > + // Request Entity Too Large - Bug 1150334 Nice :) ::: toolkit/components/url-classifier/nsUrlClassifierStreamUpdater.cpp @@ +467,5 @@ > + // Unexpected 2xx return code > + statusBucket = 2; > + break; > + case 204: > + // No Content - There are no full-length hashes with the requested prefix. The "full-length hashes" bit is not revelant here. If you want to have exactly the same code+comments as the other one, maybe just use "No Content" without the rest?

Attachment #8716277 - Flags: review?(francois) → review+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 5

•

10 years ago

Comment on attachment 8716277 [details] [diff] [review] Add telemetry for response codes to SafeBrowsing requests Add bsmedberg for privacy/telemetry policy.

Attachment #8716277 - Flags: review?(benjamin)

Benjamin Smedberg

Comment 6

•

10 years ago

Comment on attachment 8716277 [details] [diff] [review] Add telemetry for response codes to SafeBrowsing requests If this is "never", who (person or team) is committing to monitor this metric forever? Will the existing telemetry dashboards or automated alerts be sufficient, or will you be building a custom dashboard for this monitoring? Please document what the enumerated values are: either list them in the "description" of the histogram, or reference some existing enumeration that will contain them. That way people browsing telemetry.mozilla.org will be able to understand what the values represent. Is the goal eventually to make this an opt-out operational metric? I would encourage that, after you've tested it. If that's the case, I'd also encourage you to file a separate future bug for real operational monitoring (nagios or the equivalent).

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 7

•

10 years ago

(In reply to Benjamin Smedberg [:bsmedberg] from comment #6) > If this is "never", who (person or team) is committing to monitor this > metric forever? Will the existing telemetry dashboards or automated alerts > be sufficient, or will you be building a custom dashboard for this > monitoring? Will be the SafeBrowsing team/module owner. The histogram-based Telemetry Alerts should suffice for this (though I wish those had an option to send email as we've missed warnings before). > Please document what the enumerated values are: either list them in the > "description" of the histogram, or reference some existing enumeration that > will contain them. I think 16 values is too much for the description? And not sure what you mean by the second sentence. Is linking the source code that defines the buckets ok? There is already a link to this bug. > Is the goal eventually to make this an opt-out operational metric? I would > encourage that, after you've tested it. If that's the case, I'd also > encourage you to file a separate future bug for real operational monitoring > (nagios or the equivalent). The goal is to monitor this permanently, and coordinate with the server operator (Google) if we see Firefox clients are having issues.

Flags: needinfo?(gpascutto) → needinfo?(benjamin)

Benjamin Smedberg

Comment 8

•

10 years ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #7) > (In reply to Benjamin Smedberg [:bsmedberg] from comment #6) > > If this is "never", who (person or team) is committing to monitor this > > metric forever? Will the existing telemetry dashboards or automated alerts > > be sufficient, or will you be building a custom dashboard for this > > monitoring? > > Will be the SafeBrowsing team/module owner. The histogram-based Telemetry > Alerts should suffice for this (though I wish those had an option to send > email as we've missed warnings before). Please use the "alert_emails" field to set this up. However, I am confident that the current alerting mechanism is not sufficient to monitor this. It was built to monitor performance metrics, and it will not be good at monitoring enumerated success/error codes without some modifications. I encourage you to file a separate bug about monitoring and ask Roberto Vitillo for suggestions. > > > Please document what the enumerated values are: either list them in the > > "description" of the histogram, or reference some existing enumeration that > > will contain them. > > I think 16 values is too much for the description? It's not. It's fine for the description to be long, and then the values will show up in the various documentation and dashboards automatically. > > Is the goal eventually to make this an opt-out operational metric? I would > > encourage that, after you've tested it. If that's the case, I'd also > > encourage you to file a separate future bug for real operational monitoring > > (nagios or the equivalent). > > The goal is to monitor this permanently, and coordinate with the server > operator (Google) if we see Firefox clients are having issues. After you've verified that the data is correct and useful, please make this an opt-out metric.

Flags: needinfo?(benjamin)

Benjamin Smedberg

Comment 9

•

10 years ago

Comment on attachment 8716277 [details] [diff] [review] Add telemetry for response codes to SafeBrowsing requests data-review+ with an updated and more detailed description field and a list or individual emails in the alert_emails field.

Attachment #8716277 - Flags: review?(benjamin) → feedback+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 10

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/d93a009fcc9f14bf4ce1f006c8c4b703f7d27e7a Bug 1150921 - Add telemetry for response codes to SafeBrowsing requests. r=francois f=bsmedberg

Carsten Book [:Tomcat]

Comment 11

•

10 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d93a009fcc9f

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

status-firefox47: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla47

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Add telemetry for response codes to SafeBrowsing requests

Categories

(Toolkit :: Safe Browsing, defect)

Tracking

()

People

(Reporter: gcp, Assigned: gcp)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Updated

Comment 1

Comment 2

Updated

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Attachment

General

Description

File Name

Content Type