Add telemetry for when gethash calls timeout

RESOLVED FIXED in Firefox 48

Status

()

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: francois, Assigned: dimi)

Tracking

unspecified
mozilla48
Points:
---

Firefox Tracking Flags

(firefox48 fixed)

Details

(Whiteboard: tpe-seceng)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

4 years ago
It would be useful to know how often these completions timeout when trying to reach Google's servers.
(Reporter)

Updated

4 years ago
Blocks: 1149867
No longer blocks: 1024555
See Also: → bug 1024555
Component: DOM: Security → Safe Browsing
Product: Core → Toolkit
(Reporter)

Comment 1

3 years ago
This is probably going to be similar to bug 1150921.
Whiteboard: tpe-seceng
(Assignee)

Comment 2

3 years ago
I would like to work on this bug.
Assignee: nobody → dlee
(Assignee)

Updated

3 years ago
Status: NEW → ASSIGNED
(Assignee)

Comment 3

3 years ago
Hi François,
Do we measure this by how many success gethash request with a timeout or the time between each timeout ?
Flags: needinfo?(francois)
(Reporter)

Comment 4

3 years ago
I don't think we have a need for the actual time it took. We just want to make sure that requests don't time out very often.

So I would suggest: # timeouts / (# timeouts + # completed requests)

which means we also need to add telemetry for the requests that succeeded.
Flags: needinfo?(francois)
(Assignee)

Comment 5

3 years ago
Created attachment 8732696 [details] [diff] [review]
Add telemetry for gethash
Attachment #8732696 - Flags: review?(francois)
(Reporter)

Comment 6

3 years ago
Comment on attachment 8732696 [details] [diff] [review]
Add telemetry for gethash

Review of attachment 8732696 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierHashCompleter.js
@@ +553,5 @@
>      // Notify the RequestBackoff once a response is received.
>      this._completer.finishRequest(this.gethashUrl, httpStatus);
>  
>      if (success) {
> +      Services.telemetry.getHistogramById("URLCLASSIFIER_COMPLETE_TIMEOUT").add(0);

What happens when the request doesn't timeout but still fails? For example, if it returns a 503 instead of a 200.

I would say that as far as the telemetry goes, we should consider that a non-timeout event and mark it as such. We're only interested in seeing how often the gethash endpoint fails to answer at all.
Attachment #8732696 - Flags: review?(francois)
(Assignee)

Comment 7

3 years ago
Created attachment 8733220 [details] [diff] [review]
Add telemetry for gethash v2

Modification of this patch:
- Record all gethash requests in telemetry instead of just record success requests
Attachment #8732696 - Attachment is obsolete: true
Attachment #8733220 - Flags: review?(francois)
(Reporter)

Comment 8

3 years ago
Comment on attachment 8733220 [details] [diff] [review]
Add telemetry for gethash v2

Review of attachment 8733220 [details] [diff] [review]:
-----------------------------------------------------------------

That looks fine to me. I think Benjamin needs to review all new telemetry probes.
Attachment #8733220 - Flags: review?(francois) → review?(benjamin)

Comment 9

3 years ago
Comment on attachment 8733220 [details] [diff] [review]
Add telemetry for gethash v2

If this is an exploratory measurement, it should expire (typically in 6 months).

If this is an operational alerting metric, I think we should be collecting from the release population, and you should talk a bit about how you plan on monitoring this with either daily or realtime alerting.
Attachment #8733220 - Flags: review?(benjamin) → feedback-
(Reporter)

Comment 10

3 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #9)
> If this is an exploratory measurement, it should expire (typically in 6
> months).
> 
> If this is an operational alerting metric, I think we should be collecting
> from the release population, and you should talk a bit about how you plan on
> monitoring this with either daily or realtime alerting.

The question we're trying to answer is: Is the 10s timeout reasonable and (especially) does this happen often in practice?

Is that what you'd consider an exploratory measurement? If so, can we run it for 1 year instead of just 6 months?
Flags: needinfo?(benjamin)
Please run it for 6 months first and if you still need more data you can extend it.

You could instead add a metric that measures the total time a gethash call takes, which would give you a distribution and not just a boolean.
Flags: needinfo?(benjamin)
(Reporter)

Comment 12

3 years ago
Dimi, I think we should go with the 6 months that Benjamin suggested.

https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1237198&attachment=8735634 suggests that all we need to do is add "expires_in_version" with whatever release will come 6 months after Fx 48.
(Assignee)

Comment 13

3 years ago
Created attachment 8737773 [details]
MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg

Review commit: https://reviewboard.mozilla.org/r/44105/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/44105/
Attachment #8737773 - Flags: review?(francois)
Attachment #8737773 - Flags: review?(benjamin)
(Assignee)

Updated

3 years ago
Attachment #8733220 - Attachment is obsolete: true
(Reporter)

Comment 14

3 years ago
Comment on attachment 8737773 [details]
MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg

https://reviewboard.mozilla.org/r/44105/#review40815
Attachment #8737773 - Flags: review?(francois) → review+
(Assignee)

Comment 15

3 years ago
(In reply to Dimi Lee[:dimi][:dlee] from comment #13)
> Created attachment 8737773 [details]
> MozReview Request: Bug 1172688 - Add telemetry for when gethash calls
> timeout. r?francois, bsmedberg
> 
> Review commit: https://reviewboard.mozilla.org/r/44105/diff/#index_header
> See other reviews: https://reviewboard.mozilla.org/r/44105/

Forget to say, based on comment 4, i still use boolean for now instead of metric suggested in comment 11
(Reporter)

Comment 16

3 years ago
I think the boolean is fine for this initial exploration.

Updated

3 years ago
Attachment #8737773 - Flags: review?(benjamin) → review+
Comment on attachment 8737773 [details]
MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg

https://reviewboard.mozilla.org/r/44105/#review41015

data-review=me with that change or something equivalent

::: toolkit/components/telemetry/Histograms.json:3364
(Diff revision 1)
> +  "URLCLASSIFIER_COMPLETE_TIMEOUT": {
> +    "alert_emails": ["gcp@mozilla.com", "francois@mozilla.com"],
> +    "expires_in_version": "52",
> +    "kind": "boolean",
> +    "bug_numbers": [1172688],
> +    "description": "True if gethash lookup timeout"

Suggest more detail: "this metric is recorded every time a gethash lookup is performed. `true` is recorded if the lookup times out."
(Assignee)

Comment 18

3 years ago
Comment on attachment 8737773 [details]
MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44105/diff/1-2/
Attachment #8737773 - Attachment description: MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r?francois, bsmedberg → MozReview Request: Bug 1172688 - Add telemetry for when gethash calls timeout. r=francois, bsmedberg
(Assignee)

Updated

3 years ago
Keywords: checkin-needed

Comment 20

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/51688b3611d4
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.