Closed Bug 1315893 Opened 3 years ago Closed 3 years ago

Add telemetry to measure update time for V2 and V4

Categories

(Toolkit :: Safe Browsing, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: dimi, Assigned: tnguyen)

References

Details

(Whiteboard: #sbv4-m3)

Attachments

(1 file, 1 obsolete file)

After checking the telemetry - URLCLASSIFIER_CL_UPDATE_TIME[0]
There are about 1% updates take more than 15sec, since the maximum value is 15000[1],
so we cannot know the records exceed 15000.
Maybe it worth enlarge the maximum value so we can get better understanding when investigate bug like bug 1310060

So i'll suggest we add an telemetry with provider as key to record UPDATE_TIME for v2/v4 and 'high' value should be at least 60.

[0] https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2016-10-19&keys=__none__!__none__!__none__&max_channel_version=release%252F49&measure=URLCLASSIFIER_CL_UPDATE_TIME&min_channel_version=null&os=Windows_NT&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2016-09-16&table=0&trim=1&use_submission_date=0
[1] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Histograms.json#3788
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Attached patch WIP Patch (obsolete) — Splinter Review
This patch is based on henry's patch in bug 1315097 to convert table name to provider.
I did not follow the idea of adding provider key, but I think it is not much helpful, probably I am missing something.
Assignee: dlee → tnguyen
Attachment #8808937 - Attachment is obsolete: true
Added provider key. I think it's better to replace the old URLCLASSIFIER_CL_UPDATE_TIME and used the new URLCLASSIFIER_CL_KEYED_UPDATE_TIME with key "All"
Comment on attachment 8812669 [details]
Bug 1315893 - Add telemetry to measure update time for V2 and V4.

https://reviewboard.mozilla.org/r/94340/#review94786

::: toolkit/components/telemetry/Histograms.json:4069
(Diff revision 3)
>      "n_buckets": 10,
>      "description": "Time spent per classifier lookup (ms)"
>    },
> -  "URLCLASSIFIER_CL_UPDATE_TIME": {
> +  "URLCLASSIFIER_CL_KEYED_UPDATE_TIME": {
>      "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> -    "expires_in_version": "never",
> +    "expires_in_version": "58",

We can use 59 now that mozilla-central is 53 (it's always current + 6).

However in this case, since we're updating an existing non-expiring probe, I believe we can leave it as `never`.

::: toolkit/components/telemetry/Histograms.json:4073
(Diff revision 3)
>      "alert_emails": ["safebrowsing-telemetry@mozilla.org"],
> -    "expires_in_version": "never",
> +    "expires_in_version": "58",
> +    "keyed": true,
>      "kind": "exponential",
>      "low": 20,
> -    "high": 15000,
> +    "high": 60000,

Should we increase that to 120s to go all the way to what I assume is the network timeout?

::: toolkit/components/telemetry/Histograms.json:4074
(Diff revision 3)
> -    "expires_in_version": "never",
> +    "expires_in_version": "58",
> +    "keyed": true,
>      "kind": "exponential",
>      "low": 20,
> -    "high": 15000,
> -    "n_buckets": 15,
> +    "high": 60000,
> +    "n_buckets": 50,

Do we really need to increase the number of buckets that much? It seems like we probably don't need that much granularity.

::: toolkit/components/url-classifier/Classifier.cpp:512
(Diff revision 3)
> +    return NS_OK;
> +  }
> +
> +  // Assume all TableUpdate objects should have the same provider.
> +  Telemetry::AutoTimer<Telemetry::URLCLASSIFIER_CL_KEYED_UPDATE_TIME>
> +    keyedTimer(GetProvider((*aUpdates)[0]->TableName()));

Because the provider name could leak user-controlled strings, we will need to filter this so that it's one of:

- mozilla
- google
- google4
- other

That way, we only send the names of the built-in providers, not any custom ones that may have been added locally.

::: toolkit/components/url-classifier/Classifier.cpp:513
(Diff revision 3)
> +  }
> +
> +  // Assume all TableUpdate objects should have the same provider.
> +  Telemetry::AutoTimer<Telemetry::URLCLASSIFIER_CL_KEYED_UPDATE_TIME>
> +    keyedTimer(GetProvider((*aUpdates)[0]->TableName()));
> +  // 'All' will record all providers

Good catch.

Do you have a use case in mind for the 'All' category?

I can see how we will use the data to  compare providers and to evaluate them individually, but I can't think of a reason why we'd need to look at the aggregate number. The mozilla and google4 stacks don't have much in common.
Attachment #8812669 - Flags: review?(francois) → review-
Rebased commit.
Comment on attachment 8812669 [details]
Bug 1315893 - Add telemetry to measure update time for V2 and V4.

https://reviewboard.mozilla.org/r/94340/#review95128

::: toolkit/components/telemetry/Histograms.json:4069
(Diff revision 5)
>      "expires_in_version": "never",
> +    "keyed": true,
>      "kind": "exponential",
>      "low": 20,
> -    "high": 15000,
> +    "high": 120000,
>      "n_buckets": 15,

We need to increase the number of buckets a little since we increased the "high" value a lot.

Maybe 30 buckets?

::: toolkit/components/url-classifier/Classifier.cpp:511
(Diff revision 5)
> -  Telemetry::AutoTimer<Telemetry::URLCLASSIFIER_CL_UPDATE_TIME> timer;
> +  if (!aUpdates || aUpdates->Length() == 0) {
> +    return NS_OK;
> +  }
> +
> +  // Assume all TableUpdate objects should have the same provider.
> +  nsCString provider = GetProvider((*aUpdates)[0]->TableName());

I think we're going to add a similar check in other V2/V4 telemetry so maybe we need a GetTelemetryProvider() function which returns the provider but filtered for telemetry.

::: toolkit/components/url-classifier/Classifier.cpp:516
(Diff revision 5)
> +  nsCString provider = GetProvider((*aUpdates)[0]->TableName());
> +  // Filter out build-in providers: mozilla, google, google4
> +  if (!NS_LITERAL_CSTRING("mozilla").Equals(provider) &&
> +      !NS_LITERAL_CSTRING("google").Equals(provider) &&
> +      !NS_LITERAL_CSTRING("google4").Equals(provider)) {
> +    provider.Assign(NS_LITERAL_CSTRING("Others"));

"Other"
Attachment #8812669 - Flags: review?(francois) → review-
Comment on attachment 8812669 [details]
Bug 1315893 - Add telemetry to measure update time for V2 and V4.

https://reviewboard.mozilla.org/r/94340/#review95128

> I think we're going to add a similar check in other V2/V4 telemetry so maybe we need a GetTelemetryProvider() function which returns the provider but filtered for telemetry.

Thanks you,
Agree, I am going to add the function in bug 1311910. Will add it here.
Thanks Francois for your review.
I added getTelemetryProvider to idl file for later using (probably URLCLASSIFIER_COMPLETE_TIMEOUT and URLCLASSIFIER_COMPLETE_REMOTE_STATUS in js file and URLCLASSIFIER_UPDATE_ERROR in bug 1311910).
Thanks
Comment on attachment 8812669 [details]
Bug 1315893 - Add telemetry to measure update time for V2 and V4.

https://reviewboard.mozilla.org/r/94340/#review95412

datareview+ and r+

::: toolkit/components/url-classifier/nsIUrlClassifierUtils.idl:37
(Diff revision 6)
>    ACString getProvider(in ACString tableName);
>  
>    /**
> +   * Get the provider used for Telemetry.
> +   * Because recording Telemetry will leak user-controlled strings,
> +   * only build in providers ("mozilla", "google", "google4")

"built-in providers"

I would also remove the list of them because it might change in the future and we don't want to have to remember to update the comment too.
Attachment #8812669 - Flags: review?(francois) → review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/631f977c50ee
Add telemetry to measure update time for V2 and V4. r=francois
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/631f977c50ee
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.