Closed Bug 1050141 Opened 8 years ago Closed 8 years ago

Add Telemetry for ApplicationReputation remote responses

Categories

(Toolkit :: Downloads API, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla34

People

(Reporter: gcp, Assigned: gcp)

Details

Attachments

(2 files)

We should track the responses from the remote server, not just whether they're valid.
Assignee: nobody → gpascutto
Status: NEW → ASSIGNED
Attachment #8469133 - Flags: review?(mmc)
Comment on attachment 8469133 [details] [diff] [review]
Add Telemetry for ApplicationReputation remote responses. r=

Review of attachment 8469133 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8469133 - Flags: review?(mmc) → review+
https://hg.mozilla.org/mozilla-central/rev/5edac09d4b93
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment on attachment 8469133 [details] [diff] [review]
Add Telemetry for ApplicationReputation remote responses. r=

Review of attachment 8469133 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/Histograms.json
@@ +48,5 @@
> +  "APPLICATION_REPUTATION_SERVER_RESPONSE": {
> +    "expires_in_version": "never",
> +    "kind": "enumerated",
> +    "n_values": 5,
> +    "description": "Application reputation remote response (0=SAFE, 1=DANGEROUS, 2=UNCOMMON, 3=POTENTIALLY_UNWANTED, 4=DANGEROUS_HOST)"

It occurs to me that Google could add a new verdict at any time and this accumulation will break because n_values = 5. Too late to make the bucket bigger?
Quite a few things in play here:

1) Would the new response get as far as the Accumulate without us updating our protobuff stuff?
2) What does the Accumulate do if the value is out of bounds? Reject or clamp?

If we update to handle new responses then then I would expect the distribution of responses to change anyway, but I'm not sure if we need to replace the probe by one with a new name or can just update n_values or something.
Flags: needinfo?(taras.mozilla)
Monica, do you know what happens on the protobuff side with new verdicts?
Flags: needinfo?(mmc)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)
> Monica, do you know what happens on the protobuff side with new verdicts?

Yes. It fails to deserialize and will fail with SERVER_RESPONSE_FAILED, so that's working as intended. We don't want to take action on a new verdict without knowing what it is.

I think the last time I tried changing the type of a bucket without making a new one, dashboards were borked for about a week and then recovered with the new data. Of course, I have no idea what happens on the aggregator side and that's probably not what the telemetry folks intended.
Flags: needinfo?(mmc)
(In reply to [:mmc] Monica Chew (please use needinfo) from comment #8)
> (In reply to Gian-Carlo Pascutto [:gcp] from comment #7)
> > Monica, do you know what happens on the protobuff side with new verdicts?
> 
> Yes. It fails to deserialize and will fail with SERVER_RESPONSE_FAILED, so
> that's working as intended. We don't want to take action on a new verdict
> without knowing what it is.

Or rather, it will deserialize to an int and then won't match a known verdict so no action will be taken.
At what level will no action be taken? If it's in the switch on response.verdict(), then we can accumulate the Telemetry with an out of bounds value, which could be an issue. That's quite a difference from failing to deserialize.

I think I'll amend this patch to add one to n_values and catch all unknown values into that. It'll give us warning if Google updates and "forgets" to tell us, too.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #10)
> At what level will no action be taken? If it's in the switch on
> response.verdict(), then we can accumulate the Telemetry with an out of
> bounds value, which could be an issue. That's quite a difference from
> failing to deserialize.
> 
> I think I'll amend this patch to add one to n_values and catch all unknown
> values into that. It'll give us warning if Google updates and "forgets" to
> tell us, too.

Why stop at one? :) Please make it at least 20. That should buy us at least a year.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
> Quite a few things in play here:
> 
> 1) Would the new response get as far as the Accumulate without us updating
> our protobuff stuff?
> 2) What does the Accumulate do if the value is out of bounds? Reject or
> clamp?
> 
> If we update to handle new responses then then I would expect the
> distribution of responses to change anyway, but I'm not sure if we need to
> replace the probe by one with a new name or can just update n_values or
> something.

I don't even remember how to read C++ anymore :) Don't try to extract function invariants out of management, this can only end in tears.
Flags: needinfo?(taras.mozilla)
>Please make it at least 20. That should buy us at least a year.

I'm not sure mixing histograms with entirely different meanings (i.e. if the possible values change) buys us much, but then again I don't want to face the hardest problem in computer science - finding a good new name for the historgram - too much, so I'll likely round up to 8 or so, giving us 2 new buckets for future growth and a "something else still" bucket for the same number of bits.
Clamp the responses and leave some room to for potential new 
responses to get added.
Attachment #8474553 - Flags: review?(mmc)
Attachment #8474553 - Flags: review?(mmc) → review+
You need to log in before you can comment on or make changes to this bug.