Open Bug 1104368 Opened 10 years ago Updated 1 month ago

Alert emails for significant changes in volume of (parts of) telemetry submissions

Categories

(Toolkit :: Telemetry, defect)

x86_64
Windows 8.1
defect
Not set
normal

Tracking

REOPENED

People

(Reporter: Gijs, Unassigned)

Details

After some IRC discussion with Jared about tests for telemetry submissions (related to bug 1100914), we thought we should have some way to ensure we don't accidentally break telemetry somewhere between the client submitting data and that data being ingested/visualized. Client-side tests would only check what happens on the client.

We don't know what is already in place for this purpose, but we thought it could work like the current Talos alerts, which are sent by email and are now also visualized in a web tool, so that "suspicious" changes in the volume of telemetry submitted are easy to track.

This would likely have to be based on the number of submitting unique ids, and the amount of data they submit for each 'part', so ideally it would be able to tell us something like "50% reduction in number of profiles submitting simple measurements - shutdownDuration" or such.
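To make the idea concrete, here is a minimal sketch of such a check. It assumes each ping can be reduced to a (client id, list of reported measure names) pair; that shape, the function names, the measure name strings and the 50% threshold are all hypothetical and not taken from the actual telemetry pipeline.

```python
from collections import defaultdict


def clients_per_measure(pings):
    """Map each measure name to the set of client ids that reported it.

    `pings` is an iterable of (client_id, measure_names) pairs; this shape
    is an assumption made for the sketch, not the real ping format.
    """
    seen = defaultdict(set)
    for client_id, measure_names in pings:
        for name in measure_names:
            seen[name].add(client_id)
    return seen


def volume_alerts(previous_pings, current_pings, threshold=0.5):
    """Return alert messages for measures whose number of distinct
    submitting clients dropped by at least `threshold` between periods."""
    before = clients_per_measure(previous_pings)
    after = clients_per_measure(current_pings)
    alerts = []
    for name, old_clients in sorted(before.items()):
        old = len(old_clients)  # always >= 1: sets are created on first add
        new = len(after.get(name, ()))
        drop = (old - new) / old
        if drop >= threshold:
            alerts.append(
                f"{drop:.0%} reduction in number of profiles submitting {name}"
            )
    return alerts
```

A measure that disappears entirely from the current period shows up as a 100% reduction, which is the "accidentally broke telemetry" case this bug is about.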

Is such a thing feasible, and does that sound like a useful idea to the people who would be concerned with building it? :-)

We currently have email alerts and visualizations based on the telemetry submission rate, broken down by channel.

There are also alerts based on the rate of processing/validation errors when we convert the data for long term storage.

In terms of looking "inside" the data, Roberto built a histogram-change alerting system as well. Maybe that system could be extended to look at changes in occurrence of different measures (instead of the values for that measure)?
Flags: needinfo?(rvitillo)
We certainly would benefit from what Gijs is suggesting but we can't use cerberus for that. We would need to do something similar to what we are doing for the overall Telemetry submission rate and run it individually on each metric.

So yes, it's feasible and we should look into it.
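As a rough illustration of running a rate check individually on each metric, the sketch below compares the latest daily count for every metric against a trailing baseline. How the existing overall submission-rate alert is computed is not described in this bug, so the median baseline, the 7-day window, the 30% tolerance and the input shape are all assumptions.

```python
from statistics import median


def metric_rate_alerts(daily_counts, window=7, tolerance=0.3):
    """Flag metrics whose latest daily count deviates from their baseline.

    `daily_counts` maps a metric name to a list of daily submission counts,
    oldest first (a hypothetical input shape). The baseline is the median of
    the `window` days before the latest one. Returns a dict mapping each
    flagged metric to its relative change.
    """
    alerts = {}
    for metric, counts in daily_counts.items():
        if len(counts) <= window:
            continue  # not enough history to form a baseline
        baseline = median(counts[-window - 1:-1])
        if baseline == 0:
            continue  # relative change is undefined for an empty baseline
        change = (counts[-1] - baseline) / baseline
        if abs(change) > tolerance:
            alerts[metric] = change
    return alerts
```

Using a median rather than a mean keeps a single unusual day in the window from shifting the baseline much; a real system would probably also need to account for weekly seasonality and per-channel differences.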
Flags: needinfo?(rvitillo)
Product: Webtools → Webtools Graveyard
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → INCOMPLETE

This is still a valid thing to want... This may not be the best component but hopefully the telemetry folks know where this should live instead.

Status: RESOLVED → REOPENED
Component: Telemetry Server → Telemetry
Product: Webtools Graveyard → Toolkit
Resolution: INCOMPLETE → ---