Closed Bug 790621 Opened 12 years ago Closed 9 years ago

Telemetry should warn when a histogram disappears

Categories

(Mozilla Metrics :: Frontend Reports, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Backlogged - BZ

People

(Reporter: justin.lebar+bug, Unassigned)

Details

(Whiteboard: [Telemetry])

I now have two examples of us losing telemetry data because of a presumed bug on the client side: bug 790615 and bug 741378.

I expect these bugs will continue to happen.

It's critical that we have a way to detect this kind of regression quickly.  That will require assistance from the people managing the telemetry data.

You can tell the difference between a regression and us intentionally removing a histogram by parsing TelemetryHistograms.h.
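
A minimal sketch of that check, assuming the set of currently defined histogram names has already been extracted from TelemetryHistograms.h (the extraction itself is not shown, and the function and argument names here are hypothetical). A histogram that stops arriving but is still defined is a suspected regression; one that is no longer defined was presumably removed on purpose.

```python
def classify_disappeared(seen_yesterday, seen_today, still_defined):
    """Split histograms that vanished from the data into two groups.

    All three arguments are iterables of histogram names (hypothetical inputs):
      seen_yesterday -- names reported in yesterday's pings
      seen_today     -- names reported in today's pings
      still_defined  -- names currently listed in the histogram definitions
    """
    disappeared = set(seen_yesterday) - set(seen_today)
    defined = set(still_defined)
    # Still defined in the source, but no data: likely a client-side bug.
    suspected_regressions = sorted(disappeared & defined)
    # No longer defined: the removal was presumably intentional.
    intentional_removals = sorted(disappeared - defined)
    return suspected_regressions, intentional_removals
```

Only the first list would need a human to look at it, which keeps the warning volume low.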
Blocks: 790615, 741378
We can do that in CDV: detect differences between the build of day N and the build of day N-1.

How often are things removed? We could send an email and someone could manually check for that.
> How often are things removed?

Good question.

In the month of August, I see 7 changes which removed at least one histogram.  So figure roughly twice a week.

I'm not convinced a human will pay close attention to these messages for more than a month or two if they have to check the hg log twice a week, given that the vast majority of the time we expect the histogram removal to be intentional.

But if you give us the data in machine-readable form, /we/ can parse the histograms JSON file...

Perhaps a simple thing to do would be to create once a day a machine-readable file containing a list of all the (histogram, platform, buildid, reason) tuples for which we've received at least one ping that day, and put it up on a webserver somewhere.
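
As a rough sketch of what generating that daily file could look like: the ping layout below (a dict with "platform", "buildid", "reason" and a "histograms" mapping) is an assumption made for illustration, not the actual Telemetry ping format, and the function names are hypothetical.

```python
import json


def daily_tuples(pings):
    """Collect every (histogram, platform, buildid, reason) seen in one day's pings."""
    seen = set()
    for ping in pings:
        # Assumed layout: "histograms" maps histogram name -> histogram data.
        for name in ping.get("histograms", {}):
            seen.add((name, ping["platform"], ping["buildid"], ping["reason"]))
    return sorted(seen)


def to_json(tuples):
    """Serialize the tuples as a JSON list of objects, ready to publish."""
    keys = ("histogram", "platform", "buildid", "reason")
    return json.dumps([dict(zip(keys, t)) for t in tuples], indent=2)
```

The output of to_json(daily_tuples(pings)) is what would be put up on the webserver once a day.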
The plan is to validate the incoming data against the Telemetry data schema. Will this schema validation automatically catch this type of regression?
(In reply to Lawrence Mandel [:lmandel] from comment #3)
> The plan is to validate the incoming data against the Telemetry data schema.
> Will this schema validation automatically catch this type of regression?

Not if you're only validating incoming data points one at a time.  It's perfectly valid for any one telemetry ping to be missing any one histogram.  We need to be looking at this in aggregate.
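
To illustrate the difference, here is a small sketch under simplifying assumptions: each ping is reduced to the set of histogram names it carries, and "valid" just means it contains no unknown names. Both function names are hypothetical.

```python
def ping_is_valid(ping_histograms, known):
    """Per-ping check: a ping may omit histograms, but must not invent new ones."""
    return set(ping_histograms) <= set(known)


def never_reported(all_pings, known):
    """Aggregate check: known histograms that no ping in the batch reported."""
    reported = set()
    for ping_histograms in all_pings:
        reported.update(ping_histograms)
    return sorted(set(known) - reported)
```

With known = {"HIST_A", "HIST_B", "HIST_C"} and pings carrying {"HIST_A"}, {"HIST_B"} and {"HIST_A", "HIST_B"}, every ping passes ping_is_valid, yet never_reported returns ["HIST_C"]. Only the aggregate view shows that a histogram has gone missing.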

> Perhaps a simple thing to do would be to create once a day a machine-readable file containing a list 
> of all the (histogram, platform, buildid, reason) tuples for which we've received at least one ping 
> that day, and put it up on a webserver somewhere.

I'm thinking this file would be more useful if it included an estimate of how many pings for that tuple we received that day, instead of just indicating a boolean yes we got at least one / no we didn't.
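
A sketch of how per-tuple counts might be produced and used, again assuming a hypothetical ping layout ("platform", "buildid", "reason", and a "histograms" mapping). The 10% threshold in sharp_drops is an arbitrary illustrative value, not a recommendation.

```python
from collections import Counter


def count_tuples(pings):
    """Count pings per (histogram, platform, buildid, reason) tuple."""
    counts = Counter()
    for ping in pings:
        # Assumed layout: "histograms" maps histogram name -> histogram data.
        for name in ping.get("histograms", {}):
            counts[(name, ping["platform"], ping["buildid"], ping["reason"])] += 1
    return counts


def sharp_drops(yesterday, today, ratio=0.1):
    """Keys whose count today fell below `ratio` times yesterday's count.

    Both arguments map a key to a ping count. A key absent today counts as 0.
    """
    return sorted(key for key, n in yesterday.items()
                  if today.get(key, 0) < n * ratio)
```

With counts available, a histogram that drops from thousands of pings to a handful can be flagged even though a boolean "seen at least once" report would still say yes.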
Target Milestone: Unreviewed → Backlogged - BZ
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX