Open Bug 1442617 Opened 6 years ago Updated 2 years ago

Highlight histogram clamping errors to developers

Categories

(Toolkit :: Telemetry, enhancement, P3)

Tracking Status
firefox60 --- affected

People

(Reporter: gfritzsche, Unassigned)

In bug 1440832 we had to disable error console logging for histogram value clamping because we were spamming the console.

We should still figure out how to bring these issues to developers' attention, though.
Some thoughts:
- Could we blacklist the clamping errors that currently spam the console and log any new ones to the error console?
- Could we generically make mochitest-browser tests and others fail when a clamping error occurs? E.g. from some error message?
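
The first idea could be sketched roughly as below. This is only an illustration: the probe names, the set, and the function are hypothetical and are not existing Telemetry code, and the real check would presumably live wherever the clamping message is emitted.

```python
# Hypothetical set of probes already known to trigger clamping errors.
# The names are placeholders, not real probes.
KNOWN_CLAMPING_PROBES = frozenset({"EXAMPLE_PROBE_A", "EXAMPLE_PROBE_B"})


def should_log_clamping_error(probe_name, known=KNOWN_CLAMPING_PROBES):
    """Return True only for probes that are not already known offenders."""
    return probe_name not in known
```

With such a filter, the known offenders stay silent while any newly clamping probe still reaches the error console.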
(In reply to Georg Fritzsche [:gfritzsche] from comment #1)
> - Could we generically make mochitest-browser tests and others fail when a
> clamping error occurs? E.g. from some error message?

Maybe Raphael has ideas here?
Flags: needinfo?(rpierzina)
Do the console log messages provide additional context about which telemetry measurements caused the clamping errors?

Do you think it's sufficient to notify developers that there are clamping issues or do we need to provide any specifics?

Can we persist the log messages (for example by writing them to disk) and does CI allow for generating and storing custom testing artifacts? If so, we could collect and read these artifacts in an extra testing stage and cause a test failure in case of any errors. Ideally we would provide extra context about the telemetry measurements that caused the clamping errors to make it easier for developers to resolve the issues.
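
A minimal sketch of such an extra stage, assuming the console output has been persisted as plain text lines. The exact text of the clamping message is not given here, so the pattern below is an assumption, and both function names are hypothetical.

```python
import re

# Assumed marker for clamping messages; the real console text may differ.
CLAMP_PATTERN = re.compile(r"clamp", re.IGNORECASE)


def find_clamping_errors(log_lines):
    """Return the log lines that look like clamping errors."""
    return [line.rstrip("\n") for line in log_lines if CLAMP_PATTERN.search(line)]


def check_log(log_lines):
    """Raise AssertionError if any clamping error appears in the log."""
    errors = find_clamping_errors(log_lines)
    if errors:
        raise AssertionError(
            "Histogram clamping errors found:\n" + "\n".join(errors)
        )
```

The failure message repeats the offending lines, so whatever context the log carries about the measurement ends up in the test output.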
Flags: needinfo?(rpierzina)
The keys of the keyed uint scalar telemetry.accumulate_clamped_value are the names of the probes against which an overly large value was accumulated.[1]

(The value of the scalar will be 2x the number of accumulations made against that probe in that subsession. Why 2x? Because of implementation details that were fixed in bug 1320052.)

[1]: https://mzl.la/2Ke6wz0
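
To illustrate the 2x note: assuming the keyed scalar has been read out as a plain mapping from probe name to value (the mapping shape and the function name are assumptions for this sketch), the per-probe counts could be recovered by halving.

```python
def clamped_accumulations(keyed_scalar):
    """Halve each value to undo the 2x counting described above.

    `keyed_scalar` is assumed to map probe names to the recorded
    telemetry.accumulate_clamped_value values.
    """
    return {probe: value // 2 for probe, value in keyed_scalar.items()}
```

For example, a recorded value of 4 for a probe would correspond to 2 accumulations in that subsession.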
(In reply to Raphael Pierzina [:raphael] UTC+01:00 from comment #3)
> Can we persist the log messages (for example by writing them to disk) and
> does CI allow for generating and storing custom testing artifacts? If so, we
> could collect and read these artifacts in an extra testing stage and cause a
> test failure in case of any errors. Ideally we would provide extra context
> about the telemetry measurements that caused the clamping errors to make it
> easier for developers to resolve the issues.

We do have a bug about making treeherder fail on Telemetry errors (bug 1324774). We could look for specific log messages and make it fail on CI :)
Raphael, is the test harness support something that could fit into your future plans?
Flags: needinfo?(rpierzina)
As discussed during the Firefox Telemetry workweek, we will:

- investigate the capability to fail the test suite on Telemetry errors
- design a log format that we can parse in the test harness and scan for these errors
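
One possible shape for such a log format, purely as a sketch: a fixed prefix followed by a JSON payload. The prefix, the field names, and both functions are hypothetical; no format had been decided in this bug.

```python
import json

# Hypothetical marker that a harness could scan for.
PREFIX = "TELEMETRY-ERROR | "


def format_error(kind, probe):
    """Build one parseable log line for a Telemetry error."""
    return PREFIX + json.dumps({"kind": kind, "probe": probe}, sort_keys=True)


def parse_error(line):
    """Return the payload dict for a Telemetry error line, else None."""
    if not line.startswith(PREFIX):
        return None
    return json.loads(line[len(PREFIX):])
```

A structured payload would let the harness report which probe was affected instead of only signalling that some error occurred.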
Flags: needinfo?(rpierzina)
Priority: -- → P3
Severity: normal → S3