quoting :mreid... Critical for trusting data that comes out of the pipeline. We should have a minimum set of monitors for every source coming into the pipeline that watches the basics:
- Submission rates
- gzip errors (for data that comes in compressed)
- json parse errors (for json data)
- decoder errors (for any custom code that runs on incoming records)

Monitoring for particular data sources should be logged as separate bugs.
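To make the four basic signals concrete, here is a minimal Python sketch of per-source counting. It is not the pipeline's actual code (the real monitors were built as Heka plugins in the PRs linked later in this bug). `SourceMonitor`, `ingest`, `require_client_id` and the `clientId` field are hypothetical names chosen for illustration. Submission rates would come from sampling the `submissions` counter at fixed intervals.

```python
import gzip
import json
import zlib
from collections import Counter
from typing import Callable, Optional


class SourceMonitor:
    """Hypothetical per-source counters for the minimum monitor set."""

    def __init__(self, name: str, compressed: bool, is_json: bool,
                 decoder: Optional[Callable[[object], object]] = None):
        self.name = name
        self.compressed = compressed
        self.is_json = is_json
        self.decoder = decoder  # stand-in for custom per-record decoder code
        self.counts = Counter()

    def ingest(self, payload: bytes) -> Optional[object]:
        # Every submission counts toward the rate, even if it fails later.
        self.counts["submissions"] += 1
        data = payload
        if self.compressed:
            try:
                data = gzip.decompress(data)
            except (OSError, EOFError, zlib.error):  # bad header, truncation, corrupt stream
                self.counts["gzip_errors"] += 1
                return None
        record: object = data
        if self.is_json:
            try:
                record = json.loads(data)
            except ValueError:  # JSONDecodeError and invalid UTF-8 are both ValueErrors
                self.counts["json_errors"] += 1
                return None
        if self.decoder is not None:
            try:
                record = self.decoder(record)
            except Exception:  # any failure in custom decoder code
                self.counts["decoder_errors"] += 1
                return None
        return record


def require_client_id(doc):
    # Hypothetical decoder: a missing field raises KeyError.
    return doc["clientId"]


mon = SourceMonitor("telemetry", compressed=True, is_json=True,
                    decoder=require_client_id)
mon.ingest(gzip.compress(b'{"clientId": "abc"}'))  # succeeds
mon.ingest(b"not gzip")                             # gzip error
mon.ingest(gzip.compress(b"{bad json"))             # json parse error
mon.ingest(gzip.compress(b'{"other": 1}'))          # decoder error
print(dict(mon.counts))
```

Each stage stops at the first failure, so a record is counted under exactly one error type. That keeps the error counters additive against the submission count.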
Dashboard will live here: https://metrics.services.mozilla.com/pipeline-monitoring-dashboard/
Repo here: https://github.com/mozilla/pipeline-monitoring-dashboard/
New data is flowing, monitored here: https://pipeline-prototype-cep.prod.mozaws.net/#sandboxes/TelemetryStats/outputs/TelemetryStats.TelemetryDecoderStatistics.cbuf Bespoke dashboards up in a bit.
We've defined the minimum set of common monitors as:
- TelemetryStats drops to zero (cep)
- TelemetryOutput ProcessFileFailures (dwl) (any increase)
- puppet configured plugin terminations
- dwl generic monitor for all plugins

Trink is implementing this, so reassigning to him.
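The first two alert conditions can be sketched as simple checks. This is a hypothetical Python illustration, not the Heka plugin logic from the linked PRs. The function names, the per-interval submission series, and the three-interval window are all assumptions.

```python
from typing import List, Sequence


def drops_to_zero(per_interval: Sequence[int], window: int = 3) -> bool:
    """True if the last `window` intervals all saw zero submissions,
    after at least one earlier interval saw traffic (assumed window)."""
    if len(per_interval) <= window:
        return False  # not enough history to tell "dropped" from "never started"
    recent = per_interval[-window:]
    earlier = per_interval[:-window]
    return all(c == 0 for c in recent) and any(c > 0 for c in earlier)


def any_increase(prev_total: int, curr_total: int) -> bool:
    """Alert on any increase in a cumulative failure counter,
    e.g. a ProcessFileFailures total sampled twice."""
    return curr_total > prev_total


def evaluate(per_interval: Sequence[int], prev_failures: int,
             curr_failures: int, window: int = 3) -> List[str]:
    # Hypothetical driver combining both checks into alert messages.
    alerts = []
    if drops_to_zero(per_interval, window):
        alerts.append("submissions dropped to zero")
    if any_increase(prev_failures, curr_failures):
        alerts.append(f"ProcessFileFailures increased by {curr_failures - prev_failures}")
    return alerts


print(evaluate([120, 98, 0, 0, 0], prev_failures=2, curr_failures=5))
```

Requiring earlier traffic before firing avoids alerting on a source that has not started sending yet. Treating only a strictly larger total as an increase means a counter reset (for example after a restart) does not trigger a false alert.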
Assignee: kparlante → mtrinkala
Summary: Common monitors for all sources ingested by pipeline (minimum set) → Common monitors/alerts for all sources ingested by pipeline (minimum set)
These are the associated PRs:
- https://github.com/mozilla-services/data-pipeline/pull/77
- https://github.com/mozilla-services/data-pipeline/pull/78
- https://github.com/mozilla-services/puppet-config/pull/1363
- https://github.com/mozilla-services/heka/pull/1562
- https://github.com/mozilla-services/puppet-config/pull/1365
The PRs have been reviewed and merged.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED