Monitor results of update-signatures-daily
Categories
(Cloud Services :: Operations: Normandy, task)
Tracking
(Not tracked)
People
(Reporter: brian, Assigned: brian)
Details
We have a k8s cron job [0] that runs daily to update any signatures that will expire soon. Currently, if a k8s error prevented the job from running, or a code error caused the job to fail without sending a message to Sentry, we would not know. We should fix this.
My preferred fix is for the update signatures command [1] to submit metrics for the number of recipes and actions it signed and unsigned. Then, if data is missing for these metrics for more than 24 hours, we can alert.
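A minimal sketch of the metrics idea, assuming a StatsD-style UDP collector. The metric names, the host and port defaults, and the helpers `format_counter`, `submit_signature_metrics`, and `is_stale` are all hypothetical and are not part of Normandy; the real alert would live in the metrics backend rather than in Python.

```python
import socket
import time


def format_counter(name, value):
    """Render one StatsD counter line, e.g. 'normandy.recipes_signed:3|c'."""
    return f"{name}:{int(value)}|c"


def submit_signature_metrics(counts, host="127.0.0.1", port=8125, sock=None):
    """Send one counter per entry in `counts` and return the lines sent.

    Zero counts are sent as well, so that a run with nothing to sign can
    still be told apart from a run that never happened (assuming the
    collector records zero-valued counters).
    """
    own_sock = sock is None
    if own_sock:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    lines = [format_counter(name, value) for name, value in sorted(counts.items())]
    try:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))
    finally:
        if own_sock:
            sock.close()
    return lines


def is_stale(last_seen, now=None, max_age_seconds=24 * 60 * 60):
    """Return True when the newest data point is older than the alert window."""
    now = time.time() if now is None else now
    return (now - last_seen) > max_age_seconds
```

The command would call something like `submit_signature_metrics({"normandy.recipes_signed": n, ...})` at the end of a run, and `is_stale` illustrates the "no data for more than 24 hours" condition the alert would check.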
An alternative fix would be for ops to create a wrapper command that submits a different metric depending on the exit status of the wrapped command, and to wrap the call to update signatures with it. This idea is discussed further in mana. [2]
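The wrapper approach could look roughly like the sketch below. The function `run_and_report`, the metric names, and the `emit` callback are hypothetical; this is not the wrapper described in mana, only an illustration of choosing a metric from the exit status.

```python
import subprocess


def run_and_report(command, emit):
    """Run `command`, emit a success or failure metric, and return the exit code.

    `emit` is any callable that accepts a metric name (hypothetical hook
    for whatever metrics client ops prefers).
    """
    try:
        returncode = subprocess.run(command).returncode
    except OSError:
        # The command could not be started at all (for example, not found).
        returncode = 127
    if returncode == 0:
        emit("update_signatures.success")
    else:
        emit("update_signatures.failure")
    return returncode
```

The cron job would then invoke the wrapper with the update signatures command as its argument list and exit with the returned code, so k8s still sees the original status.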
[0] https://github.com/mozilla-services/cloudops-infra/blob/master/projects/normandy/k8s/charts/admin/templates/update-signatures-cronjob.yaml
[1] https://github.com/mozilla/normandy/blob/d4ee0f81af292954df6513185997fcbc27a59fd4/normandy/recipes/management/commands/update_signatures.py
[2] https://mana.mozilla.org/wiki/display/SVCOPS/InfluxDB+FAQ#InfluxDBFAQ-HowcanIrecordthesuccessorfailureofatask?
Comment 1•6 years ago
Comment 2•4 years ago
I've added an email alert that fires after a day of no data to the dashboard at https://earthangel-b40313e5.influxcloud.net/d/FZfPnJcWz/normandy-signing?orgId=1&from=now-7d&to=now&var-env=prod