Closed Bug 1538417 Opened 6 years ago Closed 4 years ago

Monitor results of update-signatures-daily

Categories

(Cloud Services :: Operations: Normandy, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brian, Assigned: brian)

Details

We have a k8s cron job [0] that runs daily to update any signatures that will expire soon. Currently, if a k8s error prevented the job from running, or a code error made it fail without sending a message to Sentry, we would not know. We should fix this.

My preferred fix is for the update signatures command [1] to submit metrics for the number of recipes and actions it signed and unsigned. Then, if no data arrives for these metrics for more than 24 hours, we can alert.
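A minimal sketch of the preferred fix, assuming an in-memory stand-in for the real metrics backend. The metric names, `MetricStore`, and `report_signing_run` are hypothetical, not part of Normandy. The key point it illustrates: the command should emit every count on every run, including zeros, so that a missing data point means "the job did not run" rather than "there was nothing to sign".

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical metric names; the bug only says "recipes and actions signed and unsigned".
METRIC_NAMES = (
    "recipes_signed",
    "recipes_unsigned",
    "actions_signed",
    "actions_unsigned",
)

# Alert when a metric has had no data for more than 24 hours.
MAX_SILENCE_SECONDS = 24 * 60 * 60


@dataclass
class MetricStore:
    """In-memory stand-in for a real metrics backend (an assumption for this sketch)."""

    last_value: Dict[str, int] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, name: str, value: int, now: float) -> None:
        self.last_value[name] = value
        self.last_seen[name] = now

    def stale_metrics(self, now: float) -> List[str]:
        """Return the metrics with no data point within MAX_SILENCE_SECONDS of now."""
        stale = []
        for name in METRIC_NAMES:
            seen = self.last_seen.get(name)
            if seen is None or now - seen > MAX_SILENCE_SECONDS:
                stale.append(name)
        return stale


def report_signing_run(
    store: MetricStore,
    recipes_signed: int,
    recipes_unsigned: int,
    actions_signed: int,
    actions_unsigned: int,
    now: Optional[float] = None,
) -> None:
    """Emit all four counts, including zeros, after each run of the command."""
    now = time.time() if now is None else now
    counts = {
        "recipes_signed": recipes_signed,
        "recipes_unsigned": recipes_unsigned,
        "actions_signed": actions_signed,
        "actions_unsigned": actions_unsigned,
    }
    for name, value in counts.items():
        # Zero is still data: it means the job ran but had nothing to (un)sign.
        store.record(name, value, now)
```

For example, after `report_signing_run(store, 3, 0, 1, 0, now=0)`, `store.stale_metrics(3600)` returns an empty list, while `store.stale_metrics(25 * 3600)` returns all four names, which is the condition an alert would fire on.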

An alternative fix would be for ops to create a wrapper command that submits a different metric depending on the exit status of the wrapped command, and to run the update signatures command through that wrapper. This idea is discussed further on Mana [2].
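A rough sketch of the wrapper approach, assuming metrics are written in InfluxDB line protocol. The measurement names `cronjob_success` / `cronjob_failure`, the `job` tag, and both function names are hypothetical; the approach actually described on Mana may differ. Sending the point to InfluxDB is deliberately left out.

```python
import subprocess
import time
from typing import Dict, List, Tuple


def line_protocol(measurement: str, tags: Dict[str, str], fields: Dict[str, int], ts_ns: int) -> str:
    """Format one point in InfluxDB line protocol.

    No escaping is done, so tag keys/values must not contain spaces, commas or '='.
    Integer fields get the 'i' suffix that line protocol uses for integers.
    """
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"


def run_and_report(cmd: List[str], job: str = "update-signatures") -> Tuple[int, str]:
    """Run cmd and build a point whose measurement depends on the exit status."""
    result = subprocess.run(cmd)
    outcome = "success" if result.returncode == 0 else "failure"
    point = line_protocol(
        f"cronjob_{outcome}",  # hypothetical measurement names
        {"job": job},
        {"exit_code": result.returncode},
        time.time_ns(),
    )
    # A real wrapper would send `point` to InfluxDB here before returning.
    return result.returncode, point
```

A call such as `run_and_report(["python", "manage.py", "update_signatures"])` (the exact invocation in the cron job is an assumption) returns the exit code plus a success or failure point. Because a crash or a k8s failure to start the job both leave no `cronjob_success` point, alerting on "no success point in the last 24 hours" covers both failure modes described above.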

[0] https://github.com/mozilla-services/cloudops-infra/blob/master/projects/normandy/k8s/charts/admin/templates/update-signatures-cronjob.yaml
[1] https://github.com/mozilla/normandy/blob/d4ee0f81af292954df6513185997fcbc27a59fd4/normandy/recipes/management/commands/update_signatures.py
[2] https://mana.mozilla.org/wiki/display/SVCOPS/InfluxDB+FAQ#InfluxDBFAQ-HowcanIrecordthesuccessorfailureofatask?

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED