Can we have a look at the crontabber monitoring for stage Socorro? We had a busted job that had been erroring since 8/4, but I didn't see alerts -- either I missed the first alert, or maybe we have this configured to be less annoying than it should be.
I looked through the nagios config and the check is running check_crontabber, which is

[email@example.com ~]# locate check_crontabber
/etc/nagios/nrpe.d/check_crontabber.cfg
/home/eziegenhorn/check_crontabber.py
/usr/lib64/nagios/plugins/custom/check_crontabber.sh
[firstname.lastname@example.org ~]# cat /usr/lib64/nagios/plugins/custom/check_crontabber.sh
#!/bin/bash
# Call crontabber.py to check the status of Socorro cron jobs
PYTHONPATH=/data/socorro/application:/data/socorro/thirdparty/ /data/socorro/application/socorro/cron/crontabber.py --admin.conf=/etc/socorro/crontabber.ini --nagios
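For context, the wrapper passes --nagios to crontabber.py, and Nagios plugins conventionally report status through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of output. The following is a rough sketch of that convention only. The nagios_status function and the job record shape ('name', 'error_count') are hypothetical and are not taken from crontabber itself.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_status(jobs):
    """Map job records to an (exit_code, message) pair.

    Each job is a dict with 'name' and 'error_count' keys. This shape is
    an assumption for illustration, not crontabber's real state format.
    """
    failing = sorted(j["name"] for j in jobs if j["error_count"] > 0)
    if failing:
        return CRITICAL, "CRITICAL - failing jobs: " + ", ".join(failing)
    return OK, "OK - All systems nominal"
```

A real plugin would print the message and exit with the code, for example via sys.exit(code), so that Nagios can pick up both.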
Assignee: server-ops-webops → bburton
Status: NEW → ASSIGNED
We tracked this down: the check was given an ACK on 7/26.

https://nagios.mozilla.org/phx1/cgi-bin/extinfo.cgi?type=2&host=socorroadm.stage.private.phx1.mozilla.com&service=Socorro+Admin+-+crontab

When you ACK a check, Nagios does not re-alert until the check returns an OK, which this check never did until today:

nagios-phx1 | Wed 12:37:00 PDT  socorroadm.stage.private.phx1.mozilla.com:Socorro Admin - crontab is OK: OK - All systems nominal (http://m.allizom.org/Socorro+Admin+-+crontab)
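To illustrate why no alerts arrived after 8/4, here is a toy model of the acknowledgement behavior described above. It is a simplified sketch, not Nagios code: the AckedService class and its method names are made up for illustration, and real Nagios has more moving parts (notification intervals, non-sticky acknowledgements, and so on).

```python
class AckedService:
    """Toy model of a monitored service with an acknowledgement that
    persists until the service recovers."""

    def __init__(self):
        self.state = "OK"
        self.acknowledged = False

    def acknowledge(self):
        # An ACK only makes sense while the service is in a problem state.
        if self.state != "OK":
            self.acknowledged = True

    def process_check(self, new_state):
        """Record a check result and return True if a problem
        notification would be sent."""
        self.state = new_state
        if new_state == "OK":
            # Recovery clears the acknowledgement.
            self.acknowledged = False
            return False
        # Problem states stay silent while the acknowledgement holds.
        return not self.acknowledged
```

In this model, a service that was ACKed while failing and never returns OK stays silent, even if it starts failing for a completely different reason later. That matches what happened here between 7/26 and the recovery.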
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard