
Stage crontabber didn't report error with Signature Summary

RESOLVED FIXED

Status

5 years ago
2 years ago

People

(Reporter: selenamarie, Assigned: bburton)


Can we have a look at the crontabber monitoring for stage Socorro? 

We had a busted job that had been erroring since 8/4, but I didn't see any alerts. Either I missed the first alert, or we have this configured to be less annoying than it should be.
(Assignee)

Comment 1

5 years ago
I looked through the Nagios config. The check runs check_crontabber, which is a small wrapper script around crontabber.py:

[root@socorroadm.stage.private.phx1 ~]# locate check_crontabber
/etc/nagios/nrpe.d/check_crontabber.cfg
/home/eziegenhorn/check_crontabber.py
/usr/lib64/nagios/plugins/custom/check_crontabber.sh
[root@socorroadm.stage.private.phx1 ~]# cat /usr/lib64/nagios/plugins/custom/check_crontabber.sh
#!/bin/bash
# Call crontabber.py to check the status of Socorro cron jobs

PYTHONPATH=/data/socorro/application:/data/socorro/thirdparty/ /data/socorro/application/socorro/cron/crontabber.py --admin.conf=/etc/socorro/crontabber.ini --nagios
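For context, Nagios and NRPE decide a check's state from the plugin's exit status, following the standard plugin convention (0 is OK, 1 is WARNING, 2 is CRITICAL, 3 is UNKNOWN). The sketch below only illustrates that mapping; the helper name `nagios_state_label` is hypothetical and is not part of Socorro or of the wrapper above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Socorro): translate a Nagios plugin
# exit status into the state name Nagios would assign to the check.
nagios_state_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3, or anything outside the convention
    esac
}

# Example use after running any plugin by hand:
#   /usr/lib64/nagios/plugins/custom/check_crontabber.sh
#   nagios_state_label "$?"
```

Running the wrapper by hand and inspecting `$?` this way is a quick way to confirm what state the check should be reporting, independent of whether notifications are being sent.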
Assignee: server-ops-webops → bburton
Status: NEW → ASSIGNED
(Assignee)

Comment 2

5 years ago
We tracked this down: the check was ACKed on 7/26, see https://nagios.mozilla.org/phx1/cgi-bin/extinfo.cgi?type=2&host=socorroadm.stage.private.phx1.mozilla.com&service=Socorro+Admin+-+crontab

When you ACK a check, Nagios suppresses further alerts for it until the check returns OK and the acknowledgement is cleared, which this check never did until today:

nagios-phx1 | Wed 12:37:00 PDT [1338] socorroadm.stage.private.phx1.mozilla.com:Socorro Admin - crontab is OK: OK - All systems nominal (http://m.allizom.org/Socorro+Admin+-+crontab)
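To make the ACK behaviour concrete, here is a deliberately simplified model of it. This is an assumption-laden sketch, not Nagios code: the function name `simulate_notifications` and its output strings are made up for illustration, and it ignores notification intervals, soft states, and non-sticky acknowledgements.

```shell
#!/bin/bash
# Simplified model (an assumption, not Nagios source) of an acknowledgement
# that stays in place until the check recovers.
# Arguments: a sequence of events, each either a check result
# (OK, WARNING, CRITICAL) or the literal ACK.
simulate_notifications() {
    local acked=0 event
    for event in "$@"; do
        case "$event" in
            ACK)
                acked=1                      # operator acknowledges the problem
                ;;
            OK)
                acked=0                      # recovery clears the acknowledgement
                echo "OK: ack cleared"
                ;;
            *)
                if [ "$acked" -eq 1 ]; then
                    echo "$event: suppressed"   # still failing, but ACKed
                else
                    echo "$event: notify"
                fi
                ;;
        esac
    done
}

simulate_notifications CRITICAL ACK CRITICAL CRITICAL OK CRITICAL
```

In this model, every failure after the ACK is silent until an OK arrives. That matches what happened here: a job that started failing while the check was already acknowledged and still failing produced no new alert.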
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard