We're moving to a new system called Crontabber for running *all* cron jobs. It records any failures and we want to be alerted if any job doesn't exist cleanly. The status of all cron jobs is written to a local crontabbers.json file on sp-admin01, and we also replicate this information to the postgres server on every run. Every failure is also written to the log files.
We're rolling crontabber back right now, at least partly because we don't have a monitoring method. Let's work this out before we attempt to deploy it again.
Assignee: nobody → server-ops
Component: Infra → Server Operations
Product: Socorro → mozilla.org
QA Contact: jdow
Version: unspecified → other
(In reply to Peter Bengtsson [:peterbe] from comment #0)
> We're moving to a new system for running *all* cron jobs called Crontabber.
> It records any failures and we want to be alerted if any job doesn't exist
> cleanly.

s/exist/exit/

In Python you can find all/any errors like this:

    import json
    import sys

    def run():
        errors = 0
        with open('crontabber.json') as f:
            state = json.load(f)
        for key, info in state.items():
            # A non-empty 'last_error' marks a job that failed.
            if info.get('last_error'):
                print(key)
                print(info['last_error'])
                print(info['error_count'])
                print()
                errors += 1
        # Exit status is the number of failing jobs (0 means none).
        sys.exit(errors)

    run()

...in case that helps.
What is the location of the JSON file? We already have a nagios check that looks for JSON output from an HTTP request and alerts based on the response key/value pairs. I can modify that to check for a local file instead.
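For reference, a rough sketch of what a local-file version of such a check might look like. This is not the existing nagios check; it only assumes the state file is a JSON object keyed by job name with a 'last_error' entry per job (as in the snippet from the earlier comment) and uses the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The function names check_state, check_file and main are made up for illustration.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_state(state):
    """Return (exit_code, message) for a parsed state dict.

    Assumes each value is a dict and that a non-empty 'last_error'
    means the job's last run failed.
    """
    failing = sorted(
        name for name, info in state.items()
        if isinstance(info, dict) and info.get('last_error')
    )
    if failing:
        return CRITICAL, 'CRITICAL: %d job(s) failing: %s' % (
            len(failing), ', '.join(failing))
    return OK, 'OK: %d job(s), no errors' % len(state)


def check_file(path):
    """Load the JSON state file at `path` and evaluate it."""
    try:
        with open(path) as f:
            state = json.load(f)
    except (IOError, OSError, ValueError) as exc:
        # Missing, unreadable or malformed file: report UNKNOWN.
        return UNKNOWN, 'UNKNOWN: cannot read %s: %s' % (path, exc)
    if not isinstance(state, dict):
        return UNKNOWN, 'UNKNOWN: %s is not a JSON object' % path
    return check_state(state)


def main(argv):
    """Plugin entry point; expects the state file path as the only argument."""
    if len(argv) != 2:
        print('UNKNOWN: expected exactly one argument, the state file path')
        return UNKNOWN
    code, message = check_file(argv[1])
    print(message)
    return code
```

A wrapper script would import sys and finish with sys.exit(main(sys.argv)), so nagios sees the status line on stdout and the exit code. Reporting any failing job as CRITICAL is a choice made for this sketch; a threshold on 'error_count' could map to WARNING instead.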
Assignee: server-ops → ashish
Status: NEW → ASSIGNED
Whichever head it's installed on, the file is at /home/socorro/persistent/crontabbers.json

We configure it like this: https://github.com/mozilla/socorro/blob/master/config/crontabber.ini#L23

E.g.
[firstname.lastname@example.org ~]$ ls -l /home/socorro/persistent/crontabbers.json
-rw-r--r-- 1 socorro socorro 8133 Jul 31 16:46 /home/socorro/persistent/crontabbers.json
Crons are run from the admin boxes. You can find those listed in the crash-stats.mozilla.com mana docs. Alternatively, if you have access to the other socorro nagios alerts all of our current cron monitoring should be taking place on the same box as this.
Marking as dupe of 818736 because that one has more (recent) information. I guess I forgot about this bug when I filed the second one.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 818736
Product: mozilla.org → mozilla.org Graveyard