Closed Bug 1248589 Opened 9 years ago Closed 9 years ago

Please create nagios check for buildbot backlog age

Categories

(Infrastructure & Operations :: MOC: Service Requests, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aselagea, Assigned: ryanc)

In bug 1220191, I've been working on a script that checks the age of the oldest item in the pending job queue and returns a status message based on a set of thresholds. It would be useful to have a nagios check based on it. The alert is documented here: https://mana.mozilla.org/wiki/display/NAGIOS/check_backlog_age As stated in that bug, I assumed the alert would be implemented the same way as check_pending_builds, since the checks are very similar. That would mean creating the config file "/etc/nagios/nrpe.d/check_backlog_age.cfg", which will run the "check_backlog_age.py" script placed at "/usr/lib64/nagios/plugins/custom/check_backlog_age.py" (this also needs to be created). If there's anything I can help with while setting this up, please let me know. Thanks.
As also noted in that bug, this should alert on IRC in #buildduty, not to the MOC.
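For readers unfamiliar with how such a check reports its result, here is a minimal sketch of the threshold logic described above. It is not the actual check_backlog_age.py from bug 1220191. The threshold values, the function names, and the way the age is passed in (as a number of seconds on the command line) are all assumptions made for illustration; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention, and the message format imitates the alert text quoted later in this bug.

```python
"""Illustrative sketch of a Nagios-style backlog age check (not the real script)."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

# Hypothetical thresholds in seconds; the real values are not stated in this bug.
WARN_SECONDS = 4 * 3600
CRIT_SECONDS = 12 * 3600


def format_age(seconds):
    """Render an age in seconds as 'Hh:MMm:SSs', like the alert text."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return "%dh:%02dm:%02ds" % (hours, minutes, secs)


def evaluate(age_seconds, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Map the age of the oldest pending job to (exit_code, message)."""
    if age_seconds is None or age_seconds < 0:
        return UNKNOWN, "UNKNOWN Backlog Age: could not determine age"
    if age_seconds >= crit:
        code = CRITICAL
    elif age_seconds >= warn:
        code = WARNING
    else:
        code = OK
    return code, "%s Backlog Age: %s" % (LABELS[code], format_age(age_seconds))


def main(argv):
    """Take the age in seconds as the first argument (a stand-in for
    querying the pending job queue), print the status, return the exit code."""
    try:
        age = int(argv[0])
    except (IndexError, ValueError):
        age = None
    code, message = evaluate(age)
    print(message)
    return code

# A real plugin would finish with: sys.exit(main(sys.argv[1:]))
```

NRPE would then expose the script through a command definition in the .cfg file, of the general form command[check_backlog_age]=<path to script>; the command name used here is an assumption.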
Assignee: nobody → vhua
Hi Alin, just to confirm: you made a check for this in bug 1220191, and you need assistance configuring the Puppet side to use it?
Status: NEW → ASSIGNED
(In reply to Ryan C [:ryanc] from comment #2)
> Hi Alin,
>
> Just to confirm,
>
> You made a check for this in bug 1220191, and you need assistance
> configuring the Puppet side to use it?

Hi,

In bug 1220191 I only wrote the python script that looks for the backlog age, but the nagios check still needs to be implemented (see https://bugzilla.mozilla.org/show_bug.cgi?id=1220191#c44).
Alright, got this into Nagios. I made a small change to the script to get it to work, in sysadmins r115040. It looks like this is also already alerting:

16:14:40 <nagios-releng> ryanc: nagios1.private.releng.scl3.mozilla.com:Backlog Age is WARNING - WARNING Backlog Age: 7h:17m:15s Last Checked: 2016-02-17 16:12:30 PST

This still needs to be flipped to send alerts to buildduty when you're ready.
Assignee: vhua → rchilds
Moved to buildduty in sysadmins r115050, per the suggestion below:

17:37:22 <Callek> ryanc: I'd push it to #buildduty and mark it in bug -- I haven't been following along closely, but the people who need to see it are in Romania, and then kim/coop who are both east coast, so chances of getting a reasonable response tonight is slim
17:38:07 <Callek> ryanc: I don't think theres any downside to having it there before your end of day though
17:42:39 <arr> ryanc: yeah, buildduty only

Here it is in buildduty:

19:12:40 <nagios-releng> Wed 19:12:39 PST [4101] nagios1.private.releng.scl3.mozilla.com:Backlog Age is WARNING: WARNING Backlog Age: 8h:48m:29s (http://m.mozilla.org/Backlog+Age)
Update:

08:12:36 <nagios-releng> Thu 08:12:34 PST [4019] nagios1.private.releng.scl3.mozilla.com:Backlog Age is CRITICAL: CRITICAL Backlog Age: 19h:57m:33s (http://m.mozilla.org/Backlog+Age)

Looks like it's doing its thing as requested. Resolving this.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED