Closed
Bug 1248589
Opened 9 years ago
Closed 9 years ago
Please create nagios check for buildbot backlog age
Categories
(Infrastructure & Operations :: MOC: Service Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: aselagea, Assigned: ryanc)
References
Details
In bug 1220191, I've been working on a script that checks the age of the oldest item in the pending job queue and returns a status message based on a set of thresholds. It would be useful to have a Nagios check based on it.
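The check described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the script from bug 1220191. The input (a list of Unix timestamps at which pending jobs were queued) and the 3h/12h warning/critical thresholds are hypothetical placeholders. The 0/1/2/3 exit codes follow the standard Nagios plugin convention, and the message format mirrors the "WARNING Backlog Age: 7h:17m:15s" alert text that appears later in this bug.

```python
import time
from typing import Optional, Sequence, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical thresholds in seconds; the real values are not stated in this bug.
WARN_SECONDS = 3 * 3600
CRIT_SECONDS = 12 * 3600


def format_age(seconds: float) -> str:
    """Render an age in whole seconds as e.g. '7h:17m:15s'."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h:{minutes}m:{secs}s"


def check_backlog_age(
    submit_times: Sequence[float],
    now: Optional[float] = None,
    warn: int = WARN_SECONDS,
    crit: int = CRIT_SECONDS,
) -> Tuple[int, str]:
    """Classify the age of the oldest pending job against warn/crit thresholds.

    submit_times is a hypothetical input: Unix timestamps of queued jobs.
    Returns (nagios_exit_code, status_message).
    """
    now = time.time() if now is None else now
    if not submit_times:
        # Empty queue: nothing is waiting, so there is no backlog age.
        return OK, "OK Backlog Age: no pending jobs"
    age = now - min(submit_times)  # the oldest job has the smallest timestamp
    if age >= crit:
        code = CRITICAL
    elif age >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"{STATUS_NAMES[code]} Backlog Age: {format_age(age)}"
```

Wrapped for NRPE, the script would print the message on stdout and exit with the returned code (for example via `sys.exit(code)`), since Nagios derives the service state from the plugin's exit status.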
The alert is documented here: https://mana.mozilla.org/wiki/display/NAGIOS/check_backlog_age
As stated in the above bug, I assumed the alert would be implemented the same way as check_pending_builds, since the two checks are very similar. That means creating the config file "/etc/nagios/nrpe.d/check_backlog_age.cfg", which will run the "check_backlog_age.py" script placed at "/usr/lib64/nagios/plugins/custom/check_backlog_age.py" (the script also needs to be created there).
If there's anything I can help with while setting this up, please let me know.
Thanks.
Comment 1•9 years ago
As also mentioned in the bug, this should alert on IRC in #buildduty, not to the MOC.
Updated•9 years ago
Assignee: nobody → vhua
Assignee
Comment 2•9 years ago
Hi Alin,
Just to confirm,
You made a check for this in bug 1220191, and you need assistance configuring the Puppet side to use it?
Status: NEW → ASSIGNED
Reporter
Comment 3•9 years ago
(In reply to Ryan C [:ryanc] from comment #2)
> Hi Alin,
>
> Just to confirm,
>
> You made a check for this in bug 1220191, and you need assistance
> configuring the Puppet side to use it?
Hi,
In bug 1220191 I only wrote the Python script that looks up the backlog age, but the Nagios check itself still needs to be implemented (see https://bugzilla.mozilla.org/show_bug.cgi?id=1220191#c44).
Assignee
Comment 4•9 years ago
Alright,
Got this into Nagios. I made a small change to the script to get it working, committed in sysadmins r115040. It looks like it is already alerting:
16:14:40 <nagios-releng> ryanc: nagios1.private.releng.scl3.mozilla.com:Backlog Age is WARNING - WARNING Backlog Age: 7h:17m:15s Last Checked: 2016-02-17 16:12:30 PST
This still needs to be flipped to send alerts to buildduty when you're ready.
Assignee: vhua → rchilds
Assignee
Comment 5•9 years ago
Moved the alerts to #buildduty in sysadmins r115050, as suggested on IRC:
17:37:22 <Callek> ryanc: I'd push it to #buildduty and mark it in bug -- I haven't been following along closely, but the people who need to see it are in Romania, and then kim/coop who are both east coast, so chances of getting a reasonable response tonight is slim
17:38:07 <Callek> ryanc: I don't think theres any downside to having it there before your end of day though
17:42:39 <arr> ryanc: yeah, buildduty only
Here it is in #buildduty:
19:12:40 <nagios-releng> Wed 19:12:39 PST [4101] nagios1.private.releng.scl3.mozilla.com:Backlog Age is WARNING: WARNING Backlog Age: 8h:48m:29s (http://m.mozilla.org/Backlog+Age)
Assignee
Comment 6•9 years ago
Update,
08:12:36 <nagios-releng> Thu 08:12:34 PST [4019] nagios1.private.releng.scl3.mozilla.com:Backlog Age is CRITICAL: CRITICAL Backlog Age: 19h:57m:33s (http://m.mozilla.org/Backlog+Age)
Looks like it's doing its thing as requested. Resolving this.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED