Set up nagios check on reserved_slaves files

RESOLVED INCOMPLETE

Status

Release Engineering
General
P3
normal
RESOLVED INCOMPLETE
7 years ago
4 years ago

People

(Reporter: nthomas, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [buildmasters][nagios][buildbotmaster][puppet])

(Reporter)

Description

7 years ago
We reserve slaves for releases by creating the appropriate file in the master directory, e.g. reserved_slaves_pm01. Once the initial flurry of builds and updates has been produced, we often forget to free them again for use in general development. A nagios check would be one way to remind us to do this.

An empty file or one containing 0 is OK; a non-zero integer is potentially an error state. The alert could either go off quickly and get downtimed for a day, or only complain after a day.
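A rough sketch of what the client-side check could look like, in Python. This is only an illustration of the rule above (empty or 0 is OK, non-zero is flagged), not an existing script: the function names, the `reserved_slaves*` glob pattern, and the choice of WARNING rather than CRITICAL for a non-zero count are all assumptions. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import glob
import os

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def reserved_count(path):
    """Return the integer stored in the file; an empty file counts as 0.

    Raises ValueError if the content is not an integer.
    """
    with open(path) as f:
        text = f.read().strip()
    return int(text) if text else 0


def check(master_dir):
    """Return (exit_code, message) for the reserved_slaves* files in master_dir."""
    reserved = []
    pattern = os.path.join(master_dir, "reserved_slaves*")  # assumed naming
    for path in sorted(glob.glob(pattern)):
        try:
            count = reserved_count(path)
        except (OSError, ValueError):
            return UNKNOWN, "UNKNOWN: cannot read an integer from %s" % path
        if count != 0:
            reserved.append("%s=%d" % (os.path.basename(path), count))
    if reserved:
        # WARNING is an assumed severity; the bug only says "potentially an error".
        return WARNING, "WARNING: slaves still reserved: " + ", ".join(reserved)
    return OK, "OK: no slaves reserved"


def main(argv):
    """Print the status line and return the exit code for a Nagios wrapper."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_reserved_slaves MASTER_DIR")
        return UNKNOWN
    code, message = check(argv[1])
    print(message)
    return code
```

Run as a plugin, this would end with `sys.exit(main(sys.argv))`. It alerts as soon as a non-zero count appears, which matches the "go off quickly and get downtimed" option; the "only complain after a day" option would need an extra check of the file's modification time.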

Comments?
(In reply to comment #0)
> We reserve slaves for releases by creating the appropriate file in the
> master directory, eg reserved_slaves_pm01. Once the initial flurry of builds
> and updates have been produced we often forget to free them again for use in
> general development. A nagios check would be one way to remind us to do
> this. 
> 
> An empty file or one containing 0 is OK, a non-zero integer is potentially
> an error state. The alert could go off quickly and get downtimed for a day,
> or only complain after a day.
> 
> Comments ?

I think this makes a lot of sense. I'd vote for the alert going off quickly, and getting downtime'd (*not* ack'ed). If it takes a long time to go off, we lose a bunch of value.
If you write the client-side script, it's easy to invoke it from the server. The scripts are in the puppet-manifests repo.

Updated

6 years ago
Duplicate of this bug: 671338
From bug 671338:
While we're at it, I don't think there's any reason the reserved_slaves file needs to have the master name appended to it. Plain old 'reserved_slaves' should be sufficient, I think.
Whiteboard: [buildmasters] → [buildmasters][nagios][buildbotmaster][puppet]
(In reply to Chris AtLee [:catlee] from comment #4)
> From bug 671338:
> while we're at it, I don't think there's any reason the reserved_slaves file
> needs to have the master name appended to it. plain old 'reserved_slaves'
> should be sufficient I think.

Landed http://hg.mozilla.org/build/buildbot-configs/rev/0490771808b6 to address this
reserved_slaves file is gone, no need to monitor
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → INCOMPLETE
(Assignee)

Updated

4 years ago
Product: mozilla.org → Release Engineering