Not sure if this is the right bucket. Can you add the following 4 masters to nagios? Buildbot is not currently up on them, as I'm still going through the steps to add them.

buildbot-master07.bb.releng.usw2.mozilla.com
buildbot-master08.bb.releng.use1.mozilla.com
buildbot-master124.bb.releng.use1.mozilla.com
buildbot-master125.bb.releng.usw2.mozilla.com

The following checks are needed:

Command Queue
MySQL Connectivity
PING
Pulse Queue
buildbot
disk - /
load
procs - command_runner
procs - pulse_publisher

No need to check swap.
Anyone in releng has the ability to add basic nagios checks like this now. I've added the above hosts to the use1 and usw2 buildbot-master groups, which gives them all the same checks as the other buildbot masters in their region. If you want me to show you how to do this in the future, let me know. As you point out, several things are not configured/running yet, so the hosts have been downtimed for 7 days. Delete the downtime when you're ready to put them in production (or extend it if you need more than 7 days).
Assignee: relops → arich
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED