Closed Bug 746396 Opened 12 years ago Closed 12 years ago

add nagios monitoring of the opsi master process to staging-opsi and production-opsi

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bear, Assigned: arich)

References

Details

(Whiteboard: [scl3][opsi])

According to

https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?navbarsearch=1&host=staging-opsi

Nagios is not monitoring the OPSI master process.
Okay, what should the check be?
Assignee: server-ops-releng → arich
Checking back on this to see if there's more information about what this check should look like.
The opsi people have a project to write a Nagios plugin that does many things, including making sure that the process is responding, but their 'someone pays for this development and then it's free for everyone' model hasn't attracted any funding yet. The source can be found via Google, but its licensing prevents us from using it.

We have '/usr/bin/python /usr/sbin/opsiconfd -D' in the process list, so let's just go with a simple process check for now: exactly 1 instance of opsiconfd should be running.
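For illustration only, the logic of such a check could look roughly like the sketch below. It is not the check that was deployed: the function names, the regex, and the message wording are assumptions made up for this example. The only parts taken from the above are the command line seen in the process list and the "exactly 1 instance" rule. Matching on the full argument string matters here because the process name itself is python, not opsiconfd.

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Assumed pattern: match the daemon path anywhere in the argument string,
# since the process itself shows up as /usr/bin/python.
OPSICONFD_PATTERN = r"/usr/sbin/opsiconfd\b"


def count_matching(ps_lines, pattern=OPSICONFD_PATTERN):
    """Count command lines whose arguments match the regex."""
    regex = re.compile(pattern)
    return sum(1 for line in ps_lines if regex.search(line))


def evaluate(count, expected=1):
    """Map an observed process count to (exit_code, status_message)."""
    if count == expected:
        return OK, f"PROCS OK: {count} opsiconfd process running"
    return CRITICAL, f"PROCS CRITICAL: {count} opsiconfd processes (expected {expected})"


def main():
    """Read the live process list and return a Nagios exit code."""
    try:
        result = subprocess.run(
            ["ps", "-eo", "args"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"PROCS UNKNOWN: could not read process list: {exc}")
        return UNKNOWN
    code, message = evaluate(count_matching(result.stdout.splitlines()))
    print(message)
    return code


# Dry run against a canned process list instead of the live system.
sample = ["/sbin/init", "/usr/bin/python /usr/sbin/opsiconfd -D", "sshd: root"]
print(evaluate(count_matching(sample))[1])
```

To use it as an actual plugin, the script would have to finish by exiting with the value returned by main(), because Nagios reads the status from the exit code.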
Assignee: arich → nobody
Component: Server Operations: RelEng → Release Engineering: Machine Management
QA Contact: arich → armenzg
I've added a check for /usr/sbin/opsiconfd to the new opsi servers in scl3.

I also had to:

* modify the allowed hosts in /etc/nagios/nrpe.cfg on both machines so that admin1.infra.scl1.mozilla.com and nagios1.private.releng.scl3.mozilla.com could talk to them
* add the check definitions for swap and procs_regex to /etc/nagios/nrpe_local.cfg
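
As a rough way to sanity-check edits like the two above, the sketch below parses NRPE-style configuration text and reports which expected hosts or check definitions are missing. The sample fragment is hypothetical: the command names (check_swap, check_procs_regex), the plugin path, and the thresholds are assumptions, not copied from the real servers. Only the two hostnames come from this bug. For brevity the sample puts the allowed_hosts line and the command definitions in one string, although on the servers they live in separate files.

```python
import re

# Hypothetical config text. Command names, plugin path, and thresholds are
# assumptions for illustration; only the two mozilla.com hostnames are real.
SAMPLE_CFG = """\
# NRPE settings (illustrative only)
allowed_hosts=127.0.0.1,admin1.infra.scl1.mozilla.com,nagios1.private.releng.scl3.mozilla.com
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
command[check_procs_regex]=/usr/lib/nagios/plugins/check_procs -c 1:1 --ereg-argument-array=/usr/sbin/opsiconfd
"""

# NRPE command definitions have the form: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<line>.+)$")


def parse_nrpe(text):
    """Return (allowed_hosts list, {command name: command line})."""
    allowed, commands = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("allowed_hosts="):
            value = line.split("=", 1)[1]
            allowed = [host.strip() for host in value.split(",") if host.strip()]
            continue
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("name")] = match.group("line")
    return allowed, commands


def missing_items(text, hosts, checks):
    """List expected hosts and check names that the config does not define."""
    allowed, commands = parse_nrpe(text)
    missing_hosts = [host for host in hosts if host not in allowed]
    missing_checks = [check for check in checks if check not in commands]
    return missing_hosts + missing_checks


print(missing_items(
    SAMPLE_CFG,
    ["admin1.infra.scl1.mozilla.com", "nagios1.private.releng.scl3.mozilla.com"],
    ["check_swap", "check_procs_regex"],
))
```

An empty list means every expected host and check name was found in the text.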

I didn't even see puppet installed, so I don't think these changes will get overwritten.
Assignee: nobody → arich
Status: NEW → RESOLVED
Closed: 12 years ago
Component: Release Engineering: Machine Management → Server Operations: RelEng
QA Contact: armenzg → arich
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations