According to https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?navbarsearch=1&host=staging-opsi, Nagios is not monitoring the OPSI master process.
Okay, what should the check be?
Checking back on this to see if there's more information about what this check should look like.
The opsi people have a project to write a Nagios plugin that does many things, including making sure that process is responding, but their 'someone pays for this development and then it's free for everyone' model hasn't attracted any funding yet. You can find the source via Google, but the licensing excludes us from using it. We have '/usr/bin/python /usr/sbin/opsiconfd -D' in the process list. So let's just go with a simple process check for now: exactly 1 instance of opsiconfd should be running.
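For illustration only, the "exactly 1 instance of opsiconfd" rule could be sketched as a small Nagios-style check in Python. This is not the check that was deployed; the function names, the regex, and the use of `ps -eo args=` are assumptions made for the example. It follows the usual plugin convention of exit code 0 for OK, 2 for CRITICAL, and 3 for UNKNOWN.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: alert unless exactly one opsiconfd process is running."""
import re
import subprocess
import sys

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def count_matches(command_lines, pattern):
    """Count the command lines that contain a match for the regex."""
    regex = re.compile(pattern)
    return sum(1 for line in command_lines if regex.search(line))


def evaluate(command_lines, pattern=r"/usr/sbin/opsiconfd\b", expected=1):
    """Return (exit_code, message) for the given process command lines."""
    found = count_matches(command_lines, pattern)
    if found == expected:
        return OK, f"PROCS OK: {found} process(es) matching {pattern!r}"
    return CRITICAL, (
        f"PROCS CRITICAL: {found} process(es) matching {pattern!r}, "
        f"expected {expected}"
    )


def running_command_lines():
    """List the full command line of every process (assumes a Linux-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    try:
        status, message = evaluate(running_command_lines())
    except (OSError, subprocess.CalledProcessError) as exc:
        status, message = UNKNOWN, f"PROCS UNKNOWN: {exc}"
    print(message)
    sys.exit(status)
```

Matching on the full command line rather than the process name matters here, because the process shows up as '/usr/bin/python /usr/sbin/opsiconfd -D' and a name-only match would see just python.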
Assignee: arich → nobody
Component: Server Operations: RelEng → Release Engineering: Machine Management
QA Contact: arich → armenzg
I've added a check for /usr/sbin/opsiconfd to the new opsi servers in scl3. I also had to:
* modify the allowed hosts in /etc/nagios.nrpe.cfg on both machines so that admin1.infra.scl1.mozilla.com and nagios1.private.releng.scl3.mozilla.com could talk to them
* add the check definitions for swap and procs_regex to /etc/nagios/nrpe_local.cfg
I didn't even see puppet installed, so I don't think these changes will get overwritten.
Assignee: nobody → arich
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Component: Release Engineering: Machine Management → Server Operations: RelEng
QA Contact: armenzg → arich
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations