Add Nagios monitoring of the OPSI master process to staging-opsi and production-opsi

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
6 years ago
5 years ago

People

(Reporter: bear, Assigned: arr)

Details

(Whiteboard: [scl3][opsi])

(Reporter)

Description

6 years ago
According to

https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?navbarsearch=1&host=staging-opsi

Nagios is not monitoring the OPSI master process.
(Assignee)

Comment 1

6 years ago
Okay, what should the check be?
(Assignee)

Updated

6 years ago
Assignee: server-ops-releng → arich
(Assignee)

Comment 2

6 years ago
Checking back on this to see if there's more information about what this check should look like.
The opsi people have a project to write a Nagios plugin that does many things, including making sure that the process is responding, but their 'someone pays for this development and then it's free for everyone' model hasn't attracted any funding yet. The source can be found via Google, but its licensing excludes us from using it.

We have '/usr/bin/python /usr/sbin/opsiconfd -D' in the process list. So let's just go with a simple process check for now: exactly 1 instance of opsiconfd should be running.
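
For illustration only, here is a minimal sketch of what such a process check might look like as a standalone Nagios-style plugin. It is not the check that was deployed (the thread mentions a procs_regex check definition). The function names, the message wording, and the use of `ps -eo args` are assumptions made for this example. The exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

# Pattern taken from the process list entry quoted above.
OPSICONFD_PATTERN = r"/usr/sbin/opsiconfd"


def count_matches(ps_lines, pattern):
    """Count command lines that match the given regular expression."""
    regex = re.compile(pattern)
    return sum(1 for line in ps_lines if regex.search(line))


def evaluate(count, expected=1):
    """Map a process count to a (exit_code, message) pair."""
    if count == expected:
        return OK, "PROCS OK: %d process(es) matching opsiconfd" % count
    return CRITICAL, "PROCS CRITICAL: %d process(es) matching opsiconfd, expected %d" % (
        count,
        expected,
    )


def main():
    """Run ps, count opsiconfd processes, print a status line, return the exit code."""
    try:
        # 'ps -eo args' prints the full command line of every process (Linux procps).
        output = subprocess.check_output(["ps", "-eo", "args"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("PROCS UNKNOWN: could not run ps: %s" % exc)
        return UNKNOWN
    lines = output.splitlines()[1:]  # skip the header line
    code, message = evaluate(count_matches(lines, OPSICONFD_PATTERN))
    print(message)
    return code
```

To use this as a plugin, a script would end with `import sys; sys.exit(main())` so that Nagios or NRPE sees the exit code. Requiring exactly one instance means that both a dead daemon and an unexpected duplicate raise an alert.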
Assignee: arich → nobody
Component: Server Operations: RelEng → Release Engineering: Machine Management
QA Contact: arich → armenzg
(Assignee)

Comment 4

6 years ago
I've added a check for /usr/sbin/opsiconfd to the new opsi servers in scl3.

I also had to:

* modify the allowed hosts in /etc/nagios.nrpe.cfg on both machines so that admin1.infra.scl1.mozilla.com and nagios1.private.releng.scl3.mozilla.com could talk to them
* add the check definitions for swap and procs_regex to /etc/nagios/nrpe_local.cfg

I didn't even see puppet installed, so I don't think these changes will get overwritten.
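
Since these changes were made by hand rather than managed by puppet, one way to confirm that the allowed-hosts edit is still in place is a small script along these lines. This is a hedged sketch, not something from the bug. `allowed_hosts` is the NRPE directive that holds a comma-separated list of permitted hosts, but the helper names and the sample config text are hypothetical, and treating the last `allowed_hosts` line as the effective one is an assumption.

```python
# The two monitoring hosts named in the comment above.
REQUIRED = [
    "admin1.infra.scl1.mozilla.com",
    "nagios1.private.releng.scl3.mozilla.com",
]


def parse_allowed_hosts(config_text):
    """Return the hosts listed in the allowed_hosts directive of NRPE config text."""
    hosts = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "allowed_hosts":
            # Assumption: if the directive appears more than once, the last one wins.
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts


def missing_hosts(config_text, required=REQUIRED):
    """Return the required hosts that are absent from allowed_hosts."""
    allowed = set(parse_allowed_hosts(config_text))
    return [host for host in required if host not in allowed]


# Hypothetical config content, for illustration only.
sample = "# NRPE config\nallowed_hosts=127.0.0.1, admin1.infra.scl1.mozilla.com\n"
print(missing_hosts(sample))
```

With the sample text above, only the second required host is reported as missing. Against a real machine, the config file contents would be read from disk and passed in instead.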
Assignee: nobody → arich
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Component: Release Engineering: Machine Management → Server Operations: RelEng
QA Contact: armenzg → arich
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations