Bug 393274
Opened 18 years ago
Closed 18 years ago
setup nagios monitoring on release automation machines
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joduinn, Assigned: bhearsum)
References
Details
(Whiteboard: waiting on bug 410019)
Each of the newly set up machines & VMs used for staging & production release automation should be monitored.
Comment 1 • 18 years ago (Reporter)
This is not blocking us, but I'm filing a tracking bug so we don't forget.
Updated • 18 years ago (Reporter)
Priority: -- → P3
Comment 2 • 18 years ago
What sort of monitoring did you have in mind? If it's tree monitoring, then the production machines will tend to come and go quite a lot and end up producing spam.
Comment 3 • 18 years ago
(In reply to comment #2)
> What sort of monitoring did you have in mind? If it's tree monitoring, then
> the production machines will tend to come and go quite a lot and end up
> producing spam.
Monitoring the machines directly. Disk space, memory, whether important processes are running, log monitoring, etc.
Comment 4 • 18 years ago
Aravind and I discussed this; I am going to go ahead and set up the Nagios Remote Plugin Executor (NRPE) service, and accept connections from the monitoring server.
I'll install, configure, and document the setup of the plugins on the build machine side, and someone on the IT side can configure the server to poll these values and notify us when a problem is found.
Assignee: build → rhelmer
Updated • 18 years ago
Status: NEW → ASSIGNED
Updated • 18 years ago
Whiteboard: ETA August 29
Comment 5 • 18 years ago
I played a bit with nrpe setup on the staging environment. Looks pretty straightforward to monitor the following cross-platform:
* free disk space
* load average (a little different on Windows)
* process check (by name, number of processes, zombie procs, etc)
* free memory
This would be great to start with.
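As a rough illustration of the first item in the list above, here is a minimal sketch of a free-disk-space check written in the style of a Nagios plugin. It is not the plugin that was deployed on these machines; the function names and the 20%/10% thresholds are assumptions made for the example. The only convention relied on is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_free_percent(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a Nagios status code.

    The default 20%/10% thresholds are illustrative assumptions.
    """
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_disk(path="/", warn_pct=20.0, crit_pct=10.0):
    """Return (status_code, message) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Path missing or unreadable: report UNKNOWN rather than guessing.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - zero-size filesystem at {path}"
    free_pct = 100.0 * usage.free / usage.total
    status = classify_free_percent(free_pct, warn_pct, crit_pct)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    return status, f"DISK {label} - {free_pct:.1f}% free on {path}"
```

A small wrapper would print the message and exit with the status code, which is the form in which NRPE relays a plugin result back to the monitoring server.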
The default log check is pretty simplistic (it can only search for a "bad" query). It'd be nice to search for known-good queries and report on all others (e.g. http://logcheck.org/).
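The "known-good" approach could look roughly like the sketch below: every line that matches none of an allowlist of patterns gets reported. This is only an illustration of the idea, not how logcheck or the default Nagios log check is implemented; the function name, the patterns, and the sample log lines are all hypothetical.

```python
import re


def unexpected_lines(lines, known_good_patterns):
    """Return the non-blank lines that match none of the known-good patterns."""
    compiled = [re.compile(p) for p in known_good_patterns]
    return [
        line for line in lines
        if line.strip() and not any(rx.search(line) for rx in compiled)
    ]


# Hypothetical patterns and log lines, for illustration only.
KNOWN_GOOD = [r"^INFO ", r"build step \d+ finished"]
sample = [
    "INFO starting build",
    "build step 3 finished",
    "ERROR disk full on /builds",
    "",
]
report = unexpected_lines(sample, KNOWN_GOOD)
```

The trade-off is that new but harmless messages show up in the report until a pattern is added for them, whereas a "bad" pattern search silently misses anything nobody anticipated.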
Updated • 18 years ago
Whiteboard: ETA August 29 → ETA Sept 28
Updated • 18 years ago
Assignee: rhelmer → build
Status: ASSIGNED → NEW
Whiteboard: ETA Sept 28
Updated • 18 years ago
Assignee: build → nobody
QA Contact: mozpreed → build
Updated • 18 years ago
Assignee: nobody → rhelmer
Updated • 18 years ago
Status: NEW → ASSIGNED
Comment 6 • 18 years ago
This is starting to bug me again :)
Let's start by getting the staging environments going.
Updated • 18 years ago
Whiteboard: waiting on bug 410019
Updated • 18 years ago (Assignee)
Assignee: rhelmer → bhearsum
Status: ASSIGNED → NEW
Comment 7 • 18 years ago (Assignee)
At this point, I've got all of the Windows machines and all of the Linux machines (sans staging-prometheus-vm) set up with NRPE running. staging-prometheus-vm won't let me install the nrpe RPMs ('rpm' hangs). I think a reboot will fix this problem, but I haven't found a decent time to do it yet.
I've been told justdave has a nice little script to help with nrpe on OS X. We'll see how that goes.
Comment 8 • 18 years ago (Assignee)
As mentioned in bug 410019, Mac nrpe is up and running now.
Updated • 18 years ago (Assignee)
Priority: P3 → P2
Comment 9 • 18 years ago (Assignee)
This bug is done now. If we want to add NRPE to other build machines, let's file a new bug to track that.
I've filed bug 412443 about getting the NRPE daemon into the ref platform.
Updated • 18 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Product: mozilla.org → Release Engineering