Bug 711610
Opened 14 years ago
Closed 14 years ago
nagios checks for buildapi01
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Unassigned)
This host isn't in production yet, but we'll need nagios checks in place (and downtimed) beforehand.
Please add nagios checks for:
- host being up
- ntp
- ganglia reporting properly
- local rabbitmq instance running (process name is /usr/lib64/erlang/erts-5.6.5/bin/beam)
(I could be convinced to use the other rabbitmq instance dustin set up...)
- buildapi process running (/home/buildapi/bin/python /home/buildapi/bin/paster ...; pidfile is /home/buildapi/buildapi.pid)
- nginx running (process name is nginx, pid /var/run/nginx.pid)
- host responds on port 80 at /
- any other checks you can think of!
Thanks!
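A minimal sketch of how the pidfile-based checks requested above (buildapi and nginx) could be written as a Nagios-style plugin. This is an illustration only, not the check that was deployed for this bug: the function name `check_pidfile` is hypothetical, and the only convention assumed is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_pidfile(pidfile):
    """Return (exit_code, message) for the process recorded in pidfile.

    Hypothetical helper: reads a PID from the file and tests whether a
    process with that PID currently exists.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return CRITICAL, f"CRITICAL: {pidfile} does not exist"
    except (OSError, ValueError) as exc:
        return UNKNOWN, f"UNKNOWN: cannot read a PID from {pidfile}: {exc}"

    if pid <= 0:
        # Non-positive values address process groups in kill(2), not a PID.
        return UNKNOWN, f"UNKNOWN: {pidfile} contains invalid PID {pid}"

    try:
        # Signal 0 delivers nothing; it only tests whether the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return CRITICAL, f"CRITICAL: pid {pid} from {pidfile} is not running"
    except PermissionError:
        # The process exists but is owned by another user; that still
        # counts as running.
        pass
    return OK, f"OK: pid {pid} from {pidfile} is running"
```

A wrapper script would call this with `/home/buildapi/buildapi.pid` or `/var/run/nginx.pid`, print the message, and exit with the returned code. Note that a pidfile check alone cannot tell whether the PID has been reused by an unrelated process.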
Comment 1•14 years ago
If this is only used locally, then a local rabbitmq instance makes sense - it's isolated, and no more a SPOF than the rest of the system.
If the messages are consumed or produced elsewhere, then a local instance may still make sense, but it should be shoveling the messages to the releng cluster, which is HA.
Comment 2•14 years ago
Set up:
- host
- HTTP check for a 200 response
- ganglia wio (which is as close as we can get to "reporting properly")
- rabbitmq
- ntp
- (I think an nginx process check would be redundant with the HTTP check)
- check_procs_regex for paster.*/home/buildapi/production.ini
I'll monitor these and downtime if necessary.
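For illustration, the two service-level checks in the list above (a regex match on process command lines and an HTTP check for a 200 response) could be sketched roughly as follows. This is not the source of `check_procs_regex` or of the HTTP check actually configured here; the sketch assumes that the process check matches a regular expression against each full command line, and all function names are hypothetical.

```python
import re
import subprocess
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def list_command_lines():
    """Return the full command line of every process (POSIX ps)."""
    out = subprocess.run(
        ["ps", "-A", "-o", "args="],  # "args=" suppresses the header row
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def matching_commands(pattern, command_lines):
    """Return the command lines that the regex matches anywhere."""
    regex = re.compile(pattern)
    return [line for line in command_lines if regex.search(line)]


def check_procs(pattern, command_lines):
    """CRITICAL when no process command line matches the pattern."""
    found = matching_commands(pattern, command_lines)
    if found:
        return OK, f"OK: {len(found)} process(es) match {pattern!r}"
    return CRITICAL, f"CRITICAL: no process matches {pattern!r}"


def check_http_200(url, timeout=10):
    """CRITICAL unless the URL answers with HTTP 200 (redirects followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        return CRITICAL, f"CRITICAL: {url} returned HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return CRITICAL, f"CRITICAL: {url} unreachable: {exc}"
    if status == 200:
        return OK, f"OK: {url} returned HTTP 200"
    return CRITICAL, f"CRITICAL: {url} returned HTTP {status}"
```

With the pattern from this comment, the process check would be `check_procs(r"paster.*/home/buildapi/production.ini", list_command_lines())`. An HTTP check like this exercises nginx end to end, which is why a separate nginx process check adds little.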
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 3 (Reporter)•14 years ago
the rabbit check can go, we're going to use rabbit1.build.scl1.mozilla.com instead
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 4•14 years ago
gone
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Updated•12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations