Closed Bug 711610 Opened 14 years ago Closed 14 years ago

nagios checks for buildapi01

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned)

this host isn't in production yet, but we'll need nagios checks on (and downtime'd) beforehand. please add nagios checks for:
- host being up
- ntp
- ganglia reporting properly
- local rabbitmq instance running (process name is /usr/lib64/erlang/erts-5.6.5/bin/beam) (I could be convinced to use the other rabbitmq instance dustin set up...)
- buildapi process running (/home/buildapi/bin/python /home/buildapi/bin/paster ...; pidfile is /home/buildapi/buildapi.pid)
- nginx running (process name is nginx, pid /var/run/nginx.pid)
- host responds on port 80 at /
- any other checks you can think of!

Thanks!
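For illustration only, the pidfile-based checks requested above (buildapi at /home/buildapi/buildapi.pid, nginx at /var/run/nginx.pid) could be sketched roughly like this in Python. This is not the plugin that was deployed for this bug; the function name `check_pidfile` is hypothetical, and the only Nagios convention assumed is the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It also assumes a POSIX host, where signal 0 probes a process without affecting it.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_pidfile(pidfile):
    """Return (exit_code, message) for the process recorded in pidfile.

    Hypothetical helper: reads a pid from the file and probes it with
    signal 0, which tests for existence without delivering a signal.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError) as exc:
        return CRITICAL, f"CRITICAL: cannot read pid from {pidfile}: {exc}"

    if pid <= 0:
        # Non-positive pids address process groups, not a single process.
        return CRITICAL, f"CRITICAL: invalid pid {pid} in {pidfile}"

    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return CRITICAL, f"CRITICAL: pid {pid} from {pidfile} is not running"
    except PermissionError:
        # The process exists but belongs to another user; that still counts.
        pass
    return OK, f"OK: pid {pid} from {pidfile} is running"
```

A wrapper script would print the message and exit with the returned code, e.g. `code, msg = check_pidfile("/home/buildapi/buildapi.pid")` followed by `print(msg)` and `sys.exit(code)`.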
If this is only used locally, then a local rabbitmq instance makes sense - it's isolated, and no more a SPOF than the rest of the system. If the messages are consumed or produced elsewhere, then a local instance may still make sense, but it should be shoveling the messages to the releng cluster, which is HA.
Set up:
- host
- HTTP check for a 200 response
- ganglia wio (which is as close as we can get to "reporting properly")
- rabbitmq
- ntp
- (I think an nginx process check would be redundant to the HTTP check)
- check_procs_regex for paster.*/home/buildapi/production.ini

I'll monitor these and downtime if necessary.
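As a rough idea of what a regex-based process check does, here is a minimal Python sketch that matches the pattern from the list above against a set of command lines. It is not the actual check_procs_regex plugin, whose options and behaviour are not shown in this bug; the function name `check_cmdlines`, the `min_count` parameter, and the sample command lines are all hypothetical.

```python
import re

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2

# Pattern quoted in the comment above.
PATTERN = r"paster.*/home/buildapi/production.ini"


def check_cmdlines(pattern, cmdlines, min_count=1):
    """Return (exit_code, message) based on how many command lines match.

    Hypothetical helper: `cmdlines` is any iterable of full command-line
    strings, for example as collected from `ps` output.
    """
    regex = re.compile(pattern)
    matches = [line for line in cmdlines if regex.search(line)]
    if len(matches) >= min_count:
        return OK, f"OK: {len(matches)} process(es) match /{pattern}/"
    return CRITICAL, (
        f"CRITICAL: {len(matches)} process(es) match /{pattern}/, "
        f"expected at least {min_count}"
    )


# Illustrative input only; the real paster arguments are not given in the bug.
sample = [
    "/home/buildapi/bin/python /home/buildapi/bin/paster serve /home/buildapi/production.ini",
    "nginx: master process",
]
code, message = check_cmdlines(PATTERN, sample)
print(message)
```

Matching on the full command line rather than the process name is what lets the check tell the buildapi paster process apart from any other Python process on the host.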
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
the rabbit check can go, we're going to use rabbit1.build.scl1.mozilla.com instead
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
gone
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations
Blocks: 926246