Closed Bug 1245591 Opened 8 years ago Closed 8 years ago

Monitor coalesce service

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dividehex, Assigned: dividehex)

The coalescing service needs to be monitored, with an alert raised if it falls over.  My inclination is to simply add an HTTPS ping check to Nagios.  But since we are moving to hosting webapps in external PaaS container products such as Heroku, the options are wide open.

For instance, taskcluster uses uptimerobot.com with a simple static status page rendered from uptimerobot's JSON output.  https://deadmanssnitch.com/ has also been recommended.

The coalesce service itself is a noncritical component.  If it goes offline, tasks simply stop coalescing.  So monitoring for the coalesce service doesn't need a high time resolution.
Assignee: relops → jwatkins
I've added a host check and a service check for this app.  Unfortunately, Heroku doesn't allow ICMP traffic to their load balancer endpoints, so I had to change the host check and the service ping check to a TCP port 443 check.  This is in addition to the HTTPS endpoint (app) response check of /v1/ping.  Checks are all green now.
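For reference, the two checks described above can be approximated outside of Nagios with a short script. This is only a sketch, not the actual Nagios configuration used here: the function names are made up for illustration, the host and base URL are supplied by the caller, and it assumes a healthy app answers /v1/ping with HTTP 200. The return codes follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket
import urllib.error
import urllib.request


def tcp_port_open(host, port=443, timeout=5.0):
    """Stand-in for the ICMP host check: True if a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ping_endpoint_ok(base_url, path="/v1/ping", timeout=5.0):
    """App response check: True if GET base_url + path returns HTTP 200.

    Assumption: a healthy app answers the ping path with status 200.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors (4xx/5xx), connection failures, and bad URLs.
        return False


def run_checks(host, port, base_url):
    """Run both checks and return (exit_code, message), Nagios plugin style."""
    if not tcp_port_open(host, port):
        return 2, f"CRITICAL: TCP {host}:{port} unreachable"
    if not ping_endpoint_ok(base_url):
        return 2, f"CRITICAL: {base_url}/v1/ping did not return 200"
    return 0, "OK: TCP port and /v1/ping both responding"
```

A caller would invoke something like `run_checks("coalesce.example.com", 443, "https://coalesce.example.com")`, where the hostname is a placeholder, and exit with the returned code so that a scheduler or Nagios itself can act on it.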
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED