Status

Infrastructure & Operations
RelOps
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: dividehex, Assigned: dividehex)

Description

2 years ago
The coalescing service needs to be monitored so that an alert fires if it falls over.  My inclination is to simply add an HTTPS ping check to Nagios.  But since we are moving to hosting web apps on external PaaS container products such as Heroku, the options are wide open.

For instance, Taskcluster uses uptimerobot.com with a simple static status page rendered from UptimeRobot's JSON output.  https://deadmanssnitch.com/ has also been recommended.

The coalesce service itself is a noncritical component.  If it goes offline, tasks simply stop coalescing, so monitoring for it does not need a high time resolution.

Updated

2 years ago
Assignee: relops → jwatkins

Comment 1

2 years ago
I've added a host check and a service check for this app.  Unfortunately, Heroku doesn't allow ICMP traffic to its load balancer endpoints, so I had to change the host check and the service ping check to a TCP port 443 check.  This is in addition to the HTTPS endpoint (app) response check of /v1/ping.  Checks are all green now.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
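
For reference, the two checks described in Comment 1 can be approximated outside Nagios with a small script. This is only a sketch under stated assumptions, not the actual Nagios configuration that was deployed: the function names, the timeout values, and the hostname in the usage note are hypothetical; only port 443 and the /v1/ping path come from the comment. The return codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```python
import http.client
import socket
import urllib.error
import urllib.request

# Exit codes following the Nagios plugin convention.
OK, CRITICAL = 0, 2


def tcp_port_open(host, port=443, timeout=5.0):
    """Stand-in for the ICMP host check: True if a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def endpoint_ok(url, timeout=5.0):
    """App response check: True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def check_service(host, path="/v1/ping", port=443, scheme="https"):
    """Run both checks and return (exit_code, message)."""
    if not tcp_port_open(host, port):
        return CRITICAL, f"CRITICAL: {host}:{port} is not accepting TCP connections"
    url = f"{scheme}://{host}:{port}{path}"
    if not endpoint_ok(url):
        return CRITICAL, f"CRITICAL: {url} did not return HTTP 200"
    return OK, f"OK: {url} returned HTTP 200"
```

Calling check_service("coalesce.example.com") with a placeholder hostname like this one would run the TCP check first and the /v1/ping check second. Because the service is noncritical, such a script could reasonably run at a long interval.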