We need to have monitoring in place for slapd. It probably segfaulted in mozillians-dev. We'll want to know when that happens and be able to automatically restart it. Is that possible? We'll work out the root cause of the current issue, but this monitoring will also be critical in -stage and production.
There is already monitoring for replication lag on each slave, which requires both the master and the slave to be working for an OK status. For dev and stage, the Nagios bot should post the alert in #mozillians; for prod, it will page the on-call sysadmin and also post the alert in #mozillians. I added a TCP check for port 389 as well, but it is redundant, since the replication check will already catch any issue with the master or the slave.
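For illustration, a replication-lag check like the one described could map onto Nagios states roughly as follows. This is a minimal sketch, assuming the lag is derived from the `contextCSN` attribute that OpenLDAP syncrepl keeps on each server; the thresholds and the function names `parse_csn` and `check_replication` are hypothetical, not the check actually deployed. It shows why an unreachable master or slave alone is enough to produce a CRITICAL status.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds in seconds; the real check's values are not stated.
WARN_LAG = 60
CRIT_LAG = 300


def parse_csn(csn: str) -> datetime:
    """Parse the timestamp part of an OpenLDAP contextCSN value,
    e.g. '20120315101500.123456Z#000000#000#000000'."""
    stamp = csn.split("#", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)


def check_replication(master_csn: Optional[str],
                      slave_csn: Optional[str]) -> Tuple[int, str]:
    """Return (nagios_status, message).

    None means that server could not be queried, so both the master and
    the slave must answer before the check can report OK.
    """
    if master_csn is None or slave_csn is None:
        down = "master" if master_csn is None else "slave"
        return CRITICAL, f"CRITICAL: could not read contextCSN from {down}"
    # Clamp at zero so a slave that appears ahead (clock skew) is not negative.
    lag = max(0.0, (parse_csn(master_csn) - parse_csn(slave_csn)).total_seconds())
    if lag >= CRIT_LAG:
        return CRITICAL, f"CRITICAL: replication lag {lag:.0f}s"
    if lag >= WARN_LAG:
        return WARNING, f"WARNING: replication lag {lag:.0f}s"
    return OK, f"OK: replication lag {lag:.0f}s"
```

For example, a master CSN of `20120315101500.000000Z#000000#000#000000` against a slave CSN of `20120315101300.000000Z#000000#000#000000` is 120 seconds behind, which this sketch reports as WARNING. A real plugin would print the message and `sys.exit()` with the status code so Nagios can pick it up.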
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Puppet will automatically start slapd if it isn't running.
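Conceptually, Puppet's handling of a service it is told to keep running is an idempotent "check, then start only if needed" step on every agent run. The sketch below models that idea in Python; `ensure_running`, `slapd_is_running` and `start_slapd` are hypothetical helpers, and it assumes an LSB-style init script whose `status` action exits 0 while the daemon is up. It is not Puppet's actual implementation.

```python
import subprocess
from typing import Callable


def ensure_running(is_running: Callable[[], bool],
                   start: Callable[[], None]) -> bool:
    """Start the service only if it is down; return True if a start was attempted.

    Running it repeatedly is harmless: it does nothing while the service is up.
    """
    if is_running():
        return False
    start()
    return True


def slapd_is_running() -> bool:
    # Assumes an LSB-style init script: 'status' exits 0 when slapd is running.
    result = subprocess.run(["service", "slapd", "status"],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def start_slapd() -> None:
    # Raises CalledProcessError if the init script reports a failed start.
    subprocess.run(["service", "slapd", "start"], check=True)
```

One consequence of this design is that recovery is not instant: slapd comes back on the next agent run (every 30 minutes with Puppet's default `runinterval`), which is why the Nagios alerts still matter.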
Awesome sauce! Is this also true of mozillians-dev? We need this while we iron the bugs out of sasl-browserid, since a C plugin can take down its host server.
Yes, dev and stage will both report issues to #mozillians. Prod will alert to #mozillians and page the on-call sysadmin.
Product: mozilla.org → mozilla.org Graveyard