Closed Bug 914699 Opened 12 years ago Closed 11 years ago

Set up nagios alert for self-serve, that emails sheriffs@

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: ashish)


Bug 914570 would have become apparent sooner (502 response from self-serve), if we had a nagios alert set up against self-serve, that emailed sheriffs at m dot org. (It's possible there are already nagios alerts for it, but if so, they'll only be alerting in #buildduty).

nthomas, do you know if alerts are already set up for self-serve? If not, would one of the following be appropriate to set as the URL to check?
* https://secure.pub.build.mozilla.org/buildapi/self-serve
* https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central/rev/tip?format=json

Thanks! :-)
Flags: needinfo?(nthomas)
There's an http_expect check on buildapi01, which is the host actually running buildapi and self-serve. That didn't fail during bug 914570, which you'd maybe expect if it's just hitting /buildapi as that doesn't need the db. However the 'procs - buildapi' check did start failing, so buildapi wasn't running at all for a while and the http_expect check should have failed. We should verify what the http expect check is actually doing. Dependencies maybe.

I don't see anything checking the user facing proxy, secure.pub.build.mozilla.org, at least not on releng-scl3 nagios instance. I agree it makes sense to add one, the 2nd link makes more sense I think.

CCing dustin since he set up all the clustering here.
Flags: needinfo?(nthomas)
(In reply to Ed Morley [:edmorley UTC+1] from comment #0)
> Bug 914570 would have become apparent sooner (502 response from self-serve),
> if we had a nagios alert set up against self-serve, that emailed sheriffs at
> m dot org. (It's possible there are already nagios alerts for it, but if so,
> they'll only be alerting in #buildduty).

I've created bug 914877 for the more general alert issue raised here.
Yes, monitoring one of those URLs for failures would make a lot of sense. The tricky bit is the LDAP auth, but that shouldn't be too hard to work around. As long as the second URL won't cause too much load, that should be fine.
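For illustration, a check along the lines discussed above could look roughly like this. It is only a sketch and not the check actually deployed on the releng nagios instance: the names `classify` and `check` and the status-to-severity mapping are assumptions made for this example. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A 401/403 is reported as UNKNOWN so that the LDAP auth issue mentioned above is not mistaken for an outage.

```python
import json
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status, body):
    """Map an HTTP status and response body to (nagios_code, message).

    The mapping is an assumption for this sketch, not a known policy.
    """
    if status in (401, 403):
        # The check itself is not authenticated; says nothing about health.
        return UNKNOWN, f"HTTP {status}: check is not authenticated"
    if status >= 500:
        # Covers the 502 seen in bug 914570.
        return CRITICAL, f"HTTP {status}: server or proxy error"
    if status != 200:
        return WARNING, f"HTTP {status}: unexpected status"
    try:
        json.loads(body)
    except ValueError:
        return CRITICAL, "HTTP 200 but body is not valid JSON"
    return OK, "HTTP 200 with valid JSON"


def check(url, timeout=10):
    """Fetch url and classify the result; network errors are CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; classify by status code alone.
        return classify(err.code, "")
    except (urllib.error.URLError, OSError) as err:
        return CRITICAL, f"connection failed: {err}"
```

A wrapper script would call `check()` with the `?format=json` URL from comment #0, print the message, and exit with the returned code so that nagios can pick up the state.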
Blocks: 914877
Can auth be exempted for the Nagios server IP address?
The internal view, buildapi.pvt.build.m.o, doesn't have auth -- that's what bug 993487 uses. In fact, if you just send those alerts to sheriffs, you can probably call this fixed.
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> The internal view, buildapi.pvt.build.m.o, doesn't have auth -- that's what
> bug 993487 uses. In fact, if you just send those alerts to sheriffs, you
> can probably call this fixed.

Done that. Closing this out. sheriffs@ will now get emails for "buildapi.pvt.build.mozilla.org:http - /buildapi/self-serve/jobs" alerts.
Assignee: server-ops → ashish
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Thank you :-)
Product: mozilla.org → mozilla.org Graveyard