Closed
Bug 914699
Opened 12 years ago
Closed 11 years ago
Set up nagios alert for self-serve, that emails sheriffs@
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: ashish)
Bug 914570 would have become apparent sooner (502 response from self-serve), if we had a nagios alert set up against self-serve, that emailed sheriffs at m dot org. (It's possible there are already nagios alerts for it, but if so, they'll only be alerting in #buildduty).
nthomas, do you know if alerts are already set up for self-serve?
If not, would one of the following be appropriate to set as the URL to check?
* https://secure.pub.build.mozilla.org/buildapi/self-serve
* https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central/rev/tip?format=json
Thanks! :-)
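A check against either URL could be sketched as a small Nagios-style plugin. This is only a minimal sketch, not the check that was eventually deployed: the function names, the status-to-state policy, and the 10-second timeout are assumptions made for illustration. The only thing taken as given is the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_status(status_code):
    """Map an HTTP status code to a Nagios state (hypothetical policy)."""
    if 200 <= status_code < 400:
        return OK
    if 400 <= status_code < 500:
        # A client error such as 401 more likely means the check is
        # misconfigured than that the service is down.
        return WARNING
    # Anything else, including the 502 seen in bug 914570, is a failure.
    return CRITICAL


def check_url(url, timeout=10):
    """Fetch url and return (nagios_state, message)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is still available.
        code = err.code
    except (urllib.error.URLError, OSError) as err:
        return CRITICAL, f"connection failed: {err}"
    return classify_status(code), f"HTTP {code}"
```

A plugin wrapper would print the message returned by `check_url` and exit with the returned state, so that Nagios can route the notification to whichever contacts are configured for the service.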
Flags: needinfo?(nthomas)
Comment 1•12 years ago
There's an http_expect check on buildapi01, which is the host actually running buildapi and self-serve. That check didn't fail during bug 914570, which you might expect if it only hits /buildapi, since that doesn't need the db. However, the 'procs - buildapi' check did start failing, so buildapi wasn't running at all for a while and the http_expect check should have failed too. We should verify what the http_expect check is actually doing. Dependencies may be involved.
I don't see anything checking the user-facing proxy, secure.pub.build.mozilla.org, at least not on the releng-scl3 Nagios instance. I agree it makes sense to add one; the 2nd link makes more sense, I think. CCing dustin since he set up all the clustering here.
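The gap between a shallow check and one that exercises self-serve can be sketched as follows. A check that only looks at the HTTP status of a page that does not need the db can keep passing while self-serve is broken, whereas also requiring the body to parse as JSON (the second URL asks for format=json) rejects proxy error pages. The helper `evaluate_response` and its `require_json` parameter are hypothetical and are not part of the existing http_expect check.

```python
import json

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL = 0, 2


def evaluate_response(status_code, body, require_json=True):
    """Return (nagios_state, message) for an already-fetched response."""
    if status_code != 200:
        return CRITICAL, f"unexpected HTTP status {status_code}"
    if require_json:
        try:
            json.loads(body)
        except ValueError:
            # An HTML error page served with a 200 status ends up here.
            return CRITICAL, "body is not valid JSON"
    return OK, "HTTP 200 with expected body"
```

With this policy a 502 from the proxy is CRITICAL regardless of the body, and a 200 response whose body is an HTML page rather than JSON is also CRITICAL.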
Flags: needinfo?(nthomas)
(In reply to Ed Morley [:edmorley UTC+1] from comment #0)
> Bug 914570 would have become apparent sooner (502 response from self-serve),
> if we had a nagios alert set up against self-serve, that emailed sheriffs at
> m dot org. (It's possible there are already nagios alerts for it, but if so,
> they'll only be alerting in #buildduty).
I've created bug 914877 for the more general alert issue raised here.
Comment 3•12 years ago
Yes, monitoring one of those URLs for failures would make a lot of sense. The tricky bit is the LDAP auth, but that shouldn't be too hard to work around. As long as the second URL won't cause too much load, that should be fine.
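One possible way around the auth, sketched under a loud assumption: this supposes that the LDAP-backed login on the proxy is exposed as HTTP Basic authentication, which this bug does not confirm. The function name and the idea of a dedicated monitoring account are hypothetical.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, password):
    """Build a GET request carrying HTTP Basic credentials.

    Assumes the server accepts Basic auth; that is not confirmed here.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request
```

The resulting request object can be passed to `urllib.request.urlopen` in place of a bare URL. Credentials for such a check would need to live in the monitoring configuration, which is one reason an unauthenticated internal endpoint or an IP-based exemption may be preferable.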
Assignee
Comment 4•11 years ago
Can auth be exempted for the Nagios server IP address?
Comment 5•11 years ago
The internal view, buildapi.pvt.build.m.o, doesn't have auth -- that's what bug 993487 uses. In fact, if you just send those alerts to sheriffs, you can probably call this fixed.
Assignee
Comment 6•11 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> The internal view, buildapi.pvt.build.m.o, doesn't have auth -- that's what
> bug 993487 uses. In fact, if you just send those alerts to sheriffs, you
> can probably call this fixed.
Done that. Closing this out.
sheriffs@ will now get emails for "buildapi.pvt.build.mozilla.org:http - /buildapi/self-serve/jobs" alerts.
Assignee: server-ops → ashish
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter
Comment 7•11 years ago
Thank you :-)
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard