Closed
Bug 632300
Opened 15 years ago
Closed 14 years ago
Need alerts when zeus has exhausted a pool
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cshields, Assigned: rtucker)
We need alerts from our zeus clusters whenever all nodes in a pool have failed their health checks. (Ideally, we will already have seen outage notices for these nodes individually, but this monitors the potential problem from a different angle.)
For instance, tonight all 3 of SUMO's nodes in PHX stopped responding and failed their health checks, leaving zeus with no working node in the pool. I know zeus throws a different type of error when this occurs; we need to catch it and alert oncall accordingly.
Assignee
Comment 1 • 15 years ago
Which zeus node sits in front of the 3 that failed?
Reporter
Comment 2 • 15 years ago
This is in sjc, but to close this bug out I'd like monitoring turned on for all pools in all clusters.
Assignee
Comment 3 • 15 years ago
From conversations with cshields today, it sounds as though the best solution is to add trap collection to nagios. Do we have any trap collection in place? The only way I know of to do this is to receive the traps with snmptt and funnel them into nagios, but given how our setup currently looks, I don't expect that to be an easy task.
Assignee
Comment 4 • 15 years ago
I now have the zlbXX.nms cluster and pp-zlbXX clusters trapping to dm-nagios01 and dp-nagios01 respectively.
I think I've got the chatter down to an acceptable level.
Are there other zlb clusters I don't know about that should be sending traps as well? I can configure them to trap too.
Assignee
Comment 5 • 14 years ago
oremj,
Whenever you have a free minute, would you mind adding a comment about which other zlb clusters I should enable trapping for?
Reporter
Comment 6 • 14 years ago
zlb01.nms.mozilla.org
pm-zlb-generic01.nms.mozilla.org
pm-zlb-amo01.nms.mozilla.org
Assignee
Comment 7 • 14 years ago
I've added these as well. Are there any more, or can I close this out?
Reporter
Comment 8 • 14 years ago
Closing this out. We have more clusters that only do caching, but I don't want to get warnings from them for network blips right now.
Thanks!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard