Closed Bug 1080026 Opened 11 years ago Closed 11 years ago

request to have nagios alerts that are sent to IRC to include IP address along with host name

Categories

(Infrastructure & Operations :: MOC: Service Requests, task, P3)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dcurado, Assigned: rtucker)

Details

I asked if this was possible, and got a "yes it is" answer. Currently we get something like: "ftp.mozilla.org is DOWN", and it would be helpful for network people (like me) if it could say: "ftp.mozilla.org [63.245.215.46] is DOWN". That way, if we were to get 20 hosts reporting down, all with different domain names but all in the same IP subnet, we'd more easily be able to correlate where the problem lies. Thanks!
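To make the requested format concrete, here is a minimal sketch of how a host alert line could be built once the address is known. The function name `format_host_alert` and its parameters are hypothetical and are not taken from the nagios bot's code; only the output shape comes from the request above.

```python
def format_host_alert(hostname, address, state):
    """Build a host alert line in the shape requested in this bug.

    Hypothetical helper for illustration only; it is not the bot's real code.
    If no address is available, fall back to the current format.
    """
    if not address:
        return f"{hostname} is {state}"
    return f"{hostname} [{address}] is {state}"


if __name__ == "__main__":
    print(format_host_alert("ftp.mozilla.org", "63.245.215.46", "DOWN"))
```

Falling back to the old format when the address is missing keeps the alert readable even if the lookup fails.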
I am increasing the priority of this request. The lack of IP address reporting in nagios alerts makes the alerts significantly less useful in actually getting the problem resolved. Sure, nagios is telling you a set of hosts is down, but that's only half the problem. The second half is getting things restored, and shortening the mean time to repair by providing useful information. Here's what happened today: after restoring core1.phx1, nagios-phx1 started reporting a bunch of hosts down. But pinging the hosts showed they were not down. The answer: if the hostname has an IPv4 *and* an IPv6 address associated with it, nagios-phx1 will use the IPv6 address. We knew we had a slight issue with IPv6 (looks like a Juniper VRRP bug), but the hosts alerting as down, when they were not down, was confusing. Really: this is no small issue. Please look into this as soon as time/energy allows. Thanks.
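The dual-stack confusion described above can be checked by listing every address a hostname resolves to and splitting the result by family. This is only a sketch using the Python standard library; the helper names (`split_by_family`, `resolve_all`, `is_dual_stack`) are made up for illustration, and it says nothing about how nagios itself chooses an address.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Split address strings into IPv4 and IPv6 lists."""
    v4, v6 = [], []
    for text in addresses:
        # Drop any IPv6 zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(text.split("%", 1)[0])
        (v4 if addr.version == 4 else v6).append(str(addr))
    return {"ipv4": v4, "ipv6": v6}


def resolve_all(hostname):
    """Return every address the local resolver gives for hostname (needs DNS)."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def is_dual_stack(hostname):
    """True if hostname has both an IPv4 and an IPv6 address."""
    families = split_by_family(resolve_all(hostname))
    return bool(families["ipv4"]) and bool(families["ipv6"])


if __name__ == "__main__":
    # 2001:db8::1 is from the IPv6 documentation prefix, used here as a placeholder.
    print(split_by_family(["63.245.215.46", "2001:db8::1"]))
```

A host that reports as dual-stack is one where a ping over IPv4 can succeed while a check made over IPv6 fails, which matches the symptom in this comment.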
Assignee: server-ops → nobody
Severity: normal → major
Component: Server Operations → Server Operations: MOC
Priority: -- → P3
QA Contact: shyam → dmoore
Component: Server Operations: MOC → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
Assignee: nobody → rtucker
Code Diff: https://github.com/rtucker-mozilla/mozilla-nagios-bot/commit/44286ccf138bb7152c4a6e55c7c224764b54d816
Pull Request: https://github.com/rtucker-mozilla/mozilla-nagios-bot/pull/45
SVN Commit to puppet: Committed revision 95020
Note that these are only changing for host checks, not service ones. Please reopen this if any bugs are found or additional functionality is necessary.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
a) Thank you very much! b) Is it possible to have service checks have this same behavior? c) Thanks again!
Flags: needinfo?(rtucker)
It is possible to have service checks with the same behavior, but I'm against it for 2 reasons:
1. There are automated systems I've seen in the past that parse service notifications from the bots.
2. It will exponentially increase the number of requests via mklivestatus.
Flags: needinfo?(rtucker)
OK, I hear you. But (at the risk of repeating myself, apologies...) when we get a bunch of alerts saying, service A and service M and service Z just went down... I have no way to correlate that. I have to copy and paste each hostname, do a DNS lookup on them, and then I can say, "ah, they are all on the 63.245.217/24 subnet, now I know where to look" Know what I mean? I guess if we're getting host-down alerts at the same time, that should allow me to see the IP addresses...
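The manual correlation described above (look up each host, then spot the shared subnet) can be sketched as a small script that reads host alerts in the new "host [ip] is DOWN" format and groups them by /24. The regular expression, the function name `group_by_subnet`, and the `example.org` hostnames are assumptions made for illustration; only the alert shape and the ftp.mozilla.org line come from this bug.

```python
import ipaddress
import re
from collections import defaultdict

# Matches the alert shape requested in this bug: "<host> [<ip>] is <STATE>".
ALERT_RE = re.compile(r"^(?P<host>\S+) \[(?P<ip>[^\]]+)\] is (?P<state>\w+)$")


def group_by_subnet(alert_lines, prefix_len=24):
    """Group alerting hostnames by the IPv4 subnet their address falls in."""
    groups = defaultdict(list)
    for line in alert_lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue  # not a host alert in the expected shape
        try:
            addr = ipaddress.ip_address(match.group("ip"))
        except ValueError:
            continue  # bracketed text was not an IP address
        if addr.version != 4:
            continue  # this sketch only groups IPv4 addresses
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(network)].append(match.group("host"))
    return dict(groups)


if __name__ == "__main__":
    # The example.org hosts and their addresses are made up for illustration.
    alerts = [
        "a.example.org [63.245.217.10] is DOWN",
        "b.example.org [63.245.217.77] is DOWN",
        "ftp.mozilla.org [63.245.215.46] is DOWN",
    ]
    for subnet, hosts in group_by_subnet(alerts).items():
        print(subnet, hosts)
```

Since only host alerts carry the address, this kind of grouping relies on host-down alerts arriving alongside the service alerts.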
Flags: needinfo?(rtucker)
(In reply to Dave Curado :dcurado from comment #5)
> OK, I hear you.
> But (at the risk of repeating myself, apologies...) when we get a bunch of alerts saying, service A and service M and service Z just went down... I have no way to correlate that.
> I have to copy and paste each hostname, do a DNS lookup on them, and then I can say, "ah, they are all on the 63.245.217/24 subnet, now I know where to look"
>
> Know what I mean?
> I guess if we're getting host-down alerts at the same time, that should allow me to see the IP addresses...

I guess if we're getting host-down alerts at the same time, that should allow me to see the IP addresses...
Yes^
Flags: needinfo?(rtucker)
OK. Hopefully that will do the trick. Thanks again for putting this into place. It's a big deal. It is very much appreciated!