Bug 1080026
Opened 11 years ago
Closed 11 years ago
Request to have Nagios alerts sent to IRC include the IP address along with the host name
Categories
(Infrastructure & Operations :: MOC: Service Requests, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dcurado, Assigned: rtucker)
Details
I asked if this was possible, and got a "yes it is" answer...
Currently we get something like:
ftp.mozilla.org is DOWN
and it would be helpful for network people (like me) if it could say
ftp.mozilla.org [63.245.215.46] is DOWN
That way, if we were to get 20 hosts reporting down, all with different
domain names but all in the same IP subnet, we'd more easily be
able to correlate where the problem lies.
Thanks!
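For illustration, the requested format could be sketched as below. `format_host_alert` is a hypothetical helper named here only for this example; it is not taken from the bot's code.

```python
def format_host_alert(hostname: str, address: str, state: str) -> str:
    """Return an alert line that carries the IP address next to the host name.

    Hypothetical helper for illustration; not the nagios bot's actual code.
    """
    return f"{hostname} [{address}] is {state}"


print(format_host_alert("ftp.mozilla.org", "63.245.215.46", "DOWN"))
# prints: ftp.mozilla.org [63.245.215.46] is DOWN
```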
Reporter
Comment 1•11 years ago
I am increasing the priority of this request.
The lack of IP address reporting in nagios alerts makes the alerts significantly less useful
in actually getting the problem resolved. Sure, nagios is telling you a set of hosts is
down, but that's only half the problem. The second half is getting things restored,
and shortening the mean time to repair by providing useful information.
Here's what happened today:
after restoring core1.phx1, nagios-phx1 started reporting a bunch of hosts down.
But, pinging the hosts showed they were not down.
The answer: if the hostname has an IPv4 *and* an IPv6 address associated with it, nagios-phx1
will use the IPv6 address. We knew we had a slight issue with IPv6 (it looks like a Juniper
VRRP bug), but hosts alerting as down when they were not down was confusing.
Really: this is no small issue. Please look into this as soon as time/energy allows.
Thanks.
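A quick way to see whether a host name resolves to both address families is sketched below. It uses only the standard `socket` module; the function names `split_by_family` and `resolve_both` are made up for this example, and the sample addresses come from the documentation ranges rather than from any real host.

```python
import socket


def split_by_family(addrinfo):
    """Split getaddrinfo() results into sorted IPv4 and IPv6 address lists."""
    v4, v6 = set(), set()
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            v4.add(sockaddr[0])
        elif family == socket.AF_INET6:
            v6.add(sockaddr[0])
    return sorted(v4), sorted(v6)


def resolve_both(hostname):
    """Resolve a host name and return (ipv4_addresses, ipv6_addresses)."""
    return split_by_family(socket.getaddrinfo(hostname, None))


# Synthetic getaddrinfo()-style records, so the demo needs no DNS access.
sample = [
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0)),
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::10", 0, 0, 0)),
]
ipv4, ipv6 = split_by_family(sample)
print("IPv4:", ipv4)
print("IPv6:", ipv6)
```

Calling `resolve_both("some.host.name")` on a dual-stacked host would return a non-empty list for each family, which is the case where a checker that prefers IPv6 can report a host down even though it still answers over IPv4.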
Assignee: server-ops → nobody
Severity: normal → major
Component: Server Operations → Server Operations: MOC
Priority: -- → P3
QA Contact: shyam → dmoore
Updated•11 years ago
Component: Server Operations: MOC → MOC: Service Requests
Product: mozilla.org → Infrastructure & Operations
Assignee
Updated•11 years ago
Assignee: nobody → rtucker
Assignee
Comment 2•11 years ago
Code Diff:
https://github.com/rtucker-mozilla/mozilla-nagios-bot/commit/44286ccf138bb7152c4a6e55c7c224764b54d816
Pull Request:
https://github.com/rtucker-mozilla/mozilla-nagios-bot/pull/45
SVN Commit to puppet:
Committed revision 95020
Note that these are only changing for host checks, not service ones.
Please reopen this if any bugs are found or additional functionality is necessary.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter
Comment 3•11 years ago
a) Thank you very much!
b) Is it possible to have service checks have this same behavior?
c) Thanks again!
Flags: needinfo?(rtucker)
Assignee
Comment 4•11 years ago
It is possible to have service checks with the same behavior, but I'm against it for two reasons:
1. I've seen automated systems in the past that parse service notifications from the bots, and a format change could break them.
2. It will greatly increase the number of requests made via mklivestatus.
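One way to read the second concern: if the bot looks up the address with one Livestatus query per notification, the query count grows with the notification count, and service notifications far outnumber host notifications. The sketch below only builds the query text (it does not talk to a socket). `build_address_query` is a hypothetical helper, and whether the bot issues its lookups this way is an assumption, not something stated in this bug.

```python
def build_address_query(hostname: str) -> str:
    """Build a Livestatus (LQL) query asking for one host's configured address.

    Hypothetical helper; the bot's real lookup code may differ.
    """
    return (
        "GET hosts\n"
        "Columns: name address\n"
        f"Filter: name = {hostname}\n"
        "\n"  # a blank line terminates the query
    )


# Three notifications, two of them for the same host (example names only).
notifications = ["web1.example.com", "web2.example.com", "web1.example.com"]
queries = [build_address_query(host) for host in notifications]
print(len(queries))
# prints: 3
```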
Flags: needinfo?(rtucker)
Reporter
Comment 5•11 years ago
OK, I hear you.
But (at the risk of repeating myself, apologies...) when we get a bunch of alerts saying,
service A and service M and service Z just went down... I have no way to correlate that.
I have to copy and paste each hostname, do a DNS lookup on them, and then I can say,
"ah, they are all on the 63.245.217/24 subnet, now I know where to look"
Know what I mean?
I guess if we're getting host-down alerts at the same time, that should allow me to
see the IP addresses...
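To show the kind of correlation meant here, this is a rough sketch that groups alert lines of the form "host [ip] is STATE" by /24 subnet using the standard `ipaddress` module. The function name, the regular expression, and the sample hosts and addresses (documentation ranges) are all made up for the example, and it assumes IPv4 addresses.

```python
import ipaddress
import re
from collections import defaultdict

# Matches lines such as "ftp.mozilla.org [63.245.215.46] is DOWN".
ALERT_RE = re.compile(r"^(?P<host>\S+) \[(?P<ip>[^\]]+)\] is (?P<state>\S+)$")


def group_by_subnet(alert_lines, prefix_len=24):
    """Group host names from alert lines by the subnet containing each address."""
    groups = defaultdict(list)
    for line in alert_lines:
        match = ALERT_RE.match(line.strip())
        if not match:
            continue  # skip lines without a bracketed address
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{match['ip']}/{prefix_len}", strict=False)
        groups[str(network)].append(match["host"])
    return dict(groups)


alerts = [
    "a.example.com [192.0.2.10] is DOWN",
    "m.example.com [192.0.2.77] is DOWN",
    "z.example.com [198.51.100.5] is DOWN",
]
for subnet, hosts in group_by_subnet(alerts).items():
    print(subnet, hosts)
# prints:
# 192.0.2.0/24 ['a.example.com', 'm.example.com']
# 198.51.100.0/24 ['z.example.com']
```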
Flags: needinfo?(rtucker)
Assignee
Comment 6•11 years ago
(In reply to Dave Curado :dcurado from comment #5)
> OK, I hear you.
> But (at the risk of repeating myself, apologies...) when we get a bunch of
> alerts saying,
> service A and service M and service Z just went down... I have no way to
> correlate that.
> I have to copy and paste each hostname, do a DNS lookup on them, and then I
> can say,
> "ah, they are all on the 63.245.217/24 subnet, now I know where to look"
>
> Know what I mean?
> I guess if we're getting host-down alerts at the same time, that should
> allow me to
> see the IP addresses...
> I guess if we're getting host-down alerts at the same time, that should allow me to
> see the IP addresses...
Yes^
Flags: needinfo?(rtucker)
Reporter
Comment 7•11 years ago
OK. Hopefully that will do the trick.
Thanks again for putting this into place.
It's a big deal.
It is very much appreciated!