It's possible that one of the IPs is down. We should probably change that check to look up DNS on the fly rather than letting nagios/puppet set the address. The next time puppet runs on the nagios server, it'll pick a new address for that host and the check will recover.
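For illustration, a rough Python sketch of what "look up DNS on the fly" could mean: resolve the hostname each time the check runs instead of relying on an address that puppet wrote into the nagios config. The function name `resolve_ipv4` and the injectable `resolver` parameter are hypothetical and are not part of the existing nagios/puppet setup.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the IPv4 addresses DNS currently lists for hostname.

    The lookup happens at call time, so a check built on this always
    sees the current records. `resolver` is a hypothetical hook that
    defaults to the real socket.getaddrinfo and can be swapped in tests.
    """
    infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string. De-duplicate and sort.
    return sorted({info[4][0] for info in infos})
```

A check would call `resolve_ipv4("mozilla.okta.com")` on every run and probe each returned address, so a changed or retired IP no longer leaves a stale alert behind.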
I just tried pinging the other IPs that are listed in DNS for mozilla.okta.com and none of them are responding, but they are listening on port 80 and 443. I think Okta just started blocking ICMP today, so we'll have to remove this nagios check (currently acked the alert), and replace with a TCP or HTTP check instead.
(In reply to Justin Dow [:jabba] from comment #1)
> I just tried pinging the other IPs that are listed in DNS for
> mozilla.okta.com and none of them are responding, but they are listening on
> port 80 and 443. I think Okta just started blocking ICMP today, so we'll
> have to remove this nagios check (currently acked the alert), and replace
> with a TCP or HTTP check instead.

We should do an HTTP check with transactions; see bug 1082144.
21:05 < nagios-scl3> Tue 21:05:52 PST  mozilla.okta.com (18.104.22.168) is DOWN :PING CRITICAL - Packet loss = 100%
Since it's an external service and it's just a ping check, wouldn't it make more sense to have this in Pingdom, checked via the URL, rather than by IP in nagios? I think so.
Assignee: nobody → rwatson
It might, if it responded to pings... which it now does not.

pir@shiny> ping mozilla.okta.com
PING mozilla.okta.com (22.214.171.124): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
^C
--- mozilla.okta.com ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss

It needs to be checked on port 80/443, not by ping. I'd say checking from nagios makes sense since it has to be reachable from the location of the ldap server it talks to so it can function.
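As a minimal sketch of a port 80/443 check in place of ping, assuming a plain TCP connect is enough to call the service reachable: the Python below tries each port and returns a Nagios-style exit code (0 for OK, 2 for CRITICAL) with a status line. `check_tcp` and its `connect` parameter are hypothetical names for illustration; this is not the stock nagios check_tcp plugin.

```python
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_tcp(host, ports=(80, 443), timeout=5.0,
              connect=socket.create_connection):
    """Try a TCP connect to each port; return (exit_code, message).

    `connect` is a hypothetical hook that defaults to the real
    socket.create_connection and can be replaced in tests.
    """
    failed = []
    for port in ports:
        try:
            conn = connect((host, port), timeout)
            conn.close()
        except OSError as exc:  # covers refused connections and timeouts
            failed.append(f"{port} ({exc})")
    if failed:
        return CRITICAL, (f"TCP CRITICAL - {host} unreachable on port(s): "
                          + ", ".join(failed))
    return OK, (f"TCP OK - {host} accepting connections on "
                + ", ".join(str(p) for p in ports))
```

Run from the nagios host, this only reports DOWN when the service stops accepting connections, regardless of whether ICMP is blocked. It does not validate the HTTP response itself; that would need an HTTP-level check.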
Created attachment 8585394 [details]
okta.png

Pingdom has an option to check via HTTPS (screenshot attached), and it works. I personally think it makes sense to have this in Pingdom with the rest of our external services. We use nagios to check the internal server, okta1.private.scl3.mozilla.com
Well, I've added the Pingdom check and removed the nagios ones (just the ping checks; the okta1.private check still exists). These can be put back in if there is an issue. Would like +R from ashish.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED