mozilla.okta.com is no longer responding to pings

RESOLVED FIXED

Status

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: achavez, Assigned: rwatson)

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
It's possible that one of the IPs is down. We should probably change that check to look up DNS on the fly rather than letting nagios/puppet set the address.

The next time puppet runs on the nagios server, it'll pick a new address for that host and the check will recover.
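A rough sketch of the "look up DNS on the fly" idea, in Python. Assumptions: it probes with a TCP connect rather than ICMP (a raw ping socket needs root), and `resolve_all` / `check_any_address` are hypothetical names, not an existing Nagios plugin. Because it resolves the name on every run, a stale or dead IP baked in by puppet doesn't matter; the check passes if any current address answers.

```python
import socket


def resolve_all(host: str, port: int) -> list[str]:
    """Resolve host fresh on every call and return each distinct address.

    A failed lookup raises socket.gaierror; a wrapper should map that to UNKNOWN/CRITICAL.
    """
    addresses: list[str] = []
    # SOCK_STREAM limits results to TCP entries; IPv4 and IPv6 answers are both kept.
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def check_any_address(host: str, port: int, timeout: float = 5.0) -> tuple[bool, dict[str, bool]]:
    """TCP-connect to every address the name currently resolves to; OK if at least one answers."""
    results: dict[str, bool] = {}
    for addr in resolve_all(host, port):
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                results[addr] = True
        except OSError:  # refused, timed out, unreachable, ...
            results[addr] = False
    return any(results.values()), results
```

For example, `check_any_address("mozilla.okta.com", 443)` returns the overall result plus a per-address map, so the alert text can say which IPs were dead at that moment.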

Comment 1

4 years ago
I just tried pinging the other IPs that are listed in DNS for mozilla.okta.com and none of them are responding, but they are listening on ports 80 and 443. I think Okta just started blocking ICMP today, so we'll have to remove this nagios check (I've acked the alert for now) and replace it with a TCP or HTTP check instead.
(In reply to Justin Dow [:jabba] from comment #1)
> I just tried pinging the other IPs that are listed in DNS for
> mozilla.okta.com and none of them are responding, but they are listening on
> port 80 and 443. I think Okta just started blocking ICMP today, so we'll
> have to remove this nagios check (currently acked the alert), and replace
> with a TCP or HTTP check instead.

We should do an HTTP check with transactions; see bug 1082144.
21:05 < nagios-scl3> Tue 21:05:52 PST [5954] mozilla.okta.com (54.235.64.96) is DOWN :PING CRITICAL - Packet loss = 100%
(Assignee)

Comment 4

3 years ago
Since it's an external service and it's just a ping check, wouldn't it make more sense to have this in pingdom, checked by URL, rather than by IP in nagios? I think so.
Assignee: nobody → rwatson
It might, if it responded to pings... which it now does not.

pir@shiny> ping mozilla.okta.com
PING mozilla.okta.com (54.235.64.96): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
^C
--- mozilla.okta.com ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss


It needs to be checked on port 80/443, not by ping. I'd say checking from nagios makes sense, since it has to be reachable from the location of the LDAP server it talks to in order to function.
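For the port 80/443 check from the nagios host, something along these lines would work as a sketch. Assumptions: `check_ports` is a hypothetical helper, not the stock check_tcp plugin, and a completed TCP handshake on every listed port counts as up. It follows the usual Nagios plugin convention of one status line plus exit code 0 (OK) or 2 (CRITICAL).

```python
import socket

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_ports(host: str, ports: tuple[int, ...] = (80, 443), timeout: float = 10.0) -> tuple[int, str]:
    """Return (nagios_code, message) after trying a TCP connect to each port on host."""
    down = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # handshake completed; nothing else to do
        except OSError as exc:  # includes socket.gaierror, so a failed DNS lookup is CRITICAL here
            down.append(f"{port} ({exc.__class__.__name__})")
    if not down:
        return OK, f"TCP OK - {host} accepting connections on port(s) {', '.join(map(str, ports))}"
    return CRITICAL, f"TCP CRITICAL - {host} not accepting connections on port(s) {', '.join(down)}"
```

A wrapper script would print the message and pass the code to `sys.exit()` so nagios picks up the state.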
(Assignee)

Comment 6

3 years ago
Created attachment 8585394 [details]
okta.png

Pingdom has an option to check via HTTPS (see the attached screenshot), and it works.

I personally think it makes sense in Pingdom with the rest of our external services. We use nagios to check the internal server, okta1.private.scl3.mozilla.com.
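Roughly what an HTTPS uptime check like Pingdom's does, as a Python sketch in case the same check is ever wanted from nagios instead. Assumptions: `http_check` is a hypothetical helper, not a Pingdom or Nagios API; "up" means a non-error response after following redirects, optionally containing some expected text; the actual Pingdom settings are only in the attached screenshot and are not reproduced here.

```python
import urllib.error
import urllib.request
from typing import Optional

# Ignore any *_proxy environment variables so the check measures direct reachability
# from the monitoring host (an assumption; drop this if checks must go through a proxy).
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_check(url: str, timeout: float = 10.0, expect_text: Optional[str] = None) -> tuple[bool, str]:
    """Fetch url; up means a non-error response (redirects followed), optionally containing expect_text."""
    request = urllib.request.Request(url, headers={"User-Agent": "uptime-check-sketch"})
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            status = response.status
            body = response.read(65536).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # 4xx/5xx; caught before URLError, its parent class
        return False, f"HTTP {exc.code}"
    except OSError as exc:  # URLError, DNS failure, refused, timeout, TLS errors
        return False, f"connection failed: {exc}"
    if expect_text is not None and expect_text not in body:
        return False, f"HTTP {status}, expected text not found"
    return True, f"HTTP {status}"
```

Checking for a piece of expected page text catches the case where the site answers on 443 but serves an error page, which a bare TCP check would miss.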
(Assignee)

Comment 7

3 years ago
Well, I've added the pingdom check and removed the nagios ones (just the ping checks; the okta1.private check still exists). These can be put back in if there is an issue. Would like +R from ashish.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(ashish)
Resolution: --- → FIXED
WFM
Flags: needinfo?(ashish)