Closed Bug 1125678 Opened 9 years ago Closed 9 years ago

mozilla.okta.com is no longer responding to pings

Categories

(Infrastructure & Operations :: MOC: Problems, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: achavez, Assigned: rwatson)

Details

Attachments

(1 file)

It's possible that one of the IPs is down. We should probably change that check to look up DNS on the fly rather than letting Nagios/Puppet set the address.
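A minimal sketch of the "look up DNS on the fly" idea, in Python. It assumes the check script resolves the hostname every time it runs, so it always sees the current set of addresses instead of one address that Nagios/Puppet wrote into the config. The helper name `resolve_all` is hypothetical and is not part of Nagios or Puppet.

```python
import socket


def resolve_all(hostname: str, port: int = 443) -> list[str]:
    """Resolve hostname at check time and return each unique IP address.

    Hypothetical helper: the point is that resolution happens on every run,
    not once when the monitoring config is generated.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

For example, `resolve_all("mozilla.okta.com")` would return whatever A/AAAA records DNS holds at that moment, and a check could then test each address in turn.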

The next time puppet runs on the nagios server, it'll pick a new address for that host and the check will recover.
I just tried pinging the other IPs that are listed in DNS for mozilla.okta.com and none of them are responding, but they are listening on ports 80 and 443. I think Okta just started blocking ICMP today, so we'll have to remove this Nagios check (currently acked the alert) and replace it with a TCP or HTTP check instead.
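A rough sketch of what replacing the ping with a TCP check means, assuming a Python script that follows the usual Nagios plugin exit-code convention (0 = OK, 2 = CRITICAL). A completed TCP handshake shows the service is listening even when ICMP echo is filtered. The function name `tcp_check` is hypothetical; in practice the stock `check_tcp` plugin does the same job.

```python
import socket

# Nagios plugin exit-code convention.
OK, CRITICAL = 0, 2


def tcp_check(host: str, port: int, timeout: float = 5.0) -> tuple[int, str]:
    """Return (exit_code, message) for a plain TCP connect check.

    Hypothetical helper: succeeds if the TCP handshake completes,
    which does not depend on ICMP being allowed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepted a connection"
    except OSError as exc:  # covers refused connections and timeouts
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
```

A wrapper would call `tcp_check("mozilla.okta.com", 443)`, print the message, and exit with the returned code so Nagios can interpret it.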
(In reply to Justin Dow [:jabba] from comment #1)
> I just tried pinging the other IPs that are listed in DNS for
> mozilla.okta.com and none of them are responding, but they are listening on
> port 80 and 443. I think Okta just started blocking ICMP today, so we'll
> have to remove this nagios check (currently acked the alert), and replace
> with a TCP or HTTP check instead.

We should do an HTTP check with transactions; see bug 1082144.
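A minimal single-request HTTP(S) check, sketched in Python under the assumption that "OK" means the final response has a 2xx status. It does not implement the multi-step transaction check referenced in bug 1082144; it only shows the first step such a check would build on. The name `http_check` is hypothetical.

```python
import urllib.error
import urllib.request

# Nagios plugin exit-code convention.
OK, CRITICAL = 0, 2


def http_check(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (exit_code, message) for one HTTP(S) GET.

    urlopen follows redirects and raises HTTPError for non-2xx final
    responses, so reaching the return below implies a 2xx status.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:  # must come before URLError
        return CRITICAL, f"HTTP CRITICAL - {url} returned {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # DNS, refused, timeout
        return CRITICAL, f"HTTP CRITICAL - {url}: {exc}"
    return OK, f"HTTP OK - {url} returned {status}"
```

Against this service it would be called as `http_check("https://mozilla.okta.com/")`; a transaction check would chain several such requests and assert on each response.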
21:05 < nagios-scl3> Tue 21:05:52 PST [5954] mozilla.okta.com (54.235.64.96) is DOWN :PING CRITICAL - Packet loss = 100%
Since it's an external service and it's just a ping check, wouldn't it make more sense to have this in Pingdom, checked via the URL, rather than by IP in Nagios? I think so.
Assignee: nobody → rwatson
It might, if it responded to pings... which it now does not.

pir@shiny> ping mozilla.okta.com
PING mozilla.okta.com (54.235.64.96): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
^C
--- mozilla.okta.com ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss


It needs to be checked on port 80/443, not by ping. I'd say checking from nagios makes sense since it has to be reachable from the location of the ldap server it talks to so it can function.
Attached image okta.png
Pingdom has an option to check via HTTPS (screenshot attached), and it works.

I personally think it makes sense in Pingdom with the rest of our external services. We use Nagios to check the internal server, okta1.private.scl3.mozilla.com.
Well, I've added the Pingdom check and removed the Nagios ones (just the ping checks; the okta1.private check still exists). These can be put back if there is an issue. Would like +R from ashish.
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(ashish)
Resolution: --- → FIXED
WFM
Flags: needinfo?(ashish)