Machines at the office are reporting down:
[11:02:37] nagios-scl3 Thu 03:02:37 PDT  admin1a.private.lon1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[11:03:07] nagios-scl3 Thu 03:03:07 PDT  vrouter1.voip.lon1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[11:03:27] nagios-scl3 Thu 03:03:27 PDT  admin1.private.lon1.mozilla.com:PING is CRITICAL: PING CRITICAL - Packet loss = 100% (http://m.mozilla.org/PING)
At first glance it appears to be a provider issue. Arzhel is contacting level3.
L3 ticket is 8197099. I'm now trying to locate remote hands to check for power issues, etc.
Called Nicholas, who is onsite: the maintenance company switched the power off in part of the office.
Back online at 11:10 UTC. Nicholas power cycled the active firewall node, which allowed me to get external connectivity to the firewall, but routing was still not working properly (e.g. I couldn't ping something in another VLAN). From there I rebooted the switch stack, which solved that. The last thing was Keepalived on the admin hosts, which was stuck and not distributing the VIP. We also need to figure out why the UPS didn't take the hit (as it was a ~7-8 min power outage).
Assignee: network-operations → arzhel
This also caused the BER1 APs to stay stuck in a reboot loop. Power cycling them brought them back online.
Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1046691 for the UPS. Still going through the logs for the SRX/EX.
This has highlighted a wider issue with the infrastructure in London: the power main the UPS runs on is the same as the one for the emergency power breakers on the non-public side. The guy from D&K simply thought he was working on the emergency backup lights. The fuses in the breaker are unlabelled, so he simply switched them off. Lights out, APs out, UPS power loss, and some other components were affected. These should not be run from the same circuit.
There is nothing left to do here.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED