Connectivity issues in the London office.

RESOLVED FIXED

Status

Infrastructure & Operations
NetOps: Office Other
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: w0ts0n, Assigned: XioNoX)

Details

(Reporter)

Description

4 years ago
Machines at the office are reporting down:

[11:02:37]  nagios-scl3	 Thu 03:02:37 PDT [5065] admin1a.private.lon1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[11:03:07]  nagios-scl3	 Thu 03:03:07 PDT [5066] vrouter1.voip.lon1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[11:03:27]  nagios-scl3	 Thu 03:03:27 PDT [5067] admin1.private.lon1.mozilla.com:PING is CRITICAL: PING CRITICAL - Packet loss = 100% (http://m.mozilla.org/PING)

At first glance it appears to be a provider issue.
Arzhel is contacting Level 3.
(Assignee)

Comment 1

4 years ago
The Level 3 (L3) ticket is 8197099.
Now I'm trying to locate remote hands to check for power issues, etc.
(Assignee)

Comment 2

4 years ago
Called Nicholas, who is onsite: the maintenance company switched off the power in part of the office.
(Assignee)

Comment 3

4 years ago
Back online at 11:10 UTC.

Nicholas power cycled the active firewall node, which gave me external connectivity to the firewall, but routing still wasn't working properly (e.g. I couldn't ping anything in another VLAN). From there I rebooted the switch stack, which solved that.

The last issue was Keepalived on the admin hosts, which was stuck and not distributing the VIP.

We also need to figure out why the UPS didn't carry the load, since this was only a ~7-8 minute power outage.
Assignee: network-operations → arzhel
(Assignee)

Comment 4

4 years ago
This also left the BER1 APs stuck in a reboot loop. Power cycling them brought them back online.
(Assignee)

Comment 5

4 years ago
Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1046691 for the UPS; I'm still going through the logs for the SRX/EX.

Comment 6

4 years ago
This has highlighted a wider issue with the infrastructure in London: the UPS runs on the same power main as the emergency power breakers on the non-public side. The technician from D&K thought he was working on the emergency backup lights, and because the fuses in the breaker were unlabelled, he simply switched them off. The lights, the APs, the UPS power feed and some other components were all affected. The UPS should not be run from the same circuit.
(Assignee)

Comment 7

4 years ago
There is nothing left to do here.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED