Wireless issues in SFO1, multiple APs blipped according to nagios

RESOLVED FIXED

Status

Infrastructure & Operations
NetOps
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: w0ts0n, Assigned: adam)

(Reporter)

Description

5 years ago
Lots of alerts in #sysadmins are paging on-call. Joel reports that the wifi is flaky.

[17:12:55]  nagios-scl3	 Fri 09:12:49 PST [5480] wap709.ops.sfo1.mozilla.net:PING is CRITICAL: PING CRITICAL - Packet loss = 75%, RTA = 6657.41 ms (http://m.allizom.org/PING)
[17:13:26]  nagios-scl3	 Fri 09:13:20 PST [5481] wap709.ops.sfo1.mozilla.net:PING is OK: PING OK - Packet loss = 0%, RTA = 3.09 ms (http://m.allizom.org/PING)
[17:23:45]  nagios-scl3	 Fri 09:23:40 PST [5484] wap706.ops.sfo1.mozilla.net:PING is CRITICAL: PING CRITICAL - Packet loss = 100% (http://m.allizom.org/PING)
[17:24:45]  nagios-scl3	 Fri 09:24:40 PST [5485] wap706.ops.sfo1.mozilla.net:PING is OK: PING OK - Packet loss = 66%, RTA = 3.13 ms (http://m.allizom.org/PING)
[17:25:05]  nagios-scl3	 Fri 09:25:00 PST [5486] wap205.ops.sfo1.mozilla.net:PING is CRITICAL: PING CRITICAL - Packet loss = 100% (http://m.allizom.org/PING)
[17:25:25]  nagios-scl3	 Fri 09:25:20 PST [5487] wap112.ops.sfo1.mozilla.net is DOWN :PING CRITICAL - Packet loss = 100%
[17:25:56]  nagios-scl3	 Fri 09:25:50 PST [5488] wap205.ops.sfo1.mozilla.net:PING is OK: PING OK - Packet loss = 0%, RTA = 2.99 ms (http://m.allizom.org/PING)
[17:25:56]  nagios-scl3	 Fri 09:25:50 PST [5489] wap303.ops.sfo1.mozilla.net:PING is CRITICAL: PING CRITICAL - Packet loss = 57%, RTA = 5227.71 ms (http://m.allizom.org/PING)
[17:26:15]  nagios-scl3	 Fri 09:26:10 PST [5490] wap112.ops.sfo1.mozilla.net is UP :PING OK - Packet loss = 0%, RTA = 3.12 ms
[17:26:55]  nagios-scl3	 Fri 09:26:49 PST [5491] wap303.ops.sfo1.mozilla.net:PING is OK: PING OK - Packet loss = 0%, RTA = 2.97 ms (http://m.allizom.org/PING)
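
To get a quick view of how long each AP was out, alerts like the ones above can be paired up per host. The following is only a rough sketch, not anything used in this bug: the regex, the `outage_durations` helper and the `SAMPLE` list are hypothetical, the sample lines are copied from the log above, and it assumes all alerts fall on the same day because only the time of day is parsed.

```python
import re
from datetime import datetime

# Hypothetical pattern for the Nagios-side part of each IRC line, e.g.
#   "Fri 09:12:49 PST [5480] wap709.ops.sfo1.mozilla.net:PING is CRITICAL: ..."
# and the host-check form
#   "Fri 09:25:20 PST [5487] wap112.ops.sfo1.mozilla.net is DOWN :PING ..."
ALERT_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) PST \[\d+\] "
    r"(?P<host>[\w.-]+?)(?::PING)? is (?P<state>CRITICAL|OK|DOWN|UP)\b"
)


def outage_durations(lines):
    """Pair each CRITICAL/DOWN alert with the next OK/UP for the same host.

    Returns a list of (host, seconds) tuples in recovery order.
    Assumption: all alerts are from the same day, so only HH:MM:SS is compared.
    """
    open_since = {}  # host -> time of the first unrecovered alert
    outages = []
    for line in lines:
        match = ALERT_RE.search(line)
        if not match:
            continue  # not an alert line
        when = datetime.strptime(match["time"], "%H:%M:%S")
        host = match["host"]
        if match["state"] in ("CRITICAL", "DOWN"):
            # Keep the earliest alert if several arrive before recovery.
            open_since.setdefault(host, when)
        elif host in open_since:
            started = open_since.pop(host)
            outages.append((host, (when - started).total_seconds()))
    return outages


# Four lines copied from the log above.
SAMPLE = [
    "Fri 09:12:49 PST [5480] wap709.ops.sfo1.mozilla.net:PING is CRITICAL: PING CRITICAL - Packet loss = 75%, RTA = 6657.41 ms",
    "Fri 09:13:20 PST [5481] wap709.ops.sfo1.mozilla.net:PING is OK: PING OK - Packet loss = 0%, RTA = 3.09 ms",
    "Fri 09:25:20 PST [5487] wap112.ops.sfo1.mozilla.net is DOWN :PING CRITICAL - Packet loss = 100%",
    "Fri 09:26:10 PST [5490] wap112.ops.sfo1.mozilla.net is UP :PING OK - Packet loss = 0%, RTA = 3.12 ms",
]

if __name__ == "__main__":
    for host, seconds in outage_durations(SAMPLE):
        print(f"{host}: down for about {seconds:.0f}s")
```

The durations are measured between alert timestamps, so they reflect when Nagios noticed the change rather than the exact moment an AP dropped or recovered.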

Updated

5 years ago
Summary: wifi flaky inSFO. wap*.ops reporting critical. → Wireless issues in SFO1, multiple APs blipped according to nagios
APs (at least on the 7th floor in SFO) were not providing IPs and were kicking people off as of 10:15AM PST. Issue has come back for a bit.
Issue was deemed stabilized at ~10:30am. Adam will add more notes on what exactly was done to stabilize and resolve the issue.
Was about to send the "resolved" notification but just got the following:
Fri 12:40:57 PST [5923] wap709.ops.sfo1.mozilla.net:PING is CRITICAL: PING CRITICAL - Packet loss = 100%
Fri 12:41:47 PST [5924] wap709.ops.sfo1.mozilla.net:PING is OK: PING OK - Packet loss = 0%, RTA = 3.11 ms

Since the ~10:30am "fix", this has been the only alert sent relating to the APs.
wap105.ops.sfo1.mozilla.net just flapped:
Fri 12:45:56 PST -> Fri 12:46:36 PST
(Assignee)

Comment 5

5 years ago
We've updated both the routing between MTV1 and SFO1 and the affinity configuration for both controllers. At the moment, all APs are online and connected to the SFO1 controller; we are stable and observing.
(Assignee)

Updated

5 years ago
Assignee: network-operations → adam
Severity: critical → normal

Updated

4 years ago
QA Contact: adam → jbarnell
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED