Closed Bug 707630 Opened 13 years ago Closed 12 years ago

PHX1 rolling network outage

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ashish, Assigned: ravi)

References

Details

Network issues in PHX began at ~02:33 PST and lasted about 10 minutes. The issue affected addons.m.o, bugzilla.m.o, Services (sync/weave), Socorro and Input, all of which were either unreachable or showed a Service Unavailable message.
IRC (concrete) was also affected, according to 3crowd. 3crowd shows concrete offline from 10:33 UTC to 10:39 UTC.
and again from 10:46 to 10:49 UTC.
Sorry, I misread the timestamps, the second outage 3crowd reported on the irc server was 10:43 to 10:49 UTC.
Another very short one between 08:23 and 08:24 Pacific. Nagios caught it, but because it was so short, the external monitors didn't.
All clear was given at 0400 UTC after an OS upgrade was performed on the equipment.

Full details will be provided once remaining data is collected.
Assignee: network-operations → ravi
Summary: PHX network blip on 12/05 AM → PHX1 rolling network outage
Applied configuration changes per vendor recommendation and rebooted firewalls. We sustained an 8-minute outage, but things are stable with all AMO online.

CPU and memory utilization are nominal. We will continue to monitor things, especially around 1200 GMT.
Not sure if this is the appropriate bug for this comment, but:
From 9:20 AM until 9:25 AM, PHX was experiencing elevated latency. Netops were alerted and responded promptly.
I think we are all going to selectively forget the month of December 2011...
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard