Closed Bug 700379
Opened 13 years ago
Closed 12 years ago

sjc1 outage

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86
macOS
task
Not set
major

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: joduinn, Assigned: dmoore)

Details

From Nagios alerts, it looks like we lost sjc1 at approximately 10:55 PDT. From IRC with arr, it is unclear whether a bug has already been filed to track this; please DUP as appropriate. Not setting this to blocker, as arr tells me in IRC that the network is just now back up again. Filing to track what caused this outage, and as a place to point developers to for details while we unhork the trees.
Assignee: server-ops-releng → network-operations
Component: Server Operations: RelEng → Server Operations: Netops
QA Contact: zandr → mrz
As buildduty, I have not yet noticed anything going wrong.
This was not a repeat of the firewall outages experienced last month. Routing issues within an upstream provider (Internap) caused network latency and instability as routes automatically failed over to other providers. Some ICMP pings and new connections may have timed out, but existing TCP sessions should have recovered after the transition. Any clients that use the Internap network as their best path to the sjc1/scl2 datacenters would have been affected; this includes the VPN tunnel to scl1. We are communicating with Internap to determine whether further instability should be expected. Welcome to the Internet.
Assignee: network-operations → dmoore
Nothing to do here.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard