Closed
Bug 1164484
Opened 9 years ago
Closed 9 years ago
We lost both 10G links to PHX1 from our POPs
Categories
(Infrastructure & Operations Graveyard :: NetOps, task)
Infrastructure & Operations Graveyard
NetOps
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Usul, Assigned: dcurado)
Details
<nagios-phx1> Wed 08:14:03 PDT [1277] nagios1.private.pek1.mozilla.com (10.24.75.42) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-scl3> Wed 08:14:19 PDT [5174] webwewant.mozilla.org (63.245.217.19) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-euw1> Wed 11:14:24 EDT [8176] nagios1.private.scl3.mozilla.com (10.22.75.42) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-corp-phx1> Wed 08:14:43 PDT [3001] nagios1.private.pek1.mozilla.com (10.24.75.42) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-releng> Wed 08:14:53 PDT [4649] nagios1.private.phx1.mozilla.com (10.8.75.19) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-phx1> Wed 08:15:03 PDT [1280] nagios1.private.scl3.mozilla.com (10.22.75.42) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-corp-phx1> Wed 08:15:03 PDT [3002] nagios1.private.scl3.mozilla.com (10.22.75.42) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-scl3> Wed 08:16:09 PDT [5175] nagios1.private.phx1.mozilla.com (10.8.75.19) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-scl3> Wed 08:16:09 PDT [5177] nagios1.private.corp.phx1.mozilla.com (10.20.75.46) is DOWN :PING CRITICAL - Packet loss = 100%
<nagios-scl3> Wed 08:16:10 PDT [5178] nagios1.private.euw1.mozilla.com (10.150.75.12) is DOWN :PING CRITICAL - Packet loss = 100%
Assignee
Comment 1•9 years ago
Looks like we lost both 10G links to PHX1 from our POPs. I will contact Zayo right away.
Assignee: network-operations → dcurado
Status: NEW → ASSIGNED
Reporter
Comment 2•9 years ago
<hwine> usul_training: mana is unreachable for me from SFO
* grenade|afk is now known as grenade
<usul_training> https://bugzilla.mozilla.org/show_bug.cgi?id=1164484
<hwine> usul_training: ah, seems like a good item for the topic here, since whistle pig appears also offline
<glob> hwine, fwiw i can access whistlepig
<usul_training> I'll send a whistlepig
<usul_training> probably
<arr> did we just lose phx1?
<glob> (but replication between bmo's scl3 and phx1 clusters is broken)
<usul_training> arr looks like it
<usul_training> dcurado, is looking
<usul_training> the bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1164484
<dcurado> yes, we lost both 10G circuits into PHX1
<dcurado> However, there is a back door tunnel thing that should be working
<arr> we're seeing nagios failures
<arr> like of the nagios server in phx itself :}
* havi has quit (Quit: havi_away)
<arr> and dc2
<usul_training> glob, whistlepig is slow for me
<rhelmer> hm I am having trouble logging into any of our servers, from the VPN or the jumphost
Reporter
Updated•9 years ago
Summary: some links look like they are down → We lost both 10G links to PHX1 from our POPs
Assignee
Comment 3•9 years ago
Zayo is aware of the problem. Ticket numbers are: 703860, 703864.
Assignee
Comment 4•9 years ago
Zayo is still determining the extent of the problem, but the tech I spoke with said: "I think this is a pretty big outage because I haven't had a breath in 20 minutes"
Assignee
Comment 5•9 years ago
I found a missing route in the BGP policy on Adam Newman's IPSec Love Child Backdoor link, for corp.phx1 (10.20/16). I fixed that, and now we appear to have 100% reachability again, even while the 2 x 10GE links from Zayo are down. Which is pretty cool. As Zayo still doesn't really know what is going on, I suspect this may be a prolonged outage.
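For context, a fix like this typically amounts to adding a term for the missing prefix to the export policy on the backup tunnel. The sketch below is purely illustrative Junos-style configuration; the policy and term names are hypothetical and not taken from the actual routers.

```
/* Hypothetical sketch: export policy on the backup IPSec tunnel.
   Without a term matching 10.20.0.0/16 (corp.phx1), that prefix was
   never advertised over the backdoor link, so corp.phx1 stayed
   unreachable when the primary 10G circuits went down. */
policy-options {
    policy-statement EXPORT-TO-BACKDOOR {
        term corp-phx1 {
            from {
                route-filter 10.20.0.0/16 orlonger;
            }
            then accept;
        }
    }
}
```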
Comment 6•9 years ago
From #moc:
09:57:15 < jbarnell> Zayo is observing a fiber cut between Roll, AZ and Winterhaven, CA which is impacting our longhaul network from Phoenix, AZ to Los Angeles, CA. Technicians are in the area locating the point of damage at this time. We will continue to provide updates immediately as they become available.
Assignee
Comment 7•9 years ago
Connectivity has been restored.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 8•9 years ago
It looks like this may be a recurring issue. Dave C is looking into it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 9•9 years ago
I contacted the provider, Zayo. They said the original repair (yesterday) was not 100% complete, and they are now splicing the cable. Hard to know what that means, but if I had to guess, I'd say they are re-doing the work with better materials and better splices, after a quick patch in the field yesterday. But usually a provider would re-route live circuits over different fiber first, or at least tell you they are going to take the circuits down again.
Assignee
Comment 10•9 years ago
These circuits have been restored to service.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Updated•2 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard