Closed Bug 1078504 Opened 11 years ago Closed 11 years ago

Network issues involving PHX1

Categories

(Infrastructure & Operations :: MOC: Problems, task)

Platform: x86 macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: achavez, Unassigned)

References

Details

No description provided.
Tracking network issues in the PHX1 datacenter. Netops is already investigating.
The main problem appears to have been core1.phx1; please see bug 1078077 for more information about the changes we made there. We appear to be under a frag attack, which could be what affected core1.phx1. I have updated the firewall filters on our border routers to protect them from this attack and am now working on protecting the rest of our network equipment.
Depends on: 1078820
For people following the bug and not #moc, we're having more issues with core1.phx1.
PHX1 network outage is still ongoing. The network operations team is investigating a hardware failure with one of the core network switches and determining the best course for recovery.
All core switches have had protections added. We've worked around the core1.phx1 failure. Hopefully things are better now.
Status: NEW → ASSIGNED
Status as of 10/7 9am EST: The network is now stable. The root cause of our problems appears to have been an attack involving NTP against a number of our core switches. The firewall filters on the core switches have been updated to protect them from this issue. However, core1.phx1 also failed during this outage, either because of a hardware failure or because the NTP attack caused enough problems to crash the system. Then, when core1.phx1 tried to reboot, other hardware problems were exposed (apparently a bad flash drive), so the operating system could not be loaded. Replacement hardware is being shipped to us. I will update this bug once there are further developments.
Changes made for the workaround:
- core1.phx1: disabled interface vlan unit 5.
- core2.phx1: deactivated VRRP on vlan unit 5; changed the IP address on the vlan interface from 63.245.217.253/24 to 63.245.217.1/24; added vlan.5 as the l3-interface of the "vips" vlan.
- fw1.phx1: James changed the priorities and the interface-watching configuration.
These configuration changes will be undone at the appropriate time, after core1.phx1 is repaired.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED