Closed
Bug 762857
Opened 12 years ago
Closed 12 years ago
Network connectivity failure between sjc1 and SCL2/MTV1
Categories
(Infrastructure & Operations Graveyard :: NetOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ashish, Assigned: cransom)
Details
Got HOST DOWNs for MTV1 from dm-nagios01 and was briefly unable to ping SCL2 host db1.iddb.scl2.svc.mozilla.com. The nagios bots in mtv1 and scl2 were knocked off too:
04:16:51 -!- nagios-mtv1 [nagios-mtv@moz-BBE3ABD.mv.mozilla.com] has quit [Ping timeout]
04:16:51 -!- nagios-svc-scl2 [nagios-svc@979AC98.5E1FE21F.D25A875A.IP] has quit [Ping timeout]
04:35:33 < nagios-svc-phx1> [162] wp-mon01.phx.weave:browserid.org_gslb is CRITICAL: CRITICAL: GSLB addresses unhealthy: 63.245.209.246=down
Casey from netops has been engaged and is looking into things.
Comment 1 (Assignee) • 12 years ago
The 10G circuits that connect sjc1 to scl2 (mtv1 connects to scl2) briefly flapped. I'm checking with the carrier.
Comment 2 (Assignee) • 12 years ago
and down again.
Comment 3 (Assignee) • 12 years ago
And up at 4:52. I was speaking to the l42 NOC, and they said there was some work scheduled at SJC1; while he was getting clarification (from Steve, apparently), I got disconnected.
Comment 4 (Assignee) • 12 years ago
And this back from layer42: Cisco TAC has identified the bug, waiting for confirmation of which image has the fix we need. It is likely we will open an emergency maintenance window for tonight to upgrade it. I'll be pestering them over the day to make sure we know about the window time frame.
Updated (Assignee) • 12 years ago
Assignee: network-operations → cransom
Updated (Assignee) • 12 years ago
Status: NEW → ASSIGNED
Updated • 12 years ago
Group: infra
Comment 5 (Assignee) • 12 years ago
And down hard, again. Poked layer42.
Comment 6 (Assignee) • 12 years ago
Connectivity back up after 10 minutes of downtime.
Summary: Network connectivity blip in SCL2 and MTV1 → Network connectivity failure between sjc1 and SCL2/MTV1
Comment 7 (Assignee) • 12 years ago
And, I hope, the last we hear about this for a very long time. Regarding the most recent 10-minute downtime: I think we are ok now. The new software Cisco gave us loaded itself when the router crashed for a third time. That was at 9:36 PDT and it's been normal for the last 3 hours, no further maintenance window expected. Closing.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 8 (Assignee) • 12 years ago
For future reference, the l42 ticket was #52412.
Updated • 11 years ago
Product: mozilla.org → Infrastructure & Operations
Updated • 1 year ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard