Closed Bug 762857 Opened 12 years ago Closed 12 years ago

Network connectivity failure between sjc1 and SCL2/MTV1

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ashish, Assigned: cransom)

Details

HOST DOWNs for MTV1 from dm-nagios01, which was also briefly unable to ping SCL2 host db1.iddb.scl2.svc.mozilla.com. The nagios bots in mtv1 and scl2 were knocked off too:

04:16:51 -!- nagios-mtv1 [nagios-mtv@moz-BBE3ABD.mv.mozilla.com] has quit [Ping timeout]
04:16:51 -!- nagios-svc-scl2 [nagios-svc@979AC98.5E1FE21F.D25A875A.IP] has quit [Ping timeout]

04:35:33 < nagios-svc-phx1> [162] wp-mon01.phx.weave:browserid.org_gslb is CRITICAL: CRITICAL: GSLB addresses unhealthy: 63.245.209.246=down

Casey from netops has been engaged and is looking into things.
The 10G circuits that connect sjc1 to scl2 (mtv1 connects through scl2) briefly flapped. I'm checking with the carrier.
And down again.
And up at 4:52. I was speaking to the L42 NOC and they said there was some work scheduled at SJC1. While he was getting clarification (from Steve, apparently), I got disconnected.
And this back from layer42:
Cisco TAC has identified the bug and we are waiting for confirmation of which image has the fix we need. It is likely we will open an emergency maintenance window tonight to upgrade it.

I'll be pestering them throughout the day to make sure we know the time frame of the window.
Assignee: network-operations → cransom
Status: NEW → ASSIGNED
Group: infra
And down hard, again. Poked layer42.
Connectivity back up after 10 minutes of downtime.
Summary: Network connectivity blip in SCL2 and MTV1 → Network connectivity failure between sjc1 and SCL2/MTV1
And here is the last word, which I hope is the last we hear about this for a very long time, regarding the most recent 10-minute downtime:
I think we are OK now. The new software Cisco gave us loaded itself when the router crashed for a third time.

That was at 9:36 PDT and it's been normal for the last 3 hours; no further maintenance window is expected. Closing.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
For future reference, the L42 ticket was #52412.
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard