Closed Bug 930024 Opened 11 years ago Closed 11 years ago

Disconnects across multiple trees and platforms

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 918677

People

(Reporter: emorley, Unassigned)

Details

Looks like one of the VPN tunnels to us-east-1 went down at 10:27ET. Failed over to the other tunnel.
Tunnel 1	72.21.209.193	UP	2013-10-19 05:21 EDT	1 BGP ROUTES
Tunnel 2	72.21.209.225	DOWN	2013-10-23 10:27 EDT	IPSEC IS UP
(In reply to Chris AtLee [:catlee] from comment #1)
> Looks like one of the VPN tunnels to us-east-1 went down at 10:27ET. Failed
> over to the other tunnel.

Once we're out of the woods, where do you get to see that failure? I don't see anything on #buildduty. Thanks!
Tunnel 2 looks like it's back up.

Tunnel 1	72.21.209.193	UP	2013-10-19 05:21 EDT	1 BGP ROUTES
Tunnel 2	72.21.209.225	UP	2013-10-23 11:08 EDT	1 BGP ROUTES

Armen, this is from Amazon's VPC dashboard.
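For reference, the length of the Tunnel 2 outage can be read off the two dashboard snapshots above by subtracting the "last status change" timestamps. The following is only a rough sketch: the `parse_row` helper is hypothetical, and it assumes the rows are tab-separated exactly as pasted in this bug and that both timestamps are in the same zone (EDT).

```python
from datetime import datetime

def parse_row(line):
    """Split one tab-separated tunnel row as pasted from the VPC dashboard."""
    name, ip, status, changed, detail = line.split("\t")
    # Drop the trailing zone abbreviation ("EDT"); both rows use the same zone,
    # so naive datetimes are enough for computing a difference.
    stamp = datetime.strptime(changed.rsplit(" ", 1)[0], "%Y-%m-%d %H:%M")
    return {"name": name, "ip": ip, "status": status,
            "changed": stamp, "detail": detail}

# Tunnel 2 as reported while down, and again after it recovered.
down = parse_row("Tunnel 2\t72.21.209.225\tDOWN\t2013-10-23 10:27 EDT\tIPSEC IS UP")
up = parse_row("Tunnel 2\t72.21.209.225\tUP\t2013-10-23 11:08 EDT\t1 BGP ROUTES")

outage = up["changed"] - down["changed"]
print(outage)  # 0:41:00
```

This only reflects what the dashboard reported, so a shorter flap that happened between the two snapshots would not show up.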
@timestamp,@source_host,@message
2013-10-23T14:27:53.000Z,fw1.releng.console.scl3.mozilla.net,%-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 169.254.255.77 (External AS 7224) changed state from Established to Idle (event HoldTime)

This is the only important event that happened on our infrastructure between 14:20 and 14:40 UTC. Tunnels to AWS flap regularly (far more frequently than the IPsec tunnels to any of our offices, for example).
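A log export like the one above can be filtered for BGP session drops with a short script. This is a minimal sketch, not the tooling actually used here: the function name `bgp_state_changes` is hypothetical, the column names are taken from the header line in this comment, and the regular expression only covers the message form shown in this bug.

```python
import csv
import io
import re
from datetime import datetime, timezone

# Matches the BGP state-change message in the form shown in this bug.
BGP_STATE_RE = re.compile(
    r"RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer (?P<peer>\S+) "
    r"\((?:External|Internal) AS (?P<asn>\d+)\) "
    r"changed state from (?P<old>\w+) to (?P<new>\w+) "
    r"\(event (?P<event>\w+)\)"
)

def bgp_state_changes(csv_text):
    """Return one dict per BGP state-change row in a CSV log export."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Unquoted commas inside a message spill into extra fields (key None);
        # join them back so the message is matched as a whole.
        extra = row.get(None) or []
        message = ",".join([row.get("@message") or "", *extra])
        match = BGP_STATE_RE.search(message)
        if match is None:
            continue
        when = datetime.strptime(
            row["@timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
        ).replace(tzinfo=timezone.utc)
        events.append({"when": when, "host": row["@source_host"],
                       **match.groupdict()})
    return events

SAMPLE = (
    "@timestamp,@source_host,@message\n"
    "2013-10-23T14:27:53.000Z,fw1.releng.console.scl3.mozilla.net,"
    "%-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer 169.254.255.77 "
    "(External AS 7224) changed state from Established to Idle "
    "(event HoldTime)\n"
)

for event in bgp_state_changes(SAMPLE):
    print(event["when"].isoformat(), event["peer"],
          event["old"], "->", event["new"], event["event"])
```

The 14:27 UTC timestamp lines up with the 10:27 EDT status change that the VPC dashboard showed for Tunnel 2.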
Disconnects haven't occurred since, and the other tree carnage seems to be under control; reopening.

Do we have a bug open for increasing the resilience of these VPN tunnels? (I forget)
Severity: blocker → critical
(In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> Do we have a bug open for increasing the resilience of these VPN tunnels? (I
> forget)

From what we can identify so far, nothing more can be done on our side. We did open a new case with Amazon on the issue, and many more details are in bug 918677. I'm going to close this one out in favor of the investigation going on there.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
sgtm, thank you :-)
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard