Closed Bug 890021 Opened 11 years ago Closed 11 years ago

Widespread twisted disconnects across all trees

Categories

(Infrastructure & Operations Graveyard :: NetOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Assigned: cransom)

Details

(Whiteboard: [buildduty])

Happening in large quantities across all trees. For example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=24900392&tree=Mozilla-Central

All trees closed as of 13:02 PT.
Might be a problem with the AWS link? Not sure how to confirm or reject that theory.
15:49 < netops1> >> fw1.console.releng.scl3.mozilla.net: %-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
                    169.254.249.25 (External AS 7224) changed state from Established to Idle (event
                    RecvNotify)
15:50 < netops1> >> fw1.console.releng.scl3.mozilla.net: %-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
                    169.254.249.25 (External AS 7224) changed state from OpenConfirm to Established
                    (event RecvKeepAlive)

which basically means that the BGP session with the Amazon end of the VPC went down (Established to Idle) and came back up again about a minute later (back to Established).
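One way to confirm or reject the AWS-link theory from firewall logs is to look for a BGP peer that leaves the Established state and later returns to it. The following is a minimal sketch under stated assumptions, not NetOps tooling: the names `parse_transitions` and `find_flaps` are hypothetical, and the regular expression is derived only from the two messages quoted above, so other Junos message variants may not match.

```python
import re
from typing import List, NamedTuple

# Sample input: the two messages quoted in this bug, including the
# line wrapping introduced by the IRC paste.
SAMPLE = """\
15:49 < netops1> >> fw1.console.releng.scl3.mozilla.net: %-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
                    169.254.249.25 (External AS 7224) changed state from Established to Idle (event
                    RecvNotify)
15:50 < netops1> >> fw1.console.releng.scl3.mozilla.net: %-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
                    169.254.249.25 (External AS 7224) changed state from OpenConfirm to Established
                    (event RecvKeepAlive)
"""

# Assumption: every message follows the shape of the two quoted above.
STATE_CHANGE = re.compile(
    r"RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer\s+(?P<peer>\S+)\s+"
    r"\(External AS (?P<asn>\d+)\) changed state from\s+(?P<old>\w+)\s+"
    r"to\s+(?P<new>\w+)\s+\(event\s+(?P<event>\w+)\)"
)


class Transition(NamedTuple):
    peer: str
    asn: int
    old: str
    new: str
    event: str


def parse_transitions(text: str) -> List[Transition]:
    """Extract BGP state transitions from pasted log text."""
    # Collapse all whitespace so wrapped messages become single lines.
    flat = " ".join(text.split())
    return [
        Transition(m["peer"], int(m["asn"]), m["old"], m["new"], m["event"])
        for m in STATE_CHANGE.finditer(flat)
    ]


def find_flaps(transitions: List[Transition]) -> List[str]:
    """Return one entry per peer flap: Established lost, then regained."""
    down = set()
    flapped = []
    for t in transitions:
        if t.old == "Established" and t.new != "Established":
            down.add(t.peer)
        elif t.new == "Established" and t.peer in down:
            down.discard(t.peer)
            flapped.append(t.peer)
    return flapped


if __name__ == "__main__":
    for peer in find_flaps(parse_transitions(SAMPLE)):
        print(f"BGP session to {peer} flapped")
```

A flap reported for the 169.254.249.25 peer around the time the disconnects started would support the theory. No flap in the window would point elsewhere.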
Moving to NetOps to verify that links with AWS are up and working.
Assignee: nobody → network-operations
Component: Release Engineering → Server Operations: Netops
QA Contact: ravi
Assignee: network-operations → cransom
We already know this was a blip with AWS. We're leaving this open until we're comfortable that there won't be any more. There isn't anything else to do here AFAIK:
16:14 <@bhearsum|buildduty> RyanVM: well, it looks like this one is out of our 
                            hands. do you want to wait a bit to see if we have any 
                            more disconnects before declaring this fixed?
16:15 < RyanVM> yes, thanks :)
16:15 <@bhearsum|buildduty> ok
16:15 < RyanVM> thanks for the quick response
Assignee: cransom → network-operations
Trees reopened at 13:22 PT.
This is a normal occurrence and happens regularly during normal operation. There's no further investigation for NetOps to do, and there's no utility in keeping this bug open.
Assignee: network-operations → cransom
Severity: blocker → normal
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This is a painful reminder that we initially went down the path of VPC for Releng with the understanding that doing so could not and would not close the tree. Sadly, this is no longer the case, so it would behoove Releng to begin looking at how to prevent an IPsec tunnel to a 3rd party with a *zero* SLA[1] from closing the tree.

Netops has initiatives to help fortify things on our side, but we can only be as good as the weakest link.

[1] http://aws.amazon.com/vpc/faqs/#Q3

 Q. Does the Amazon VPC VPN Connection have a Service Level Agreement (SLA)?
    Not currently.
Component: Server Operations: Netops → NetOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard