Widespread twisted disconnects across all trees

RESOLVED FIXED

Status

Infrastructure & Operations
NetOps: Other
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: RyanVM, Assigned: casey)

Tracking

Details

(Whiteboard: [buildduty])

(Reporter)

Description

5 years ago
Happening in large quantities across all trees. For example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=24900392&tree=Mozilla-Central

All trees closed as of 13:02 PT.
Might be a problem with the AWS link? Not sure how to confirm or reject that theory.
15:49 < netops1> >> fw1.console.releng.scl3.mozilla.net: %-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
                    169.254.249.25 (External AS 7224) changed state from Established to Idle (event
                    RecvNotify)
15:50 < netops1> >> fw1.console.releng.scl3.mozilla.net: %-RPD_BGP_NEIGHBOR_STATE_CHANGED: BGP peer
                    169.254.249.25 (External AS 7224) changed state from OpenConfirm to Established
                    (event RecvKeepAlive)

which basically means that the Amazon end of the VPC went down and came back up again.
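To make that reading of the log concrete, here is a minimal sketch that parses the two fw1 messages quoted above and checks for a drop-and-recover of the BGP session. It assumes the message format is exactly what appears in this bug (other Junos releases or syslog settings may differ). The names `parse_state_change`, `BgpStateChange` and `session_flapped` are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import List, NamedTuple, Optional

# Pattern inferred only from the two RPD_BGP_NEIGHBOR_STATE_CHANGED lines
# quoted in this bug; treat it as an assumption, not a spec of the format.
BGP_STATE_RE = re.compile(
    r"BGP peer (?P<peer>\S+) \((?P<kind>External|Internal) AS (?P<asn>\d+)\) "
    r"changed state from (?P<old>\w+) to (?P<new>\w+) \(event (?P<event>\w+)\)"
)


class BgpStateChange(NamedTuple):
    peer: str
    asn: int
    old: str
    new: str
    event: str


def parse_state_change(message: str) -> Optional[BgpStateChange]:
    """Parse one BGP state-change message (hypothetical helper)."""
    # The IRC relay wraps long messages across lines, so collapse whitespace.
    flat = " ".join(message.split())
    m = BGP_STATE_RE.search(flat)
    if m is None:
        return None
    return BgpStateChange(m["peer"], int(m["asn"]), m["old"], m["new"], m["event"])


def session_flapped(changes: List[BgpStateChange]) -> bool:
    """True if a peer left Established and later returned to Established."""
    dropped = set()
    for c in changes:
        if c.old == "Established" and c.new != "Established":
            dropped.add(c.peer)  # session went down (e.g. Idle after RecvNotify)
        elif c.new == "Established" and c.peer in dropped:
            return True  # same peer came back up: a flap, not a hard outage
    return False
```

Fed the two quoted messages in order, `session_flapped` returns `True`: the AWS-side peer (External AS 7224) went from Established to Idle and then back to Established, which matches the "went down and came back up again" reading.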
Moving to NetOps to verify that links with AWS are up and working.
Assignee: nobody → network-operations
Component: Release Engineering → Server Operations: Netops
QA Contact: ravi
(Assignee)

Updated

5 years ago
Assignee: network-operations → cransom
We already know this was a blip with AWS. We're leaving this open until we're comfortable that there won't be any more. There isn't anything else to do here AFAIK:
16:14 <@bhearsum|buildduty> RyanVM: well, it looks like this one is out of our 
                            hands. do you want to wait a bit to see if we have any 
                            more disconnects before declaring this fixed?
16:15 < RyanVM> yes, thanks :)
16:15 <@bhearsum|buildduty> ok
16:15 < RyanVM> thanks for the quick response
Assignee: cransom → network-operations
(Reporter)

Comment 5

5 years ago
Trees reopened at 13:22 PT.
(Assignee)

Comment 6

5 years ago
This is a normal occurrence and happens regularly during normal operation. There's nothing further for netops to investigate, and no utility in keeping this bug open.
Assignee: network-operations → cransom
Severity: blocker → normal
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Comment 7

5 years ago
This is a painful reminder that we initially went down the path of VPC for Releng with the understanding that doing so could not and would not close the tree. Sadly this is no longer the case, so it would behoove Releng to begin looking at how to prevent an IPsec tunnel to a 3rd party with a *zero* SLA[1] from closing the tree.

Netops has initiatives to help fortify things on our side, but we can only be as good as the weakest link.

[1] http://aws.amazon.com/vpc/faqs/#Q3

 Q. Does the Amazon VPC VPN Connection have a Service Level Agreement (SLA)?
    Not currently.
Component: Server Operations: Netops → NetOps: Other
Product: mozilla.org → Infrastructure & Operations