remove graceful restart from network device configurations

RESOLVED FIXED

Status

Product: Infrastructure & Operations
Component: NetOps
Status: RESOLVED FIXED
Opened: 4 years ago
Updated: 2 years ago

People

(Reporter: dcurado, Assigned: dcurado)


Description

4 years ago
- Remove graceful-restart from the devices listed below.
  Why?: There are a few reasons to use graceful-restart.  The main
        reason is to be able to use graceful-routing-engine-switchover (GRES).
        That allows us to switch from a primary routing-engine to a backup
        without dropping packets.  However, none of our border routers and few
        of our core switches have a backup routing-engine.
        graceful-restart is still useful with only 1 routing-engine:
        it allows us to restart routing protocols without disrupting traffic.
        However, we don't tend to restart routing protocols or the RPD process.
        We are also interested in deploying Bidirectional Forwarding
        Detection (BFD), which conflicts with graceful-restart;
        i.e., you should only have one of the two configured when using BGP.

So, we'd like to remove graceful-restart from all the devices in our
network that currently have it configured:

	+ agg1.s301.ops.phx1.mozilla.net
	+ border1.console.pao1.mozilla.net
	+ border1.console.scl3.mozilla.net
	+ border1.console.sjc2.mozilla.net
	+ border1.phx1.mozilla.net
	+ border2.console.scl3.mozilla.net
	+ border2.phx1.mozilla.net
	+ core1.corp.console.scl3.mozilla.net
	+ core1.corp.phx1.mozilla.net
	+ core1.svc.phx1.mozilla.net
	+ fw1.akl1.mozilla.net
	+ fw1.corp.console.scl3.mozilla.net
	+ fw1.corp.phx1.mozilla.net
	+ fw1.lon1.mozilla.net
	+ fw1.ops.par1.mozilla.net
	+ fw1.ops.pdx1.mozilla.net
	+ fw1.ops.scl1.mozilla.net
	+ fw1.phx1.mozilla.net
	+ fw1.releng.scl3.mozilla.net
	+ fw1.scl3.mozilla.net
	+ fw1.sfo1.mozilla.net
	+ fw1.svc.phx1.mozilla.net
	+ fw1.tor1.mozilla.net
	+ switch1.r101-10.ops.scl3.mozilla.net
	+ switch1.r301-10.ops.scl3.mozilla.net

There is no documentation on the impact of removing graceful-restart from a
switch, router, or firewall configuration.  While graceful-restart is configured
on these devices, it is *not* configured as part of any protocol configuration.
Worst case: protocol adjacencies will be cleared when this configuration line
is removed.
Best case: nothing will happen when this configuration line is removed.
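
One way to double-check that graceful-restart appears only at the global level, and not under any protocol, is to scan each device's configuration in set format (the output of `show configuration | display set`). The sketch below is illustrative only: the function name, the sample lines, and the host name fw1-example are made up, and it assumes the statement appears as a whole token named graceful-restart.

```python
def classify_graceful_restart(set_lines):
    """Group 'graceful-restart' statements by where they sit in the config.

    set_lines: iterable of configuration lines in Junos 'set' format.
    Returns a dict with three lists: 'global', 'protocol', and 'other'.
    """
    found = {"global": [], "protocol": [], "other": []}
    for raw in set_lines:
        line = raw.strip()
        tokens = line.split()
        if "graceful-restart" not in tokens:
            continue
        if "protocols" in tokens:
            # e.g. "set protocols bgp graceful-restart"
            found["protocol"].append(line)
        elif "routing-options" in tokens:
            # e.g. "set routing-options graceful-restart"
            found["global"].append(line)
        else:
            found["other"].append(line)
    return found


# Hypothetical sample, not taken from any real device.
sample = [
    "set system host-name fw1-example",
    "set routing-options graceful-restart",
    "set protocols bgp group transit type external",
]
result = classify_graceful_restart(sample)
print(result)
```

An empty 'protocol' list for a device would match the statement above that graceful-restart is not part of any protocol configuration.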

Either way, we'll do this change one device at a time, making sure that
the network is in a good working state before moving on to the next device.
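
The one-device-at-a-time plan can be sketched as a small loop. This is only an outline under stated assumptions: `apply_change` and `network_healthy` are hypothetical callables standing in for whatever is actually used to push the change and to verify the network, and the device names and health results in the example are placeholders.

```python
def staged_rollout(devices, apply_change, network_healthy):
    """Apply a change to one device at a time.

    After each change, network_healthy() is called. If it returns False,
    the rollout stops and the untouched devices are returned so they can
    be handled later.

    Returns a tuple (completed, remaining).
    """
    devices = list(devices)
    completed = []
    for index, device in enumerate(devices):
        apply_change(device)
        completed.append(device)
        if not network_healthy():
            # Stop here; do not touch the next device.
            return completed, devices[index + 1:]
    return completed, []


# Placeholder inputs: the second health check is made to fail on purpose.
changed = []
health_results = iter([True, False, True])
done, deferred = staged_rollout(
    ["device-a", "device-b", "device-c"],
    apply_change=changed.append,
    network_healthy=lambda: next(health_results),
)
print(done, deferred)
```

Returning the remaining devices instead of raising an error keeps a clear record of what still needs the change if the rollout is cut short.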

Total Maintenance Time: 2 hours
Expected Impact: A series of short periods of routing churn

Updated

4 years ago
Assignee: network-operations → dcurado
Flags: cab-review?

Updated

4 years ago
Status: NEW → ASSIGNED
Approved by the CAB on July 23rd. When are we doing this, Dave?
Flags: cab-review? → cab-review+

Comment 2

4 years ago
We removed graceful restart from the remote office firewalls and some switches.
As mentioned above, there is no documentation from Juniper about the impact
of removing graceful restart.
What we learned is that it probably restarts the Routing Protocol Daemon, aka RPD.
That means all protocols restart.
That means all BGP sessions restart.
Rather than wreaking temporary havoc on the data centers by clearing all the BGP
sessions there, we opted to wait until the upcoming TCW to do that.
We want to clean this stuff up, but there is no need to cause problems in order to do so.

Comment 3

4 years ago
Graceful restart has been unconfigured from all of our equipment except border1.sjc2.
We'll have to take care of that some time, but making this change causes a long
and disturbing re-convergence time for the entire network.

We made this change to border1.pao1, and it took a long time to reconverge.
Not wanting to do that twice in one day, we left border1.sjc2 configured
with graceful-restart for now.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED

Updated

2 years ago
Cab Review: --- → approved
Flags: cab-review+