Closed Bug 492033 Opened 11 years ago Closed 10 years ago

Random disconnects when TCP SACK is enabled

Categories

(mozilla.org Graveyard :: Server Operations, task, minor)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: dmoore)

Details

(Whiteboard: 07/22/2010)

If I'm connected to the MPT VPN, and I ssh to one of the machines in the build network, e.g. moz2-linux-slave02.build.mozilla.org, I'll occasionally get randomly disconnected from the machine.

Something is actually sending me TCP RST packets:
13:51:45.781316 IP moz2-linux-slave02.build.mozilla.org.ssh > 10.2.21.38.51851: Flags [R.], seq 2701, ack 2657, win 79, options [nop,nop,TS val 6684024 ecr 4245624108,nop,nop,sack 1 {2653630087:2653630183}], length 0
13:51:46.601533 IP moz2-linux-slave02.build.mozilla.org.ssh > 10.2.21.38.51851: Flags [R.], seq 2701:2749, ack 2657, win 79, options [nop,nop,TS val 6684241 ecr 4245624108], length 48
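To pick resets like these out of a longer capture, something along these lines might help. This is only a rough sketch: it assumes tcpdump's text output looks like the two lines above, and the helper names (`is_reset`, `carries_sack`, `summarize`) are made up for illustration.

```python
import re

# The "Flags [...]" field as printed by tcpdump in the lines above.
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")
# A SACK block in the options list, e.g. "sack 1 {2653630087:2653630183}".
SACK_RE = re.compile(r"sack \d+ \{[^}]*\}")


def is_reset(line):
    """True if the tcpdump line shows a segment with the RST flag set."""
    match = FLAGS_RE.search(line)
    return bool(match) and "R" in match.group(1)


def carries_sack(line):
    """True if the segment's options include at least one SACK block."""
    return SACK_RE.search(line) is not None


def summarize(lines):
    """Count RST segments, and how many of them also carry SACK blocks."""
    resets = [line for line in lines if is_reset(line)]
    return {
        "resets": len(resets),
        "resets_with_sack": sum(1 for line in resets if carries_sack(line)),
    }
```

Feeding it the saved text output of a capture (one line per packet) gives a quick count of how many resets showed up and whether SACK blocks were present on them.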

Disabling SACK on my machine (echo 0 > /proc/sys/net/ipv4/tcp_sack) seems to fix the issue, but I'd rather not have to do that.
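For anyone checking whether a given Linux box currently has SACK turned on, a small sketch that reads the same sysctl file as the echo command above. The function names are illustrative; the only thing taken from this report is the /proc path.

```python
from pathlib import Path

# Same file the echo command above writes to (Linux only).
SACK_SYSCTL = Path("/proc/sys/net/ipv4/tcp_sack")


def parse_tcp_sack(text):
    """Interpret the sysctl's contents: "0" means SACK is disabled."""
    return text.strip() != "0"


def sack_enabled(path=SACK_SYSCTL):
    """Return True/False for the current setting, or None if it can't be read."""
    try:
        return parse_tcp_sack(Path(path).read_text())
    except OSError:
        # Not Linux, or /proc is not readable from here.
        return None


if __name__ == "__main__":
    print("tcp_sack enabled:", sack_enabled())
```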

Also, ssh'ing to mpt-vpn.mozilla.com first, and then ssh'ing from there to the desired machine, seems to avoid the issue.
Assignee: server-ops → dmoore
This bug is due to the lack of TCP SACK support in our Cisco firewall software.

When we first encountered it, the status would have been WONTFIX. Cisco had not released a workaround at that time. They have recently released a software upgrade, however, which disables TCP SACK negotiation during the TCP handshake. The end result is the same as disabling SACK locally.
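To illustrate what "disabling SACK negotiation during the handshake" amounts to on the wire: a middlebox can overwrite the SACK-Permitted option (kind 4, length 2, per RFC 2018) in SYN segments with NOPs, so neither end ever agrees to use SACK. The sketch below only demonstrates that effect on a raw options field. It is not Cisco's implementation, which we have no visibility into, and the function name is made up.

```python
# TCP option kinds (RFC 793 / RFC 2018).
EOL = 0             # end of option list
NOP = 1             # one-byte padding
SACK_PERMITTED = 4  # two-byte option, only valid in SYN segments


def strip_sack_permitted(options: bytes) -> bytes:
    """Return the options field with any SACK-Permitted option replaced by NOPs.

    Overwriting with NOPs keeps the field the same length, so the TCP
    header's data offset does not change (the checksum would still have
    to be recomputed by whatever rewrites the packet).
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == EOL:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed length, leave the rest untouched
        if kind == SACK_PERMITTED:
            out[i:i + length] = bytes([NOP]) * length
        i += length
    return bytes(out)
```

With the option gone from the SYN, the peer never sends SACK blocks, which is why the end result matches setting tcp_sack to 0 on the client.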

IT will have a discussion later today to determine a schedule for upgrading our firewalls.
Tentatively scheduled for the evening of 05/12.
Flags: needs-downtime+
Whiteboard: 05/12 @ 7pm
Group: infra
Running on the new software version now.
Secondary firewall upgrade scheduled for 05/14.
Whiteboard: 05/12 @ 7pm → 05/14 @ 7pm
Firewall upgrade with SACK workaround is complete.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I've been hitting this again lately.  I haven't tried tcpdump to determine if I'm getting RST packets, but disabling SACK locally does seem to fix the problem.
I'm still hitting this when SACK is enabled locally:

08:12:02.519529 IP production-master.build.mozilla.org.ssh > 10.2.21.86.53291: Flags [R.], seq 725:749, ack 825, win 57, options [nop,nop,TS val 4631086 ecr 2744937268], length 24
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Fill me in here - we did a firewall upgrade to fix this but you're saying it's not fixed?
Assignee: dmoore → mrz
Yes, I'm still getting random disconnects from various machines.  I just managed to capture this when trying to ssh to production-master:


15:05:48.768830 IP production-master.build.mozilla.org.ssh > 10.2.21.38.60501: Flags [R.], seq 1837:1901, ack 1793, win 79, options [nop,nop,TS val 55677417 ecr 548682849], length 64
Assignee: mrz → dmoore
We've confirmed that the fix for this problem was regressed by a subsequent firewall firmware upgrade. We're investigating our current options.
Flags: needs-downtime+
Whiteboard: 05/14 @ 7pm → [blocked cisco]
A subsequent upgrade has been made available to us from Cisco. We'll schedule an appropriate downtime window shortly.
Flags: needs-downtime+
Whiteboard: [blocked cisco] → 05/18/2010 @ 7pm
Whiteboard: 05/18/2010 @ 7pm
Whiteboard: [needs to be scheduled]
Whiteboard: [needs to be scheduled] → 07/20/2010
Whiteboard: 07/20/2010 → 07/22/2010
FWSM upgrade was completed tonight, and (once again) we've enabled the TCP SACK workaround.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard