Closed Bug 888273 Opened 11 years ago Closed 11 years ago

RWIN Scaling (RFC1323) limited to 2 on Windows 7 and 8

Categories: NSPR :: NSPR, defect, P1
Hardware: All
OS: Windows 7
Type: defect

Tracking: Not tracked
Status: RESOLVED FIXED
Target Milestone: 4.10.1

People: Reporter: daniel, Assigned: mcmanus

Attachments: 1 file, 1 obsolete file

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130618035212

Steps to reproduce:

Clean install of Windows 8, clean install of Firefox 22. Then monitor the window scale option in the TCP handshake using Wireshark, or use http://www.speedguide.net/analyzer.php to see the TCP window scale received by the remote server.
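The window scale that Wireshark reports comes from the Window Scale option in the SYN segment (RFC 1323, since updated by RFC 7323): option kind 3, length 3, followed by a one-byte shift count. As a minimal sketch, assuming the raw TCP option bytes of a captured SYN are already at hand (for example copied out of Wireshark), the hypothetical helper `find_window_scale` below walks the option list and returns that shift count. It is an illustration only, not code from Firefox or NSPR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TCP option kinds (RFC 793 and RFC 7323). */
enum { TCPOPT_EOL = 0, TCPOPT_NOP = 1, TCPOPT_WSCALE = 3 };

/*
 * Hypothetical helper: scan the options area of a TCP SYN (the bytes after
 * the fixed 20-byte header) and return the window scale shift count, or -1
 * if the option is absent or the option list is malformed.
 */
int find_window_scale(const uint8_t *opts, size_t len)
{
    assert(opts != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        uint8_t kind = opts[i];
        if (kind == TCPOPT_EOL)
            break;                      /* end of option list */
        if (kind == TCPOPT_NOP) {
            i++;                        /* one-byte padding, no length field */
            continue;
        }
        if (i + 1 >= len)
            return -1;                  /* no room for the length byte */
        uint8_t optlen = opts[i + 1];
        if (optlen < 2 || i + optlen > len)
            return -1;                  /* malformed or truncated option */
        if (kind == TCPOPT_WSCALE && optlen == 3) {
            int shift = opts[i + 2];
            /* RFC 7323: a shift above 14 must be treated as 14. */
            return shift > 14 ? 14 : shift;
        }
        i += optlen;                    /* skip MSS, SACK-permitted, etc. */
    }
    return -1;
}
```

For a SYN whose options are MSS, NOP, window scale 8, NOP, NOP, SACK-permitted, the helper returns 8; the affected Firefox builds would show 2 in that position.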


Actual results:

With Windows autotuninglevel=normal, Firefox 22 (and earlier versions) incorrectly sends a window scale (RFC1323) of 2 during the TCP handshake.

Internet Explorer 10 and Chrome 27, under the same conditions, correctly send a window scale of 8 during the TCP handshake.

When running on Windows XP with appropriate registry adjustments, Firefox 22 correctly sends a window scale of 8 during the TCP handshake.

Firefox 22 running on Fedora 18 sends a window scale of 7 by default during the TCP handshake (presumably this could be increased with appropriate OS configuration).

No adjustment of any Windows setting, Windows TCP/IP or HTTP registry entry, or Firefox about:config parameter increases the window scale used by Firefox 22 (or earlier versions) on Windows 7 or 8.

This bug severely limits performance on internet connections with high bandwidth-delay products (BDP). In my case, it limits download speeds in Firefox to 100-300 kilobytes/s, while IE 10 and Chrome get 10 megabytes/s.


Expected results:

Firefox should use a window scale of 8 when Windows autotuninglevel=normal, on both Windows 7 and Windows 8.
Component: Untriaged → Networking
Product: Firefox → Core
I agree with Daniel; this is kind of bad. 128KB of rwin (max at window scale of 2) @ 100ms is only able to support 10mbit/sec. At 300ms, only a third of that.

It took a little doing for me to repro on Win 7, so I don't want to venture a guess at what fraction of the user base this is impacting. Windows autotuning is a bit of a black box and I'm not an MS-stack expert.

To see it repro I had to run the speedguide TCP tweak program and use it to "reset windows defaults". I was already running autotuning, so I'm not really sure what I reset, but I definitely got the bug to repro when I did that.

I also needed to use a wired connection instead of wireless. I'm not saying that's universally required, but it made a difference for me.

At this point I have no idea why this is happening. We don't manually manage the receive window (we let autotuning do it), though there are unused hooks for that. I put crash statements around those hooks in a local build and was able to reproduce the problem without a crash. We do tweak the send buffer a little (because it is traditionally too small and is not autotuned), and I thought that might be causing this as a side effect somehow, but removing that code doesn't change the scaling behavior.

I also have to say that I don't think this has always been true, but whether the change is in Firefox or in a Windows system update I have no idea.

gotta keep looking - thanks for the report.
Assignee: nobody → mcmanus
Patrick, did you verify that Chrome or IE10 is using a window scale of 8 (2^8) on your setup? If they are not, your changes to Firefox's send buffer code might be getting blocked elsewhere in the TCP stack.

Out of the box, on a clean install of Windows 8, Chrome and IE10 use a window scale of 8 and Firefox uses a window scale of 2. But on Windows 7 with other software installed, debugging this becomes a little tricky (as you discovered, Windows likes to change things dynamically). The first thing to do is to make sure that Chrome or IE10 is using a window scale of 8 and that Firefox is using a window scale of 2; this tells you that nothing else is interfering with the stack. The following is what I've found necessary to simplify debugging of this problem on non-clean installs of Windows 7.

Firstly, disable any other software that intercepts HTTP connections, e.g. antivirus or firewall software, as I've found that some of these contain bugs that affect the window scale.

Secondly, make sure your receive window auto-tuning level is set to normal. It sounds like you've done this using the TCP tweak program. To do it manually, execute "netsh interface tcp set global autotuninglevel=normal" at an elevated command prompt. You can check your current settings using "netsh interface tcp show global".

Thirdly, disable window scaling heuristics (so Windows always tries to use window scaling, which makes things more predictable for debugging). To do this you need to change or add the following registry key:
Value path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters
Value type: DWORD
Value name: EnableWsd
Value data: 0

Fourthly, make sure auto-tuning is enabled for HTTP connections. Do this by changing or adding the following four registry keys (you only need the Wow6432Node ones if you're using a 64-bit OS):
Value path:
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Internet Settings
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Internet Settings\WinHttp
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Internet Settings
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Internet Settings\WinHttp
Value type: DWORD
Value name: TcpAutotuning
Value data: 1

Finally, once you've done all the above, you may need to restart your machine for the changes to take effect. After that, you should find Firefox using a window scale of 2 and Chrome / IE10 using a window scale of 8.
(In reply to Patrick McManus [:mcmanus] from comment #1)
> I agree with Daniel; this is kind of bad. 128KB of rwin (max at window scale
> of 2) @ 100ms is only able to support 10mbit/sec. At 300ms, only a third of
> that.
> 

I need to correct this: a window scale of 2 is a factor of 4, which means a max rwin of 256KB, which is 20mbit/sec at 100ms. That's still a bottleneck we need to deal with, but it bugged me that I got it wrong the first time.

Anyhow, I haven't figured out the trigger here yet. I tried totally stubbing out the NSPR setsockopt wrapper so that no calls could go through that API, and yet we still create these connections with a window scale of 2 where Chrome and IE use 8.

still looking..
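The corrected arithmetic above can be sketched as follows. This is a rough upper bound under simplifying assumptions (a single connection, no loss, protocol overhead ignored): the largest window a receiver can advertise is 65535 << shift bytes, and at most one full window can be delivered per round trip. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest advertisable receive window for a given
 * window scale shift. The 16-bit window field tops out at 65535, and
 * RFC 7323 caps the shift at 14. */
uint32_t max_rwin_bytes(unsigned shift)
{
    assert(shift <= 14);
    return (uint32_t)65535u << shift;
}

/* Hypothetical helper: throughput ceiling in bits per second when at most
 * rwin_bytes can be in flight per round trip of rtt_ms milliseconds. */
double max_throughput_bps(uint32_t rwin_bytes, double rtt_ms)
{
    assert(rtt_ms > 0.0);
    return (double)rwin_bytes * 8.0 / (rtt_ms / 1000.0);
}
```

For a shift of 2 this gives 262,140 bytes (about 256KB) and roughly 21 Mbit/s at 100ms, or about 7 Mbit/s at 300ms. A shift of 8 allows 16,776,960 bytes, which is far beyond the connection speeds discussed in this bug.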
The cause is probably a workaround for some broken servers (bug 363997).
But I doubt this workaround is still required, because both Chrome and IE have operated without it for a long time.
It is this use of SIO_SET_COMPATIBILITY_MODE tucked into NSPR:

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95sock.c#151

http://msdn.microsoft.com/en-us/library/windows/desktop/ms741621%28v=vs.85%29.aspx

It doesn't look like there is a way to undo that call, so we'll need to patch NSPR to take it out or make it optional.

I presume it is there to improve compatibility with middleboxes that don't understand window scaling. I don't think that's much of an issue any more (Chrome and IE aren't doing it, and the Windows HTTP stack doesn't do it).
Component: Networking → NSPR
Product: Core → NSPR
Version: 22 Branch → other
Blocks: 363997
ted, do you have bandwidth to be able to steal this review?
Flags: needinfo?(ted)
I was looking at an LTE connection today that speedtest.net measured end to end at 60ms and 60mbit/sec (on a Linux platform). That's not a high-delay environment.

Carrying that link would take 440KB of TCP window, so our Windows client wouldn't be able to fill the pipe.

Let's get that fixed.
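The 440KB figure is the bandwidth-delay product of that LTE link. A minimal sketch of the calculation, with hypothetical helper names, also finds the smallest window scale shift whose maximum window (65535 << shift) covers it. It assumes the receiver would actually grow its window that far and ignores protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bandwidth-delay product, i.e. the bytes that must
 * be in flight to keep a path of rate_bps bits/s and rtt_ms milliseconds full. */
double bdp_bytes(double rate_bps, double rtt_ms)
{
    assert(rate_bps > 0.0 && rtt_ms > 0.0);
    return rate_bps / 8.0 * (rtt_ms / 1000.0);
}

/* Hypothetical helper: smallest window scale shift whose maximum window
 * covers `needed` bytes, or -1 if even the RFC 7323 maximum of 14 is too small. */
int min_window_scale(double needed)
{
    for (int shift = 0; shift <= 14; shift++) {
        if ((double)((uint32_t)65535u << shift) >= needed)
            return shift;
    }
    return -1;
}
```

For 60 Mbit/s at 60ms the product is 450,000 bytes (about 440 KiB). A shift of 2 tops out at 262,140 bytes, while a shift of 3 (524,280 bytes) would already cover it, so the scale of 2 is what limits the Windows client on this link.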
Comment on attachment 770151 [details] [diff] [review]
re-enable rfc 1323 window scaling factor of 8 for windows

Review of attachment 770151 [details] [diff] [review]:
-----------------------------------------------------------------

This seems right, but I don't know if we're more worried about backwards-compat in NSPR than Firefox. Was the intent of this code to work around broken networking hardware somewhere along the way?
Flags: needinfo?(ted)
Thanks Ted!

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9)
> Comment on attachment 770151 [details] [diff] [review]
> re-enable rfc 1323 window scaling factor of 8 for windows
> 
> Review of attachment 770151 [details] [diff] [review]
> -----------------------------------------------------------------
> 
> This seems right, but I don't know if we're more worried about
> backwards-compat in NSPR than Firefox. 

Well, the code was put in originally for Firefox (bug 363997), though I know that doesn't answer the whole question.

> Was the intent of this code to work
> around broken networking hardware somewhere along the way?

Right. During the launch of Vista there were interop problems. But all TCP applications on Vista and later can use larger scales by default (and Chrome and IE use the default), so this trail has been well blazed and the workaround has now become a bottleneck.

AFAICT this isn't something I can patch up in Gecko while still using NSPR sockets.
Comment on attachment 770151 [details] [diff] [review]
re-enable rfc 1323 window scaling factor of 8 for windows

Review of attachment 770151 [details] [diff] [review]:
-----------------------------------------------------------------

switch reviewers due to timeout.
Attachment #770151 - Flags: review?(wtc) → review?(ted)
Patrick: thank you very much for the patch. I just got
back from a vacation. Sorry about the delay in reviewing
your patch.

I fixed the indentation of the "if" in _PR_MD_InitSockets()
and added a line to reset socketFixInet6RcvBuf to PR_FALSE
in _PR_MD_CleanupSockets().

Patch checked in on the NSPR hg trunk:
https://hg.mozilla.org/projects/nspr/rev/a8691537c601
Attachment #770151 - Attachment is obsolete: true
Attachment #770151 - Flags: review?(ted)
Attachment #780094 - Flags: checked-in+
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P1
Hardware: x86_64 → All
Target Milestone: --- → 4.10.1
wtc, thanks! Can we get NSPR released to m-c before the next Aurora uplift (8/5, IIRC)?
If ted isn't back by then to do this, push to m-c and worry about upstream NSPR later.
Comment on attachment 780094 [details] [diff] [review]
re-enable rfc 1323 window scaling factor of 8 for windows, v2, by Patrick McManus

Pushed to mozilla-inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ecaf40de72f
wtc, safe to assume that you'll upstream this during the next nspr release?
https://hg.mozilla.org/mozilla-central/rev/7ecaf40de72f
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Doug Turner (:dougt) from comment #16)
> wtc, safe to assume that you'll upstream this during the next nspr release?

Yes. In general I check in NSPR changes in the NSPR upstream first
(I did that in this case, see the end of comment 12) and then push
official NSPR tags to mozilla-central.