Last Comment Bug 789018 - Websockets problems in FF 15 (speculative connect websockify)
: Websockets problems in FF 15 (speculative connect websockify)
Status: VERIFIED FIXED
:
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: 15 Branch
: x86_64 Windows 7
: -- normal (vote)
: mozilla18
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
: 790731 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-09-06 04:58 PDT by Niklas Rother
Modified: 2012-09-24 09:32 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
verified
verified
verified


Attachments
patch 0 (1.52 KB, patch)
2012-09-12 12:11 PDT, Patrick McManus [:mcmanus]
jduell.mcbugs: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Review

Description Niklas Rother 2012-09-06 04:58:52 PDT
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0
Build ID: 20120824154833

Steps to reproduce:

I'm using Websockify (https://github.com/kanaka/websockify) to connect to a VNC Server (TightVNC) via noVNC (https://github.com/kanaka/noVNC). This worked great with Firefox 14.0.1, Chrome 21 and even IE 10. Now I upgraded to Firefox 15 and the problems start.

Websockfy is a Websocket <> normal socket proxy used to tunnel the VNC connection over websockets.


Actual results:

Websockify hangs after a connect with Firefox. I'm quite sure is it nothing related to wesockify since it is working with Chrome 21 and IE 10 and worked with Firefox 14.

The only relevant log info from websockify is "2: 79.194.220.16: ignoring socket not ready"

Firefox says in the console: "Firefox can't connect to ws://..."


Expected results:

The connection should be established as in the previous version and all other brwosers.
Comment 1 Patrick McManus [:mcmanus] 2012-09-06 05:53:56 PDT
I bet websockify is choking on parallel tcp connects during the websockets handshake. That's fundamentally a problem with the server that ought to be fixed.

If I'm right its being triggered by the FF15 http speculative connect code. It doesn't have much meaning to websockets, but ws is bootstrapped with HTTP so it is being triggered.

Next week I'll put together a patch to remove these from websocket driven connections in FF. It would be great if you could push the websockify developers to this bug so we can work on the server side too (which, again assuming my speculation is correct, is vulnerable to being dos'd by a telnet session not just the FF behavior.)
Comment 2 Niklas Rother 2012-09-06 07:15:39 PDT
That was a fast reply! Thanks for you help!
I've send a mail to Joel (the developer of websockify) asking him for help. 
For the meantime: Is there maybe a switch in about:config to disable this speculative connect code? I've looked over it and only found network.prefetch-next but setting it to false seems not to help...
Comment 3 Joel Martin 2012-09-06 07:17:53 PDT
Websockify in it's native environment handles this just fine. However, it's a python server and currently relies on the multiprocessing module to handle multiple client connections in separate processes. This module is problematic on windows (depending on python version AFAICT) and will get disabled if websockify detects it's not fully supported (websockify prints a warning about this on startup). In that case websockify will only accept one connection at a time. 

The current select timeout in websockify is 3 seconds. The default noVNC timeout is 2 seconds. This means that if there is an initial non-functioning connection it takes 3 seconds for websockify to detect this condition and cleanup the connection. By this time noVNC will have timed out. I have simulated this condition with --run-once (which disables multiprocessing in websockify) and have replicated the situation using firefox 15. 

I think removing the speculative connect for WebSocket connections in firefox would be a good idea generally.

For the filer of the issue there are a couple of workarounds:

* Run websockify on a platform that is better supported by python generally (i.e. not Windows)

* Run one of the alternative versions of websockify. For example, the websockify repository has a Node.js implementation. This should work fine in Windows and allow multiple client connections (although I've never tested it there personally). You will have to install Node and some node modules required by websockify.js

* Increase the noVNC connect timeout value (every connection will take at least 3 seconds waiting for the first non-functioning connection to timeout).

* Decrease the select timeout in websockify (I may do this myself since 3 seconds is probably overly cautious). This will increase CPU usage especially if you make it really low or 0. You'll probably want to do this in combination with increasing noVNC timeout because your effective connection timeout to the websockify target will become the difference between the two.
Comment 4 Jason Duell [:jduell] (needinfo? me) 2012-09-06 08:46:13 PDT
> a patch to remove these from websocket driven connections in FF.

Yeah, it sounds like it's best to stop doing the speculative connects for websockets at this point, and get on aurora/beta so this this breakage is just a six-week window (I don't think it rises to level of a chemspill).
Comment 5 Niklas Rother 2012-09-07 00:48:21 PDT
You were completly right, I'm using websockify on Windows and get a message that multiprocessing is disabled.
I increased the timeout of noVNC to 10sec and now it works even in Firefox 15! 
Thanks for that idea!

There is still one thing that makes me wonder: When I don't increase the timeout websockify seems to hang in the second connection. After a (failed) connect with FF 15 I had to restart websockify to be able to connect with Chrome. Maybe there is still a (Windows) bug hidden in websockify? As Patrick said it might be possible to crash a Windows installation of websockify remotly by sending two parallel connections to websockify and closing them before the 3sec timeout is over.

Anyway, I agree that this speculative connect on WebSockets is a Firefox bug and should be fixed in the next version.
Comment 6 Joel Martin 2012-09-07 06:25:31 PDT
Without multiprocessing, websockify will only allow a single connection at a time. In that case websockify will not accept new connections until the existing socket is closed. Have you actually seen the Windows version of websockify crash (exit) or just refuse new connections? If it's crashing, that's a different issue.
Comment 7 Niklas Rother 2012-09-07 06:34:11 PDT
It's not crashing but refusing connections. According to you, this means the socket is still open. But even if I close the page websockify keeps hanging. Maybe Firefox isn't closing the sockets correctly?
Comment 8 Joel Martin 2012-09-07 09:11:06 PDT
Perhaps. You could verify by completely restarting firefox and seeing if that forces the connection to close and allows a new connection. If that doesn't address it then there is probably an addition issue in websockify where it isn't properly detecting closed sockets (perhaps related to differences with Windows socket management).
Comment 9 Niklas Rother 2012-09-10 01:14:16 PDT
Ok, closing Firefox completly "reanimates" websockify, but with a few errors about a forcible closed connection. I was really sure I tried closing Firefox before...
Comment 10 Patrick McManus [:mcmanus] 2012-09-12 12:11:11 PDT
Created attachment 660539 [details] [diff] [review]
patch 0

as I thought about this patch I began to wonder why this issue was so aggressively reported..

while it is certainly possible that the speculative connection could create an extra tcp socket the base case should start the connection early (before the cache) and then later on hook back up with the already made connection (or perhaps the connection still in progress) - so that only 1 would be made. The idea is just to start early (perhaps before cancellation due to cache hits), not to wildly make more connections than we think we're going to need. (that's a different topic :))

But the reports indicated that there were always (or at least most of the time) two of them. I began to worry something was really awry beyond the websocket scope.

But it turns out its just a WS issue. After the speculative connect is made the channel disables Keep-Alive (also out of concern for compatibility with websockets servers).. and because it can't do KA it can't (re-)use the speculative connect. That's why there were always 2 and they were indeed basically wasted unless other HTTP traffic was going to connect to the same host and port.

So this patch
 1] disables the SC for upgrade transactions (i.e. websocket bootstraps)
 2] disables the SC for any transaction where KA is disabled (though websocket actually does it later, so it still needs part 1)
Comment 11 Jason Duell [:jduell] (needinfo? me) 2012-09-12 12:28:15 PDT
Comment on attachment 660539 [details] [diff] [review]
patch 0

Review of attachment 660539 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +529,5 @@
> +    // if we are offline, when doing http upgrade (i.e. websockets bootstrap),
> +    // or if we can't do keep-alive (because then we couldn't reuse
> +    // the speculative connection anyhow).
> +    if (mApplicationCache || gIOService->IsOffline() || 
> +        mUpgradeProtocolCallback || !(mCaps & NS_HTTP_ALLOW_KEEPALIVE))

nice catch with KEEPALIVE.
Comment 12 Patrick McManus [:mcmanus] 2012-09-12 13:26:56 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/18610afc9f55
Comment 13 Patrick McManus [:mcmanus] 2012-09-12 13:39:03 PDT
Comment on attachment 660539 [details] [diff] [review]
patch 0

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 729133
User impact if declined: interop problems with some websocket servers
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): extremely low - this patch simply selectively disables an optimization. no patch is risk free, but this is pretty close.
String or UUID changes made by this patch: none.
Comment 14 Jason Duell [:jduell] (needinfo? me) 2012-09-12 13:39:30 PDT
*** Bug 790731 has been marked as a duplicate of this bug. ***
Comment 15 Ryan VanderMeulen [:RyanVM] 2012-09-12 18:16:33 PDT
https://hg.mozilla.org/mozilla-central/rev/18610afc9f55
Comment 17 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-18 15:47:43 PDT
@Niklas, can you please confirm this is fixed with the latest Firefox 16, 17, and 18 builds?

Firefox 18.0a1 can be downloaded from nightly.mozilla.org
Firefox 17.0a2 can be downloaded from aurora.mozilla.org
Firefox 16.0b4 should be out later this week on beta.mozilla.org

Thank you
Comment 18 Niklas Rother 2012-09-19 05:38:38 PDT
Sorry, I didn't knew you were waiting for me...
I checked it with Aurora and it worked as exspected. No warnings in Websockify and no timeout in noVNC. Everything fine!
Comment 19 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-19 09:48:47 PDT
Thanks Niklas, would you be able to check in Nightly and Beta?
Comment 20 Niklas Rother 2012-09-20 00:25:31 PDT
Ok, I checked with nighly and it seemed to work. I had a problem in the first case but wasn't able to reproduce it. Maybe some network problem.
Now it work's every time.
I wait for beta 16.0b4 and check it with that version when it is ready.
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-20 08:59:40 PDT
Thanks a lot for your help verifying this bug, Niklas. You can find 16.0b4 candidate builds at the following URL if you don't want to wait:

ftp://ftp.mozilla.org/pub/firefox/nightly/16.0b4-candidates/build1/
Comment 22 Niklas Rother 2012-09-24 00:25:53 PDT
It seems like the final version of 16.0b4 became ready in the meantime so I checked with it today -> everything is fine!
Comment 23 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-24 09:32:41 PDT
Thank you very much, Niklas.

Note You need to log in before you can comment on or make changes to this bug.