Last Comment Bug 748766 - Only count connected websockets toward max-websocket limit
: Only count connected websockets toward max-websocket limit
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: 13 Branch
: x86 All
: -- major (vote)
: mozilla15
Assigned To: Jason Duell [:jduell] (needinfo me)
:
: Patrick McManus [:mcmanus]
Mentors:
http://blog.dennis-boldt.de/?p=177
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-04-25 07:23 PDT by Dennis Boldt
Modified: 2012-06-04 08:59 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
slightly modified version of HTML test page (1.34 KB, patch)
2012-05-29 17:20 PDT, Jason Duell [:jduell] (needinfo me)
akeybl: approval‑mozilla‑aurora-
Details | Diff | Splinter Review
fix/workaround, v1 (5.77 KB, patch)
2012-05-29 21:58 PDT, Jason Duell [:jduell] (needinfo me)
mcmanus: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
diagnostic patch to track session count and # of websocket channels (3.72 KB, patch)
2012-05-29 22:04 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details | Diff | Splinter Review
Log with fix and no web console (135.82 KB, text/plain)
2012-05-29 22:06 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details
Log with fix and web console on (116.46 KB, text/plain)
2012-05-29 23:51 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details

Description Dennis Boldt 2012-04-25 07:23:34 PDT
All details: http://blog.dennis-boldt.de/?p=177
Comment 1 Dennis Boldt 2012-04-26 07:07:24 PDT
I verified that issue with the Linux version under Ubuntu 10.04. and the Windows version under Windows XP.
Comment 2 Dennis Boldt 2012-05-23 08:28:00 PDT
Bug still exists in Firefox 13.0b4
Comment 3 Jason Duell [:jduell] (needinfo me) 2012-05-23 13:03:38 PDT
It's not a bug, it's a feature.... Or at least a practical limit at some point.  We can't support infinite #s of file descriptors/sockets.

I'm trying to get some consensus happening on the hybi list about what to do when a browser hits its WS count limits.
Comment 4 Dennis Boldt 2012-05-23 16:26:50 PDT
Hey Jason,

of course I cannot open (and do not close) unlimited WebSocket connections. Have you read my blog article? The problem is much more problematic. 

In my experiement, I wrote a simple test application. It opens a WebSocket connection and close it immediately as soon it was opened. Such that, there will be never more than one WebSocket connection opened at a time.

The result:

- Chrome 20 opens and closes more than 2000 without any problem
- Chromium 18 opens and closes more than 2800 connections
- Firefox 11 opens and closes more than 1300 WebSocket connections
- Firefox 12 opens and closes 196 WebSocket connections. The 197th WebSocket connection closed with the error 1006.
- Firefox 13.0b4/nightly has the same problem, just after 198 connections

My idea:  Since version 7, Firefox has the preference network.websocket.max-connections to limit the number of connections that can be open at a time, which default value is 200 (see MDN: https://developer.mozilla.org/en/WebSockets#Gecko_7.0 ). I changed that value to 400. Afterwards, the test application opened and closed 396 (400-4) in Firefox 12 until the error 1006 appeared.

Conclusion: Firefox 12 can open and close network.websocket.max-connections - 4 connections. The value of opened WebSocket connection seems _not_ to be decremented in Firefox 12, as soon the WebSocket connection is closed.

It's not a feature, it's a bug ;)
Comment 5 Jason Duell [:jduell] (needinfo me) 2012-05-24 11:08:56 PDT
> It's not a feature, it's a bug ;)

Agreed!  Thanks for the followup clarification.  We definitely need to fix this.
Comment 6 Jason Duell [:jduell] (needinfo me) 2012-05-29 17:20:19 PDT
Created attachment 628154 [details] [diff] [review]
slightly modified version of HTML test page

Dennis:

This is a slightly modified version of your test, which does not require the web console to see how many websockets have been created.  Could you try running it *without* enabling the web console, and tell me if you still see errors?

I don't see any errors if I run this test with no web console.  If I try it again, I hit the limit as you've described.  So at least part of the problem here is that the web console is keeping WebSocketChannels alive for a very long time.

You mentioned that your MS thesis application was running into the WS connection limit, too.   Do you see this only when the web console is on?

Thanks a lot for the reduced test case, by the way.
Comment 7 Jason Duell [:jduell] (needinfo me) 2012-05-29 21:58:39 PDT
Created attachment 628218 [details] [diff] [review]
fix/workaround, v1

So I suspect the web console is the major culprit here, but this is both a quick workaround, and a code change that we probably ought to make anyway.

So we're not failing after a total mMaxConcurrentConnections websockets have been opened during a browser session (good! That means this bug is actually much less serious than I thought).  Rather, we are failing if there are mMaxConcurrentConnections WebSocketChannels still alive at any one point.  That really shouldn't be happening in this simple app (which just opens a single websocket, then opens another during onclose, so there's logically really only ever 1 or 2 websockets that should be logically open).  But the web console or something else is keeping them alive.

We make the problem worse, however, by counting every existing WebSocketChannel as a "session", even if it failed to open, or has closed its TCP connection. That seems to cause the test to go into a spin where, even as actually-connected WebSocketChannels are getting garbage collected, the failed, never-opened ones keep the session count too high for new ones to connect. This patch fixes that by incrementing the session count only during the actual TCP session (roughly speaking).  With it, we no longer hit the connection limit, with or without the web console on.
Comment 8 Jason Duell [:jduell] (needinfo me) 2012-05-29 22:04:59 PDT
Created attachment 628221 [details] [diff] [review]
diagnostic patch to track session count and # of websocket channels

Here's the patch I'm using to generate some stats on how many websockets we've got in existence at once, and how high the session count gets.
Comment 9 Jason Duell [:jduell] (needinfo me) 2012-05-29 22:06:29 PDT
Created attachment 628222 [details]
Log with fix and no web console

With the patch and no web console we do great:  session count is always 1 or 0, and we never have more than 3 websockets in existence (including the protocol handler).
Comment 10 Jason Duell [:jduell] (needinfo me) 2012-05-29 23:51:48 PDT
Created attachment 628251 [details]
Log with fix and web console on

With the web console on we still do well with session count (0 or 1), but the number of live WebSocketChannels gets high--we don't delete the first one until we get to 452 live WebSocketChannels!  (after that we keep hovering at about 220-250 WebSocketChannels: GC seems more frequent after the 1st collection).

The stack trace when we finally delete looks like this:

#0  mozilla::net::WebSocketChannel::~WebSocketChannel 
    at netwerk/protocol/websocket/WebSocketChannel.cpp:728
#1  0x00007f478187de0c in mozilla::net::WebSocketChannel::~WebSocketChannel 
    at netwerk/protocol/websocket/WebSocketChannel.cpp:774
#2  0x00007f478187ad0e in mozilla::net::WebSocketChannel::Release 
    at netwerk/protocol/websocket/WebSocketChannel.cpp:48
#3  0x00007f47816c1251 in nsCOMPtr<nsIRDFDelegateFactory>::~nsCOMPtr 
    at ../../dist/include/nsCOMPtr.h:487
#4  0x00007f4781828431 in mozilla::net::HttpBaseChannel::~HttpBaseChannel 
    at netwerk/protocol/http/HttpBaseChannel.cpp:65
#5  0x00007f478183085e in nsHttpChannel::~nsHttpChannel 
    at netwerk/protocol/http/nsHttpChannel.cpp:110
#6  0x00007f4781830898 in nsHttpChannel::~nsHttpChannel 
    at netwerk/protocol/http/nsHttpChannel.cpp:116
#7  0x00007f4782e4c87e in nsHashPropertyBag::Release 
#8  0x00007f4781828d42 in mozilla::net::HttpBaseChannel::Release 
    at netwerk/protocol/http/HttpBaseChannel.cpp:144
#9  0x00007f478183f362 in nsHttpChannel::Release 
#10 0x00007f47826a46cf in DoDeferredRelease<nsISupports*> 
#11 0x00007f478269a828 in XPCJSRuntime::GCCallback 
    at js/xpconnect/src/XPCJSRuntime.cpp:636
#12 0x00007f4783663f23 in Collect 
    at js/src/jsgc.cpp:3807
#13 0x00007f47836640b4 in js::GCSlice 
    at js/src/jsgc.cpp:3832
#14 0x00007f478364c9d0 in js::IncrementalGC 
#15 0x00007f4782145f31 in nsJSContext::GarbageCollectNow 
    at dom/base/nsJSEnvironment.cpp:2968
#16 0x00007f478214684b in GCTimerFired 

So we're somehow being kept alive by the HTTP channel that we upgraded from (presumably), which is being kept alive by... something to do with the web console.   I've checked and mListener in nsHttpChannel is set to null, so it's not that.  Will investigate some more.
Comment 11 Jason Duell [:jduell] (needinfo me) 2012-05-30 10:29:32 PDT
Rob/Mihai,

dcamp pointed me at you.  Do you think we've got work to do to make the webconsole not hang onto nsHttpChannels?  See the logs and comments in comment 9 vs comment 10.   Or is this fine?  (It's all getting garbage collected eventually).
Comment 12 Jason Duell [:jduell] (needinfo me) 2012-05-30 10:55:04 PDT
I figured out the first issue was nsHttpChannel not releasing its Upgrade callback (a WebsocketChannel).  Once that was fixed, I still got the same very late deletion, this time by nsStreamListenerTee, then nsStreamListenerWrapper.  After these are fixed (patch is in bug 759819), I now get earlier collection:  121 websockets are created before the 1st one gets destroyed, with this stack trace:

#0  mozilla::net::WebSocketChannel::~WebSocketChannel
    at /home/jj/hg/in/t/netwerk/protocol/websocket/WebSocketChannel.cpp:781
#1  in mozilla::net::WebSocketChannel::Release (this=0x7f7df2db3000)
    at /home/jj/hg/in/t/netwerk/protocol/websocket/WebSocketChannel.cpp:48
#2  in DoDeferredRelease<nsISupports*> (array=...) 
    at /home/jj/hg/in/t/js/xpconnect/src/XPCJSRuntime.cpp:598
#3  in XPCJSRuntime::GCCallback (rt=0x7f7dfd623000, status=JSGC_END)
    at /home/jj/hg/in/t/js/xpconnect/src/XPCJSRuntime.cpp:636
#4  Collect (rt=0x7f7dfd623000, incremental=true, budget=10000, gckind=js::GC_SHRINK, 
    reason=js::gcreason::MAYBEGC) at /home/jj/hg/in/t/js/src/jsgc.cpp:3807

So whatever causes an object to get slurped into DoDeferredRelease is happening to the WebSocketChannel.  Perhaps that's normal?
Comment 13 Patrick McManus [:mcmanus] 2012-05-30 12:32:26 PDT
Comment on attachment 628218 [details] [diff] [review]
fix/workaround, v1

ok.. the deferred increment probably doesn't matter much but I can see how the earlier decrement is important.
Comment 14 Jason Duell [:jduell] (needinfo me) 2012-05-31 16:40:35 PDT
> the deferred increment probably doesn't matter much but I can see how the 
> earlier decrement is important.

Agree that the earlier decrement is the key, but I did seem to see an issue where failed websockets that hadn't been GC'd yet was also clogging the connection limit, so I do think both are useful.
Comment 15 Jason Duell [:jduell] (needinfo me) 2012-05-31 16:47:32 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/79b1dbd0b0f8

OK, when this lands I'll consider this bug closed--there've been some improvements in the m-c web console and/or GC in the past few days--I'm now only seeing 60 websockets created before the first GC cleanup, and the average number that exist is much lower--20 or less.   I'm still not clear how the web console code gets its hands on WebSocketChannels, which it has no direct acccess to, and/or otherwise somehow causes them to get cleaned up via DoDeferredRelease() instead of the usual immediate cleanup, but it doesn't seem to be a problem (http channels wind up getting deferred cleanup the same way, but that makes more sense, since the web console JS directly refers to them).  Not something I'm going to worry about now that we've fixed the session count issue, unless someone tells me otherwise.
Comment 16 Jason Duell [:jduell] (needinfo me) 2012-05-31 16:51:48 PDT
Comment on attachment 628218 [details] [diff] [review]
fix/workaround, v1

Dennis:  please let me know if you've seen this issue at any point when you *haven't* been using the web console--that would change our risk/benefit analysis here a lot.

[Approval Request Comment]
Bug caused by (feature/regressing bug #):  long-standing
User impact if declined:  users using the web console may run into an artificially low websocket count (and have their connections fail) if they create and close websockets rapidly.  Not a common scenario, but for developers especially, not wildly unlikely.  
Testing completed (on m-c, etc.):  Tested by hand.
Risk to taking this patch (and alternatives if risky): very low risk.  Just moves some counter increment/decrement to different code location.
String or UUID changes made by this patch: none.
Comment 17 Dennis Boldt 2012-05-31 16:58:22 PDT
First of all: nice work.

Sorry for not answering yet. I am going to deliver my thesis in some hours. Therefore I will check this issue in detail on the weekend. Is your bugfix already in the nighly build, so I can compare?

Btw: Why the web console didn't had any effect in earlier versions, e.g. FF11?
Comment 18 Ed Morley [:emorley] 2012-06-01 08:09:50 PDT
https://hg.mozilla.org/mozilla-central/rev/79b1dbd0b0f8
Comment 19 Jason Duell [:jduell] (needinfo me) 2012-06-01 12:11:42 PDT
Dennis.

Good luck with your thesis delivery :)   And thanks for taking time to report this while you were busy with that.

The fix is in the tree, so it should be in the nightly build now.

> Btw: Why the web console didn't had any effect in earlier versions, e.g. FF11?

Good question for which I don't have a clear answer.  It could have been something the web console did, or perhaps a change in garbage collection completely unrelated to the web console.
Comment 20 Alex Keybl [:akeybl] 2012-06-01 14:51:07 PDT
Comment on attachment 628154 [details] [diff] [review]
slightly modified version of HTML test page

[Triage Comment]
Only affects logged count, no chance for major user facing regressions. Approved for Aurora 14.
Comment 21 Jason Duell [:jduell] (needinfo me) 2012-06-02 22:35:15 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/173d2921c023

Just noticed after I checked in that you +a'd patch 1 instead of 2.  Patch 1 isn't something that's meant to be in the tree, so I assume you meant patch 2, which is the bug fix we landed on m-c.
Comment 22 Alex Keybl [:akeybl] 2012-06-04 08:59:37 PDT
Fixed up the flags, thanks.

Note You need to log in before you can comment on or make changes to this bug.