Closed
Bug 936979
Opened 12 years ago
Closed 12 years ago
websocket will never connected after a lot of failure
Categories
(Core :: Networking: WebSockets, defect)
Tracking
()
RESOLVED
FIXED
mozilla29
People
(Reporter: fatmck, Assigned: jduell.mcbugs)
Details
Attachments
(3 files)
1.31 KB, application/x-compressed-tar
1.58 KB, patch
1.56 KB, patch (mcmanus: review+)
First, sorry for my poor English.
My code uses setInterval to check the connection state of a WebSocket (the interval is 3 seconds).
In the setInterval callback, if I find that the WebSocket is not connected, I close it and create a new WebSocket.
To see this bug, shut down the server so that every connection attempt fails.
After many connection failures (about 8 on my machine; you can just wait for one minute), start the server. No connection happens.
I used tcpdump to print the TCP packets: after about 8 failures, Firefox no longer sends any TCP SYN. (A TCP SYN means a client is trying to connect to a server.)
Once you see this bug, refreshing the page does not help: connection attempts still fail. You must wait a long time before a connection succeeds.
Env: Ubuntu 12.04 64-bit + Firefox 25 (also buggy on Ubuntu 13.04 32-bit)
The attachment contains the following files:
1. test.html: the HTML file running the WebSocket client (it tries to connect to 127.0.0.1 port 1026)
2. tcpdump.txt: the tcpdump output, in which you will see 8 SYN packets, each followed by a RST packet. Lines marked [S] are TCP SYN packets sent by the client side, which is the Firefox WebSocket. Lines marked [R.] are TCP RST packets sent by the server machine, which means no server-side application is listening on port 1026.
To confirm this bug you don't even need a server. Just run tcpdump with the following command: sudo tcpdump -ilo tcp and port 1026
This prints any TCP packets on 127.0.0.1:1026.
Then open test.html in Firefox. You will see SYN and RST packets only in the first few seconds (on my machine, 8 SYN packets in 24 seconds), and then nothing! That means Firefox stops making connection attempts after many failures. Refresh the web page and still no connection attempt happens!
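For anyone checking a capture like tcpdump.txt, counting the [S] and [R.] lines by hand gets tedious. The following is only a rough sketch of one way to tally them. It assumes the usual tcpdump text layout with a "Flags [...]" field on each line; the two SAMPLE lines are made up for illustration and are not taken from the attachment.

```python
import re
from collections import Counter

# Matches the flags field of a tcpdump text line, e.g. "Flags [S]," or "Flags [R.],".
FLAGS_RE = re.compile(r"Flags \[([^\]]+)\]")


def count_flags(lines):
    """Count how often each TCP flag combination appears in tcpdump text output."""
    counts = Counter()
    for line in lines:
        match = FLAGS_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines (NOT from the attached tcpdump.txt).
SAMPLE = [
    "12:00:00.000000 IP localhost.40000 > localhost.1026: Flags [S], seq 1, win 43690, length 0",
    "12:00:00.000010 IP localhost.1026 > localhost.40000: Flags [R.], seq 0, ack 2, win 0, length 0",
]

if __name__ == "__main__":
    for flags, n in count_flags(SAMPLE).items():
        print(flags, n)
```

If the count of "S" lines stops growing while test.html is still open, the browser has stopped sending connection attempts.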
The same code runs perfectly on Google Chromium 30.
Component: General → Networking: WebSockets
Product: Firefox → Core
Comment 1•12 years ago
echo, this is an interesting case and indeed a bug.
Basically, Firefox has some logic to back off its connection rate after failed connects; RFC 6455 section 7.2.3 encourages that. After some time goes by, we reduce the backoff.
Your test essentially closes the websocket and starts a new one every 3 seconds. The bug comes into play when your test closes the socket from JavaScript during that backoff timeout: we interpret that as a further failure and back off even more. The process repeats every 3 seconds, and the result is that we never end up with a backoff value of less than 3 seconds, so your test always cancels the attempt. Deadlock.
The fix appears simple: when we fail to connect during the self-imposed backoff period (probably because JS closed the websocket), don't use that as input into extending the backoff period.
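The feedback loop described above can be illustrated with a toy model. This is a sketch only, not the Firefox implementation: the constants INITIAL_DELAY, GROWTH, MAX_DELAY and FORGET_AFTER are assumptions chosen for illustration, and only the 3-second poll interval comes from the report. The server is treated as refusing connections instantly until `server_up_at`.

```python
INITIAL_DELAY = 0.2   # seconds; assumed, not taken from the Firefox source
GROWTH = 1.5          # assumed backoff multiplier
MAX_DELAY = 60.0      # assumed cap on the backoff delay
FORGET_AFTER = 60.0   # assumed: backoff is dropped this long after the last counted failure
POLL_INTERVAL = 3.0   # from the report: the page closes and reopens every 3 seconds


def simulate(count_cancel_as_failure, server_up_at, run_for):
    """Return (syn_times, connected_at); connected_at is None if no connect happened."""
    delay = 0.0            # no backoff before the first failure
    last_failure = None
    syn_times = []
    t = 0.0                # time at which the page opens a new websocket
    while t < run_for:
        # "After some time goes by we reduce the backoff."
        if last_failure is not None and t - last_failure > FORGET_AFTER:
            delay, last_failure = 0.0, None
        if delay < POLL_INTERVAL:
            syn_at = t + delay             # backoff elapses, a SYN goes out
            syn_times.append(syn_at)
            if syn_at >= server_up_at:
                return syn_times, syn_at   # server is listening: connected
            failed_at, counted = syn_at, True   # refused: a real failure
        else:
            # The page calls close() before the backoff delay has elapsed.
            failed_at, counted = t + POLL_INTERVAL, count_cancel_as_failure
        if counted:
            last_failure = failed_at
            delay = min(MAX_DELAY, max(INITIAL_DELAY, delay * GROWTH))
        t += POLL_INTERVAL
    return syn_times, None


if __name__ == "__main__":
    for label, buggy in (("cancel counts as failure", True), ("cancel ignored", False)):
        syns, connected = simulate(buggy, server_up_at=60.0, run_for=600.0)
        print(label, "SYNs:", len(syns), "connected at:", connected)
```

In this model, once the delay grows past the poll interval, counting each close() as a failure keeps refreshing the backoff and no further SYN is sent, even after the server comes up. Ignoring those cancels lets the backoff age out, so a later attempt goes through.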
Comment 2•12 years ago
bug 936979 - websocket will never connected after a lot of failure r?jduell
Attachment #832318 -
Flags: review?(jduell.mcbugs)
Comment 3•12 years ago
Wow, I was so happy to see this patch when I got up this morning. Thank you very much.
So this patch will probably ship with Firefox 26? Currently I am using Chromium for development because of this bug.
Updated•12 years ago
status-firefox25: --- → affected
status-firefox26: --- → affected
status-firefox27: --- → affected
status-firefox28: --- → affected
Comment 5•12 years ago (Assignee)
I think this patch does a more complete fix.
The problem with filtering just on CONNECTING_DELAYED is that we can hit the same JS close() call when we're in CONNECTING_QUEUED (if a 1st websocket is trying to connect, and a second is launched with the same "close after 3 seconds" logic), or in CONNECTING_IN_PROGRESS if the timing is right (we're starting to connect but the timeout/close happens before we're done). It can even happen in NOT_CONNECTING (AsyncOpen does a DNS lookup: if the timer/close happens before DNS calls OnLookupComplete, we're still in NOT_CONNECTING state).
We can be fairly certain that rv == NS_ERROR_NOT_CONNECTED means JS has called close while mTransport == null (we don't call StopSession with that error code anywhere else), and that captures all of these cases:
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/websocket/WebSocketChannel.cpp#2802
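To make the difference between the two approaches concrete, here is a small model. It is not the actual patch: the state names are the ones mentioned above, StopReason.NOT_CONNECTED stands in for NS_ERROR_NOT_CONNECTED, and CONNECTION_REFUSED and both predicate functions are hypothetical names introduced only for illustration.

```python
from enum import Enum, auto


class ConnState(Enum):
    NOT_CONNECTING = auto()
    CONNECTING_QUEUED = auto()
    CONNECTING_DELAYED = auto()
    CONNECTING_IN_PROGRESS = auto()


class StopReason(Enum):
    NOT_CONNECTED = auto()       # stands in for NS_ERROR_NOT_CONNECTED (JS close, no transport yet)
    CONNECTION_REFUSED = auto()  # hypothetical stand-in for a genuine network failure


def counts_as_failure_by_state(state, reason):
    """First approach: only a stop during the delayed state is ignored."""
    return state is not ConnState.CONNECTING_DELAYED


def counts_as_failure_by_reason(state, reason):
    """Second approach: any stop caused by a JS close before connecting is ignored."""
    return reason is not StopReason.NOT_CONNECTED


if __name__ == "__main__":
    # A JS close() arriving in each pre-connection state.
    for state in ConnState:
        print(
            state.name,
            "by state:", counts_as_failure_by_state(state, StopReason.NOT_CONNECTED),
            "by reason:", counts_as_failure_by_reason(state, StopReason.NOT_CONNECTED),
        )
```

In this model the state-based check still feeds a JS close into the backoff in three of the four states, while the reason-based check ignores it in all four and still counts a genuine failure.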
Patrick, let me know if you agree.
Attachment #8358660 -
Flags: review?(mcmanus)
Comment 6•12 years ago
Comment on attachment 8358660 [details] [diff] [review]
936979.closedelay.v2
Review of attachment 8358660 [details] [diff] [review]:
-----------------------------------------------------------------
yes; better.
Attachment #8358660 -
Flags: review?(mcmanus) → review+
Updated•12 years ago
Attachment #832318 -
Flags: review?(jduell.mcbugs)
Comment 7•12 years ago (Assignee)
Comment 8•12 years ago
Assignee: nobody → jduell.mcbugs
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment 9•12 years ago
echo, can you please verify that this bug is fixed for you in Firefox 29?
Flags: needinfo?(fatmck)