websocket will never connected after a lot of failure

RESOLVED FIXED in mozilla29



5 years ago
6 months ago


(Reporter: fatmck, Assigned: jduell)


25 Branch

Firefox Tracking Flags

(firefox25 affected, firefox26 affected, firefox27 affected, firefox28 affected)



(3 attachments)



5 years ago
Created attachment 829973 [details]
test.html running the websocket client, and tcpdump output files

First, sorry for my poor English.

My code uses setInterval to check the connection state of a websocket every 3 seconds.
In the setInterval callback, if I find the websocket is not connected, I close it and create a new websocket.
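
A minimal sketch of the kind of watchdog described above. This is a hypothetical reconstruction, not the contents of test.html from attachment 829973; the helper name makeWatchdog and the ws://localhost:1026/ URL are assumptions made for illustration.

```javascript
// Hypothetical reconstruction; the real test.html may differ.
const OPEN = 1; // same value as WebSocket.OPEN

// createSocket is any function returning a WebSocket-like object
// (something with a readyState property and a close() method).
function makeWatchdog(createSocket) {
  let socket = createSocket();
  // Intended to be called on every timer tick.
  return function check() {
    if (socket.readyState !== OPEN) {
      socket.close();          // abandon the attempt that is still pending
      socket = createSocket(); // and start a fresh one right away
    }
    return socket;
  };
}

// In a page this might be wired up as (URL is an assumption):
//   const check = makeWatchdog(() => new WebSocket("ws://localhost:1026/"));
//   setInterval(check, 3000);
```

Note that close() is called on a socket that may not have started connecting yet, which is the situation the later comments analyze.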

To see this bug, shut down the server so that every connection attempt fails.
After a number of connection failures (more than about 8 on my machine; you can just wait for one minute), start the server again: no connection happens.
I used tcpdump to print the TCP packets: after about 8 failures, Firefox sends no more TCP SYN packets. (A TCP SYN means a client is trying to connect to a server.)
Once you see this bug, refreshing the page does not help; connection attempts still fail. You must wait a long time before a connection succeeds.

Env: Ubuntu 12.04 64-bit + Firefox 25 (also buggy on Ubuntu 13.04 32-bit)

The attachment contains the following files:
1. test.html : the HTML file running the websocket client (trying to connect to port 1026)
2. tcpdump.txt : the output of tcpdump, in which you will see 8 SYN packets, each followed by a RST packet. Lines marked [S] are TCP SYN packets sent by the client, which is the Firefox websocket. Lines marked [R.] are TCP RST packets sent by the server machine, which means no server-side application is listening on port 1026.

To confirm this bug you don't even need a server; just run tcpdump with the following command: sudo tcpdump -ilo tcp and port 1026
This prints any TCP packets on port 1026 of the loopback interface.
Then open test.html in Firefox. You will see SYN and RST packets only in the first few seconds (on my machine, 8 SYN packets in 24 seconds), and then nothing! That means Firefox makes no further connection attempts after a number of failures. Refresh the web page and still no connection attempt happens!

The same code runs perfectly on Google Chromium 30.


5 years ago
Component: General → Networking: WebSockets
Product: Firefox → Core
echo, this is an interesting case and indeed a bug.

Basically, Firefox has some logic to back off our connection rate when there are failed connects; RFC 6455 section 7.2.3 encourages that. After some time goes by we reduce the backoff.

Your test essentially closes the websocket and starts a new one every 3 seconds. The bug comes into play when your test closes the socket from JavaScript during that backoff timeout: we interpret that as a further failure and back off even more. The process repeats every 3 seconds, and the result is that we never end up with a backoff value of less than 3 seconds, so your test always cancels it. Deadlock.

The fix appears simple: when we fail to connect during the self-imposed backoff period (probably because JS closed the websocket), don't use that as input into extending/increasing the backoff period.
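
The feedback loop described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the 500 ms starting delay, the doubling rule, the 60 second cap, and the halving used for decay are all invented for the illustration and are not the values Firefox uses. The function name simulate is likewise hypothetical.

```javascript
// Toy model of the backoff loop; every number except the 3000 ms page
// interval is an assumption, not taken from Firefox.
function simulate(countCancelAsFailure, ticks) {
  const PAGE_TIMEOUT_MS = 3000; // the reporter's setInterval period
  const MAX_DELAY_MS = 60000;   // assumed cap on the delay
  let delayMs = 0;              // current self-imposed delay before connecting
  let attempts = 0;             // attempts that actually reach the network

  for (let i = 0; i < ticks; i++) {
    if (delayMs < PAGE_TIMEOUT_MS) {
      // The delay expires before the page gives up, so a real connect is
      // tried. The server is down, so it fails and the delay grows.
      attempts++;
      delayMs = Math.min(MAX_DELAY_MS, Math.max(500, delayMs * 2));
    } else if (countCancelAsFailure) {
      // The page closed the socket while it was still waiting, and the
      // close is treated as one more failure: the delay grows again.
      delayMs = Math.min(MAX_DELAY_MS, delayMs * 2);
    } else {
      // The close is ignored; with no new failure recorded, the delay
      // is allowed to decay (modeled crudely as halving).
      delayMs = Math.floor(delayMs / 2);
    }
  }
  return attempts;
}

// Compare the two policies over the same number of 3-second ticks.
const attemptsWhenCancelCounts = simulate(true, 20);
const attemptsWhenCancelIgnored = simulate(false, 20);
```

In this model the first policy stops producing attempts once the delay passes the page's 3 second interval, no matter how many ticks follow, while the second keeps producing attempts.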
Created attachment 832318 [details] [diff] [review]
commit e79c294afa7a24bfaea46c12d52142776
Author: Patrick McManus <mcmanus@ducksong.com>

    bug 936979 -  websocket will never connected after a lot of failure r?jduell
Attachment #832318 - Flags: review?(jduell.mcbugs)

Comment 4

5 years ago
Wow, I was so happy to see this patch when I got up this morning, thank you very much.

So this patch will probably go into Firefox 26? Currently I am using Chromium for development because of this bug.
status-firefox25: --- → affected
status-firefox26: --- → affected
status-firefox27: --- → affected
status-firefox28: --- → affected

Comment 5

5 years ago
Created attachment 8358660 [details] [diff] [review]

I think this patch does a more complete fix.

The problem with filtering just on CONNECTING_DELAYED is that we can hit the same JS close() call when we're in CONNECTING_QUEUED (if a 1st websocket is trying to connect, and a second is launched with the same "close after 3 seconds" logic), or in CONNECTING_IN_PROGRESS if the timing is right (we're starting to connect but the timeout/close happens before we're done).  It can even happen in NOT_CONNECTING (AsyncOpen does a DNS lookup: if the timer/close happens before DNS calls OnLookupComplete, we're still in NOT_CONNECTING state).

We can be fairly certain that rv == NS_ERROR_NOT_CONNECTED means JS has called close while mTransport == null (we don't call StopSession with that error code anywhere else), and that captures all of these cases:
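
The patch itself is in attachment 8358660 and is not reproduced here. As a rough illustration only, written in JavaScript rather than the C++ of the real code, the difference between filtering on connection state and filtering on the error code might be modeled like this. The state and error names come from the comment above; the function names and the string representation of the values are hypothetical.

```javascript
// Illustrative model only, not the Gecko code.
const NS_ERROR_NOT_CONNECTED = "NS_ERROR_NOT_CONNECTED";

// States in which a close() from JS can arrive before a transport exists.
const PRE_TRANSPORT_STATES = [
  "NOT_CONNECTING",
  "CONNECTING_QUEUED",
  "CONNECTING_DELAYED",
  "CONNECTING_IN_PROGRESS",
];

// Filter on state: only a close during the delay is kept out of the backoff.
function feedsBackoffByState(state) {
  return state !== "CONNECTING_DELAYED";
}

// Filter on reason: any stop reported as NS_ERROR_NOT_CONNECTED is kept
// out of the backoff, whatever state the channel was in.
function feedsBackoffByReason(rv) {
  return rv !== NS_ERROR_NOT_CONNECTED;
}

// States where a JS close() would still be counted as a failed connect.
const missedByStateFilter = PRE_TRANSPORT_STATES.filter(feedsBackoffByState);
const missedByReasonFilter = PRE_TRANSPORT_STATES.filter(
  () => feedsBackoffByReason(NS_ERROR_NOT_CONNECTED)
);
```

In this model the state filter leaves three states in which a JS close would still lengthen the backoff, and the reason filter leaves none.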


Patrick, let me know if you agree.
Attachment #8358660 - Flags: review?(mcmanus)
Comment on attachment 8358660 [details] [diff] [review]

Review of attachment 8358660 [details] [diff] [review]:

yes; better.
Attachment #8358660 - Flags: review?(mcmanus) → review+
Attachment #832318 - Flags: review?(jduell.mcbugs)
Assignee: nobody → jduell.mcbugs
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29

Comment 9

5 years ago
echo, can you please verify that this bug is fixed for you in Firefox 29?
Flags: needinfo?(fatmck)

Comment 10

6 months ago
Just replying to clear the needinfo request.
Flags: needinfo?(fatmck)