Websocket opening takes ages after repeated failures to same address
Categories
(Core :: Networking: WebSockets, defect, P2)
Tracking
 | Tracking | Status |
---|---|---|
firefox122 | --- | fixed |
People
(Reporter: waclaw66, Assigned: acreskey)
References
Details
(Whiteboard: [necko-triaged][necko-priority-queue], [wptsync upstream])
Attachments
(8 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0
Steps to reproduce:
I'm having an issue with WebSocket opening on a certain domain: it takes a long time, about 40 s.
I've created a demo client and server project (attached) to debug it.
The JavaScript client is available at:
https://pong.bolesiny.net/
https://pong.waclaw.cz/
Check the console; there is no visible HTML output.
Actual results:
When Firefox is first opened, the WebSocket opens immediately. After a few minutes of browsing with about 30 tabs open, it takes a long time to open that WebSocket in the client demo. Once the WebSocket is open, reopening it is fast again. After a few minutes of inactivity, opening it takes ages again. The weird thing is that this happens only on one particular domain, bolesiny.net; I've tried other domains (e.g. waclaw.cz) reverse-proxied to the same server and they work without any problem. All of those domains use Let's Encrypt certificates generated the same way.
You can check the profile https://share.firefox.dev/3HM8Ps0 of that slow WebSocket opening; it may help trace the problem.
I've tried disabling the antivirus, which didn't help. It works fine with a clean Firefox profile, and in Chrome as well. It depends on a specific Firefox state, probably a large number of tabs or WebSockets on a particular domain.
Expected results:
A WebSocket should open quickly on any domain.
Comment 1•1 year ago (Reporter)
Comment 2•1 year ago
The Bugbug bot thinks this bug should belong to the 'Core::Security: PSM' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3•1 year ago (Reporter)
I think I've found the cause and know how to reproduce it. It's caused by many unsuccessful WebSocket opening attempts.
One of those (pinned) tabs is Grafana, which uses a WebSocket for live updates. I had unintentionally left WebSocket upgrade disabled in the nginx reverse proxy for its subdomain. Grafana tries to connect to the WebSocket every 15 s; after a few minutes, connecting to WebSockets on that domain and all its other subdomains takes ages (20-100 s).
Once I enabled WebSocket upgrade for that Grafana subdomain, WebSockets for the whole domain started working flawlessly.
This seems to be an issue in Firefox. Does Firefox somehow recycle WebSockets for a domain, or is there a WebSocket pool limit?
Comment 4•1 year ago (Reporter)
I have prepared a simple demo that demonstrates the problem:
https://pong.bolesiny.net/proof/
I tried Firefox 110b9 Developer Edition and Nightly 111a1; the second connection always took >30 s in my case. I tested on different machines, always with the same problem.
Chrome has no such problem.
Comment 5•1 year ago (Reporter)
I've tested Firefox with mozregression as far back as I could (limited by old TLS capabilities), and it looks like Firefox has suffered from the above problem since the beginning.
Comment 6•1 year ago
Comment 7•1 year ago (Reporter)
Hi Sunil, can you filter for "connected" and post all connection times?
Comment 8•1 year ago
Comment 9•1 year ago
I was able to reproduce this locally using this link: https://pong.bolesiny.net/proof/.
Apart from the huge difference in connection time compared to Chrome, I also noticed that Firefox connects only once, whereas Chrome reconnects after a few failures.
Comment 10•1 year ago (Reporter)
That's weird; I've tested it on three different PCs with different internet access, on Windows 10 and Fedora 37, and it's always the same :/
Although you are on macOS, right?
Comment 11•1 year ago (Reporter)
Comment 12•1 year ago (Reporter)
Comment 13•1 year ago
(In reply to Václav Nováček from comment #10)
> That's weird; I've tested it on three different PCs with different internet access, on Windows 10 and Fedora 37, and it's always the same :/
> Although you are on macOS, right?
That's correct!
Comment 14•1 year ago (Reporter)
Comment on attachment 9317445 [details]
ChromeFailingWebSocket
Oh wait, now I see it: you tested Chrome :D That's why I was surprised that it was working.
Comment 15•1 year ago (Reporter)
(In reply to Sunil Mayya from comment #9)
> I was able to reproduce this locally using this link: https://pong.bolesiny.net/proof/.
> Apart from the huge difference in connection time compared to Chrome, I also noticed that Firefox connects only once, whereas Chrome reconnects after a few failures.
Please check the JavaScript behind it; you may have misunderstood the demo.
There are two kinds of connections. The first connects every 60 s to the proper WebSocket endpoint. The second tries to connect to a wrong WebSocket endpoint, so it fails. The failures of the second connection cause the huge reconnection times of the first.
Comment 16
I'll have a look at whether we're implementing the spec correctly.
Comment 17•8 months ago (Assignee)
This is interesting; thanks for logging it, Václav.
It looks like we keep track of WebSocket connect failures by address and port.
In this test we attempt to connect to two URLs, both at the same address and port:
wss://pong.bolesiny.net/ws (valid)
wss://pong.bolesiny.net/ (invalid)
We repeatedly attempt the invalid URL, which fails and thus advances the exponential backoff for that address/port.
From RFC 6455: "clients SHOULD use some form of backoff when trying to reconnect after abnormal closures as described in this section."
But if we keyed the errors by the full URL and port, that seems like it would prevent this seemingly odd behaviour and still minimize reconnects after failure.
Note that the test provided in this bug works gracefully in both Chrome (as noted) and Safari (i.e. connections to the valid WebSocket URL are not delayed).
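The effect of the keying choice can be illustrated with a minimal Python sketch. It is not Firefox's implementation: the `FailureBackoff` class, the doubling policy, and the delay constants are assumptions chosen for illustration; only the two URLs come from this bug.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable
from urllib.parse import urlsplit

# Hypothetical values for illustration, not taken from Firefox.
INITIAL_DELAY_S = 0.2
MAX_DELAY_S = 60.0

VALID_URL = "wss://pong.bolesiny.net/ws"
INVALID_URL = "wss://pong.bolesiny.net/"


def host_port_key(url: str) -> Hashable:
    """Key by (host, port): every path on the same server shares one entry."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "wss" else 80
    return (parts.hostname, parts.port or default_port)


def full_url_key(url: str) -> Hashable:
    """Key by the full URL: each endpoint gets its own entry."""
    return url


@dataclass
class FailureBackoff:
    """Hypothetical failure-delay table with a pluggable key function."""

    key_fn: Callable[[str], Hashable]
    delays: Dict[Hashable, float] = field(default_factory=dict)

    def record_failure(self, url: str) -> None:
        key = self.key_fn(url)
        previous = self.delays.get(key)
        # The first failure starts the delay; each later one doubles it, up to a cap.
        self.delays[key] = (
            INITIAL_DELAY_S if previous is None else min(previous * 2, MAX_DELAY_S)
        )

    def record_success(self, url: str) -> None:
        self.delays.pop(self.key_fn(url), None)

    def delay_for(self, url: str) -> float:
        return self.delays.get(self.key_fn(url), 0.0)


by_host = FailureBackoff(host_port_key)
by_url = FailureBackoff(full_url_key)
for _ in range(10):
    by_host.record_failure(INVALID_URL)
    by_url.record_failure(INVALID_URL)

print(by_host.delay_for(VALID_URL))  # non-zero: the valid URL inherits the penalty
print(by_url.delay_for(VALID_URL))   # 0.0: the valid URL is unaffected
```

With the host/port key, failures on the invalid path delay the valid path too, which matches the behaviour reported here; with the full-URL key, only the failing endpoint is penalized.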
Comment 18•7 months ago (Assignee)
Hmm, although we key WebSocket connections off the address, so implementation-wise, keying failures off the full URL would introduce some complexity.
Valentin -- thoughts?
Comment 19•7 months ago (Assignee)
Although our implementation follows the intention of rfc6455#section-4.1, we can handle the described situation more gracefully by prioritizing new WS connections to paths that have not previously failed.
Comment 20•7 months ago (Assignee)
We can discuss in the patch, but what's happening is that we need to serialize attempts to connect to a given host/port pair:
https://datatracker.ietf.org/doc/html/rfc6455#section-4.1
The repeated failed attempts fill up the queue to that host.
So in this implementation I allow WS connections to a path that has not failed yet to get priority in the queue.
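A rough Python model of that prioritization, under stated assumptions: RFC 6455 section 4.1 allows at most one connection in the CONNECTING state per host/port, so pending opens wait in a per-host queue. The `HostQueue` class and its method names are hypothetical and do not mirror the code in WebSocketChannel.cpp; the sketch only shows the idea of letting a URL with no recorded failure jump ahead of retries to one that has already failed.

```python
from typing import List, Optional, Set

VALID_URL = "wss://pong.bolesiny.net/ws"
INVALID_URL = "wss://pong.bolesiny.net/"


class HostQueue:
    """Hypothetical queue of pending connection attempts for one host/port pair."""

    def __init__(self) -> None:
        self.pending: List[str] = []        # URLs in arrival order
        self.failed_urls: Set[str] = set()  # URLs that have failed before

    def enqueue(self, url: str) -> None:
        self.pending.append(url)

    def record_failure(self, url: str) -> None:
        self.failed_urls.add(url)

    def pop_next(self) -> Optional[str]:
        """Return the oldest URL with no recorded failure, else the oldest overall."""
        for index, url in enumerate(self.pending):
            if url not in self.failed_urls:
                return self.pending.pop(index)
        return self.pending.pop(0) if self.pending else None


queue = HostQueue()
queue.record_failure(INVALID_URL)
for _ in range(3):
    queue.enqueue(INVALID_URL)  # retries to the failing endpoint pile up
queue.enqueue(VALID_URL)        # the valid endpoint arrives last

print(queue.pop_next())  # wss://pong.bolesiny.net/ws
print(queue.pop_next())  # wss://pong.bolesiny.net/
```

Attempts are still serialized, one at a time per host/port; only the order in which queued attempts are started changes.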
Comment 21•7 months ago (Assignee)
Depends on D193386
Comment 22•7 months ago
Pushed by acreskey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dee1e05a2cca
Websocket opening takes ages after repeated failures to same address r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/27bfc748bec2
Add wpt tests for Websocket repeated failures to same address r=necko-reviewers,kershaw
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/43181 for changes under testing/web-platform/tests
Comment 24•7 months ago
Backed out for bc failure on browser_staticPartition_websocket.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/cfb634c3e7654129c226c81ad4eedf017f062043
Log link: https://treeherder.mozilla.org/logviewer?job_id=436383045&repo=autoland&lineNumber=12093
Upstream PR was closed without merging
Comment 26•7 months ago
Pushed by acreskey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8e259edabb62
Websocket opening takes ages after repeated failures to same address r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/872aa271adfd
Add wpt tests for Websocket repeated failures to same address r=necko-reviewers,kershaw
Comment 27•7 months ago
Backed out for bustages on WebSocketChannel.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/f371dd435ba9da0420c0f6b3664afc0673f92bce
Log link: https://treeherder.mozilla.org/logviewer?job_id=437446239&repo=autoland&lineNumber=57888
Upstream PR was closed without merging
Comment 29•7 months ago
Pushed by acreskey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/36f007cf29d0
Websocket opening takes ages after repeated failures to same address r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/180501fca227
Add wpt tests for Websocket repeated failures to same address r=necko-reviewers,kershaw
Comment 30•7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/36f007cf29d0
https://hg.mozilla.org/mozilla-central/rev/180501fca227
Upstream PR merged by moz-wptsync-bot
Upstream PR merged