Closed
Bug 878792
Opened 12 years ago
Closed 12 years ago
orphaned CLOSE_WAIT sockets with cloudsound.com
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
RESOLVED
FIXED
mozilla25
People
(Reporter: jonathan, Assigned: mcmanus)
References
Details
Attachments
(2 files, 1 obsolete file)
7.31 KB, patch | u408661: review+
3.51 KB, patch | u408661: review+
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130602 Firefox/24.0 (Nightly/Aurora)
Build ID: 20130602031240
Steps to reproduce:
Log in to soundcloud.com (notifications are delivered to users over WebSocket; as far as I can tell, the problem doesn't happen unless one is logged in). Observe the Firefox process's TCP/IP connections (I used Process Explorer) for a few minutes.
Actual results:
The number of connections to pushers.soundcloud.com grows continuously with most connections in the CLOSE_WAIT state. Eventually SoundCloud stops behaving correctly and general browsing in Firefox is hindered to the point of things either loading extremely slowly or not at all.
Expected results:
Presumably eventually some of these CLOSE_WAIT connections should actually be closed and freed.
(FWIW, I was unable to reproduce this in Chrome 29, which suggests this is a Firefox issue. I'm also not sure whether this occurs on other sites that use WebSockets; this just happens to be the most apparent to me.)
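For readers unfamiliar with the state: a TCP socket sits in CLOSE_WAIT after the peer has sent its FIN and until the local application closes its own end. The sketch below reproduces that situation on loopback in Python. It is only an illustration under that assumption, not Firefox code, and the helper names `half_closed_pair` and `peer_has_closed` are made up for this example.

```python
import socket

def half_closed_pair():
    """Return a client socket whose peer has already closed its end."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    server_side.close()                # the peer sends its FIN here
    listener.close()
    return client                      # left open, so it lingers in CLOSE_WAIT

def peer_has_closed(sock):
    """Read once; an empty result means the peer's FIN has been received."""
    return sock.recv(1) == b""
```

Until the caller invokes `close()` on the returned client socket, the operating system keeps it in CLOSE_WAIT, which is the state piling up in the report above.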
Reporter
Comment 1•12 years ago
I should probably also mention that this was over HTTPS.
Reporter
Comment 2•12 years ago
I sampled a few different Nightly builds:
2013-03-31: The effect does still seem to exist, but the growth of CLOSE_WAIT connections seems fairly slow.
2013-05-12: The effect does still seem to exist and seems less pronounced than Firefox 24, but more than the Firefox 22 nightly.
(Note: My method for determining these was not terribly scientific, but involved around 15-20 minutes of regular usage, which did seem to produce different results between versions.)
Updated•12 years ago
Component: Untriaged → Networking: WebSockets
Product: Firefox → Core
Reporter
Updated•12 years ago
Component: Networking: WebSockets → Untriaged
OS: Windows 7 → All
Product: Core → Firefox
Hardware: x86_64 → x86
Reporter
Comment 3•12 years ago
Sorry, Drew. I didn't actually see your changes.
Component: Untriaged → Networking: WebSockets
Product: Firefox → Core
Assignee
Comment 4•12 years ago
I've been actively looking at this for the last day or so, thanks to a comment in 469344.
The problem occurs with soundcloud's websockets server, but I really think the issue is in our generic connection handling with long timeouts. soundcloud just happens to have a hostname with several IP addresses, most of them timing out, plus websocket code that keeps trying to connect periodically in the background, so it's an ideal trigger.
Assignee: nobody → mcmanus
Assignee
Comment 5•12 years ago
right now pushers.soundcloud.com resolves to 6 addresses.
most of them don't connect at all, occasionally one of them connects after a very long time.
Necko will only try and connect at the http (or websockets) level for 90 seconds by default.
After that timeout we do SocketTransport()->Close(NS_ERROR_NET_TIMEOUT), followed by AsyncWait(nullptr) (which means don't call me back on network events);
And that indeed forced a close of the socket. Unfortunately, the nsSocketTransport::RecoverFromError() logic will silently try the next DNS record for us on TIMEOUT.
So the socket transport service moved on to the next record, but the http connection manager was no longer listening. If the next record actually connected (probably very slowly), there was nothing to dispatch I/O events to, so the socket transport went into the idle list and never got polled again. When the FIN did arrive from the server (timing us out this time), we never read it, and the socket stayed in CLOSE_WAIT.
The fix is simple - just close with NS_ERROR_ABORT instead of NET_TIMEOUT to more accurately convey that we are aborting the socket.
This is a candidate for at least one level of backport. I'm not too worried about soundcloud (which is an ideal test case - thank you Jonathan Jacobs) but a small amount of this can accumulate across the web and theoretically lead to those unreproducible complaints of having to restart firefox because networking isn't running.
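To make the sequence above concrete, here is a toy model in Python. It is a sketch of the described behaviour only, not Necko code: `ToyTransport`, `Reason`, `give_up` and `is_orphaned` are hypothetical names, and the rule "a timeout moves on to the next address, an abort closes for good" is a simplification of what the comment says about RecoverFromError().

```python
from enum import Enum, auto

class Reason(Enum):
    NET_TIMEOUT = auto()
    ABORT = auto()

class ToyTransport:
    """Toy stand-in for a socket transport with DNS failover."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.index = 0
        self.listener = object()   # the consumer waiting for I/O events
        self.open = True

    def close(self, reason):
        # Assumption: a timeout is treated as recoverable and silently
        # advances to the next DNS record; an abort really closes.
        if reason is Reason.NET_TIMEOUT and self.index + 1 < len(self.addresses):
            self.index += 1
        else:
            self.open = False

    def async_wait(self, listener):
        self.listener = listener   # None means "do not call me back"

    def is_orphaned(self):
        # Still open, but nobody will ever read from it again.
        return self.open and self.listener is None

def give_up(transport, reason):
    """Mimic the consumer giving up after its connect timeout."""
    transport.close(reason)
    transport.async_wait(None)
    return transport.is_orphaned()
```

In this model, `give_up(ToyTransport(["a", "b"]), Reason.NET_TIMEOUT)` leaves an orphaned transport, while the same call with `Reason.ABORT` does not, which is the difference the fix relies on.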
Summary: Websockets seem to hang around forever consuming copious numbers of socket handles → orphaned CLOSE_WAIT sockets with cloudsound.com
Assignee
Comment 6•12 years ago
Attachment #769718 -
Flags: review?(hurley)
Assignee
Comment 7•12 years ago
Attachment #769719 -
Flags: review?(hurley)
Assignee
Updated•12 years ago
Component: Networking: WebSockets → Networking: HTTP
Attachment #769718 -
Flags: review?(hurley) → review+
Assignee
Comment 8•12 years ago
also update dns blacklist for all connection failures, to preserve DNS failover
Attachment #770390 -
Flags: review?(hurley)
Assignee
Updated•12 years ago
Attachment #769719 -
Attachment is obsolete: true
Attachment #769719 -
Flags: review?(hurley)
Comment on attachment 770390 [details] [diff] [review]
orphaned CLOSE_WAIT sockets with multiple A records
Review of attachment 770390 [details] [diff] [review]:
-----------------------------------------------------------------
OK, looks good (as discussed via email).
Attachment #770390 -
Flags: review?(hurley) → review+
Assignee
Comment 10•12 years ago
Comment 11•12 years ago
https://hg.mozilla.org/mozilla-central/rev/099ffd0d0d37
https://hg.mozilla.org/mozilla-central/rev/d1d6724bd3a7
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Comment 12•12 years ago
Hi,
I still have this problem on Mozilla Firefox 24.0 (Linux 3.9.3-gentoo #1 SMP PREEMPT Thu May 23 09:42:56 MSK 2013 x86_64).
When I have 550 connections
$ netstat -npt |grep firefox |awk '{print $6}' |sort|uniq -c
549 CLOSE_WAIT
1 ESTABLISHED
and I can't open a new tab or reload an open one.
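The same tally can be produced from captured netstat output with a few lines of Python. This is a sketch that assumes the usual Linux `netstat -npt` column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state, PID/program); `tally_tcp_states` is a hypothetical helper name. Unlike the `grep` above, it matches the process name only against the PID/program column.

```python
from collections import Counter

def tally_tcp_states(netstat_output, process="firefox"):
    """Count TCP connection states for one process in `netstat -npt` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 7 and fields[0].startswith("tcp") and process in fields[6]:
            counts[fields[5]] += 1
    return counts
```

Feeding it the text captured from `netstat -npt` returns a `Counter` keyed by state, for example with keys `CLOSE_WAIT` and `ESTABLISHED`.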
Assignee
Comment 13•12 years ago
(In reply to Evgeny from comment #12)
> Hi,
> I still have this problem on Mozilla Firefox 24.0 (Linux 3.9.3-gentoo #1 SMP
> PREEMPT Thu May 23 09:42:56 MSK 2013 x86_64).
> When I have 550 connections
> $ netstat -npt |grep firefox |awk '{print $6}' |sort|uniq -c
>
> 549 CLOSE_WAIT
> 1 ESTABLISHED
>
> and I can't open new tab or renew an opened.
the target milestone is firefox 25
Comment 14•12 years ago
Will this bug be fixed in an update of the ESR version (FF 24)?
Assignee
Comment 15•12 years ago
(In reply to Evgeny from comment #14)
> Is this bug will fixed in an update of ESR version (FF 24) ?
no. It does not meet the criteria for backporting to ESR: "Maintenance of each ESR, through point releases, is limited to high-risk/high-impact security vulnerabilities and in rare cases may also include off-schedule releases that address live security vulnerabilities. Backports of any functional enhancements and/or stability fixes are not in scope."
(http://www.mozilla.org/en-US/firefox/organizations/faq/)