nsHalfOpenSockets do not fully release nsSocketTransport objects when abandoned

RESOLVED FIXED in Firefox 27

Status

()

Core
Networking: HTTP
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: sworkman, Assigned: sworkman)

Tracking

25 Branch
mozilla28
Points:
---

Firefox Tracking Flags

(firefox25 wontfix, firefox26 wontfix, firefox27 fixed, firefox28 fixed, firefox-esr17 wontfix, firefox-esr24 wontfix)

Details

(Whiteboard: [qa-])

Attachments

(1 attachment)

Bug 853423 uncovered a leak of several objects surrouding nsHttpConnectionMgr::nsHalfOpenSocket (see bug 853423 comment 25). Local observations suggest that not releasing the nsSocketTransport object and associated input and output streams can keep the socket alive even after it has been abandoned.

This was able to be reproduced using netwerk/test/unit/test_seer.js (xpcshell) and toolkit/mozapps/extensions/test/browser/browser_list.js (mochitest-browser).

In test_seer.js, there was no localhost http server, so connections were refused. Even after the nsHalfOpenSocket was abandoned by nsHttpConnectionMgr, the nsSocketTransport tried to recover from the failed connection attempts by retrying with the next address in the DNS host record. In addition, the nsHalfOpenSocket was kept alive by the nsSocketTransport as its event sink and security callback object. Other objects were similarly kept from deleting. So, when test_seer.js finished up, the leak was detected.

In browser_list.js, the leak was observed because of the backup transport objects created in nsHalfOpenSocket.

So, the solution is to carry out some more cleanup in nsHttpConnectionMgr::nsHalfOpenSocket::Abandon; specifically, cleaning up the mutual references of nsHalfOpenSocket, nsSocketTransport and the input and output streams.
(Assignee)

Comment 1

5 years ago
Created attachment 829569 [details] [diff] [review]
v1.0 Cleanup mutual references held between nsHalfOpenSocket and nsSocketTransport in nsHalfOpenSocket::Abandon

Added some cleanup to nsHalfOpenSocket::Abandon.

-- Remove mutual references for nsHalfOpenSocket and nsSocketTransport for main transport and backup.
-- Remove mutual references for nsHalfOpenSocket and its output stream and backup.
-- Null out references to input stream and backup (input stream should have no references to the half open socket - these are only set when the connection is established and socket and stream ownership moves to nsHttpConnection).

Also added a few comments for quick reading in the future.
Attachment #829569 - Flags: review?(mcmanus)
Comment on attachment 829569 [details] [diff] [review]
v1.0 Cleanup mutual references held between nsHalfOpenSocket and nsSocketTransport in nsHalfOpenSocket::Abandon

Review of attachment 829569 [details] [diff] [review]:
-----------------------------------------------------------------

steve this is really great. This is potentially a socket resource leak - not just a memory one. Theoretically it could lead to networking lockups, but it would seem pretty hard to get to that extreme. nonetheless, maybe after a week or so we should move this up at least to aurora it has potential for fixing a serious problem even if we can't show it.
Attachment #829569 - Flags: review?(mcmanus) → review+
(Assignee)

Comment 4

5 years ago
Thanks Pat. Yeah, I was wondering about that too, especially after thinking about lame-network stuff recently.

On inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/78cd3e278127

I'll land bug 853423 later - I'd like the patches to be isolated in case one of them needs to be backed out.
https://hg.mozilla.org/mozilla-central/rev/78cd3e278127
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
(Assignee)

Comment 6

5 years ago
Comment on attachment 829569 [details] [diff] [review]
v1.0 Cleanup mutual references held between nsHalfOpenSocket and nsSocketTransport in nsHalfOpenSocket::Abandon

[Approval Request Comment]
Bug caused by (feature/regressing bug #):
-- Server is unreachable (network connection down, bad DNS record etc.) and load is canceled (e.g. user cancels page load).
 
User impact if declined:
-- Potential memory leaks
-- File descriptors could be held unnecessarily which may lead to connection issues.

Testing completed (on m-c, etc.): 
-- On m-c for a while now.

Risk to taking this patch (and alternatives if risky):
-- No known risk.
-- No known alternative.

String or IDL/UUID changes made by this patch:
-- None.
Attachment #829569 - Flags: approval-mozilla-aurora?
Steve, given 853423  which is blocking this big is only on Firefox 28, unclear is this patch in needed on aurora. Can you help ? Anything I am missing here ?

Updated

5 years ago
Flags: needinfo?(sworkman)
:bajaj - this is a long standing resource leak (~20 releases) of a networking socket on an error condition. it blocked another feature which made the tests show it more clearly but it has always been there. I've updated the status fields to reflect that.

Leaks like this can lead to networking requiring a restart ("I gotta restart firefox to reach anything") which we don't hear about often but do occasionally hear about.. so imo this is worth backporting at least one release.
status-firefox25: --- → affected
status-firefox26: --- → affected
status-firefox27: --- → affected
status-firefox28: --- → fixed
status-firefox-esr17: --- → affected
status-firefox-esr24: --- → affected
(Assignee)

Comment 9

5 years ago
Bhavana: what Pat said :) This patch was blocking bug 853423, not the other way round.
Flags: needinfo?(sworkman)
Comment on attachment 829569 [details] [diff] [review]
v1.0 Cleanup mutual references held between nsHalfOpenSocket and nsSocketTransport in nsHalfOpenSocket::Abandon

low risk win, looks good to land.
Attachment #829569 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Patrick McManus [:mcmanus] from comment #8)
> :bajaj - this is a long standing resource leak (~20 releases) of a
> networking socket on an error condition. it blocked another feature which
> made the tests show it more clearly but it has always been there. I've
> updated the status fields to reflect that.
> 
> Leaks like this can lead to networking requiring a restart ("I gotta restart
> firefox to reach anything") which we don't hear about often but do
> occasionally hear about.. so imo this is worth backporting at least one
> release.

(In reply to Steve Workman [:sworkman] from comment #9)
> Bhavana: what Pat said :) This patch was blocking bug 853423, not the other
> way round.

Thanks for the clarification. Just approved :) Please add verifyme if their is any QA help needed here.
https://hg.mozilla.org/releases/mozilla-aurora/rev/4494b9b0a0e9
status-firefox25: affected → wontfix
status-firefox26: affected → wontfix
status-firefox27: affected → fixed
status-firefox-esr17: affected → wontfix
status-firefox-esr24: affected → wontfix
Thanks Bhavana! I think it should be fine without QA.
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.