Open Bug 829591 Opened 11 years ago Updated 2 years ago

Need to research options for more aggressive collection of peer connections

Categories

(Core :: WebRTC: Networking, enhancement, P5)

21 Branch
enhancement

Tracking

()

People

(Reporter: abr, Unassigned)

References

Details

Unlike most objects, peer connections consume substantial resources beyond the memory they're using. In particular, as soon as PCs are created, they begin ICE-related processing, which consumes quite a number of network sockets (the exact number depends on several factors, such as how many network interfaces the machine has).

Because of this behavior, we may need to reap orphaned peer connections more aggressively than other objects. This bug is a placeholder to represent the research task (and solution, if one emerges) of determining whether a more aggressive collection scheme is feasible.

See also Bug 824919.
Depends on: 824919
Adam, how does that affect our automation? Especially when we will have a dozen of peer connection tests? Will this raise a problem? Or are we on the safe side when we correctly cleanup the test?
(In reply to Henrik Skupin (:whimboo) from comment #2)
> Adam, how does that affect our automation? Especially when we will have a
> dozen of peer connection tests? Will this raise a problem? Or are we on the
> safe side when we correctly cleanup the test?

Automation should work fine, as the relevant objects get cleaned up shortly after the page is unloaded. If you want to be particularly conservative, you can throw in an occasional call to force garbage and cycle collection:

      window.QueryInterface(Components.interfaces.nsIInterfaceRequestor)
        .getInterface(Components.interfaces.nsIDOMWindowUtils)
        .garbageCollect();

But I really don't think this should be necessary.
(In reply to Adam Roach [:abr] from comment #3)
> Automation should work fine, as the relevant objects get cleaned up shortly
> after the page is unloaded. If you want to be particularly conservative, you
> can throw in an occasional call to force garbage and cycle collection:

Sorry that I wasn't that explicit with my question but I wonder about bug 826044 here. Could that be related? Should we ensure to have clean-up code in the crashtests?
(In reply to Adam Roach [:abr] from comment #3)
> Automation should work fine, as the relevant objects get cleaned up shortly
> after the page is unloaded. If you want to be particularly conservative, you
> can throw in an occasional call to force garbage and cycle collection:

Adam, when I reload multiple pages, each with two PeerConnection() calls in it for testing the SDP, I am forced to add a delay of min. 3 seconds (on my machine) between each test, otherwise I will get "PeerConnectionCtx onDeviceEvent PeerConnection already started" for each upcoming testcase and then I can not trigger the SDP parser. I have tried to trigger the GC manually in each testcase  but that didn't work.
(In reply to Christoph Diehl [:cdiehl] from comment #5)

> Adam, when I reload multiple pages, each with two PeerConnection() calls in
> it for testing the SDP, I am forced to add a delay of min. 3 seconds (on my
> machine) between each test, otherwise I will get "PeerConnectionCtx
> onDeviceEvent PeerConnection already started" for each upcoming testcase and
> then I can not trigger the SDP parser. I have tried to trigger the GC
> manually in each testcase  but that didn't work.

Interesting. I suspect that's a different problem. Please open a new bug, preferably with an attached minimal test case for replicating that issue.
(In reply to Henrik Skupin (:whimboo) from comment #4)
> (In reply to Adam Roach [:abr] from comment #3)
> > Automation should work fine, as the relevant objects get cleaned up shortly
> > after the page is unloaded. If you want to be particularly conservative, you
> > can throw in an occasional call to force garbage and cycle collection:
> 
> Sorry that I wasn't that explicit with my question but I wonder about bug
> 826044 here. Could that be related? Should we ensure to have clean-up code
> in the crashtests?

I wouldn't think we should leak anything from this behavior, as all of the involved objects are ultimately in the custodianship of the GC and CC. (In any case, after reading through the comments in 826044, it sounds like Jan-Ivar has isolated the leak issue to gUM.)

*This* bug is really intended to track a task to determine whether there's a way to trigger GC/CC on PeerConnections more quickly after their last reference goes away. Anything pertaining to actual *leaks* would be be a new and unrelated bug.

Here's a simple litmus test: if you find a problem that can be worked around by forcing GC/CC as I describe above, it has a pretty good chance of being related to this issue. If it cannot be fixed by forcing GC/CC, then it's guaranteed to be something different that either pertains to an existing bug other than this one, or it is a new problem that needs to have a new bug opened for it.
When we address this issue, we would ideally ensure that all the PeerConnections are cleaned up before the STS thread is shut down. See Bug 856848 comment 10 for details.
Priority: -- → P4
From Andrew McCreight in IRC: "There's some mechanism (the name escapes me at the moment) where you can tell the GC that you are larger than you appear to be, which will make it GC more. Malloc bytes or something."

It's worth noting that this approach only works for GC, not CC, which means the PeerConnection object complex needs to be turned into a DAG. See Bug 824919 comment 25 for the classes involved in this graph. The only cycle left is PeerConnection <--> PeerConnectionObserver, and I think we can safely weaken the Observer-to-PC reference.
(In reply to Adam Roach [:abr] from comment #9)
> The only cycle
> left is PeerConnection <--> PeerConnectionObserver, and I think we can
> safely weaken the Observer-to-PC reference.

n.b. The fix for Bug 928221 intentionally introduces a new cycle between PCObserver JS object and a guard object (called "referent"). This should be fixed after Bug 928535 lands.
Depends on: 928535
backlog: --- → webRTC+
Rank: 46
Whiteboard: [WebRTC] [blocking-webrtc-]
It seems very unlikely that I'll have an opportunity to work on WebRTC platform development for the foreseeable future. I'm unassigning this bug so that someone else might pick it up.
Assignee: adam → nobody
Mass change P4->P5 to align with new Mozilla triage process.
Priority: P4 → P5
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.