Closed Bug 980270 Opened 12 years ago Closed 11 years ago

ICE leaks many allocations

Categories

(Core :: WebRTC: Networking, defect, P3)

defect

Tracking


RESOLVED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 --- wontfix
firefox32 --- fixed
firefox33 --- fixed
firefox34 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: jesup, Assigned: bwc)

References

Details

(Keywords: memory-leak, Whiteboard: [lsan][MemShrink:P2])

Attachments

(4 files)

Lots and lots of things get leaked in ICE/etc
So, I suspect that the following line is causing a leak when ctx->response is already set: http://dxr.mozilla.org/mozilla-central/source/media/mtransport/third_party/nICEr/src/stun/stun_client_ctx.c#524
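The suspected bug is the classic free-before-overwrite pattern: if `ctx->response` already holds an allocation when a new response is stored, the old pointer is lost. A minimal, self-contained sketch of the guarded version (the struct and function names here are hypothetical stand-ins; the real fix would release the old message with nICEr's own destroy routine, e.g. `nr_stun_message_destroy`, before overwriting):

```c
#include <stdlib.h>

/* Hypothetical stand-ins for nICEr's stun message and client context. */
typedef struct stun_message_ { char buf[64]; } stun_message;
typedef struct stun_client_ctx_ { stun_message *response; } stun_client_ctx;

/* Guarded assignment: release any previously stored response before
 * overwriting the pointer, so the old allocation is not leaked. */
void set_response_guarded(stun_client_ctx *ctx, stun_message *msg) {
  if (ctx->response) {
    free(ctx->response); /* real code would call nr_stun_message_destroy() */
    ctx->response = NULL;
  }
  ctx->response = msg;
}
```

Without the `if (ctx->response)` guard, calling this twice on the same context leaks the first message, which matches the recurring per-transaction leak described above.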
Assignee: nobody → docfaraday
The rest of the nICEr stuff in this trace is related to the nr_registry, which is set up once but never torn down. Do we want to put priority on cleaning this up on shutdown? I don't see a whole lot of reason to expend extra effort to clean this up when all NrIceCtx are gone (prior to shutdown), since it looks like tens of K, but if it is easier/cleaner this way it is an option.
Flags: needinfo?(rjesup)
First attempt at plugging the recurring leaks.
SUMMARY: LeakSanitizer: 198910 byte(s) leaked in 1824 allocation(s). That's down from ~360K in ~2000 allocations in the same test, with no errors flagged.
Flags: needinfo?(rjesup)
Attached file y11
Attachment #8387137 - Flags: review?(ekr)
Do we want to spend the time to clean up the one-time nr_registry leaks?
Flags: needinfo?(rjesup)
There seem to be 3x 32K+delta allocs there (perhaps some other smaller ones). Note that due to jemalloc, the 'slop' for a 32K + delta allocation is almost 32K.... so these are poor sizes to begin with.

Given at least 100K is locked in use after the first ref to ICE, and probably closer to 200K, I'd like to see them released when we're not using them (when we shut down peerconnections), even if it's off a timer to avoid churn or some such (I don't think a timer is needed). I'm not so concerned with releasing them in shutdown; if that's all, we should just add entries to the suppression list for lsan (see the grandparent bug to this one on lsan for suppression info, IIRC).

Also: are these registry sizes appropriate for this use? They seem large for a browser that at most might have a handful of sessions active (doubly so with bundle/rtcpmux).
Flags: needinfo?(rjesup)
Flags: needinfo?(ekr)
Flags: needinfo?(docfaraday)
Slop here is apparently more like 3*(1 page - delta). Still, we're well over 100K total.
(In reply to Randell Jesup [:jesup] from comment #9)
> There seem to be 3x 32K+delta allocs there (perhaps some other smaller ones). Note that due to jemalloc, the 'slop' for a 32K + delta allocation is almost 32K.... so these are poor sizes to begin with.
>
> Given at least 100K is locked in use after the first ref to ICE, and probably closer to 200K, I'd like to see them released when we're not using them (when we shut down peerconnections), even if it's off a timer to avoid churn or some such (I don't think a timer is needed). I'm not so concerned with releasing them in shutdown; if that's all we should just add entries in the suppression list for lsan (see the grandparent bug to this one on lsan for suppression info IIRC).
>
> Also: are these registry sizes appropriate for this use? They seem large for a browser that at most might have a handful of sessions active (doubly so with bundle/rtcpmux)

This is configuration setup and setting up the logging system(s), so the maximum number of ICE contexts isn't really a factor here.
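The slop estimates above can be sketched with simple arithmetic. This assumes (per the discussion in this bug, not verified against jemalloc internals) that allocations in this size range are rounded up to a multiple of the 4 KiB page size, so a 32K+delta request wastes about one page minus delta:

```c
#include <stddef.h>

/* Assumed page size for the illustration. */
#define PAGE_SIZE 4096u

/* Round a request up to the next page-size multiple, as the allocator
 * is assumed to do for allocations in this size range. */
size_t rounded_alloc(size_t request) {
  return (request + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}

/* Slop = bytes allocated beyond what was requested. */
size_t slop(size_t request) {
  return rounded_alloc(request) - request;
}
```

For a request of 32768 + delta bytes, `rounded_alloc` returns 36864, so the slop is 4096 - delta; three such allocations waste 3*(1 page - delta), matching the estimate in the preceding comment.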
Flags: needinfo?(docfaraday)
Begin experimenting with cleaning up nr_registry and r_log.
Try out patch #2 and see how much it helps.
Flags: needinfo?(rjesup)
Whiteboard: [lsan][MemShrink] → [lsan][MemShrink:P2]
(In reply to Byron Campen [:bwc] from comment #4)
> The rest of the nICEr stuff in this trace is related to the nr_registry, which is set up once but never torn down. Do we want to put priority on cleaning this up on shutdown? I don't see a whole lot of reason to expend extra effort to clean this up when all NrIceCtx are gone (prior to shutdown), since it looks like tens of K, but if it is easier/cleaner this way it is an option.

I don't see a lot of value in cleaning up the shutdown stuff.
Flags: needinfo?(ekr)
Priority: -- → P3
Target Milestone: --- → mozilla33
Attachment #8387137 - Flags: review?(ekr) → review?(drno)
Comment on attachment 8387137 [details] [diff] [review]
Part 1: Plug a couple of common leaks in nICEr.

Review of attachment 8387137 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. With this patch applied, a three-hour call as a TURN client no longer showed a memory leak.
Attachment #8387137 - Flags: review?(drno) → review+
Whiteboard: [lsan][MemShrink:P2] → [lsan][MemShrink:P2][webrtc-uplift]
Keywords: checkin-needed
I think this is worth considering for uplift, as this leak consumes ~1 MB per second per stream if your client's TURN relay gets selected by ICE.
Do we have a Try link for this? :)
Keywords: checkin-needed
Try run looks green to me. Requesting checkin.
Keywords: checkin-needed
Thanks for fixing this! Hopefully we're a little closer to being able to do more targeted LSan whitelisting of WebRTC leaks.
Comment on attachment 8387137 [details] [diff] [review]
Part 1: Plug a couple of common leaks in nICEr.

Approval Request Comment
[Feature/regressing bug #]: TURN support has been available at least since Firefox 23; bug 786235 is the oldest ticket I was able to find regarding TURN support.
[User impact if declined]: Firefox might get killed by the OS during a WebRTC call if the host runs out of memory (the leak consumes around 1 MB/s when the scenario applies).
[Describe test coverage new/current, TBPL]: No TBPL coverage, as TBPL allows no proper TURN testing. This was caught in the WebRTC QA lab and can be verified there as well. Additionally, manual verification has been done and can be repeated as needed.
[Risks and why]: Low risk: we are freeing memory which should not be used any more, and the same freeing function is used in other places in the code without causing problems so far.
[String/UUID change made/needed]: None.
Attachment #8387137 - Flags: approval-mozilla-beta?
Attachment #8387137 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I'm not sure how verifiable this really is. This code still leaks, just less.
(In reply to Andrew McCreight [:mccr8] from comment #26)
> I'm not sure how verifiable this really is. This code still leaks, just less.

Thanks, Andrew, for the info! Removing the keyword for now; add it back if you think the QA team can help with this.
Keywords: verifyme
Whiteboard: [lsan][MemShrink:P2][webrtc-uplift] → [lsan][MemShrink:P2]
Could you provide detailed steps to reproduce, and a video, so I can verify this bug? Thanks!
Flags: needinfo?(rjesup)
(In reply to Sue from comment #29)
> Could you provide the detailed reproduce steps and video for me to verify this bug. Thanks!

There is no specific video. You would need to establish a call between two different networks, so that the call goes through a TURN relay, and then leave the call up for quite some time and watch the memory usage.
Flags: needinfo?(rjesup)