Last Comment Bug 872978 - Intermittent data channel leaks - "9378 bytes leaked (DataChannel, DataChannelConnection, DtlsIdentity, Mutex, NrIceCtx, ...)"
: Intermittent data channel leaks - "9378 bytes leaked (DataChannel, DataChann...
Status: RESOLVED FIXED
[WebRTC][blocking-webrtc+][MemShrink]...
: intermittent-failure, mlk
Product: Core
Classification: Components
Component: WebRTC (show other bugs)
: 24 Branch
: All All
: -- normal (vote)
: mozilla24
Assigned To: [:jesup] on pto until 2016/7/5 Randell Jesup
: Jason Smith [:jsmith]
Mentors:
Depends on: 876167 889088
Blocks: 796894
  Show dependency treegraph
 
Reported: 2013-05-16 03:06 PDT by Henrik Skupin (:whimboo)
Modified: 2013-07-01 14:14 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
disabled
+
fixed
+
fixed
+
fixed


Attachments
log output (1.16 MB, text/plain)
2013-05-16 11:12 PDT, Henrik Skupin (:whimboo)
no flags Details
log output (basic video + multiple channel) (101.50 KB, text/plain)
2013-05-16 11:39 PDT, Henrik Skupin (:whimboo)
no flags Details
NSPR log (datachannel:5,signaling:5) (68.54 KB, text/plain)
2013-05-16 12:47 PDT, Henrik Skupin (:whimboo)
no flags Details
process any pending stream resets on incoming resets (12.40 KB, patch)
2013-05-25 09:53 PDT, [:jesup] on pto until 2016/7/5 Randell Jesup
no flags Details | Diff | Review
process any pending stream resets on incoming resets (13.24 KB, patch)
2013-06-01 18:46 PDT, [:jesup] on pto until 2016/7/5 Randell Jesup
tuexen: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Review

Description Henrik Skupin (:whimboo) 2013-05-16 03:06:51 PDT
With attachment 750174 [details] [diff] [review] (bug 796894) applied to a mozilla-central checkout, I intermittently see the following leak. It looks like that it is only happening when I'm connected to the MPT VPN. For safety I will set qa-automation-blocked for now because the datachannel tests could not be landed if it's a general issue.

TEST-INFO | leakcheck | leaked 2 DataChannel (288 bytes)
TEST-INFO | leakcheck | leaked 2 DataChannelConnection (928 bytes)
TEST-INFO | leakcheck | leaked 2 DtlsIdentity (64 bytes)
TEST-INFO | leakcheck | leaked 6 Mutex (144 bytes)
TEST-INFO | leakcheck | leaked 2 NrIceCtx (384 bytes)
TEST-INFO | leakcheck | leaked 6 NrIceMediaStream (1008 bytes)
TEST-INFO | leakcheck | leaked 2 NrIceResolver (80 bytes)
TEST-INFO | leakcheck | leaked 24 NrSocket (4800 bytes)
TEST-INFO | leakcheck | leaked 1 ReentrantMonitor (32 bytes)
TEST-INFO | leakcheck | leaked 2 StringAdopt (2 bytes)
TEST-INFO | leakcheck | leaked 2 TransportFlow (352 bytes)
TEST-INFO | leakcheck | leaked 2 VerificationDigest (176 bytes)
TEST-INFO | leakcheck | leaked 1 nsDNSService (144 bytes)
TEST-INFO | leakcheck | leaked 2 nsDeque (192 bytes)
TEST-INFO | leakcheck | leaked 1 nsIDNService (120 bytes)
TEST-INFO | leakcheck | leaked 1 nsPrefBranch (128 bytes)
TEST-INFO | leakcheck | leaked 1 nsSocketTransportService (216 bytes)
TEST-INFO | leakcheck | leaked 3 nsStringBuffer (24 bytes)
TEST-INFO | leakcheck | leaked 10 nsTArray_base (80 bytes)
TEST-INFO | leakcheck | leaked 2 nsTimerImpl (192 bytes)
TEST-INFO | leakcheck | leaked 1 nsUnicodeNormalizer (24 bytes) 
TEST-UNEXPECTED-FAIL | leakcheck | 9378 bytes leaked (DataChannel, DataChannelConnection, DtlsIdentity, Mutex, NrIceCtx, ...)
Comment 1 Jan-Ivar Bruaroey [:jib] 2013-05-16 05:59:23 PDT
Did you try removing the patches and verify you could no longer reproduce the leaks?
Comment 2 Henrik Skupin (:whimboo) 2013-05-16 06:01:01 PDT
Removing which patches? The attachment I have mentioned is necessary so data channel tests get run at all. We don't have any of those yet in our mochitest suite.
Comment 3 Henrik Skupin (:whimboo) 2013-05-16 06:03:42 PDT
I got the leak now without being connected to the MPT VPN. So it's indeed blocking us to get the new tests for data channels landed. I will try to nail down and find a simple testcase.
Comment 4 Henrik Skupin (:whimboo) 2013-05-16 11:11:06 PDT
The same leak happens with the current version of my patch on bug 796894, and blocks us from getting the data channel tests landed.

This leak does not reproduce constantly so it's kinda hard to figure out what's going on. I will attach the log from a mochitest run in the hope that you can find something in there.
Comment 5 Henrik Skupin (:whimboo) 2013-05-16 11:12:08 PDT
Created attachment 750555 [details]
log output

stdout/stderr output to the console from the mochitest run.
Comment 6 Henrik Skupin (:whimboo) 2013-05-16 11:39:20 PDT
Created attachment 750573 [details]
log output (basic video + multiple channel)

It looks like that the test_dataChannel_basicVideo.html test is causing this leak most of the time. I will try to get a datachannel log.
Comment 7 Henrik Skupin (:whimboo) 2013-05-16 12:39:17 PDT
As the following try server run shows the leak is happening across platforms:
https://tbpl.mozilla.org/?tree=Try&rev=fb2864809c6b
Comment 8 Henrik Skupin (:whimboo) 2013-05-16 12:47:54 PDT
Created attachment 750622 [details]
NSPR log (datachannel:5,signaling:5)
Comment 9 [:jesup] on pto until 2016/7/5 Randell Jesup 2013-05-25 09:53:36 PDT
Created attachment 754142 [details] [diff] [review]
process any pending stream resets on incoming resets

WIP patch - no leaks in heavily-retriggered Try run (or locally after 15+ hours of mochitest runs), but there's a Windows (and maybe linux) crash inside the SCTP library which appears to be some sort of internal race condition between local and remote association shutdown - Michael Tuexen is looking at it, and I'm trying to reproduce again locally on Linux with stack backtrace.
Comment 10 [:jesup] on pto until 2016/7/5 Randell Jesup 2013-06-01 18:46:48 PDT
Created attachment 757062 [details] [diff] [review]
process any pending stream resets on incoming resets
Comment 11 [:jesup] on pto until 2016/7/5 Randell Jesup 2013-06-01 18:53:26 PDT
Comment on attachment 757062 [details] [diff] [review]
process any pending stream resets on incoming resets

This change triggers the sctp library bug in bug 876167 (see the workaround there).  Sending a new try of the two patches as the last try had one optional item here commented out in a failed attempt to avoid bug 876167

This cleans up stream close handling, especially at association shutdown time
Comment 12 [:jesup] on pto until 2016/7/5 Randell Jesup 2013-06-02 06:50:22 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/062a6a2269b5
Comment 13 Phil Ringnalda (:philor) 2013-06-02 12:09:14 PDT
https://hg.mozilla.org/mozilla-central/rev/062a6a2269b5
Comment 14 [:jesup] on pto until 2016/7/5 Randell Jesup 2013-06-02 19:50:09 PDT
Comment on attachment 757062 [details] [diff] [review]
process any pending stream resets on incoming resets

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A

User impact if declined: Intermittent leak when closing PeerConnections when DataChannels are in use.  When the tests land (which they should this week), the intermittent failure will show up quite often on M3. It's unclear how often it would happen to users - probably rare, as it requires both sides to be closing at the same time.

Testing completed (on m-c, etc.): On M-C.  To solve bug 876167, we had to run a zillion Try retriggers and local mochitest runs.

Risk to taking this patch (and alternatives if risky): We need to take bug 876167 if we take this one (though that bug could still probably be hit with the right timing without this patch - this patch makes it much easier to hit that other bug. 

String or IDL/UUID changes made by this patch: none
Comment 15 Ryan VanderMeulen [:RyanVM] 2013-06-03 18:29:17 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/9005eaf15285

Doesn't apply cleanly to beta.
Comment 16 [:jesup] on pto until 2016/7/5 Randell Jesup 2013-06-04 00:46:25 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/931514a4b3be

Note You need to log in before you can comment on or make changes to this bug.