If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Intermittent 182, 188, 197, 221, 716, 763, 767 bytes leaked (Mutex, NrIceResolver, PeerConnectionMedia, ReentrantMonitor, nsSocketTransport, nsTArray_base, nsThread)

RESOLVED FIXED in Firefox 22

Status

()

Core
WebRTC: Signaling
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jib, Assigned: jib)

Tracking

({intermittent-failure, mlk})

Trunk
mozilla24
x86
Mac OS X
intermittent-failure, mlk
Points:
---

Firefox Tracking Flags

(firefox22 fixed, firefox23 fixed, firefox24 fixed)

Details

(Whiteboard: [WebRTC][blocking-webrtc-][qa-])

Attachments

(3 attachments, 4 obsolete attachments)

Created attachment 750180 [details]
mozilla-inbound-leak.txt

At times (over the last week at least), I get this leak fairly persistently when running the crashtests locally:

  ./mach crashtest dom/media/tests/crashtests

Meaning: I can run the test again and again and get the same leaking result. At other times they do not happen at all, and I've never seen them on the try/build servers, only locally.

I want to stress that I ran off of a pull from mozilla-inbound from earlier today, *without* any local patches added (see output), and I was able to verify the leak immediately on the first three runs (I didn't try more, since in my experience, once they happen, they happen persistently).

I'm filing this report preemptively to disprove any relation to patches I'm testing and am about to land.

See attachment (leaks at the bottom).
Created attachment 750182 [details]
mozilla-inbound-leak.txt

Just the crashtest output.
Attachment #750180 - Attachment is obsolete: true

Updated

4 years ago
Keywords: intermittent-failure, mlk
Whiteboard: [WebRTC][blocking-webrtc-]
Created attachment 752354 [details]
Leak, narrowed down to single crashtest

I've narrowed it down to 860143.html, which all it does is this:

<head>
  <meta charset="utf-8">
  <title>bug 860143</title>
  <script type="application/javascript">
    function start() {
      var o0 = new window.mozRTCPeerConnection({
        iceServers: [
          {
            url: "turn:AAAAAAAAAAAAAAAAA        AAAAAAAAAAA  AAAAAAAAAAAAA AA AAAAA        AAAAAAAAAAA  AAAAAAAAAAAAAAAA AAAAA        AAAAAAAAAA, AAAAAAAAAAAAAAAA        AAA    AAAAAAAAAAAAAAA"
          }
        ]
      });
      o0.close();
    }
  </script>
</head>

<body onload="start()">

Which makes sense as it causes asynchronous DNS resolution to start (NrIceResolver), with not a lot of time to finish.
Attachment #750182 - Attachment is obsolete: true
Actually, it is much simpler: The 257-character url gets past the validation in JS and into the parsing code in c++ where it fails in nricectx.h:

>    else if (addr.size() < 256) {

Nice one! From there it is a failure to cleanup from a rarely-exercised failure path in PeerConnectionImpl.cpp:

  mMedia = new PeerConnectionMedia(this);

  // Connect ICE slots.
  mMedia->SignalIceGatheringCompleted.connect(this,
    &PeerConnectionImpl::IceGatheringCompleted);
  mMedia->SignalIceCompleted.connect(this, &PeerConnectionImpl::IceCompleted);
  mMedia->SignalIceFailed.connect(this, &PeerConnectionImpl::IceFailed);

  // Initialize the media object.
  if (aRTCConfiguration) {
    IceConfiguration ic;
    res = ConvertRTCConfiguration(*aRTCConfiguration, &ic, aCx);
>   NS_ENSURE_SUCCESS(res, res); // <-- res = NS_ERROR_FAILURE.
    res = mMedia->Init(ic.getStunServers(), ic.getTurnServers());

Unfortunately, mMedia is created but not yet initialized, which is not good since cleanup of it happens to be rather complicated:

  // Forget the reference so that we can transfer it to
  // SelfDestruct().
  mMedia.forget().get()->SelfDestruct();

I'll reorder the code to avoid this, since there's really no need to create mMedia before parsing the url. Patch coming up.

Now why doesn't this leak show up on tinderbox? I see the test run, but no sign of the failure path being exercised. Are crashtest leaks disabled there?
Assignee: nobody → jib
Missed jib in IRC, but here's some more info I was going to say.

jsmith	jib: Dug into the issue you hit btw with Clint
jsmith	jib: Turns out you found a bug in buildbot
jsmith	jib: Or actually how we are using buildbot (aka our scripts)
jsmith	jib: We're apparently not running leak checks for crashtests on OS X 10.8
Good, though I don't think that explains not seeing this on tbpl, since it should have appeared for other OSes. Shutdown of the internal PeerConnectionMedia object is complicated, so it is possible there's a yet to be explored shutdown race here.
Created attachment 752456 [details] [diff] [review]
Avoid leak of PeerConnectionMedia on 257+ character iceServer url

Reorders the code to parse the iceServers before creating the inner PeerConnectionMedia object, avoiding leaking on the subset of invalid iceServers that the JS validation code doesn't catch.
Attachment #752456 - Flags: review?(rjesup)
Created attachment 752832 [details] [diff] [review]
Part 1: Puts new PeerConnections on global list earlier, ensuring close is called should PC.initialize fail.

Actually fixes the leak rather than avoiding it.

Further analysis with Randell led us to the real culprit, which was that new PeerConnections weren't being added to the global list of PCs until after pc.initialize succeeded, which meant that if pc.initialize failed, then pc.close didn't get called by the shutdown observer.

Instead, pc.close got called during garbage collection, potentially much later in shutdown with the STS thread and the main thread possibly too far gone to succeed at completing the nsRunnable ping-pong dance in PeerConnectionMedia::SelfDestruct(). We believe this explains the intermittency.
Attachment #752456 - Attachment is obsolete: true
Attachment #752456 - Flags: review?(rjesup)
Attachment #752832 - Flags: review?(rjesup)
Created attachment 752834 [details] [diff] [review]
Bug 872839: Part 2: Untangle iceServer parsing from PeerConnectionMedia initialization for cleaner failure path.

Cleaner code and failure paths is still a good idea, though no longer necessary to avoid the leak.
Attachment #752834 - Flags: review?(rjesup)

Updated

4 years ago
Attachment #752832 - Flags: review?(rjesup) → review+

Updated

4 years ago
Attachment #752834 - Flags: review?(rjesup) → review+
Created attachment 752861 [details] [diff] [review]
Part 2: Untangle iceServer parsing from PeerConnectionMedia initialization for cleaner failure path. r=jesup

Added a comment. Carrying forward r+ from rjesup.
Attachment #752834 - Attachment is obsolete: true
Attachment #752861 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5303482b695
https://hg.mozilla.org/integration/mozilla-inbound/rev/ec2e0fb43322
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f5303482b695
https://hg.mozilla.org/mozilla-central/rev/ec2e0fb43322
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24

Updated

4 years ago
Whiteboard: [WebRTC][blocking-webrtc-] → [WebRTC][blocking-webrtc-][qa-]
Feel free to nominate for Aurora uplift btw.

Updated

4 years ago
Whiteboard: [WebRTC][blocking-webrtc-][qa-] → [WebRTC][blocking-webrtc-][qa-][webrtc-uplift]
Comment on attachment 752832 [details] [diff] [review]
Part 1: Puts new PeerConnections on global list earlier, ensuring close is called should PC.initialize fail.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A

User impact if declined: Occasional leaks when PeerConnection initialization fails (generally would be due to misconfiguration)/

Testing completed (on m-c, etc.): on m-c for a while, verified

Risk to taking this patch (and alternatives if risky): very low; the patch  makes sure it's on the list of active connections before doing anything that can fail

String or IDL/UUID changes made by this patch: none
Attachment #752832 - Flags: approval-mozilla-aurora?
Just requesting uplift for part 1, as part two is just code cleanup

Updated

4 years ago
Attachment #752832 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/345c950f4ae3
status-firefox22: --- → affected
status-firefox23: --- → fixed
status-firefox24: --- → fixed

Updated

4 years ago
Whiteboard: [WebRTC][blocking-webrtc-][qa-][webrtc-uplift] → [WebRTC][blocking-webrtc-][qa-]
https://hg.mozilla.org/releases/mozilla-beta/rev/58d396dd75aa
status-firefox22: affected → fixed
You need to log in before you can comment on or make changes to this bug.