Closed Bug 827878 Opened 9 years ago Closed 9 years ago

crash in mozilla::TransportFlow::SendPacket


(Core :: WebRTC, defect, P1)




Tracking Status
firefox20 --- fixed
firefox21 --- fixed


(Reporter: florian, Assigned: jesup)



(Keywords: crash, Whiteboard: [webrtc][blocking-webrtc+])

Crash Data


(2 files)

This bug was filed from the Socorro interface and is 
report bp-02b12d82-5ad1-4791-98ae-495e52130108 .

We started seeing this crash in our WebRTC in SocialAPI demo with today's nightly (21.0a1 (2013-01-08)).

Steps to reproduce:
1. A calls B (audio+video call).
2. B accepts the call.
3. B closes the call. B usually crashes at this point.
4. The video of B is now frozen in A's floating chat window. If A closes that window, A often (but not always) crashes too.

Note: at step 3, if A closes the call, there seems to be no crash.
Likely a null mTransport....  Probably a race between async stuff shutting down the connection and outgoing packets.  We probably need to make sure any shutdown packets have flushed, and/or hold a strong ref.  See also ::Destroy()
Assignee: nobody → rjesup
Priority: -- → P1
Whiteboard: [webrtc][blocking-webrtc+]
Comment on attachment 700200 [details] [diff] [review]
don't release TransportFlow while DataChannel runnables may still be in use

the SendPacket runnable holds DataChannelConnection, but doesn't stop it from being Destroyed and the transport ref dropped.  This will hold the transport (but not stop DataChannelConnection from disconnecting).
Attachment #700200 - Flags: review?(ekr)
OS: Mac OS X → All
Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc+][webrtc-uplift]
Comment on attachment 700200 [details] [diff] [review]
don't release TransportFlow while DataChannel runnables may still be in use

Review of attachment 700200 [details] [diff] [review]:

Attachment #700200 - Flags: review?(ekr) → review+
Blocks: 829899
I managed to reproduce a variant of this due to the fix just landed being incomplete - releasing the TransportFlow (on MainThread) can also cause NSS to send packets from PR_Close(), which is a no-no off of the STS thread.  The comment in my patch landed for bug 805521 stated that releasing on MainThread was ok per ekr; it turns out that's wrong.  We should check other TransportFlow releases to verify no others do this.

Patch to be attached; leave-open
Depends on: 805251
Whiteboard: [webrtc][blocking-webrtc+][webrtc-uplift] → [webrtc][blocking-webrtc+][webrtc-uplift][leave-open]
Attachment #701443 - Flags: review?(ekr)
Depends on: 830100
Comment on attachment 701443 [details] [diff] [review]
TransportFlows must be released on STS thread

Review of attachment 701443 [details] [diff] [review]:

I suggested a simpler approach below. The r+ holds whether you do it my way or with the class
as you do now.

Please annotate this bug so we can back it out when bug 830100 is resolved.

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +165,5 @@
> +  if (mTransportFlow && !IsSTSThread()) {
> +    MOZ_ASSERT(mSTS);
> +    nsCOMPtr<nsIRunnable> release_transport = new TransportRelease(mTransportFlow.forget());
> +    mSTS->Dispatch(release_transport, NS_DISPATCH_NORMAL);
> +  }

You can certainly do it this way with a class, but it would be simpler to do:

static void ReleaseTransportFlow(nsRefPtr<TransportFlow> tflow) {}

and then...

RUN_ON_THREAD(mSTS, WrapRunnableNM(ReleaseTransportFlow, mTransportFlow), NS_DISPATCH_NORMAL);

::: netwerk/sctp/datachannel/DataChannel.h
@@ +185,5 @@
> +      {
> +        MOZ_ASSERT(!NS_IsMainThread());
> +        return NS_OK;
> +      }
> +    nsRefPtr<TransportFlow> mFlow;

Is there any reason for mFlow to be public?
Attachment #701443 - Flags: review?(ekr) → review+
Whiteboard: [webrtc][blocking-webrtc+][webrtc-uplift][leave-open] → [webrtc][blocking-webrtc+][webrtc-uplift]
Closed: 9 years ago
Resolution: --- → FIXED
Keywords: verifyme
Comment on attachment 701443 [details] [diff] [review]
TransportFlows must be released on STS thread

[Approval Request Comment]
Bug caused by (feature/regressing bug #): When DataChannels landed, though a fix for cleaning up DataChannels that landed shortly before uplift a week ago made it much easier to actually hit it (or more likely).  The first fix moved it back the original "hard to hit" category; the second patch (this one) closed the hole.

This request is to uplift both fixes in this bug.

User impact if declined: Crashes when trying DataChannels (SociaAPI demo) with FF20.

Testing completed (on m-c, etc.): On m-c for a few days (longer for first fix); local testing (hundred of runs and closes or reloads of DataChannel tests)

Risk to taking this patch (and alternatives if risky): Very low (very simple fixes).

String or UUID changes made by this patch: none
Attachment #701443 - Flags: approval-mozilla-aurora?
Attachment #701443 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: in-testsuite?
Duplicate of this bug: 828313
Verified by testing and closing calls during the call's execution to ensure no crash is happening.
Keywords: verifyme
Fixed in Fx20+, clearing webrtc-uplift flag.
Whiteboard: [webrtc][blocking-webrtc+][webrtc-uplift] → [webrtc][blocking-webrtc+]
You need to log in before you can comment on or make changes to this bug.