WebRTC crash [@mozilla::MediaStreamGraphImpl::AppendMessage::ControlMessage]

RESOLVED FIXED in Firefox 20

Status

()

defect
--
critical
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: posidron, Assigned: jesup)

Tracking

(Blocks 1 bug, {assertion, crash, intermittent-failure})

Trunk
mozilla20
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(firefox20 fixed, firefox21 fixed)

Details

(Whiteboard: [getUserMedia] [blocking-gum+][qa-], crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

Posted file callstack
I am getting this crash now very often with m-c changeset: 117822:5af4c6bd5104 while trying to test the SDP.

void
MediaStreamGraphImpl::AppendMessage(ControlMessage* aMessage)
{
  NS_ASSERTION(NS_IsMainThread(), "main thread only");
  NS_ASSERTION(!aMessage->GetStream() ||
               !aMessage->GetStream()->IsDestroyed(),
               "Stream already destroyed");

  if (mDetectedNotRunning &&
      mLifecycleState > LIFECYCLE_WAITING_FOR_MAIN_THREAD_CLEANUP) {

[...]
The fixes for bug 827007 and bug 826576 which just landed will at minimum help with that, and may solve it.  Please retest with those.

Marking as blocking-webrtc for now because of how this report was triggered, but I was seeing these assertions in gum as well before we landed those patches.

Taking bug.  If this isn't fixed, I may need to talk with roc after investigating.
Assignee: nobody → rjesup
Depends on: 822956
Whiteboard: [WebRTC] [blocking-webrtc+]
I saw it twice, but cannot reproduce it on command (and in fact now seem not to be able to in a dozen or two attempts):

Load the webrtc-landing datachannel test.
Start
wait...
Stop
open a second tab
got to about:memory
Scroll to the bottom, hit GC
switch back to tab 1
Hit Reload

Now that we have NotifyRemoved, I can add a boolean to avoid calling RemoveListener when we're already removed, but there still may be a window where it can occur.  However, this may just be wallpaper over a real problem
Crash Signature: [@ mozilla::MediaStreamGraphImpl::AppendMessage(mozilla::`anonymous namespace''::ControlMessage*)]
OS: Mac OS X → All
Hardware: x86_64 → All
This is in fact a webrtc bug and not GUM - I'm pretty certain the listeners in question were the ones in the conduits, not the GUM listeners. 

With the change we added in bug 827007, a probable sources of these problems can be more easily dealt with - the new NotifyRemoved() callback can tell the conduit to not call RemoveListener later (which is almost certainly happening in some cases today).
It was my understanding from discussion on IRC that RemoveListener could be safely called multiple times. It cannot?
It can.  But it cannot be safely called after the channel is destroyed, which is controlled from the DOM wrapper (and what's causing this assertion).  Also, after the channel is Destroyed the graph may shut down, which leads to crashes trying to send control messages to it (from ::RemoveListener()).

With the new change, we do RemoveAllListenersImpl() in DestroyImpl, and that causes NotifyRemoved() callbacks, so Listeners can know they're removed.
Hmm.... Looking through the code, the only place I see us calling RemoveListener() is through Local/RemoteSourceStreamInfo::Detach and both of these structures hold DOM ref pointers.
Why do you think this is us?
Blocks: 827452
also cleans up the handling of NotifyFinished a little
Attachment #698822 - Flags: review?(roc)
We'll want to nominate for uplift on this.
Whiteboard: [WebRTC] [blocking-webrtc+] → [getUserMedia] [blocking-gum+][webrtc-uplift]
Comment on attachment 698822 [details] [diff] [review]
add locks against calling RemoveListener on a Destroy()ed MediaStream

Review of attachment 698822 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaManager.h
@@ +141,5 @@
>    }
>  
> +  void NotifyFinished(MediaStreamGraph* aGraph);
> +
> +  void NotifyRemoved(MediaStreamGraph* aGraph);

add "virtual" and MOZ_OVERRIDE

@@ +152,5 @@
>    nsRefPtr<SourceMediaStream> mStream;
>    nsRefPtr<MediaInputPort> mPort;
>    TrackTicks mLastEndTimeAudio;
>    TrackTicks mLastEndTimeVideo;
> +  bool mFinished;

Need some comments for all these fields indicating how multithreaded access is controlled.
Attachment #698822 - Attachment is obsolete: true
Attachment #698822 - Flags: review?(roc)
Comment on attachment 698903 [details] [diff] [review]
add locks against calling RemoveListener on a Destroy()ed MediaStream

Also in the process of documenting, I realized your change to add nsDOMUserMediaStreams also removed the need for the mPort in GUMCMSL, so I removed it.
Attachment #698903 - Flags: review?(roc)
Comment on attachment 698903 [details] [diff] [review]
add locks against calling RemoveListener on a Destroy()ed MediaStream

Review of attachment 698903 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaManager.h
@@ +153,5 @@
> +
> +  // Set at Activate on MainThread
> +
> +  // Accessed from MediaStreamGraph thread, MediaManager thread, and MainThread
> +  // No locking needed as they're not operated on except on the MediaManager thread

I don't understand this comment, it seems to contradict itself.
Attachment #698903 - Flags: review?(roc) → review+
I'll clarify.  Those are only addrefed except on the MediaManager thread, no operations done on the other threads

Thanks
Duplicate of this bug: 827452
https://hg.mozilla.org/mozilla-central/rev/f63cab588d2f
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment on attachment 698903 [details] [diff] [review]
add locks against calling RemoveListener on a Destroy()ed MediaStream

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 822956 (or rather landing that exposed a latent bug)

User impact if declined:  Random GC-caused crashes after closing a getUserMedia-supplied MediaStream.  getUserMedia is now preffed on by default.  Not every assertion would cause a crash in an opt build, but some would.

Testing completed (on m-c, etc.): On m-c, plus local testing - code includes debugs that let me know if the formerly-crashing case has been avoided.  By hand using about:memory-triggered GC's it's not hard to hit the assertion. It also had been being hit as a random orange.

Risk to taking this patch (and alternatives if risky): Low risk - this adds a "gatekeeper" boolean to avoid calling RemoveListener if we've been notified the listener has already been removed.

String or UUID changes made by this patch: none
Attachment #698903 - Flags: approval-mozilla-aurora?
Whiteboard: [getUserMedia] [blocking-gum+][webrtc-uplift] → [getUserMedia] [blocking-gum+][webrtc-uplift][qa-]
Comment on attachment 698903 [details] [diff] [review]
add locks against calling RemoveListener on a Destroy()ed MediaStream

Low risk crash fix for a newly pref'd-on feature.
Attachment #698903 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
No need to land on Aurora; turns out it was uplifted with Aurora even though it landed on Monday 1/7.
Target Milestone: mozilla21 → mozilla20
Turns out I didn't read well enough; ryan uplifted.  Thanks.
Flags: in-testsuite-
Whiteboard: [getUserMedia] [blocking-gum+][webrtc-uplift][qa-] → [getUserMedia] [blocking-gum+][qa-]
You need to log in before you can comment on or make changes to this bug.