Closed Bug 831427 Opened 8 years ago Closed 8 years ago

Race condition Destroy()ing MediaStreams - marked as destroyed before DestroyImpl() runs

Categories

(Core :: Audio/Video, defect)

17 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: jesup, Assigned: jesup)

Details

(Keywords: assertion, Whiteboard: [getUserMedia][blocking-gum+][qa-])

Attachments

(1 file)

There's a race condition between Destroy() and anything using a MediaStream -- Destroy sets mMainThreadDestroyed (on MainThread, like it says) and sends a messages to the graph.  AddMessage checks IsDestroyed (really mMainThreadDestroyed) and asserts if it's set, but DestroyImpl() may not have run yet (and may not have removed any listeners).  So in certain unusual but not *too* hard to hit cases, stopping mediastreams can lead to Destroy being called, and before it's processed the MediaStreamListener in gUM (aka GUMCMSL object) tries to RemoveListener(), causing an assertion.

This points out that the fix in bug 827007 is incomplete - it removes the listeners, but leaves a small timing hole in the assertion logic.

Really, there are 3ish states for MediaStreams, not two:
1) Alive
2) Being Destroyed
3) Destroyed

In Being Destroyed, many or most API requests should be no-ops (and in particular RemoveListener) and not assertions.  Once we get to Destroyed(), all listeners should be removed and have been notified of that and should not call us again.  (They should do something like GUMCMSL does, and have a Mutex-protected indicator of "has this listener been removed" - and MediaStreamGraph should not move the MediaStream to Destroyed until all said listeners are removed and notified.)
The assertion is 
!aMessage->GetStream() || !aMessage->GetStream()->IsDestroyed()
in MediaStreamGraph.cpp:1723 (in my pull)
(In reply to Randell Jesup [:jesup] from comment #0)
> Really, there are 3ish states for MediaStreams, not two:
> 1) Alive
> 2) Being Destroyed

which means, "destroyed on the main thread but not yet destroyed on the media graph thread".

> 3) Destroyed
> 
> In Being Destroyed, many or most API requests should be no-ops (and in
> particular RemoveListener) and not assertions.

I don't think that's true in general. Once main-thread code has called Destroy(), main-thread code should not touch the stream again.

In this case, where you have a message from off the main thread to the main thread that might race with a main-thread Destroy(), I think the best fix is to simply call IsDestroyed() in the message's Run().
Hmmm, ok.  Perhaps the solution is to call IsDestroyed() in GUMCMSL::Remove() before RemoveListener() and any other possibly-racing MediaStream methods.  (I automatically think of RemoveListener() as threadsafe and forget we're on MainThread.)
Assignee: roc → rjesup
Comment on attachment 703118 [details] [diff] [review]
Gate RemoveListener(stream) to avoid calling if Destroy() pending

Simple fix I believe.  It would help to document which calls on SourceMediaStreams/MediaStreams/etc are callable from what threads:

  SourceMediaStream says virtually all are any thread
  MediaStream doesn't really say (though many methods in the .h file have assertions to mainthread, and probably most of the others have assertions in the .cpp)
  ProcessedMediaStream has some that are MSG thread only, and the others by implication are any thread, but it doesn't say

We should also document if they're protected against being called after Destroy (which some are, like AppendToTrack, and others aren't like RemoveListener).

It's an open question if the IsDestroyed() is better here or in RemoveListener() - some users will know it's not needed, but others (most?) will need it (and might not realize, since it will only trigger the assertion in a debug build, and then only rarely).  But for now, this is clearly a way to get rid of the assertion, and it's no big deal either way if people know to test it.
Attachment #703118 - Flags: review?(roc)
(In reply to Randell Jesup [:jesup] from comment #5)
>   SourceMediaStream says virtually all are any thread
>   MediaStream doesn't really say (though many methods in the .h file have
> assertions to mainthread, and probably most of the others have assertions in
> the .cpp)

Everything marked "// control API" is main-thread-only. That is documented high up in MediaStreamGraph.h. That "control API" annotation applies to every function below it before the next empty line. Everything below "// media graph thread only" and before the next empty line is media graph thread only. However, there shouldn't be a blank line after RemoveAllListenersImpl, I should fix that up. I guess there should be an explicit comments "start of control API" "end of control API".

ProcessedMediaStream is similar.

> We should also document if they're protected against being called after
> Destroy (which some are, like AppendToTrack, and others aren't like
> RemoveListener).

Basically any external API that's called off the main thread (like most of SourceMediaStream) should protect against destruction, but any main thread API should not --- since destruction is controlled on the main thread.

> It's an open question if the IsDestroyed() is better here or in
> RemoveListener() - some users will know it's not needed, but others (most?)
> will need it (and might not realize, since it will only trigger the
> assertion in a debug build, and then only rarely).  But for now, this is
> clearly a way to get rid of the assertion, and it's no big deal either way
> if people know to test it.

We should have tests that run in debug builds.

What bothers me about making methods check IsDestroyed internally is that in general, messing with a destroyed object might indicate a problem that is not resolved just by dropping those method calls on the floor.
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e6d6383957f

We'll want to uplift this to Aurora
Whiteboard: [getUserMedia][blocking-gum+] → [getUserMedia][blocking-gum+][webrtc-uplift]
Target Milestone: --- → mozilla21
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #6)
> (In reply to Randell Jesup [:jesup] from comment #5)
> >   SourceMediaStream says virtually all are any thread
> >   MediaStream doesn't really say (though many methods in the .h file have
> > assertions to mainthread, and probably most of the others have assertions in
> > the .cpp)
> 
> Everything marked "// control API" is main-thread-only. That is documented
> high up in MediaStreamGraph.h. That "control API" annotation applies to
> every function below it before the next empty line. Everything below "//
> media graph thread only" and before the next empty line is media graph
> thread only. However, there shouldn't be a blank line after
> RemoveAllListenersImpl, I should fix that up. I guess there should be an
> explicit comments "start of control API" "end of control API".

Aha.  I stopped going up at MediaStream, which was well after the block with the general description (I'd forgotten that was an invariant of the Control APIs, and "// control API" didn't trigger me to go check what that was; it just seemed like a comment.  Maybe adding an "// end control API" and "// call only from main thread (see above)", and maybe "// Check IsDestroyed() before calling if that is possible" (and/or add discussion of that to the top block; I searched for Destroy before posting my comments and didn't get pulled to the top block).

Or you could put the knowledge of the API type in people's face, and prefix/suffix the method names, but that'd be a pain to introduce now.

> ProcessedMediaStream is similar.
> 
> > We should also document if they're protected against being called after
> > Destroy (which some are, like AppendToTrack, and others aren't like
> > RemoveListener).
> 
> Basically any external API that's called off the main thread (like most of
> SourceMediaStream) should protect against destruction, but any main thread
> API should not --- since destruction is controlled on the main thread.

Makes sense now.

> 
> > It's an open question if the IsDestroyed() is better here or in
> > RemoveListener() - some users will know it's not needed, but others (most?)
> > will need it (and might not realize, since it will only trigger the
> > assertion in a debug build, and then only rarely).  But for now, this is
> > clearly a way to get rid of the assertion, and it's no big deal either way
> > if people know to test it.
> 
> We should have tests that run in debug builds.

Well, we do - it does NS_ASSERTION - the problem is that NS_ASSERTIONS (especially rare ones) are easy to miss, and even if seen may be hard to reproduce (not everyone runs with assertions-as-crashes, though perhaps that should be the default!).  And this one was quite rare, I only dug in to find it because I got lucky and noticed it once (I haven't been running that way!)

> What bothers me about making methods check IsDestroyed internally is that in
> general, messing with a destroyed object might indicate a problem that is
> not resolved just by dropping those method calls on the floor.

Well, since the assertions in opt builds don't exist, they'll let the calls through and queue control changes, which seems at least as risky unless we believe we've found *all* things that trigger the assertions before the code gets into the tree (or believe it's safe to queue them for destroyed streams, which you imply it's not).

I could (and did) test this 100 times without hitting it; some failures might happen only under load (we see that on a bunch of mochitests when run on the servers), etc.  On the other hand, a runtime check may not worth the memory access and jump-not-taken if they're called a lot (though I don't think they are, as main-thread-only calls).
I agree we should drop the calls on the floor in addition to asserting, as a way to mitigate the impact of bugs.
https://hg.mozilla.org/mozilla-central/rev/9e6d6383957f
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [getUserMedia][blocking-gum+][webrtc-uplift] → [getUserMedia][blocking-gum+][webrtc-uplift][qa-]
We've been asked if this needs nominated for Beta/20 (it's on Aurora/21 and nightly/22).

My analysis: there's no downside in an opt build for this bug.  MediaManager holds a ref to the MediaStream here, so it will not disappear.  In a debug build, an assertion will fire.  In an opt build (and debug build), it will insert the RemoveListener control message, and then execute it on the MSG thread.  This will remove any listeners - but all listeners were removed already on Destroy(), and in any case removing any Listeners should be a safe operation.

Roc, derf, do you agree?  (with regards to beta)
I'm also ok with this not being uplifted to Beta.  If anyone feels differently, please say so here (in this bug) in the next day or two;  otherwise we'll consider this bug completely resolved.  Thanks.
(In reply to Randell Jesup [:jesup] from comment #11)
> Roc, derf, do you agree?  (with regards to beta)

Yes.
Whiteboard: [getUserMedia][blocking-gum+][webrtc-uplift][qa-] → [getUserMedia][blocking-gum+][qa-]
You need to log in before you can comment on or make changes to this bug.