Closed Bug 1164463 Opened 5 years ago Closed 5 years ago

MediaManager shutdown is non-deterministic and doesn't always finish

Categories

(Core :: WebRTC: Audio/Video, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

Details

Attachments

(4 files, 3 obsolete files)

Attached patch WIP patch to cleanup shutdown (obsolete) — Splinter Review
We can fail to run the lambda (at least in e10s content shutdown), and leave MediaEngines running because nsDOMUserMediaStreams can be waiting for GC/CC and holding refs to Audio/VideoSources.  (and some other issues).
See Also: → 1145883
There's still a hole in that it can fail to fully shut down the MediaManager thread (when DispatchToMainthread fails) depending on lots of factors.  However, this is clearly better in that regard and greatly reduces leaking of threads/etc and getting MediaEngine actually shut down, which should remove some intermittent shutdown crashes and perhaps some of the hangs
Attachment #8606831 - Flags: review?(jib)
Attachment #8605262 - Attachment is obsolete: true
Comment on attachment 8606831 [details] [diff] [review]
clean up MediaManager shutdown to be reliable and avoid holding locks while Joining a thread

Review of attachment 8606831 [details] [diff] [review]:
-----------------------------------------------------------------

r- for potential GetBackend() trouble on shutdown now.

::: dom/media/MediaManager.cpp
@@ +2237,4 @@
>          MOZ_ASSERT(MediaManager::IsInMediaThread());
>          mozilla::ipc::BackgroundChild::CloseForCurrentThread();
> +        if (NS_FAILED(NS_DispatchToMainThread(mReply))) {
> +          LOG(("Will leak thread: DispatchToMainthread of lambda failed in MediaManager shutdown"));

s/lambda/reply runnable/ ?

@@ +2251,5 @@
>      // cleared until the lambda function clears it.
>  
>      MediaManager::GetMessageLoop()->PostTask(FROM_HERE, new ShutdownTask(
>          media::CallbackRunnable::New([this]() mutable {
> +      LOG(("MediaManager shutdown lambda running"));

Did you mean to leave this one in?

@@ +2262,4 @@
>        LOG(("Releasing MediaManager singleton and thread"));
> +      {
> +        MutexAutoLock lock(mMutex);
> +        backend = mBackend.forget();

I think this introduces a new potential problem: GetBackend() [1] never returns nullptr, and is written to create a new mBackend if mBackend == nullptr. This could happen now on shutdown once mMutex is released.

I'm not sure what would happen, but creating new objects on the way down seems like asking for trouble.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/media/MediaManager.cpp?rev=e19e46655f8c&mark=2075-2081#2068

@@ +2267,5 @@
>        if (mMediaThread) {
>          mMediaThread->Stop();
>        }
> +      // note that this == sSingleton (or better!)
> +      nsRefPtr<MediaManager> kungFuDeathGrip(sSingleton);

We want this as early as possible to ensure we release last. In fact, why not make it an argument to the lambda?

    nsRefPtr<MediaManager> that(this);
    MediaManager::GetMessageLoop()->PostTask(FROM_HERE, new ShutdownTask(
        media::CallbackRunnable::New([this, that]() mutable {

Variable capture is by-copy by default, so this will copy the 'that' nsRefPtr safely and keep it alive until the function has ended.

(Since 'this' gets special capture treatment, keeping it lets us refer to mActiveCallbacks etc. directly without writing that->mActiveCallbacks.)

::: dom/media/webrtc/MediaEngineWebRTC.cpp
@@ +279,5 @@
>    if (mHasTabVideoSource || dom::MediaSourceEnum::Browser == aMediaSource)
>      aVSources->AppendElement(new MediaEngineTabVideoSource());
>  
> +  ptrViECapture = nullptr;
> +  ptrViEBase = nullptr;

Why do these need to be cleared here now? And why not mViERender?

::: dom/media/webrtc/MediaEngineWebRTC.h
@@ +255,2 @@
>    virtual void EnumerateAudioDevices(dom::MediaSourceEnum,
> +                                    nsTArray<nsRefPtr<MediaEngineAudioSource> >*) override;

s/> >/>>/ since you're touching this. Also, whitespace alignment seems off by one.
Attachment #8606831 - Flags: review?(jib) → review-
(In reply to Jan-Ivar Bruaroey [:jib] from comment #2)
> > +  ptrViECapture = nullptr;
> > +  ptrViEBase = nullptr;
> 
> Why do these need to be cleared here now? And why not mViERender?

Strike the last part about mViERender, as I got confused.

But about ptrViECapture and ptrViEBase, there are a lot of early returns in this long function, and I'm having trouble following why these are cleared here and not on those failure returns.
the ptrVie* null's were left over from when it seemed like they weren't getting done for some reason (actually, there was a reference to a source lingering in GC/CC).  Generally cleaned up some more, and more hoisted out of the ShutdownTask.  Green Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=422b6c5f073d
Attachment #8607724 - Flags: review?(jib)
Comment on attachment 8607724 [details] [diff] [review]
interdiffs from review

Review of attachment 8607724 [details] [diff] [review]:
-----------------------------------------------------------------

I think we need another round on mBackend, but I think we're close.

::: dom/media/MediaManager.cpp
@@ +2026,5 @@
>    // includes picture support for Android.
>    // This IS called off main-thread.
>    MutexAutoLock lock(mMutex);
>    if (!mBackend) {
> +    MOZ_ASSERT(sSingleton);  // we should never create a new backend in shutdown

What does this patch do to prevent this?

@@ +2223,5 @@
> +    mCallIds.Clear();
> +    {
> +      MutexAutoLock lock(mMutex);
> +      if (mBackend) {
> +        mBackend->Shutdown(); // ok to invoke multiple times

Hmm, shouldn't this be called on media thread?

@@ +2245,5 @@
>          LOG(("MediaManager Thread Shutdown"));
>          MOZ_ASSERT(MediaManager::IsInMediaThread());
>          mozilla::ipc::BackgroundChild::CloseForCurrentThread();
> +        // must explicitly do this before dispatching the reply, since the reply may kill us with Stop()
> +        mBackend = nullptr; // last reference, will invoke Shutdown() again

Now that we have this ShutdownTask, why not clear sSingleton.mBackend here directly instead of passing it from main? If we did that then we could get rid of mMutex, since it looks like GetBackend() is only ever called on the media thread.

Even with this, I think we need an mInShutdown flag, otherwise an already-queued GetUserMediaTask or GetUserMediaDevicesTask task might still conceivably manage to call GetBackend() again before we're totally stopped, hitting the new MOZ_ASSERT.

@@ +2268,5 @@
> +    // Release the backend (and call Shutdown()) from within the MediaManager thread
> +    RefPtr<MediaEngine> temp;
> +    {
> +      MutexAutoLock lock(mMutex);
> +      temp = mBackend.forget();

This still nulls out sSingleton.mBackend on the main thread, leaving a window on shutdown where gUM calls may call GetBackend() and start things up again, at least in theory.
Attachment #8607724 - Flags: review?(jib) → review-
Comment on attachment 8606831 [details] [diff] [review]
clean up MediaManager shutdown to be reliable and avoid holding locks while Joining a thread

Review of attachment 8606831 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webrtc/MediaEngineDefault.h
@@ +168,5 @@
> +  virtual void Shutdown() {
> +    MutexAutoLock lock(mMutex);
> +
> +    mVSources.Clear();
> +    mASources.Clear();

The Shutdown() call from main made me go look at this again. Why exactly do mVSources and mASources need to be cleared now? Is it to appease some try-run?

What concerns me is this only touches MediaEngineDefault, which I believe is the "fake" engine used by tests in the tree, yet the other "real" engines have no similar fixes, yet run in the real world and *not* in the tree. Are we sure they don't suffer similar problems?
> ::: dom/media/MediaManager.cpp
> @@ +2026,5 @@
> >    // includes picture support for Android.
> >    // This IS called off main-thread.
> >    MutexAutoLock lock(mMutex);
> >    if (!mBackend) {
> > +    MOZ_ASSERT(sSingleton);  // we should never create a new backend in shutdown
> 
> What does this patch do to prevent this?

Switched to MOZ_RELEASE_ASSERT(!sInShutdown).
 
> @@ +2223,5 @@
> > +    mCallIds.Clear();
> > +    {
> > +      MutexAutoLock lock(mMutex);
> > +      if (mBackend) {
> > +        mBackend->Shutdown(); // ok to invoke multiple times
> 
> Hmm, shouldn't this be called on media thread?

This is ok (and how it used to be done too); Shutdown() grabs the internal lock.  Doing this early helps.

> @@ +2245,5 @@
> >          LOG(("MediaManager Thread Shutdown"));
> >          MOZ_ASSERT(MediaManager::IsInMediaThread());
> >          mozilla::ipc::BackgroundChild::CloseForCurrentThread();
> > +        // must explicitly do this before dispatching the reply, since the reply may kill us with Stop()
> > +        mBackend = nullptr; // last reference, will invoke Shutdown() again
> 
> Now that we have this ShutdownTask, why not clear sSingleton.mBackend here
> directly instead of passing it from main? If we did that then we could get
> rid of mMutex, since it looks like GetBackend() is only ever called on the
> media thread.

See above, and removing the mutex plus moving Shutdown() to the MediaThread ShutdownTask runnable causes Oranges in debug builds (leaks).  Moving to nsIAsyncShutdown would resolve those; we can probably do this then.
 
Green try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4e12d837ee9 (before the mIsShutdown change)
(In reply to Jan-Ivar Bruaroey [:jib] from comment #6)
> Comment on attachment 8606831 [details] [diff] [review]
> clean up MediaManager shutdown to be reliable and avoid holding locks while
> Joining a thread
> 
> Review of attachment 8606831 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/webrtc/MediaEngineDefault.h
> @@ +168,5 @@
> > +  virtual void Shutdown() {
> > +    MutexAutoLock lock(mMutex);
> > +
> > +    mVSources.Clear();
> > +    mASources.Clear();
> 
> The Shutdown() call from main made me go look at this again. Why exactly do
> mVSources and mASources need to be cleared now? Is it to appease some
> try-run?

This hits the normal sources as well.  We need to have them drop all their API pointers to webrtc.org stuff *before* we shutdown the MediaEngines (among other things)

The real engines implement Shutdown(); see MediaEngineWebRTCVideoSource::Shutdown()
Attachment #8606831 - Attachment is obsolete: true
added sInShutdown checks to things that call PostTask
Attachment #8608919 - Flags: review?(jib)
Attachment #8608864 - Attachment is obsolete: true
Attachment #8608864 - Flags: review?(jib)
Comment on attachment 8608919 [details] [diff] [review]
clean up MediaManager shutdown to be reliable and avoid holding locks while Joining a thread

Review of attachment 8608919 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaManager.cpp
@@ +1787,5 @@
>        onFailure.forget(), windowID, listener, mPrefs);
>    }
> +  if (sInShutdown) {
> +    return task->Denied(NS_LITERAL_STRING("In shutdown"));
> +  }

LGTM, though I worry about maintaining all these, especially since they may be dead code. Some of them are quite far from the PostTask they're protecting.

If we're going to do this (which I'm regretting a little) then maybe we could move the check to MediaManager::GetMessageLoop() or equivalent?

E.g. MediaManager::PostTask(FROM_HERE, task.forget());

And have that be a no-op (free the runnable) in sInShutdown ?
Attachment #8608919 - Flags: review?(jib) → review+
Comment on attachment 8612687 [details] [diff] [review]
collapse sInShutdown check and GetMessageLoop()->PostTask(FROM_HERE, into a single ::PostTask()

Review of attachment 8612687 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm. Minor debug consideration.

::: dom/media/MediaManager.cpp
@@ +1011,5 @@
>      // Dispatch to the media thread to ask it to start the sources,
>      // because that can take a while.
>      // Pass ownership of trackunion to the MediaOperationTask
>      // to ensure it's kept alive until the MediaOperationTask runs (at least).
> +    MediaManager::PostTask(

We should try to pass in the FROM_HERE location tracker I think.

http://mxr.mozilla.org/mozilla-central/ident?i=FROM_HERE&tree=mozilla-central
Attachment #8612687 - Flags: review?(jib) → review+
Comment on attachment 8608919 [details] [diff] [review]
clean up MediaManager shutdown to be reliable and avoid holding locks while Joining a thread

>+++ b/dom/media/webrtc/MediaEngineDefault.h
> {
> public:
>   explicit MediaEngineDefault(bool aHasFakeTracks = false)
>     : mHasFakeTracks(aHasFakeTracks)
>     , mMutex("mozilla::MediaEngineDefault")
>   {}
[...]
>+  virtual void Shutdown() {
>+    MutexAutoLock lock(mMutex);
>+
>+    mVSources.Clear();
>+    mASources.Clear();
>+  };

This "Shutdown" method was missing an 'override' annotation, which causes clang 3.6 & newer to spam a "-Winconsistent-missing-override" build warning, which busts --enable-warnings-as-errors builds (with clang 3.6+).

I'll push a followup to add that keyword, with blanket r+ that ehsan granted me for fixes of this sort over in bug 1126447 comment 2.
Backed out for WinXP debug 868504.html permacrash. Hopefully the other patches in the push didn't depend on this...
https://treeherder.mozilla.org/logviewer.html#?job_id=10272415&repo=mozilla-inbound
(didn't end up acting on comment 16, due to tree-closure & subsequent backout. jesup, could you add an "override" annotation to MediaEngineDefault's Shutdown() method, in the version of the patch that ends up re-landing?)
https://hg.mozilla.org/mozilla-central/rev/7aec151a5d61
https://hg.mozilla.org/mozilla-central/rev/e592b2411157
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
See Also: → 1515873
You need to log in before you can comment on or make changes to this bug.