Closed Bug 1228788 Opened 4 years ago Closed 4 years ago

mozGetUserMedia — FATAL ERROR: AsyncShutdown timeout (Media shutdown: blocking on media thread)

Categories

(Core :: WebRTC: Audio/Video, defect, P2, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox44 --- affected
firefox45 --- fixed
Blocking Flags:

People

(Reporter: jruderman, Assigned: gcp)

References

(Blocks 1 open bug)

Details

(5 keywords)

Attachments

(3 files)

Attached file testcase
1. Create a fresh profile with prefs.js:

     user_pref("media.navigator.permission.disabled", true);
     user_pref("browser.tabs.remote.autostart", false);
     user_pref("browser.tabs.remote.autostart.1", false);
     user_pref("browser.tabs.remote.autostart.2", false);

2. Load the testcase.
3. ⌘Q as soon as the orange recording icon appears in the address bar.

Result:
  30% of the time, shutdown succeeds but everything leaks
  70% of the time, shutdown stalls and eventually aborts with the following message:

FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"Media shutdown: blocking on media thread","state":"(none)","filename":"dom/media/MediaManager.cpp","lineNumber":1600,"stack":"Media shutdown"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
Attached file outputs & stacks
The C++ stack mostly says it's calling JS, and the JS stack mostly says it's in onTimeout (AsyncShutdown.jsm:948).
I assume the component is wrong here?
Seems like it.
Assignee: mozilla → nobody
Component: AutoConfig (Mission Control Desktop) → WebRTC: Audio/Video
This may well be an issue with the Camera IPC channel getting nuked before it had a response.  We need mediamanager:4,camerasparent:4,cameraschild:4 logs.
backlog: --- → webrtc/webaudio+
Rank: 25
Priority: -- → P2
I've reproduced it. The content process is hung with main-thread waiting for ShutdownTask to come back from the MediaManager thread, but the MediaManager thread is in a MediaOperationTask stuck in DispatchToParent waiting for a reply that never comes:

MediaManager (34)
> #0	0x00007fff9406ef5e in __psynch_cvwait ()
> #1	0x00007fff9b0b673d in _pthread_cond_wait ()
> #2	0x00000001003f506e in PR_WaitCondVar at /Users/Jan/moz/mozilla-central/nsprpub/pr/src/pthreads/ptsynch.c:396
> #3	0x0000000100824107 in mozilla::CondVar::Wait(unsigned int) at /Users/Jan/moz/mozilla-central/xpcom/glue/BlockingResourceBase.cpp:501
> #4	0x000000010077ec72 in mozilla::Monitor::Wait(unsigned int) at /Users/Jan/moz/mozilla-central/obj-x86_64-apple-darwin12.2.1-debug/xpcom/io/../../dist/include/mozilla/Monitor.h:40
> #5	0x000000010077ebde in mozilla::MonitorAutoLock::Wait(unsigned int) at /Users/Jan/moz/mozilla-central/obj-x86_64-apple-darwin12.2.1-debug/xpcom/io/../../dist/include/mozilla/Monitor.h:88
> #6	0x0000000103e61732 in mozilla::camera::CamerasChild::DispatchToParent(nsIRunnable*, mozilla::MonitorAutoLock&) at /Users/Jan/moz/mozilla-central/dom/media/systemservices/CamerasChild.cpp:225
bool
CamerasChild::DispatchToParent(nsIRunnable* aRunnable,
                               MonitorAutoLock& aMonitor)
{
  {
    OffTheBooksMutexAutoLock lock(CamerasSingleton::Mutex());
    CamerasSingleton::Thread()->Dispatch(aRunnable, NS_DISPATCH_NORMAL);
  }
  // We can't see if the send worked, so we need to be able to bail
  // out on shutdown (when it failed and we won't get a reply).
  if (!mIPCIsAlive) {
    return false;
  }
  // Guard against spurious wakeups.
  mReceivedReply = false;
  // Wait for a reply
  do {
->  aMonitor.Wait();
  } while (!mReceivedReply && mIPCIsAlive);
  if (!mReplySuccess) {
    return false;
  }
  return true;
}
> #7	0x0000000103e62714 in mozilla::camera::CamerasChild::ReleaseCaptureDevice(mozilla::camera::CaptureEngine, int) at /Users/Jan/moz/mozilla-central/dom/media/systemservices/CamerasChild.cpp:476
  MonitorAutoLock monitor(mReplyMonitor);
->if (!DispatchToParent(runnable, monitor)) {
    return -1;
  }
> #8	0x0000000103e62621 in mozilla::camera::ReleaseCaptureDevice(mozilla::camera::CaptureEngine, int) at /Users/Jan/moz/mozilla-central/dom/media/systemservices/CamerasChild.cpp:459
> #9	0x0000000103f008ac in mozilla::MediaEngineRemoteVideoSource::Deallocate() at /Users/Jan/moz/mozilla-central/dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:148
> #10	0x0000000103d08843 in mozilla::MediaOperationTask::Run() at /Users/Jan/moz/mozilla-central/dom/media/MediaManager.cpp:331
      case MEDIA_STOP:
      case MEDIA_STOP_TRACK:
        {
          NS_ASSERTION(!NS_IsMainThread(), "Never call on main thread");
          if (mAudioDevice) {
            mAudioDevice->GetSource()->Stop(source, kAudioTrack);
            mAudioDevice->GetSource()->Deallocate();
          }
          if (mVideoDevice) {
            mVideoDevice->GetSource()->Stop(source, kVideoTrack);
->          mVideoDevice->GetSource()->Deallocate();
          }
> #11	0x0000000100e60fa0 in MessageLoop::RunTask(Task*) at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/message_loop.cc:364
> #12	0x0000000100e6151f in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/message_loop.cc:372
> #13	0x0000000100e61744 in MessageLoop::DoWork() at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/message_loop.cc:459
> #14	0x0000000100f37bae in mozilla::ipc::DoWorkRunnable::Run() at /Users/Jan/moz/mozilla-central/ipc/glue/MessagePump.cpp:220
> #15	0x00000001007c9acd in nsThread::ProcessNextEvent(bool, bool*) at /Users/Jan/moz/mozilla-central/xpcom/threads/nsThread.cpp:964
> #16	0x00000001008555a7 in NS_ProcessNextEvent(nsIThread*, bool) at /Users/Jan/moz/mozilla-central/xpcom/glue/nsThreadUtils.cpp:297
> #17	0x0000000100f38450 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) at /Users/Jan/moz/mozilla-central/ipc/glue/MessagePump.cpp:355
> #18	0x0000000100e60e85 in MessageLoop::RunInternal() at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/message_loop.cc:234
> #19	0x0000000100e60d95 in MessageLoop::RunHandler() at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/message_loop.cc:227
> #20	0x0000000100e60d3d in MessageLoop::Run() at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/message_loop.cc:201
> #21	0x0000000100e862f9 in base::Thread::ThreadMain() at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/thread.cc:172
> #22	0x0000000100e8753c in ThreadFunc(void*) at /Users/Jan/moz/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:39
> #23	0x00007fff9b0b59b1 in _pthread_body ()
> #24	0x00007fff9b0b592e in _pthread_start ()
> #25	0x00007fff9b0b3385 in thread_start ()

Oddly, mIPCIsAlive appears to be true, but it doesn't matter since the CamerasChild::mReplyMonitor never lets up.
Flags: needinfo?(gpascutto)
I can reproduce this. Oddly, from a quick read of the CamerasChild/CamerasParent logs it doesn't appear to be an actual Shutdown problem, as there's no sign of any Shutdown sequence. Investigating further.
Flags: needinfo?(gpascutto)
Assignee: nobody → gpascutto
Looking at the logic, the question whether to try to reply to the child or not is predicated on IsShuttingDown(), which (among others) checks mWebRTCIsAlive. But mWebRTCIsAlive says something about whether the WebRTC stack is alive, not the IPC channels. I'm adding logging to verify but I suspect the aforementioned might be the issue here.
Nope, it's not. The Mac capture code is plain hanging in its destructor.

Thread 48:: VideoCapture
0   libsystem_kernel.dylib        	0x00007fff8b4c5f5e __psynch_cvwait + 10
1   libsystem_pthread.dylib       	0x00007fff970af73d _pthread_cond_wait + 767
2   com.apple.Foundation          	0x00007fff971ad048 -[NSCondition wait] + 240
3   com.apple.Foundation          	0x00007fff971ac01a -[NSObject(NSThreadPerformAdditions) performSelector:onThread:withObject:waitUntilDone:modes:] + 935
4   com.apple.Foundation          	0x00007fff97261abc -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:modes:] + 87
5   com.apple.QTKit               	0x00007fff8a8483aa -[QTCaptureSession _rebuildGraph] + 45
6   com.apple.Foundation          	0x00007fff971be0cf NSKeyValueNotifyObserver + 379
7   com.apple.Foundation          	0x00007fff971bd93f NSKeyValueDidChange + 457
8   com.apple.Foundation          	0x00007fff9718a17c -[NSObject(NSKeyValueObserverNotification) didChangeValueForKey:] + 118
9   com.apple.Foundation          	0x00007fff971be0cf NSKeyValueNotifyObserver + 379
10  com.apple.Foundation          	0x00007fff971bd93f NSKeyValueDidChange + 457
11  com.apple.Foundation          	0x00007fff9718a17c -[NSObject(NSKeyValueObserverNotification) didChangeValueForKey:] + 118
12  XUL                           	0x0000000103ee6adc -[VideoCaptureMacQTKitObjC dealloc] + 124 (video_capture_qtkit_objc.mm:43)
13  XUL                           	0x0000000103ee5bad webrtc::videocapturemodule::VideoCaptureMacQTKit::~VideoCaptureMacQTKit() + 141 (video_capture_qtkit.mm:68)
14  XUL                           	0x0000000103ee781e webrtc::RefCountImpl<webrtc::videocapturemodule::VideoCaptureMacQTKit>::~RefCountImpl() + 14 (ref_count.h:44)
15  XUL                           	0x0000000103ee787c webrtc::RefCountImpl<webrtc::videocapturemodule::VideoCaptureMacQTKit>::Release() + 44 (ref_count.h:73)
16  XUL                           	0x0000000103e48134 webrtc::ViECapturer::~ViECapturer() + 180 (vie_capturer.cc:112)
17  XUL                           	0x0000000103e4849e webrtc::ViECapturer::~ViECapturer() + 14 (vie_capturer.cc:100)
18  XUL                           	0x0000000103e66bab webrtc::ViEInputManager::DestroyCaptureDevice(int) + 123 (vie_input_manager.cc:252)
19  XUL                           	0x0000000103e467ad webrtc::ViECaptureImpl::ReleaseCaptureDevice(int) + 189 (vie_capture_impl.cc:134)
20  XUL                           	0x00000001032bbb25 mozilla::camera::CamerasParent::RecvReleaseCaptureDevice(int const&, int const&)::$_16::operator()() const + 37 (CamerasParent.cpp:703)
21  XUL                           	0x00000001032bba9d mozilla::media::LambdaRunnable<mozilla::camera::CamerasParent::RecvReleaseCaptureDevice(int const&, int const&)::$_16>::Run() + 

I think this is a bug I ran into during development, but for which we removed the workaround in bug 1209987 because the issue no longer reproduced for me after the threading changes there.
The webrtc update does break the old workaround of doing that free on the main thread.

# Fatal error in /Users/morbo/hg/mozilla-central/media/webrtc/trunk/webrtc/system_wrappers/source/thread_posix.cc, line 107
# Check failed: thread_checker_.CalledOnValidThread()
Looks like we hit a similar issue before in bug 837539.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
> Oddly, from a quick read of the CamerasChild/CamerasParent logs it doesn't appear to be an actual Shutdown
> problem, as there's no sign of any Shutdown sequence.

Well, I closed the last non-about: content-tab, which as I understand it (just from empirical observation really), is effectively a "shutdown" in the content process, but not in the parent process?
Sure, but I mean that the bug isn't in the code trying to tear down IPC channels or with handling the state where one of the ends has already disappeared. It's much higher up, when we try to free up the device. It's locking up before MediaManager::mBackend::Shutdown() gets called.
FWIW, I originally reported this without e10s, and you might find it easier to debug in that configuration.
This seems to work for me. I tried variations of the fix for bug 837539 until something worked. Please verify if it fixes the hang for you.
Attachment #8694301 - Flags: review?(rjesup)
Attachment #8694301 - Flags: feedback?(jruderman)
Comment on attachment 8694301 [details] [diff] [review]
Force QT device release to happen on the main thread

Review of attachment 8694301 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, we've seen stuff like this before with Mac capture code...
Attachment #8694301 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/mozilla-central/rev/cd3ee5f0e8f6
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla45
I'm still seeing this crash using the following steps:
1. Launch Firefox [1] with a clean profile.
2. Click on Hello button and launch a conversation.
3. Update Firefox.

[1] 
* 44.0RC build 1 updated to the latest 44.0RC build 3
** bp-62f9324a-def4-437c-b4ad-6b9332160204
* 44.0RC build 3 updated to 45 beta 2 (on 'beta-cdntest' update channel)
** bp-e72ed27b-ec5c-4c6f-8b59-e1be62160204

This issue is also reproducible when applying updates from a Nightly build before this fix (E.g. 45.0a1 from 2015-12-01) to the latest 47.0a1 (bp-5a08939d-3e8d-4523-a69d-1ca5f2160204), but it *does not reproduce* with a Nightly build after this fix (E.g. 45.0a1 from 2015-12-05).

Affected platforms: Mac OS X 10.9.5 and Mac OS X 10.11.1

Terminal output when the crash occurs:
*************************
JavaScript error: resource://gre/modules/PerformanceStats.jsm, line 492: Error: forget() called twice
WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"Media shutdown: blocking on media thread","state":"(none)","filename":"/builds/slave/m-cen-m64-ntly-000000000000000/build/src/dom/media/MediaManager.cpp","lineNumber":1600,"stack":"Media shutdown"}] Barrier: profile-before-change
JavaScript error: chrome://loop/content/modules/MozLoopPushHandler.jsm, line 73: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIWebSocketChannel.asyncOpen]
FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"Media shutdown: blocking on media thread","state":"(none)","filename":"/builds/slave/m-cen-m64-ntly-000000000000000/build/src/dom/media/MediaManager.cpp","lineNumber":1600,"stack":"Media shutdown"}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
[Parent 3254] ###!!! ABORT: file /builds/slave/m-cen-m64-ntly-000000000000000/build/src/dom/media/MediaManager.cpp, line 1600

As far as I can see the above crashes are related with this bug report. Any ideas?
Flags: needinfo?(gpascutto)
1) The fix went in Firefox 45, not Firefox 44.
2) There's no reason why the fix here would depend on whether Nightly was updated or not.
3) There's 2 months of other changes since the fix was committed.

You might be seeing another bug, I might've not fixed it but just made it more intermittent, I have no idea.
Flags: needinfo?(gpascutto)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #20)
> 1) The fix went in Firefox 45, not Firefox 44.
Yes, indeed; the fix is not in 44 and that's why I was pointing to the fact that some users might hit this crash on Mac OS X when applying updates from 44 to 45, under the mentioned circumstances. 

> 2) There's no reason why the fix here would depend on whether Nightly was
> updated or not.
Are you referring to the fact that applying an update to nightly without the fix crashes and no crash with a nightly with the fix is not related with this report?

> 3) There's 2 months of other changes since the fix was committed.
> You might be seeing another bug, I might've not fixed it but just made it
> more intermittent, I have no idea.
Logged bug 1245863. Thanks for your input!
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #21)
> Are you referring to the fact that applying an update to nightly without the
> fix crashes and no crash with a nightly with the fix is not related with
> this report?

Wait, I think I may have completely misunderstood what you said wrt. updating.

Are you saying that:

a) Firefox versions before the fix hang when you do <some action related to shutdown like updating>.
b) Firefox versions after the fix do not hang.

If so, that's 100% the expected behavior. There was a shutdown hang and this bug fixed it.
Duplicate of this bug: 1245863
Gian-Carlo -- When can this bug happen? Can it only happen in shutdown when the video camera is already running/capturing?  If that's correct, it seems that the likelihood of a Release user hitting this bug is fairly low.  (Just trying to get a feel for how Fx44 users are possibly impacted by this bug.)  Thanks.
Flags: needinfo?(gpascutto)
(In reply to Maire Reavy [:mreavy] (Plz ni?:mreavy) from comment #24)
> Gian-Carlo -- When can this bug happen? Can it only happen in shutdown when
> the video camera is already running/capturing?  If that's correct, it seems
> that the likelihood of a Release user hitting this bug is fairly low.  (Just
> trying to get a feel for how Fx44 users are possibly impacted by this bug.) 
> Thanks.

That's correct (and only on Mac OS X).
Flags: needinfo?(gpascutto)
You need to log in before you can comment on or make changes to this bug.