Closed Bug 902775 Opened 11 years ago Closed 9 years ago

Shutdown hang/deadlock, shutting down MediaStreamGraph thread, waiting in BufferedAudioStream::Write()

Categories

(Core :: Audio/Video, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: karlt, Unassigned)

References

Details

This is happening locally when the browser tries to shutdown after webaudio mochitests.  This is the same hang on shutdown after mochitest timeouts in bug 865642 comment 147, and likely bug 870065.  Locally, it happens even when no mochitests timeout.

The main thread is trying to shutdown the MSG thread.

#0  pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f8c6b2c2eca in PR_WaitCondVar (cvar=0xea7530, timeout=4294967295)
    at /home/karl/moz/dev/nsprpub/pr/src/pthreads/ptsynch.c:385
#2  0x00007f8c6b2c36d7 in PR_Wait (mon=0xea7b70, timeout=4294967295)
    at /home/karl/moz/dev/nsprpub/pr/src/pthreads/ptsynch.c:582
#3  0x00007f8c65b1521a in mozilla::ReentrantMonitor::Wait (this=0xea79b8, 
    interval=4294967295)
    at /mnt/sda11/karl/obj/xpcom/build/BlockingResourceBase.cpp:309
#4  0x00007f8c63c899f9 in mozilla::ReentrantMonitorAutoEnter::Wait (
    this=0x7fff39367650, interval=4294967295)
    at ../../dist/include/mozilla/ReentrantMonitor.h:192
#5  0x00007f8c65b99b0c in nsEventQueue::GetEvent (this=0xea79b8, 
    mayWait=true, result=0x7fff393676d0)
    at /home/karl/moz/dev/xpcom/threads/nsEventQueue.cpp:58
#6  0x00007f8c65b9d063 in nsThread::ProcessNextEvent (this=0xea7950, 
    mayWait=true, result=0x7fff393677cf)
    at /home/karl/moz/dev/xpcom/threads/nsThread.cpp:614
#7  0x00007f8c65b249bb in NS_ProcessNextEvent (thread=0xea7950, mayWait=true)
    at /mnt/sda11/karl/obj/xpcom/build/nsThreadUtils.cpp:238
#8  0x00007f8c65b9c9c3 in nsThread::Shutdown (this=0x384c680)
    at /home/karl/moz/dev/xpcom/threads/nsThread.cpp:463
#9  0x00007f8c65b9f6b1 in nsThreadManager::Shutdown (
    this=0x7f8c6932d5a0 <nsThreadManager::get()::sInstance>)
    at /home/karl/moz/dev/xpcom/threads/nsThreadManager.cpp:124
#10 0x00007f8c65b2f021 in mozilla::ShutdownXPCOM (servMgr=0xebb408)
    at /home/karl/moz/dev/xpcom/build/nsXPComInit.cpp:680
#11 0x00007f8c65b2ed3b in NS_ShutdownXPCOM (servMgr=0xebb408)
    at /home/karl/moz/dev/xpcom/build/nsXPComInit.cpp:615
#12 0x00007f8c6387bdc6 in ScopedXPCOMStartup::~ScopedXPCOMStartup (
    this=0xea61e0, __in_chrg=<optimized out>)
    at /home/karl/moz/dev/toolkit/xre/nsAppRunner.cpp:1130

(gdb) p mThread->tid
$7 = 28417
(gdb) info threads
  Id   Target Id         Frame 
  17   Thread 0x7f8c4f393700 (LWP 28301) "gdbus" 0x00007f8c69ff28f0 in *__GI___poll () from /lib64/libc.so.6
  16   Thread 0x7f8c4b392700 (LWP 28302) "Gecko_IOThread" syscall ()
    at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
  15   Thread 0x7f8c33fff700 (LWP 28305) "JS GC Helper" pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  14   Thread 0x7f8c27fff700 (LWP 28306) "JS Sour~ Thread" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  13   Thread 0x7f8c1bfff700 (LWP 28307) "JS Watchdog" pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  12   Thread 0x7f8c0ffff700 (LWP 28310) "Hang Monitor" pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  11   Thread 0x7f8b5bfff700 (LWP 28336) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  10   Thread 0x7f8b57ffe700 (LWP 28337) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  9    Thread 0x7f8b53ffd700 (LWP 28338) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  8    Thread 0x7f8b4fffc700 (LWP 28339) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  7    Thread 0x7f8b37fff700 (LWP 28340) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  6    Thread 0x7f8b33ffe700 (LWP 28341) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  5    Thread 0x7f8b27fff700 (LWP 28342) "Analysis Helper" pthread_cond_wait
    () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  4    Thread 0x7f8c44f4e700 (LWP 28344) "firefox" 0x00007f8c69ff28f0 in *__GI___poll () from /lib64/libc.so.6
  3    Thread 0x7f8afbfff700 (LWP 28370) "MediaManager" pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
  2    Thread 0x7f8b97ffe700 (LWP 28417) "firefox" pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
* 1    Thread 0x7f8c6b2e5740 (LWP 28291) "firefox" pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162

The MSG thread is waiting at
http://hg.mozilla.org/mozilla-central/annotate/79b5c74ef97b/content/media/AudioStream.cpp#l625

#0  pthread_cond_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x00007f8c6b2c2eca in PR_WaitCondVar (cvar=0x7f8b8d6accb0, 
    timeout=4294967295)
    at /home/karl/moz/dev/nsprpub/pr/src/pthreads/ptsynch.c:385
#2  0x00007f8c65b15300 in mozilla::CondVar::Wait (this=0x7f8b8c0b8678, 
    interval=4294967295)
    at /mnt/sda11/karl/obj/xpcom/build/BlockingResourceBase.cpp:336
#3  0x00007f8c6459596a in mozilla::Monitor::Wait (this=0x7f8b8c0b8660, 
    interval=4294967295) at ../../dist/include/mozilla/Monitor.h:47
#4  0x00007f8c64595a11 in mozilla::MonitorAutoLock::Wait (
    this=0x7f8b97ffc4c0, interval=4294967295)
    at ../../dist/include/mozilla/Monitor.h:102
#5  0x00007f8c645971ff in mozilla::BufferedAudioStream::Write (
    this=0x7f8b8c0b85e0, aBuf=0x7f8b97ffc530, aFrames=512)
    at /home/karl/moz/dev/content/media/AudioStream.cpp:625
#6  0x00007f8c6459547d in mozilla::AudioSegment::WriteTo (
    this=0x7f8b97ffda00, aOutput=0x7f8b8c0b85e0)
    at /home/karl/moz/dev/content/media/AudioSegment.cpp:185
#7  0x00007f8c645db3fe in mozilla::MediaStreamGraphImpl::PlayAudio (
    this=0x36b14c0, aStream=0x4353560, aFrom=37086015, aTo=37097199)
    at /home/karl/moz/dev/content/media/MediaStreamGraph.cpp:818
#8  0x00007f8c645dc5f3 in mozilla::MediaStreamGraphImpl::RunThread (
    this=0x36b14c0)
    at /home/karl/moz/dev/content/media/MediaStreamGraph.cpp:1140
#9  0x00007f8c645dcdd5 in mozilla::(anonymous namespace)::MediaStreamGraphInitThreadRunnable::Run (this=0x12dc4f0)
    at /home/karl/moz/dev/content/media/MediaStreamGraph.cpp:1276
#10 0x00007f8c65b9d11e in nsThread::ProcessNextEvent (this=0x384c680, 
    mayWait=true, result=0x7f8b97ffddcf)
    at /home/karl/moz/dev/xpcom/threads/nsThread.cpp:622
#11 0x00007f8c65b249bb in NS_ProcessNextEvent (thread=0x384c680, mayWait=true)
    at /mnt/sda11/karl/obj/xpcom/build/nsThreadUtils.cpp:238
#12 0x00007f8c65b9bf02 in nsThread::ThreadFunc (arg=0x384c680)
    at /home/karl/moz/dev/xpcom/threads/nsThread.cpp:250
#13 0x00007f8c6b2ca4cd in _pt_root (arg=0x3568b90)
    at /home/karl/moz/dev/nsprpub/pr/src/pthreads/ptthread.c:204
#14 0x00007f8c6aee3e14 in start_thread () from /lib64/libpthread.so.0

The alsa_run_thread is in poll:

#0  0x00007f8c69ff28f0 in *__GI___poll () from /lib64/libc.so.6
#1  0x00007f8c65fba9fb in alsa_run (ctx=0x7f8b1800b560)
    at /home/karl/moz/dev/media/libcubeb/src/cubeb_alsa.c:370
#2  0x00007f8c65fbac80 in alsa_run_thread (context=0x7f8b1800b560)
    at /home/karl/moz/dev/media/libcubeb/src/cubeb_alsa.c:420
#3  0x00007f8c6aee3e14 in start_thread () from /lib64/libpthread.so.0

(gdb) up
#1  0x00007f8c65fba9fb in alsa_run (ctx=0x7f8b1800b560)
    at /home/karl/moz/dev/media/libcubeb/src/cubeb_alsa.c:370
(gdb) p ctx->nfds
$8 = 4
(gdb) p ctx->fds[0]@4
$9 = {{fd = 40, events = 9, revents = 0}, {fd = 58, events = 41, 
    revents = 0}, {fd = 64, events = 41, revents = 0}, {fd = 70, events = 41, 
    revents = 0}}

9 == POLLIN | POLLERR
41 == POLLIN | POLLERR | 0x0020
0x0020 == POLLNVAL
The first fd is ctx->control_fd_read.  The rest are /dev/snd/timer or /dev/snd/pcmC0D0p.
This is from a different run

(gdb) p *ctx
$15 = {ops = 0x7f219dcd80c0 <alsa_ops>, thread = 139781751498496, mutex = {
    __data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, 
      __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\000' <repeats 39 times>, __align = 0}, streams = {
    0x7f1ffc01aad0, 0x7f1ffc03f770, 0x7f1ffc0185e0, 0x7f1ffc106a90, 0x0, 0x0, 
    0x0, 0x7f1ffc33fde0, 0x7f1ffc09b2c0, 0x7f1ffc33f230, 0x7f1ffc33f3f0, 
    0x7f1ffc342400, 0x0, 0x0, 0x0, 0x0}, fds = 0x7f202c001130, nfds = 10, 
  rebuild = 0, shutdown = 0, control_fd_read = 40, control_fd_write = 56, 
  active_streams = 9, local_config = 0x0, is_pa = 0}

All streams have state == RUNNING.
Karl, do you see this on a specific (set of) OS?
Flags: needinfo?(karlt)
I actually saw this only with Linux/alsa.  But 865642 comment 147 is the same issue on WINNT 6.1.

I can't reproduce reliably, but it happens often enough that I could try some logging, if I knew what to look for.
Flags: needinfo?(karlt)
OS: Windows 7 → Linux
If you could get the value of BufferedAudioStream::mState, and mBuffer.Length(), that would be great.
It seems this goes wrong before shutdown, I guess the cause of test timeouts.

On the cubeb ctx, there are 4 BufferedAudioStreams.
They all have state = RUNNING and mState == BufferedAudioStream::STARTED.

All except one has mBuffer.mCount == 0.
The other has mBuffer.mCount == 384000 and is the one waiting in
BufferedAudioStream::Write().

Each poll() returns revents for the streams with mCount == 0, but not for
the waiting stream.

cubeb's alsa_run only checks last_activity against CUBEB_WATCHDOG_MS when poll
returns no revents at all.  Should we assume that is the bug, and watchdog streams separately, or should I try to find out why this particular stream is not hearing from /dev/snd/timer?
Flags: needinfo?(kinetik)
BTW, the comment says "this should never happen", but avail = 24065204 and stm->buffer_size = 5120 at
http://hg.mozilla.org/mozilla-central/annotate/55b6b3b78d37/media/libcubeb/src/cubeb_alsa.c#l284
I wonder what thread is driving the other BufferedAudioStreams.  If it is the MSG thread, then perhaps that shouldn't be blocking on Write().
(In reply to Karl Tomlinson (:karlt) from comment #8)
> BTW, the comment says "this should never happen", but avail = 24065204 and
> stm->buffer_size = 5120 at

Where's the avail value coming from?  Try logging the two snd_pcm_avail_update calls.  The reason that code is there is that buggy ALSA/driver code returns garbage values, so it could be that.

(In reply to Karl Tomlinson (:karlt) from comment #7)
> Each poll() returns revents for the streams with mCount == 0, but not for
> the waiting stream.

Can you work out the state of that ALSA PCM?  I'll bet it's in SND_PCM_STATE_RUNNING but isn't actually running.  We've run into variants of this bug a lot, which is why the awful watchdog code is required.

> cubeb's alsa_run only checks last_activity against CUBEB_WATCHDOG_MS when
> poll
> returns no revents at all.  Should we assume that is the bug, and watchdog
> streams separately

We should fix that, yes.  It'll resolve the timeout by killing the stream.

> or should I try to find out why this particular stream
> is not hearing from /dev/snd/timer?

If you have time, it'd be worth making sure this ends up being the same root cause as the other cases we've hit and isn't a bug in our code.
Flags: needinfo?(kinetik)
I've now seen this with 3 different sound cards.  2 different drivers.

"HDA Intel MID" "92HD81B1C5 Analog"
"Plantronics .Audio 646 DSP" "USB Audio"
"Logitech H360 Headset" "USB Audio"

(In reply to Matthew Gregan [:kinetik] from comment #10)
> Can you work out the state of that ALSA PCM?  I'll bet it's in
> SND_PCM_STATE_RUNNING but isn't actually running.

Yes, it is in SND_PCM_STATE_RUNNING.
Component: Web Audio → Video/Audio
This was occurring in the same situations as bug 908462:

It didn't happen with the DeadlockDetector disabled, and it doesn't happen after
https://hg.mozilla.org/mozilla-central/rev/30ef08a0a1bc even with the DeadlockDetector enabled.
Depends on: 901630
Implications from analysis in bug 908462 are that this happens when there are long main thread delays.
This cannot happen anymore, since the MSG audio output refactoring. Closing.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.