Last Comment Bug 757707 - Intermittent crashtests/752784-1.html | application timed out after 330 seconds with no output
: Intermittent crashtests/752784-1.html | application timed out after 330 secon...
Status: RESOLVED FIXED
: intermittent-failure
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: x86 Linux
: -- normal (vote)
: mozilla15
Assigned To: Matthew Gregan [:kinetik]
:
Mentors:
Depends on:
Blocks: 438871 758198 759821
  Show dependency treegraph
 
Reported: 2012-05-22 20:02 PDT by Phil Ringnalda (:philor)
Modified: 2012-11-25 19:31 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch v0 (2.70 KB, patch)
2012-05-29 22:40 PDT, Matthew Gregan [:kinetik]
no flags Details | Diff | Splinter Review
patch v1 (3.17 KB, patch)
2012-05-30 23:36 PDT, Matthew Gregan [:kinetik]
roc: review+
Details | Diff | Splinter Review

Description Phil Ringnalda (:philor) 2012-05-22 20:02:32 PDT
https://tbpl.mozilla.org/php/getParsedLog.php?id=11968696&tree=Mozilla-Inbound
Rev3 Fedora 12 mozilla-inbound debug test crashtest on 2012-05-22 18:47:11 PDT for push 99878dcae029
slave: talos-r3-fed-008

REFTEST TEST-START | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | 418 / 2059 (20%)
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
++DOMWINDOW == 62 (0xb9bfdd4) [serial = 941] [outer = 0x9fbb830]
nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded
nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded
nsBuiltinDecoder::PlaybackEnded mPlayState=3
nsBuiltinDecoder::PlaybackEnded mPlayState=3
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
WARNING: No outer window available!: file ../../../dom/base/nsGlobalWindow.cpp, line 8391
nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded
nsBuiltinDecoder::PlaybackEnded mPlayState=3
nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded
nsBuiltinDecoder::PlaybackEnded mPlayState=3
nsBuiltinDecoderStateMachine::RunStateMachine queuing nsBuiltinDecoder::PlaybackEnded
nsBuiltinDecoder::PlaybackEnded mPlayState=3
TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application timed out after 330 seconds with no output
XScreenSaver state: Disabled
User input has been idle for 709 seconds
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
(screenshot)
INFO | automation.py | Application ran for: 0:06:48.826801
INFO | automation.py | Reading PID log: /tmp/tmp_6xBhfpidlog
==> process 2126 launched child process 2159
INFO | automation.py | Checking for orphan process with PID: 2159
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux-debug/1337736022/firefox-15.0a1.en-US.linux-i686.crashreporter-symbols.zip
PROCESS-CRASH | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application crashed (minidump found)
Crash dump filename: /tmp/tmpv4Xc0F/minidumps/16ab87d0-32ad-d9fb-3f9db17b-1c82ee4f.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGABRT
Crash address: 0x84b

Thread 0 (crashed)
 0  linux-gate.so + 0x424
    eip = 0x007f8424   esp = 0xbfa181a4   ebp = 0xbfa181d8   ebx = 0x0b079b74
    esi = 0x00000000   edi = 0x00000011   eax = 0xfffffffc   ecx = 0x00000080
    edx = 0x00000002   efl = 0x00200246
    Found by: given as instruction pointer in context
 1  libnspr4.so!PR_Lock [ptsynch.c : 174 + 0x7]
    eip = 0x00134446   esp = 0xbfa181e0   ebp = 0xbfa181f8
    Found by: previous frame's frame pointer
 2  libnspr4.so!PR_EnterMonitor [ptsynch.c : 529 + 0xa]
    eip = 0x0013519b   esp = 0xbfa18200   ebp = 0xbfa18228   ebx = 0x00149530
    esi = 0x0b079b70
    Found by: call frame info
 3  libxul.so!mozilla::ReentrantMonitor::Enter [BlockingResourceBase.cpp : 282 + 0x8]
    eip = 0x020e5e53   esp = 0xbfa18230   ebp = 0xbfa18268   ebx = 0x02d27288
    esi = 0x0960d8d4   edi = 0x00000000
    Found by: call frame info
 4  libxul.so!nsBuiltinDecoderStateMachine::Shutdown [nsBuiltinDecoderStateMachine.cpp : 1442 + 0x1c]
    eip = 0x01aa9efa   esp = 0xbfa18270   ebp = 0xbfa182a8   ebx = 0x02d27288
    esi = 0x09a72900   edi = 0x09a7290c
    Found by: call frame info
 5  libxul.so!nsBuiltinDecoder::Shutdown [nsBuiltinDecoder.cpp : 152 + 0x16]
    eip = 0x01aa5321   esp = 0xbfa182b0   ebp = 0xbfa182c8   ebx = 0x02d27288
    esi = 0x0960d830   edi = 0x0a8a3730
    Found by: call frame info
 6  libxul.so!nsHTMLMediaElement::ShutdownDecoder [nsHTMLMediaElement.cpp : 553 + 0x10]
    eip = 0x017e8d69   esp = 0xbfa182d0   ebp = 0xbfa182f8   ebx = 0x02d27288
    esi = 0x0a8a3740   edi = 0x0a8a3730
    Found by: call frame info
 7  libxul.so!nsHTMLMediaElement::~nsHTMLMediaElement [nsHTMLMediaElement.cpp : 1681 + 0x8]
    eip = 0x017eac75   esp = 0xbfa18300   ebp = 0xbfa18318   ebx = 0x02d27288
    esi = 0x0a8a36e8   edi = 0x0a3537f0
    Found by: call frame info
 8  libxul.so!nsHTMLAudioElement::~nsHTMLAudioElement [nsHTMLAudioElement.cpp : 82 + 0x39]
    eip = 0x017e167b   esp = 0xbfa18320   ebp = 0xbfa18338   ebx = 0x02d27288
    esi = 0x0a8a36e8   edi = 0x0a3537f0
    Found by: call frame info
 9  libxul.so!nsNodeUtils::LastRelease [nsNodeUtils.cpp : 251 + 0x7]
    eip = 0x0169c9b6   esp = 0xbfa18340   ebp = 0xbfa18388   ebx = 0x02d27288
    esi = 0x0a8a36e8   edi = 0x0a3537f0
Comment 1 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-22 22:43:04 PDT
The main thread is trying to take the decoder lock in nsBuiltinDecoderStateMachine::Shutdown due to the media element being GCed.

On the libcubeb thread, nsBufferedAudioStream::DataCallback is trying to take the lock on the stream.

On the nsBuiltinDecoderStateMachine thread (thread 18), we have a deep nest of nsBuiltinDecoderStateMachine::StopDecodeThread calls. We're exiting our monitor, trying to stop mDecodeThread, and while that waits, we get a nested nsBuiltinDecoderStateMachine::Run event which eventually triggers another nsBuiltinDecoderStateMachine::StopDecodeThread, etc. There's a nest of these 9-10 deep. At the bottom it's blocked trying to get a decoder lock. I guess at that point it shouldn't be holding any locks since StopDecodeThread drops the lock it's holding.

A very large number of nsBuiltinDecoderStateMachine::AudioLoop threads are trying to write to their nsBufferedAudioStream but waiting for space to open up. I guess they're not holding any locks.

One AudioLoop thread (thread 27) is calling nsBufferedAudioStream::Pause. I can't tell from the stack whether it's blocked or not.

Several DecodeLoop threads are doing a Wait, I think it must be the decoder lock. So they shouldn't be holding any locks.

So I can't tell who's holding the decoder lock that the main thread is waiting for. Nor can I tell who is holding the nsBufferedAudioStream lock.

Due to the large number of threads I'm pretty sure this problem is caused by 691096-1.html.
Comment 2 Matthew Gregan [:kinetik] 2012-05-22 23:59:10 PDT
nsAudioStream::Pause() is only called from one place, and it is holding the decoder lock when it does so.  Presumably that AudioLoop thread is holding the decoder lock, the nsBufferedAudioStream lock (inside Pause()) and is off in space somewhere in cubeb_stream_stop.  A decent stack would help; maybe we'll get lucky and this will fail on a debug build overnight.
Comment 3 PTO until Sep 5 NZ time; Chris Pearce (:cpearce) 2012-05-23 03:15:27 PDT
I've had plans for a while to refactor the way we're shutting down media threads, so we don't call Shutdown() on threads until we know their thread runners have actually exited so we don't have to wait. I was waiting for the audio threads to go away before doing that though.

We've got a few other orange/crash bugs that occur with call stacks like this (older ones with stacks calling into libsydneyaudio).
Comment 4 Matthew Gregan [:kinetik] 2012-05-23 20:42:26 PDT
I can reproduce this by running the content/media/test/crashtests a few hundred times in a opt build.
Comment 5 Matthew Gregan [:kinetik] 2012-05-23 20:47:16 PDT
This is a fairly simple deadlock.

Thread 198 (AudioLoop)
stm in PROCESSING, blocked on condition, holding nsBufferedAudioStream monitor

Thread 212 (cubeb_run)
waiting on nsBufferedAudioStream monitor, responsible for transition stm from PROCESSING to another state on completion
Comment 6 Treeherder Robot 2012-05-24 06:01:03 PDT
bbondy
https://tbpl.mozilla.org/php/getParsedLog.php?id=12010718&tree=Try
Rev3 Fedora 12x64 try debug test crashtest on 2012-05-23 20:02:28
slave: talos-r3-fed64-017

TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application timed out after 330 seconds with no output
PROCESS-CRASH | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
Comment 7 Treeherder Robot 2012-05-29 17:40:18 PDT
philor
https://tbpl.mozilla.org/php/getParsedLog.php?id=12172828&tree=Mozilla-Inbound
Rev3 Fedora 12 mozilla-inbound opt test crashtest-ipc on 2012-05-29 15:57:59
slave: talos-r3-fed-005

TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application timed out after 330 seconds with no output
PROCESS-CRASH | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application crashed (minidump found)
Thread 0 (crashed)
Comment 8 Matthew Gregan [:kinetik] 2012-05-29 22:40:36 PDT
Created attachment 628235 [details] [diff] [review]
patch v0

I was sloppy with holding locks when I wrote this originally.  Holding locks across calls to cubeb can result in deadlocks between the caller and the data or state callback running on another thread.

Pushed to try here: https://tbpl.mozilla.org/?tree=Try&rev=b627afd8ee5d
Comment 9 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-30 02:34:34 PDT
Comment on attachment 628235 [details] [diff] [review]
patch v0

Review of attachment 628235 [details] [diff] [review]:
-----------------------------------------------------------------

So what happens if multiple nsBufferedAudioStream operations race, e.g. two Writes or a Write and a Pause? Now their execution can be interleaved.

Or are we always using nsBufferedAudioStreams from a single thread?
Comment 10 John Drinkwater (:beta) 2012-05-30 03:47:00 PDT
Consistently getting this when I seek opus audio, it is not intermittent for me. Occuring on http://media.kradradio.com/test/zan%20and%20shawn%20-%203-28-12_32kbit.opus to be specific.

It’s just sat here in the `bt`. 
#0  0x00007f997c7e789c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f997c7e309b in _L_lock_1006 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f997c7e301c in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f997b4e6f59 in PR_Lock () from /home/john/bin/firefox-nightly/firefox/libnspr4.so
#4  0x00007f997b4e75e5 in PR_EnterMonitor () from /home/john/bin/firefox-nightly/firefox/libnspr4.so
#5  0x00007f997887d0d0 in nsBuiltinDecoder::Play() () from /home/john/bin/firefox-nightly/firefox/libxul.so
#6  0x00007f997867df7c in nsHTMLMediaElement::Play() () from /home/john/bin/firefox-nightly/firefox/libxul.so
#7  0x00007f9978d725ca in NS_InvokeByIndex_P () from /home/john/bin/firefox-nightly/firefox/libxul.so

Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/15.0 Firefox/15.0a1
Comment 11 John Drinkwater (:beta) 2012-05-30 06:44:07 PDT
(In reply to Matthew Gregan [:kinetik] from comment #8)
> Created attachment 628235 [details] [diff] [review]
> patch v0
> 
> I was sloppy with holding locks when I wrote this originally.  Holding locks
> across calls to cubeb can result in deadlocks between the caller and the
> data or state callback running on another thread.
> 
> Pushed to try here: https://tbpl.mozilla.org/?tree=Try&rev=b627afd8ee5d

This works for me.
Comment 12 Treeherder Robot 2012-05-30 11:38:30 PDT
mbrubeck
https://tbpl.mozilla.org/php/getParsedLog.php?id=12203479&tree=Mozilla-Inbound
Rev3 Fedora 12 mozilla-inbound opt test crashtest on 2012-05-30 11:16:48
slave: talos-r3-fed-028

TEST-UNEXPECTED-FAIL | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application timed out after 330 seconds with no output
PROCESS-CRASH | file:///home/cltbld/talos-slave/test/build/reftest/tests/content/media/test/crashtests/752784-1.html | application crashed (minidump found)
Thread 0 (crashed)
Comment 13 Matthew Gregan [:kinetik] 2012-05-30 23:36:52 PDT
Created attachment 628609 [details] [diff] [review]
patch v1

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #9)
> Or are we always using nsBufferedAudioStreams from a single thread?

It's intended that callers perform their own synchronization, and I know this is true for AudioLoop and the Audio Data API (I didn't check the new MediaStreams code)--both use a single thread, with a small exception.  The exception is that the GetPosition calls (and possibly SetVolume) are intended to be safe to call from a different thread.  Also note that nsNativeAudioStream contains no synchronization (which may not matter given how little state it has now, but it used to have a shared buffer that would've been corrupted by interleaved writes).

The synchronization in nsBufferedAudioStream is intended to protect the state between the callbacks (which may run on different threads) and (already synchronized) calls to the nsBufferedAudioStream API.  The last patch does have a bug where it could clobber mState == ERRORED set by the state callback while the lock is not held over cubeb_stream_start/stop, so I've fixed that and cleaned up a few of the other changes.

Try push: https://tbpl.mozilla.org/?tree=Try&rev=389fb496f098
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-31 00:43:30 PDT
Comment on attachment 628609 [details] [diff] [review]
patch v1

Review of attachment 628609 [details] [diff] [review]:
-----------------------------------------------------------------

Please document the threading requirements for nsBufferedAudioStream clients.
Comment 15 Matthew Gregan [:kinetik] 2012-05-31 23:54:07 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/0429c7b12bab

Note You need to log in before you can comment on or make changes to this bug.