Closed Bug 620721 Opened 14 years ago Closed 13 years ago

Intermittent test_timeupdate_small_files.html | Test timed out. followed by shutdown timeout

Categories

(Core :: Audio/Video, defect)

Other Branch
All
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla8

People

(Reporter: philor, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

+++ This bug was initially created as a clone of Bug #581025 +++

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292951714.1292954402.2593.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2010/12/21 09:15:14
s: talos-r3-fed64-026

58347 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
--DOMWINDOW == 19 (0x6bd3218) [serial = 1866] [outer = 0x389e940] [url = http://mochi.test:8888/tests/content/media/test/test_standalone.html]
--DOMWINDOW == 18 (0x8973e78) [serial = 1867] [outer = (nil)] [url = http://mochi.test:8888/tests/content/media/test/320x240.ogv]
--DOMWINDOW == 17 (0x8d3db78) [serial = 1868] [outer = (nil)] [url = http://mochi.test:8888/tests/content/media/test/seek.webm]
58348 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out.
...
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
(In reply to comment #18)
> philringnalda%gmail.com
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1294908298.1294911095.4865.gz
> Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/01/13 00:44:58

Right before the timeout, the log says:

58534 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - shouldn't get timeupdate after ended
58535 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - shouldn't get timeupdate after ended
58536 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | 320x240.ogv - should see at least one timeupdate: 0.26600000262260437
58537 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
58538 INFO TEST-PASS | /tests/content/media/test/test_timeupdate_small_files.html | r11025_s16_c1.wav - shouldn't get timeupdate after ended
NEXT ERROR 58539 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_timeupdate_small_files.html | Test timed out.

So it looks like in this case we didn't send the "ended" event in the WAV decoder for r11025_s16_c1.wav.
This is our old enemy, sa_stream_write() not returning from nsAudioStream::Drain(), causing the nsWaveDecoder::Stop() to be unable to Shutdown() the decoder thread on the main thread.

(In reply to comment #58)
> bill%wg9s.com
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1295813336.1295816295.25498.gz
> Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/01/23 12:08:56

Relevant thread stacks from that log:

Thread 0 (crashed)
 0  linux-gate.so + 0x424
    eip = 0x0096f424   esp = 0xbf9779e0   ebp = 0xbf977a38   ebx = 0x0939f3c8
    esi = 0x00000000   edi = 0x000000d5   eax = 0xfffffffc   ecx = 0x00000080
    edx = 0x000000d5   efl = 0x00000202
    Found by: given as instruction pointer in context
 1  libnspr4.so!PR_Wait [ptsynch.c : 614 + 0x14]
    eip = 0x00316b03   esp = 0xbf977a40   ebp = 0xbf977a68
    Found by: previous frame's frame pointer
 2  libxul.so!nsAutoMonitor::Wait [nsAutoLock.h : 346 + 0x14]
    eip = 0x010d4357   esp = 0xbf977a70   ebp = 0xbf977a88   ebx = 0x034ddb94
    Found by: call frame info
 3  libxul.so!nsEventQueue::GetEvent [nsEventQueue.cpp : 85 + 0x12]
    eip = 0x0274354a   esp = 0xbf977a90   ebp = 0xbf977ac8   ebx = 0x034ddb94
    Found by: call frame info
 4  libxul.so!nsThread::nsChainedEventQueue::GetEvent [nsThread.h : 114 + 0x1b]
    eip = 0x0274783a   esp = 0xbf977ad0   ebp = 0xbf977ae8   ebx = 0x034ddb94
    Found by: call frame info
 5  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 616 + 0x5e]
    eip = 0x02745380   esp = 0xbf977af0   ebp = 0xbf977b88   ebx = 0x034ddb94
    Found by: call frame info
 6  libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 250 + 0x1f]
    eip = 0x026cc95c   esp = 0xbf977b90   ebp = 0xbf977bc8   ebx = 0x034ddb94
    esi = 0x0ce77b10
    Found by: call frame info
 7  libxul.so!nsThread::Shutdown [nsThread.cpp : 492 + 0x12]
    eip = 0x0274652c   esp = 0xbf977bd0   ebp = 0xbf977c58   ebx = 0x034ddb94
    esi = 0x0ce77b10
    Found by: call frame info
 8  libxul.so!nsWaveDecoder::Stop [nsWaveDecoder.cpp : 1388 + 0x1b]
    eip = 0x01be83a5   esp = 0xbf977c60   ebp = 0xbf977c68   ebx = 0x034ddb94
    esi = 0x093a8ffc
    Found by: call frame info
 9  libxul.so!nsRunnableMethodImpl<void (nsWaveDecoder::*)(), true>::Run [nsThreadUtils.h : 345 + 0x4e]
    eip = 0x01bea5a9   esp = 0xbf977c70   ebp = 0xbf977c78   ebx = 0x034ddb94
    esi = 0x093a8ffc
    Found by: call frame info
10  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 633 + 0x18]
    eip = 0x02745402   esp = 0xbf977c80   ebp = 0xbf977d18   ebx = 0x034ddb94
    esi = 0x093a8ffc
    Found by: call frame info

[...]

Thread 19
 0  linux-gate.so + 0x424
    eip = 0x0096f424   esp = 0x9ff18d14   ebp = 0x9ff18d38   ebx = 0x9ff18d60
    esi = 0x00000000   edi = 0x00c58ff4   eax = 0xfffffdfc   ecx = 0x00000001
    edx = 0xffffffff   efl = 0x00000293
    Found by: given as instruction pointer in context
 1  libasound.so.2.0.0 + 0x4b835
    eip = 0x055cc836   esp = 0x9ff18d40   ebp = 0x9ff18db8
    Found by: previous frame's frame pointer
 2  libasound.so.2.0.0 + 0x4ba0a
    eip = 0x055cca0b   esp = 0x9ff18dc0   ebp = 0x9ff18dd8
    Found by: previous frame's frame pointer
 3  libasound.so.2.0.0 + 0x4bb4a
    eip = 0x055ccb4b   esp = 0x9ff18de0   ebp = 0x9ff18e28
    Found by: previous frame's frame pointer
 4  libasound.so.2.0.0 + 0x901ea
    eip = 0x056111eb   esp = 0x9ff18e30   ebp = 0x9ff18ea8
    Found by: previous frame's frame pointer
 5  libasound.so.2.0.0 + 0x4611b
    eip = 0x055c711c   esp = 0x9ff18eb0   ebp = 0x9ff18ee8
    Found by: previous frame's frame pointer
 6  libxul.so!sa_stream_write [sydney_audio_alsa.c : 260 + 0x1a]
    eip = 0x01beb11d   esp = 0x9ff18ef0   ebp = 0x9ff18f28
    Found by: previous frame's frame pointer
 7  libxul.so!nsAudioStreamLocal::Drain [nsAudioStream.cpp : 520 + 0x36]
    eip = 0x01b91c7e   esp = 0x9ff18f30   ebp = 0x9ff18f58   ebx = 0x034ddb94
    esi = 0x00002b10   edi = 0x01b91d4e
    Found by: call frame info
 8  libxul.so!nsWaveStateMachine::Run [nsWaveDecoder.cpp : 791 + 0x1b]
    eip = 0x01be9b9c   esp = 0x9ff18f60   ebp = 0x9ff19148   ebx = 0x034ddb94
    esi = 0x0e573a00   edi = 0x01b91d4e
    Found by: call frame info
 9  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 633 + 0x18]
    eip = 0x02745402   esp = 0x9ff19150   ebp = 0x9ff191e8   ebx = 0x034ddb94
    esi = 0x00000000   edi = 0x003d0f00
    Found by: call frame info
(In reply to comment #190)
> philor
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1302226650.1302229743.31455.gz
> Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/04/07 18:37:30
> 
> s: talos-r3-fed-023

Stack here is slightly different from that in comment 59 in that it's not a Drain call resulting in the block, it's a normal audio write:

Thread 21
 0  linux-gate.so + 0x424
    eip = 0x00d9b424   esp = 0xa13fbcd4   ebp = 0xa13fbcf8   ebx = 0xa13fbd20
    esi = 0x00000000   edi = 0x00c58ff4   eax = 0xfffffdfc   ecx = 0x00000001
    edx = 0xffffffff   efl = 0x00000293
    Found by: given as instruction pointer in context
 1  libasound.so.2.0.0 + 0x4b835
    eip = 0x055cc836   esp = 0xa13fbd00   ebp = 0xa13fbd78
    Found by: previous frame's frame pointer
 2  libasound.so.2.0.0 + 0x4ba0a
    eip = 0x055cca0b   esp = 0xa13fbd80   ebp = 0xa13fbd98
    Found by: previous frame's frame pointer
 3  libasound.so.2.0.0 + 0x4bb4a
    eip = 0x055ccb4b   esp = 0xa13fbda0   ebp = 0xa13fbde8
    Found by: previous frame's frame pointer
 4  libasound.so.2.0.0 + 0x901ea
    eip = 0x056111eb   esp = 0xa13fbdf0   ebp = 0xa13fbe68
    Found by: previous frame's frame pointer
 5  libasound.so.2.0.0 + 0x4611b
    eip = 0x055c711c   esp = 0xa13fbe70   ebp = 0xa13fbea8
    Found by: previous frame's frame pointer
 6  libxul.so!sa_stream_write [sydney_audio_alsa.c : 260 + 0x1a]
    eip = 0x01df02dd   esp = 0xa13fbeb0   ebp = 0xa13fbee8
    Found by: previous frame's frame pointer
 7  libxul.so!nsAudioStreamLocal::Write [nsAudioStream.cpp : 471 + 0x28]
    eip = 0x01d993a5   esp = 0xa13fbef0   ebp = 0xa13fbf88   ebx = 0x035b755c
    esi = 0x00001000   edi = 0x0e371f38
    Found by: call frame info
 8  libxul.so!nsBuiltinDecoderStateMachine::PlayFromAudioQueue [nsBuiltinDecoderStateMachine.cpp : 645 + 0x5a]
    eip = 0x01d927da   esp = 0xa13fbf90   ebp = 0xa13fc008   ebx = 0x035b755c
    esi = 0x00000800   edi = 0x0e371f38
    Found by: call frame info
 9  libxul.so!nsBuiltinDecoderStateMachine::AudioLoop [nsBuiltinDecoderStateMachine.cpp : 510 + 0x29]

Thread 0 (crashed)
 0  linux-gate.so + 0x424
    eip = 0x00d9b424   esp = 0xbfc36ac0   ebp = 0xbfc36b18   ebx = 0x09723fd8
    esi = 0x00000000   edi = 0x000000eb   eax = 0xfffffffc   ecx = 0x00000080
    edx = 0x000000eb   efl = 0x00000206
    Found by: given as instruction pointer in context
 1  libnspr4.so!PR_Wait [ptsynch.c : 614 + 0x14]
    eip = 0x00859b6b   esp = 0xbfc36b20   ebp = 0xbfc36b48
    Found by: previous frame's frame pointer
 2  libxul.so!mozilla::Monitor::Wait [BlockingResourceBase.cpp : 346 + 0x14]
    eip = 0x02780f46   esp = 0xbfc36b50   ebp = 0xbfc36b88   ebx = 0x035b755c
    Found by: call frame info
 3  libxul.so!mozilla::MonitorAutoEnter::Wait [Monitor.h : 228 + 0x13]
    eip = 0x011e8894   esp = 0xbfc36b90   ebp = 0xbfc36b98   ebx = 0x035b755c
    Found by: call frame info
 4  libxul.so!nsEventQueue::GetEvent [nsEventQueue.cpp : 83 + 0x12]
    eip = 0x027f47a3   esp = 0xbfc36ba0   ebp = 0xbfc36bc8   ebx = 0x035b755c
    Found by: call frame info
 5  libxul.so!nsThread::nsChainedEventQueue::GetEvent [nsThread.h : 109 + 0x1b]
    eip = 0x027f891e   esp = 0xbfc36bd0   ebp = 0xbfc36be8   ebx = 0x035b755c
    Found by: call frame info
 6  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp : 601 + 0x5e]
    eip = 0x027f6636   esp = 0xbfc36bf0   ebp = 0xbfc36c88   ebx = 0x035b755c
    Found by: call frame info
 7  libxul.so!NS_ProcessNextEvent_P [nsThreadUtils.cpp : 250 + 0x1f]
    eip = 0x0277f460   esp = 0xbfc36c90   ebp = 0xbfc36cc8   ebx = 0x035b755c
    esi = 0x0ad2cea0
    Found by: call frame info
 8  libxul.so!nsThread::Shutdown [nsThread.cpp : 481 + 0x12]
    eip = 0x027f7638   esp = 0xbfc36cd0   ebp = 0xbfc36d38   ebx = 0x035b755c
    esi = 0x0ad2cea0
    Found by: call frame info
 9  libxul.so!nsBuiltinDecoder::Stop [nsBuiltinDecoder.cpp : 133 + 0x1d]

Thread 19
 0  linux-gate.so + 0x424
    eip = 0x00d9b424   esp = 0x9363dc54   ebp = 0x9363dc88   ebx = 0x1097c7a4
    esi = 0x00000000   edi = 0x0000000b   eax = 0xfffffe00   ecx = 0x00000080
    edx = 0x00000002   efl = 0x00000202
    Found by: given as instruction pointer in context
 1  libnspr4.so!PR_Lock [ptsynch.c : 206 + 0xa]
    eip = 0x00858aae   esp = 0x9363dc90   ebp = 0x9363dcb8
    Found by: previous frame's frame pointer
 2  libnspr4.so!PR_EnterMonitor [ptsynch.c : 561 + 0xd]
    eip = 0x008598b2   esp = 0x9363dcc0   ebp = 0x9363dce8   ebx = 0x0086e460
    Found by: call frame info
 3  libxul.so!mozilla::Monitor::Enter [BlockingResourceBase.cpp : 315 + 0xd]
    eip = 0x02781c8c   esp = 0x9363dcf0   ebp = 0x9363dd28   ebx = 0x035b755c
    Found by: call frame info
 4  libxul.so!mozilla::MonitorAutoEnter::MonitorAutoEnter [Monitor.h : 218 + 0xc]
    eip = 0x011e871e   esp = 0x9363dd30   ebp = 0x9363dd48   ebx = 0x035b755c
    esi = 0x0f3de1a4
    Found by: call frame info
 5  libxul.so!nsBuiltinDecoderStateMachine::StopPlayback [nsBuiltinDecoderStateMachine.cpp : 693 + 0x14]
    eip = 0x01d8f8f0   esp = 0x9363dd50   ebp = 0x9363dda8   ebx = 0x035b755c
    esi = 0x0f3de1a4
    Found by: call frame info
 6  libxul.so!nsBuiltinDecoderStateMachine::AdvanceFrame [nsBuiltinDecoderStateMachine.cpp : 1518 + 0x12]
    eip = 0x01d90eed   esp = 0x9363ddb0   ebp = 0x9363df28   ebx = 0x035b755c
    esi = 0x00000001
    Found by: call frame info
 7  libxul.so!nsBuiltinDecoderStateMachine::Run [nsBuiltinDecoderStateMachine.cpp : 1301 + 0xa]
    eip = 0x01d922b1   esp = 0x9363df30   ebp = 0x9363e158   ebx = 0x035b755c
    esi = 0x00000001   edi = 0x003d0f00
    Found by: call frame info
A couple of things:

1. We don't use the write workaround from bug 573924 in drain, so it's possible to hit the write>avail hang there.

2. Bug 574924 comment 11 says "if the first write is smaller than avail, it's still possible to hang in snd_pcm_writei", so maybe we're hitting that case here.

3. It's possible that the workaround is needed in more cases than just resuming from a stream cork.

I tried to reproduce this on my machine last week, but never hit it over 24 hours of running 5-10 copies of the test in parallel.  I'm on Fedora 13 with a more recent PulseAudio that may include a fix for this particular bug.  Without being able to reproduce the hang, the best I can do is come up with test patches for at least items 1 and 2 and land them to see if they help.
Logging reveals we hit item 2 with the following files, at least:

http://mochi.test:8888/tests/content/media/test/r11025_s16_c1.wav
(gdb) p avail
$5 = 5513
(gdb) p nframes
$6 = 2048

http://mochi.test:8888/tests/content/media/test/small-shot.ogg
(gdb) p avail
$8 = 5564
(gdb) p nframes
$9 = 1024

I saw this in the normal (non-resume) case, but with the right load/timing this could happen after resuming a cork caused by network buffering.

Working around this requires writing sufficient silence before the audio data to make an avail-sized write.  The audio clock will need to be adjusted by the amount of silence written to retain A/V sync.
(In reply to comment #194)
> I tried to reproduce this on my machine last week, but never hit it over 24
> hours of running 5-10 copies of the test in parallel.  I'm on Fedora 13 with a
> more recent PulseAudio that may include a fix for this particular bug.  Without
> being able to reproduce the hang, the best I can do is come up with test
> patches for at least items 1 and 2 and land them to see if they help.

This is the right approach to take here, please land whatever experimental patches you need here, to see if they help.  You can also submit the patch to try and take lots of additional test runs to try to guess whether they help or not.
(In reply to comment #194)
> 2. Bug 574924 comment 11 says "if the first write is smaller than avail, it's
> still possible to hang in snd_pcm_writei", so maybe we're hitting that case
> here.

I ran 30-40 test runs on tryserver with a patch to fix this, and it seems to help, but bug 651023 is making it difficult to be sure, so I'll fix that first.  

Unfortunately, this approach breaks test_audiowrite.html and possibly real code using mozWriteAudio, so it needs more work.
OS: Linux → BeOS
Target Milestone: --- → mozilla5
Version: Trunk → Other Branch
(In reply to comment #233)
> (In reply to comment #194)
> > 2. Bug 574924 comment 11 says "if the first write is smaller than avail, it's
> > still possible to hang in snd_pcm_writei", so maybe we're hitting that case
> > here.
> 
> I ran 30-40 test runs on tryserver with a patch to fix this, and it seems to
> help, but bug 651023 is making it difficult to be sure, so I'll fix that first. 
> 
> Unfortunately, this approach breaks test_audiowrite.html and possibly real code
> using mozWriteAudio, so it needs more work.

If you have a patch here which you suspect will fix this, then I'd say please land it.  We can always reopen this bug if it happens again.
Blocks: 648353
Can we please land the patches Matthew was talking about in comment 233?  This bug is one of the issues that is blocking bug 648353 from landing...
It's not ready, as I said in comment 233 is breaks test_audiowrite.html and likely other things.  I'm working on an improved fix that avoids that problem, so I'll post the patch soon.
(In reply to comment #236)
> It's not ready, as I said in comment 233 is breaks test_audiowrite.html and
> likely other things.  I'm working on an improved fix that avoids that problem,
> so I'll post the patch soon.

OK, thanks!
Depends on: cubeb
Is bug 623444 going to be the fix here?
Fixed by Bug 675839.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: mozilla5 → mozilla8
No longer depends on: cubeb
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.