Closed Bug 620326 Opened 14 years ago Closed 13 years ago

Intermittent failure in /tests/content/media/test/test_playback.html | Test timed out.

Categories

(Core :: Audio/Video, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: roc, Assigned: cpearce)

References

Details

(Keywords: intermittent-failure, Whiteboard: [approved-patches-landed])

Attachments

(1 file, 1 obsolete file)

Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2010/12/19 19:17:07
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292815027.1292817245.22335.gz#err0

I'm pretty sure this isn't due to my push, it was green on try, and we've had problems with random failures here before.

If the orange persists I will close this bug and back out.
Whiteboard: [orange]
I landed bug 616800 a few pushes back, which is much more likely to have caused this than roc's push, since it changes the linux alsa sound backend which is where a lot of these problems have happened in the past. Sigh.
Summary: Failure in test_playback.html → Intermittent failure in /tests/content/media/test/test_playback.html | Test timed out.
I caught this in a VM. Decoder thread stacks:

Thread 6 (Thread 0x7f59246ae710 (LWP 17870)):
#5  0x00007f5965b4edf1 in nsBuiltinDecoderStateMachine::Wait (this=0x7f594d074f60, aMs=40)
at ../../../content/media/nsBuiltinDecoderStateMachine.cpp:1362
#6  0x00007f5965b4ec2a in nsBuiltinDecoderStateMachine::AdvanceFrame (this=0x7f594d074f60)
at ../../../content/media/nsBuiltinDecoderStateMachine.cpp:1333
#7  0x00007f5965b4d574 in nsBuiltinDecoderStateMachine::Run (this=0x7f594d074f60)
at ../../../content/media/nsBuiltinDecoderStateMachine.cpp:996

Thread 5 (Thread 0x7f59210ef710 (LWP 17872)):
#4  0x00007f5965b4f8c1 in mozilla::MonitorAutoEnter::Wait (this=0x7f59210ee930, interval=4294967295)
at ../../dist/include/mozilla/Monitor.h:226
#5  0x00007f5965b4ab39 in nsBuiltinDecoderStateMachine::DecodeLoop (this=0x7f594d074f60)
at ../../../content/media/nsBuiltinDecoderStateMachine.cpp:349
#6  0x00007f5965b50c1c in nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run (this=0x7f594a8a29d0)
at ../../dist/include/nsThreadUtils.h:345

Thread 4 (Thread 0x7f59206ee710 (LWP 17873)):
#0  0x0000003d4620b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5963e2c6db in PR_WaitCondVar (cvar=0x7f59444e2680, timeout=4294967295) at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:417
#2  0x00007f5963e2ce6d in PR_Wait (mon=0x7f5944255c80, timeout=4294967295) at ../../../../../nsprpub/pr/src/pthreads/ptsynch.c:614
#3  0x00007f596666e84a in mozilla::Monitor::Wait (this=0x7f594b2fcb40, interval=4294967295) at BlockingResourceBase.cpp:346
#4  0x00007f5965b4f8c1 in mozilla::MonitorAutoEnter::Wait (this=0x7f59206ed9b0, interval=4294967295)
at ../../dist/include/mozilla/Monitor.h:226
#5  0x00007f5965b4ae44 in nsBuiltinDecoderStateMachine::AudioLoop (this=0x7f594d074f60)
at ../../../content/media/nsBuiltinDecoderStateMachine.cpp:409
#6  0x00007f5965b50c1c in nsRunnableMethodImpl<void (nsBuiltinDecoderStateMachine::*)(), true>::Run (this=0x7f594a741c40)
at ../../dist/include/nsThreadUtils.h:345

The decoder thread is sleeping because it thinks the audio decode is ahead, and the audio thread is asleep because the audio queue is empty.

This happens because we calculate the amount of audio ahead as "audio decoded and in the audio queue" + "amount of unplayed audio already pushed to hardware", and we only update this calculation at the start of the loop.

We then get this timeout when the following happens: the decode thread decodes ahead and because its audio queue is full it decides it should sleep, but before it sleeps, the audio thread runs and pushes all the decoded audio to the hardware and sleeps because there's no audio left in the queue. Then the decode thread goes to sleep because it previously decided it should sleep based on the audio decoded value it had at the start of the loop.

I have a fix, I just need to test it more before attaching it...

We can also reset the video element's src attribute in test_playback at the end of every test case to reduce the number of threads running in that test (the thread count got up to 70 while I was looping on that test).
Assignee: nobody → chris
Blocks: 438871
Attached patch Patch v1 (obsolete) — Splinter Review
* Ensure the info we use to put the decoder thread to sleep is up to date. This will ensure we never get into the situation which I diagnosed in comment 4.
Attachment #501893 - Flags: review?(roc)
Attached patch Patch v2Splinter Review
* Added more comments re GetDecodedAudioDuration() being an upper bound, and on how we receive a notification on the decoder monitor to prevent both audio push and decode threads waiting for each other to do something.
* Changed DecodeLoop() to always hold the decoder monitor, and drop it when decoding.
* Mochitests pass locally on Win7, OSX 10.5, and in Linux VM.
Attachment #501893 - Attachment is obsolete: true
Attachment #502697 - Flags: review?(roc)
Attachment #501893 - Flags: review?(roc)
Comment on attachment 502697 [details] [diff] [review]
Patch v2

Can you add a comment that explains that the audioQueue.GetSize() can't change to zero between checking it and the Wait(), because popping off the audio queue requires the decoder lock? And same for video? (Check it's true for video!)
Attachment #502697 - Flags: review?(roc) → review+
Landed patch, and also landed the changes (disabled) to MediaTestManager that I've been using to loop infinitely on media mochitests until they fails.

http://hg.mozilla.org/mozilla-central/rev/c380880159ee
http://hg.mozilla.org/mozilla-central/rev/eac1b10ca8d0
Since those patches landed, don't see any other reports on this bug. Can we close this out?
Whiteboard: [orange] → [orange][approved-patches-landed]
Yes, I declare victory!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
(In reply to comment #58)
> Yes, I declare victory!
Yep, you can really see it in this graph: http://brasstacks.mozilla.com/orangefactor/?display=Bug&endday=2011-02-25&startday=2010-08-01&bugid=620326

Nicely done, cpearce!
(In reply to comment #59)
> (In reply to comment #58)
> > Yes, I declare victory!
> Yep, you can really see it in this graph:
> http://brasstacks.mozilla.com/orangefactor/?display=Bug&endday=2011-02-25&startday=2010-08-01&bugid=620326

That orangefactor tool is seriously cool.
Whiteboard: [orange][approved-patches-landed] → [approved-patches-landed]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: