Intermittent Linux & Linux64 test_playback.html | application timed out after 330 seconds with no output

RESOLVED FIXED in mozilla6

Status

()

Core
Audio/Video
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: philor, Assigned: cpearce)

Tracking

({intermittent-failure})

Trunk
mozilla6
x86_64
Linux
intermittent-failure
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

6 years ago
Looks like the first one actually found its way to bug 620326,

http://tinderbox.mozilla.org/showlog.cgi?log=Devtools/1302883693.1302884633.21846.gz
Rev3 Fedora 12x64 devtools opt test mochitests-1/5 on 2011/04/15 09:08:13

s: talos-r3-fed64-049
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html |
application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application
crashed (minidump found)
Thread 0 (crashed)

followed by

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303155389.1303158376.6513.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2011/04/18 12:36:29
s: talos-r3-fed64-013

63558 INFO TEST-PASS | /tests/content/media/test/test_playback.html | bug482461-theora.ogv time should run forwards: p=0.000264 c=0.26722
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output

and

http://tinderbox.mozilla.org/showlog.cgi?log=Cedar/1303178059.1303179058.6211.gz
Rev3 Fedora 12x64 cedar opt test mochitests-1/5 on 2011/04/18 18:54:19
s: talos-r3-fed64-049

63561 INFO TEST-PASS | /tests/content/media/test/test_playback.html | bug500311.ogv time should run forwards: p=1.37525 c=1.667833
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
There were content/media changes landed around the time of the first timeout.  The main change would be bug 641718.

Only the first of the failures has a stack track, but I can't see anything obvious there, unless the threads are livelocked in an UpdateReadyState/PlaybackPositionChanged dance.

The AudioLoop is asleep in Wait(), another thread is dispatch an event to the main thread from AdvanceFrame/UpdateReadyState, and another is dispatching an event to the main thread from nsBuiltinDecoderStateMachine::Run.  The main thread is waiting on the decoder monitor in PlaybackPositionChanged.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(In reply to comment #1)
> Only the first of the failures has a stack track, but I can't see anything
> obvious there, unless the threads are livelocked in an
> UpdateReadyState/PlaybackPositionChanged dance.

Caught this in a debugger and it looks like this is what's happening.  AdvanceFrame's remainingTime calculation is negative (because now > frame->mTime) and clock_time from the audio stream is not updating.  This results in AdvanceFrame returning without sleeping then being immediately called again.  Each time, it calls UpdatePlaybackPosition which causes the main thread to call PlaybackPositionChanged.
Lost my debugging session before I solved this and haven't been able to reproduce it again yet.  But I think the problem is:

For 44.1kHz audio, there are floor(22.676) usecs per sample.  This means clock_time has a granularity of no higher than 22us (one sample).  I think this is a situation where the difference between the next frame time and the clock time is less than 22us, so we wait in a tight loop.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment 7, 9, 10, and 11 are on 32-bit Linux. Updating bug summary.
Summary: Intermittent Linux64 test_playback.html | application timed out after 330 seconds with no output → Intermittent Linux & Linux64 test_playback.html | application timed out after 330 seconds with no output
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 23

6 years ago
(In reply to comment #6)

> For 44.1kHz audio, there are floor(22.676) usecs per sample.  This means
> clock_time has a granularity of no higher than 22us (one sample).  I think this
> is a situation where the difference between the next frame time and the clock
> time is less than 22us, so we wait in a tight loop.

I've caught this in the debugger as well, and this seems to be what is happening. The audio thread hasn't push more the mAudioStream->GetMinWriteSamples() worth of audio to the hardware, and it's also blocked waiting to acquire the decoder monitor, so the audio clock isn't advancing.

I would have expected the ReentrantMonitorAutoEnter in the Run() loop to drop the monitor to allow the other threads to take the monitor...

Perhaps in AdvanceFrame() we should ensure remainingTime is always at least 1ms? This means that video frames could be late by almost 1ms during playback.

Or we could round clock_time down to the nearest ms in the "Skip frames up to the frame at playback postion" loop, so that we present frames slightly early, and so wait longer. Due to the delay in our rendering pipeline, I doubt this would be noticeable.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 27

6 years ago
Created attachment 531773 [details] [diff] [review]
Patch: Round up wait times to 1ms to ensure we drop the decoder monitor when presenting video frames.

Try what I suggested in comment 23. Seems to work. This means frames may be slightly late(r), but I doubt it will be noticeable given the existing delays in our rendering pipeline. I ran test_playback for several hours on my linux box, and could no longer trigger the intermittent failure.

Greenish on Try: http://tbpl.mozilla.org/?tree=Try&rev=54d99e77149e
Assignee: nobody → chris
Status: NEW → ASSIGNED
Attachment #531773 - Flags: review?(kinetik)
Comment on attachment 531773 [details] [diff] [review]
Patch: Round up wait times to 1ms to ensure we drop the decoder monitor when presenting video frames.

It might be better to perform this round up in Wait() itself.
(Assignee)

Comment 29

6 years ago
Created attachment 531818 [details] [diff] [review]
Patch v2

(In reply to comment #28)
> It might be better to perform this round up in Wait() itself.

Probably a good idea. Looks like it's possible we could hit this same problem in the Wait() call while draining at the end of AudioLoop().
Attachment #531773 - Attachment is obsolete: true
Attachment #531773 - Flags: review?(kinetik)
Attachment #531818 - Flags: review?(kinetik)
Attachment #531818 - Flags: review?(kinetik) → review+
(Assignee)

Comment 30

6 years ago
http://hg.mozilla.org/mozilla-central/rev/d15d6f06eeb6
(Assignee)

Comment 31

6 years ago
I declare victory!
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

6 years ago
Target Milestone: --- → mozilla6
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Keywords: intermittent-failure
Whiteboard: [orange]
You need to log in before you can comment on or make changes to this bug.