Last Comment Bug 651023 - Intermittent Linux & Linux64 test_playback.html | application timed out after 330 seconds with no output
: Intermittent Linux & Linux64 test_playback.html | application timed out after...
Status: RESOLVED FIXED
: intermittent-failure
Product: Core
Classification: Components
Component: Audio/Video (show other bugs)
: Trunk
: x86_64 Linux
: -- normal (vote)
: mozilla6
Assigned To: Chris Pearce (:cpearce)
:
:
Mentors:
Depends on:
Blocks: 438871
  Show dependency treegraph
 
Reported: 2011-04-18 20:13 PDT by Phil Ringnalda (:philor)
Modified: 2012-11-25 19:31 PST (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Patch: Round up wait times to 1ms to ensure we drop the decoder monitor when presenting video frames. (1.18 KB, patch)
2011-05-11 15:21 PDT, Chris Pearce (:cpearce)
no flags Details | Diff | Splinter Review
Patch v2 (2.39 KB, patch)
2011-05-11 17:45 PDT, Chris Pearce (:cpearce)
kinetik: review+
Details | Diff | Splinter Review

Description Phil Ringnalda (:philor) 2011-04-18 20:13:31 PDT
Looks like the first one actually found its way to bug 620326,

http://tinderbox.mozilla.org/showlog.cgi?log=Devtools/1302883693.1302884633.21846.gz
Rev3 Fedora 12x64 devtools opt test mochitests-1/5 on 2011/04/15 09:08:13

s: talos-r3-fed64-049
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html |
application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application
crashed (minidump found)
Thread 0 (crashed)

followed by

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303155389.1303158376.6513.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2011/04/18 12:36:29
s: talos-r3-fed64-013

63558 INFO TEST-PASS | /tests/content/media/test/test_playback.html | bug482461-theora.ogv time should run forwards: p=0.000264 c=0.26722
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output

and

http://tinderbox.mozilla.org/showlog.cgi?log=Cedar/1303178059.1303179058.6211.gz
Rev3 Fedora 12x64 cedar opt test mochitests-1/5 on 2011/04/18 18:54:19
s: talos-r3-fed64-049

63561 INFO TEST-PASS | /tests/content/media/test/test_playback.html | bug500311.ogv time should run forwards: p=1.37525 c=1.667833
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
Comment 1 Matthew Gregan [:kinetik] 2011-04-18 21:09:55 PDT
There were content/media changes landed around the time of the first timeout.  The main change would be bug 641718.

Only the first of the failures has a stack track, but I can't see anything obvious there, unless the threads are livelocked in an UpdateReadyState/PlaybackPositionChanged dance.

The AudioLoop is asleep in Wait(), another thread is dispatch an event to the main thread from AdvanceFrame/UpdateReadyState, and another is dispatching an event to the main thread from nsBuiltinDecoderStateMachine::Run.  The main thread is waiting on the decoder monitor in PlaybackPositionChanged.
Comment 2 Treeherder Robot 2011-04-19 15:48:07 PDT
philor
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303249332.1303252453.12769.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2011/04/19 14:42:12

s: talos-r3-fed64-008
63460 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | Test timed out.
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2115 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 3 Treeherder Robot 2011-04-19 15:48:14 PDT
ehsan%mozilla.com
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303249332.1303252453.12769.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2011/04/19 14:42:12

s: talos-r3-fed64-008
63460 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | Test timed out.
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2115 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 4 Treeherder Robot 2011-04-20 03:32:51 PDT
mounir
http://tinderbox.mozilla.org/showlog.cgi?log=Cedar/1303291931.1303294383.16910.gz
Rev3 Fedora 12 cedar debug test mochitests-1/5 on 2011/04/20 02:32:11

s: talos-r3-fed-047
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2039 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 5 Matthew Gregan [:kinetik] 2011-04-20 04:37:14 PDT
(In reply to comment #1)
> Only the first of the failures has a stack track, but I can't see anything
> obvious there, unless the threads are livelocked in an
> UpdateReadyState/PlaybackPositionChanged dance.

Caught this in a debugger and it looks like this is what's happening.  AdvanceFrame's remainingTime calculation is negative (because now > frame->mTime) and clock_time from the audio stream is not updating.  This results in AdvanceFrame returning without sleeping then being immediately called again.  Each time, it calls UpdatePlaybackPosition which causes the main thread to call PlaybackPositionChanged.
Comment 6 Matthew Gregan [:kinetik] 2011-04-20 20:53:17 PDT
Lost my debugging session before I solved this and haven't been able to reproduce it again yet.  But I think the problem is:

For 44.1kHz audio, there are floor(22.676) usecs per sample.  This means clock_time has a granularity of no higher than 22us (one sample).  I think this is a situation where the difference between the next frame time and the clock time is less than 22us, so we wait in a tight loop.
Comment 7 Treeherder Robot 2011-04-21 10:34:14 PDT
mounir
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1303405507.1303406456.11264.gz
Rev3 Fedora 12 tryserver opt test mochitests-1/5 on 2011/04/21 10:05:07

s: talos-r3-fed-045
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | Exited with code 1 during test run
Comment 8 Treeherder Robot 2011-04-22 12:12:09 PDT
21%vingtetun.org
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303496945.1303499304.26405.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2011/04/22 11:29:05

s: talos-r3-fed64-009
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | plugin process 2090 | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
Comment 9 Treeherder Robot 2011-04-26 14:24:35 PDT
mfinkle%mozilla.com
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303849795.1303852379.9822.gz
Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/04/26 13:29:55

s: talos-r3-fed-031
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2139 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 10 Treeherder Robot 2011-04-26 23:54:37 PDT
rnewman
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303881258.1303885875.10843.gz
Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/04/26 22:14:18

s: talos-r3-fed-024
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2111 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 11 Treeherder Robot 2011-04-27 03:16:52 PDT
dao
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303893310.1303895029.15440.gz
Rev3 Fedora 12 mozilla-central opt test mochitests-1/5 on 2011/04/27 01:35:10

s: talos-r3-fed-040
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | Exited with code 1 during test run
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 21 (crashed)
Comment 12 Matt Brubeck (:mbrubeck) 2011-04-27 07:35:15 PDT
Comment 7, 9, 10, and 11 are on 32-bit Linux. Updating bug summary.
Comment 13 Treeherder Robot 2011-04-27 07:35:34 PDT
mbrubeck%mozilla.com
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303908401.1303912889.16579.gz
Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/04/27 05:46:41

s: talos-r3-fed-024
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2061 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 14 Treeherder Robot 2011-04-27 08:16:22 PDT
philor
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1303909923.1303914452.24725.gz
Rev3 Fedora 12 tracemonkey debug test mochitests-1/5 on 2011/04/27 06:12:03

s: talos-r3-fed-033
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2043 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 15 Treeherder Robot 2011-04-28 03:11:48 PDT
jacek%codeweavers.com
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303980949.1303984458.22726.gz
Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5 on 2011/04/28 01:55:49

s: talos-r3-fed64-004
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2102 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 16 Treeherder Robot 2011-04-29 12:29:33 PDT
Ms2ger%gmail.com
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1304103705.1304105019.6823.gz
Rev3 Fedora 12x64 try opt test mochitests-1/5 on 2011/04/29 12:01:45

s: talos-r3-fed64-021
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
Comment 17 Treeherder Robot 2011-04-29 16:01:07 PDT
bz%mit.edu
http://tinderbox.mozilla.org/showlog.cgi?log=Cedar/1304104992.1304106811.15098.gz
Rev3 WINNT 5.1 cedar opt test mochitests-1/5 on 2011/04/29 12:23:12

s: talos-r3-xp-027
64849 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | Test timed out.
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Thread 33 (crashed)
Comment 18 Treeherder Robot 2011-04-29 16:48:34 PDT
zackw%panix.com
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304117691.1304120101.6221.gz
Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/04/29 15:54:51

s: talos-r3-fed-007
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2067 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 19 Treeherder Robot 2011-04-29 16:49:45 PDT
wesj
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304117691.1304120101.6221.gz
Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/04/29 15:54:51

s: talos-r3-fed-007
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2067 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 20 Treeherder Robot 2011-05-01 20:29:06 PDT
philor
http://tinderbox.mozilla.org/showlog.cgi?log=Devtools/1304077681.1304079039.29467.gz
Rev3 Fedora 12x64 devtools opt test mochitests-1/5 on 2011/04/29 04:48:01

s: talos-r3-fed64-047
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
Comment 21 Treeherder Robot 2011-05-01 20:44:33 PDT
philor
http://tinderbox.mozilla.org/showlog.cgi?log=Devtools/1304077681.1304079039.29467.gz
Rev3 Fedora 12x64 devtools opt test mochitests-1/5 on 2011/04/29 04:48:01

s: talos-r3-fed64-047
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
Comment 22 Treeherder Robot 2011-05-03 12:45:55 PDT
dtownsend%oxymoronical.com
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304450857.1304451752.907.gz
Rev3 Fedora 12 mozilla-central opt test mochitests-1/5 on 2011/05/03 12:27:37

s: talos-r3-fed-006
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
Comment 23 Chris Pearce (:cpearce) 2011-05-04 15:19:59 PDT
(In reply to comment #6)

> For 44.1kHz audio, there are floor(22.676) usecs per sample.  This means
> clock_time has a granularity of no higher than 22us (one sample).  I think this
> is a situation where the difference between the next frame time and the clock
> time is less than 22us, so we wait in a tight loop.

I've caught this in the debugger as well, and this seems to be what is happening. The audio thread hasn't push more the mAudioStream->GetMinWriteSamples() worth of audio to the hardware, and it's also blocked waiting to acquire the decoder monitor, so the audio clock isn't advancing.

I would have expected the ReentrantMonitorAutoEnter in the Run() loop to drop the monitor to allow the other threads to take the monitor...

Perhaps in AdvanceFrame() we should ensure remainingTime is always at least 1ms? This means that video frames could be late by almost 1ms during playback.

Or we could round clock_time down to the nearest ms in the "Skip frames up to the frame at playback postion" loop, so that we present frames slightly early, and so wait longer. Due to the delay in our rendering pipeline, I doubt this would be noticeable.
Comment 24 Treeherder Robot 2011-05-07 09:23:50 PDT
philor
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1304782152.1304784562.15656.gz
Rev3 Fedora 12 mozilla-central debug test mochitests-1/5 on 2011/05/07 08:29:12

s: talos-r3-fed-048
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2064 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 25 Treeherder Robot 2011-05-09 23:09:57 PDT
rcampbell%mozilla.com
http://tinderbox.mozilla.org/showlog.cgi?log=Devtools/1304967837.1304970324.16154.gz
Rev3 Fedora 12 devtools debug test mochitests-1/5 on 2011/05/09 12:03:57

s: talos-r3-fed-023
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
TEST-UNEXPECTED-FAIL | automationutils.processLeakLog() | missing output line for total leaks!
TEST-UNEXPECTED-FAIL | plugin process 2129 | automationutils.processLeakLog() | missing output line for total leaks!
Comment 26 Treeherder Robot 2011-05-10 09:28:22 PDT
bjacob%mozilla.com
http://tinderbox.mozilla.org/showlog.cgi?log=Try/1305040834.1305041873.7905.gz
Rev3 Fedora 12x64 try opt test mochitests-1/5 on 2011/05/10 08:20:34

s: talos-r3-fed64-034
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 0 (crashed)
Comment 27 Chris Pearce (:cpearce) 2011-05-11 15:21:41 PDT
Created attachment 531773 [details] [diff] [review]
Patch: Round up wait times to 1ms to ensure we drop the decoder monitor when presenting video frames.

Try what I suggested in comment 23. Seems to work. This means frames may be slightly late(r), but I doubt it will be noticeable given the existing delays in our rendering pipeline. I ran test_playback for several hours on my linux box, and could no longer trigger the intermittent failure.

Greenish on Try: http://tbpl.mozilla.org/?tree=Try&rev=54d99e77149e
Comment 28 Matthew Gregan [:kinetik] 2011-05-11 16:56:35 PDT
Comment on attachment 531773 [details] [diff] [review]
Patch: Round up wait times to 1ms to ensure we drop the decoder monitor when presenting video frames.

It might be better to perform this round up in Wait() itself.
Comment 29 Chris Pearce (:cpearce) 2011-05-11 17:45:00 PDT
Created attachment 531818 [details] [diff] [review]
Patch v2

(In reply to comment #28)
> It might be better to perform this round up in Wait() itself.

Probably a good idea. Looks like it's possible we could hit this same problem in the Wait() call while draining at the end of AudioLoop().
Comment 30 Chris Pearce (:cpearce) 2011-05-11 19:15:55 PDT
http://hg.mozilla.org/mozilla-central/rev/d15d6f06eeb6
Comment 31 Chris Pearce (:cpearce) 2011-05-14 16:36:31 PDT
I declare victory!
Comment 32 Treeherder Robot 2011-08-29 11:44:26 PDT
mfinkle%mozilla.com
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-Aurora/1314638597.1314639381.7045.gz
Rev3 WINNT 5.1 mozilla-aurora opt test mochitests-1/5 on 2011/08/29 10:23:17

s: talos-r3-xp-011
TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | application timed out after 330 seconds with no output
PROCESS-CRASH | /tests/content/media/test/test_playback.html | application crashed (minidump found)
Thread 28 (crashed)
Comment 33 Treeherder Robot 2011-12-03 02:14:11 PST
Ms2ger%gmail.com
https://tbpl.mozilla.org/php/getParsedLog.php?id=7722775&tree=Try
Rev3 Fedora 12x64 try debug test mochitests-1/5 on 2011-12-03 01:24:55

76291 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_playback.html | Test timed out.
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Thread 0 (crashed)

Note You need to log in before you can comment on or make changes to this bug.