Bug 761917 - Investigate Linux test timeouts now that cubeb has landed
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Audio/Video
Version: Trunk
Platform: All Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Matthew Gregan [:kinetik]
Mentors:
Depends on:
Blocks: 684125 684174 693067 708343 709557 709558 780490 780491 780492
Reported: 2012-06-05 22:45 PDT by Matthew Gregan [:kinetik]
Modified: 2012-11-12 18:15 PST
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch v0 (1.41 KB, patch)
2012-08-16 21:45 PDT, Matthew Gregan [:kinetik]
cajbir.bugzilla: review+

Description Matthew Gregan [:kinetik] 2012-06-05 22:45:21 PDT
The ALSA backend for cubeb includes a watchdog to kill streams that are running but never marked as ready by ALSA.  This was put in place to work around a large number of test timeouts seen during Linux mochitest-1 runs.

I was reasonably confident that this workaround was sufficient, but we're still seeing frequent timeouts that seem to be caused by the problem the watchdog is supposed to work around.  I need to investigate the cause of the test failures and verify that the watchdog is functioning correctly after the last-minute rework of the ALSA backend before landing.
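
For illustration, the watchdog idea described above can be modelled roughly as follows. This is a minimal Python sketch, not cubeb's actual C implementation: the `Stream` class, the `ERROR` state, the function names, and the 10-second timeout are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()
    RUNNING = auto()
    ERROR = auto()  # hypothetical state for a stream killed by the watchdog


WATCHDOG_TIMEOUT = 10.0  # seconds; hypothetical value chosen for the sketch


@dataclass
class Stream:
    state: State = State.INACTIVE
    last_ready: float = 0.0  # time the backend last reported the stream as ready


def start(stream: Stream, now: float) -> None:
    """Start the stream and give it a full timeout period to become ready."""
    stream.state = State.RUNNING
    stream.last_ready = now


def mark_ready(stream: Stream, now: float) -> None:
    """Record that the backend reported the stream as ready."""
    stream.last_ready = now


def watchdog_check(stream: Stream, now: float) -> bool:
    """Kill a running stream that has not been ready within the timeout.

    Returns True if the stream was killed by this check.
    """
    if stream.state is State.RUNNING and now - stream.last_ready > WATCHDOG_TIMEOUT:
        stream.state = State.ERROR
        return True
    return False
```

A periodic caller would invoke `watchdog_check` for every active stream; a stream that keeps being marked ready is never touched.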
Comment 1 Matthew Gregan [:kinetik] 2012-06-05 23:40:36 PDT
Failure with some logging: https://tbpl.mozilla.org/php/getParsedLog.php?id=12406755&tree=Try

Not quite enough logging to be sure what's wrong, but it looks as if stream 0x9a628f20 entered DRAINING but was then put into state INACTIVE before the drain callback ran.

Streams start INACTIVE, and once RUNNING only return to INACTIVE state by a call to cubeb_stream_stop or a successful completion of DRAINING state.

For nsBufferedAudioStream::DataCallback to return fewer frames than aFrames (and cause cubeb to put the stream into the DRAINING state), the nsBufferedAudioStream must have been put into its DRAINING state by a call to Drain() (which then waits for a DRAINING -> DRAINED transition caused by nsBufferedAudioStream::StateCallback).

Given that nsBufferedAudioStream's Pause (which calls cubeb_stream_stop) and Drain are both called from a single thread, and Drain is blocking, it seems impossible that a call to cubeb_stream_stop caused the invalid DRAINING -> INACTIVE transition.
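
The expected stream lifecycle described in this comment can be written down as a small transition table, which makes the suspicious DRAINING -> INACTIVE step easy to spot when replaying a log. This is only a Python model for illustration: the event names are hypothetical labels, and treating a stop during DRAINING as unexpected is an assumption based on the reasoning above, not something taken from the cubeb source.

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()
    RUNNING = auto()
    DRAINING = auto()


# Event names are hypothetical labels for this sketch.
TRANSITIONS = {
    (State.INACTIVE, "start"): State.RUNNING,
    (State.RUNNING, "stop"): State.INACTIVE,        # cubeb_stream_stop
    (State.RUNNING, "short_data"): State.DRAINING,  # data callback returned fewer frames than requested
    (State.DRAINING, "drained"): State.INACTIVE,    # drain completed successfully
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for a transition not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event!r} in state {state.name}") from None


def replay(events, state: State = State.INACTIVE) -> State:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Replaying `["start", "short_data", "stop"]` raises `ValueError`, which corresponds to the transition that the log appeared to show.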
Comment 2 Matthew Gregan [:kinetik] 2012-08-15 23:35:40 PDT
My debugging in comment 1 was off; it looks like the main problem is simply that it was possible to restart a cubeb_stream that had been disabled, which effectively disabled the watchdog workaround for broken ALSA PCMs.
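
As a rough illustration of the kind of guard this implies (a Python sketch only, not the attached patch; the `Stream` class, the `ERROR` state, and the method names are assumptions made for the example), a start call can refuse to revive a stream that has already been disabled:

```python
from enum import Enum, auto


class State(Enum):
    INACTIVE = auto()
    RUNNING = auto()
    ERROR = auto()  # hypothetical state for a stream disabled by the watchdog


class Stream:
    def __init__(self) -> None:
        self.state = State.INACTIVE

    def disable(self) -> None:
        """What the watchdog would do to a stream stuck on a broken PCM."""
        self.state = State.ERROR

    def start(self) -> bool:
        """Start the stream; refuse if it has been disabled.

        Without this check, a later start would move a disabled stream
        back to RUNNING and undo the watchdog's work.
        """
        if self.state is State.ERROR:
            return False
        self.state = State.RUNNING
        return True
```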

So here's a try run with a possible fix: https://tbpl.mozilla.org/?tree=Try&rev=6108e1936d71

The test_played and test_bug495145 failures are both waiting for tests that use seek.webm, which doesn't contain an audio track, to complete.  There are also a couple of the usual failures in test_error_in_video_document and test_bug726904, neither related to audio.

The only one that's a mystery, and that I don't recall ever seeing before, is the crash when shutting down the tests.  It's audio related (looks like a null deref in AudioLoop), and I don't have any immediate ideas as to the cause.  I'll push some more debugging to try.
Comment 3 Matthew Gregan [:kinetik] 2012-08-15 23:58:04 PDT
https://tbpl.mozilla.org/?tree=Try&rev=e2defc70a05a
Comment 4 Matthew Gregan [:kinetik] 2012-08-16 21:45:52 PDT
Created attachment 652668
patch v0

Try's green so far (~120-130 green M1s), but it's going to take forever to get the rest of the results with the current try queue.  This fix definitely improves things, so we might as well land it.  I'll reenable the disabled Linux tests in their respective bugs after this lands.
Comment 5 Matthew Gregan [:kinetik] 2012-08-16 22:11:32 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/221388169ca7

Leave this open for now; I'll follow it up later.
Comment 6 Ed Morley [:emorley] 2012-08-17 05:26:45 PDT
https://hg.mozilla.org/mozilla-central/rev/221388169ca7
Comment 7 Matthew Gregan [:kinetik] 2012-08-19 16:58:45 PDT
(In reply to Matthew Gregan [:kinetik] from comment #5)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/221388169ca7

So I've got 1356 runs of the content/media parts of M1 here (441 Linux opt, 447 Linux debug, 244 Linux64 opt, 254 Linux64 debug).  There were 19 M1 failures (2 each on the 64-bit builds, the rest on 32-bit): 7 were a mix of test_bug726904 and test_error_in_video_document and are obviously unrelated to audio, 1 is a null deref in MediaStreamGraphImpl::RemoveStream, and the remaining 11 were test timeouts.

Of the test timeouts, 8 were test_played failing on seek.webm#27 (which has no audio), 3 were test_bug495145 on seek.webm{3a,3b}, and 2 were test_bug493187 (seek.webm#4).

The shutdown crash I mentioned in comment 2 hasn't shown up; either the debug logging I added hid it, or the debug logging I removed from my earlier patch caused it (unlikely).

Based on that, I'm calling this fixed.
