Closed Bug 761917 Opened 12 years ago Closed 12 years ago

Investigate Linux test timeouts now that cubeb has landed

Categories: Core :: Audio/Video (defect)
Platform: All, Linux
Priority: Not set
Severity: normal

RESOLVED FIXED

People

(Reporter: kinetik, Assigned: kinetik)

Attachments

(1 file)

The ALSA backend for cubeb includes a watchdog to kill streams that are running but never marked as ready by ALSA.  This was put in place to work around a large number of test timeouts seen during Linux mochitest-1 runs.

I was reasonably confident that this workaround was sufficient, but we're still seeing frequent timeouts that seem to be caused by the problem the watchdog is supposed to work around.  I need to investigate the cause of the test failures and verify that the watchdog is functioning correctly after the last-minute rework of the ALSA backend before landing.
Blocks: 684125, 693067, 684174
Failure with some logging: https://tbpl.mozilla.org/php/getParsedLog.php?id=12406755&tree=Try

Not quite enough logging to be sure what's wrong, but it looks as if stream 0x9a628f20 entered DRAINING but was then put into state INACTIVE before the drain callback ran.

Streams start INACTIVE and, once RUNNING, return to the INACTIVE state only via a call to cubeb_stream_stop or successful completion of the DRAINING state.

For nsBufferedAudioStream::DataCallback to return less than aFrames (and cause cubeb to put the stream into the DRAINING state), the nsBufferedAudioStream must have been put into its DRAINING state by a call to Drain() (which then waits for a DRAINING -> DRAINED transition caused by nsBufferedAudioStream::StateCallback).

Given that nsBufferedAudioStream's Pause (which calls cubeb_stream_stop) and Drain are both called from a single thread, and Drain is blocking, it seems impossible that a call to cubeb_stream_stop caused the invalid DRAINING -> INACTIVE transition.
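For reference, the stream state transitions described in this comment can be written down as a small table-like function. This is only a sketch under assumptions: the enum and function names below are made up for illustration and are not the cubeb ALSA backend's actual code, and it assumes that a stop request during DRAINING is not a valid way back to INACTIVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names mirroring the ones used in this bug;
   the real backend may have more states. */
enum stream_state { INACTIVE, RUNNING, DRAINING };

/* Illustrative events: start/stop requests, a data callback that
   returned fewer frames than requested, and drain completion. */
enum stream_event { EV_START, EV_STOP, EV_SHORT_REFILL, EV_DRAIN_COMPLETE };

/* Applies `ev` to `*state`. Returns true and updates `*state` if the
   transition is one of those described above, false otherwise. */
static bool
stream_transition(enum stream_state * state, enum stream_event ev)
{
  assert(state != NULL);
  switch (*state) {
  case INACTIVE:
    if (ev == EV_START) { *state = RUNNING; return true; }
    break;
  case RUNNING:
    if (ev == EV_STOP) { *state = INACTIVE; return true; }
    if (ev == EV_SHORT_REFILL) { *state = DRAINING; return true; }
    break;
  case DRAINING:
    /* Assumption: only a completed drain leaves DRAINING. */
    if (ev == EV_DRAIN_COMPLETE) { *state = INACTIVE; return true; }
    break;
  }
  return false;
}
```

Under this model the sequence seen in the log (DRAINING, then INACTIVE, with no drain callback in between) has no corresponding valid event, which is why it looked impossible.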
Blocks: 780490, 780491, 780492
My debugging in comment 1 was off; it looks like the main problem is simply that it was possible to restart a cubeb_stream that had been disabled, which effectively disabled the watchdog workaround for broken ALSA PCMs.
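To illustrate the failure mode, here is a minimal sketch of a start path that refuses to restart a stream the watchdog has disabled. It is not the attached patch: the struct, the DISABLED state name, and all function names are hypothetical and chosen only to show the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; SK_DISABLED marks a stream the watchdog gave up on. */
enum sketch_state { SK_INACTIVE, SK_RUNNING, SK_DISABLED };

struct sketch_stream {
  enum sketch_state state;
};

/* Called by the (hypothetical) watchdog when the PCM never becomes ready. */
static void
sketch_watchdog_disable(struct sketch_stream * stm)
{
  assert(stm != NULL);
  stm->state = SK_DISABLED;
}

/* Returns 0 on success, -1 if the stream must not be started.
   Without the SK_DISABLED check, a later start request would put a
   broken stream back into SK_RUNNING and undo the watchdog's work. */
static int
sketch_stream_start(struct sketch_stream * stm)
{
  assert(stm != NULL);
  if (stm->state == SK_DISABLED) {
    return -1;
  }
  stm->state = SK_RUNNING;
  return 0;
}
```

The point of the check is that disabling is meant to be terminal for that stream, so a caller that pauses and resumes cannot accidentally revive it.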

So here's a try run with a possible fix: https://tbpl.mozilla.org/?tree=Try&rev=6108e1936d71

The test_played and test_bug495145 failures are both timeouts waiting for tests that use seek.webm, which doesn't contain an audio track, to complete.  There are also a couple of the usual failures in test_error_in_video_document and test_bug726904, neither related to audio.

The only one that's a mystery, and that I don't recall ever seeing before, is the crash when shutting down the tests.  It's audio related (looks like a null deref in AudioLoop), and I don't have any immediate ideas as to the cause.  I'll push some more debugging to try.
Attached patch: patch v0
Try's green so far (~120-130 green M1s), but it's going to take forever to get the rest of the results with the current try queue.  This fix definitely improves things, so we might as well land it.  I'll reenable the disabled Linux tests in their respective bugs after this lands.
Attachment #652668 - Flags: review?(chris.double)
Attachment #652668 - Flags: review?(chris.double) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/221388169ca7

Leave this open for now, I'll follow it up later.
Whiteboard: [leave open]
(In reply to Matthew Gregan [:kinetik] from comment #5)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/221388169ca7

So I've got 1356 runs of the content/media parts of M1 here (441 Linux opt, 447 Linux debug, 244 Linux64 opt, 254 Linux64 debug).  There were 19 M1 failures (2 each on the 64-bit builds, the rest on 32-bit): 7 were a mix of test_bug726904 and test_error_in_video_document and are obviously unrelated to audio, 1 is a null deref in MediaStreamGraphImpl::RemoveStream, and the remaining 11 were test timeouts.

Of the test timeouts, 8 were test_played failing on seek.webm#27 (which has no audio), 3 were test_bug495145 on seek.webm{3a,3b}, and 2 were test_bug493187 (seek.webm#4).

The shutdown crash I mentioned in comment 2 hasn't shown up; either the debug logging I added hid it, or the debug logging I removed from my earlier patch caused it (unlikely).

Based on that, I'm calling this fixed.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Blocks: 709558
Blocks: 709557
Blocks: 708343
Depends on: 786539
No longer depends on: 786539
Blocks: 798440
No longer blocks: 798440