Open Bug 1282260 Opened 8 years ago Updated 2 years ago

Intermittent dom/media/test/test_playback_rate.html | big.wav: We are effectively slowing down playback. (13.641, 9.287982)

Categories

(Core :: Audio/Video: Playback, defect, P5)

defect

Tracking

()

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Component: Audio/Video → Audio/Video: Playback
I am looking into migrating jobs from hardware to AWS instances, specifically for windows 7.  I run into this bug often:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a4482c92162a6208922211588f2fa399ffbefbe&group_state=expanded&filter-searchStr=vm%20mda

out of 100 jobs of mochitest-media-e10s (50 opt/ 50 debug), there are 20 failures, about 14 of them have this specific failure.

:k17e, can you find someone to help us debug this and figure it out?
Flags: needinfo?(ajones)
here is a clip from a log (win7 debug: http://archive.mozilla.org/pub/firefox/try-builds/jmaher@mozilla.com-7a4482c92162a6208922211588f2fa399ffbefbe/try-win32-debug/try_win7_vm-debug_test-mochitest-media-e10s-bm129-tests1-windows-build291.txt.gz):
00:15:46     INFO -  557 INFO TEST-FAIL | dom/media/test/test_playback_rate.html | The author of the test has indicated that flaky timeouts are expected.  Reason: untriaged
00:15:46     INFO -  558 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: Current time should not change when playbackRate is null (2.012273 2.012273).
00:15:46     INFO -  559 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: The element should not be in paused state.
00:15:46     INFO -  560 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: Paused event should not have been received.
00:15:46     INFO -  561 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: Playback rate should be clamped to 5.
00:15:46     INFO -  562 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: The playback rate should still be 5.
00:15:46     INFO -  563 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: The audio should be muted when playing at high speed, but should not appear as such.
00:15:46     INFO -  564 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.ogv: Current time should be equal to the duration (not change by playback rate).
00:15:46     INFO -  565 INFO [finished seek.ogv-1] remaining= big.wav-0
00:15:46     INFO -  566 INFO TEST-PASS | dom/media/test/test_playback_rate.html | [finished seek.ogv-1 t=9.556] Length of array should match number of running tests
00:15:46     INFO -  567 INFO TEST-PASS | dom/media/test/test_playback_rate.html | [started seek.webm-2 t=9.559] Length of array should match number of running tests
00:15:46     INFO -  568 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.webm: Pitch preservation should be enabled by default.
00:15:46     INFO -  569 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.webm: playbackRate should be initially 1.0
00:15:46     INFO -  570 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.webm: defaultPlaybackRate should be initially 1.0
00:15:46     INFO -  571 INFO TEST-PASS | dom/media/test/test_playback_rate.html | seek.webm: PlaybackRate should be clamped to 0.25.
00:15:46     INFO -  572 INFO TEST-PASS | dom/media/test/test_playback_rate.html | big.wav: If we disable the pitch preservation, it should appear as such.
00:15:46     INFO -  573 INFO TEST-PASS | dom/media/test/test_playback_rate.html | big.wav: The playback rate shoud be 0.25.
00:15:46     INFO -  574 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_playback_rate.html | big.wav: We are effectively slowing down playback. (13.592, 9.287982)
00:15:46     INFO -      ontimeupdate@dom/media/test/test_playback_rate.html:56:5
00:15:46     INFO -      EventListener.handleEvent*startTest@dom/media/test/test_playback_rate.html:150:3
00:15:46     INFO -      MediaTestManager/this.nextTest@dom/media/test/manifest.js:1630:7
00:15:46     INFO -      MediaTestManager/this.runTests/<@dom/media/test/manifest.js:1567:7
00:15:46     INFO -  575 INFO TEST-FAIL | dom/media/test/test_playback_rate.html | The author of the test has indicated that flaky timeouts are expected.  Reason: untriaged
00:15:46     INFO -  576 INFO [finished big.wav-0] remaining= seek.webm-2
00:15:46     INFO -  577 INFO TEST-PASS | dom/media/test/test_playback_rate.html | [finished big.wav-0 t=14.153] Length of array should match number of running tests
00:15:46     INFO -  578 INFO TEST-PASS | dom/media/test/test_playback_rate.html | [started gizmo.mp4-3 t=14.154] Length of array should match number of running tests
00:15:46     INFO -  579 INFO TEST-PASS | dom/media/test/test_playback_rate.html | gizmo.mp4: Pitch preservation should be enabled by default.
00:15:46     INFO -  --DOMWINDOW == 9 (0775F400) [pid = 5252] [serial = 449] [outer = 00000000] [url = about:blank]
00:15:47     INFO -  580 INFO TEST-PASS | dom/media/test/test_playback_rate.html | gizmo.mp4: playbackRate should be initially 1.0
00:15:47     INFO -  581 INFO TEST-PASS | dom/media/test/test_playback_rate.html | gizmo.mp4: defaultPlaybackRate should be initially 1.0
Hi Farmer,
Can you have this in your queue? It looks like the playback rate bug you solved last time.
Flags: needinfo?(ajones) → needinfo?(fatseng)
Yes, I would check it.
Flags: needinfo?(fatseng)
thank you!  This is easy to test on try:
 try: -b do -p win32 -u mochitest-media-e10s[Windows 7] -t none

that will run both windows 7, and windows 7-vm, the windows7-vm is where we see the problems happen.

do ping me if you have questions or issues regarding reproducing this.
Thanks Joel.
I could reproduce it from try server now and looking at it.
(In reply to Farmer Tseng[:fatseng] from comment #10)
> Thanks Joel.
> I could reproduce it from try server now and looking at it.
Have fun!
Assignee: nobody → fatseng
While playing audio, MediaFormatReader receives END_OF_STREAM event immediately. I'm digging out the root cause.
:fatseng, have you had any luck figuring this out?  Is there any more I can do to help get to the root cause?
Flags: needinfo?(fatseng)
To summarize my finding.
According to Joel Maher suggestion, ran try server by “try: -b do -p win32 -u mochitest-media-e10s[Windows 7] -t none”. 

I observed that it only happen in Windows 7 VM.
playback_rate.html set playback rate as 0.25 at beginning.
The duration of big.wav is about 9.2 seconds.
Therefore, audio engine should spend 36.8 seconds to finish playing.
I printed out the value of stream_delay.
stream_delay means how many frames remain in audio engine.
In failure case, when playing to 3.035294 position(duration: 9.287982), stream_delay still has 1076429 frames.
1076429 / 44100 = 24.408820 seconds
Afterward, stream_delay became to  -743 suddenly, audio engine play to end position.
It looks like there is a problem in Windows audio engine.

Here is my try result.
 https://treeherder.mozilla.org/#/jobs?repo=try&revision=da7e9da1b835&selectedJob=25456025

Here is full log.
https://archive.mozilla.org/pub/firefox/try-builds/fatseng@mozilla.com-da7e9da1b83536799ff2fded3ed6692ad8c42971/try-win32-debug/try_win7_vm_gfx-debug_test-mochitest-media-e10s-bm137-tests1-windows-build91.txt.gz

00:00:39     INFO -  cubeb --- get_position: 1569519, stream_delay: 1048383, stream_rate: 44100, mix_rate: 44100
00:00:39     INFO -  Farmer --- UpdatePlaybackPosition--audio/wave; codecs=1, 2954285, duration: 11848
00:00:39     INFO -  585 INFO TEST-PASS | dom/media/test/test_playback_rate.html | big.wav: Farmer -- The current time is 2.954285, duration is 9.287982.
00:00:39     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:39     INFO -  Farmer --- cubeb -- frames_written: 1573488, out_frames: 1323, output_frames_needed: 1323.
00:00:39     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:39     INFO -  Farmer --- cubeb -- frames_written: 1580103, out_frames: 1323, output_frames_needed: 1323.
00:00:40     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:40     INFO -  Farmer --- cubeb -- frames_written: 1586718, out_frames: 1323, output_frames_needed: 1323.
00:00:40     INFO -  DECODER_STATE_COMPLETED
00:00:40     INFO -  cubeb --- get_position: 1590687, stream_delay: 1062406, stream_rate: 44100, mix_rate: 44100
00:00:40     INFO -  Farmer --- UpdatePlaybackPosition--audio/wave; codecs=1, 2994790, duration: 12009
00:00:40     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:40     INFO -  Farmer --- cubeb -- frames_written: 1593333, out_frames: 1323, output_frames_needed: 1323.
00:00:40     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:40     INFO -  Farmer --- cubeb -- frames_written: 1599948, out_frames: 1323, output_frames_needed: 1323.
00:00:40     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:40     INFO -  Farmer --- cubeb -- frames_written: 1606563, out_frames: 1323, output_frames_needed: 1323.
00:00:40     INFO -  DECODER_STATE_COMPLETED
00:00:40     INFO -  cubeb --- get_position: 1611855, stream_delay: 1076429, stream_rate: 44100, mix_rate: 44100
00:00:40     INFO -  Farmer --- UpdatePlaybackPosition--audio/wave; codecs=1, 3035294, duration: 12170
00:00:40     INFO -  586 INFO TEST-PASS | dom/media/test/test_playback_rate.html | big.wav: Farmer -- The current time is 3.035294, duration is 9.287982.
00:00:40     INFO -  cubeb --- get_output_buffe frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
00:00:40     INFO -  Farmer --- cubeb -- frames_written: 1613178, out_frames: 1323, output_frames_needed: 1323.
00:00:40     INFO -  start draining frames_written: 1615080. stm->draining=true, out_frames: 579, output_frames_needed: 1323.
00:00:40     INFO -  cubeb --- get_output_buffe stm->draining is true, padding_out:3087
00:00:40     INFO -   cubeb --- get_output_buffe stm->draining is true, padding_out:1764
00:00:40     INFO -   cubeb --- get_output_buffe stm->draining is true, padding_out:441
00:00:40     INFO -   cubeb --- get_output_buffe stm->draining is true, padding_out:0
00:00:40     INFO -   get_output_buffer: CUBEB_STATE_DRAINED --- Farmer
00:00:40     INFO -  DecodedAudioDataSink::Drained
00:00:40     INFO -  cubeb --- wasapi_stream_render_loop --
00:00:40     INFO -  cubeb --- Last: pos=4283408, freq=352800, cur_pos=12.141179, max_pos=36.550000, delay=24.408821
00:00:40     INFO -            Now: pos=12926592, freq=352800, cur_pos=36.640000, max_pos=36.623129, delay=-0.016871
00:00:40     INFO -  cubeb --- get_position: 1615080, stream_delay: -743, stream_rate: 44100, mix_rate: 44100
00:00:40     INFO -  OnMediaSinkAudioComplete --- Farmer
00:00:40     INFO -  Farmer --- DECODER_STATE_COMPLETED: HasVideo: FALSE, mVideoCompleted: FALSE || HasAudio: TRUE, mAudioCompleted: TRUE --1
00:00:40     INFO -  Farmer --- DECODER_STATE_COMPLETED--audio/wave; codecs=1, 9287982
00:00:40     INFO -  Farmer --- UpdatePlaybackPosition--audio/wave; codecs=1, 9287982, duration: 12237
00:00:40     INFO -  DecodedAudioDataSink::Shutdown
Flags: needinfo?(fatseng)
I am not familiar with libcubeb and WASAPI.
Does any libcubeb expert help to check this?
I think kinetik is not on PTO at the moment.
Flags: needinfo?(kinetik)
stream_delay should never be negative; the caller of current_stream_delay converts it to a uint64_t.  It looks like the assert verifying this was changed in https://github.com/kinetiknz/cubeb/commit/3148466d261f411bb7ef9464dcd8a95373f0dafe but that change looks wrong to me.  That landed via bug 1251502 around 29th March.

So, how do we get into a situation where cur_pos is larger than max_pos?  We must be recording a different value in stm->frames_written to what we're feeding the audio stream, since we expect it to never advance beyond the last frame we sent it.

It looks like we're not handling the draining case correctly in refill - we update stm->frames_written with the value returned from the resampler, but in refill's caller (refill_callback_output) we're using output_frames rather than got when releasing the buffer, which will cause too many frames (probably of silence) to be played from the final buffer.

Farmer, would you mind trying to repro again with the attached (but untested!) patch?  Thanks.
Flags: needinfo?(kinetik)
I pushed patch to try, will update you on status.
thanks for the great work on this bug!
Still fails with patch.
I disable stream_delay in this try, will enable it re-do test.
https://treeherder.mozilla.org/logviewer.html#?job_id=25734301&repo=try#L78159

I observed that out_frames is 1323 in failure case.
In normal case, out_frames is most 441.
Not sure whether this is clue.

[Failure case]
INFO -  cubeb --- get_output_buffer frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
INFO -  Farmer --- cubeb -- frames_written: 1520568, out_frames: 1323, output_frames_needed: 1323.
INFO -  cubeb --- get_output_buffer frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
INFO -  Farmer --- cubeb -- frames_written: 1527183, out_frames: 1323, output_frames_needed: 1323.
INFO -  cubeb --- get_output_buffer frame_count:1323, output_buffer_frame_count:4410, padding_out:3087
INFO - Farmer --- cubeb -- frames_written: 1533798, out_frames: 1323, output_frames_needed: 1323. 

[Normal case]
INFO -  cubeb --- get_output_buffer frame_count:441, output_buffer_frame_count:4410, padding_out:3969
INFO -  Farmer --- cubeb -- frames_written: 134064, out_frames: 441, output_frames_needed: 441.
INFO -  cubeb --- get_output_buffer frame_count:441, output_buffer_frame_count:4410, padding_out:3969
INFO -  Farmer --- cubeb -- frames_written: 136710, out_frames: 441, output_frames_needed: 441.
INFO -  cubeb --- get_output_buffer frame_count:441, output_buffer_frame_count:4410, padding_out:3969
INFO -  Farmer --- cubeb -- frames_written: 138915, out_frames: 441, output_frames_needed: 441.
Flags: needinfo?(kinetik)
Working through this a bit...

Input File     : 'dom/media/test/big.wav'
Channels       : 1
Sample Rate    : 11025
Precision      : 8-bit
Duration       : 00:00:09.29 = 102400 samples ~ 696.599 CDDA sectors
File Size      : 102k
Bit Rate       : 88.2k
Sample Encoding: 8-bit Unsigned Integer PCM

We time stretch it to be 4x longer, so we'd expect to see 409600 frames, but we're resampling it via AudioConverter from 11025Hz to 44100Hz so we really expect to see 1638400 frames.

When libcubeb drains, it has been fed 1615080 frames (slightly less than the 1638400 we'd expect?), which is 36.62s of audio.

The test started at 23:44:15 and failed at 23:44:29, running for a total of 14s.

AudioStream::GetPosition reports the (adjusted) current timestamp is 3.05s * 4 = 12.2s on the wall clock, which is what libcubeb is reporting ((1615080-1077003)/44100 = 12.2s).

So that seems mostly correct, although the large stream_delay is surprising because I can't see how the playback cursor can ~24.42s behind the write cursor on the audio stream because the buffers should only be ~100ms.  stream_delay is supposed to capture extra delay beyond the buffer sizing (e.g. common for Bluetooth), but 24+s seems extraordinary... possibly an oddity of the VM's audio driver?

There's a big discrepancy between the stream_delay time and the time we fire the drain callback, causing the stream to end early and thus setting the currentTime of the audio element under test to the duration.

The drain callback timing is based on counting down the actual buffered data (by waiting for padding_out to drop to 0), which happens with ~5ms of AudioStream starting the drain sequence.  If the stream_delay is valid, we need to adjust the drain callback timing to fire when the audio buffers have emptied *and* account for the stream delay.  But we need to establish whether the stream_delay calculation is valid or if there's something going wrong there, first.
(In reply to Farmer Tseng[:fatseng] from comment #23)
> I observed that out_frames is 1323 in failure case.
> In normal case, out_frames is most 441.
> Not sure whether this is clue.

It's related but I don't know what it means.  Maybe there's a bug in the VM's audio drivers?  I can't see what we're doing differently between normal and broken streams as far as configuring them.

For regular media playback (i.e. not low latency WebAudio/WebRTC) we request 100ms of latency, query the buffer size, and refill based on what we're asked to (via GetCurrentPadding) so we don't control the buffer sizes directly.

Normally, we're getting a buffer of 4410 and callbacks every 441 frames, implying 10x blocks of data.  In the growing stream_delay case, we're getting a buffer of 4410 and callbacks every 1323 frames, which works out to be blocks of ~3.33x, which seems suspicious since it's not evenly divisible.

Curiously, the low-latency streams we create (we request 10ms, but get 30ms which is the minimum for shared-mode WASAPI streams) have a max buffer of 1323 and callbacks every 441 frames (3x blocks in the buffer).  Not sure if this is a coincidence or a clue.

There does seem to be a strong correlation between stream_delay growing with streams that have an out_frames of 1323.  One of the logs I looked at had 3 cases of this (the other two don't depend on the timing of the drain, so don't fail the test), and many cases where our 100ms latency stream would have the regular 441 frame sized blocks.

I pushed a new patch which changes the drain timing to use the audio clock... I expect this to fix the test failure, but it doesn't explain the strange buffer/stream_delay behaviour we're seeing here.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cccb3c7c5095
Flags: needinfo?(kinetik)
Despite trying to delay the drain callback to fire at the correct time, we're still failing on this test.  The logs show:

23:42:51     INFO -  08C06400: start draining.
23:42:51     INFO -  08C06400: draining: padding_out=0 fpos=537764 maxpos=1615080
23:42:51     INFO -  08C06400: draining: padding_out=0 fpos=1615080 maxpos=1615080

So somehow the audio clock is jumping from ~12.2s to ~36.6 in under a second and we still fire the drain callback early.  I'm out of ideas at this stage.
should we consider disabling this test for windows debug?
Assignee: fatseng → nobody
(In reply to Joel Maher ( :jmaher) from comment #29)
> should we consider disabling this test for windows debug?

It's failing on opt too.  I'd feel more comfortable disabling if we had a better idea what's going wrong.
:kinetic- last week we had no failures on integration/trunk branches, did something change to fix this?  possibly something around Sept 22nd or 23rd?
Flags: needinfo?(kinetik)
(In reply to Joel Maher ( :jmaher) from comment #40)
> :kinetic- last week we had no failures on integration/trunk branches, did
> something change to fix this?  possibly something around Sept 22nd or 23rd?

Nothing I'm aware of...

There was a libcubeb (the audio API) update around the 23rd in bug 1286041, but the only changes were in the PulseAudio backend.  I can't see anything obviously related in the rest of the media or test changes, either.
Flags: needinfo?(kinetik)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: