Closed Bug 1024858 Opened 6 years ago Closed 6 years ago

SourceBuffer switching hangs in MSR::InitializePendingDecoders due to data starvation


(Core :: Audio/Video, defect)

(Reporter: kinetik, Assigned: kinetik)


(Blocks 1 open bug)



(1 file, 1 obsolete file)

Previous testing always had perfect alignment or a slight overlap.
I can't reproduce the bug as described, and I can't see how it can happen in the current code, so I'm morphing this into a similar bug that I can reproduce (which may in fact be this bug, incorrectly diagnosed originally).

Steps to reproduce:
1. Start a YT MSE video playing and open the stats for nerds display
2. Force it to low quality (240p) playback
3. Once it switches, quickly switch back to auto (to avoid buffering too much at 240p)
4. Sometimes the playback will hang when attempting to switch

What I'm seeing in the MSE logs at step 4 is a new initialization segment appended to an existing SourceBuffer causing a new decoder to start up.  The player appends only enough to trigger the initialization segment switch logic (~250 bytes) and never appends more data.  The decoder thread then hangs inside MSR::InitializePendingDecoders (called from an event dispatched by EnqueueDecoderInitialization) while waiting for enough data to return from ReadMetadata.
Summary: MSE doesn't switch playback to next SourceBuffer when there's a ~1 frame gap between buffers → SourceBuffer switching hangs in MSR::InitializePendingDecoders due to data starvation
tl;dr: Fx MSE clears 'buffered' on adaptation.

- Background -

The MSE Source Buffer Monitoring algorithm is implemented partially or incorrectly in a number of browsers, and as a result the YT player has a number of workarounds. One of these is based on the observation that most media stacks sync to the audio clock. If the video buffered range is, say, [0, 10] but the audio buffered range is [0, 20], the spec says that we should hit HAVE_CURRENT_DATA at t=10 and stall, but most platforms keep on playing until t=20, displaying no video or even corrupt video during [10, 20]. So instead, the YT player withholds audio appends until that portion of the timeline has been covered by video appends; in other words, it'll only append audio past t=10 once the video buffered ranges expand past [0, 10].
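The audio-withholding workaround described above can be sketched roughly as follows. This is not the YT player's code: `findRangeContaining` and `audioAppendLimit` are hypothetical names, and ranges are plain `[start, end]` pairs in seconds standing in for a browser `TimeRanges` object.

```javascript
// Hypothetical helper: return the [start, end] pair that contains `time`,
// or null if no buffered range covers it.
function findRangeContaining(ranges, time) {
  for (const [start, end] of ranges) {
    if (time >= start && time <= end) return [start, end];
  }
  return null;
}

// Hypothetical gate: audio may be appended only up to the end of the video
// range that contains the current media time. Returns null when the current
// time is not covered by any video range, i.e. no audio append is allowed.
function audioAppendLimit(videoRanges, currentTime) {
  const range = findRangeContaining(videoRanges, currentTime);
  return range ? range[1] : null;
}
```

With video buffered as [0, 10] and the playhead at t=5, this sketch allows audio up to t=10. If the playhead is not inside any video range, it allows nothing.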

This workaround uses the current media time to identify the buffered time range corresponding to the appropriate segment. On a spec-compliant browser, this would be safe, as the source buffer monitoring algorithm prohibits playback unless the current time is a member of the buffered ranges of all active source buffers.
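As a simplified illustration of the rule that playback requires the current time to fall inside the buffered ranges of every active source buffer, consider the check below. It deliberately ignores the spec's distinct readyState levels; `isPlaybackAllowed` is a hypothetical name, and each buffer is modeled as a list of `[start, end]` pairs in seconds.

```javascript
// Simplified model, not the spec algorithm itself: playback may continue only
// if every active source buffer has data at and beyond the current time.
// The end bound is exclusive because sitting exactly at the end of a range
// leaves nothing further to play.
function isPlaybackAllowed(currentTime, activeBuffers) {
  return activeBuffers.every((ranges) =>
    ranges.some(([start, end]) => currentTime >= start && currentTime < end)
  );
}

const videoRanges = [[0, 10]];
const audioRanges = [[0, 20]];
const playingAt5 = isPlaybackAllowed(5, [videoRanges, audioRanges]);
const playingAt12 = isPlaybackAllowed(12, [videoRanges, audioRanges]);
```

Under this model `playingAt5` is true and `playingAt12` is false, matching the example of video [0, 10] and audio [0, 20] stalling at t=10.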

- Problem -

In Firefox, appending a new initialization segment immediately clears all buffered ranges for the source buffer. Any media data appended subsequently starts a new buffered time range.

Here's an example scenario where this produces the observed results:

- Initial format is chosen to be 240p.

- Video is appended from [0,15]. Because the duration is not added to the buffered ranges, videoSourceBuffer.buffered.end(0) has the value '14.98'.

- Audio is appended from [0,15], using linear interpolation to estimate the byte offset that corresponds to 15s within the chunk. Because this method is imperfect, audioSourceBuffer.buffered.end(0) is '14.9'. (The alternative, a full sample-accurate parser in JS, would be excessive.)

- 360p is selected. A new initialization segment is appended for 360p. videoSourceBuffer.buffered.length is 0.

- A video segment from [15, 25] is appended.

--> If buffered ranges were retained and merged here into a single segment, everything would be fine.

- Playback advances, but because the current time (in the range 0 -> 14.9) and the video buffered ranges (in the range 15 to 25) never intersect, the logic which allows audio data to be appended is never satisfied.

- Video playback proceeds to 14.9 and stalls hard due to lack of data.

- Occasionally, something like a tab switch will cause the decoder to pull the next available frame, which happens to be at 15s.

- Current time is now set to 15s, which intersects with the current buffered range and allows appending of audio. Playback resumes.
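The scenario above can be replayed with a small model. This is only a sketch: `ClearingVideoBuffer` is a hypothetical stand-in for the behavior described in this comment, not Firefox code, and ranges are plain `[start, end]` pairs in seconds.

```javascript
// True if any range covers time t (inclusive bounds).
function contains(ranges, t) {
  return ranges.some(([start, end]) => t >= start && t <= end);
}

// Hypothetical model of the reported behavior: appending an initialization
// segment drops every previously buffered range.
class ClearingVideoBuffer {
  constructor() { this.ranges = []; }
  appendInitSegment() { this.ranges = []; }
  appendMedia(start, end) { this.ranges.push([start, end]); }
}

const video = new ClearingVideoBuffer();
video.appendInitSegment();     // 240p initialization segment
video.appendMedia(0, 14.98);   // video appended for [0, 15]
video.appendInitSegment();     // 360p initialization segment: ranges cleared
video.appendMedia(15, 25);     // new range starts at 15

// Sample the playhead every 0.1 s over the audio that was buffered (0 to 14.9)
// and record the times at which the audio gate would be satisfied.
const allowedBeforeStall = [];
for (let i = 0; i <= 149; i++) {
  const t = i / 10;
  if (contains(video.ranges, t)) allowedBeforeStall.push(t);
}

// After the decoder jumps to the next available frame at 15 s.
const allowedAfterJump = contains(video.ranges, 15);
```

In this model `allowedBeforeStall` stays empty, so no audio append is ever permitted before the stall at 14.9, while `allowedAfterJump` is true once the current time moves to 15.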

- Suggested fix -

Don't discard all media data on adaptation, but instead merge new data with existing data in the media timeline.

Under the Source Buffer Monitoring algorithm, evicting buffered data on a new initialization segment would immediately result in a playback stall and thus make seamless resolution switching impossible. I therefore believe that the problem here is with Fx's implementation of Media Source, and not the way the YT player is using it. Let me know if you have a different opinion and we can discuss.
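A minimal sketch of the suggested merge, assuming ranges are plain `[start, end]` pairs in seconds. The `fuzz` tolerance is an assumption added here so that a range ending at 14.98 joins one starting at 15; `addRange` is a hypothetical name, not an MSE or Gecko API.

```javascript
// Insert [start, end] into a list of non-overlapping ranges, merging it with
// any existing range that overlaps it or lies within `fuzz` seconds of it.
// Returns a new list sorted by start time.
function addRange(ranges, start, end, fuzz = 0.05) {
  const merged = [];
  let cur = [start, end];
  for (const r of [...ranges].sort((a, b) => a[0] - b[0])) {
    if (r[1] + fuzz < cur[0]) {
      merged.push(r);            // r lies entirely before the pending range
    } else if (cur[1] + fuzz < r[0]) {
      merged.push(cur);          // pending range lies entirely before r
      cur = r;
    } else {
      // Overlapping or adjacent within the tolerance: grow the pending range.
      cur = [Math.min(cur[0], r[0]), Math.max(cur[1], r[1])];
    }
  }
  merged.push(cur);
  return merged;
}

const afterSwitch = addRange([[0, 14.98]], 15, 25);
```

Here `afterSwitch` is a single range from 0 to 25, so a playhead anywhere in the 240p data still falls inside the video buffered range after the 360p append.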

- Repro tools -

It's possible to observe this manually by wiring up an adaptation example, which I'll do in a bit.

Here's a Greasemonkey script which configures the initial byterate used by the HTML5 player. This can be used to trigger various kinds of adaptive behavior near startup.


// ==UserScript==
// @name        Set initial bandwidth
// @namespace   ytl
// @description Set initial bandwidth on YT.
// @include*
// @include*
// @version     1
// @grant       none
// @run-at      document-start
// ==/UserScript==

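unsafeWindow.console.log('Setting initial bandwidth');
// The stored value is a JSON wrapper whose `data` field is itself a JSON string.
unsafeWindow.localStorage['yt-player-bandwidth'] = JSON.stringify({
  data: JSON.stringify({
    delay: 0,
    tailDelay: 0,
    byterate: 40000
  }),
  // Assumption: the left operand was lost when this comment was captured;
  // Date.now() makes the entry expire one week from now.
  expiration: Date.now() + (3600 * 24 * 7 * 1000)
});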

Thanks Steve, that comment and the two new bugs you filed are immensely helpful!
Also bug 1050083 for WebM buffered fixes.
Reimplement @buffered for media elements using a MediaSource in terms of the
active source buffers (matching MSE spec).  Also include buffered ranges
from discarded decoders in SourceBuffer @buffered.
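For reference, the MSE spec derives the media element's buffered ranges from the intersection of the active source buffers' ranges. The sketch below shows only that intersection step and omits the spec's handling of the "ended" state; `intersectTwo` and `mediaElementBuffered` are hypothetical names, not functions from the patch, and each buffer is a sorted list of `[start, end]` pairs in seconds.

```javascript
// Intersect two sorted, non-overlapping range lists with a two-pointer sweep.
function intersectTwo(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const start = Math.max(a[i][0], b[j][0]);
    const end = Math.min(a[i][1], b[j][1]);
    if (start < end) out.push([start, end]);
    // Advance whichever range finishes first.
    if (a[i][1] < b[j][1]) i++; else j++;
  }
  return out;
}

// Intersection across all active source buffers; empty if there are none.
function mediaElementBuffered(activeBuffers) {
  if (activeBuffers.length === 0) return [];
  return activeBuffers.reduce(intersectTwo);
}

const elementBuffered = mediaElementBuffered([[[0, 25]], [[0, 14.9]]]);
```

With video buffered as [0, 25] and audio as [0, 14.9], `elementBuffered` is the single range [0, 14.9].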
Attachment #8470613 - Flags: review?(cajbir.bugzilla)
Attachment #8469880 - Attachment is obsolete: true
Attachment #8470613 - Flags: review?(cajbir.bugzilla) → review+
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34