Media Source: retain appended media to improve seeking backwards




5 years ago
4 years ago


(Reporter: strobe, Unassigned)


(Blocks: 1 bug)

34 Branch

Firefox Tracking Flags

(Not tracked)




5 years ago
In the current implementation of Media Source, media data is evicted as soon as it is played. This is a feature request to change that behavior, preferably to a model where eviction is based on time or memory pressure, so that a consistent sequence of appends will produce a more consistent set of buffered ranges regardless of the advancement or nonadvancement of the playback during the appends.

The current behavior of evicting immediately will degrade quality of experience; on YouTube, for instance, something like 20% of playbacks include a seek to a time smaller than the current time. If media data were preserved, these seeks would be nigh-instant. As it stands, they take a few seconds to complete (with workarounds for other current issues in place).

Immediately evicting data can also make implementations fragile. Observe Fx behavior in the 480p VP9 transcode on this video: The video features a few seconds of black frames, which are all coded as individual frames but are naturally essentially 0 bytes. The logo that appears at 3.042s kicks off a new keyframe. As a result, Firefox consumes the entire first segment in its first request for decode data, which immediately evicts the range [0, 3.042]. So for the first three seconds of playback of the 480p transcode, there is "apparently" no data covering that time range as exposed via Media Source. (In this case, playback is actually specified to stall, as per MSE 3.5.12 step 3 substep 5.)

This behavior also makes client implementations more complicated. Media Source client applications require knowledge about what is and is not present in a source buffer, and for maximum portability many require a series of workarounds for implementation quirks. Evicting already-played data this promptly compromises many of those workarounds. and are examples of this.

In Chrome, IE11, and Safari 8, as well as a huge number of embedded platforms, Source Buffer evictions happen as a result of memory pressure. I personally have had a lot of experience with this strategy, and while it's not perfect, it works quite well and results in more or less predictable behavior for both authors and implementers, even in the face of other platform issues such as the significant delay and long hidden buffers that are the hallmark of many embedded hardware pipelines.

It's true that the spec is hand-wavey about eviction algorithms, so this behavior isn't technically wrong (except insofar as the Source Buffer Monitoring algorithm would cause a pipeline stall in the example given in this bug). However, I still advocate for an informally similar behavior even if the full behavior isn't spec'd.

As a point of comparison, Chrome's behavior can be summarized at a high level as:

- While we're over the limit, and the earliest segment in the buffer does not contain the current time, remove the earliest segment.
- While we're over the limit, and the latest segment in the buffer does not contain the current time, remove the latest segment.
- If we're still over the limit, the segment that contains the current time is itself over the limit; throw a QUOTA_EXCEEDED error.

This works pretty well in practice.
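
For illustration, the three steps above can be sketched roughly as follows. This is only a minimal model of the summary in this comment, not Chrome's or Firefox's actual code: `Segment`, `Contains`, `TotalBytes` and `EvictToLimit` are hypothetical names, and a `false` return stands in for throwing the QUOTA_EXCEEDED error.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of one buffered media segment (not a real browser type).
struct Segment {
    double start;       // inclusive start time, in seconds
    double end;         // exclusive end time, in seconds
    std::size_t bytes;  // encoded size of the segment
};

// True when playback position t falls inside the segment.
bool Contains(const Segment& s, double t) {
    assert(s.start <= s.end);
    return s.start <= t && t < s.end;
}

// Sum of the encoded sizes of all buffered segments.
std::size_t TotalBytes(const std::deque<Segment>& buf) {
    std::size_t total = 0;
    for (const Segment& s : buf) total += s.bytes;
    return total;
}

// Evicts segments until the buffer fits within `limit` bytes.
// Returns false when the segment holding current_time is itself over the
// limit, which is the case the summary maps to a QUOTA_EXCEEDED error.
bool EvictToLimit(std::deque<Segment>& buf, double current_time, std::size_t limit) {
    // Step 1: drop the earliest segments that do not contain the current time.
    while (!buf.empty() && TotalBytes(buf) > limit && !Contains(buf.front(), current_time))
        buf.pop_front();
    // Step 2: drop the latest segments that do not contain the current time.
    while (!buf.empty() && TotalBytes(buf) > limit && !Contains(buf.back(), current_time))
        buf.pop_back();
    // Step 3: anything still over the limit is the current segment alone.
    return TotalBytes(buf) <= limit;
}
```

Note that with this model, already-played data survives for as long as the buffer stays under the limit, which is what makes backwards seeks cheap.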
Blocks: 778617
Ever confirmed: true

Comment 1

5 years ago
OK, so I just looked at the code, and it seems that this is mostly in place already, which is sweet - Fx is already evicting based on memory pressure, in theory. Yet it definitely appears to be evicting right up to currentTime in practice.

I think the reason for the observed behavior is this constant:
const int evict_threshold = 1000000;

For video, this is crazy-small and we'll basically always evict right up to the current time. In Chrome, the equivalent constant is 150MB for video, 12MB for audio:

In IE11, I've observed it to be well over 300MB, though I don't know that it's a hard-and-fast rule. I'll go discover what Safari does when I'm back in front of a Yosemite box.
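
To put the 1,000,000-byte threshold in perspective, here is a back-of-the-envelope sketch. The function name `SecondsBuffered` and the 2 Mbit/s bitrate used in the note below are assumptions chosen purely for illustration; real streams vary widely and are not constant-bitrate.

```cpp
#include <cassert>

// Approximate seconds of media that fit in `threshold_bytes`, assuming a
// constant bitrate. Both the helper and the constant-bitrate model are
// illustrative assumptions, not anything taken from the Firefox source.
double SecondsBuffered(double threshold_bytes, double bits_per_second) {
    assert(bits_per_second > 0.0);
    return threshold_bytes * 8.0 / bits_per_second;
}
```

At an assumed 2 Mbit/s, `SecondsBuffered(1000000, 2000000)` gives 4 seconds of video, while a 150MB limit (taken here as 150,000,000 bytes) gives 600 seconds.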

I'd recommend one or both of the following changes:

- Raise this constant to a larger size in bytes. YT currently tries to buffer ahead 20MB of video in auto quality mode, 60MB in manual quality, so maybe >=75MB to leave some breathing room for seekback in manual quality mode?*

- Add a padding constant (say 10 seconds) to the threshold by which media data around the current time is protected from eviction, to protect against any circumstance in which small offsets result in premature eviction of media data as with the example video.
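
A rough sketch of what the padding idea could look like, assuming a symmetric protected window around the current time. The names `kEvictionPaddingSeconds`, `BackwardEvictionLimit` and `MayEvict` are hypothetical and do not come from the Firefox source; the 10 second value is just the figure floated above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical padding; the suggestion above is "say 10 seconds".
constexpr double kEvictionPaddingSeconds = 10.0;

// Time before which already-played data may be evicted. Data in
// [returned value, current_time] stays protected.
double BackwardEvictionLimit(double current_time,
                             double padding = kEvictionPaddingSeconds) {
    assert(current_time >= 0.0 && padding >= 0.0);
    return std::max(0.0, current_time - padding);
}

// True when the range [start, end) lies entirely outside the protected
// window [current_time - padding, current_time + padding].
bool MayEvict(double start, double end, double current_time,
              double padding = kEvictionPaddingSeconds) {
    assert(start <= end && padding >= 0.0);
    return end <= current_time - padding || start >= current_time + padding;
}
```

With this in place, the [0, 3.042] segment from the example video would not be evictable while playback is still inside the first three seconds: `MayEvict(0.0, 3.042, 3.0)` is false.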

Hopefully this is simple enough to get cherrypicked into 33?

* To be clear, I'm not attempting to mandate a particular implementation choice, just making a suggestion that would happen to work if there aren't any objections. I'm sad that the YouTube player even has constants hardcoded at all for this but it's really hard to discover the eviction size dynamically (this is the "it's not perfect" alluded to above).

Some WIP here: bug 1049318 covers making the eviction less aggressive (basically does what Steve suggested), and bug 1049133 and bug 1050652 cover making the eviction offset estimation more accurate.

We have adjusted the values and have other bugs that relate to improving eviction. We'll close this one for now.
Last Resolved: 4 years ago
Resolution: --- → FIXED