Open Bug 1762976 Opened 3 years ago Updated 2 months ago

Interoperable <video> underflow

Categories

(Core :: Audio/Video: Playback, enhancement)

People

(Reporter: jrmuizel, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

See https://github.com/whatwg/html/issues/6359 for the spec side of this.

Blocks: twitch

Paul, I think we discussed this in a meeting previously, but I don't remember what our position was. What are your feelings about this?

Flags: needinfo?(padenot)

Just to be clear, the spec currently requires playback to be suspended when either the video or the audio track does not have corresponding data available (a rough illustration of the combined effect follows the list below):

  • HTMLMediaElement.buffered is an intersection of buffered from each object in MediaSource.activeSourceBuffers if MediaSource.readyState is not ended.
  • SourceBuffer.buffered is an intersection of track ranges for each "audio and video track buffer managed by this SourceBuffer" if MediaSource.readyState is not ended.
  • Coded Frame Processing for MSE affects HTMLMediaElement.readyState according to parsed (but not decoded) coded frames.
  • Similarly, SourceBuffer Monitoring suspends playback when "HTMLMediaElement.buffered contains a TimeRanges that ends at the current playback position and does not have a range covering the time immediately after the current position"
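
To make the intersection rule concrete, here is a rough model of how per-track buffered ranges combine into HTMLMediaElement.buffered. This is only an illustration for this bug (intersectRanges, TimeRange and the example numbers are made up), not how any engine actually implements it:

  type TimeRange = [start: number, end: number];

  // Intersect two sorted, non-overlapping range lists, roughly as the
  // spec's buffered-range intersection does for track buffers.
  function intersectRanges(a: TimeRange[], b: TimeRange[]): TimeRange[] {
    const out: TimeRange[] = [];
    let i = 0;
    let j = 0;
    while (i < a.length && j < b.length) {
      const start = Math.max(a[i][0], b[j][0]);
      const end = Math.min(a[i][1], b[j][1]);
      if (start < end) {
        out.push([start, end]);
      }
      // Advance past whichever range ends first.
      if (a[i][1] < b[j][1]) {
        i++;
      } else {
        j++;
      }
    }
    return out;
  }

  // With audio buffered to 20s but video only to 12s, the element's
  // buffered attribute ends at 12s, so the monitoring rules suspend
  // playback there even though audio data for 12..20s is available.
  const audioBuffered: TimeRange[] = [[0, 20]];
  const videoBuffered: TimeRange[] = [[0, 12]];
  console.log(intersectRanges(audioBuffered, videoBuffered)); // [[0, 12]]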

However, I expect the playback experience can be better, in at least some situations, if the rules are bent a little here.

If enough audio is available to continue playback, the most recent video frame can remain on screen while newer frames are not yet available. If the next frame does eventually arrive, then there is some non-zero maximum interval for which continued playback without video frame updates is preferable to suspension of playback.
Suspension of playback is a disruptive, negative experience.
Temporary suspension of video is very often less jarring than temporary suspension of audio.

The ideal maximum interval would depend on the media content, but even for content where video quality is more important than audio, it would almost always still be greater than zero due to the advantages of smoother playback.

If the next video frame does not arrive within the chosen maximum interval, then playback needs to be suspended even after audio has continued to play for a period of time. Once playback is suspended and sufficient video data does arrive, there is the option of resuming playback from the timestamp of the last presented video frame, i.e. repeating some of the audio. The experience would be no worse than current Firefox behavior, even for content where users would prefer to see every video frame. That would effectively mean halting currentTime as soon as data for the next video frame is no longer available, so such a change in behavior would be largely unnoticed by content.
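
As a conceptual sketch only (no such hook exists anywhere; MAX_VIDEO_UNDERFLOW_SECONDS, onVideoUnderflow and its arguments are invented names), the decision being proposed could look roughly like this:

  const MAX_VIDEO_UNDERFLOW_SECONDS = 0.5; // a single chosen value, as suggested below

  function onVideoUnderflow(lastVideoFrameTime: number,
                            audioBufferedEnd: number,
                            currentTime: number): "keep-playing" | "suspend" {
    const audioHeadroom = audioBufferedEnd - currentTime;
    const timeWithoutNewFrames = currentTime - lastVideoFrameTime;
    // Keep playing audio against a frozen frame, up to the chosen limit.
    if (audioHeadroom > 0 && timeWithoutNewFrames < MAX_VIDEO_UNDERFLOW_SECONDS) {
      return "keep-playing";
    }
    // Otherwise suspend; on resume the player could seek back to
    // lastVideoFrameTime, repeating some of the audio.
    return "suspend";
  }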

So I believe there are improvements we could make to the playback experience.
However, if WebCodecs would allow content to choose its preferred solution (and I assume it would), then there is a question of whether priority should be given to shipping WebCodecs.

Websites can indeed implement their own playback using WebAudio + WebCodecs + some custom JavaScript demuxers like this if they want, or implement a custom video sink via the mediacapture-transform API. Is that what comment 3 suggests?
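
For reference, a minimal sketch of the kind of custom video path meant here, assuming a demuxer (not shown) that supplies encoded samples; onChunkFromDemuxer and the codec string are placeholders, and audio via WebAudio is omitted:

  const canvas = document.querySelector("canvas")!;
  const ctx = canvas.getContext("2d")!;

  const decoder = new VideoDecoder({
    output: (frame) => {
      // The page, rather than the media element, decides when to present
      // each frame, so it can keep audio going while video is late.
      ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
      frame.close();
    },
    error: (e) => console.error(e),
  });
  decoder.configure({ codec: "vp09.00.10.08" }); // placeholder codec string

  // Called by the (not shown) demuxer for each encoded video sample.
  function onChunkFromDemuxer(data: BufferSource, timestampUs: number, isKey: boolean) {
    decoder.decode(new EncodedVideoChunk({
      type: isKey ? "key" : "delta",
      timestamp: timestampUs,
      data,
    }));
  }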

Thank you for the links. Those are the kind of solutions that I had in mind.
Comment 3 is asking a question about priority rather than suggesting an answer.

I usually prefer to give developers low level tools to do what they like, rather than providing higher level tools that may be easier to use but don't necessarily do things exactly as desired.

That depends, however, on whether those low-level tools (WebCodecs, etc.) can provide something as efficient as the higher-level one (MediaSource). I don't know whether that can be expected or not.

If the foreseeable low level tools are not going to be as efficient, or are going to have other difficulties such as less effective A/V sync, then they are not going to be a full solution here.

What about providing a property that controls how long playback is allowed to continue without video?

Flags: needinfo?(karlt)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

> What about providing a property that controls how long playback is allowed to continue without video?

That could plausibly be useful, and seems to have been proposed for "Support playback through unbuffered ranges". It wouldn't be a full solution to that spec issue, as there's still the question of where to resume from when data does arrive, which would depend on whether latency is more important than playback quality. There are also some finer details to fill in such as whether that would be a per-SourceBuffer or per-track property.
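
Purely as a hypothetical shape (nothing in the spec or in Firefox defines this; underflowTolerance is an invented name), the per-SourceBuffer variant might look like:

  const mediaSource = new MediaSource();
  mediaSource.addEventListener("sourceopen", () => {
    const videoBuffer =
      mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');
    // Invented attribute: keep playing for up to 500 ms after this buffer's
    // track data runs out before suspending playback.
    (videoBuffer as any).underflowTolerance = 0.5;
  });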

However, I don't feel we need to block progress on having a new property.
I expect we could pick a single value that would almost always lead to a better experience.

Flags: needinfo?(karlt)