The duration of a WebM frame is tricky to determine. In general, we use the start time of the next frame as the end time of the current frame. That isn't possible for the last frame in a stream; there, we can use the media's duration, provided the media specifies one and we know we're at the end of the stream. For any other case where the next frame's data isn't available, we guess the duration by reusing the duration of the preceding frame. That mostly works for video, because most video has a fixed frame rate.

A quick survey of my local WebM files reveals a mixture of files with fixed-duration audio frames and files with varying-duration audio frames. Picking a file with varying-duration audio frames at random (with a video frame rate of 30 FPS) reveals:

 Count | Duration (ms)
-------+---------------
  1916 | 11
  2977 | 12
  1759 | 17
  1326 | 18
 11197 | 23
  3108 | 24

...so the chance of calculating an incorrect audio frame duration is high, with a worst-case error of 13 ms for this file. This isn't a major issue for regular playback, but it becomes a more serious problem in MSE, where data is appended by script in chunks and we must calculate timestamps as the data is appended.

The following places currently use the "preceding frame duration" code:

WebMBufferedState::CalculateBufferedForRange
WebMContainerParser::ParseStartAndEndTimestamps
WebMReader::DecodeVideoFrame
Updated summary. This is true for video frames too, both for variable-rate video and because of timecode resolution errors (see bug 1064705 comment 5).
Summary: Frame duration of WebM audio frames can be inaccurate → Estimated frame duration of WebM frames can be inaccurate
Blink uses the maximum difference in inter-frame timestamps, falling back to kDefaultBufferDurationInMs = 125 if no such data is available. It may be that this is never used for increasing the MediaSource duration, and that only presentation timestamps increase it; FrameProcessor::ProcessFrame bails early if there is no frame duration.
This does something pretty similar to what Chrome does, using the largest frame duration (within the block of data that was appended) as the duration of the final frame. This is likely to be an over-estimate, but if we attempt to read past the real end (either by playback or seeking), we'll get EOS, and cajbir's patch from bug 1089937 will transition us to the next reader. This adds some overhead since we iterate over all the added frames, but I don't think that should matter much.
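The "largest duration" estimate amounts to something like the following sketch (function name and representation are hypothetical; the real patch operates on the parsed frames of the appended block):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// For the final frame of an appended block, use the largest inter-frame
// gap observed within that block as the estimated duration. For
// varying-duration audio frames this is likely an over-estimate.
int64_t LastFrameDurationUs(const std::vector<int64_t>& startTimesUs) {
  int64_t largest = 0;
  for (size_t i = 1; i < startTimesUs.size(); ++i) {
    largest = std::max(largest, startTimesUs[i] - startTimesUs[i - 1]);
  }
  return largest;
}
```

Since every frame's start time is already known, this is a single linear pass over the appended block, which is the extra overhead mentioned above.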
Note that with this patch, we can remove the fuzz factor from bug 1089937 since the buffered ranges now overlap (at least for the examples I've looked at).
Comment on attachment 8517029 [details] [diff] [review]
Use the largest duration

I'm not yet convinced that an over-estimate is harmless. This is a tricky problem, though, so I'm very happy to continue the discussion.

If one reader's range is an over-estimate and overlaps the next reader, then SelectReader(), at least as currently implemented, may select the earlier reader when we want the next one. As you say, we'll get OnNotDecoded() and we'll try SelectReader with a later target, but then we may skip the decoder for the target we really wanted, skipping frames.

Similarly, when reporting SourceBuffer.buffered, I'd prefer to report a smaller, possibly minimal or even zero duration for the last frame, erring on the side of telling the player that we don't have frames so that it can append a possible over-estimate of the missing frames rather than an under-estimate. A fuzz test would be useful at some point to ensure we don't reduce the duration reported in the init segment.

If there is no immediate need for this in light of the plans discussed in bug 1089937, then I'd prefer to leave this for now, at least. It's also not clear that this is always an over-estimate: if the duration of the last frame is intended to be greater than others in the run, then the fuzz factor from bug 1089937 may still be required.

I'm not worried about the overhead. I haven't thought too hard about it, but I expect it can be addressed if it becomes a problem. I suspect SelectReader is the place to make estimates: if we improve the algorithm in SelectReader to accept a tolerance but aim for the nearest reader, then I expect we won't need to over-estimate the duration of the last frame. It may still be useful to track the largest duration, so we know when our fuzz factor is not large enough.
Priority: -- → P5
Component: Audio/Video → Audio/Video: Playback
This is no longer a major problem with the new MSE, as we don't rely as much on the calculated buffered range to perform operations; instead we demux all frames found. The last frame's duration will be wrong; however, we typically wait until we are certain there are no further frames coming, following bug 1199878. Marking as WONTFIX.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX