Open Bug 1903466 Opened 12 days ago Updated 2 days ago

Improve WebM timecode handling to address gaps between video frames

Categories

(Core :: Audio/Video: Playback, defect, P2)

defect

Tracking

()

ASSIGNED

People

(Reporter: kinetik, Assigned: kinetik)

References

Details

Attachments

(1 file)

The current WebM timecode handling (particular around computing the duration of frames) could be improved. https://www.youtube.com/watch?v=y5EazNziymk is an example of a file where the calculated frame duration alternates between 41ms and 42ms (derived from current_frame_time - last_frame_time), but the file provides a default frame duration for the track (41.708333ms) that could be used as a more accurate frame duration in this case. libnestegg exposes the track default duration via nestegg_track_default_duration, but we're not currently using it in WebMDemuxer

Looking at Chromium's WebM parser, there are additional improvements we could adopt:

  • take additional care with timecode rounding/quantization issues when the timecode scale precision is low (WebM defaults to 1ms precision)
  • improve frame duration estimation when no track default duration or block duration is present in the file; Chromium keeps track of the maximum seen frame duration for each cluster and uses this when estimating new frame durations
  • estimate duration of audio blocks for known audio codecs (may not be necessary)

There's also some possible issues around negative timecode handling in WebM that I suspect we don't handle correctly - previously we've treated this as a badly muxed file but Chromium has specific handling for these cases. I'll need to look into this further.

Matthew, could we consider to make this bug P1/P2? As this affects all VP9 videos on Youtube, and small gaps decrease the MSE performance (it's harder to calculate the intersection or find a target interval) Thanks!

Flags: needinfo?(kinetik)

We probably need to handle this better as well.

I've bumped it up to P2, open to making it P1 but unsure if I can get all of these fixes done before the next nightly cycle.

Flags: needinfo?(kinetik)
Priority: P3 → P2

Interesting that DefaultDuration usage was removed from Chromium in 2012, but is back again now.

Duration estimates seem to be used only for when even a default duration is not available.

Attached file webm-logging.txt

Even if the default duration or block duration is set in the container, it would usually have rounding error when 1s is not dividable by frame rate (eg. 24 fps, duration is 0.41666666...) So if the sample is not really a last sample, it seems better to calculate duration based on next sample's time code.

I was curios why the end time of the last sample in appended buffer would be different between the end time from WebMContainerParser and the end time calculated from WebMDemuxer. By adding the log here, I found that in WebMContainerParser we can actually see the timecode for the next sample, even if the data of that sample hasn't been appended yet.

Below logs are from the attached log, this is the first time we got the appended buffer, we can see that we can actually know that the timecode after 3128000000 would be 3170000000.

2024-06-24 21:09:32.355000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(1f2d3ad36d0)::Append: Inserted timecode 3128000000 in 75
2024-06-24 21:09:32.355000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(1f2d3ad36d0)::Append: Inserted timecode 3170000000 in 76
// skip logs...
2024-06-24 21:09:32.355000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSourceSamples WebMContainerParser[1f2d3ad3600] (video/webm; codecs="vp09.00.51.08.01.01.01.01.00")::ParseStartAndEndTimestamps: [0, 3128000] [fso=220, leo=2599993, l=150 processedIdx=74 fs=2599993]

But in WebMDemuxer, as the demuxer was only looking for a valid sample, the last valid sample is 3086000 and we calculated the next time stamp to 3127000 (3086000 + 410000), where the deviation appeared.

2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: GetNextPacket(video): tstamp=3045000, duration=-1, defaultDuration=41708
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: push sample tstamp: 3045000 next_tstamp: 3086000 length: 13937 kf: 0
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: GetNextPacket(video): tstamp=3086000, duration=-1, defaultDuration=41708
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::DemuxPacket: EOS
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: push sample tstamp: 3086000 next_tstamp: 3127000 length: 2966 kf: 0
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::DemuxPacket: EOS

When we started paring next appended buffer, we could confirm that 3128000000 was indeed the next timecode. So if we can know that in WebMContainerParser, it's no sense that we can't do that in WebMDemuxer. Maybe we can have a function to know what is the next timecode when EOS happens?

2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSource TrackBuffersManager[1f2c823bb00] ::SetAppendState: AppendState changed from PARSING_MEDIA_SEGMENT to WAITING_FOR_SEGMENT
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSourceSamples ContainerParser[1f2d3ad3b70] (video/webm; codecs="vp09.00.51.08.01.01.01.01.00")::IsInitSegmentPresent: aLength=4896299 [1f43b675]
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Should get the TimecodeScale first
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSourceSamples ContainerParser[1f2d3ad3b70] (video/webm; codecs="vp09.00.51.08.01.01.01.01.00")::IsMediaSegmentPresent: aLength=4896299 [1f43b675]
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3128000000 in 0
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3170000000 in 1
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3212000000 in 2
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3253000000 in 3
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3295000000 in 4
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3337000000 in 5
Flags: needinfo?(kinetik)

DefaultDuration is always stored in nanoseconds, so it's not subject to the same level of rounding issues as BlockDuration and block timestamps are. BlockDuration and block timestamps use the media and track timescales, which usually (with default/common values) result in a granularity of 1 millisecond.

I think we should be using the BlockDuration (if present), then the DefaultDuration (if present), and only then falling back to estimating the duration from the next block's timestamp. BlockDuration and duration estimation will both need to deal with rounding issues resulting from the media/track scale after being determined, which I suspect a big part that we're not currently handling correctly.

It shouldn't be too difficult to add a new function to libnestegg to try to peek the next block's timestamp, which for the cases where there's sufficient data available to parse the timestamp but not an entire block will allow the behaviour to match WebMContainerParser.

Flags: needinfo?(kinetik)

Is there a reason why the next block/sample would be in presentation order?
Does NextPacket() return blocks in coding order?

Another benefit of recording the frame duration indicated for the sample (by BlockDuration or DefaultDuration) rather than recording the difference between timestamps is that, when the next sample gets evicted, a duration based on timestamp differences would no longer be appropriate.

See Also: → 1900191
See Also: → 1904750

(In reply to Matthew Gregan [:kinetik] from comment #6)

DefaultDuration is always stored in nanoseconds, so it's not subject to the same level of rounding issues as BlockDuration and block timestamps are. BlockDuration and block timestamps use the media and track timescales, which usually (with default/common values) result in a granularity of 1 millisecond.

I think we should be using the BlockDuration (if present), then the DefaultDuration (if present), and only then falling back to estimating the duration from the next block's timestamp. BlockDuration and duration estimation will both need to deal with rounding issues resulting from the media/track scale after being determined, which I suspect a big part that we're not currently handling correctly.

Shouldn't we respect the timecode from the next sample first? If we always use the block duration or default duration, for the cases which would has deviation on duration, eg. 24fps, adding 41666 us to the timestamp usually doesn't match the timecode of the next sample.

Chromium's DefaultDuration handling was added for https://issues.chromium.org/issues/41094055

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: