1903466 - Improve WebM timecode handling to address gaps between video frames

Assignee

Description

8 months ago

The current WebM timecode handling (particularly around computing the duration of frames) could be improved. https://www.youtube.com/watch?v=y5EazNziymk is an example of a file where the calculated frame duration alternates between 41ms and 42ms (derived from current_frame_time - last_frame_time), but the file provides a default frame duration for the track (41.708333ms) that could be used as a more accurate frame duration in this case. libnestegg exposes the track default duration via nestegg_track_default_duration, but we're not currently using it in WebMDemuxer.
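To illustrate why the derived duration alternates, here is a minimal sketch. It assumes a 24000/1001 fps source (which matches the 41.708333ms default duration) and a muxer that rounds each frame time to the nearest 1ms tick; both are assumptions for illustration, and the function names are made up for this example.

```python
from fractions import Fraction

# Assumptions for illustration: 24000/1001 fps video and 1 ms timecode ticks.
FRAME_DURATION_MS = Fraction(1001 * 1000, 24000)  # exactly 41.708333... ms


def quantized_timestamps_ms(frame_count):
    """Frame start times rounded to the nearest 1 ms tick, as a muxer might store them."""
    return [round(i * FRAME_DURATION_MS) for i in range(frame_count)]


def timestamp_deltas(timestamps):
    """Durations derived as current_frame_time - last_frame_time."""
    return [cur - last for last, cur in zip(timestamps, timestamps[1:])]


timestamps_ms = quantized_timestamps_ms(25)
deltas_ms = timestamp_deltas(timestamps_ms)
print(sorted(set(deltas_ms)))                  # the only duration values that occur
print(float(sum(deltas_ms)) / len(deltas_ms))  # long-run average duration
```

Only 41 and 42 show up as per-frame deltas, while the average over 24 frames comes back to the track's default duration.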

Looking at Chromium's WebM parser, there are additional improvements we could adopt:

- take additional care with timecode rounding/quantization issues when the timecode scale precision is low (WebM defaults to 1ms precision)
- improve frame duration estimation when no track default duration or block duration is present in the file; Chromium keeps track of the maximum seen frame duration for each cluster and uses this when estimating new frame durations
- estimate duration of audio blocks for known audio codecs (may not be necessary)
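A rough sketch of the "maximum seen frame duration" idea from the second item, written only from the description above and not taken from Chromium's source. The class name, the fallback value, and the choice to reset the maximum at each cluster start are all assumptions.

```python
class ClusterDurationEstimator:
    """Hypothetical helper: remembers the largest frame duration seen in a cluster."""

    def __init__(self, fallback_us):
        # Used until at least one duration has been observed (assumed behaviour).
        self.fallback_us = fallback_us
        self.max_seen_us = None

    def start_cluster(self):
        # Assumption: the maximum is tracked per cluster, so forget the old one.
        self.max_seen_us = None

    def observe(self, prev_timestamp_us, timestamp_us):
        """Record the duration implied by two consecutive timestamps."""
        delta_us = timestamp_us - prev_timestamp_us
        if delta_us > 0 and (self.max_seen_us is None or delta_us > self.max_seen_us):
            self.max_seen_us = delta_us

    def estimate(self):
        """Duration to assume for a frame that has no usable duration of its own."""
        return self.max_seen_us if self.max_seen_us is not None else self.fallback_us


estimator = ClusterDurationEstimator(fallback_us=40000)
estimator.observe(3045000, 3086000)  # 41 ms delta
estimator.observe(3086000, 3128000)  # 42 ms delta
print(estimator.estimate())
```

Using the maximum rather than the most recent delta means that, with timestamps quantized to 1ms, the estimate errs towards a small overlap instead of a gap.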

There are also some possible issues around negative timecode handling in WebM that I suspect we don't handle correctly. Previously we've treated this as a badly muxed file, but Chromium has specific handling for these cases. I'll need to look into this further.

Alastor Wu [:alwu]

Comment 1

8 months ago

Matthew, could we consider making this bug P1/P2? It affects all VP9 videos on YouTube, and small gaps degrade MSE performance (it's harder to calculate the intersection or find a target interval). Thanks!

Flags: needinfo?(kinetik)

Alastor Wu [:alwu]

Comment 2

8 months ago

We probably need to handle this better as well.

Matthew Gregan [:kinetik]

Assignee

Comment 3

8 months ago

I've bumped it up to P2, open to making it P1 but unsure if I can get all of these fixes done before the next nightly cycle.

Flags: needinfo?(kinetik)

Priority: P3 → P2

Karl Tomlinson (:karlt back Feb 10)

Comment 4

8 months ago

Interesting that DefaultDuration usage was removed from Chromium in 2012, but is back again now.

Duration estimates seem to be used only for when even a default duration is not available.

Alastor Wu [:alwu]

Comment 5

8 months ago

Attached file webm-logging.txt

Even if the default duration or block duration is set in the container, it will usually have a rounding error when 1s is not divisible by the frame rate (e.g. at 24 fps the duration is 0.0416666... s). So if the sample is not really the last sample, it seems better to calculate the duration based on the next sample's timecode.

I was curious why the end time of the last sample in an appended buffer differs between the end time from WebMContainerParser and the end time calculated by WebMDemuxer. By adding the log here, I found that in WebMContainerParser we can actually see the timecode of the next sample, even if the data of that sample hasn't been appended yet.

The logs below are from the attached file. This is the first time we got the appended buffer, and we can see that we already know that the timecode after 3128000000 will be 3170000000.

2024-06-24 21:09:32.355000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(1f2d3ad36d0)::Append: Inserted timecode 3128000000 in 75
2024-06-24 21:09:32.355000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(1f2d3ad36d0)::Append: Inserted timecode 3170000000 in 76
// skip logs...
2024-06-24 21:09:32.355000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSourceSamples WebMContainerParser[1f2d3ad3600] (video/webm; codecs="vp09.00.51.08.01.01.01.01.00")::ParseStartAndEndTimestamps: [0, 3128000] [fso=220, leo=2599993, l=150 processedIdx=74 fs=2599993]

But in WebMDemuxer, as the demuxer only looks for complete, valid samples, the last valid sample is 3086000 and we calculated the next timestamp as 3127000 (3086000 + 41000), which is where the deviation appeared.

2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: GetNextPacket(video): tstamp=3045000, duration=-1, defaultDuration=41708
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: push sample tstamp: 3045000 next_tstamp: 3086000 length: 13937 kf: 0
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: GetNextPacket(video): tstamp=3086000, duration=-1, defaultDuration=41708
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::DemuxPacket: EOS
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::GetNextPacket: push sample tstamp: 3086000 next_tstamp: 3127000 length: 2966 kf: 0
2024-06-24 21:09:32.358000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMDemuxer[1f2c9426800] ::DemuxPacket: EOS
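The numbers in these logs can be checked with a few lines of arithmetic. This is only a worked example using values copied from the logs above (timestamps in microseconds); the helper names are invented here and are not WebMDemuxer functions.

```python
# Values taken from the logs above, in microseconds.
PREV_TIMESTAMP_US = 3045000
LAST_TIMESTAMP_US = 3086000
DEFAULT_DURATION_US = 41708          # defaultDuration reported by GetNextPacket
ACTUAL_NEXT_TIMESTAMP_US = 3128000   # timecode 3128000000 ns seen by WebMBufferedParser


def end_from_previous_delta(prev_us, last_us):
    """Extrapolate the end time by repeating the previous frame's duration."""
    return last_us + (last_us - prev_us)


def end_from_default_duration(last_us, default_duration_us):
    """Compute the end time from the track's default duration instead."""
    return last_us + default_duration_us


extrapolated_end = end_from_previous_delta(PREV_TIMESTAMP_US, LAST_TIMESTAMP_US)
default_end = end_from_default_duration(LAST_TIMESTAMP_US, DEFAULT_DURATION_US)

print(ACTUAL_NEXT_TIMESTAMP_US - extrapolated_end)  # gap left by extrapolation
print(ACTUAL_NEXT_TIMESTAMP_US - default_end)       # gap left by the default duration
```

Repeating the previous delta leaves a 1000 us gap before the next sample, and the default duration leaves 292 us. Only the real next timecode closes the gap completely.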

When we started parsing the next appended buffer, we could confirm that 3128000000 was indeed the next timecode. So if we can know that in WebMContainerParser, it makes no sense that we can't do the same in WebMDemuxer. Maybe we can have a function that tells us the next timecode when EOS happens?

2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSource TrackBuffersManager[1f2c823bb00] ::SetAppendState: AppendState changed from PARSING_MEDIA_SEGMENT to WAITING_FOR_SEGMENT
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSourceSamples ContainerParser[1f2d3ad3b70] (video/webm; codecs="vp09.00.51.08.01.01.01.01.00")::IsInitSegmentPresent: aLength=4896299 [1f43b675]
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Should get the TimecodeScale first
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaSourceSamples ContainerParser[1f2d3ad3b70] (video/webm; codecs="vp09.00.51.08.01.01.01.01.00")::IsMediaSegmentPresent: aLength=4896299 [1f43b675]
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3128000000 in 0
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3170000000 in 1
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3212000000 in 2
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3253000000 in 3
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3295000000 in 4
2024-06-24 21:09:32.360000 UTC - [Child 51804: MediaSupervisor #2]: D/MediaDemuxer WebMBufferedParser(162cecf070)::Append: Inserted timecode 3337000000 in 5

Flags: needinfo?(kinetik)

Matthew Gregan [:kinetik]

Assignee

Comment 6

8 months ago

DefaultDuration is always stored in nanoseconds, so it's not subject to the same level of rounding issues as BlockDuration and block timestamps are. BlockDuration and block timestamps use the media and track timescales, which usually (with default/common values) result in a granularity of 1 millisecond.

I think we should be using the BlockDuration (if present), then the DefaultDuration (if present), and only then falling back to estimating the duration from the next block's timestamp. BlockDuration and duration estimation will both need to deal with rounding issues resulting from the media/track scale after being determined, which I suspect is a big part of what we're not currently handling correctly.
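The proposed order of preference could be sketched as below. This is an illustrative function with hypothetical names and plain nanosecond integers, not the WebMDemuxer or libnestegg API, and it deliberately leaves out the rounding compensation mentioned above.

```python
from typing import Optional


def choose_frame_duration_ns(
    block_duration_ns: Optional[int],
    default_duration_ns: Optional[int],
    timestamp_ns: int,
    next_timestamp_ns: Optional[int],
) -> Optional[int]:
    """Pick a frame duration: BlockDuration, then DefaultDuration, then an estimate."""
    if block_duration_ns is not None:
        return block_duration_ns
    if default_duration_ns is not None:
        return default_duration_ns
    # Last resort: estimate from the next block's timestamp, if it is known and later.
    if next_timestamp_ns is not None and next_timestamp_ns > timestamp_ns:
        return next_timestamp_ns - timestamp_ns
    return None  # nothing usable; the caller has to fall back to some other estimate


# Example with the values from comment 5: no BlockDuration, DefaultDuration present.
print(choose_frame_duration_ns(None, 41708333, 3086000000, 3128000000))
```

Returning None when nothing is available keeps the "no information" case explicit instead of silently inventing a duration.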

It shouldn't be too difficult to add a new function to libnestegg to try to peek the next block's timestamp, which for the cases where there's sufficient data available to parse the timestamp but not an entire block will allow the behaviour to match WebMContainerParser.

Flags: needinfo?(kinetik)

Karl Tomlinson (:karlt back Feb 10)

Comment 7

8 months ago

Is there a reason why the next block/sample would be in presentation order?
Does NextPacket() return blocks in coding order?

Another benefit of recording the frame duration indicated for the sample (by BlockDuration or DefaultDuration) rather than recording the difference between timestamps is that, when the next sample gets evicted, a duration based on timestamp differences would no longer be appropriate.
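A small sketch of the eviction concern, using a hypothetical list of (timestamp, recorded duration) pairs in microseconds rather than any real buffer structure from the MSE code:

```python
# Hypothetical buffered samples: (timestamp_us, duration_us recorded from the container).
samples = [(0, 41708), (42000, 41708), (83000, 41708)]


def durations_from_deltas(buffered):
    """Durations derived from the next buffered timestamp (last sample has none)."""
    timestamps = [ts for ts, _ in buffered]
    return [nxt - cur for cur, nxt in zip(timestamps, timestamps[1:])]


before_eviction = durations_from_deltas(samples)

# Evict the middle sample, as buffer eviction might.
remaining = [s for s in samples if s[0] != 42000]
after_eviction = durations_from_deltas(remaining)

print(before_eviction)  # deltas while all three samples are buffered
print(after_eviction)   # the first sample now appears to last two frames
print(remaining[0][1])  # the recorded duration is unaffected by eviction
```

After eviction, the delta-based duration of the first sample doubles to cover the hole, whereas the duration recorded with the sample still describes the frame itself.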

Karl Tomlinson (:karlt back Feb 10)

Updated

8 months ago

Updated

8 months ago

Comment 8

8 months ago

(In reply to Matthew Gregan [:kinetik] from comment #6)

> DefaultDuration is always stored in nanoseconds, so it's not subject to the same level of rounding issues as BlockDuration and block timestamps are. BlockDuration and block timestamps use the media and track timescales, which usually (with default/common values) result in a granularity of 1 millisecond.
>
> I think we should be using the BlockDuration (if present), then the DefaultDuration (if present), and only then falling back to estimating the duration from the next block's timestamp. BlockDuration and duration estimation will both need to deal with rounding issues resulting from the media/track scale after being determined, which I suspect is a big part of what we're not currently handling correctly.

Shouldn't we respect the timecode of the next sample first? If we always use the block duration or default duration, then in cases where the duration has a rounding deviation (e.g. 24fps), adding 41666 us to the timestamp usually doesn't match the timecode of the next sample.
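A quick sketch of that mismatch, assuming an exact 24 fps stream whose timestamps were rounded to the nearest 1ms tick and a duration of 41666 us (1/24 s truncated to microseconds). The rounding rule and function names are assumptions for illustration.

```python
from fractions import Fraction

DEFAULT_DURATION_US = 41666  # 1/24 s truncated to whole microseconds


def timestamps_us_at_24fps(frame_count):
    """Frame times at exactly 24 fps, rounded to 1 ms ticks, expressed in microseconds."""
    return [round(Fraction(i * 1000, 24)) * 1000 for i in range(frame_count)]


def end_time_mismatches_us(timestamps_us, duration_us):
    """Next timestamp minus (timestamp + duration): positive is a gap, negative an overlap."""
    return [nxt - (cur + duration_us) for cur, nxt in zip(timestamps_us, timestamps_us[1:])]


mismatches = end_time_mismatches_us(timestamps_us_at_24fps(7), DEFAULT_DURATION_US)
print(mismatches)
```

Under these assumptions none of the computed end times coincide with the next timestamp: each pair leaves either a 334 us gap or a 666 us overlap.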

Karl Tomlinson (:karlt back Feb 10)

Comment 9

8 months ago

Chromium's DefaultDuration handling was added for https://issues.chromium.org/issues/41094055

Karl Tomlinson (:karlt back Feb 10)

Updated

7 months ago

Comment 10

7 months ago

In the original issue you also reported that "we found some weird parts for Youtube's VP9 stream, such as Invalid ID element size (the maximum should be 4 per spec)".

Reading the spec, I think it must be at most 4, but this needs to be fixed by the YouTube team, if it is real. Is it?

There are precedents where bugs like this were fixed on the Chromium bug tracker, and of course this is probably a bug in ffmpeg anyway, as it is indeed used by Google. WebM is just mkv. Mkv famously has no DTS, only PTS, and presentation timestamps are not true timestamps. They are in the form of durations instead. Attempts were made in ffmpeg to support higher than millisecond precision too, but they went nowhere.

val.zapod.vz

Comment 11

6 months ago

"WebM defaults to 1ms precision"

Nope. Mkv defaults to it; WebM mandates 1 millisecond. That makes the typical 24.000 and 24/1.001 frame rates impossible to represent exactly, and makes all such streams VFR. For AVC, the VUI has tick fields that signal whether the stream is CFR and that can carry the values 24000 and 1001, which perfectly represent (!!) THE ACTUAL frame rate.

Next: duplicated timestamps are allowed:

"e.g. TrueHD packets have a duration < 1ms,
so different packets will end up with the same timestamp when using
1/1000)"
From here:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20210521002954.1684-1-michael.dirks@xaymar.com/

Also, just for the record, the H.264 spec (Annex C) and the MPEG-TS spec do cover the case of rounded timestamps. Blu-ray files are still CFR irrespective of the precision of the timestamps.

Jim Mathies [:jimm]

Updated

5 months ago

Severity: S2 → S3

Karl Tomlinson (:karlt back Feb 10)

Updated

3 months ago

Categories

(Core :: Audio/Video: Playback, defect, P2)

People

(Reporter: kinetik, Assigned: kinetik)
