Closed Bug 1649859 Opened 4 years ago Closed 4 years ago

Video frame timestamps slide back and forth wildly during playback, causing dropped frames

Categories

(Core :: Audio/Video: cubeb, defect, P1)

defect

Tracking

()

RESOLVED FIXED
81 Branch
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox81 --- fixed

People

(Reporter: mstange, Unassigned)

References

Details

Attachments

(1 obsolete file)

I've been looking into frame drops on 60fps YouTube videos in bug 1649515.

I think I found the main reason for the frame drops: The timestamps on the video frames are unreliable.

Context: When a video is displayed on the screen, the frame to display is picked based on the frame's timestamp. We keep a buffer of around 10 frames in the compositor and pick the latest frame that's not "in the future" compared to the composition time.
In order for this to work reliably, the timestamp of a given frame should be stable.

However, in practice, these timestamps are not stable at all: Whenever the buffer of 10 frames is updated, the timestamps of the video frames slide back and forth by +- 10ms. This is bound to cause frame drops!

To investigate this more, I printed out the timestamps of all video frames in each invocation of VideoSink::RenderVideoFrames. Here's an example, filtered down to show only the timestamp of frame 1263, computed as (t - TimeStamp::ProcessCreation()).ToMilliseconds():

VideoSink::RenderVideoFrames. aClockTime: 20995270, aClockTimeStamp: 69264.220, playbackRate: 1.000
  - frame ID 1263 has timestamp: 69302.950ms
VideoSink::RenderVideoFrames. aClockTime: 21006875, aClockTimeStamp: 69273.780, playbackRate: 1.000
  - frame ID 1263 has timestamp: 69300.905ms
VideoSink::RenderVideoFrames. aClockTime: 21018500, aClockTimeStamp: 69284.675, playbackRate: 1.000
  - frame ID 1263 has timestamp: 69300.175ms
VideoSink::RenderVideoFrames. aClockTime: 21041708, aClockTimeStamp: 69300.582, playbackRate: 1.000
  - frame ID 1263 has timestamp: 69292.874ms

It suddenly slides into the past by 8ms! (from 69300.175ms to 69292.874ms)

This video frame timestamp variation is caused by clocktime variation:

VideoSink::RenderVideoFrames. aClockTime: 20995270, aClockTimeStamp: 69264.220
                                           +11.605ms                  +9.560ms
VideoSink::RenderVideoFrames. aClockTime: 21006875, aClockTimeStamp: 69273.780
                                           +11.625ms                  +10.895ms
VideoSink::RenderVideoFrames. aClockTime: 21018500, aClockTimeStamp: 69284.675
                                           +23.208ms                  +15.907ms
VideoSink::RenderVideoFrames. aClockTime: 21041708, aClockTimeStamp: 69300.582
                                           +0ms                       +10.565ms
VideoSink::RenderVideoFrames. aClockTime: 21041708, aClockTimeStamp: 69311.147
                                           +11.604ms                  +8.764ms
VideoSink::RenderVideoFrames. aClockTime: 21053312, aClockTimeStamp: 69319.911
                                           +23.229ms                  +16.889ms
VideoSink::RenderVideoFrames. aClockTime: 21076541, aClockTimeStamp: 69336.800
                                           +0ms                       +8.992ms
VideoSink::RenderVideoFrames. aClockTime: 21076541, aClockTimeStamp: 69345.792
                                           +11.604ms                  +9.744ms
VideoSink::RenderVideoFrames. aClockTime: 21088145, aClockTimeStamp: 69355.536

aClockTime and aClockTimeStamp are treated as the same point of time, but they change in very different increments.

I think this needs to be fixed if we want stable video playback.

The reason for the different clock time increments is the following.

We obtain the clock time values from AudioSinkWrapper::GetPosition, and this is a video that also plays audio, so the two values are computed as follows:

  • aClockTimeStamp is set to TimeStamp::Now()
  • aClockTime is set to mAudioSink->GetPosition().

These are not the same at all!

I think we need a stable reference timestamp, and then compute aClockTimeStamp as referenceTimeStamp + (mAudioSink->GetPosition() - audioPositionAtReferenceTimeStamp).MultDouble(mParams.mPlaybackRate).ToTimeDuration().
This would make both aClockTimeStamp and aClockTime advance in the same increments, and as a result, the video frame timestamps should be rock solid.

Experimentally, I tried to use mPlayStartTime as referenceTimeStamp and mPlayDuration as audioPositionAtReferenceTimeStamp, but it didn't work very well. The resulting timestamp was too far in the past by over 200ms, causing aggressive re-decoding.

Thank you for finding this!

In addition, just want to make sure I understand the reason of dropping frame in the compositor. So it's like, we have rendered a frame which timestamp is 10, and the next frame's timestamp is supposed to be later than 10. However, when sliding back issue occurs, the next frame's timestamp surprisingly becomes earlier than the timestamp of the frame which we have rendered, and that cause a dropped frame. Do I understand it correctly?

Thank you.

Flags: needinfo?(mstange.moz)

(In reply to Alastor Wu [:alwu] from comment #2)

So it's like, we have rendered a frame which timestamp is 10, and the next frame's timestamp is supposed to be later than 10. However, when sliding back issue occurs, the next frame's timestamp surprisingly becomes earlier than the timestamp of the frame which we have rendered, and that cause a dropped frame.

The next frame wouldn't usually become earlier than the previously-rendered frame. It's more that the next frame would be so early that the next + 1 frame would be presented on the next composite, skipping the "next" frame.

Let's say we have three buffered frames A, B and C available in the compositor, at timestamps 10, 26 and 43.
Now we have a composite with compositionTime 22.0. We pick the frame A.
Suddenly ImageComposite::SetImages is called for the same frames but with timestamps that are slid back by 8ms, giving us 2, 18 and 35.
The next composite is at compositionTime 38.6. We pick frame C, skipping frame B.
Now, another call to ImageComposite::SetImages moves the timestamps again, this time in the other direction by 12ms, giving us 14, 30 and 47.
The next composite is at compositionTime 55.3. We pick frame C again.

So now we presented A, C, C rather than A, B, C.

Here's an example: https://share.firefox.dev/31GWtyh (I added lots of profiler markers here)

Look at frames 683 to 687. The original time deltas from the webm file between these frames are as follows:
683 - 16.0ms - 684 - 17.0ms - 685 - 17.0ms - 686 - 16.0ms - 687

Rather than presenting 683, 684, 685, 686, we present 683, 684, 686, 686.

We skip from 684 straight to 686 because the timestamps slide backwards by 4ms, so that 686 is just older than the composition timestamp.

The composite at 123182.742ms has frame 684 at 123170.500ms, frame 685 at 123187.500ms, and frame 686 at 123204.500ms, and picks frame 684.
The composite at 123200.359ms has frame 684 at 123166.133ms, frame 685 at 123183.133ms, and frame 686 at 123200.133ms, and picks frame 686.

Here are the deltas between the composition timestamps and the image timestamps for those composites:

Composition at 123167.016ms, presents 683 with timestamp 123154.500ms
                  +15.726ms                               +16.000ms
Composition at 123182.742ms, presents 684 with timestamp 123170.500ms
                  +17.617ms                               +29.633ms <- moving forwards two images, skipping 785
Composition at 123200.359ms, presents 686 with timestamp 123200.133ms
                  +15.977ms                               +7.484ms <- same image, moved by 7.5ms
Composition at 123216.336ms, presents 686 with timestamp 123207.617msa

Another problem is the fact that the deltas between the composites are not consistently 16.67ms. This would be less of a problem if the video frames fell somewhere in the middle between the vsync composite times. But we can look at that once the video frame times are stable.
If uneven composites were the only problem, the numbers would look as follows:

Composition at 123167.016ms, presents 683 with timestamp 23155.375ms
                  +15.726ms                               +16.000ms
Composition at 123182.742ms, presents 684 with timestamp 23171.375ms
                  +17.617ms                               +34.000ms
Composition at 123200.359ms, presents 686 with timestamp 23205.375ms
                  +15.977ms                               +0.000ms
Composition at 123216.336ms, presents 686 with timestamp 23205.375ms

(The deltas between the image timestamps would exactly match the original deltas from the webm file.)

Flags: needinfo?(mstange.moz)

(Notes after a conversation with padenot:)

This problem does not seem to exist on Windows, because on Windows, mAudioSink->GetPosition() interpolates so that it does match TimeStamp::Now().

A simple fix to this might be to do equivalent interpolation on macOS. More specifically, the call to cubeb_stream_get_position ends up in position(), which takes self.frames_played without interpolation, and frames_played is only updated whenever the audio output callback is called. That audio callback call does have a system timestamp for when the queued audio frames are expected to be played, in (*tstamp).mHostTime, so we should be able to store that and use it in order to do interpolation in position().

Hi, Chun-min, Paul,
Per comment4, I wonder if any of you guys have time to check this get-position issue?
Thank you.

Component: Audio/Video: Playback → Audio/Video: cubeb
Flags: needinfo?(padenot)
Flags: needinfo?(cchang)

Paul opened https://github.com/ChunMinChang/cubeb-coreaudio-rs/pull/99 with a patch. Based on some first testing it does seem to be making the video frame timestamps significantly more stable.

Flags: needinfo?(padenot)
Flags: needinfo?(cchang)
Severity: -- → S3
Priority: -- → P1
Depends on: 1653350
Attachment #9164879 - Attachment is obsolete: true
Depends on: 1654151

Bug 1653350 has added a profiler marker that makes this bug easily observable. Here's a profile from watching https://www.youtube.com/watch?v=BSypUV6QUNw: https://share.firefox.dev/2COWANI
If you hover over the VideoFrameTimeStampJitter markers, you'll see values between -10ms and +10ms. For smooth video playback, these values should all be zero, or at least below 0.1ms.

With the fixes from PR #99 and from bug 1654151, the VideoFrameTimeStampJitter values are a lot more stable: https://share.firefox.dev/3eIbHWl

There are two sources for the remaining variation:

  • After #99, the interpolation in position() is dependent on the time at which the audio output callback is called. This makes it vulnerable to thread scheduling variation. Furthermore, position() uses current_output_latency_frames rather than total_output_latency_frames. If it used total_output_latency_frames, it would become independent of the time at which the callback is called, because the callback timestamp would subtract itself out in the math. Proof
  • There's another timestamp which could subtract itself out of the video frame timestamp math: The time at which VideoSink::UpdateRenderedVideoFrames(), or rather AudioSinkWrapper::GetPosition(), is called. GetPosition() calls mach_absolute_time() twice, once in TimeStamp::Now() and once in mAudioSink->GetPosition() (in the new interpolation code that PR #99 adds, in position()). When calculating the timestamp of the video frames, these two timestamps are subtracted from each other. If any amount of time passes between the two calls to mach_absolute_time(), the values will be slightly off. In other words, there is an opportunity to simplify the math here so that the time of the call to VideoSink::UpdateRenderedVideoFrames() is no longer used in the calculation. But this will probably require a new cubeb API, for example something like cubeb_stream_get_time_reference which would return two values: a host timestamp and a position, which would reflect the state as of the most recent call to the audio output callback.

Current status: PR #99 has landed upstream. It briefly made it into autoland in bug 1654699 comment 8, but was backed out again due to test failures. This problem was then filed as issue #106.

No longer depends on: 1654151
Depends on: 1655699

How is the situation now? The position code is updated in #99 and #107.

Flags: needinfo?(mstange.moz)

(Bug 1655699 landed the update.)

It is fixed! The video frame timestamps are pretty solid now. These changes have reduced the jitter to ~0.01ms in most cases, down from ~10ms.

The second item from comment 9 would improve the situation even more, but things definitely seem good enough now.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(mstange.moz)
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch

Do we need something uplifted to 80 for this, or should the changes ride the trains?

Flags: needinfo?(mstange.moz)
Flags: needinfo?(cchang)

I think this can just ride the trains. The fix was in an external crate, which ended up being updated along with a few other crates, and the update contained some other changes as well. So it would be a bit of a hassle to only pick up the fix we want.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(cchang)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: