Video frame timestamps slide back and forth wildly during playback, causing dropped frames
Categories: Core :: Audio/Video: cubeb, defect, P1
People: Reporter: mstange, Unassigned
Attachments: (1 obsolete file)
I've been looking into frame drops on 60fps YouTube videos in bug 1649515.
I think I found the main reason for the frame drops: The timestamps on the video frames are unreliable.
Context: When a video is displayed on the screen, the frame to display is picked based on the frame's timestamp. We keep a buffer of around 10 frames in the compositor and pick the latest frame that's not "in the future" compared to the composition time.
In order for this to work reliably, the timestamp of a given frame should be stable.
However, in practice, these timestamps are not stable at all: Whenever the buffer of 10 frames is updated, the timestamps of the video frames slide back and forth by +- 10ms. This is bound to cause frame drops!
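The selection rule described above ("latest frame that's not in the future") can be sketched as follows. This is a hypothetical illustration, not the actual compositor code; the names `pick_frame` and the `(frame_id, timestamp_ms)` representation are assumptions.

```rust
// Hypothetical sketch of the compositor's frame-picking rule: from a buffer
// of (frame_id, timestamp_ms) pairs, pick the latest frame whose timestamp
// is not later than the composition time.
fn pick_frame(frames: &[(u32, f64)], composition_time_ms: f64) -> Option<u32> {
    frames
        .iter()
        .filter(|&&(_, ts)| ts <= composition_time_ms)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|&(id, _)| id)
}

fn main() {
    let frames = [(1, 10.0), (2, 26.0), (3, 43.0)];
    // At composition time 22.0 we pick frame 1; frames 2 and 3 are "in the future".
    assert_eq!(pick_frame(&frames, 22.0), Some(1));
}
```

Because the rule compares frame timestamps directly against the composition time, any jitter in the frame timestamps translates directly into picking a different frame.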
To investigate this more, I printed out the timestamps of all video frames in each invocation of VideoSink::RenderVideoFrames. Here's an example, filtered down to show only the timestamp of frame 1263, computed as (t - TimeStamp::ProcessCreation()).ToMilliseconds():
VideoSink::RenderVideoFrames. aClockTime: 20995270, aClockTimeStamp: 69264.220, playbackRate: 1.000
- frame ID 1263 has timestamp: 69302.950ms
VideoSink::RenderVideoFrames. aClockTime: 21006875, aClockTimeStamp: 69273.780, playbackRate: 1.000
- frame ID 1263 has timestamp: 69300.905ms
VideoSink::RenderVideoFrames. aClockTime: 21018500, aClockTimeStamp: 69284.675, playbackRate: 1.000
- frame ID 1263 has timestamp: 69300.175ms
VideoSink::RenderVideoFrames. aClockTime: 21041708, aClockTimeStamp: 69300.582, playbackRate: 1.000
- frame ID 1263 has timestamp: 69292.874ms
It suddenly slides into the past by 8ms! (from 69300.175ms to 69292.874ms)
This video frame timestamp variation is caused by clocktime variation:
VideoSink::RenderVideoFrames. aClockTime: 20995270, aClockTimeStamp: 69264.220
+11.605ms +9.560ms
VideoSink::RenderVideoFrames. aClockTime: 21006875, aClockTimeStamp: 69273.780
+11.625ms +10.895ms
VideoSink::RenderVideoFrames. aClockTime: 21018500, aClockTimeStamp: 69284.675
+23.208ms +15.907ms
VideoSink::RenderVideoFrames. aClockTime: 21041708, aClockTimeStamp: 69300.582
+0ms +10.565ms
VideoSink::RenderVideoFrames. aClockTime: 21041708, aClockTimeStamp: 69311.147
+11.604ms +8.764ms
VideoSink::RenderVideoFrames. aClockTime: 21053312, aClockTimeStamp: 69319.911
+23.229ms +16.889ms
VideoSink::RenderVideoFrames. aClockTime: 21076541, aClockTimeStamp: 69336.800
+0ms +8.992ms
VideoSink::RenderVideoFrames. aClockTime: 21076541, aClockTimeStamp: 69345.792
+11.604ms +9.744ms
VideoSink::RenderVideoFrames. aClockTime: 21088145, aClockTimeStamp: 69355.536
aClockTime and aClockTimeStamp are treated as the same point in time, but they change in very different increments.
I think this needs to be fixed if we want stable video playback.
Comment 1 • 4 years ago (Reporter)
The reason for the different clock time increments is the following. We obtain the clock time values from AudioSinkWrapper::GetPosition, and this is a video that also plays audio, so the two values are computed as follows:
- aClockTimeStamp is set to TimeStamp::Now()
- aClockTime is set to mAudioSink->GetPosition()

These are not the same at all!
I think we need a stable reference timestamp, and then compute aClockTimeStamp as referenceTimeStamp + (mAudioSink->GetPosition() - audioPositionAtReferenceTimeStamp).MultDouble(mParams.mPlaybackRate).ToTimeDuration().
This would make both aClockTimeStamp and aClockTime advance in the same increments, and as a result, the video frame timestamps should be rock solid.
Experimentally, I tried to use mPlayStartTime as referenceTimeStamp and mPlayDuration as audioPositionAtReferenceTimeStamp, but it didn't work very well. The resulting timestamp was too far in the past, by over 200ms, causing aggressive re-decoding.
Comment 2 • 4 years ago
Thank you for finding this!
In addition, I just want to make sure I understand the reason for dropped frames in the compositor. So it's like this: we have rendered a frame whose timestamp is 10, and the next frame's timestamp is supposed to be later than 10. However, when the sliding-back issue occurs, the next frame's timestamp unexpectedly becomes earlier than the timestamp of the frame we already rendered, and that causes a dropped frame. Do I understand it correctly?
Thank you.
Comment 3 • 4 years ago (Reporter)
(In reply to Alastor Wu [:alwu] from comment #2)
> So it's like this: we have rendered a frame whose timestamp is 10, and the next frame's timestamp is supposed to be later than 10. However, when the sliding-back issue occurs, the next frame's timestamp unexpectedly becomes earlier than the timestamp of the frame we already rendered, and that causes a dropped frame.
The next frame wouldn't usually become earlier than the previously-rendered frame. It's more that the next frame would be so early that the next + 1 frame would be presented on the next composite, skipping the "next" frame.
Let's say we have three buffered frames A, B and C available in the compositor, at timestamps 10, 26 and 43.
Now we have a composite with compositionTime 22.0. We pick the frame A.
Suddenly ImageComposite::SetImages is called for the same frames, but with timestamps that are slid back by 8ms, giving us 2, 18 and 35.
The next composite is at compositionTime 38.6. We pick frame C, skipping frame B.
Now, another call to ImageComposite::SetImages moves the timestamps again, this time in the other direction by 12ms, giving us 14, 30 and 47.
The next composite is at compositionTime 55.3. We pick frame C again.
So now we presented A, C, C rather than A, B, C.
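The A/B/C walkthrough above can be reproduced with a small simulation. The names here (`pick`, the `(char, f64)` frame representation) are hypothetical; the selection rule is the "latest frame not in the future" rule described earlier in this bug.

```rust
// Simulate the walkthrough above: three buffered frames whose timestamps
// slide back by 8ms, then forward by 12ms, between composites, causing
// frame B to be skipped.
fn pick(frames: &[(char, f64)], composition_time_ms: f64) -> Option<char> {
    frames
        .iter()
        .filter(|&&(_, ts)| ts <= composition_time_ms)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .map(|&(name, _)| name)
}

fn main() {
    let original = [('A', 10.0), ('B', 26.0), ('C', 43.0)];
    let slid_back = [('A', 2.0), ('B', 18.0), ('C', 35.0)]; // -8ms
    let slid_fwd = [('A', 14.0), ('B', 30.0), ('C', 47.0)]; // +12ms

    let presented = [
        pick(&original, 22.0).unwrap(),  // picks A
        pick(&slid_back, 38.6).unwrap(), // picks C -- frame B is skipped
        pick(&slid_fwd, 55.3).unwrap(),  // picks C again
    ];
    assert_eq!(presented, ['A', 'C', 'C']); // rather than A, B, C
}
```

With stable timestamps (10, 26, 43 throughout), the same three composites would have presented A, B, C.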
Here's an example: https://share.firefox.dev/31GWtyh (I added lots of profiler markers here)
Look at frames 683 to 687. The original time deltas from the webm file between these frames are as follows:
683 - 16.0ms - 684 - 17.0ms - 685 - 17.0ms - 686 - 16.0ms - 687
Rather than presenting 683, 684, 685, 686, we present 683, 684, 686, 686.
We skip from 684 straight to 686 because the timestamps slide backwards by 4ms, so that 686 is just older than the composition timestamp.
The composite at 123182.742ms has frame 684 at 123170.500ms, frame 685 at 123187.500ms, and frame 686 at 123204.500ms, and picks frame 684.
The composite at 123200.359ms has frame 684 at 123166.133ms, frame 685 at 123183.133ms, and frame 686 at 123200.133ms, and picks frame 686.
Here are the deltas between the composition timestamps and the image timestamps for those composites:
Composition at 123167.016ms, presents 683 with timestamp 123154.500ms
+15.726ms +16.000ms
Composition at 123182.742ms, presents 684 with timestamp 123170.500ms
+17.617ms +29.633ms <- moving forwards two images, skipping 685
Composition at 123200.359ms, presents 686 with timestamp 123200.133ms
+15.977ms +7.484ms <- same image, moved by 7.5ms
Composition at 123216.336ms, presents 686 with timestamp 123207.617ms
Another problem is the fact that the deltas between the composites are not consistently 16.67ms. This would be less of a problem if the video frames fell somewhere in the middle between the vsync composite times. But we can look at that once the video frame times are stable.
If uneven composites were the only problem, the numbers would look as follows:
Composition at 123167.016ms, presents 683 with timestamp 123155.375ms
+15.726ms +16.000ms
Composition at 123182.742ms, presents 684 with timestamp 123171.375ms
+17.617ms +34.000ms
Composition at 123200.359ms, presents 686 with timestamp 123205.375ms
+15.977ms +0.000ms
Composition at 123216.336ms, presents 686 with timestamp 123205.375ms
(The deltas between the image timestamps would exactly match the original deltas from the webm file.)
Comment 4 • 4 years ago (Reporter)
(Notes after a conversation with padenot:)
This problem does not seem to exist on Windows, because on Windows, mAudioSink->GetPosition() interpolates so that it does match TimeStamp::Now().
A simple fix to this might be to do equivalent interpolation on macOS. More specifically, the call to cubeb_stream_get_position ends up in position(), which takes self.frames_played without interpolation, and frames_played is only updated whenever the audio output callback is called. That audio callback call does have a system timestamp for when the queued audio frames are expected to be played, in (*tstamp).mHostTime, so we should be able to store that and use it in order to do interpolation in position().
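The interpolation idea above can be sketched as follows. This is a hypothetical illustration with made-up names (`StreamClock`, `interpolated_position_frames`), not the actual cubeb-coreaudio-rs code, and it assumes millisecond units for the host time.

```rust
// Sketch of the interpolation idea: the audio callback records frames_played
// together with the host time at which those frames are expected to play
// (from (*tstamp).mHostTime), and position() extrapolates from that pair
// using the wall clock instead of returning a value frozen since the last
// callback.
struct StreamClock {
    frames_played: u64,
    host_time_of_frames_ms: f64, // mHostTime of the last callback, in ms
    sample_rate_hz: f64,
}

impl StreamClock {
    fn interpolated_position_frames(&self, now_ms: f64) -> u64 {
        // Clamp to zero so a "now" before the callback timestamp cannot
        // make the position go backwards.
        let elapsed_ms = (now_ms - self.host_time_of_frames_ms).max(0.0);
        self.frames_played + (elapsed_ms * self.sample_rate_hz / 1000.0) as u64
    }
}

fn main() {
    let clock = StreamClock {
        frames_played: 48_000,
        host_time_of_frames_ms: 1000.0,
        sample_rate_hz: 48_000.0,
    };
    // 5ms after the callback's host timestamp, the position has advanced by
    // 5ms worth of frames (240 at 48kHz) instead of staying frozen.
    assert_eq!(clock.interpolated_position_frames(1005.0), 48_240);
}
```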
Comment 5 • 4 years ago
Hi, Chun-min, Paul,
Per comment 4, I wonder if either of you has time to check this get-position issue?
Thank you.
Comment 6 • 4 years ago (Reporter)
Paul opened https://github.com/ChunMinChang/cubeb-coreaudio-rs/pull/99 with a patch. Based on some initial testing, it does seem to make the video frame timestamps significantly more stable.
Comment 7 • 4 years ago (Reporter)
Comment 8 • 4 years ago (Reporter)
Bug 1653350 has added a profiler marker that makes this bug easily observable. Here's a profile from watching https://www.youtube.com/watch?v=BSypUV6QUNw: https://share.firefox.dev/2COWANI
If you hover over the VideoFrameTimeStampJitter markers, you'll see values between -10ms and +10ms. For smooth video playback, these values should all be zero, or at least below 0.1ms.
Comment 9 • 4 years ago (Reporter)
With the fixes from PR #99 and from bug 1654151, the VideoFrameTimeStampJitter values are a lot more stable: https://share.firefox.dev/3eIbHWl
There are two sources for the remaining variation:
- After #99, the interpolation in position() is dependent on the time at which the audio output callback is called. This makes it vulnerable to thread scheduling variation. Furthermore, position() uses current_output_latency_frames rather than total_output_latency_frames. If it used total_output_latency_frames, it would become independent of the time at which the callback is called, because the callback timestamp would subtract itself out in the math. Proof
- There's another timestamp which could subtract itself out of the video frame timestamp math: the time at which VideoSink::UpdateRenderedVideoFrames(), or rather AudioSinkWrapper::GetPosition(), is called. GetPosition() calls mach_absolute_time() twice, once in TimeStamp::Now() and once in mAudioSink->GetPosition() (in the new interpolation code that PR #99 adds, in position()). When calculating the timestamp of the video frames, these two timestamps are subtracted from each other. If any amount of time passes between the two calls to mach_absolute_time(), the values will be slightly off. In other words, there is an opportunity to simplify the math here so that the time of the call to VideoSink::UpdateRenderedVideoFrames() is no longer used in the calculation. But this will probably require a new cubeb API, for example something like cubeb_stream_get_time_reference, which would return two values: a host timestamp and a position, reflecting the state as of the most recent call to the audio output callback.
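To make the second point concrete, here is a sketch of how such a proposed API could be used. Everything here is hypothetical: cubeb_stream_get_time_reference does not exist yet, and the names `TimeReference` and `frame_time_stamp` are made up for illustration.

```rust
// Sketch of how the proposed (hypothetical) cubeb_stream_get_time_reference
// API could be used: it would return a (host_time, position) pair captured
// atomically in the audio callback, so the caller's own "now" never enters
// the video frame timestamp math.
struct TimeReference {
    host_time_ms: f64, // host timestamp of the most recent audio callback
    position_ms: f64,  // audio position as of that same callback
}

// Map a video frame's media time to a wall-clock presentation time using
// only the reference pair; no separate call to TimeStamp::Now() is needed,
// so the time at which this function runs cannot introduce jitter.
fn frame_time_stamp(frame_media_time_ms: f64, r: &TimeReference) -> f64 {
    r.host_time_ms + (frame_media_time_ms - r.position_ms)
}

fn main() {
    let r = TimeReference {
        host_time_ms: 69264.220,
        position_ms: 20995.270,
    };
    let t = frame_time_stamp(21033.980, &r);
    assert!((t - 69302.930).abs() < 1e-6);
}
```

The design point is that both values in the pair come from the same callback invocation, so any delay between the callback and the caller subtracts itself out of the result.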
Comment 10 • 4 years ago (Reporter)
Current status: PR #99 has landed upstream. It briefly made it into autoland in bug 1654699 comment 8, but was backed out again due to test failures. This problem was then filed as issue #106.
Comment 12 • 4 years ago
Comment 13 • 4 years ago (Reporter)
(Bug 1655699 landed the update.)
It is fixed! The video frame timestamps are pretty solid now. These changes have reduced the jitter to ~0.01ms in most cases, down from ~10ms.
The second item from comment 9 would improve the situation even more, but things definitely seem good enough now.
Comment 14 • 4 years ago
Do we need something uplifted to 80 for this, or should the changes ride the trains?
Comment 15 • 4 years ago (Reporter)
I think this can just ride the trains. The fix was in an external crate, which ended up being updated along with a few other crates, and the update contained some other changes as well. So it would be a bit of a hassle to only pick up the fix we want.