Fast seek (inaccurate seeks) will almost certainly leads to broken A/V Sync

RESOLVED FIXED in Firefox 42

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jya, Assigned: jya)

Tracking

Trunk
mozilla43
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox42 fixed, firefox43 fixed)

Details

Attachments

(1 attachment)

Assignee

Description

4 years ago
Was reviewing the seek code as part of bug 1159343.

And I took notice of the PrevSyncPoint seek type.

So if we seek to any time, the audio seek will likely be very accurate (as all frames are keyframes), but the video could be several seconds behind.

It doesn't appear that there is anything in place to dismiss all future video frames until we've caught up with audio.

Or maybe I'm reading it wrong.
Assignee

Updated

4 years ago
Flags: needinfo?(cpearce)
With PrevSyncPoint seeking, the MP4Reader seeks the video stream first, finds the keyframe time, and then seeks the audio to that time. MediaFormatReader should be doing the same thing.
Flags: needinfo?(cpearce)
Assignee

Comment 2

4 years ago
Do not only rely on the MediaDecoderState machine to keep A/V sync after a seek as should we seeked in fast mode ; it wouldn't
Attachment #8645701 - Flags: review?(cpearce)
I don't understand why we can't count on MDSM for A/V sync. Even if the timestamps do not align after seeking, MDSM should still be able to handle that. It is like audio/video tracks have different start times and MDSM would send silent frames for padding.
Assignee

Comment 4

4 years ago
neither do I.

But unless we did so, we had broken A/V sync (bug 1105066)

I have managed to reproduce it with some youtube videos recently.

Now there probably will be a better fix in MDSM, but in the mean time.. this works well and simply restore a past behaviour available in MP4Reader and MediaSourceReader.
Assignee

Comment 5

4 years ago
JW, comment https://bugzilla.mozilla.org/show_bug.cgi?id=1105066#c5 is the interesting part for you.
(In reply to Jean-Yves Avenard [:jya] from comment #0)
> Was reviewing the seek code as part of bug 1159343.
> 
> And I took notice of the PrevSyncPoint seek type.
> 
> So if we seek to any time, the audio seek will likely be very accurate (as
> all frames are keyframes), but the video could be several seconds behind.
> 
> It doesn't appear that there is anything in place to dismiss all future
> video frames until we've caught up with audio.
> 
> Or maybe I'm reading it wrong.

You are reading it wrong.

With PrevSyncPoint seek type, we seek both audio and video streams to the *video* keyframe preceding the seek target. So that means we must seek the video stream to the video keyframe preceding the seek target, and then seek the audio stream to that video keyframe's timestamp.

From your comment above you've implied that we should be seeking both streams as close as possible to the audio keyframe preceding the seek target, but that's not the case. The goal of PrevSyncPoint is to seek such that you can playback immediately at the *video* keyframe preceding the seek target as fast as possible.

The goal with fastSeek is to eliminate the delay in video decoding when we decode from the previous keyframe up to the seek target in order to fulfil an accurate seek request; this could be very slow on mobile.

If you integrated with the demuxer closely (as we probably can now do with the new MediaFormatReader), you can probably calculate in advance of the video seek completing where the target video keyframe is, and so you could seek the audio stream in parallel.
Assignee

Comment 7

4 years ago
(In reply to Chris Pearce (:cpearce) from comment #6)

> With PrevSyncPoint seek type, we seek both audio and video streams to the
> *video* keyframe preceding the seek target. So that means we must seek the
> video stream to the video keyframe preceding the seek target, and then seek
> the audio stream to that video keyframe's timestamp.
> 
> From your comment above you've implied that we should be seeking both
> streams as close as possible to the audio keyframe preceding the seek
> target, but that's not the case. The goal of PrevSyncPoint is to seek such
> that you can playback immediately at the *video* keyframe preceding the seek
> target as fast as possible.

sure, the problem here is that no such information is available to the MediaDecoderReader. There's nothing that tells it what type of seeking it should be doing.

We've always assumed that the MDSM will then do the right thing.

> 
> The goal with fastSeek is to eliminate the delay in video decoding when we
> decode from the previous keyframe up to the seek target in order to fulfil
> an accurate seek request; this could be very slow on mobile.
> 
> If you integrated with the demuxer closely (as we probably can now do with
> the new MediaFormatReader), you can probably calculate in advance of the
> video seek completing where the target video keyframe is, and so you could
> seek the audio stream in parallel.

This is exactly what the patch I've submitted for your review is doing.

There is one particular corner case I had to implement however is that you can't always seek the audio exactly to the video keyframe ; this issue was noted in bug 1173792.

When you start youtube with a parameter so it seeks exactly at some value (in the bug 61s) ;
it loads video with [58-70] and audio with [60-70]

The keyframe in the video buffer is at 58s ; it's not possible to seek audio to 58s as there's no such data in the source buffers, and it will never be loaded.

So what the MediaFormatReader does instead is attempt to seek to where the video seeked to ; and if that failed due to missing data ; then it will seek instead to the closest buffered point.

If the closest buffered point is further than the original seek, then it will enter in WAITING_FOR_DATA mode and will not resolve the seek.

Now, we could handle the case where we couldn't exactly seek the audio and video to the same point, like the example in bug 1173792. However, this would required the MediaFormatReader to start decoding frames and drop them before returning to the MDSM.
This is a very easy thing to do, but if the aim of the fast seek is to seek as quickly as possible, there's no guarantee the gap between audio and video is going to be small. For all we know, the key frames could be 128 frames appart, requiring several seconds to decode + drop.

So as such, I think that not handling this case is better from a performance point of view.
Attachment #8645701 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #6)
> You are reading it wrong.

My understanding of how seeking worked predated the great MSDM refactoring, where every seek was accurate by default, whereas now every seek is fast by default. My understanding has been updated now.
AudioSink will pad silence to fill the gap between the start time and the first audio sample when |start time < sample.mTime|. However, I can't find the code to deal with the case where |sample.mTime < start time|. AudioSink should discard samples predating the start time. Checking...
Assignee

Updated

4 years ago
Blocks: 1193124
https://hg.mozilla.org/mozilla-central/rev/4e6233840d00
Assignee: nobody → jyavenard
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Assignee

Updated

4 years ago
Blocks: 1197083
Backed out for a youtube playback regression. See Bug 1199573.
https://hg.mozilla.org/releases/mozilla-aurora/rev/5bb661db5c6c
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.