Slow video decoding causes video to get out of sync

RESOLVED FIXED in Firefox 37

Status

()

RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: bholley, Assigned: bholley)

Tracking

(Blocks: 1 bug)

unspecified
mozilla37
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox36 wontfix, firefox37 fixed, firefox38 unaffected)

Details

Attachments

(5 attachments)

(Assignee)

Description

4 years ago
I have some patches that busy-wait in the video PDM. With bug 1114830, we successfully play the audio without jittering in and out of buffering mode, but the video gets behind. Chris says that we have code in MediaDecoderStateMachine::AdvanceFrame that's supposed to avoid setting late frames as the current frame.
(Assignee)

Comment 1

4 years ago
Thinking a bit more about this though - wouldn't we also need code that tells the decoder to fast-forward and skip over the stuff that's already too late? Otherwise the decoder will keep outputting most-recent frames that are further and further behind the current time.

Do we have code to do that? If so, where does it live?
Flags: needinfo?(cpearce)
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #1)
> Thinking a bit more about this though - wouldn't we also need code that
> tells the decoder to fast-forward and skip over the stuff that's already too
> late? Otherwise the decoder will keep outputting most-recent frames that are
> further and further behind the current time.
> 
> Do we have code to do that? If so, where does it live?

That's the skip-to-next-keyframe logic? It passes aTimeThresold into RequestVideoData when aSkip is true, which is the current playback position.
Flags: needinfo?(cpearce)
(Assignee)

Comment 3

4 years ago
I debugged this a bit. What's happening is that we're decoding video slowly enough that we never buffer enough video to get out of mIsVideoPrerolling, which means that the skip-to-next-keyframe logic is never allowed to kick in. This, in turn, always causes us to throw away everything but the most recent sample in AdvanceFrame (since they're all way behind), which feeds back into the mIsVideoPrerolling problem. So the video just gets further and further behind the audio.

Should we perhaps also bail out of mIsVideoPrerolling after hitting a certain point in the timeline (say, 1s)?
Flags: needinfo?(cpearce)
The idea of "prerolling" is to give the decode chance to fill its buffers before we start playback, so that the skip-to-next-keyframe and buffering logic don't kick in and stop playback or cause glitches.

Perhaps a better solution is to not actually start playback until the decode has filled the AudioQueue() and VideoQueue(). Then we wouldn't need these special cases.

Adding a timer to unset mIsVideoPrerolling would probably work too, but we'd need to handle canceling the timer.
Flags: needinfo?(cpearce)
(Assignee)

Comment 9

4 years ago
Created attachment 8541930 [details] [diff] [review]
Part 1 - Make MediaDecoderStateMachine::Play run on the state machine thread. v1

We want to invoke StartDecoding() from Play, but that calls into a bunch of stuff
that asserts that we're on the state machine thread. It's not clear to me whether
that's actually necessary, but this seems like the right thing to do regardless
given that this is all supposed to be async anyway.
Attachment #8541930 - Flags: review?(cpearce)
(Assignee)

Comment 10

4 years ago
Created attachment 8541931 [details] [diff] [review]
Part 2 - Use StartDecoding() rather than SetState in PlayInternal. v1

This makes sure that decode tasks are dispatched and that all the preroll state
is appropriately set up.
Attachment #8541931 - Flags: review?(cpearce)
(Assignee)

Comment 11

4 years ago
Created attachment 8541932 [details] [diff] [review]
Part 3 - Hoist arms-length conditions and rename StartPlayback() to MaybeStartPlayback(). v1
Attachment #8541932 - Flags: review?(cpearce)
(Assignee)

Comment 12

4 years ago
Created attachment 8541933 [details] [diff] [review]
Part 4 - Dynamically compute preroll thresholds. v1

Currently, the preroll threshold ends up higher than the ample threshold in the
audio-only case where we slash the audio thresholds by a factor of 8. The best
way to avoid these sorts of bugs is to compute the values dynamically.
Attachment #8541933 - Flags: review?(cpearce)
(Assignee)

Comment 13

4 years ago
Created attachment 8541934 [details] [diff] [review]
Part 5 - Don't start playback during prerolling. v1
Attachment #8541934 - Flags: review?(cpearce)
Comment on attachment 8541930 [details] [diff] [review]
Part 1 - Make MediaDecoderStateMachine::Play run on the state machine thread. v1

Review of attachment 8541930 [details] [diff] [review]:
-----------------------------------------------------------------

I think this won't work because the other state transitions (specifically the call to MDSM::Seek()) happen on the main thread and are processed synchronously. So if the MediaDecoder calls MDSM::Play(), and then MDSM::Seek(), the play task would be enqueued, then MSDM::Seek() runs and changes the state to seeking, and then the play task could run and overwrite the state from seeking to playing, and the state becomes inconsistent.

You can assume r+ on a patch to change the MOZ_ASSERT_IF's in void MediaDecoderStateMachine::Play(), but not the patch to make MDSM::Play() async.
Attachment #8541930 - Flags: review?(cpearce) → review-
Attachment #8541931 - Flags: review?(cpearce) → review+
Attachment #8541932 - Flags: review?(cpearce) → review+
Comment on attachment 8541933 [details] [diff] [review]
Part 4 - Dynamically compute preroll thresholds. v1

Review of attachment 8541933 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.h
@@ +942,5 @@
>    // behind. Otherwise our "we're falling behind" logic will trigger
>    // unneccessarily if we start playing as soon as the first sample is
>    // decoded. These two fields store how many video frames and audio
>    // samples we must consume before are considered to be finished prerolling.
> +  uint32_t AudioPrerollUsecs()

Both of these functions can be const right?

@@ +944,5 @@
>    // decoded. These two fields store how many video frames and audio
>    // samples we must consume before are considered to be finished prerolling.
> +  uint32_t AudioPrerollUsecs()
> +  {
> +    if (mIsRealTime) {

if (mScheduler->IsRealTime()) {
  ...

@@ +955,5 @@
> +  }
> +  uint32_t VideoPrerollFrames() { return mIsRealTime ? 0 : mAmpleVideoFrames / 2; }
> +
> +  // A cache of mScheduler->IsRealTime().
> +  bool mIsRealTime;

I don't think you should cache mScheduler->IsRealTime(), as there are still many other places in MediaDecoderStateMachine that don't use this cached value you added, they call mScheduler->IsRealTime(). i.e. you're making the code less consistent, adding multiple sources of truth, with this change.

Add a (const if possible) helper function that returns mScheduler->IsRealTime() if you're offended by having to deref mScheduler all over the place. And then replace all calls to mScheduler->IsRealTime() with the helper function so we're consistent.
Attachment #8541933 - Flags: review?(cpearce) → review+
Comment on attachment 8541934 [details] [diff] [review]
Part 5 - Don't start playback during prerolling. v1

Review of attachment 8541934 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaDecoderStateMachine.cpp
@@ +882,5 @@
>      mFirstVideoFrameAfterSeek = nullptr;
>    }
> +  if (isAudio) {
> +    AudioQueue().Finish();
> +    mIsAudioPrerolling = false;

Whenever mIs{Audio,Video}Prerolling changes from false to true, we could need to start playback, right? i.e. you should probably ScheduleStateMachine() whenever mIs{Audio,Video}Prerolling changes?

You should probably have a helper function that resets mIs{Audio,Video}Prerolling and calls ScheduleStateMachine(), and use that wherever you change mIs{Audio,Video}Prerolling.

@@ +977,3 @@
>      case DECODER_STATE_DECODING: {
>        Push(video);
> +      if (mIsVideoPrerolling && DonePrerollingVideo())

if (...) {

Braces riding high.

::: dom/media/MediaDecoderStateMachine.h
@@ +956,5 @@
>    uint32_t VideoPrerollFrames() { return mIsRealTime ? 0 : mAmpleVideoFrames / 2; }
>  
> +  bool DonePrerollingAudio()
> +  {
> +    return !IsAudioDecoding() || GetDecodedAudioDuration() >= AudioPrerollUsecs() * mPlaybackRate;

Assert monitor held in both of these helper functions.
Attachment #8541934 - Flags: review?(cpearce) → review+
(Assignee)

Comment 17

4 years ago
(In reply to Chris Pearce (:cpearce) (PTO until 5 Jan) from comment #14)
> I think this won't work because the other state transitions (specifically
> the call to MDSM::Seek()) happen on the main thread and are processed
> synchronously. So if the MediaDecoder calls MDSM::Play(), and then
> MDSM::Seek(), the play task would be enqueued, then MSDM::Seek() runs and
> changes the state to seeking, and then the play task could run and overwrite
> the state from seeking to playing, and the state becomes inconsistent.

My original patch actually did this with seek, but there were minor issues and I felt the seek stuff was a distraction. I could easily revive it though, which I believe would solve this issue.

> You can assume r+ on a patch to change the MOZ_ASSERT_IF's in void
> MediaDecoderStateMachine::Play(), but not the patch to make MDSM::Play()
> async.

It's not an issue of the asserts in play - it's all of the dependent helpers that end up getting invoked when we do StartDecoding. In general, the situation with these things is really dumb - every method just asserts the union of whatever threads people have observed it running on, without any sort of logic as to _why_ that's important (especially in the case where we're asserting main-threadedness while accessing state machine state).

It seems to me like we either need to make more of an effort to manipulate state on a designated thread, or we should get rid of the assertions entirely, since our safety model is effectively "use whatever thread you want and hold the lock". How do you want to proceed?

(In reply to Chris Pearce (:cpearce) (PTO until 5 Jan) from comment #15)
> Add a (const if possible) helper function that returns
> mScheduler->IsRealTime() if you're offended by having to deref mScheduler
> all over the place. And then replace all calls to mScheduler->IsRealTime()
> with the helper function so we're consistent.

The issue was just one of include dependencies and the fact that mScheduler currently has an incomplete type in MDSM.h. I can find a way around it though.
Flags: needinfo?(cpearce)
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #17)
> (In reply to Chris Pearce (:cpearce) (PTO until 5 Jan) from comment #14)
> > I think this won't work because the other state transitions (specifically
> > the call to MDSM::Seek()) happen on the main thread and are processed
> > synchronously. So if the MediaDecoder calls MDSM::Play(), and then
> > MDSM::Seek(), the play task would be enqueued, then MSDM::Seek() runs and
> > changes the state to seeking, and then the play task could run and overwrite
> > the state from seeking to playing, and the state becomes inconsistent.
> 
> My original patch actually did this with seek, but there were minor issues
> and I felt the seek stuff was a distraction. I could easily revive it
> though, which I believe would solve this issue.

There's also MDSM::SetDormant() and MDSM::Shutdown() which transition on the main thread.

We're not consistent on which thread we we transition the state. We should ideally be, but we're better to limit our scope here if we can.


> > You can assume r+ on a patch to change the MOZ_ASSERT_IF's in void
> > MediaDecoderStateMachine::Play(), but not the patch to make MDSM::Play()
> > async.
> 
> It's not an issue of the asserts in play - it's all of the dependent helpers
> that end up getting invoked when we do StartDecoding. In general, the
> situation with these things is really dumb - every method just asserts the
> union of whatever threads people have observed it running on, without any
> sort of logic as to _why_ that's important (especially in the case where
> we're asserting main-threadedness while accessing state machine state).
> 
> It seems to me like we either need to make more of an effort to manipulate
> state on a designated thread, or we should get rid of the assertions
> entirely, since our safety model is effectively "use whatever thread you
> want and hold the lock". How do you want to proceed?

Yeah, the code has not aged well. I don't think we should try to fix it /right now/, since we want to ship MSE in Firefox 36.

Ideally, we'd only call SetState() on the state machine thread. That way all state transitions would be synchronized via a single task queue. We should work towards that.

I think you should be able to keep the dispatch to PlayInternal() in your Patch 1 if you added a check inside PlayInternal() so that it bails out if the state machine is not in DECODING, COMPELTED, or BUFFERING state. That should prevent an enqueued PlayInternal task overwriting a seek task. Make sense? Test that, and if it works, assume r+ on patch 1.
Flags: needinfo?(cpearce)
Comment on attachment 8541930 [details] [diff] [review]
Part 1 - Make MediaDecoderStateMachine::Play run on the state machine thread. v1

Review of attachment 8541930 [details] [diff] [review]:
-----------------------------------------------------------------

The dispatch to PlayInternal() should be OK if you add a check inside PlayInternal() so that it bails out if the state machine is *not* in DECODING, COMPELTED, or BUFFERING state. That should prevent an enqueued PlayInternal task overwriting a seek task's state change. Make sense? Test that, and if it works, r+.
Attachment #8541930 - Flags: review- → review+
(Assignee)

Updated

4 years ago
Assignee: nobody → bobbyholley
Bobby: Does this need uplift to 38?
Flags: needinfo?(giles)
Flags: needinfo?(bobbyholley)
to 36, rather.
Flags: needinfo?(giles)
status-firefox36: --- → affected
status-firefox37: --- → fixed
status-firefox38: --- → unaffected
(Assignee)

Comment 25

4 years ago
I think that would have been smart, though it's unfortunate that we've now waited a month. I'm concerned that this means that other patches that we've developed against Nightly (with these patches) will now have less-certain behavior when backported to Beta (without it).

I'd vote for uplift, but could be convinced that it's too late.
Flags: needinfo?(bobbyholley)
Yes, it's unfortunate that we missed this. Given we don't have evidence this helps YouTube, I think we should skip it for 36 at this point.
status-firefox36: affected → wontfix

Updated

4 years ago
Depends on: 1156860

Updated

3 years ago
Depends on: 1185350
You need to log in before you can comment on or make changes to this bug.