Closed Bug 589626 Opened 14 years ago Closed 14 years ago

HTML5 Video buffering logic inconsistent

Categories

(Core :: Audio/Video, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: cpearce, Assigned: cpearce)

References

Details

Attachments

(3 files, 5 obsolete files)

Our buffering logic isn't too good. This was largely aggravated/caused by my checkin of bug 508082, but there were still things inconsistent about it before that.

The specific problem I saw: if I use a rate-controlled webserver, and a variable bitrate video, we can end up getting into a situation where we keep flipping between readyStates HAVE_ENOUGH_DATA and HAVE_CURRENT_DATA, and switching back and forth between buffering and playing for about 30 seconds. This stopping and starting is a very poor user experience, and in this case if we actually buffered for a few seconds longer, we'd actually be able to resume playback without all the stopping and starting.

Issues with our code/logic:
1. Thanks to my change in bug 543769 we start buffering if we're low on audio or low on decoded frames. Since we've got skip-to-video-keyframe logic, it doesn't really make sense to buffer if we're low on video frames (unless we don't have audio) as this should be handled by the skip-to-keyframe logic.
2. Thanks to my change in bug 543769, we resume buffering if it looks like the decode will not catch up with the download (if CanPlayThrough()). This is a different heuristic to what we use to decide to start buffering (we start buffering if the decode actually does catch up with the download), so they can disagree, and so we can end up toggling between buffering and not buffering. This is particularly a problem in variable bitrate videos, when the bitrate spikes to more than the download rate.
3. We don't run the decoder when buffering; we set mBufferExhausted when the decode catches up with the download, and the decode thread Wait()s until !mBufferExhausted, which is set by the state machine thread when it finishes buffering.
4. We're racing: When we come out of buffering state we restart the decoder threads. Then the next time the state machine thread loop enters its DECODING state, it checks if we've got no video frames decoded, and if we don't, we'll start buffering again (thanks to 1.). However the decoder won't run while we're buffering (thanks to 3.), so the decoder thread will be racing to produce a frame, and the state machine thread will be racing to consume a frame. Unless the decoder has made rapid progress, it's likely that there's no video frames decoded, so we'll go back into buffering (thanks to 1.), and pause the decoder. If we don't actually produce a frame, we can get stuck in a cycle of start/stop buffering.

Solutions:
1. Don't start buffering if we are low on video frames, unless we're only playing video. Let the keyframe skipping logic handle this situation.
2. Don't stop the decode when it catches up with the download. We can rely on the media stream reads being blocking to achieve the decode slowing down when it's low on data. The decode will also pause naturally when its buffers get full.
3. Use the same, or at least complimentary, heuristics for starting and stopping buffering.
This should really block 2.0, it's a pretty bad regression.
Assignee: nobody → chris
blocking2.0: --- → ?
While working on fixing the buffering logic I found we're using (mCurrentFrameTime + mStartTime) all over the place. I think this is a bit clunky, error prone, and could be clearer, so I refactored its usage into a new method "MediaTime()". This returns (mCurrentFrameTime + mStartTime), which is the time W.R.T. the media file's native timeline, not the time we report to the media element, which is normalized into the range [0,duration]. I also discovered one place were we were using mCurrentFrameTime, when we really meant to be using mCurrentFrameTime+mStartTime.
Attachment #470988 - Flags: review?(chris.double)
Attached patch Patch 2: Fix buffering logic (obsolete) — Splinter Review
* Change nsBuiltinDecoderStateMachine::HasFutureAudio() so that it only considers us having future audio if the audio data pushed to the hardware plus that which is waiting to be pushed to the hardware is less than  LOW_AUDIO_MS. Previously we considered these two types of audio data separately, so previously we could actually have no audio data in the hardware, and less than LOW_AUDIO_MS decoded waiting to be pushed, and consider ourselves to have future audio data, which was inconsistent.
* Don't halt the decoder thread when the decode catches up with the download (when mBufferExhausted==PR_TRUE). The decode will naturally stop when it blocks while reading the stream, or when its buffers fill up.
* Only keyframe skip if we're playing audio as well as video. This means when the decode catches up with the download and we're playing only video, we'll stop to buffer, otherwise we'll keep playing and rely on keyframe skipping.
* Added a few convenience functions to tell whether we're low on decoded data or not, and consolidated the logic to (re)use these functions.
* Change the "should I start buffering" condition check in DECODING state to move into buffering state if we're low on decoded audio data, or low on decoded video data if we don't have an audio stream.
* Change the "should I remain buffering?" condition check in BUFFERING state to also remain in buffering if we don't have full decode buffers. This means we won't stop buffering if we can't actually play anything. This prevents us from switching back into buffering state again immediately after exiting buffering because we're low on decoded data (not stopping the decoder while buffering also helps with this).
* Change the "should I remain buffering?" condition check in BUFFERING state to not remain in buffering if the download has been suspended. We previously didn't remain in buffering if the download was suspended by the cache, but if the download has been suspended by some other means (say going into bfcache, or preload=metadata causing us to suspend the load) we could get stuck in buffering state (for 30s at least). Changed the "should I go into buffering?" condition check in DECODING state to match.
* Change nsMediaDecoder::CanPlayThrough() to not claim it can play through unless it's got an estimated 20 seconds worth of data buffered after the current playback position. This makes the calculation more stable when the media's bitrate fluctuates, and less eager to claim it can play through near the start of a download. When we have an autoplay video, we start playback when we go into readyState HAVE_ENOUGH_DATA, and we go into HAVE_ENOUGH_DATA when CanPlayThrough() returns true, so adding this extra check means we delay starting autoplay videos. I think we should do this because our previous CanPlayThrough() calculation relied on estimated bitrate (stream_length / duration), which can be horribly wrong in a VBR media. Adding this extra check means we won't say we can play through (and won't start playing an autoplay video) unless we have enough buffered data to play at least *something*. So if our estimate is horribly wrong, we should at least be able to autoplay a non-miniscule amount of video before discovering we need to stop to buffer. This extra check also prevents us toggling between HAVE_ENOUGH_DATA and HAVE_CURRENT_DATA/HAVE_FUTURE_DATA when our download rate is greater than the estimated bitrate, but not greater than the media's actual current bitrate (particularly at the start of a download, or after a seek).
Attachment #470989 - Flags: review?(chris.double)
Attachment #470989 - Flags: feedback?(roc)
Comment on attachment 470988 [details] [diff] [review]
Patch 1: Add nsBuiltinDecoderStateMachine::MediaTime()

Changing review to Matthew, as Chris Double is away.
Attachment #470988 - Flags: review?(chris.double) → review?(kinetik)
Comment on attachment 470989 [details] [diff] [review]
Patch 2: Fix buffering logic

Changing review to Matthew, as Chris Double is away.
Attachment #470989 - Flags: review?(chris.double) → review?(kinetik)
+        if ((mBufferExhausted || HasLowDecodedData()) &&

Why would we start buffering just because HasLowDecodedData() returns true?

+             mDecoder->GetState() == nsBuiltinDecoder::PLAY_STATE_PLAYING &&
+             !stream->IsDataCachedToEndOfStream(mDecoder->mDecoderPosition) &&
+             !stream->IsSuspendedByCache() &&
+             !stream->IsSuspended())

Isn't the call to IsSuspendedByCache redundant?
(In reply to comment #6)
> +        if ((mBufferExhausted || HasLowDecodedData()) &&
> 
> Why would we start buffering just because HasLowDecodedData() returns true?

Hmm, you're right, that doesn't help. I was worried about the case where the decode can't keep up with the playback, but I think in retrospect in that case we're pretty much screwed anyway.

I added in the "are we running low on decoded data" condition a few months back:
https://bugzilla.mozilla.org/show_bug.cgi?id=557426#c5

Looking back at bug 557426, I think that || should be an && else we'd regress bug 557426.

> +             mDecoder->GetState() == nsBuiltinDecoder::PLAY_STATE_PLAYING &&
> +             !stream->IsDataCachedToEndOfStream(mDecoder->mDecoderPosition) &&
> +             !stream->IsSuspendedByCache() &&
> +             !stream->IsSuspended())
> 
> Isn't the call to IsSuspendedByCache redundant?

Ah yes, thanks for pointing that out!
(In reply to comment #7)
> Hmm, you're right, that doesn't help. I was worried about the case where the
> decode can't keep up with the playback, but I think in retrospect in that case
> we're pretty much screwed anyway.

Actually, skipping to the next keyframe will save us here!
With roc's feedback addressed.
Attachment #470989 - Attachment is obsolete: true
Attachment #471027 - Flags: review?(kinetik)
Attachment #470989 - Flags: review?(kinetik)
Attachment #470989 - Flags: feedback?(roc)
Comment on attachment 470988 [details] [diff] [review]
Patch 1: Add nsBuiltinDecoderStateMachine::MediaTime()

Maybe rename it GetMediaTime for symmetry with GetCurrentTime?  I'm not fussy, though.
Attachment #470988 - Flags: review?(kinetik) → review+
Comment on attachment 470988 [details] [diff] [review]
Patch 1: Add nsBuiltinDecoderStateMachine::MediaTime()

+++ b/content/media/nsBuiltinDecoderStateMachine.h
+  PRInt64 nsBuiltinDecoderStateMachine::MediaTime() const {

This fails to compile with recent GCC.  You need to remove |nsBuiltinDecoderStateMachine::|.
Comment on attachment 471027 [details] [diff] [review]
Patch 2 v2: Buffering fix

+  // Returns PR_TRUE if the media queue has had it last sample added to it.

"has had its"
Attachment #471027 - Flags: review?(kinetik) → review+
We had test failures:
WINNT 5.2 mozilla-central debug test mochitests-1/5 on 2010/09/05 20:13:55
53498 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_bug493187.html | Test timed out.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1283742835.1283745579.30915.gz

Rev3 Fedora 12 mozilla-central opt test mochitests-1/5 on 2010/09/05 21:10:40
52729 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_bug493187.html | Test timed out.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1283746240.1283747261.5614.gz

At least 1 other too.

Backed out:

http://hg.mozilla.org/mozilla-central/rev/0d386e58fbdf
http://hg.mozilla.org/mozilla-central/rev/5cdd18879555
http://hg.mozilla.org/mozilla-central/rev/848ed8a387a9
http://hg.mozilla.org/mozilla-central/rev/d22e058453bb
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The failure in test_bug493187 was caused by the canplaythrough event not firing. This patch fixes that, and probably fixes bug 568402.

nsMediaDecoder::CanPlayThrough() reports PR_FALSE if the download or playback rate isn't considered reliable. The download rate is not considered reliable if we've been downloading for less than 3 seconds. If the media cache suspends the download of a media resource within the first 3 seconds of a download, we'll consider the download rate unreliable, and we'll never be able to never fire canplaythrough (unless the download starts up again perhaps). The fix is to change to HAVE_ENOUGH_DATA readyState when we suspend the download. We can only safely switch to HAVE_ENOUGH_DATA if our readyState is at least HAVE_CURRENT_DATA, else we may fire events like loadeddata before we should.

This patch also includes a few minor test fixes.
Attachment #472920 - Flags: review?(roc)
Attachment #472920 - Flags: approval2.0?
nsBuiltinDecoder::NotifySuspendedStatusChanged calls NotifyAutoplayDataReady. It needs to call nsHTMLMediaElement::ChangeReadyState. Or better still, nsHTMLMediaElement::UpdateReadyStateForData, and add a check to nsHTMLMediaElement::UpdateReadyStateForData to go to HAVE_ENOUGH_DATA when the stream IsSuspendedByCache.
Attachment #472920 - Attachment is obsolete: true
Attachment #473814 - Flags: review?(roc)
Attachment #472920 - Flags: review?(roc)
Attachment #472920 - Flags: approval2.0?
+  if (mDecoder->GetCurrentStream()->IsSuspendedByCache() &&
+      !mDecoder->IsEnded())

I think this should happen after we check aNextFrame != NEXT_FRAME_AVAILABLE. If we can't even decode one frame, the media cache should not have suspended the stream, but we might temporarily be in a state where it has and it just hasn't noticed that it needs to read yet.
Attachment #473814 - Attachment is obsolete: true
Attachment #473867 - Flags: review?(roc)
Attachment #473814 - Flags: review?(roc)
Bounced again!

s: win32-slave40
53625 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_bug493187.html | Test timed out.
tinderbox.mozilla.org/showlog.cgi?log=Firefox/1284165896.1284168956.5378.gz

Backout:
http://hg.mozilla.org/mozilla-central/rev/a91268f8fe90
http://hg.mozilla.org/mozilla-central/rev/95252c3fd859
http://hg.mozilla.org/mozilla-central/rev/801cefec09a6
http://hg.mozilla.org/mozilla-central/rev/9eabb903c0f3
http://hg.mozilla.org/mozilla-central/rev/90f288d4f7cb
http://hg.mozilla.org/mozilla-central/rev/b6dff6572c77
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I was able to reproduce this in an opt build. The problem is that I changed nsBuiltinDecoderStateMachine::HasFutureAudio() to consider more than LOW_AUDIO_MS of available audio as having future audio, whereas previously it would return PR_TRUE if the audio queue had 1 or more chunks in it. HasFutureAudio() is used by the readyState calculation, which affects when we fire canplaythrough.

The decode loop calls UpdateReadyStateForData() if the audio queue has less than 2 chunks in it, anticipating that the readystate may change when we cross this boundary. However that was inconsistent with my change to HasFutureAudio(), so we could miss this transition and if we were not playing we'd never attempt to update the readyState again, and so never end up firing canplaythrough.

My previous patch 3, which fires canplaythrough when the download was suspended, would sometimes make up for this by updating the readyState again, but not in all cases, so I'm pretty convinced that we don't need it anymore.
Attachment #473867 - Attachment is obsolete: true
Attachment #475003 - Flags: review?(roc)
* Just update the readystate in DecodeLoop if !IsPlaying() as discussed.
* Don't require all active streams to be playing for us to pause the decode thread. Now we just pause the decode if all the active streams are far enough ahead of the playback position. With the change above, we could get into a situation where we wouldn't pause the decode loop if only one of the streams had finished, but videoWait and audioWait were PR_TRUE. Previously we'd sleep at the start of the loop if mBufferExhausted, but I removed that in patch 1, so we need to do it here instead.
* Bug 588312 changed the nsBuiltinDecoderStateMachine::Run() function to set the current playback position to the seek target time before performing the seek, but it passed this new time down to the reader's seek function as the currentTime. This is wrong, because if the seek target is the same as the passed-in current playback position, we won't actually seek. I also changed the usage of mStartTime+mCurrentFrameTime to use the GetMediaTime() convenience function I added in patch 1.
Attachment #475003 - Attachment is obsolete: true
Attachment #475209 - Flags: review?(roc)
Attachment #475003 - Flags: review?(roc)
Third time's the charm...

http://hg.mozilla.org/mozilla-central/rev/89cc2e7eb2fb
http://hg.mozilla.org/mozilla-central/rev/d43718e5b27a
http://hg.mozilla.org/mozilla-central/rev/ada6ae1fa42b
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: