Closed Bug 1125469 Opened 5 years ago Closed 5 years ago

Playback sometimes stall waiting, while buffered data is there

Categories

(Core :: Audio/Video, defect, P1)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- unaffected
firefox37 + fixed
firefox38 --- fixed

People

(Reporter: jya, Assigned: jya)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 3 obsolete files)

Attached file log.txt.zip
Seen this particularly happening today, now that I have very verbose logging

Under some circumstances, playback will stall even though we have data to play.
All happening using YouTuve

logs attached, it's super verbose
Priority: -- → P1
From the log:

> 543834112[123190f60]: MediaSourceReader(13739a400)::Seek(aTime=36044488, aEnd=5221491712, aCurrent=4838798720)
> 543834112[123190f60]: Seek creating MediaPromise (12051be40)

This seek promise never gets resolved, which means that mWaitingForSeekData remains false.

> 543834112[123190f60]: DecodeSeek invoking Then() [this=12051be40, thenValue=122d9e480, aThisVal=13739a800, isPending=1]
> [35531] WARNING: Releasing screensaver: file /Users/jyavenard/Work/Mozilla/mozilla-central/widget/cocoa/nsAppShell.mm, line 80
> 2093089536[1004481a0]: SourceBuffer(135ca3080)::GetBuffered ranges=[]
> 2093089536[1004481a0]: SourceBuffer(135d4b700)::GetBuffered ranges=[(0.000000, 50.015124)]

This indicates that there are two source buffers, presumably one audio and one video. One of them has the seek target (t=30.04) and the other does not. So MediaSourceReader::TrackBuffersContainTime presumably is correctly returning false.

If this log was generated with the range removal patches from bug 1096089, I suspect they might be at fault here for discarding an entire track buffer.
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #1)
> If this log was generated with the range removal patches from bug 1096089, I
> suspect they might be at fault here for discarding an entire track buffer.

remove was called on that sourcebuffer using the arguments 0, duration. So it was entirely emptied.
Coming back from dinner, absolutely 0 change, and playback is once again flawless...

maybe it was a temporary failure on youtube side... certainly works great now
Can we close this?
Flags: needinfo?(jyavenard)
I can still reproduce it from time to time...

after seeking back and forth, and clearing the sourcebuffer by changing resolution, from time to time it gets stuck in waiting.
Flags: needinfo?(jyavenard)
Can you try to get a log for this?
I can reproduce this fairly reliably on https://www.youtube.com/watch?v=aB16fJpoj-I&list=PLFqO_oqoHHMzDskSwMatRBeWtT7qrB4dF&index=3

Nightly 38.0a1 (2015-01-29)

What logs do you need?
(In reply to Martin Thomson [:mt] from comment #7)
> I can reproduce this fairly reliably on
> https://www.youtube.com/watch?v=aB16fJpoj-
> I&list=PLFqO_oqoHHMzDskSwMatRBeWtT7qrB4dF&index=3
> 
> Nightly 38.0a1 (2015-01-29)
> 
> What logs do you need?

Which OS?

This is the most verbose logging:

MEDIA_LOG_SAMPLES=1 NSPR_LOG_MODULES="MediaDecoder:5,MediaSource:5,MediaPromise:5,MP4Demuxer:5,AppleMedia:5"

If that messes up timing too much, you can try dropping the MEDIA_LOG_SAMPLES part.
Linux.  I'll see what I can get for you.
(In reply to Martin Thomson [:mt] from comment #9)
> Linux.  I'll see what I can get for you.

Are you using MP4 or webm (check stats for nerds on the youtube video)? On Nightly, linux is set to use webm MSE, which we've punted on for the time being (we know of several webm-related stalls). If you can reproduce it on mp4, that would be much more interesting to us at the moment.
Flags: needinfo?(martin.thomson)
It's definitely vp9, so I'm guessing that means webm.  I'll confirm when I get back to the machine.
Yeah. Try setting the following prefs to true:

media.fragmented-mp4.enabled
media.fragmented-mp4.exposed
media.fragmented-mp4.ffmpeg.enabled
media.mediasource.mp4.enabled

And setting the following to false (since youtube prefers webm if both are available):

media.mediasource.webm.enabled
Logs are being collected as we speak.  I'm using a debug build off m-c, so the main thread is pretty unresponsive, but video is still pretty smooth.  No sign of stalling yet, where the webm stalled pretty frequently.  Guess that it's the known webm stalls I'm hitting.
Flags: needinfo?(martin.thomson)
jya, do you think my patches in bug 1127203 might fix this?
Flags: needinfo?(jyavenard)
Looks like I'm not going to help you much here.  The log file just passed 1Gb in size, no stalls.  Silky smooth too.
(In reply to Martin Thomson [:mt] from comment #15)
> Looks like I'm not going to help you much here.  The log file just passed
> 1Gb in size, no stalls.  Silky smooth too.

I'm not complaining :-)

Thanks for helping us on this! We're down to the wire schedule-wise, and this kind of help makes all the difference.
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #14)
> jya, do you think my patches in bug 1127203 might fix this?

I think it will. And in combination with bug 1125776 which fixes the race of Buffered() returning ranges of non initialized decoders ; those both should fix it hopefully.

I will be running more tries over the week-end to confirm
Flags: needinfo?(jyavenard)
Blocks: 1130263
Ensure that we seek to that time we actually have.

In practice, with MP4 it doesn't matter, the MP4 demuxer will seek to the first time available after seek target. But not knowing what the WebM demuxer does, and just in case.
Attachment #8560431 - Flags: review?(matt.woodrow)
Assignee: nobody → jyavenard
Status: NEW → ASSIGNED
Ensure that we set the audio/video data discontinuity flag after a seek/reset. Without it the state machine would be dropping all data we output. The flag would be set on the first frame returned by the demuxer, but under some circumstances (like if we were skipping to next frame) we could be ignoring that first returned frame ; so we would never set it. So we would exaust our buffer and either audio or video would stop playing while the other would continue
Attachment #8560433 - Flags: review?(matt.woodrow)
Rely on what the sub-reader returns in regards to discontinuities detection.
Attachment #8560860 - Flags: review?(matt.woodrow)
oops left debugging logs in there
Attachment #8560861 - Flags: review?(matt.woodrow)
Attachment #8560433 - Attachment is obsolete: true
Attachment #8560433 - Flags: review?(matt.woodrow)
Attachment #8560860 - Attachment is obsolete: true
Attachment #8560860 - Flags: review?(matt.woodrow)
This is the actual proper fix for this mess. Part2 is probably not required, but as it doesn't hurt and does what it's supposed to do properly, I think we should leave it.

So what was happening is that under some circumstances, the MDSM requested for the next video data with skiptonextframe=true and a time threshold of 60s. This set mDropAudioBeforeThreshold and mDropVideoBeforeThreshold to mTimeThreshold to 60s. Shortly after, we have a seek to 7s (before the decoder had time to output anything which would have reset mDropAudio/VideoBeforeThreshold flag). The seek waits for data to be available at the 7s mark, execute the seek and resolve the promise. Next the MDSM calls RequestAudio/VideoData with skiptonextframe=0, timethreshold=0; but our timethreshold is still set to 60s. The decoder we've seeked to doesn't contain 60s ; and as such will never return any frames ever.
Attachment #8560863 - Flags: review?(matt.woodrow)
Comment on attachment 8560431 [details] [diff] [review]
Part1. Don't attempt to seek on time found with fuzz search

Review of attachment 8560431 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/mediasource/MediaSourceReader.cpp
@@ +508,5 @@
>      mAudioSourceDecoder = newDecoder;
> +    if (usedFuzz) {
> +      nsRefPtr<dom::TimeRanges> ranges = new dom::TimeRanges();
> +      newDecoder->GetBuffered(ranges);
> +      *aTarget = ranges->GetStartTime() * USECS_PER_S;

If we have overlapping decoders, then selecting a new one because of fuzz doesn't guarantee we're at the start.

I think we need SelectDecoder to also adjust aTarget, until we properly make sure we never have overlapping time ranges.
Attachment #8560861 - Flags: review?(matt.woodrow) → review+
Attachment #8560863 - Flags: review?(matt.woodrow) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #23)
> Comment on attachment 8560431 [details] [diff] [review]
> Part1. Don't attempt to seek on time found with fuzz search
> 
> Review of attachment 8560431 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/mediasource/MediaSourceReader.cpp
> @@ +508,5 @@
> >      mAudioSourceDecoder = newDecoder;
> > +    if (usedFuzz) {
> > +      nsRefPtr<dom::TimeRanges> ranges = new dom::TimeRanges();
> > +      newDecoder->GetBuffered(ranges);
> > +      *aTarget = ranges->GetStartTime() * USECS_PER_S;
> 
> If we have overlapping decoders, then selecting a new one because of fuzz
> doesn't guarantee we're at the start.

in practice, it does.

Because otherwise we would have succeeded with the exact search.

A sub-decoder only ever has a single continuous range of data (in theory that could be different if the data provided to appendBuffer had a discontinuity).

So the only time an exact search would have failed, and a fuzz one succeeded is if the search value is just a tiny bit under the start time value.

> 
> I think we need SelectDecoder to also adjust aTarget, until we properly make
> sure we never have overlapping time ranges.

If we want to handle a decoder with a discontinuous range, it's more effort.

Alternatively, can add an extra check that we only adjust to aTarget to buffered.start if aTarget was inferior to aTarget..
I think this is the better approach, as the demuxer will handle the discontinuity otherwise.
(In reply to Jean-Yves Avenard [:jya] from comment #24)
> 
> in practice, it does.
> 
> Because otherwise we would have succeeded with the exact search.
> 
> A sub-decoder only ever has a single continuous range of data (in theory
> that could be different if the data provided to appendBuffer had a
> discontinuity).
> 
> So the only time an exact search would have failed, and a fuzz one succeeded
> is if the search value is just a tiny bit under the start time value.
> 

Ok, please add a comment explaining that.
Attachment #8560431 - Flags: review?(matt.woodrow) → review+
Attached patch Disable MP4 audio bitrate test (obsolete) — Splinter Review
Disable audio bitrate test. It was failing purely due to luck
Disable audio bitrate test. It was failing purely due to luck. Add bug link
Attachment #8561280 - Attachment is obsolete: true
Test disable pushed to aurora 37 as part of uplifting bug 1129732, which causes timeouts on linux.

https://hg.mozilla.org/releases/mozilla-aurora/rev/550f936c3293
Comment on attachment 8560863 [details] [diff] [review]
Part3. Make sure we stop dropping frames from previous operation after seeking

Requesting 37 uplift of parts 1-3.

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Playback stall with YouTube.
[Describe test coverage new/current, TreeHerder]: Landed on m-c.
[Risks and why]: Changes are MSE-specific so regression risk is low.
[String/UUID change made/needed]: None.
Attachment #8560863 - Flags: approval-mozilla-aurora?
Comment on attachment 8560431 [details] [diff] [review]
Part1. Don't attempt to seek on time found with fuzz search

As stated, this bug was pre-approved to land with a set of changes for MSE. Marking the approval after the fact.
Attachment #8560431 - Flags: approval-mozilla-aurora+
Attachment #8560861 - Flags: approval-mozilla-aurora+
Attachment #8560863 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.