Closed Bug 1163227 Opened 9 years ago Closed 9 years ago

Unable to play back YouTube video - continually polls/shows AJAX spinner

Categories

(Core :: Audio/Video, defect)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla41
Tracking Status
firefox40 --- affected
firefox41 --- fixed

People

(Reporter: stephend, Assigned: jya)

References

(Blocks 1 open bug, )

Details

Attachments

(12 files, 5 obsolete files)

828.02 KB, text/plain
Details
5.76 MB, application/x-bzip
Details
2.08 KB, patch
Details | Diff | Splinter Review
5.99 KB, patch
Details | Diff | Splinter Review
3.93 KB, patch
Details | Diff | Splinter Review
1.13 KB, patch
Details | Diff | Splinter Review
6.18 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
1.21 KB, patch
ajones
: review+
Details | Diff | Splinter Review
1.57 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
4.52 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
4.10 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
12.85 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
Attached file MediaLog.txt
Build ID: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:40.0) Gecko/20100101 Firefox/40.0

GPU info:
* Nvidia Quadro K600
* CPU: Inter Core i7-4470 CPU @ 3.40GHZ
* RAM: 32GB
* Driver version: 347.52

STR:

1. Have previously loaded https://www.youtube.com/watch?v=jEnd8JIMii4 in the browser (so you can hopefully reproduce my setup in this problem)
2. Set https://www.youtube.com/watch?v=jEnd8JIMii4 as your homepage, in Nightly Firefox on Windows
3. Wait for the video to render

Actual Results:

A ton of GETs fire (visible via the Network panel in the Web Console/Dev Tools), but the video -- at least in my case -- never resumes playing

Screencast showing the problem (please note; it doesn't, unfortunately, correspond to the logfile, but does demonstrate the behavior I see consistently, today): http://screencast.com/t/x8kHX626a

BTW, I've also filed bug 1152580 from the same machine, and Firefox Nightly (though a different -- earlier -- build).
Apologies in advance, Anthony -- was told in #media to ni? you :-)
Flags: needinfo?(ajones)
(I should also mention that I had e10s *disabled* when I experienced this bug and made this log/screencast.)
How easy is it to reproduce?
Blocks: MSE
Can you please try get a log with "MediaPromise:5" included in the logging?

It looks like we're getting stuck waiting for a seek to complete, but it's not really clear why.
Flags: needinfo?(ajones)
Attached file MediaLog.txt.bz2
Here's a logfile from playing two 4K YouTube videos, one of which started "Experiencing Interruptions?" and then remained stuck in the AJAX-loading/spinning state - hopefully that's captured in this log.
Matt/Anthony; mind taking a look at the new logfile I've attached, with MediaPromise:5? Thx!
Flags: needinfo?(matt.woodrow)
This log seems very different to the previous one, I can't find the same stall in it.

Can you please try reproduce again with only one video playing?
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> This log seems very different to the previous one, I can't find the same
> stall in it.
> 
> Can you please try reproduce again with only one video playing?

I caught this again, albeit this time with a tab of https://www.youtube.com/watch?v=jEnd8JIMii4 I detached from my main browser window (which also had GitHub.com loaded in a separate tab).

The video playback stopped twice: 1) 4:04 and 2) 14:15

Here's the corresponding screencast which unfortunately only shows from the 1st stop at 4:04 onward: http://www.screencast.com/t/JvqHxiSHT
Corresponding logfile with NSPR_LOG_MODULES=MediaDecoder:5,MediaSource:5,nsMediaElement:5,nsMediaElementEvents:5,mediaPromise:5 all set: https://app.box.com/s/7h8fymnyv9hthghdymjmg3kw740wx5i9
(In reply to Stephen Donner [:stephend] from comment #8)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> > This log seems very different to the previous one, I can't find the same
> > stall in it.
> > 
> > Can you please try reproduce again with only one video playing?
> 
> I caught this again, albeit this time with a tab of
> https://www.youtube.com/watch?v=jEnd8JIMii4 I detached from my main browser
> window (which also had GitHub.com loaded in a separate tab).
> 
> The video playback stopped twice: 1) 4:04 and 2) 14:15
> 
> Here's the corresponding screencast which unfortunately only shows from the
> 1st stop at 4:04 onward: http://www.screencast.com/t/JvqHxiSHT

How long does playback stop for?

At 4:04 I can see that you ran out of decoded video frames and it paused briefly to catch up, shouldn't have stopped for long though.

The log doesn't appear to cover 14:15.
(In reply to Matt Woodrow (:mattwoodrow) from comment #10)

<snip>

> How long does playback stop for?
> 

Sorry; I can't recollect - is there a way to get timestamps in the logs?

> At 4:04 I can see that you ran out of decoded video frames and it paused
> briefly to catch up, shouldn't have stopped for long though.
> 
> The log doesn't appear to cover 14:15.

If you just click on https://app.box.com/s/7h8fymnyv9hthghdymjmg3kw740wx5i9 you get a preview -- downloading, I get the full file, ending in:

...
9412[225a5400]: Resolving Then() call made from mozilla::MediaDecoderStateMachine::FinishShutdown [Runnable=8f83b40, Promise=8f78200, ThenValue=a0dd6c0]
9412[225a5400]: MediaPromise::~MediaPromise [this=8f78200]
0[80e800]: ResolveOrRejectRunnable::Run() [this=8f83b40]

Does that help?  I'll try to get a more concerted effort again, tomorrow (it's my work machine, in Mountain View).
With Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0 https://hg.mozilla.org/mozilla-central/rev/4fb7ff694bf5, I hit a case where it was AJAX-spinning for at least 3 solid minutes, before I shut down Nightly, to report back here.  I've captured that in a logfile and screencast, too:

1) logfile: https://app.box.com/s/5mrlrljd69otcu07fem3vojh3oqnet76
2) screencast: https://app.box.com/s/kvgz9ycvymkv7r726kqv39qavm41vh0i
Interesting piece of the log: https://pastebin.mozilla.org/8834032

Looks like the format change caused youtube to evict all existing data (line 3), and we update the buffered ranges reported to MediaSourceReader (lines 34-37).

The next audio decode fails (line 76), but the buffered ranges we see in SelectDecoder still show the old ranges, so WaitForData returns immediately with haveAudio=1 (line 89) and we try decoding audio again.

This repeats over and over again without changing the media element state, since we always think we have data.

I think this is because MediaFormatReader using cached ranges, and doesn't appear to update them when we get an eviction.

MediaFormatReader only landed recently though, so this looks like a different issue to what was originally reported :(
Notify MediaSourceDecoder/Reader that data was removed/evicted.
Attachment #8607963 - Flags: review?(matt.woodrow)
Assignee: nobody → jyavenard
Status: NEW → ASSIGNED
Add MediaDecoderReader::NotifyDataRemoved() method. This allows for the MediaSource's TrackBuffer to notify the sub reader that it's just finished butchering the data. It wasn't required with MP4Reader because calling MP4Reader::GetBuffered() always force to recalculate the index. I intend to scrap this once we get rid of MediaSourceReader and the current TrackBuffer
Attachment #8607968 - Flags: review?(cpearce)
Have the TrackBuffer tells the sub-readers when data were evicted
Attachment #8607975 - Flags: review?(matt.woodrow)
This is the actual cause of the stall when using the MediaFormatReader. The MP4 Index returns UINT64_MAX when it can't evict... silly bugger
Attachment #8607976 - Flags: review?(matt.woodrow)
I believe part1 could fix the problem in version 40 (and maybe earlier).

The rest of the patches are only relevant with the MediaFormatReader which is very conservative in refreshing its cached buffered TimeRanges
may have to uplift part1 if found to be the cause.
Attachment #8607968 - Flags: review?(cpearce) → review+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=547323a7bdcd

All green. Doesn't mean much as it was all green before :)
After about 2.5 hours playback, found playback stalled.
https://www.youtube.com/watch?v=_Q5pWdLAaLg
	mediasource:https://www.youtube.com/9de01f04-e056-cd4b-8d1b-d690750b805c
	currentTime: 4.87619
		SourceBuffer 0
			start=0 end=4.875895
		SourceBuffer 1
			start=0 end=5.005
	Internal Data:
	Dumping data for reader 1435ee800:
		Dumping Audio Track Decoders: - mLastAudioTime: 4.875894
			Reader 0: 14975d800 ranges=[(0.000000, 4.875895)] active=true size=81283
		Dumping Video Track Decoders - mLastVideoTime: 4.337666
			Reader 0: 14976f800 ranges=[(0.000000, 5.005000)] active=true size=1382492

Seems like a rounding issue, which is ultra-surprising since the move to TimeUnit/TimeIntervals are there should be none.

Most likely is the use of fractionals in the MP4 container.
https://www.youtube.com/watch?v=_Q5pWdLAaLg
	mediasource:https://www.youtube.com/eea01ffe-23df-bb4a-8d21-da5023a9138e
	currentTime: 28.869523
		SourceBuffer 0
			start=0 end=28.722539
		SourceBuffer 1
	Internal Data:
	Dumping data for reader 1483c9000:
		Dumping Audio Track Decoders: - mLastAudioTime: 28.722538
			Reader 1: 159bd8800 ranges=[(0.000000, 28.722539)] active=true size=462037
			Reader 0: 13a1df000 ranges=[] active=false size=0
		Dumping Video Track Decoders - mLastVideoTime: 24.958266
			Reader 0: 128872000 ranges=[] active=true size=43294223

And another case with a 43MB sourcebuffer, yet empty :(
Attachment #8607963 - Flags: review?(matt.woodrow) → review+
Attachment #8607975 - Flags: review?(matt.woodrow) → review+
Attachment #8607976 - Flags: review?(matt.woodrow) → review+
Attachment #8607963 - Attachment is obsolete: true
Attachment #8607968 - Attachment is obsolete: true
Attachment #8607975 - Attachment is obsolete: true
Attachment #8607976 - Attachment is obsolete: true
Always ensure MediaSourceReader is notified that new data has arrived (or be removed) on its own task queue. There could be a race where we call the sub-reader's GetBuffered() leading to a discrepancy between what playback expects to find and what's really there. This caused to never come out of WAITING_FOR_DATA mode.
Attachment #8609379 - Flags: review?(matt.woodrow)
*never* do blocking reads with MSE. From time to time we have a rounding issue (usually by one microsecond) between the time of the last sample and the moof's end time. This caused the MediaSourceReader attempt to decode one more sample that we don't have, entering a blocking read. We ensure that the sample is within the resource's content, or report EOS otherwise.
Attachment #8609381 - Flags: review?(ajones)
Little simplification
Attachment #8609382 - Flags: review?(ajones)
Attachment #8609381 - Attachment is obsolete: true
Attachment #8609381 - Flags: review?(ajones)
Another interesting stall:
HTMLMediaElement debug data

https://www.youtube.com/watch?v=jEnd8JIMii4
	mediasource:https://www.youtube.com/ca06e66d-cdd0-814f-8403-0f599e77bc94
	currentTime: 153.599995
		SourceBuffer 0
			start=10.0078 end=175.054943
		SourceBuffer 1
			start=150.15 end=175.175
	Internal Data:
	Dumping data for reader 11b54d800:
		Dumping Audio Track Decoders: - mLastAudioTime: 154.621382
			Reader 2: 12ad42000 ranges=[(150.000907, 175.054943)] active=true size=404131
			Reader 1: 13620a000 ranges=[(10.007800, 150.000317)] active=false size=2249647
			Reader 0: 13a5ec000 ranges=[] active=false size=0
		Dumping Video Track Decoders - mLastVideoTime: 152.986166
			Reader 0: 132b83000 ranges=[(150.150000, 175.175000)] active=true size=75663176

We're stuck trying to seek to 152.986166 to get a new sample and this makes us jump to audio reader 1, which then fail with EOS, and MediaSourceReader retries indefinitely. Actually, I believe this is the original bug reported as the behaviour is exactly as described in the first STR.
Blocks: 1166836
Fix sourcebuffer eviction, we were evicting the entire sourcebuffer, another microseconds / seconds issue.
Attachment #8609703 - Flags: review?(matt.woodrow)
Do not reset the demuxer before skipping to the demuxer.
Attachment #8609722 - Flags: review?(cpearce)
Properly recalculate next keyframe time in MP4Demuxer after either a reset or skipping to next key frame. This would otherwise trigger the skip-to-nextkeyframe algorithm unecessarily.
Attachment #8609723 - Flags: review?(cpearce)
Had forgotten that one. This is MediaFormatReader with logs, made it easier to debug what was going on. Simplify ::Update(), as we don't need the monitor anymore, NeedInput() didn't have to be where it was (remnant of MP4Reader.
Attachment #8609739 - Flags: review?(cpearce)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62ec2a54c4a6

super consistent on all version of mac and windows, still old intermittent with bug 1138555 on linux
Attachment #8609379 - Flags: review?(matt.woodrow) → review+
Attachment #8609703 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8609382 [details] [diff] [review]
Part7. Never do blocking read if we don't have data

Review of attachment 8609382 [details] [diff] [review]:
-----------------------------------------------------------------

How is the resource read triggered in the non-MSE case?

Did you consider either of the options below?

1. Add SampleIterator::IsNextSampleBuffered()
2. Make SampleIterator::GetNext() return something that contains the meta-data but instead of providing the data, implements GetDataPromise() and GetDataIfBuffered() methods.
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36)
> Comment on attachment 8609382 [details] [diff] [review]
> Part7. Never do blocking read if we don't have data
> 
> Review of attachment 8609382 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> How is the resource read triggered in the non-MSE case?
> 
> Did you consider either of the options below?
> 
> 1. Add SampleIterator::IsNextSampleBuffered()
> 2. Make SampleIterator::GetNext() return something that contains the
> meta-data but instead of providing the data, implements GetDataPromise() and
> GetDataIfBuffered() methods.

this is a big change compare to what we have. I can't foresee the impact changing this API. It would also render the demuxer incompatible with the existing MP4Reader (which is still there).

Returning nullptr is what we do when we reach an EOS in other cases (such as parsing the moof or the moov). 

The logic I added there is the same as we do in the moofparser:
We check that the data is contained in the current range. There are three cases to consider:
1- MSE, we know the exact buffered range. Here Length() would be the end of the buffered data. We don't want to block, ever (this would lead to stall (MediaFormatReader) or deadlocks (MP4Reader))
2- Plain MP4 from a server reporting the size. Length() returned would be the end of the data reported by the server. Read can be blocking while the resource retrieves the value
3- Plain MP4 from server not reporting the size. Length() will be returned as INT64_MAX, so we block while the resource retrieves the value.
For case 2 and 3, a EOS will be returned after should the data not be present as size read != size expected.

The patch handle all 3 conditions (as it does when reading the moof or any of the incomple boxes)
Attachment #8609722 - Flags: review?(cpearce) → review+
Attachment #8609723 - Flags: review?(cpearce) → review+
Comment on attachment 8609739 [details] [diff] [review]
Part5: Increase verbosity content for debugging purposes

Review of attachment 8609739 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaFormatReader.cpp
@@ +722,5 @@
>    // which overrides our "few more samples" threshold.
>    return
>      !aDecoder.mError &&
>      aDecoder.HasPromise() &&
> +    !aDecoder.mDemuxRequest.Exists() &&

This looks like a behaviour change, rather than a logging change, did you mean to include it in this patch? There's a few chunks like that below.

@@ +997,5 @@
>  MediaFormatReader::ResetDecode()
>  {
>    MOZ_ASSERT(OnTaskQueue());
>  
> +  LOGV("");

Did you mean to log a blank line here, or should you be logging "MediaFormatReader::ResetDecode()" here instead?

@@ +1034,5 @@
>      return;
>    }
>  
>    RefPtr<nsIRunnable> task =
> +    NS_NewRunnableMethodWithArgs<TrackType, MediaData*>(

Here you're dropping the refcounting on MediaData. That means the MediaData is not stored refcounted, and so it could end up being destroyed before you're done using it. You should put that back!
Attachment #8609739 - Flags: review?(cpearce) → review+
(In reply to Chris Pearce (:cpearce) from comment #39)

> ::: dom/media/MediaFormatReader.cpp
> @@ +722,5 @@
> >    // which overrides our "few more samples" threshold.
> >    return
> >      !aDecoder.mError &&
> >      aDecoder.HasPromise() &&
> > +    !aDecoder.mDemuxRequest.Exists() &&
> 
> This looks like a behaviour change, rather than a logging change, did you
> mean to include it in this patch? There's a few chunks like that below.

It's not a behaviour change. We were checking if mDemuxRequest existed later after calling NeedInput() (in RequestDemuxSamples()) ; so it was cleaner to do so in NeedInput()

NeedInput() taking the meaning: do we need to demux a new frame or feed data to the decoder.

> 
> @@ +997,5 @@
> >  MediaFormatReader::ResetDecode()
> >  {
> >    MOZ_ASSERT(OnTaskQueue());
> >  
> > +  LOGV("");
> 
> Did you mean to log a blank line here, or should you be logging
> "MediaFormatReader::ResetDecode()" here instead?

the macro already adds the name of the function by default.

> 
> @@ +1034,5 @@
> >      return;
> >    }
> >  
> >    RefPtr<nsIRunnable> task =
> > +    NS_NewRunnableMethodWithArgs<TrackType, MediaData*>(
> 
> Here you're dropping the refcounting on MediaData. That means the MediaData
> is not stored refcounted, and so it could end up being destroyed before
> you're done using it. You should put that back!

no no.. That's what bug 1153295 was about. MediaData being refcounted (and as such having a Release/AddRef method, NS_NewRunnableMethodWithArgs will automatically increase the refcount for those.
Attachment #8609382 - Flags: review?(ajones) → review+
Sorry for the delay in verifying this, and thanks for all the hard work + patience!

Verified FIXED using Build identifier: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:41.0) Gecko/20100101 Firefox/41.0
Status: RESOLVED → VERIFIED
Depends on: 1196696
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: