Closed Bug 1345898 Opened 5 years ago Closed 5 years ago

Intermittent dom/media/mediasource/test/test_ChangeWhileWaitingOnMissingData_mp4.html | Test timed out.


(Core :: Audio/Video: Playback, defect)

Not set



Tracking Status
firefox53 --- wontfix
firefox54 + wontfix
firefox55 --- fixed


(Reporter: intermittent-bug-filer, Assigned: jya)



(Keywords: intermittent-failure)


(2 files, 1 obsolete file)

that intermittent only occurred because bug 1345756 happened to have been committed after bug 1342913 (reviewboard didn't commit the first due to rebasing issue)
Closed: 5 years ago
Resolution: --- → INVALID
Those intermittents on mac are only due to bug 1345756 not being uplifted (even though the mochitest depending on it was)
Starting to wonder if this isn't an issue related to the very old Libav used by try machines...

[task 2017-05-26T22:12:50.451347Z] 22:12:50     INFO - GECKO(1487) | Decoder=7f6c82a98800 MediaDecoder State: channels=2 rate=22050 hasAudio=1 hasVideo=1 mPlayState=PLAYING mdsm=7f6c84c90000
[task 2017-05-26T22:12:50.453142Z] 22:12:50     INFO - GECKO(1487) | reader data:
[task 2017-05-26T22:12:50.455348Z] 22:12:50     INFO - GECKO(1487) | Audio Decoder: ffmpeg audio decoder
[task 2017-05-26T22:12:50.457812Z] 22:12:50     INFO - GECKO(1487) | Audio Frames Decoded: 40
[task 2017-05-26T22:12:50.460169Z] 22:12:50     INFO - GECKO(1487) | Audio State: ni=0 no=0 wp=0 demuxr=0 demuxq=0 decoder=0 tt=-1.0 tths=-1 in=17 out=17 qs=0 pending=0 wfd=0 wfk=0 sid=8
[task 2017-05-26T22:12:50.462226Z] 22:12:50     INFO - GECKO(1487) | Video Decoder: ffmpeg video decoder
[task 2017-05-26T22:12:50.470215Z] 22:12:50     INFO - GECKO(1487) | Hardware Video Decoding: disabled
[task 2017-05-26T22:12:50.472945Z] 22:12:50     INFO - GECKO(1487) | Video Frames Decoded: 30 (skipped=0)
[task 2017-05-26T22:12:50.474444Z] 22:12:50     INFO - GECKO(1487) | Video State: ni=0 no=0 wp=1 demuxr=0 demuxq=1 decoder=0 tt=-1.0 tths=-1 in=24 out=24 qs=1 pending:1 wfd=1 wfk=0 sid=7
[task 2017-05-26T22:12:50.476445Z] 22:12:50     INFO - GECKO(1487) | Dumping Data for Demuxer: 7f6c82aa3800
[task 2017-05-26T22:12:50.478780Z] 22:12:50     INFO - GECKO(1487) | 	Dumping Audio Track Buffer(audio/mp4a-latm): mLastAudioTime=2.414875
[task 2017-05-26T22:12:50.480799Z] 22:12:50     INFO - GECKO(1487) | 		Audio Track Buffer Details: NumSamples=52 Size=17062 Evictable=16754 NextGetSampleIndex=52 NextInsertionIndex=52
[task 2017-05-26T22:12:50.482410Z] 22:12:50     INFO - GECKO(1487) | 		Audio Track Buffered: ranges=[(0.000000, 2.414875)]
[task 2017-05-26T22:12:50.484305Z] 22:12:50     INFO - GECKO(1487) | 	Dumping Video Track Buffer(video/avc): mLastVideoTime=1.696666
[task 2017-05-26T22:12:50.485348Z] 22:12:50     INFO - GECKO(1487) | 		Video Track Buffer Details: NumSamples=72 Size=89272 Evictable=0 NextGetSampleIndex=4294967295 NextInsertionIndex=72
[task 2017-05-26T22:12:50.488807Z] 22:12:50     INFO - GECKO(1487) | 		Video Track Buffered: ranges=[(0.095000, 2.498333)]
[task 2017-05-26T22:12:50.489470Z] 22:12:50     INFO - GECKO(1487) | Decoder=7f6c82a98800 MediaDecoderStateMachine State: GetMediaTime=1666325 GetClock=1666734 mMediaSink=7f6c82ae8df0 state=BUFFERING mPlayState=3 mSentFirstFrameLoadedEvent=1 IsPlaying=0 mAudioStatus=idle mVideoStatus=waiting mDecodedAudioEndTime=2414874 mDecodedVideoEndTime=1663333mAudioCompleted=0 mVideoCompleted=0

We see for the video track:
ni=0 no=0 wp=1 demuxr=0 demuxq=1 decoder=0 tt=-1.0 tths=-1 in=24 out=24 qs=1 pending:1 wfd=1 wfk=0 sid=7

so we have a waiting promise, we have num input == num sample
but the demuxed queue is 1. It seems that the decoder really returned more output than input.

As the demuxed queue is 1, we won't attempt to demux a new sample and as such, the waiting promise will never be resolved. hence the stall..
I think I've figured it out.

This test add data from stream 1 and as soon as we have a stall, it adds data from stream 2.

So we have in the sourcebuffer.
0a 1a 2a 3a 4a (0a is IDR)

When we attempt to demux past 4a, we drain the decoder. 
The last frame returned after draining the decoder is 4a. So we call InternalSeek(4)

right at that time, the sourcebuffer content is replaced with:
0b 2b 1b 4b 3b (0b IDR, 2b 4b B-Frames)

This change occurs right as we had previously demuxed frame 4a (from the internal seek). The next frame demuxed is 3b

Due to the streamID change, on a non-keyframe we drain the decoder once again. But as there's no decodepromise pending, the decoded output isn't dropped, which prevent draining to complete.
This is similar to bug 1342913, another case where draining doesn't complete for a very similar reason, yet happening slightly before the one described in bug 1342913.
Assignee: nobody → jyavenard
See Also: → 1342913

I'm a bit cautious with P3, as, as I understand it, on Android calling Flush() will invalidate all decoded frames previously returned even when they haven't been painted yet.
Mind you, if this patch could cause problem, then things are currently broken anyway.

The issue at hand, is when we encounter a gap in the data. We then drain the decoder to make sure you return all frames up to the gap.

We then seek the decoder to the last position decoded, just so that we can resume the decoding once more data is appended to the sourcebuffer.
If Flush() was to invalidate all frames, then things would be broken, as the first thing occurring when seeking is to call Flush(), it's likely not all the frames returned during the previous drain have been painted yet....

Now the peculiar situation occurs should, when performing this internal seek, we detect a change of resolution (this can happen when the previous data is overwritten with new ones).
Previously, nothing would be done, we would drain the decoder once again and get new data .
The consequences to this as that we get those frames twice in the decoded queue (the frames from the first resolution followed by the new ones).

P3 as such, Flush the decoder and any pending decoded frames (not yet returned to the MDSM) so that we don't get duplicated frames.

If calling Flush() does invalidate all returned frames so far, then doing P3 isn't any worse than what is being done.. But it's something we will want to fix anyway.
Flags: needinfo?(jolin)
asking to track because it's yet another reason we have intermittent stalls in YouTube and bug 1342913 (the original problem) was tracked.
These changes are another fix for 1342913.
Component: DOM → Audio/Video: Playback
Comment on attachment 8872468 [details]
Bug 1345898: P1. Add debugging info.
Attachment #8872468 - Flags: review?(gsquelart) → review+
Comment on attachment 8872469 [details]
Bug 1345898: P2. Flush rather than drain decoder.
Attachment #8872469 - Flags: review?(gsquelart) → review+
Attachment #8872470 - Attachment is obsolete: true
Attachment #8872470 - Flags: review?(jolin)
Pushed by
P1. Add debugging info. r=gerald
P2. Flush rather than drain decoder. r=gerald
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8872469 [details]
Bug 1345898: P2. Flush rather than drain decoder.

Approval Request Comment
[Feature/Bug causing the regression]: MSE
[User impact if declined]: Intermittent stalls with video sites using MSE (includes YouTube)
[Is this code covered by automated tests?]: yes.
[Has the fix been verified in Nightly?]: There's been no intermittent since the changes went in.
[Needs manual test from QE? If yes, steps to reproduce]: I reproduced it by running media try, and repeating 100 times, the stall would usually happen 2 or 3 times
[List of other uplifts needed for the feature/fix]: bug 1368662 (which is to go after these ones)
[Is the change risky?]: no
[Why is the change risky/not risky?]: We ignore frames that would have been ignored anyway (as they are to be dropped). It is possible that it introduces an extremely small video glitch, but those would be barely noticeable and would have resulted in a stall anyway.
[String changes made/needed]: none
Flags: needinfo?(jolin)
Attachment #8872469 - Flags: approval-mozilla-beta?
Comment on attachment 8872469 [details]
Bug 1345898: P2. Flush rather than drain decoder.

Since it's not a new regression in 54 and it's very late in Beta54. Let this ride the train and won't fix in 54. Beta54-.
Attachment #8872469 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
See Also: → 1360832
You need to log in before you can comment on or make changes to this bug.