Closed Bug 1268868 Opened 4 years ago Closed 4 years ago

Playback stalls on video generated by MSE inside buffered range

Categories

(Core :: Audio/Video: Playback, defect, P1)

46 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla49
Tracking Status
firefox46 --- wontfix
firefox47 --- verified
firefox48 --- verified
firefox49 --- verified

People

(Reporter: bikegriffith, Assigned: jya)

References

Details

Attachments

(3 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20160421124000

Steps to reproduce:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0

Load the hls.js demo page for a certain hls video:
http://dailymotion.github.io/hls.js/demo/?src=https%3A%2F%2Frecordings.boxcast.com%2Frecordings%2Fsfcbpomtdkwfbi6cofsl%2F34%2F37%2F480p.m3u8&enableStreaming=true&autoRecoverError=true&enableWorker=true&levelCapping=-1&defaultAudioCodec=undefined

Observe playback stalling at about 4 seconds.

(This sounds a lot like 1239178 but it's different because there's no fraglooploading error ... the buffer does properly fill across the expected continuum).


Actual results:

The playback stalls at 4 seconds into the video and never resumes. You have to seek several seconds past this point to get playback to resume.  This same video plays through without incident in Chrome.

No events are fired that would indicate a problem. Dumped the about:media data and saw nothing suspicious.

```
http://dailymotion.github.io/hls.js/demo/?src=https%3A%2F%2Frecordings.boxcast.com%2Frecordings%2Fsfcbpomtdkwfbi6cofsl%2F34%2F37%2F480p.m3u8&enableStreaming=true&autoRecoverError=true&enableWorker=true&levelCapping=-1&defaultAudioCodec=undefined
	mediasource:http://dailymotion.github.io/d2a6a4cf-26a2-6743-97ba-cbef1c7abf65
	currentTime: 4.13452
	Quality: 100% (total:128 dropped:0 corrupted:0)
	Buffered ranges: [(0.082625, 40.646666)]
		SourceBuffer 0
			start=0.082625 end=40.660479
		SourceBuffer 1
			start=0.066666 end=40.646666
	Internal Data:
	audio decoder: apple CoreMedia decoder
	audio frames decoded: 384
	video decoder: apple hardware VT decoder
	hardware video decoding: enabled
	video frames decoded: 121 (skipped:0)
	Dumping data for demuxer 19a353000:
		Dumping Audio Track Buffer(audio/mp4a-latm): - mLastAudioTime: 6.205291
			NumSamples:1902 Size:1101799 NextGetSampleIndex:287 NextInsertionIndex:1902
			Buffered: ranges=[(0.082625, 40.660479)]
		Dumping Video Track Buffer(video/avc) - mLastVideoTime: 4.165911
			NumSamples:1187 Size:5016389 NextGetSampleIndex:110 NextInsertionIndex:1187
			Buffered: ranges=[(0.066666, 40.646666)]
```


Expected results:

Video should play through to end, or at least give some warning/error that can help us understand what is unique about the buffered audio/video track that is causing issues.
Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
do you see this happening on windows too?
Flags: needinfo?(bikegriffith)
Assignee: nobody → jyavenard
Priority: -- → P1
Yes, confirmed that it stops at 4 seconds on Windows 10 / FF46 as well

Mozilla/5.0 (Windows NT 10.0; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
thank you for this stream, awesome.. we had been chasing an intermittent stall for month, and finally a way to reproduce it consistently...
[Tracking Requested - why for this release]: we have had reports of intermittent playback stall for a while, highly confident that the problem exhibited by this video is actually the problem. getting this in 47 is a must really.
Flags: needinfo?(bikegriffith)
actually, this is not the problem I thought it was (though i did notice a problem in how we can recover should data be evicted)

In this stream, we stall because the gap between two sample's dts is too big, and it's treated as a gap in the data.

132577 end:4165911, dts:3665911, duration:33334, kf:0)
[MediaPlayback #2]: D/MediaSourceSamples TrackBuffersManager(1436e8000:video/mp4)::ProcessFrames: Processing video/avc frame(pts:4165911 end:4199244, dts:4099244, duration:33333, kf:0)

so samples at index 110 has a dts of 4.099244s and the one before that is 3.665911s ; that's 0.433333s we do not jump over gap that are greater than 125ms.

The pts are correct, and as the buffered range are based on the pts it looks continuous, and this is why about:media doesn't show any abnormal.

It's poorly muxed.

Guillaume, is this something you could get around? like rewrite the dts so they are continuous without such abnormal gap?

I could probably ignore such a big gap if the pts are continuous (which here they are, there a no re-ordered frames).
Flags: needinfo?(g.du.pontavice)
oh, it's likely the same as hls.js issue #333
https://github.com/dailymotion/hls.js/issues/333
Thank you for analyzing this segment.  What is the possibility of Firefox giving some more useful troubleshooting information in cases like this, either through the about:media extension or by directly raising an error?
I tried with an previous nightly of 44, the issue was already present, after 4 sec, the playback stops.
(In reply to bikegriffith from comment #7)
> Thank you for analyzing this segment.  What is the possibility of Firefox
> giving some more useful troubleshooting information in cases like this,
> either through the about:media extension or by directly raising an error?

You can add logging on the console:
NSPR_LOG_MODULES=MediaSource:5,MediaSourceSamples:5

will show the details on every frames parsed.

Per MSE spec, such segment should leave gaps in the buffered range:
https://w3c.github.io/media-source/#sourcebuffer-coded-frame-processing

"If last decode timestamp for track buffer is set and the difference between decode timestamp and last decode timestamp is greater than 2 times last frame duration:

        If mode equals "segments":
            Set group end timestamp to presentation timestamp.
        If mode equals "sequence":
            Set group start timestamp equal to the group end timestamp.

        Unset the last decode timestamp on all track buffers.
        Unset the last frame duration on all track buffers.
        Unset the highest end timestamp on all track buffers.
        Set the need random access point flag on all track buffers to true.
        Jump to the Loop Top step above to restart processing of the current coded frame.

"

That means that all frames being added between the one with the dodgy dts and the next keyframe should be skipped and ignored.

The difference here between the two decode timestamps is 0.433333s.
The duration of the last frame was: 0.033334, twice that is 0.066668.

So all frames between 4.165911s 10.166622s should have not been added to the source buffer.

Unfortunately, I added an optimisation where we treat all samples within a MP4 moof as continuous and without gap.

Right now when adding the frames they are added anyway, regardless of the gap in decode time. But the player however that reads the frame do not skip over that gap.

I believe the hls.js has code to skip over the gaps it detects (from the buffered attribute), so we just need to properly report it.
It would still be a 6s data gap, that's a lot and if those frames with bad dts happen often, it will make the viewing experience painful.
Even after re-enabling the gap detection within a media segment (for mp4 it's a moof+mdat box) I still get the issue.

Now something that is interesting:
ProcessFrames: Processing video/avc frame(pts:3632577 end:3665911, dts:3565911, duration:33334, kf:0)
ProcessFrames: Processing video/avc frame(pts:3665911 end:4099244, dts:3599244, duration:433333, kf:0)
ProcessFrames: Processing video/avc frame(pts:4099244 end:4132577, dts:3632577, duration:33333, kf:1)
ProcessFrames: Processing video/avc frame(pts:4132577 end:4165911, dts:3665911, duration:33334, kf:0)
ProcessFrames: Processing video/avc frame(pts:4165911 end:4199244, dts:4099244, duration:33333, kf:0)
ProcessFrames: Processing video/avc frame(pts:4199244 end:4232533, dts:4132577, duration:33289, kf:0)

I have varied the algorithm from the MSE spec to allow a gap of not twice the last frame duration, but twice the longest frame duration seen in the moof.

frame(pts:3665911 end:4099244, dts:3599244, duration:433333, kf:0) has a duration of 0.433333s but looking at the next sample:
frame(pts:4099244 end:4132577, dts:3632577, duration:33333, kf:1)

So even though the duration of the frame is 433333us at a dts of 3599244us, then next dts is 3632577us, indicating a duration of 33333us only.

The dts in this file are definitely messed up. If we were to go by the spec, we should end up with the gap I mentioned earlier and we would stall anyway.

When we used exactly the spec and used the last duration (rather than the longest duration) it would also cause non fully proper file not to play (and unfortunately, those includes YouTube encoding) as sometimes we would have frames at the start that would be only 1 or 2us long but the gap between the dts was > 2 it would gap at the start causing a video to not start.

Maybe I could narrow the gap detection to only use the longest duration in the GOP rather than the whole media segment.
It had been added to improve speed but it means that we don't properly handle invalid media segment with discontinuities.

Review commit: https://reviewboard.mozilla.org/r/49881/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49881/
Attachment #8747405 - Flags: review?(gsquelart)
Attachment #8747406 - Flags: review?(gsquelart)
Attachment #8747407 - Flags: review?(gsquelart)
This makes us closer to the spec, while still allowing some leeway in gap detection which was found to too strict in the past.

Review commit: https://reviewboard.mozilla.org/r/49883/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49883/
If the last frames of a media segment were evicted due to gap detection, mLongestFrameDuration would have been reset.

Additionally, simplify the code by using temporary variables.

Review commit: https://reviewboard.mozilla.org/r/49885/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49885/
Attachment #8747405 - Flags: review?(gsquelart) → review+
Comment on attachment 8747405 [details]
MozReview Request: Bug 1268868: [MSE] P1. Re-enable gap detection within a media segment. r?gerald

https://reviewboard.mozilla.org/r/49881/#review46615
Attachment #8747406 - Flags: review?(gsquelart)
Comment on attachment 8747406 [details]
MozReview Request: Bug 1268868: [MSE] P2. Reset longest duration after keyframe is seen. r?gerald

https://reviewboard.mozilla.org/r/49883/#review46617

r+ with nits:

In commit description, 'to too strict' -> 'to be too strict' (add missing 'be').

::: dom/media/mediasource/TrackBuffersManager.cpp:1550
(Diff revision 1)
> -           ? trackBuffer.mLastFrameDuration.ref()
> +             ? trackBuffer.mLastFrameDuration.ref()
> +             : sample->mKeyframe
> +                 ? trackBuffer.mLastFrameDuration.ref()
> -           : std::max(trackBuffer.mLastFrameDuration.ref(),
> +                 : std::max(trackBuffer.mLastFrameDuration.ref(),
> -                      trackBuffer.mLongestFrameDuration.ref()));
> +                            trackBuffer.mLongestFrameDuration.ref()));

'?' and ':' should be left-aligned with the start of the test-expression above (as they were before).
https://reviewboard.mozilla.org/r/49883/#review46619

::: dom/media/mediasource/TrackBuffersManager.cpp:1550
(Diff revision 1)
> -           ? trackBuffer.mLastFrameDuration.ref()
> +             ? trackBuffer.mLastFrameDuration.ref()
> +             : sample->mKeyframe
> +                 ? trackBuffer.mLastFrameDuration.ref()
> -           : std::max(trackBuffer.mLastFrameDuration.ref(),
> +                 : std::max(trackBuffer.mLastFrameDuration.ref(),
> -                      trackBuffer.mLongestFrameDuration.ref()));
> +                            trackBuffer.mLongestFrameDuration.ref()));

Why?

I thought that having a 2 spaces indentation whenever a new line start was more appropriate.
https://reviewboard.mozilla.org/r/49883/#review46619

> Why?
> 
> I thought that having a 2 spaces indentation whenever a new line start was more appropriate.

I don't make the rules, I only enforce them! :-)
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Operators
Comment on attachment 8747406 [details]
MozReview Request: Bug 1268868: [MSE] P2. Reset longest duration after keyframe is seen. r?gerald

https://reviewboard.mozilla.org/r/49883/#review46625

(forgot to actually r+ before)
Attachment #8747406 - Flags: review+
Comment on attachment 8747407 [details]
MozReview Request: Bug 1268868: [MSE] P3. Prevent crash should gap be detected in content. r?gerald

https://reviewboard.mozilla.org/r/49885/#review46621

Usual r+ with nit:

::: dom/media/mediasource/TrackBuffersManager.cpp:1466
(Diff revision 1)
>      // OR
>      // If last decode timestamp for track buffer is set and the difference between decode timestamp and last decode timestamp is greater than 2 times last frame duration:
>  
>      if (trackBuffer.mLastDecodeTimestamp.isSome() &&
>          (decodeTimestamp < trackBuffer.mLastDecodeTimestamp.ref() ||
> -         decodeTimestamp - trackBuffer.mLastDecodeTimestamp.ref() > 2*trackBuffer.mLongestFrameDuration.ref())) {
> +         decodeTimestamp - trackBuffer.mLastDecodeTimestamp.ref() > 2*trackBuffer.mLongestFrameDuration)) {

Line (was and) is too long, please split at '>'. And while there, add spaces around multiplication operator.
Attachment #8747407 - Flags: review?(gsquelart) → review+
Comment on attachment 8747405 [details]
MozReview Request: Bug 1268868: [MSE] P1. Re-enable gap detection within a media segment. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49881/diff/1-2/
Comment on attachment 8747406 [details]
MozReview Request: Bug 1268868: [MSE] P2. Reset longest duration after keyframe is seen. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49883/diff/1-2/
Comment on attachment 8747407 [details]
MozReview Request: Bug 1268868: [MSE] P3. Prevent crash should gap be detected in content. r?gerald

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49885/diff/1-2/
https://hg.mozilla.org/mozilla-central/rev/d14dc334734a
https://hg.mozilla.org/mozilla-central/rev/83bcf409c4ed
https://hg.mozilla.org/mozilla-central/rev/697ce34a416e
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
See Also: → 1269325
What's the context for the backout mentioned in comment 25?  Does this bug need to be reopened?  (And if not, is there a followup tracking the re-landing of the backed-out bit, if it needs re-landing?)
Flags: needinfo?(jyavenard)
(In reply to Daniel Holbert [:dholbert] from comment #27)
> What's the context for the backout mentioned in comment 25?  Does this bug
> need to be reopened?  (And if not, is there a followup tracking the
> re-landing of the backed-out bit, if it needs re-landing?)

The backout was one way to fix this particular bug, but happened to be incomplete and didn't fix other variations seen. Bug 1269325 has the "better" fix and fix it for all cases.

The remaining two commits that I found happened to be bugs that did exist and that I found while flying over the code.
Flags: needinfo?(jyavenard)
Comment on attachment 8747406 [details]
MozReview Request: Bug 1268868: [MSE] P2. Reset longest duration after keyframe is seen. r?gerald

Uplift request is for P2 and P3 only.

Approval Request Comment
[Feature/regressing bug #]: 1268868
[User impact if declined]: Crash with some user data. Seen with YouTube MSE conformance test. So the crash can happen with YouTube and I believe one of the reason we have had some unexplainable crashes with youtube.
[Describe test coverage new/current, TreeHerder]: In central, lots of mochitest
[Risks and why]: Can't think of any, it's a fix
[String/UUID change made/needed]: None
Attachment #8747406 - Flags: approval-mozilla-beta?
Attachment #8747406 - Flags: approval-mozilla-aurora?
Hello bikegriffith, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(bikegriffith)
Comment on attachment 8747406 [details]
MozReview Request: Bug 1268868: [MSE] P2. Reset longest duration after keyframe is seen. r?gerald

Let's uplift to Aurora48 and stabilize before considering uplifting to Beta47.
Attachment #8747406 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Ritu Kothari (:ritu) from comment #30)
> Hello bikegriffith, could you please verify this issue is fixed as expected
> on a latest Nightly build? Thanks!

Confirmed fixed as expected in 49.0a1 (2016-05-11)

Thank you!
Flags: needinfo?(bikegriffith)
(In reply to bikegriffith from comment #32)
> (In reply to Ritu Kothari (:ritu) from comment #30)
> > Hello bikegriffith, could you please verify this issue is fixed as expected
> > on a latest Nightly build? Thanks!
> 
> Confirmed fixed as expected in 49.0a1 (2016-05-11)
> 
> Thank you!

Awesome! Thanks a lot for a prompt verification. :)
Status: RESOLVED → VERIFIED
Comment on attachment 8747406 [details]
MozReview Request: Bug 1268868: [MSE] P2. Reset longest duration after keyframe is seen. r?gerald

Fix was verified on Nightly, Beta47+
Attachment #8747406 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Blocks: 1272916
Part 3 isn't applying cleanly on beta. Can we see if it needs rebasing?
Flags: needinfo?(jyavenard)
Flags: needinfo?(jyavenard)
Flags: needinfo?(g.du.pontavice)
Flags: needinfo?
Flagging this for verification on 47.0b7.
Flags: qe-verify+
Reproduced with Nightly 48.0a1 from 2016-04-18.
Verified fixed with Firefox 47 beta 7 (Build ID: 20160518173344) and latest Aurora 48.0a2, across platforms [1].

[1] Windows 10 64-bit, Mac OS X 10.11.1 and Ubuntu 13.10 64-bit
No longer blocks: 1278005
Depends on: 1278005
You need to log in before you can comment on or make changes to this bug.