'waiting' event not raised when MSE video playback reaches buffer with no data for time

RESOLVED FIXED in mozilla36

Status

()

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: cajbir, Assigned: bholley)

Tracking

(Blocks: 1 bug)

unspecified
mozilla36
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

4 years ago
Created attachment 8484731 [details]
test_eos.html

Test demonstrates the issue. It adds data from a webm file with a time section missing. Playback starts and when that time section is reached it is expected that a "waiting" event occurs. The missing section is then added and playback should continue.

Test times out due to no waiting event. 

This was encountered while writing tests to test EOS of decoders in bug 1062055.

Relevant spec section is: http://www.w3.org/TR/media-source/#buffer-monitoring

"If buffered for all objects in activeSourceBuffers do not contain TimeRanges for the current playback position:
* Set the HTMLMediaElement.readyState attribute to HAVE_METADATA.
* If this is the first transition to HAVE_METADATA, then queue a task to fire a simple event named loadedmetadata at the media element.
* Abort these steps.
"

The change to HAVE_METADATA should raise the "waiting" event.

Hopefully I've written the test write and/or understood the spec correctly.
(Reporter)

Updated

4 years ago
Blocks: 778617
(Reporter)

Comment 1

4 years ago
The equivalent test fails in Chrome too. No 'waiting' event is received. Chrome is in ready state HAVE_ENOUGH_DATA.
(Reporter)

Comment 2

4 years ago
Chrome does give a stalled event. Here's the test reworked for Chrome:

http://cd.pn/mse/test_1063323.html
MediaDecoderStateMachine::HasLowUndecodedData always returns false because reliable remains false after calling MediaDecoder::ComputePlaybackRate with MSE.

I'm not sure how well HasLowUndecodedData will work with widely varying bitrate streams like you may have with MSE, because it tries to compute a desired file offset (of current offset plus one second of data) from the current playback rate and resource size, then work out if the desired file offset is available.

What may work is returning/calculating the required values from only the current decoder(s) in use.  Or it may be easier to refactor the buffering calculations so that the MSE implementation can provide an implementation that doesn't require exposing resource offsets and lengths directly.

To make it work at all, we need to at least:

- Call DispatchBytesConsumed in SourceBufferResource::Read
- Return something sane from MediaSourceResource::{GetCachedDataEnd,GetDownloadRate,GetLength,Tell}
- Return something sane from MediaSourceResource:: (for canplaythrough)
(In reply to Matthew Gregan [:kinetik] from comment #3)
> - Return something sane from MediaSourceResource:: (for canplaythrough)

Oops, that was GetDownloadRate, which I moved to the line above.
Since I started digging into this, I'll grab the bug and see what I can do.

Investigating the following possibilities:
1. Implement low-level bits mentioned above
   (may be insufficient long-term due to single stream offset/length issues mentioned above)
2. Override necessary parts of MediaDecoderStateMachine directly
   (HasLowUndecodedData, at least)
3. Abstract out buffering queries/decisions so MSE can supply correct implementations
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
My queue's backing up with other stuff, dropping this bug back in the pool.
Assignee: kinetik → nobody
Status: ASSIGNED → NEW
See Also: → bug 1082155
(Assignee)

Comment 7

4 years ago
I've got this mostly working modulo some crashes which I'll investigate tomorrow. I also haven't pushed to try yet - I've tinkered with a bunch of stuff, so I wouldn't be surprised if stuff went orange there.

cajbir - Are you sure about the 0.8 lower bound in the testcase? I'm getting a waiting event at 0.71, and had to bump the bound down a notch. Does that mean that I should investigate the calculations involved, or were you just guessing?
Flags: needinfo?(cajbir.bugzilla)
(Reporter)

Comment 9

4 years ago
(In reply to Bobby Holley (:bholley) from comment #7)
> cajbir - Are you sure about the 0.8 lower bound in the testcase? I'm getting
> a waiting event at 0.71, and had to bump the bound down a notch. Does that
> mean that I should investigate the calculations involved, or were you just
> guessing?

It was not a guess. It was obtained from dumping the webm file and finding the byte offset for the frame at that time position and not adding the data for that frame. These are the values for the byte positions and their associated time offsets in the file:

| + 0x14f - 335 - 318 - SimpleBlock (key, track number 1, 1 frame(s), timecode 0.000s = 00:00:00.000)
|  + Frame with size 7045 hexdump d0 77 00 9d 01 2a 40 01 f0 00 00 07 08 85 85 88
| + 0x63c5 - 25541 - 25523 - SimpleBlock (key, track number 1, 1 frame(s), timecode 0.400s = 00:00:00.400)
|  + Frame with size 3418 hexdump d0 51 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0xb68a - 46730 - 46712 - SimpleBlock (key, track number 1, 1 frame(s), timecode 0.800s = 00:00:00.800)
|  + Frame with size 3398 hexdump d0 4f 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x1090b - 67851 - 67833 - SimpleBlock (key, track number 1, 1 frame(s), timecode 1.200s = 00:00:01.200)
|  + Frame with size 3401 hexdump 30 51 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x15b98 - 88984 - 88966 - SimpleBlock (key, track number 1, 1 frame(s), timecode 1.600s = 00:00:01.600)
|  + Frame with size 3380 hexdump 50 51 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x1ae3e - 110142 - 110124 - SimpleBlock (key, track number 1, 1 frame(s), timecode 2.000s = 00:00:02.000)
|  + Frame with size 3396 hexdump b0 50 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x2014c - 131404 - 131386 - SimpleBlock (key, track number 1, 1 frame(s), timecode 2.400s = 00:00:02.400)
|  + Frame with size 3487 hexdump f0 51 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x254a5 - 152741 - 152723 - SimpleBlock (key, track number 1, 1 frame(s), timecode 2.800s = 00:00:02.800)
|  + Frame with size 3465 hexdump 50 50 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x2a84d - 174157 - 174139 - SimpleBlock (key, track number 1, 1 frame(s), timecode 3.200s = 00:00:03.200)
|  + Frame with size 3502 hexdump d0 51 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88
| + 0x2fb6c - 195436 - 195418 - SimpleBlock (key, track number 1, 1 frame(s), timecode 3.600s = 00:00:03.600)
|  + Frame with size 3467 hexdump d0 51 00 9d 01 2a 40 01 f0 00 02 47 08 85 85 88

The 0.8s occurs at byte offset 46712.
Flags: needinfo?(cajbir.bugzilla)
(Assignee)

Comment 10

4 years ago
(In reply to Bobby Holley (:bholley) from comment #7)
> cajbir - Are you sure about the 0.8 lower bound in the testcase? I'm getting
> a waiting event at 0.71, and had to bump the bound down a notch. Does that
> mean that I should investigate the calculations involved, or were you just
> guessing?

I had a look at this yesterday. The basic issue is that (in my debug build), the play tends to overtake the decode at ~0.7s, at which point we stop to buffer. We never resume playback because the estimated usable undecoded data (0.1s) is far below the threshold, and we never get the data because the testcase is waiting for a "waiting" event where the video position is >0.8s.

It seems like we have a fundamental mismatch between buffering heuristics (which I assume are designed to avoid jerky playback) and potential invariants like the one in the testcase. I can think about this more, but as I'm new to this space, I'd be interested to hear others' opinions. NIing some folks who may have them.
Flags: needinfo?(kinetik)
Flags: needinfo?(cpearce)
Flags: needinfo?(cajbir.bugzilla)
(Assignee)

Updated

4 years ago
Blocks: 1082155
(Assignee)

Comment 11

4 years ago
This got pretty big, so I'm going to split some of the pieces here into dependent bugs. The tests for this bug will cover all of that work.
Assignee: nobody → bobbyholley
(Assignee)

Updated

4 years ago
Depends on: 1090983
(Assignee)

Updated

4 years ago
Depends on: 1090991
(Assignee)

Updated

4 years ago
Depends on: 1091008
Probably couple of things going on here:

WebMReader will be blocked waiting for the next frame so it can compute the frame duration for the frame its current decoding, which effectively causes a 1 frame delay.  For this file, I'd expect the last frame to be produced before blocking would be t=0.733, which the reader currently trying to decode t=0.767 and waiting for data at offset 46719..50128 to fetch the start time of the following frame (t=0.8) to compute the frame duration.  (Related: bug 1065207, where the WebM code tries to estimate the frame duration without requiring the following frame's start time.)

The state machine updates currentTime to the start time of the presented frame as we present video and audio frames (but this file has no audio), so currentTime could only become 0.8 once the frame starting at that time is presented.  Unless I'm misremembering, there's no code to update the currentTime to the end time of the current frame in cases where we're waiting on an external event before we can update currentTime again.  If the last frame we can display is the last frame reported in buffered, we'll have a mismatch between buffered and currentTime, because currentTime will report the start time of that frame and buffered will report the end time.

If the buffering logic is holding back frames we could present, that should probably be changed, since it seems like draining the queue of presentable frames before buffering is fine.
Flags: needinfo?(kinetik)
(Assignee)

Comment 13

4 years ago
Thanks for the analysis kinetik. I dug in some more and that does indeed seem to be what's going on. I think for now this probably isn't worth spending time on. I've set the limit at 0.7, and filed bug 1091774 to fix this up at some point.
Flags: needinfo?(cpearce)
Flags: needinfo?(cajbir.bugzilla)
(Assignee)

Updated

4 years ago
Blocks: 1091774
(Assignee)

Comment 14

4 years ago
Here's a green try push for the test: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=33c41388ee21

The test went orange on android just because it was racey - I've fixed it, and done an extra android-only push:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=485c7234008d
(Assignee)

Comment 15

4 years ago
Created attachment 8515166 [details] [diff] [review]
Tests. v2

At this point, all the work has been done in the dependent bugs, so we just
need the test.
Attachment #8515166 - Flags: review?(cajbir.bugzilla)
(Assignee)

Comment 16

4 years ago
Created attachment 8515320 [details] [diff] [review]
Tests. v3

Android looks good, except that the v.duration issue isn't stable, so the todo
is problematic. I've removed it for now.
Attachment #8515166 - Attachment is obsolete: true
Attachment #8515166 - Flags: review?(cajbir.bugzilla)
Attachment #8515320 - Flags: review?(cajbir.bugzilla)
(Reporter)

Comment 17

4 years ago
Comment on attachment 8515320 [details] [diff] [review]
Tests. v3

Review of attachment 8515320 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/mediasource/test/test_WaitingOnMissingData.html
@@ +16,5 @@
> +  ms.addEventListener("sourceopen", function() {
> +    ok(true, "Receive a sourceopen event");
> +    ok(!receivedSourceOpen, "Should only receive one sourceopen for this test");
> +    receivedSourceOpen = true;
> +    is(ms.readyState, "open", "MediaSource must be in open state after sourceopen");

This test is already done in test_MediaSource.html. As a general rule we're trying not to duplicate tests across the cases. Please remove the tests in this file that are already done in test_MediaSource.
Attachment #8515320 - Flags: review?(cajbir.bugzilla) → review+
(Assignee)

Comment 18

4 years ago
(In reply to cajbir (:cajbir) from comment #17)
> Comment on attachment 8515320 [details] [diff] [review]
> Tests. v3
> 
> Review of attachment 8515320 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/media/mediasource/test/test_WaitingOnMissingData.html
> @@ +16,5 @@
> > +  ms.addEventListener("sourceopen", function() {
> > +    ok(true, "Receive a sourceopen event");
> > +    ok(!receivedSourceOpen, "Should only receive one sourceopen for this test");
> > +    receivedSourceOpen = true;
> > +    is(ms.readyState, "open", "MediaSource must be in open state after sourceopen");
> 
> This test is already done in test_MediaSource.html. As a general rule we're
> trying not to duplicate tests across the cases. Please remove the tests in
> this file that are already done in test_MediaSource.

I removed the readyState and sourceBuffers checks. I want to keep the receivedSourceOpen check, because I added it specifically in response to a bug where we were re-opening the stream multiple times, and the resulting behavior was very non-obvious.
(Assignee)

Updated

4 years ago
Depends on: 1096597
https://hg.mozilla.org/mozilla-central/rev/59490ab8a4f7
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Depends on: 1099187
Depends on: 1099190
You need to log in before you can comment on or make changes to this bug.