test_BufferingWait.html gets stuck mid-stream

RESOLVED WONTFIX

Status

RESOLVED WONTFIX (defect, P3, normal)
Opened: 5 years ago; last modified: 4 years ago

People

Reporter: bholley; Assigned: bholley

Tracking

Firefox Tracking Flags: (Not tracked)

Details

In the tests I added in bug 1093020, playback gets stuck at around 2s, which we should investigate. Thankfully, this happens after the stuff this test is actually testing, so I can just invoke SimpleTest.finish() at the right moment and split this off into a followup bug.
Ahah! So this is an issue that I had sort of imagined must be a problem when reading the code.

The basic issue is that a video frame gets split across multiple track buffers. MediaSourceReader::SelectReader thinks that the current reader can handle the frame, because the start of the frame is indeed in the range returned by GetBuffered. But then we get partway through decoding it, run out of data, and SourceBufferResource::Read rather inelegantly blocks waiting for more data (which will never come, because it's in a different TrackBuffer).

We could make the calculations in SelectReader more accurate, but it seems hard to guarantee that we'll get them 100% right, and the current cost of failure is locking up a decoder thread (and a decoder) forever.

So it seems like we want to stop blocking in SourceBufferResource, and add the ability for the subdecoder to propagate the failure back to MediaSourceReader, which can then either switch to a new reader, or inform the state machine that it's out of data (using the machinery I added in bug 1090991).

If it _does_ have the data (just in a different track buffer), we need to figure out what to do. Do we try to backfill the decoder with the first half of the frame, or just drop the frame and move on? The second is much easier, but I'm not sure whether the impact on playback is something we'd be unhappy with.
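
To illustrate the non-blocking direction, here is a minimal sketch (made-up types, not the real SourceBufferResource interface) of a read that reports the shortfall instead of blocking:

// Hypothetical sketch, not the actual Gecko API: a read that reports
// "not enough data buffered yet" instead of blocking the decode thread,
// so the sub-decoder can fail the decode and let MediaSourceReader react.
#include <cstdint>
#include <cstring>
#include <vector>

enum class ReadResult { Ok, WaitingForData, EndOfStream };

struct BufferedResource {
  std::vector<uint8_t> data;  // bytes appended to this resource so far
  bool ended = false;         // true once no more appends can arrive

  ReadResult ReadAt(size_t offset, uint8_t* out, size_t count) const {
    if (offset + count > data.size()) {
      // Don't wait: report the shortfall so the caller can switch readers
      // or tell the state machine it is waiting for data.
      return ended ? ReadResult::EndOfStream : ReadResult::WaitingForData;
    }
    std::memcpy(out, data.data() + offset, count);
    return ReadResult::Ok;
  }
};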

I'm happy to start tackling this problem, but don't want to step on anybody's toes if other people are already working on it. Anthony, is this on the map somewhere? Are people working on it?
Flags: needinfo?(ajones)
Is this bug 1089937 and/or bug 1062055?
(In reply to Bobby Holley (:bholley) from comment #1)
> Ahah! So this is an issue that I had sort of imagined must be a problem when
> reading the code.
> 
> The basic issue is that a video frame gets split across multiple track
> buffers. MediaSourceReader::SelectReader thinks that the current reader can
> handle the frame, because the start of the frame is indeed in the range
> returned by GetBuffered. But then we get partway through decoding it, run
> out of data, and SourceBufferResource::Read rather inelegantly blocks
> waiting for more data (which will never come, because it's in a different
> TrackBuffer).

There is a single SourceBuffer so I would expect a single TrackBuffer, but I
assume the issue exists because of multiple SourceBufferResources/Decoders/etc.

> So it seems like we want to stop blocking in SourceBufferResource, and add
> the ability for the subdecoder to propagate the failure back to
> MediaSourceReader, which can then either switch to a new reader, or inform
> the state machine that it's out of data (using the machinery I added in bug
> 1090991).

Blocking may be bug 1062055.

> If it _does_ have the data (just in a different track buffer), we need to
> figure out what to do. Do we try to backfill the decoder with the first half
> of the frame, or just drop the frame and move on? The second is much easier,
> but I'm not sure whether the impact on playback is something we'd be unhappy
> with.

If media segments contain partial info and are out of order, then my
interpretation is that we should drop the coded frames.  This is kind of
covered by the need random access point flag in the Coded Frame Processing
algorithm, but I'm not sure it is exactly what you describe.

https://dvcs.w3.org/hg/html-media/raw-file/default/media-source/media-source.html#sourcebuffer-coded-frame-processing

Step 6 (the decode-timestamp discontinuity check), substep 5:

    Set the need random access point flag on all track buffers to true.

Step 11: If the need random access point flag on track buffer equals true, then run the following steps:

    1. If the coded frame is not a random access point, then drop the coded frame and jump to the top of the loop to start processing the next coded frame.
    2. Set the need random access point flag on track buffer to false.
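
Concretely, the "need random access point" handling above boils down to something like this (an illustrative sketch with invented types, not Gecko's actual coded frame processing code):

// Illustrative sketch of the spec steps quoted above; the types are
// made up for the example.
#include <vector>

struct CodedFrame {
  double decodeTime;         // seconds
  bool isRandomAccessPoint;  // keyframe
};

struct TrackBufferState {
  bool needRandomAccessPoint = true;
  std::vector<CodedFrame> frames;
};

// Returns true if the frame was kept, false if it was dropped.
bool ProcessCodedFrame(TrackBufferState& tb, const CodedFrame& frame) {
  if (tb.needRandomAccessPoint) {
    // Step 11.1: drop non-keyframes until a random access point arrives.
    if (!frame.isRandomAccessPoint) {
      return false;
    }
    // Step 11.2: clear the flag.
    tb.needRandomAccessPoint = false;
  }
  tb.frames.push_back(frame);
  return true;
}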
(In reply to Karl Tomlinson (:karlt) from comment #3)
> There is a single SourceBuffer so I would expect a single TrackBuffer, but I
> assume the issue exists because of multiple
> SourceBufferResources/Decoders/etc.

Err, right. Sorry.
 
> > So it seems like we want to stop blocking in SourceBufferResource, and add
> > the ability for the subdecoder to propagate the failure back to
> > MediaSourceReader, which can then either switch to a new reader, or inform
> > the state machine that it's out of data (using the machinery I added in bug
> > 1090991).
> 
> Blocking may be bug 1062055.

Ok, I'll block on that for now.

> 
> > If it _does_ have the data (just in a different track buffer), we need to
> > figure out what to do. Do we try to backfill the decoder with the first half
> > of the frame, or just drop the frame and move on? The second is much easier,
> > but I'm not sure whether the impact on playback is something we'd be unhappy
> > with.
> 
> If media segments contain partial info and are out of order, then my
> interpretation is that we should drop the coded frames.

SGTM.

> This is kind of
> covered by the need random access point flag in the Coded Frame Processing
> algorithm, but I'm not sure it is exactly what you describe.

Hm, I never quite grokked how all that 'last decode timestamp' stuff was supposed to work. Given the discontinuity here, don't we end up with an entirely new coded frame group here? Or do the coded frame groups merge somehow when we append the missing segment?
Depends on: 1062055
Flags: needinfo?(ajones)
Flags: needinfo?(karlt)
(In reply to Bobby Holley (:bholley) from comment #4)
> Hm, I never quite grokked how all that 'last decode timestamp' stuff was
> supposed to work. Given the discontinuity here, don't we end up with an
> entirely new coded frame group here? Or do the coded frame groups merge
> somehow when we append the missing segment?

We discussed this on IRC and I looked at the spec some more. The basic gist is that 'last decode timestamp' gets reset in step 6 _after_ the condition is checked.

So in summary, this means that, in the event of frames appended out-of-order, we should drop frames until we hit a random access point.
Flags: needinfo?(karlt)
Do we need this for YT?
Flags: needinfo?(bobbyholley)
I just looked at this again, and think we do indeed need to fix it, assuming youtube ever does out-of-order appends.

What happens is that we do 3 appends:
(A) [0, 0.8)
(B) [1.6, 4)
(C) [0.8, 1.6)

When we're decoding chunk (C), we do a synchronous/blocking call to nestegg_read_packet. But the entire packet is split across (C) and (B), so the SourceBufferResource corresponding to (C) can never produce it. So we hang the thread indefinitely like this:

http://pastebin.mozilla.org/8077060
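
To make the failure mode concrete, here is a toy sketch (invented types, not MediaSourceReader's real SelectReader) of why choosing a reader purely because the target time falls inside its GetBuffered() range isn't enough once appends arrive out of order:

#include <cstdio>
#include <vector>

struct TimeRange { double start, end; };  // seconds

struct SubReader {
  const char* name;
  std::vector<TimeRange> buffered;  // what GetBuffered() would report

  bool ContainsTime(double t) const {
    for (const auto& r : buffered) {
      if (t >= r.start && t < r.end) {
        return true;
      }
    }
    return false;
  }
};

// Toy version of "pick the reader whose buffered range contains the time".
SubReader* SelectReader(std::vector<SubReader>& readers, double time) {
  for (auto& r : readers) {
    if (r.ContainsTime(time)) {
      return &r;
    }
  }
  return nullptr;
}

int main() {
  std::vector<SubReader> readers = {
      {"A", {{0.0, 0.8}}},  // first append
      {"B", {{1.6, 4.0}}},  // second append
      {"C", {{0.8, 1.6}}},  // third append
  };
  // A packet starting at ~1.58s lies inside C's buffered range, so C's
  // reader is selected, but the rest of that packet's bytes were appended
  // to B's resource, so a blocking read in C waits forever.
  SubReader* r = SelectReader(readers, 1.58);
  std::printf("selected reader: %s\n", r ? r->name : "none");
  return 0;
}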

cajbir - are either of you aware of any overlap between this and existing plans? I'm happy to dig into this and propose something, but don't want to step on any toes.
Flags: needinfo?(bobbyholley) → needinfo?(cajbir.bugzilla)
Does this issue affect WebM and MP4?
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #7)
> cajbir - are either of you aware of any overlap between this and existing
> plans? I'm happy to dig into this and propose something, but don't want to
> step on any toes.

I'm not aware of any overlap.
Flags: needinfo?(cajbir.bugzilla)
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #7)
> I just looked at this again, and think we do indeed need to fix it, assuming
> youtube ever does out-of-order appends.
> 
> What happens is that we do 3 appends:
> (A) [0, 0.8)
> (B) [1.6, 4)
> (C) [0.8, 1.6)
> 
> When we're decoding chunk (C), we do a synchronous/blocking call to
> nestegg_read_packet. But the entire packet is split across (C) and (B), so
> the SourceBufferResource corresponding to (C) can never produce it. So we
> hang the thread indefinitely like this:

Is there a reason we can't detect that A and C are continuous, and merge them?
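
For what it's worth, detecting that appended ranges are continuous is cheap. A minimal sketch (hypothetical helper, not existing Gecko code) that would coalesce [0, 0.8) and [0.8, 1.6) into [0, 1.6):

// Coalesce appended time ranges that touch within a small fuzz, so
// adjacent out-of-order appends are treated as one continuous range.
#include <algorithm>
#include <vector>

struct TimeRange { double start, end; };  // seconds

std::vector<TimeRange> MergeContinuous(std::vector<TimeRange> ranges,
                                       double fuzz = 1e-6) {
  std::sort(ranges.begin(), ranges.end(),
            [](const TimeRange& a, const TimeRange& b) {
              return a.start < b.start;
            });
  std::vector<TimeRange> merged;
  for (const auto& r : ranges) {
    if (!merged.empty() && r.start <= merged.back().end + fuzz) {
      merged.back().end = std::max(merged.back().end, r.end);
    } else {
      merged.push_back(r);
    }
  }
  return merged;
}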
Flags: needinfo?(bobbyholley)
We discussed this in the meeting. There are basically two potential issues:

(A) The demuxer needs to be conservative and only claim that a segment is buffered when it has that segment in entirety (rather than just having the headers and _some_ data).

(B) Potential discontinuities resulting from frames that were effectively destroyed by being split into two media segments.

It sounds like Anthony's work on the mp4 demuxer gives us a better story there for (A) than on webm, so that's probably a non-issue for mp4.

In terms of (B) - I re-read the segment parser loop algorithm, and it's more restrictive than I previously thought (which is good). Specifically, once JS starts appending a media segment, it needs to finish that media segment, and can't jump around to other ones. So from a correctness perspective, we can assume that we get complete media segments (moof + 1 or more mdats).

So whether (B) is a problem or not depends on whether a single frame can ever be split between two mdats (and two moofs). I'd hope that this is disallowed, but I don't know offhand (and the spec is imposing). Anthony, do you know?

B-frames also make this interesting. Can we ever have a trailing B-frame which depends on an I-frame in a different fragment (or, for that matter, a P-frame that depends on a previous fragment)? If so, that frame is effectively unplayable unless we implement chris' suggestion in comment 10, and we need to make sure that the decoder doesn't block indefinitely trying to read the I-Frame while rendering the B-frame.
Flags: needinfo?(bobbyholley) → needinfo?(ajones)
Blocks: 1115096
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #11)
> We discussed this in the meeting. There are basically two potential issues:
> 
> (A) The demuxer needs to be conservative and only claim that a segment is
> buffered when it has that segment in entirety (rather than just having the
> headers and _some_ data).
> 
> (B) Potential discontinuities resulting from frames that were effectively
> destroyed by being split into two media segments.
> 
> It sounds like Anthony's work on the mp4 demuxer gives us a better story
> there for (A) than on webm, so that's probably a non-issue for mp4.
> 
> In terms of (B) - I re-read the segment parser loop algorithm, and it's more
> restrictive than I previously thought (which is good). Specifically, once JS
> starts appending a media segment, it needs to finish that media segment, and
> can't jump around to other ones. So from a correctness perspective, we can
> assume that we get complete media segments (moof + 1 or more mdats).

The buffered range logic in Index::ConvertByteRangesToTimeRanges(), which uses MoofParser, requires the entire moof atom plus the extents of the mdat that contains the actual bytes we need.

> So whether (B) is a problem or not depends on whether a single frame can
> ever be split between two mdats (and two moofs). I'd hope that this is
> disallowed, but I don't know offhand (and the spec is imposing). Anthony, do
> you know?

The offsets are relative to the start of the moof so whether they're in one or many mdat segments, or something else entirely doesn't make a lick of difference to the parser. Refer to the tests in TestMP4Reader.cpp

> B-frames also make this interesting. Can we ever have a trailing B-frame
> which depends on an I-frame in a different fragment (or, for that matter, a
> P-frame that depends on a previous fragment)? If so, that frame is
> effectively unplayable unless we implement chris' suggestion in comment 10,
> and we need to make sure that the decoder doesn't block indefinitely trying
> to read the I-Frame while rendering the B-frame.

We don't handle those cases specifically.
Flags: needinfo?(ajones)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #12)
> (In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect
> things) from comment #11)
> > We discussed this in the meeting. There are basically two potential issues:
> > 
> > (A) The demuxer needs to be conservative and only claim that a segment is
> > buffered when it has that segment in entirety (rather than just having the
> > headers and _some_ data).
> > 
> > (B) Potential discontinuities resulting from frames that were effectively
> > destroyed by being split into two media segments.
> > 
> > It sounds like Anthony's work on the mp4 demuxer gives us a better story
> > there for (A) than on webm, so that's probably a non-issue for mp4.
> > 
> > In terms of (B) - I re-read the segment parser loop algorithm, and it's more
> > restrictive than I previously thought (which is good). Specifically, once JS
> > starts appending a media segment, it needs to finish that media segment, and
> > can't jump around to other ones. So from a correctness perspective, we can
> > assume that we get complete media segments (moof + 1 or more mdats).
> 
> The buffered range logic in Index::ConvertByteRangesToTimeRanges(), which
> uses MoofParser, requires the entire moof atom plus the extents of the mdat
> that contains the actual bytes we need.
> 
> > So whether (B) is a problem or not depends on whether a single frame can
> > ever be split between two mdats (and two moofs). I'd hope that this is
> > disallowed, but I don't know offhand (and the spec is imposing). Anthony, do
> > you know?
> 
> The offsets are relative to the start of the moof so whether they're in one
> or many mdat segments, or something else entirely doesn't make a lick of
> difference to the parser. Refer to the tests in TestMP4Reader.cpp

Ok. If offsets are relative to the start of the moof, then we should be guaranteed that no frame belongs to two moofs, and since a media segment is marked by a moof, we should be guaranteed that frames are never split across media segments.

> > B-frames also make this interesting. Can we ever have a trailing B-frame
> > which depends on an I-frame in a different fragment (or, for that matter, a
> > P-frame that depends on a previous fragment)? If so, that frame is
> > effectively unplayable unless we implement chris' suggestion in comment 10,
> > and we need to make sure that the decoder doesn't block indefinitely trying
> > to read the I-Frame while rendering the B-frame.
> 
> We don't handle those cases specifically.

Ok. Filed bug 1117903.
Priority: -- → P2
I seem to be running into this issue on Nightly with MSE and WebM on. Youtube only plays for 2s before either stopping or just audio cutting out.
(In reply to Thomas Daede from comment #14)
> I seem to be running into this issue on Nightly with MSE and WebM on.
> Youtube only plays for 2s before either stopping or just audio cutting out.

webm is basically unsupported for the time being, and should be preffed off in all Nightlies. Did you pref it on yourself?
Yes, I had pref'd it on. It does seem to be default off. A bit odd that MP4 works and WebM doesn't, considering that Youtube's primary format is VP9/WebM now.
(In reply to Thomas Daede from comment #16)
> considering that Youtube's primary format is VP9/WebM now.

Kinda sorta not really. They prefer webm, but not all videos are available in webm.
Priority: P2 → P3
The issue described can't happen with the new MSE architecture. Marking it as WONTFIX for the old MSE.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX