Closed Bug 1119456 Opened 5 years ago Closed 5 years ago

Stop doing blocking Moof reads from within the mp4 demuxer

Categories

(Core :: Audio/Video, defect, P1)

x86
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox36 --- fixed
firefox37 --- fixed
firefox38 --- fixed

People

(Reporter: bholley, Assigned: bholley)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 9 obsolete files)

13.64 KB, patch
ajones
: review+
Details | Diff | Splinter Review
2.33 KB, patch
bholley
: review+
Details | Diff | Splinter Review
In bug 1114383, we had to unlock the mp4 demuxer when calling out to do blocking moof reads. It turns out this doesn't work, because the demuxer code isn't re-entrant. If another thread comes along and rebuilds the index while one thread is doing a blocking moof read, we'll end up using stale locals to compute persistent parser state. This manifested itself when I was debugging the tests in bug 1064128 - chunks of the audio would occasionally play twice because the moof was read twice, causing the audio and video to get out of sync.

It's tempting to "do this right" by making all of this stuff fully asynchronous with promises. I think we don't have the time to do that right now, especially since changing the async interaction patterns here in valid ways might still uncover other bugs. So the goal here is to do the minimal possible change to making things correct, and later do the long-term fix jya suggested in bug 1119208.

I think the least invasive change is to simply move the blocking out of the demuxer and into the caller (MP4Reader::DemuxAudioSample). IIUC this code does need to work for non-MSE Resources (i.e. non-SourceBufferResources), so we may have to get a bit tricky with the API to do the waiting properly. But I think it's relatively doable.

This should fix bug 1114682, as well as other potential a/v sync issues we've been having.
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #0)
> IIUC this code
> does need to work for non-MSE Resources (i.e. non-SourceBufferResources), so
> we may have to get a bit tricky with the API to do the waiting properly. But
> I think it's relatively doable.

Yes. MacOSX and Windows non-MSE <video> uses this demuxer in Gecko 35 and up.
Update - Anthony and I flirted with the idea of demuxer-per-thread, but we decided at the end of the meeting to just stick with the original plan and get rid of the blocking reads. I have patches that do this - just working out the kinks.
We can simply do one blocking and one non-blocking demuxer. That is only a two line change. Alternatively you can continue your ironing if you prefer.
Assignee: bobbyholley → ajones
Status: NEW → ASSIGNED
Comment on attachment 8546387 [details] [diff] [review]
Use a separate MoofParser for demuxing

Review of attachment 8546387 [details] [diff] [review]:
-----------------------------------------------------------------

This doesn't solve the race condition in the seek case, right? Fundamentally I think we need to get rid of the unlocking behavior, or we're going to keep missing things like this.

My patches more or less work. I'm going to massage them a bit tomorrow morning and then I'll put them up for review.
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #5)
> This doesn't solve the race condition in the seek case, right? Fundamentally
> I think we need to get rid of the unlocking behavior, or we're going to keep
> missing things like this.

When we seek we do:
* Seek Video
* Pop Sample
* Seek Audio

Although this doesn't apply to the MSE case where audio and video are separate. We don't need to worry too much about getting this case perfect right now because fMP4 is rarely used if ever outside of MSE.

It is looking more and more like the proper solution for MSE is to demux everything when it arrives and store it in a big data structure. That is definitely the long term plan.
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #6)
> (In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect
> things) from comment #5)
> > This doesn't solve the race condition in the seek case, right? Fundamentally
> > I think we need to get rid of the unlocking behavior, or we're going to keep
> > missing things like this.
> 
> When we seek we do:
> * Seek Video
> * Pop Sample
> * Seek Audio
> 
> Although this doesn't apply to the MSE case where audio and video are
> separate.

My point is that, if a decode thread is in the middle of doing a blocking read with the lock released, another thread can come by and seek the index, which will cause local variables to be out of date when the block read stack unwinds. Is there a reason this isn't a problem?
I'm not saying it isn't a problem. You are right and you should land your solution when it is ready. I'm just adding that there is a potentially annoying seeking issue that we can probably punt into next week.
I've got working patches. Spinning off some sub-bugs.
Depends on: 1120014
Depends on: 1120017
Depends on: 1120023
Blocks: 1120030
We've determined that this isn't necessary, since all of the internal monitors
on MediaResources are "leaf monitors" (i.e. they will always be the last monitor
acquired), so we don't need to worry about deadlocks.
Attachment #8546948 - Flags: review?(cpearce)
Assignee: ajones → bobbyholley
With the whole stack applied, I can occasionally seek the minecraft video back and forward enough that the audio sink gets confused and starts doing PlaySilence while it plays the video. Once I figure out the test timeouts in the dependent patch (bug 1120023#c12) I'll retest.
Comment on attachment 8546946 [details] [diff] [review]
Part 1 - Switch blocking reads to the aWaitingForData model. v1

Review of attachment 8546946 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/libstagefright/binding/Index.cpp
@@ +152,5 @@
>  
>    nsTArray<Moof>& moofs = mIndex->mMoofParser->Moofs();
>    while (true) {
>      if (mCurrentMoof == moofs.Length()) {
> +      if (!mIndex->mMoofParser->TryToReadNextMoof()) {

It seems to me that you'll now be unable to distinguish between an invalid moof, and waiting for data? So if you're passed an invalid file, you can't fail and playback will stall indefinitely? Seems like you should be passing aWaitingForData through here, and distinguishing the error condition.

@@ +180,5 @@
>    mCurrentMoof = 0;
>    mCurrentSample = 0;
>    Sample* sample;
> +  bool ignored;
> +  while (!!(sample = Get(&ignored))) {

Should this be ignored? That sounds dangerous to me.

::: media/libstagefright/binding/include/mp4_demuxer/MoofParser.h
@@ +194,5 @@
>    void ParseTrak(Box& aBox);
>    void ParseMdia(Box& aBox, Tkhd& aTkhd);
>    void ParseMvex(Box& aBox);
>  
> +  bool TryToReadNextMoof();

Pass aWaitingForData through?
Attachment #8546947 - Flags: review?(cpearce) → review-
Comment on attachment 8546947 [details] [diff] [review]
Part 2 - Remove now-unused blocking read API from MP4Stream. v1

Review of attachment 8546947 [details] [diff] [review]:
-----------------------------------------------------------------

Oh oops, just kidding, r+.
Attachment #8546947 - Flags: review- → review+
Attachment #8546948 - Flags: review?(cpearce) → review+
Attachment #8546946 - Flags: review?(cpearce) → review+
So this doesn't work, because non-MSE mp4 actually _does_ rely on blocking, and we have no way to do WaitForData for non-MSE.

I went back to a dumber (but simpler) strategy in which we make all demuxer reads non-blocking, and record the parameters of the last failed failed read on the MP4Stream. If we find ourselves erroring out of the demuxer, we check to see if the most recent read failed, and we re-attempt it as a blocking read (while releasing the demuxer). If that works, we just try again. This more or less emulates our current behavior while hoisting the blocking out a few frames.
Attachment #8546387 - Attachment is obsolete: true
Attachment #8546946 - Attachment is obsolete: true
Attachment #8546947 - Attachment is obsolete: true
Attachment #8546948 - Attachment is obsolete: true
Attachment #8547017 - Flags: review?(ajones)
Attachment #8547018 - Flags: review?(ajones)
Comment on attachment 8547017 [details] [diff] [review]
Part 1 - Switch all of MP4Demuxer to CachedReadAt and hoist blocking into callers with a hacky retry strategy. v1

I think this strategy works, but I need to tweak the patches a little bit.
Attachment #8547017 - Attachment is obsolete: true
Attachment #8547017 - Flags: review?(ajones)
Attachment #8547018 - Attachment is obsolete: true
Attachment #8547018 - Flags: review?(ajones)
Attachment #8547019 - Attachment is obsolete: true
Attachment #8547019 - Flags: review?(ajones)
Small fix to switch the VLA to dynamic allocation, since the former doesn't
work on MSVC (and probably isn't a good idea anyway).
Attachment #8547114 - Flags: review?(cpearce)
Attachment #8547114 - Flags: review?(ajones)
Attachment #8547110 - Attachment is obsolete: true
Attachment #8547110 - Flags: review?(cpearce)
Attachment #8547110 - Flags: review?(ajones)
Comment on attachment 8547114 [details] [diff] [review]
Make MP4Demuxer's blocking reads non-blocking and hoist blocking into callers with a hacky retry strategy. v5

Review of attachment 8547114 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent. That all looks like godo clean fun. I don't like the LastReadFailed() business but I can see why it is needed.

::: media/libstagefright/binding/mp4_demuxer.cpp
@@ +98,5 @@
> +
> +  // Read the number of tracks. If we can't find any, make sure to bail now before
> +  // attempting any new reads to make the retry system work.
> +  size_t trackCount = e->countTracks();
> +  if (trackCount == 0) {

Suggestion: Not much value in the intermediate variable but if you're using it then you may as well use it below.
Attachment #8547114 - Flags: review?(ajones) → review+
Comment on attachment 8547114 [details] [diff] [review]
Make MP4Demuxer's blocking reads non-blocking and hoist blocking into callers with a hacky retry strategy. v5

Review of attachment 8547114 [details] [diff] [review]:
-----------------------------------------------------------------

kft's r+ is quite sufficient here.
Attachment #8547114 - Flags: review?(cpearce)
Attached patch Work around media cache bug. v1 (obsolete) — Splinter Review
Attachment #8547145 - Flags: review?(roc)
Comment on attachment 8547145 [details] [diff] [review]
Work around media cache bug. v1

Review of attachment 8547145 [details] [diff] [review]:
-----------------------------------------------------------------

Technically I think this is not a media cache bug. It never promised that Pin would have the property you want it to have. For a pinned stream, the ranges returned by GetCachedDataEnd, GetNextCachedData, GetCachedRanges never decrease and reads in those ranges will always succeed. That's all we promise:
  // Cached blocks associated with this stream will not be evicted
  // while the stream is pinned.
If you probe with GetCachedRanges etc, mPartialBlockBuffer is not included; it's not a block.

If we want the media cache to promise that "everything read from a pinned stream can be read again", then I think we should modify MediaCache::Read so, when a stream is pinned, we always block until we've reached a block boundary or end-of-stream.

Meanwhile, this workaround is OK, modulo the following comment:

::: dom/media/fmp4/MP4Reader.cpp
@@ +99,4 @@
>      MonitorAutoUnlock unlock(*aMonitor);
> +    size_t ignored;
> +    (void) stream->BlockingReadAt(failure.mOffset, dummyBuffer, bufferSize, &ignored);
> +    if (NS_WARN_IF(!stream->BlockingReadAt(failure.mOffset, dummyBuffer, failure.mCount, &ignored))) {

I don't understand why you're doing two reads here.

Why isn't this just
if (!stream->BlockingReadAt(failure.mOffset, dummyBuffer, bufferSize, &ignored)) {
Attachment #8547145 - Flags: review?(roc) → review-
> I don't understand why you're doing two reads here.

My original concern was that BlockingReadAt would return failure for EOS,
causing us to error out for bytes that weren't actually requested by the
original demuxer read. But I now note that this case doesn't actually cause
us to return error, so I think we should be good with a single best-effort
priming read.
Attachment #8547145 - Attachment is obsolete: true
Attachment #8547193 - Flags: review?(roc)
Comment on attachment 8547193 [details] [diff] [review]
Work around the fact that media cache does not quite guarantee the property we want. v2

Actually, I think comment 30 implies r+ on this patch.
Attachment #8547193 - Flags: review?(roc) → review+
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #29)
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=32b557090cc8

This one is green.

(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #32)
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4cfa0861ac6d

This one includes roc's suggested tweak. Looking at the code now, I'm quite sure that the tweak should not cause any new failures.
(In reply to Bobby Holley (Busy with media, don't ask for DOM/JS/XPConnect things) from comment #33)
> Comment on attachment 8547193 [details] [diff] [review]
> Work around the fact that media cache does not quite guarantee the property
> we want. v2
> 
> Actually, I think comment 30 implies r+ on this patch.

Yep.
https://hg.mozilla.org/mozilla-central/rev/19b8c420b13a
https://hg.mozilla.org/mozilla-central/rev/3ffd66a3f1bd
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Depends on: 1120324
Comment on attachment 8547114 [details] [diff] [review]
Make MP4Demuxer's blocking reads non-blocking and hoist blocking into callers with a hacky retry strategy. v5

Approval Request Comment
[Feature/regressing bug #]: MSE
[User impact if declined]: Users more like to see playback stalls with YouTube. Less consistent testing.
[Describe test coverage new/current, TBPL]: Landed on m-c. Manual testing.
[Risks and why]: Risk is moderate. This is the least-invasive fix we could come up with, but it does affect normal mp4 playback code outside MSE.
[String/UUID change made/needed]: None.

This request applies to all patches in this bug.
Attachment #8547114 - Flags: approval-mozilla-beta?
Attachment #8547114 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8547193 - Flags: approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.