MediaCodecReader crash at mAudioPromise.Resolve

RESOLVED FIXED in mozilla38

Status

()

Core
Audio/Video
--
major
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: bechen, Assigned: bechen)

Tracking

unspecified
mozilla38
x86_64
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

3 years ago
When I try to enable MediaCodec Preference, I hit crash when playing a video file. 100% reproduce rate.

It looks like the "promise" and "auto dispatch a/v tasks or seek" have some conflicts => Ensure function doesn't match to Resolve function !?

Backtrace shows the mPromise is null.
(gdb) p mPromise
$2 = {mRawPtr = 0x0}


(gdb) bt
#0  mozilla::BaseAutoLock<mozilla::Mutex>::BaseAutoLock (this=0xb0b8fc2c, aLock=...) at ../../dist/include/mozilla/Mutex.h:164
#1  0xb55b8ac6 in mozilla::MediaPromise<nsRefPtr<mozilla::AudioData>, mozilla::MediaDecoderReader::NotDecodedReason>::Resolve (this=this@entry=0x0, aResolveValue=..., 
    aResolveSite=0xb642ef86 <mozilla::MediaCodecReader::DecodeAudioDataTask()::__PRETTY_FUNCTION__> "bool mozilla::MediaCodecReader::DecodeAudioDataTask()") at ../../dist/include/MediaPromise.h:286
#2  0xb5611944 in Resolve (aMethodName=0xb642ef86 <mozilla::MediaCodecReader::DecodeAudioDataTask()::__PRETTY_FUNCTION__> "bool mozilla::MediaCodecReader::DecodeAudioDataTask()", aResolveValue=..., 
    this=0xb2fb7f10) at ../../../dist/include/MediaPromise.h:398
#3  mozilla::MediaCodecReader::DecodeAudioDataTask (this=0xb2fb7c00) at /home/benjamin/Documents/hg/mozilla-central/dom/media/omx/MediaCodecReader.cpp:499
#4  0xb4d7e50e in nsRunnableMethodImpl<tag_nsresult (nsMemoryReporterManager::*)(), void, true>::Run (this=<optimized out>) at ../../dist/include/nsThreadUtils.h:388
#5  0xb55b5a94 in mozilla::MediaTaskQueue::Runner::Run (this=0xb2197830) at ../../../../../../hg/mozilla-central/dom/media/MediaTaskQueue.cpp:230
#6  0xb4da6972 in nsThreadPool::Run (this=0xb3cc4330) at ../../../../../../hg/mozilla-central/xpcom/threads/nsThreadPool.cpp:225
#7  0xb4da6640 in nsThread::ProcessNextEvent (this=0xb200af60, aMayWait=<optimized out>, aResult=0xb0b8fd27) at ../../../../../../hg/mozilla-central/xpcom/threads/nsThread.cpp:855
#8  0xb4db20c8 in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=aMayWait@entry=false) at /home/benjamin/Documents/hg/mozilla-central/xpcom/glue/nsThreadUtils.cpp:265
#9  0xb4ef1786 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0xb22aab50, aDelegate=0xb2107be0) at ../../../../../../hg/mozilla-central/ipc/glue/MessagePump.cpp:339
#10 0xb4ee6004 in MessageLoop::RunInternal (this=this@entry=0xb2107be0) at ../../../../../../hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:233
#11 0xb4ee60b8 in RunHandler (this=0xb2107be0) at ../../../../../../hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:226
#12 MessageLoop::Run (this=this@entry=0xb2107be0) at ../../../../../../hg/mozilla-central/ipc/chromium/src/base/message_loop.cc:200
#13 0xb4da6a6e in nsThread::ThreadFunc (aArg=0xb200af60) at ../../../../../../hg/mozilla-central/xpcom/threads/nsThread.cpp:356
#14 0xb69731b2 in _pt_root (arg=0xb20a7300) at ../../../../../../../../hg/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:212
#15 0xb6e3222c in __thread_entry (func=0xb6973119 <_pt_root>, arg=0xb20a7300, tls=0xb0b8fdd0) at bionic/libc/bionic/pthread_create.cpp:105
#16 0xb6e323c4 in pthread_create (thread_out=0xb1e77b5c, attr=<optimized out>, start_routine=0xb6973119 <_pt_root>, arg=0x78) at bionic/libc/bionic/pthread_create.cpp:224
#17 0x00000000 in ?? ()
(gdb) p mLock
$1 = (mozilla::Mutex *) 0x8
(gdb) fr 2
#2  0xb5611944 in Resolve (aMethodName=0xb642ef86 <mozilla::MediaCodecReader::DecodeAudioDataTask()::__PRETTY_FUNCTION__> "bool mozilla::MediaCodecReader::DecodeAudioDataTask()", aResolveValue=..., 
    this=0xb2fb7f10) at ../../../dist/include/MediaPromise.h:398
398	    mPromise->Resolve(aResolveValue, aMethodName);
(gdb) p mPromise
$2 = {mRawPtr = 0x0}
mAudioPromise.Ensure should be called before you can resolve or reject mAudioPromise. What is the repro step of your crash?
Bobby: Any thoughts on this one?
Flags: needinfo?(bobbyholley)
(Assignee)

Comment 3

3 years ago
Created attachment 8541069 [details] [diff] [review]
enableMediaCodec.patch

STR:
Apply the patch to enable MediaCodec on b2g. Then play a video file.
This probably means that the logic is trying to send two samples to the consumer when only one sample was requested. The specifics of how that's happening would probably be pretty easy to determine locally - just do NSPR_LOG_MODULES="MediaPromise:5", which gives you a nice history of who creates promises and who resolves them.
Flags: needinfo?(bobbyholley)
Summary: MediaCodecReader crash at mAudioPromise.Resolve . → MediaCodecReader crash at mAudioPromise.Resolve
(Assignee)

Comment 5

3 years ago
Because the Ensure function runs on DecodeThread and the Resolve function runs on another thread, there are 2 DecodeAudioTasks mapping to the same promise cause the crash.

One thread is calling MediaPromiseHolder::Resolve function and another thread is calling MediaPromiseHolder::Ensure function.
Oh dear. Then yes, this state should definitely be protected by a monitor. You can ensure this by invoking SetMonitor on the MediaPromiseHolder, which will cause it to assert if you ever touch the holder without holding the monitor.
(Assignee)

Updated

3 years ago
Assignee: nobody → bechen
Status: NEW → ASSIGNED
(Assignee)

Comment 7

3 years ago
Created attachment 8542493 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v01.patch

1. Fix crash at promise.
- need a lock to protect the promise.
2.Rewrite the seek function
- we can not decode video/audio without promise.
3. Remove self dispatch code.
- we can return CANCELED by promise.
4. Replace TaskQueue->flush by TaskQueue->AwaitIdle
- apply to promise, I think it is better to return a result through promise instead of flushing it.
Attachment #8542493 - Flags: feedback?(brsun)
Comment on attachment 8542493 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v01.patch

Review of attachment 8542493 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/omx/MediaCodecReader.cpp
@@ +398,3 @@
>    }
>  
> +  return AudioDataPromise::CreateAndReject(CANCELED, __func__);

Are you sure you want to be rejecting with CANCELED here and elsewhere? That will cause the MDSM to immediately request again.

::: dom/media/omx/MediaCodecReader.h
@@ +435,5 @@
>    VideoTrack mVideoTrack;
>    AudioTrack mAudioOffloadTrack; // only Track::mSource is valid
>  
> +  Mutex mAudioPromiseLock;
> +  Mutex mVideoPromiseLock;

Is there any reason you're using a Mutex here rather than a Monitor, which would let you take advantage of the assertions enabled by SetMonitor per my suggestion in comment 6?
(Assignee)

Comment 9

3 years ago
Comment on attachment 8542493 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v01.patch

Review of attachment 8542493 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/omx/MediaCodecReader.cpp
@@ +398,3 @@
>    }
>  
> +  return AudioDataPromise::CreateAndReject(CANCELED, __func__);

I think the reject with CANCELED is a dead code, since MDSM guarantee there is only one DecodeAudio/Video task runs on MDSM taskqueue. The code DispatchAudioTask() should always return true here. I'll add assertion later.

::: dom/media/omx/MediaCodecReader.h
@@ +435,5 @@
>    VideoTrack mVideoTrack;
>    AudioTrack mAudioOffloadTrack; // only Track::mSource is valid
>  
> +  Mutex mAudioPromiseLock;
> +  Mutex mVideoPromiseLock;

Thanks for your reminder, I'll change it.
(Assignee)

Comment 10

3 years ago
Created attachment 8544449 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v02.patch

Address comment 8.

Bruce:
How do you feel if I replace mDurationLock by mTrackMonitor because I add a monitor in my patch?
Attachment #8542493 - Attachment is obsolete: true
Attachment #8542493 - Flags: feedback?(brsun)
Attachment #8544449 - Flags: review?(cpearce)
Attachment #8544449 - Flags: review?(brsun)

Comment 11

3 years ago
Comment on attachment 8544449 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v02.patch

Review of attachment 8544449 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Feel free to replace mDurationLock with mTrackMonitor.
Attachment #8544449 - Flags: review?(brsun) → feedback+
Comment on attachment 8544449 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v02.patch

Review of attachment 8544449 [details] [diff] [review]:
-----------------------------------------------------------------

Please remove dead code and fix the other issues brsun and bholley pointed out before re-requesting review.

::: dom/media/omx/MediaCodecReader.cpp
@@ +398,4 @@
>    if (CheckAudioResources()) {
> +    MonitorAutoLock al(mAudioTrack.mTrackMonitor);
> +    nsRefPtr<AudioDataPromise> p = mAudioTrack.mAudioPromise.Ensure(__func__);
> +    if (DispatchAudioTask()) {

The state machine should never request a sample from the reader if there's already a request awaiting a response. So how can mAudioTrack.mTaskPending be anything other than false when you call DispatchAudioTask()?

Same thing for the video case too.

@@ +404,3 @@
>    }
> +  MOZ_ASSERT(false, "DispatchAudioTask failed!");
> +  return AudioDataPromise::CreateAndReject(CANCELED, __func__);

If you do this, the state machine will call back immediately with another request.
Attachment #8544449 - Flags: review?(cpearce) → review-
(Assignee)

Comment 13

3 years ago
Created attachment 8545712 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v03.patch

1. Remove the mTaskPending because state machine had already guarantee only one decoding task is running.
2. Remove CreateAndReject(CANCELED, __func__) at RequestAudio/RequestVideo function.
3. Remove mDurationLock by mTrackMonitor.
4. fix build error
Attachment #8544449 - Attachment is obsolete: true
Attachment #8545712 - Flags: review?(cpearce)
Comment on attachment 8545712 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v03.patch

Review of attachment 8545712 [details] [diff] [review]:
-----------------------------------------------------------------

This is OK, but I want answers to my questions before I r+.

::: dom/media/omx/MediaCodecReader.cpp
@@ +404,3 @@
>    int64_t threshold = sInvalidTimestampUs;
>    if (aSkipToNextKeyframe && IsValidTimestampUs(aTimeThreshold)) {
> +    mVideoTrack.mTaskQueue->AwaitIdle();

Why do you need to wait here at all? Why can't you put a task onto the task queue to dispatch with the threshold/skip?

@@ +411,3 @@
>    if (CheckVideoResources()) {
>      DispatchVideoTask(threshold);
>    }

Can you assert here that mVideoTrack.mVideoPromise.IsEmpty()? If it's not, the state machine has made an error in having multiple requests pending at the same time.

Same thing for the audio case.

@@ +437,5 @@
>      if (status == OK || status == ERROR_END_OF_STREAM) {
>        break;
>      } else if (status == -EAGAIN) {
>        if (TimeStamp::Now() > timeout) {
> +        // Don't let this loop run for too long. Return CANCELED by promise.

You have separate task queues for video and audio decoding. That means they run concurrently on different threads. So why do you still need to have a timeout here now? Why do you need to effectively cancel the promise here, as the state machine will just call us back again straight away.

Same thing for the audio case.

I'd like to understand why we can't just keep trying here until we get a sample?
(Assignee)

Comment 15

3 years ago
(In reply to Chris Pearce (:cpearce) from comment #14)
> Comment on attachment 8545712 [details] [diff] [review]
> bug-1114910-MediaCodecReader-crash.v03.patch
> 
> Review of attachment 8545712 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is OK, but I want answers to my questions before I r+.
> 
> ::: dom/media/omx/MediaCodecReader.cpp
> @@ +404,3 @@
> >    int64_t threshold = sInvalidTimestampUs;
> >    if (aSkipToNextKeyframe && IsValidTimestampUs(aTimeThreshold)) {
> > +    mVideoTrack.mTaskQueue->AwaitIdle();
> 
> Why do you need to wait here at all? Why can't you put a task onto the task
> queue to dispatch with the threshold/skip?
> 

Before this change here is mTaskQueue->Flush() function, that is because the mTaskQueue will dispatch to itself if timeout, so I want the flush out the task without aSkipToNextKeyframe by a new task with aSkipToNextKeyframe.

Then this change I remove the "dispatch to itself" when timeout, so I change the flush() to AwaitIdle(). But since the MDSM will keep only task is running, I should remove this line because there is no more task in my mTaskQueue no matter remove "dispatch to itself(keep trying)" or not.

So I will remove the |mVideoTrack.mTaskQueue->AwaitIdle();| line.

> 
> @@ +437,5 @@
> >      if (status == OK || status == ERROR_END_OF_STREAM) {
> >        break;
> >      } else if (status == -EAGAIN) {
> >        if (TimeStamp::Now() > timeout) {
> > +        // Don't let this loop run for too long. Return CANCELED by promise.
> 
> You have separate task queues for video and audio decoding. That means they
> run concurrently on different threads. So why do you still need to have a
> timeout here now? Why do you need to effectively cancel the promise here, as
> the state machine will just call us back again straight away.
> 
> Same thing for the audio case.
> 
> I'd like to understand why we can't just keep trying here until we get a
> sample?

If the decode task without aSkipToNextKeyframe, I think both "return canceled" and "keep trying" are the same. The two approaches will do the same thing except the first approach is triggered by MDSM.

Then, consider the decode task with aSkipToNextKeyframe and timeout. If we return CANCELED here, the MDSM will trigger a new task with aSkipToNextKeyframe so that I can update the aTimeThreshold.
But the worst case is that the HW decoder keeping timeout and MDSM keeping update the aTimeThreshold => there is no video frame output to the MDSM.

So I should keep "dispatch to itself(keep trying)" to prevent the worst case instead return CANCELED.
(Assignee)

Comment 16

3 years ago
Created attachment 8546460 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v04.patch

Address comment 14, comment 15.
Attachment #8545712 - Attachment is obsolete: true
Attachment #8545712 - Flags: review?(cpearce)
Attachment #8546460 - Flags: review?(cpearce)
(In reply to Benjamin Chen [:bechen] from comment #15)
> (In reply to Chris Pearce (:cpearce) from comment #14)
> > 
> > @@ +437,5 @@
> > >      if (status == OK || status == ERROR_END_OF_STREAM) {
> > >        break;
> > >      } else if (status == -EAGAIN) {
> > >        if (TimeStamp::Now() > timeout) {
> > > +        // Don't let this loop run for too long. Return CANCELED by promise.
> > 
> > You have separate task queues for video and audio decoding. That means they
> > run concurrently on different threads. So why do you still need to have a
> > timeout here now? Why do you need to effectively cancel the promise here, as
> > the state machine will just call us back again straight away.
> > 
> > Same thing for the audio case.
> > 
> > I'd like to understand why we can't just keep trying here until we get a
> > sample?
> 
> If the decode task without aSkipToNextKeyframe, I think both "return
> canceled" and "keep trying" are the same. The two approaches will do the
> same thing except the first approach is triggered by MDSM.
> 
> Then, consider the decode task with aSkipToNextKeyframe and timeout. If we
> return CANCELED here, the MDSM will trigger a new task with
> aSkipToNextKeyframe so that I can update the aTimeThreshold.
> But the worst case is that the HW decoder keeping timeout and MDSM keeping
> update the aTimeThreshold => there is no video frame output to the MDSM.
> 
> So I should keep "dispatch to itself(keep trying)" to prevent the worst case
> instead return CANCELED.

Yes, that's correct for skip-to-next-keyframe.

I noticed that skip-to-next-keyframe is not implemented correctly; it *decodes* the video to the next keyframe, rather than seeking to the next keyframe. It should seek to the next keyframe. I filed bug 1120243 to track this. You should fix that. Not implementing skip-to-next-keyframe correctly will cause performance problems.

Now, going back to what I was saying about re-dispatching a task to the same task queue to do video decode... When you dispatch a task to call the same function again in the same task queue, you're effectively just calling the same function in a loop. (Except that you allow any other tasks to run before the task you dispatched get a chance to run, which means you're assumptions may no longer be valid, so you're more likely to get bugs). So dispatching a task to continue decoding is pretty much the same thing as just calling your function in a loop, except that it adds overhead.

The only reason when dispatching a task to the same task queue instead of blocking waiting for output would make sense would be if you were sharing the same task queue with a different Track's decode (like the audio decode), but you're not doing that here. Each Track has its own task queue.

So I think you should be blocking (not timing out) instead of dispatching a task to call the same function again.

Why do you need to time out?

You also need to cancel any pending promises in MediaCodecReader::ResetDecode(). I filed bug 1120247 to track this.
Comment on attachment 8546460 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v04.patch

Review of attachment 8546460 [details] [diff] [review]:
-----------------------------------------------------------------

I'll r+ this as is, but you should file a follow up to remove the timeouts in the decode tasks. They're completely unnecessary overhead.
Attachment #8546460 - Flags: review?(cpearce) → review+
Also, if you're unsure about what you *should* be doing, check out MP4Reader. It implements most of what you're trying to do here already, and it's getting a lot of attention, so it's more likely to be correct. ;)
(Assignee)

Updated

3 years ago
Blocks: 1120823
(Assignee)

Comment 21

3 years ago
Created attachment 8548022 [details] [diff] [review]
bug-1114910-MediaCodecReader-crash.v04.patch

r=cpearce.
Rebased version.
Attachment #8541069 - Attachment is obsolete: true
Attachment #8546460 - Attachment is obsolete: true
Attachment #8548022 - Flags: review+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f50cce157844
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
(Assignee)

Updated

3 years ago
Blocks: 1138825
You need to log in before you can comment on or make changes to this bug.