Closed Bug 1009420 Opened 6 years ago Closed 3 years ago

[B2G] ACodec decode thread block on waiting consumed graphic buffer from GonkNativeWindow.

Categories

(Core :: WebRTC: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED WONTFIX
Blocking Flags:
backlog parking-lot

People

(Reporter: jhlin, Unassigned)

References

Details

Attachments

(2 files)

ACodec always waits for spare buffer from native window before continue processing next frame if renderOutputBufferAndRelease() is called [1]. Using GonkNativeWindow with it will make ACodec block (~50ms on flame) frequently because it doesn't have enough buffers for ACodec to run smoothly.

Bug 1009410 provides a method to get graphic buffer from MediaCodec so we can stop using renderOutputBufferAndRelease() to avoid ACodec blocking.

[1] http://androidxref.com/4.4_r1/xref/frameworks/av/media/libstagefright/ACodec.cpp#3436
The patch basically works. But it seems to have the following problems that need to address/make clear.
- [1] Determing a number of spare buffers. The patch set to 4.
- [2] HandleNewFrame() need to be updated/renamed.
       HandleNewFrame() is just created from OnNewFrame().
- [3] Ensure WebrtcOMXDecoder is alive until all RecycleCallback() is called.
- [4] All gralloc buffers might need to be returned to MediaCodec, before MediaCodec free.
      + This was a very storong requirement of OMXCodec.
        MediaCodec seems also same.
- [5] Make clear how to handle release fence.
      In OMXDecoder case, the fence can be sent to GonkNativeWindow.
- [6] WebrtcOMXDecoder::ReleaseOutputBuffer() might need to be called on a thread other than RecycleCallback() calling thread, if ReleaseOutputBuffer() takes longer time.
- [7] Make clear a blocking point by applying the patch.
The patch include Bug 1009410's patch attachment 8421562 [details] [diff] [review] just for simplicity.
The patch remove the spare buffers' return to ANativeWindow.
> - [5] Make clear how to handle release fence.
>       In OMXDecoder case, the fence can be sent to GonkNativeWindow.

If the fence can not be sent to GonkNativeWindow, WebrtcOMXDecoder needs to wait fence complete before the buffer release to MediaCodec.
jhlin, can you take and proceed the paches?
Flags: needinfo?(jolin)
(In reply to Sotaro Ikeda [:sotaro] from comment #3)
> > - [5] Make clear how to handle release fence.
> >       In OMXDecoder case, the fence can be sent to GonkNativeWindow.
> 
> If the fence can not be sent to GonkNativeWindow, WebrtcOMXDecoder needs to
> wait fence complete before the buffer release to MediaCodec.
 
 Is this what [1] are for? To wait fence complete by calling cancelBuffer() on native GonkNativeWindow?

 [1] http://dxr.mozilla.org/mozilla-central/source/content/media/omx/OmxDecoder.cpp?from=OmxDecoder.cpp&case=true#1067
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
> jhlin, can you take and proceed the paches?

 I'm away for the next 3 days. Will be back next Monday.
Flags: needinfo?(jolin)
(In reply to John Lin[:jolin][:jhlin] from comment #5)
> (In reply to Sotaro Ikeda [:sotaro] from comment #3)
> > > - [5] Make clear how to handle release fence.
> > >       In OMXDecoder case, the fence can be sent to GonkNativeWindow.
> > 
> > If the fence can not be sent to GonkNativeWindow, WebrtcOMXDecoder needs to
> > wait fence complete before the buffer release to MediaCodec.
>  
>  Is this what [1] are for? To wait fence complete by calling cancelBuffer()
> on native GonkNativeWindow?
> 
>  [1]
> http://dxr.mozilla.org/mozilla-central/source/content/media/omx/OmxDecoder.
> cpp?from=OmxDecoder.cpp&case=true#1067

If we want to wait fence complete, we need to call Fence::waitForever(). cancelBuffer() just put off to call Fence::waitForever(), it is called somewhere after that.

cancelBuffer() is a kind of tricky way to deliver fence to GonkNativeWindow. It could be done because of OMXCodec's implementation detail accept it. It is not clear MediaCodec could do similar thing. I am going to check if we can do same thing also at MediaCodec/ACodec.

We can put off Fence handling for the time being. Other parts implementationa are more important.
(In reply to Sotaro Ikeda [:sotaro] from comment #2)
> Created attachment 8446689 [details] [diff] [review]
> WIP patch part 2 - Change to MediaCodec
> 
> The patch include Bug 1009410's patch attachment 8421562 [details] [diff] [review]
> [review] just for simplicity.
> The patch remove the spare buffers' return to ANativeWindow.

Why do you need to do this?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Diego Wilson [:diego] from comment #8)
> (In reply to Sotaro Ikeda [:sotaro] from comment #2)
> > Created attachment 8446689 [details] [diff] [review]
> > WIP patch part 2 - Change to MediaCodec
> > 
> > The patch include Bug 1009410's patch attachment 8421562 [details] [diff] [review]
> > [review] just for simplicity.
> > The patch remove the spare buffers' return to ANativeWindow.
> 
> Why do you need to do this?

The necessity is same to OMXCodec case. By attachment 8446687 [details] [diff] [review], GonkNativeWindow is used just as a buffer source, not a rendering target. In this usage, release buffers to GonkNativeWindow mean that we do not use the buffers.
Flags: needinfo?(sotaro.ikeda.g)
FWIW we have been testing with 256MB and increasing the buffer count for WebRTC works fine.

Now as far as media playback goes, is this a regression or has it always been a problem?
John, 
Could you use Sotaro's patch to check if there is any blocking problem or not? 
Thanks!
attachment 8446689 [details] [diff] [review] do as same thing as OmxDecoder do to OMXCodec. But it is not ideal. It add a very strict restriction to the codec shutdown. If the restriction is not met, the codec calls crash because of incorrect buffer state like Bug 1035755. This kind of crashes are sometimes very difficult to fix. But in the past it is the only choice.
On WebRTC's usage, for the time being, we prevent the blocking problem by increasing the spare buffers to 10. On WebRTC, gecko could choose the video size max limit. Therefore, even when the number of gralloc buffers is increased, we could limit the total buffer usage by limiting the video size.

But on video playback, we could not control the video size limit. The product have to playback as large as possible videos. Increase the number of buffer causes the increase of gralloc buffers usage.
To fix MediaCodec's "decode thread block" problem, there are following choices.

- [1] Do the same thing as OmxDecoder does.
  + (pro) Could address the blocking or buffer drop between got decoded buffer and compositing.
          Blocking timing is deferred until buffer return from compositor.
          Therefore when the blocking function is called,
          the next buffer(the returned buffer) is already exist.
          The block does not happen there.
  + (con) 
      +Need to expose graphic buffer from MediaCodec.
      +codec shutdown could cause the crash if buffer state is incorrect.
       Sometimes it is very difficult to analyze the problem.

- [2] Increase number of spare buffers. It is done by Bug 1031500.
  + (pro) 
      + Modification to MediaCodec or GonkNativeWindow is not necessary.
      + Could prevent the blocking between got decoded buffer and compositing,
               if gecko side does not hold more buffers.
      + Could prevent the codec shutdown crash problem in [1].
  + (con)
      + Increase gralloc buffers usage. It is a big problem on HD video playback.
      + If more buffers are held by gecko, blocking or "failed to get buffer" happens there.

- [3] Defer calling mCodec->dequeueBufferFromNativeWindow() until buffer release. 

  + (pro)
      + Could address the blocking or buffer drop between got decoded buffer and compositing.
        Blocking timing is deferred until buffer return from compositor.
      + Could prevent the codec shutdown crash problem in [1].
  + (con) 
      + Need to Modify ACodec and GonkNativeWindow.
From the characteristic, [3] seems best solution within three choices. I am not sure [3] is actually possible, until implementing and evaluating it.

By the wasy on ICS, [1] was/is the only choice.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #14)
> To fix MediaCodec's "decode thread block" problem, there are following
> choices.
> 
> - [1] Do the same thing as OmxDecoder does.
>   + (pro) Could address the blocking or buffer drop between got decoded
> buffer and compositing.
>           Blocking timing is deferred until buffer return from compositor.
>           Therefore when the blocking function is called,
>           the next buffer(the returned buffer) is already exist.
>           The block does not happen there.
>   + (con) 
>       +Need to expose graphic buffer from MediaCodec.
>       +codec shutdown could cause the crash if buffer state is incorrect.
>        Sometimes it is very difficult to analyze the problem.
I don't quite understand for the shutdown problem. Could you explain it more? Thanks!
What I understand is the buffer state should not be somehow incorrect since we don't modify it at any place so it should work well as what it usually does. We just expose graphicBuffer from ACodec and all the state change codes are not modified and we release that graphic buffer as usual as ABuffer does. 
In other words, the release follow is the same as user uses ABuffer in mOutputBuffers.
I would like to vote for [1] since the changes is less and it might make design more straightforward and simple :)
(In reply to Blake Wu [:bwu][:blakewu] from comment #16)
> I don't quite understand for the shutdown problem. Could you explain it
> more? Thanks!

During codec is shutdown, if there is a buffer that is not returned to the codec, the codec kill an application's process by the sanity check. It is sometimes very very difficult to analyze and reproduce.

Right now, Codec's usage is limited. Therefore, I fixed all problems related to this until now. But we are going to add more use cases of using codecs. In near future, we might fall into uncontrollable state about this problem.
(In reply to Blake Wu [:bwu][:blakewu] from comment #17)
> I would like to vote for [1] since the changes is less and it might make
> design more straightforward and simple :)

I totally disagree about it. Develop cost is cheap but debugging cost will become very expensive.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #19)
> (In reply to Blake Wu [:bwu][:blakewu] from comment #17)
> > I would like to vote for [1] since the changes is less and it might make
> > design more straightforward and simple :)
> 
> I totally disagree about it. Develop cost is cheap but debugging cost will
> become very expensive.

Until now, I fixed almost all problems about this. Therefore other people seems not care about the cost.
If we choose [1], we will face the crash problem forever. It is my concern about [1].
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #18)
> (In reply to Blake Wu [:bwu][:blakewu] from comment #16)
> > I don't quite understand for the shutdown problem. Could you explain it
> > more? Thanks!
> 
> During codec is shutdown, if there is a buffer that is not returned to the
> codec, the codec kill an application's process by the sanity check. It is
> sometimes very very difficult to analyze and reproduce.
> 
> Right now, Codec's usage is limited. Therefore, I fixed all problems related
> to this until now. But we are going to add more use cases of using codecs.
> In near future, we might fall into uncontrollable state about this problem.

If there is one object that is holding TextureClient, it is not returned to the codec. And the codec detect it and trigger the crash of an application.
The problem is, it is very difficult to narrow down the the object that holding a reference pointer to TextureClient. We sometimes can not even receive actual use case of the crash from partners.
If the crash happens at a user's products, we can not receive logcat nor usecase but only crash callstack. And the crash log say nothing about who is holding the TextureClient pointer.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #20)
> (In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #19)
> > (In reply to Blake Wu [:bwu][:blakewu] from comment #17)
> > > I would like to vote for [1] since the changes is less and it might make
> > > design more straightforward and simple :)
> > 
> > I totally disagree about it. Develop cost is cheap but debugging cost will
> > become very expensive.
> 
> Until now, I fixed almost all problems about this. Therefore other people
> seems not care about the cost.

Got your point. Thank you very much to share your concerns. 
What I think is the more straightforward and simple design should not introduce more side effects and should be easier to debug if there are problems caused by it. 
As I mentioned in comment 16, [1] should not introduce another use case since we still follow the procedure of using MediaCodec and what we do is get exposed graphic buffer when we dequeue buffer and release it when we are done with it as normal use case. If there are some problems for this use case, probably it is due to the different render pipeline we use in Gecko instead of Android's. That is what I think. 

For [3], I thought the changes might be bigger than [1], but I am not sure. Please correct me if i am wrong. Please also help further evaluate this approach since we do need a good solution to solve the problem of grapicBuffer. There are too many bugs blocked by this problem :(. Thanks again.
Responding to bug 941302 comment 95 to continue discussing, may I know when will you finish evaluating and implementing [3]? If it may take longer and is considered as a longer-term solution, maybe [1] could be  considered as a near-term solution to move some bugs on first. How do you think?
(In reply to Blake Wu [:bwu][:blakewu] from comment #25)
> Responding to bug 941302 comment 95 to continue discussing, may I know when
> will you finish evaluating and implementing [3]? If it may take longer and
> is considered as a longer-term solution, maybe [1] could be  considered as a
> near-term solution to move some bugs on first. How do you think?

Do I need to do it? I did not say I do it's evaluation. I am not working for media framework now. I just do comment when I have a time. If there is no one doing it I am going to do it.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #26)
> (In reply to Blake Wu [:bwu][:blakewu] from comment #25)
> > Responding to bug 941302 comment 95 to continue discussing, may I know when
> > will you finish evaluating and implementing [3]? If it may take longer and
> > is considered as a longer-term solution, maybe [1] could be  considered as a
> > near-term solution to move some bugs on first. How do you think?
> 
> Do I need to do it? I did not say I do it's evaluation. I am not working for
> media framework now. I just do comment when I have a time. If there is no
> one doing it I am going to do it.
Oh, I thought you are planning to do that since you have some analysis first as comment 14. I am not familiar with GonkNativeWindow and currently no one is helping on [3] either. Could you please help on it? If yes, may I know your opinion for comment 25? There are a couple of bugs (like bug 941302 and bug 1033903) blocked by it. If you are not available, could you recommend someone who I can ask for the help? Thanks!
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #14)
> - [3] Defer calling mCodec->dequeueBufferFromNativeWindow() until buffer
> release. 

Hi Sotaro,
I am also interested in this idea. If we can have a fine solution from architecture-wise point of view by doing this, that would be good. Would you mind sharing more information about how to apply this solution, and how our original problems (ex. get one graphic buffer from MediaCodec for rendering) could be addressed by this method? It would be great to hear the guidance from you.
Okey, I am going to work to investigate about it.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #14)
> To fix MediaCodec's "decode thread block" problem, there are following
> choices.
> 
> - [1] Do the same thing as OmxDecoder does.
>   + (pro) Could address the blocking or buffer drop between got decoded
> buffer and compositing.
>           Blocking timing is deferred until buffer return from compositor.
>           Therefore when the blocking function is called,
>           the next buffer(the returned buffer) is already exist.
>           The block does not happen there.
>   + (con) 
>       +Need to expose graphic buffer from MediaCodec.
>       +codec shutdown could cause the crash if buffer state is incorrect.
>        Sometimes it is very difficult to analyze the problem.

I started to think about the candidates more concretely. I recognized that the above crash problem in [1] can be mitigated by a way like Bug 1042308. If the problem is addressed [1] becomes best way for this bug.
Remove WebRTC tag from summary since this bug is affects media playback bugs too. Also reset assignee. :)
BTW... Sotaro, do you think enabling async GonkNativeWindow (supported by your patch in bug 1036539) would help?
Assignee: jolin → nobody
Summary: [B2G][WebRTC] ACodec decode thread block on waiting consumed graphic buffer from GonkNativeWindow. → [B2G] ACodec decode thread block on waiting consumed graphic buffer from GonkNativeWindow.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #29)
> Okey, I am going to work to investigate about it.
Thanks!
(In reply to John Lin[:jolin][:jhlin] from comment #31)
> Remove WebRTC tag from summary since this bug is affects media playback bugs
> too. Also reset assignee. :)
> BTW... Sotaro, do you think enabling async GonkNativeWindow (supported by
> your patch in bug 1036539) would help?

If we use [1], we do not need async GonkNativeWindow at all. async GonkNativeWindow is necessary only when we get new video frame via GonkNativeWindow::getCurrentBuffer().

And when we choose [1] for video playback, WebRTC also have to be changed to [1]. Because, MediaCodec with [1] can not support GonkNativeWindow::getCurrentBuffer() way of using. [1]'s temporal implementation is attachment 8446689 [details] [diff] [review]. It changes how spare buffers are stored. By default MediaCodec, spare buffers are returned to ANativeWindow. But attachment 8446689 [details] [diff] [review] changed not to return them to ANativeWindow. To do the GonkNativeWindow::getCurrentBuffer() way, the spare buffer needs to be in ANativeWindow.
Among [1][2][3] in comment 14, we can not choose [2] for video playback, although [2] is currently used by WebRTC.

In WebRTC, we can choose video size. But on video playback, we can not choose supported video size. Phones need to support as large as possible videos. Therefore, [2] requests more gralloc buffers to the phone. It is not acceptable.

Between [1] and [3], I thought [3] is better than [1] at first, because of [1]'s crash problem during codec shutdown. But it could be mitigated by a way like Bug 1042308. If [1]'s biggest problem is fixed, [1] becomes most promising solution, because [3] needs more modification to MediaCodec.

If we choose [1] for video playback, WebRTC also need to move to [1]. It is already commented in Comment 33.
(In reply to Blake Wu [:bwu][:blakewu] from comment #32)
> (In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #29)
> > Okey, I am going to work to investigate about it.
> Thanks!

From Comment 33 and Comment 34, it seems to become clear that we could choose [1]. If we could choose [1], I do not think there are remaining things that I need to do more in this bug.
By the way, Bug 1042308 is not ideal way to fix the codec shutdown problem. It is just a workaround for the problem. But we could live with it for the time being. I am going to work more proper solution at Bug 988941.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #36)
> By the way, Bug 1042308 is not ideal way to fix the codec shutdown problem.
> It is just a workaround for the problem. But we could live with it for the
> time being. I am going to work more proper solution at Bug 988941.

About bug 1042308, I mentioned only about how recycling is used. Current usage is not correct enough.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #33)
> If we use [1], we do not need async GonkNativeWindow at all. async
> GonkNativeWindow is necessary only when we get new video frame via
> GonkNativeWindow::getCurrentBuffer().

Hmm... I thought (bug 1036539 comment 3) that ACodec thread won't block at all when we use async GonkNativeWindow, and none of [1], [2], [3] is needed. Is there something I missed?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to John Lin[:jolin][:jhlin] from comment #38)
> 
> Hmm... I thought (bug 1036539 comment 3) that ACodec thread won't block at
> all when we use async GonkNativeWindow, and none of [1], [2], [3] is needed.
> Is there something I missed?

In my understanding, the workaround in bug 1031500 is categorized to [2]. It increase the spare buffer to from 2 to 10. If we choose [2], async GonkNativeWindow becomes necessary. 

But [2] can not be used for video playback actually, because it need more buffers and video playback does not permit frame drop during decoding. [2] have a possibility of frame drop.

From WebRTC point of view [1] and [2] can be chosen. [2] is bug 1031500 + async GonkNativeWindow. In webrtc, frame drop is acceptable, therefore [2] is also ok.

My comments in the past was made by the assumption that video playback and WebRTC use the same way. Because how MediaCodec works are incompatible. But attachment 8462264 [details] [diff] [review] in Bug 1009410 permit two different modes about how to handle spare buffers. Therefore, WebRTC can choose [1] or [2] now. [1] seems better from the usable number of buffers. But it is not clear until we compare [1] and [2] in actual devices.
Flags: needinfo?(sotaro.ikeda.g)
I thinks [2] is necessary even when async GonkNativeWindow is used for WebRTC. From camera camera preview experience, even when there is no buffers between GonkNativeWindow and Compositor, 3-4 buffers are always in gecko side.
backlog: --- → parking-lot
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.