OMX_CommandFlush does not return all input buffers before command complete.

RESOLVED WONTFIX

Status

--
critical
RESOLVED WONTFIX
4 years ago
6 months ago

People

(Reporter: sotaro, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [POVB], crash signature)

Attachments

(3 attachments)

(Reporter)

Description

4 years ago
+++ This bug was initially created as a clone of Bug #1099356 +++

On flame device, sometimes OMX_CommandFlush does not return all input buffers before command complete. This causes an application's crash with the following log.

> CHECK_EQ(countBuffersWeOwn(mPortBuffers[portIndex]),mPortBuffers[portIndex].size()) failed: 3 vs. 4
(Reporter)

Comment 1

4 years ago
When OMX_CommandFlush is called, omx_vdec::execute_input_flush() handle input port's buffer flush. But it does not return all buffer's to OMX component's client. It handles only buffers that is in m_etb_q. It does not handle buffers that are already handled by omx_vdec::empty_this_buffer_proxy(), but does not yet handled by omx_vdec::empty_buffer_done().

https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/media/tree/mm-video-v4l2/vidc/vdec/src/omx_vdec_msm8974.cpp?h=b2g_kk_3.5#n2460
(Reporter)

Comment 2

4 years ago
Created attachment 8525604 [details] [diff] [review]
log patch - Add log to caf OMX il component
(Reporter)

Comment 3

4 years ago
Created attachment 8525605 [details]
logcat log when the problem happened

Get logcat with attachment 8525604 [details] [diff] [review] and enabling OMXCodec's log.
Test environment also applied a patch of Bug 1101742 to suppress unrelated crash.
(Reporter)

Updated

4 years ago
Attachment #8525605 - Attachment is patch: false
(Reporter)

Comment 4

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #3)
> Created attachment 8525605 [details]

From the log, it is clear that omx_vdec::empty_buffer_done() is called after input port's OMX_CommandFlush completion.
(Reporter)

Comment 5

4 years ago
This bug is POVB problem.
Whiteboard: [POVB]
(Reporter)

Comment 6

4 years ago
[Blocking Requested - why for this release]: This bug affect to all kitkat's  msm8610 hardware. It also include b2g v2.1. When it happens, application process will crash.
blocking-b2g: --- → 2.0?
(Reporter)

Comment 7

4 years ago
caf's OMX IL's OMX_CommandFlush seems to break not only OMXCodec implementation, but also OpenMAX IL's specification. Therefore, caf's implementation need to be addressed, it it is correct.
------------------------------------------------------------
3.2.2.4 OMX_CommandFlush
This IL client calls this command to flush one or more component ports. nParam specifies the index of the port to flush. If the value of nParam is OMX_ALL, the component shall flush all ports.
When the IL client flushes a non-tunnelling port, that port shall return all buffers it is holding to the IL client using EmptyBufferDone and FillBufferDone (appropriate for an input port or an output port, respectively) to return the buffers. When tunnelling, the flushed input port uses EmptyThisBuffer or FillThisBuffer (appropriate for an input port or an output port, respectively) to return the buffers to the output port.
For each port that the component successfully flushes, the component shall send an OMX_EventCmdComplete event, indicating OMX_CommandFlush for nData1 and the individual port index for nData2, even if the flush resulted from using a value of OMX_ALL for nParam. If a flush fails, the component shall notify the IL client of the error via an OMX_EventError event.
(Reporter)

Comment 8

4 years ago
Diego, how do you think about this bug? Can caf fix this bug?
Flags: needinfo?(dwilson)

Updated

4 years ago
blocking-b2g: 2.0? → 2.0+
How do I reproduce this issue on device?
Flags: needinfo?(dwilson) → needinfo?(sotaro.ikeda.g)
(In reply to Diego Wilson [:diego] from comment #9)
> How do I reproduce this issue on device?

I've reproduced with the following:

1. Open camera app. Record a large number of short videos (5-30 seconds). Depending on how lucky you get, this can be anywhere from 10-30 videos. Close camera app.
2. Open video app. While it is generating the thumbnails, it will eventually encounter this issue.
3. If video app did not crash, reflash or clear cache/userdata (should keep the videos, but requires regenerating the thumbnails), and go back to step 1 or 2.

Bug 1100459 contains a log which shows the issue (with verbose logging for OMXCodec in libstagefright).
(Reporter)

Comment 11

4 years ago
(In reply to Diego Wilson [:diego] from comment #9)
> How do I reproduce this issue on device?

I reproduced the crash by Bug 1099356 Comment 12. In my case, without doing pause. 

n.b. with current gecko, a crash because of Bug 1101742 could also happen. It is checking-in to m-c.
Flags: needinfo?(sotaro.ikeda.g)
(Reporter)

Comment 12

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> (In reply to Diego Wilson [:diego] from comment #9)
> > How do I reproduce this issue on device?
> 
> I reproduced the crash by Bug 1099356 Comment 12. In my case, without doing
> pause. 
> 
> n.b. with current gecko, a crash because of Bug 1101742 could also happen.
> It is checking-in to m-c.

It could take long time(more than 20 minutes) until the crash happens by this STR when log out is enabled.
(Reporter)

Comment 13

4 years ago
diego, is there a progress?
Flags: needinfo?(dwilson)
Duplicate of this bug: 1100459
(Reporter)

Updated

4 years ago
Duplicate of this bug: 1106506
(Reporter)

Updated

4 years ago
Duplicate of this bug: 1110055
Duplicate of this bug: 1113900

Updated

4 years ago
Component: Video/Audio → Vendcom
Product: Core → Firefox OS

Comment 18

4 years ago
Hi! Diego,

Any update from your side? Thanks

--
Keven
(Reporter)

Comment 19

4 years ago
Hmm, no update nor response from codeaurora... Then, I am going to investigate possible fix.
Assignee: nobody → sotaro.ikeda.g
(Reporter)

Comment 20

4 years ago
I understand how the problem happens.
- OMXCodec flush input and output port individually. 2 flushes in total.
- But driver/hardware does not support flushing of individual ports.
- For each flush, driver/hardware flush both input and output port.
- V4L2_EVENT_MSM_VIDC_FLUSH_DONE is "flush done" message from driver.
  If it is received by omx hal, it tried OMX_CommandFlush command complete both input and output port.
- If input port flush and output port flush is sent very quickly.
   Both OMX_CommandFlush commands completes are sent under one V4L2_EVENT_MSM_VIDC_FLUSH_DONE message.
   Next V4L2_EVENT_MSM_VIDC_FLUSH_DONE message is just redundant. It do nothing.
- Before the above redundant V4L2_EVENT_MSM_VIDC_FLUSH_DONE message is received from driver,
  if consecutive flush is sent from OMXCodec, the redundant message is incorrectly recognized
  as the consecutive flush's complete. It causes early OMX_CommandFlush commands completes.
  The OMX_CommandFlush commands completes is sent before all input buffers return.
  Then OMXCodec's buffers state becomes inconsistent.
(Reporter)

Comment 21

4 years ago
As in Comment 20, "previous flush's complete message" causes next flush command complete too early. It causes input buffers or output buffers not returned to OMXCodec when flush is complete at OMXCodec. OMXCodec detect buffer's status is inconsistent. Then OMXCodec abort because of the error status.
(Reporter)

Comment 22

4 years ago
OMXCodec::read()
->OMXCodec::flushPortAsync() portIndex 0 // input port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
-OMXCodec::flushPortAsync()_E portIndex 1 // output port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
// on async_message_thread
async_message_thread()
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
 ->// Create and handle VDEC_MSG_RESP_FLUSH_INPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH to message thread
 ->// Create and handle VDEC_MSG_RESP_FLUSH_OUTPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH to message thread
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH:
  ->pThis->execute_input_flush();
  ->pThis->m_cb.EventHandler() // input port flush complete
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH:
  ->pThis->execute_output_flush()// out port flush complete
->OMXCodec::emptyBuffer() // send new input buffer to omx
           ************** input buffer that will cause state inconsistency
 ->omx_vdec::empty_this_buffer()
  ->omx_vdec::empty_this_buffer_proxy()

OMXCodec::read() // next read.
// on async_message_thread
async_message_thread()
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
      ************************ This is previous flush's done message
 ->// Create and handle VDEC_MSG_RESP_FLUSH_INPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH to message thread
 ->// Create and handle VDEC_MSG_RESP_FLUSH_OUTPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH to message thread
->OMXCodec::flushPortAsync() portIndex 0 // input port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
-OMXCodec::flushPortAsync()_E portIndex 1 // output port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH:
  ->pThis->execute_input_flush();
  ->pThis->m_cb.EventHandler() // input port flush complete
       ***************************** early flush complete
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH:
  ->pThis->execute_output_flush()// out port flush complete
       ***************************** early flush complete
//on async_message_thread
->// reveice VDEC_MSG_RESP_INPUT_BUFFER_DONE *************** this input buffer return is too late
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
(Reporter)

Comment 23

4 years ago
omx_vdec's implementation does not handle additional port flush during previous port flush.
(Reporter)

Comment 24

4 years ago
Created attachment 8557434 [details] [diff] [review]
patch - prevent early port flush complete
(Reporter)

Comment 25

4 years ago
By applying attachment 8557434 [details] [diff] [review] to master flame, the problem was fixed.
(Reporter)

Updated

4 years ago
Flags: needinfo?(dwilson)
(Reporter)

Comment 26

4 years ago
Diego, can you give a comment to attachment 8557434 [details] [diff] [review]?
Flags: needinfo?(dwilson)
(Reporter)

Updated

4 years ago
Attachment #8557434 - Flags: feedback?(dwilson)
(Reporter)

Comment 27

4 years ago
m1, can we have a response from diego?
Flags: needinfo?(mvines)
This doesn't seem relevant to v2.2 efforts at the moment
Flags: needinfo?(mvines)
Actually in 2.2, the dependent bug 1099356 reproduces with about 50% frequency (in the video app), since the seek operation change implemented in 2.2.  I think we include this bug within the scope of 2.2 if it fixes bug 1099356, but will wait for feedback from Bhavana in that bug.
I searched the crash database here and we have observed this crash occur on v2.0, v2.1 and v2.2.  However only 8x10 and 8x26.  If/when this is observed on active chipsets for v2.2 then this becomes very interesting for us to look at.

Comment 31

4 years ago
triage:
This is edge case and we will revisit 2.2 later per comment 30.
blocking-b2g: 2.0+ → ---
(In reply to Michael Vines [:m1] [:evilmachines] from comment #30)
> I searched the crash database here and we have observed this crash occur on
> v2.0, v2.1 and v2.2.  However only 8x10 and 8x26.  If/when this is observed
> on active chipsets for v2.2 then this becomes very interesting for us to
> look at.

mvines/Inder, this seems to be frequently reproducible in 2.1 due to the new seek mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1138671#c5). As you are aware we do not have the chipsets for 2.2 and 8x10 and 8x26 are are reference. Could you get QA on your end as the STR in https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter this 50% of the time.
Flags: needinfo?(mvines)
Flags: needinfo?(ikumar)
(In reply to bhavana bajaj [:bajaj] from comment #32)
> (In reply to Michael Vines [:m1] [:evilmachines] from comment #30)
> > I searched the crash database here and we have observed this crash occur on
> > v2.0, v2.1 and v2.2.  However only 8x10 and 8x26.  If/when this is observed
> > on active chipsets for v2.2 then this becomes very interesting for us to
> > look at.
> 
> mvines/Inder, this seems to be frequently reproducible in 2.1 due to the new
sry, meant 2.2 here!
> seek mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1138671#c5). As
> you are aware we do not have the chipsets for 2.2 and 8x10 and 8x26 are are
> reference. Could you get QA on your end as the STR in
> https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter
> this 50% of the time.
Flags: needinfo?(mvines)
Flags: needinfo?(mvines)

Updated

4 years ago
Flags: needinfo?(dwilson)

Updated

4 years ago
Attachment #8557434 - Flags: feedback?(dwilson)
(Reporter)

Comment 34

4 years ago
There is nothing I can do more.
Assignee: sotaro.ikeda.g → nobody
(In reply to bhavana bajaj [:bajaj] from comment #33)
> > Could you get QA on your end as the STR in
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter
> > this 50% of the time.

I'm sad to report that I've been unable to reproduce here on v2.2 devices after many attempts.
Flags: needinfo?(mvines)
Flags: needinfo?(ikumar)
Duplicate of this bug: 1099356
Crash Signature: [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ]

Updated

3 years ago
Crash Signature: [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ] → [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ] [@ mozalloc_abort | __android_log_assert | android::OMXCodec::onCmdComplete ]
(Reporter)

Updated

3 years ago
See Also: → bug 1231257

Comment 37

6 months ago
Firefox OS is not being worked on
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.