1101826 - OMX_CommandFlush does not return all input buffers before command complete.

Reporter

Description

•

10 years ago

+++ This bug was initially created as a clone of Bug #1099356 +++

On flame device, sometimes OMX_CommandFlush does not return all input buffers before command complete. This causes an application's crash with the following log.

> CHECK_EQ(countBuffersWeOwn(mPortBuffers[portIndex]),mPortBuffers[portIndex].size()) failed: 3 vs. 4

Sotaro Ikeda [:sotaro]

Reporter

Comment 1

•

10 years ago

When OMX_CommandFlush is called, omx_vdec::execute_input_flush() handle input port's buffer flush. But it does not return all buffer's to OMX component's client. It handles only buffers that is in m_etb_q. It does not handle buffers that are already handled by omx_vdec::empty_this_buffer_proxy(), but does not yet handled by omx_vdec::empty_buffer_done().

https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/media/tree/mm-video-v4l2/vidc/vdec/src/omx_vdec_msm8974.cpp?h=b2g_kk_3.5#n2460

Sotaro Ikeda [:sotaro]

Reporter

Comment 2

•

10 years ago

Attached patch log patch - Add log to caf OMX il component — Details — Splinter Review

Sotaro Ikeda [:sotaro]

Reporter

Comment 3

•

10 years ago

Attached file logcat log when the problem happened — Details

Get logcat with attachment 8525604 [details] [diff] [review] and enabling OMXCodec's log.
Test environment also applied a patch of Bug 1101742 to suppress unrelated crash.

Sotaro Ikeda [:sotaro]

Reporter

Updated

•

10 years ago

Attachment #8525605 - Attachment is patch: false

Sotaro Ikeda [:sotaro]

Reporter

Comment 4

•

10 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #3)
> Created attachment 8525605 [details]

From the log, it is clear that omx_vdec::empty_buffer_done() is called after input port's OMX_CommandFlush completion.

Sotaro Ikeda [:sotaro]

Reporter

Comment 5

•

10 years ago

This bug is POVB problem.

Whiteboard: [POVB]

Sotaro Ikeda [:sotaro]

Reporter

Comment 6

•

10 years ago

[Blocking Requested - why for this release]: This bug affect to all kitkat's  msm8610 hardware. It also include b2g v2.1. When it happens, application process will crash.

blocking-b2g: --- → 2.0?

Sotaro Ikeda [:sotaro]

Reporter

Comment 7

•

10 years ago

caf's OMX IL's OMX_CommandFlush seems to break not only OMXCodec implementation, but also OpenMAX IL's specification. Therefore, caf's implementation need to be addressed, it it is correct.
------------------------------------------------------------
3.2.2.4 OMX_CommandFlush
This IL client calls this command to flush one or more component ports. nParam specifies the index of the port to flush. If the value of nParam is OMX_ALL, the component shall flush all ports.
When the IL client flushes a non-tunnelling port, that port shall return all buffers it is holding to the IL client using EmptyBufferDone and FillBufferDone (appropriate for an input port or an output port, respectively) to return the buffers. When tunnelling, the flushed input port uses EmptyThisBuffer or FillThisBuffer (appropriate for an input port or an output port, respectively) to return the buffers to the output port.
For each port that the component successfully flushes, the component shall send an OMX_EventCmdComplete event, indicating OMX_CommandFlush for nData1 and the individual port index for nData2, even if the flush resulted from using a value of OMX_ALL for nParam. If a flush fails, the component shall notify the IL client of the error via an OMX_EventError event.

Sotaro Ikeda [:sotaro]

Reporter

Comment 8

•

10 years ago

Diego, how do you think about this bug? Can caf fix this bug?

Flags: needinfo?(dwilson)

bhavana bajaj [:bajaj]

Updated

•

10 years ago

blocking-b2g: 2.0? → 2.0+

Diego Wilson [:diego]

Comment 9

•

10 years ago

How do I reproduce this issue on device?

Flags: needinfo?(dwilson) → needinfo?(sotaro.ikeda.g)

Andrew Osmond [:aosmond] (he/him)

Comment 10

•

10 years ago

(In reply to Diego Wilson [:diego] from comment #9)
> How do I reproduce this issue on device?

I've reproduced with the following:

1. Open camera app. Record a large number of short videos (5-30 seconds). Depending on how lucky you get, this can be anywhere from 10-30 videos. Close camera app.
2. Open video app. While it is generating the thumbnails, it will eventually encounter this issue.
3. If video app did not crash, reflash or clear cache/userdata (should keep the videos, but requires regenerating the thumbnails), and go back to step 1 or 2.

Bug 1100459 contains a log which shows the issue (with verbose logging for OMXCodec in libstagefright).

Sotaro Ikeda [:sotaro]

Reporter

Comment 11

•

10 years ago

(In reply to Diego Wilson [:diego] from comment #9)
> How do I reproduce this issue on device?

I reproduced the crash by Bug 1099356 Comment 12. In my case, without doing pause. 

n.b. with current gecko, a crash because of Bug 1101742 could also happen. It is checking-in to m-c.

Flags: needinfo?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Reporter

Comment 12

•

10 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> (In reply to Diego Wilson [:diego] from comment #9)
> > How do I reproduce this issue on device?
> 
> I reproduced the crash by Bug 1099356 Comment 12. In my case, without doing
> pause. 
> 
> n.b. with current gecko, a crash because of Bug 1101742 could also happen.
> It is checking-in to m-c.

It could take long time(more than 20 minutes) until the crash happens by this STR when log out is enabled.

Sotaro Ikeda [:sotaro]

Reporter

Comment 13

•

10 years ago

diego, is there a progress?

Flags: needinfo?(dwilson)

Keven Kuo [:kkuo]

Updated

•

9 years ago

Component: Video/Audio → Vendcom

Product: Core → Firefox OS

Keven Kuo [:kkuo]

Comment 18

•

9 years ago

Hi! Diego,

Any update from your side? Thanks

--
Keven

Sotaro Ikeda [:sotaro]

Reporter

Comment 19

•

9 years ago

Hmm, no update nor response from codeaurora... Then, I am going to investigate possible fix.

Assignee: nobody → sotaro.ikeda.g

Sotaro Ikeda [:sotaro]

Reporter

Comment 20

•

9 years ago

I understand how the problem happens.
- OMXCodec flush input and output port individually. 2 flushes in total.
- But driver/hardware does not support flushing of individual ports.
- For each flush, driver/hardware flush both input and output port.
- V4L2_EVENT_MSM_VIDC_FLUSH_DONE is "flush done" message from driver.
  If it is received by omx hal, it tried OMX_CommandFlush command complete both input and output port.
- If input port flush and output port flush is sent very quickly.
   Both OMX_CommandFlush commands completes are sent under one V4L2_EVENT_MSM_VIDC_FLUSH_DONE message.
   Next V4L2_EVENT_MSM_VIDC_FLUSH_DONE message is just redundant. It do nothing.
- Before the above redundant V4L2_EVENT_MSM_VIDC_FLUSH_DONE message is received from driver,
  if consecutive flush is sent from OMXCodec, the redundant message is incorrectly recognized
  as the consecutive flush's complete. It causes early OMX_CommandFlush commands completes.
  The OMX_CommandFlush commands completes is sent before all input buffers return.
  Then OMXCodec's buffers state becomes inconsistent.

Sotaro Ikeda [:sotaro]

Reporter

Comment 21

•

9 years ago

As in Comment 20, "previous flush's complete message" causes next flush command complete too early. It causes input buffers or output buffers not returned to OMXCodec when flush is complete at OMXCodec. OMXCodec detect buffer's status is inconsistent. Then OMXCodec abort because of the error status.

Sotaro Ikeda [:sotaro]

Reporter

Comment 22

•

9 years ago

OMXCodec::read()
->OMXCodec::flushPortAsync() portIndex 0 // input port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
-OMXCodec::flushPortAsync()_E portIndex 1 // output port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
// on async_message_thread
async_message_thread()
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
 ->// Create and handle VDEC_MSG_RESP_FLUSH_INPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH to message thread
 ->// Create and handle VDEC_MSG_RESP_FLUSH_OUTPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH to message thread
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH:
  ->pThis->execute_input_flush();
  ->pThis->m_cb.EventHandler() // input port flush complete
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH:
  ->pThis->execute_output_flush()// out port flush complete
->OMXCodec::emptyBuffer() // send new input buffer to omx
           ************** input buffer that will cause state inconsistency
 ->omx_vdec::empty_this_buffer()
  ->omx_vdec::empty_this_buffer_proxy()

OMXCodec::read() // next read.
// on async_message_thread
async_message_thread()
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
      ************************ This is previous flush's done message
 ->// Create and handle VDEC_MSG_RESP_FLUSH_INPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH to message thread
 ->// Create and handle VDEC_MSG_RESP_FLUSH_OUTPUT_DONE message
  ->// post OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH to message thread
->OMXCodec::flushPortAsync() portIndex 0 // input port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
-OMXCodec::flushPortAsync()_E portIndex 1 // output port flush
 ->omx_vdec::execute_omx_flush() // flush both input and output ports
  ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec)
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH:
  ->pThis->execute_input_flush();
  ->pThis->m_cb.EventHandler() // input port flush complete
       ***************************** early flush complete
// on message_thread
->omx_vdec::process_event_cb()
 ->case OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH:
  ->pThis->execute_output_flush()// out port flush complete
       ***************************** early flush complete
//on async_message_thread
->// reveice VDEC_MSG_RESP_INPUT_BUFFER_DONE *************** this input buffer return is too late
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message

Sotaro Ikeda [:sotaro]

Reporter

Comment 23

•

9 years ago

omx_vdec's implementation does not handle additional port flush during previous port flush.

Sotaro Ikeda [:sotaro]

Reporter

Comment 24

•

9 years ago

Attached patch patch - prevent early port flush complete — Details — Splinter Review

Sotaro Ikeda [:sotaro]

Reporter

Comment 25

•

9 years ago

By applying attachment 8557434 [details] [diff] [review] to master flame, the problem was fixed.

Sotaro Ikeda [:sotaro]

Reporter

Updated

•

9 years ago

Flags: needinfo?(dwilson)

Sotaro Ikeda [:sotaro]

Reporter

Comment 26

•

9 years ago

Diego, can you give a comment to attachment 8557434 [details] [diff] [review]?

Flags: needinfo?(dwilson)

Sotaro Ikeda [:sotaro]

Reporter

Updated

•

9 years ago

Attachment #8557434 - Flags: feedback?(dwilson)

Sotaro Ikeda [:sotaro]

Reporter

Comment 27

•

9 years ago

m1, can we have a response from diego?

Flags: needinfo?(mvines)

Michael Vines [:m1] [:evilmachines]

Comment 28

•

9 years ago

This doesn't seem relevant to v2.2 efforts at the moment

Flags: needinfo?(mvines)

No-Jun Park [:njpark]

Comment 29

•

9 years ago

Actually in 2.2, the dependent bug 1099356 reproduces with about 50% frequency (in the video app), since the seek operation change implemented in 2.2.  I think we include this bug within the scope of 2.2 if it fixes bug 1099356, but will wait for feedback from Bhavana in that bug.

Michael Vines [:m1] [:evilmachines]

Comment 30

•

9 years ago

I searched the crash database here and we have observed this crash occur on v2.0, v2.1 and v2.2.  However only 8x10 and 8x26.  If/when this is observed on active chipsets for v2.2 then this becomes very interesting for us to look at.

Josh Cheng [:josh]

Comment 31

•

9 years ago

triage:
This is edge case and we will revisit 2.2 later per comment 30.

blocking-b2g: 2.0+ → ---

bhavana bajaj [:bajaj]

Comment 32

•

9 years ago

(In reply to Michael Vines [:m1] [:evilmachines] from comment #30)
> I searched the crash database here and we have observed this crash occur on
> v2.0, v2.1 and v2.2.  However only 8x10 and 8x26.  If/when this is observed
> on active chipsets for v2.2 then this becomes very interesting for us to
> look at.

mvines/Inder, this seems to be frequently reproducible in 2.1 due to the new seek mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1138671#c5). As you are aware we do not have the chipsets for 2.2 and 8x10 and 8x26 are are reference. Could you get QA on your end as the STR in https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter this 50% of the time.

Flags: needinfo?(mvines)

Flags: needinfo?(ikumar)

bhavana bajaj [:bajaj]

Comment 33

•

9 years ago

(In reply to bhavana bajaj [:bajaj] from comment #32)
> (In reply to Michael Vines [:m1] [:evilmachines] from comment #30)
> > I searched the crash database here and we have observed this crash occur on
> > v2.0, v2.1 and v2.2.  However only 8x10 and 8x26.  If/when this is observed
> > on active chipsets for v2.2 then this becomes very interesting for us to
> > look at.
> 
> mvines/Inder, this seems to be frequently reproducible in 2.1 due to the new
sry, meant 2.2 here!
> seek mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1138671#c5). As
> you are aware we do not have the chipsets for 2.2 and 8x10 and 8x26 are are
> reference. Could you get QA on your end as the STR in
> https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter
> this 50% of the time.

Michael Vines [:m1] [:evilmachines]

Updated

•

9 years ago

Flags: needinfo?(mvines)

Michael Vines [:m1] [:evilmachines]

Updated

•

9 years ago

Flags: needinfo?(mvines)

Diego Wilson [:diego]

Updated

•

9 years ago

Flags: needinfo?(dwilson)

Diego Wilson [:diego]

Updated

•

9 years ago

Attachment #8557434 - Flags: feedback?(dwilson)

Sotaro Ikeda [:sotaro]

Reporter

Comment 34

•

9 years ago

There is nothing I can do more.

Assignee: sotaro.ikeda.g → nobody

Michael Vines [:m1] [:evilmachines]

Comment 35

•

9 years ago

(In reply to bhavana bajaj [:bajaj] from comment #33)
> > Could you get QA on your end as the STR in
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter
> > this 50% of the time.

I'm sad to report that I've been unable to reproduce here on v2.2 devices after many attempts.

Flags: needinfo?(mvines)

Flags: needinfo?(ikumar)

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

9 years ago

Crash Signature: [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ]

BMO Automation

Updated

•

9 years ago

Crash Signature: [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ] → [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ] [@ mozalloc_abort | __android_log_assert | android::OMXCodec::onCmdComplete ]

Sotaro Ikeda [:sotaro]

Reporter

Updated

•

9 years ago

Comment 37

•

6 years ago

Firefox OS is not being worked on

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

log patch - Add log to caf OMX il component 10 years ago Sotaro Ikeda [:sotaro] 19.46 KB, patch		Details \| Diff \| Splinter Review
logcat log when the problem happened 10 years ago Sotaro Ikeda [:sotaro] 21.21 KB, text/plain		Details
patch - prevent early port flush complete 9 years ago Sotaro Ikeda [:sotaro] 6.16 KB, patch		Details \| Diff \| Splinter Review