Closed
Bug 1101826
Opened 10 years ago
Closed 6 years ago
OMX_CommandFlush does not return all input buffers before command complete.
Categories
(Firefox OS Graveyard :: Vendcom, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: sotaro, Unassigned)
References
Details
(Whiteboard: [POVB])
Crash Data
Attachments
(3 files)
19.46 KB,
patch
|
Details | Diff | Splinter Review | |
21.21 KB,
text/plain
|
Details | |
6.16 KB,
patch
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1099356 +++ On flame device, sometimes OMX_CommandFlush does not return all input buffers before command complete. This causes an application's crash with the following log. > CHECK_EQ(countBuffersWeOwn(mPortBuffers[portIndex]),mPortBuffers[portIndex].size()) failed: 3 vs. 4
Reporter | ||
Comment 1•10 years ago
|
||
When OMX_CommandFlush is called, omx_vdec::execute_input_flush() handle input port's buffer flush. But it does not return all buffer's to OMX component's client. It handles only buffers that is in m_etb_q. It does not handle buffers that are already handled by omx_vdec::empty_this_buffer_proxy(), but does not yet handled by omx_vdec::empty_buffer_done(). https://www.codeaurora.org/cgit/quic/la/platform/hardware/qcom/media/tree/mm-video-v4l2/vidc/vdec/src/omx_vdec_msm8974.cpp?h=b2g_kk_3.5#n2460
Reporter | ||
Comment 2•10 years ago
|
||
Reporter | ||
Comment 3•10 years ago
|
||
Get logcat with attachment 8525604 [details] [diff] [review] and enabling OMXCodec's log. Test environment also applied a patch of Bug 1101742 to suppress unrelated crash.
Reporter | ||
Updated•10 years ago
|
Attachment #8525605 -
Attachment is patch: false
Reporter | ||
Comment 4•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #3) > Created attachment 8525605 [details] From the log, it is clear that omx_vdec::empty_buffer_done() is called after input port's OMX_CommandFlush completion.
Reporter | ||
Comment 6•10 years ago
|
||
[Blocking Requested - why for this release]: This bug affect to all kitkat's msm8610 hardware. It also include b2g v2.1. When it happens, application process will crash.
blocking-b2g: --- → 2.0?
Reporter | ||
Comment 7•10 years ago
|
||
caf's OMX IL's OMX_CommandFlush seems to break not only OMXCodec implementation, but also OpenMAX IL's specification. Therefore, caf's implementation need to be addressed, it it is correct. ------------------------------------------------------------ 3.2.2.4 OMX_CommandFlush This IL client calls this command to flush one or more component ports. nParam specifies the index of the port to flush. If the value of nParam is OMX_ALL, the component shall flush all ports. When the IL client flushes a non-tunnelling port, that port shall return all buffers it is holding to the IL client using EmptyBufferDone and FillBufferDone (appropriate for an input port or an output port, respectively) to return the buffers. When tunnelling, the flushed input port uses EmptyThisBuffer or FillThisBuffer (appropriate for an input port or an output port, respectively) to return the buffers to the output port. For each port that the component successfully flushes, the component shall send an OMX_EventCmdComplete event, indicating OMX_CommandFlush for nData1 and the individual port index for nData2, even if the flush resulted from using a value of OMX_ALL for nParam. If a flush fails, the component shall notify the IL client of the error via an OMX_EventError event.
Reporter | ||
Comment 8•10 years ago
|
||
Diego, how do you think about this bug? Can caf fix this bug?
Flags: needinfo?(dwilson)
Updated•10 years ago
|
blocking-b2g: 2.0? → 2.0+
Comment 9•10 years ago
|
||
How do I reproduce this issue on device?
Flags: needinfo?(dwilson) → needinfo?(sotaro.ikeda.g)
Comment 10•10 years ago
|
||
(In reply to Diego Wilson [:diego] from comment #9) > How do I reproduce this issue on device? I've reproduced with the following: 1. Open camera app. Record a large number of short videos (5-30 seconds). Depending on how lucky you get, this can be anywhere from 10-30 videos. Close camera app. 2. Open video app. While it is generating the thumbnails, it will eventually encounter this issue. 3. If video app did not crash, reflash or clear cache/userdata (should keep the videos, but requires regenerating the thumbnails), and go back to step 1 or 2. Bug 1100459 contains a log which shows the issue (with verbose logging for OMXCodec in libstagefright).
Reporter | ||
Comment 11•10 years ago
|
||
(In reply to Diego Wilson [:diego] from comment #9) > How do I reproduce this issue on device? I reproduced the crash by Bug 1099356 Comment 12. In my case, without doing pause. n.b. with current gecko, a crash because of Bug 1101742 could also happen. It is checking-in to m-c.
Flags: needinfo?(sotaro.ikeda.g)
Reporter | ||
Comment 12•10 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #11) > (In reply to Diego Wilson [:diego] from comment #9) > > How do I reproduce this issue on device? > > I reproduced the crash by Bug 1099356 Comment 12. In my case, without doing > pause. > > n.b. with current gecko, a crash because of Bug 1101742 could also happen. > It is checking-in to m-c. It could take long time(more than 20 minutes) until the crash happens by this STR when log out is enabled.
Updated•9 years ago
|
Component: Video/Audio → Vendcom
Product: Core → Firefox OS
Comment 18•9 years ago
|
||
Hi! Diego, Any update from your side? Thanks -- Keven
Reporter | ||
Comment 19•9 years ago
|
||
Hmm, no update nor response from codeaurora... Then, I am going to investigate possible fix.
Assignee: nobody → sotaro.ikeda.g
Reporter | ||
Comment 20•9 years ago
|
||
I understand how the problem happens. - OMXCodec flush input and output port individually. 2 flushes in total. - But driver/hardware does not support flushing of individual ports. - For each flush, driver/hardware flush both input and output port. - V4L2_EVENT_MSM_VIDC_FLUSH_DONE is "flush done" message from driver. If it is received by omx hal, it tried OMX_CommandFlush command complete both input and output port. - If input port flush and output port flush is sent very quickly. Both OMX_CommandFlush commands completes are sent under one V4L2_EVENT_MSM_VIDC_FLUSH_DONE message. Next V4L2_EVENT_MSM_VIDC_FLUSH_DONE message is just redundant. It do nothing. - Before the above redundant V4L2_EVENT_MSM_VIDC_FLUSH_DONE message is received from driver, if consecutive flush is sent from OMXCodec, the redundant message is incorrectly recognized as the consecutive flush's complete. It causes early OMX_CommandFlush commands completes. The OMX_CommandFlush commands completes is sent before all input buffers return. Then OMXCodec's buffers state becomes inconsistent.
Reporter | ||
Comment 21•9 years ago
|
||
As in Comment 20, "previous flush's complete message" causes next flush command complete too early. It causes input buffers or output buffers not returned to OMXCodec when flush is complete at OMXCodec. OMXCodec detect buffer's status is inconsistent. Then OMXCodec abort because of the error status.
Reporter | ||
Comment 22•9 years ago
|
||
OMXCodec::read() ->OMXCodec::flushPortAsync() portIndex 0 // input port flush ->omx_vdec::execute_omx_flush() // flush both input and output ports ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec) -OMXCodec::flushPortAsync()_E portIndex 1 // output port flush ->omx_vdec::execute_omx_flush() // flush both input and output ports ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec) // on async_message_thread async_message_thread() ->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message ->// Create and handle VDEC_MSG_RESP_FLUSH_INPUT_DONE message ->// post OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH to message thread ->// Create and handle VDEC_MSG_RESP_FLUSH_OUTPUT_DONE message ->// post OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH to message thread // on message_thread ->omx_vdec::process_event_cb() ->case OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH: ->pThis->execute_input_flush(); ->pThis->m_cb.EventHandler() // input port flush complete // on message_thread ->omx_vdec::process_event_cb() ->case OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH: ->pThis->execute_output_flush()// out port flush complete ->OMXCodec::emptyBuffer() // send new input buffer to omx ************** input buffer that will cause state inconsistency ->omx_vdec::empty_this_buffer() ->omx_vdec::empty_this_buffer_proxy() OMXCodec::read() // next read. // on async_message_thread async_message_thread() ->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message ************************ This is previous flush's done message ->// Create and handle VDEC_MSG_RESP_FLUSH_INPUT_DONE message ->// post OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH to message thread ->// Create and handle VDEC_MSG_RESP_FLUSH_OUTPUT_DONE message ->// post OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH to message thread ->OMXCodec::flushPortAsync() portIndex 0 // input port flush ->omx_vdec::execute_omx_flush() // flush both input and output ports ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec) -OMXCodec::flushPortAsync()_E portIndex 1 // output port flush ->omx_vdec::execute_omx_flush() // flush both input and output ports ->ioctl(drv_ctx.video_driver_fd, VIDIOC_DECODER_CMD, &dec) // on message_thread ->omx_vdec::process_event_cb() ->case OMX_COMPONENT_GENERATE_EVENT_INPUT_FLUSH: ->pThis->execute_input_flush(); ->pThis->m_cb.EventHandler() // input port flush complete ***************************** early flush complete // on message_thread ->omx_vdec::process_event_cb() ->case OMX_COMPONENT_GENERATE_EVENT_OUTPUT_FLUSH: ->pThis->execute_output_flush()// out port flush complete ***************************** early flush complete //on async_message_thread ->// reveice VDEC_MSG_RESP_INPUT_BUFFER_DONE *************** this input buffer return is too late ->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message ->// handle V4L2_EVENT_MSM_VIDC_FLUSH_DONE message
Reporter | ||
Comment 23•9 years ago
|
||
omx_vdec's implementation does not handle additional port flush during previous port flush.
Reporter | ||
Comment 24•9 years ago
|
||
Reporter | ||
Comment 25•9 years ago
|
||
By applying attachment 8557434 [details] [diff] [review] to master flame, the problem was fixed.
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(dwilson)
Reporter | ||
Comment 26•9 years ago
|
||
Diego, can you give a comment to attachment 8557434 [details] [diff] [review]?
Flags: needinfo?(dwilson)
Reporter | ||
Updated•9 years ago
|
Attachment #8557434 -
Flags: feedback?(dwilson)
Comment 28•9 years ago
|
||
This doesn't seem relevant to v2.2 efforts at the moment
Flags: needinfo?(mvines)
Comment 29•9 years ago
|
||
Actually in 2.2, the dependent bug 1099356 reproduces with about 50% frequency (in the video app), since the seek operation change implemented in 2.2. I think we include this bug within the scope of 2.2 if it fixes bug 1099356, but will wait for feedback from Bhavana in that bug.
Comment 30•9 years ago
|
||
I searched the crash database here and we have observed this crash occur on v2.0, v2.1 and v2.2. However only 8x10 and 8x26. If/when this is observed on active chipsets for v2.2 then this becomes very interesting for us to look at.
Comment 31•9 years ago
|
||
triage: This is edge case and we will revisit 2.2 later per comment 30.
blocking-b2g: 2.0+ → ---
Comment 32•9 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #30) > I searched the crash database here and we have observed this crash occur on > v2.0, v2.1 and v2.2. However only 8x10 and 8x26. If/when this is observed > on active chipsets for v2.2 then this becomes very interesting for us to > look at. mvines/Inder, this seems to be frequently reproducible in 2.1 due to the new seek mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1138671#c5). As you are aware we do not have the chipsets for 2.2 and 8x10 and 8x26 are are reference. Could you get QA on your end as the STR in https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter this 50% of the time.
Flags: needinfo?(mvines)
Flags: needinfo?(ikumar)
Comment 33•9 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #32) > (In reply to Michael Vines [:m1] [:evilmachines] from comment #30) > > I searched the crash database here and we have observed this crash occur on > > v2.0, v2.1 and v2.2. However only 8x10 and 8x26. If/when this is observed > > on active chipsets for v2.2 then this becomes very interesting for us to > > look at. > > mvines/Inder, this seems to be frequently reproducible in 2.1 due to the new sry, meant 2.2 here! > seek mechanism (https://bugzilla.mozilla.org/show_bug.cgi?id=1138671#c5). As > you are aware we do not have the chipsets for 2.2 and 8x10 and 8x26 are are > reference. Could you get QA on your end as the STR in > https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter > this 50% of the time.
Updated•9 years ago
|
Flags: needinfo?(mvines)
Updated•9 years ago
|
Flags: needinfo?(mvines)
Updated•9 years ago
|
Flags: needinfo?(dwilson)
Updated•9 years ago
|
Attachment #8557434 -
Flags: feedback?(dwilson)
Comment 35•9 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #33) > > Could you get QA on your end as the STR in > > https://bugzilla.mozilla.org/show_bug.cgi?id=1099356#c37 says we encounter > > this 50% of the time. I'm sad to report that I've been unable to reproduce here on v2.2 devices after many attempts.
Flags: needinfo?(mvines)
Flags: needinfo?(ikumar)
Crash Signature: [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ]
Updated•9 years ago
|
Crash Signature: [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ] → [@ mozalloc_abort(char const*) | __android_log_assert | android::OMXCodec::onCmdComplete(OMX_COMMANDTYPE, unsigned long) ]
[@ mozalloc_abort | __android_log_assert | android::OMXCodec::onCmdComplete ]
Comment 37•6 years ago
|
||
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•