Closed Bug 812756 Opened 9 years ago Closed 9 years ago

Segfault while playing MP4 & hanging for playing second MP4

Categories

(Firefox OS Graveyard :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+, firefox18 fixed, firefox19 fixed, firefox20 fixed)

RESOLVED FIXED
blocking-basecamp +
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox20 --- fixed

People

(Reporter: sinker, Assigned: eflores)

References

Details

(Whiteboard: [caf:blocking])

Attachments

(6 files, 1 obsolete file)

Video app got segfault while playing MP4, and hanging while playing MP4 if previous playing is success.

The cause of segfault is OMXCodec does not allow returning buffers while flushing.  It should be a bug of stagefright.  OMXCodec::signalBufferReturned() does not allow any returning while output port is SHUTTING_DOWN.  But, output port is also in SHUTTING_DOWN while flushing.  OMXCodec should remember buffers while flushing, and handle them while flushing is done.

The cause of hanging while playing second MP4 is timed-out error of first playing.  It is also a bug of stagefright.  mState of OMXCodec should be ERROR while flushing is timed-out, but it stays in FLUSHING state.  It make OMXCodec::stop() goes to a infinite waiting.
Attachment #682736 - Flags: feedback?(eflores)
Attachment #682736 - Flags: feedback?(chris.double)
blocking-basecamp: --- → ?
Comment on attachment 682736 [details] [diff] [review]
Fix OMX buffer returning and timed-out issue

I don't think this is really a bug in OMXCodec. We appear to be returning buffers when it doesn't expect it.

I added a hack a few months back to our catchup code to seek to the next keyframe if playback is falling behind. What I _think_ is going on here (as far as the segfault goes) is:

1. The video falls behind
2. We try to seek forward to the next keyframe, causing the OMXCodec to transition into the FLUSHING state
3. One of the frames sitting on the VideoQueue is rendered, releasing the old frame
4. The old frame is released, calling signalBufferReturned
5. The assertion fails, and we crash

I suspect that if we clear the VideoQueue _before_ seeking, there will be no frames to be released _while_ seeking.

I can't repro reliably, but I'll put up a patch to fix it soon.
Attachment #682736 - Flags: feedback?(eflores) → feedback-
This patch seems to stop the segfault while playing for me. Can you try it?

Freezing on playing a second video seems to have fixed itself with a gaia update.
(In reply to Edwin Flores [:eflores] [:edwin] from comment #2)
> Comment on attachment 682736 [details] [diff] [review]
> Fix OMX buffer returning and timed-out issue
> 
> I don't think this is really a bug in OMXCodec. We appear to be returning
> buffers when it doesn't expect it.
> 
> I added a hack a few months back to our catchup code to seek to the next
> keyframe if playback is falling behind. What I _think_ is going on here (as
> far as the segfault goes) is:
> 
> 1. The video falls behind
> 2. We try to seek forward to the next keyframe, causing the OMXCodec to
> transition into the FLUSHING state
> 3. One of the frames sitting on the VideoQueue is rendered, releasing the
> old frame
> 4. The old frame is released, calling signalBufferReturned
> 5. The assertion fails, and we crash
> 
> I suspect that if we clear the VideoQueue _before_ seeking, there will be no
> frames to be released _while_ seeking.
> 
> I can't repro reliably, but I'll put up a patch to fix it soon.
Actually, I had a patch to fix it at first moment by queuing all buffer returning while seeking.  It works perfect for me.  But, after some studying of OMXCodec's history, I think it is a normal behavior of OMXCodec.  In the elder visions of OMXCodec, there is not FLUSHING state, users of OMXCodec can return buffer at any instant of executing.  But, after introduce FLUSHING state, I guess they forget to handle necessary changes at signalBufferReturned().

It makes no sense to forbid buffer returning while flushing since these buffers are supposed to be used later.  OMXCodec should keep them in a free list, or mark them free (OWNED_BY_US), and reuse them when need.  If we are not likely to change OMXCodec.cpp, we should queuing the requests on our own.
(In reply to Edwin Flores [:eflores] [:edwin] from comment #3)
> Created attachment 683363 [details] [diff] [review]
> Flush VideoQueue before seeking/catching up
> 
> This patch seems to stop the segfault while playing for me. Can you try it?
> 
> Freezing on playing a second video seems to have fixed itself with a gaia
> update.

I had tried the same change as suggestion from another bug.  But, it is still buggy.  signalBufferReturned() is called async.  You never know when the VideoFrameContainer releases the buffer hold.  So, it is still segfault occasionally.  You can enforce VideoFrameContainer to release it, but ImageContainer still hold a reference to the buffer.
This is the patch I had mentioned above.
Hmm yeah, after testing it out a little bit more it's still broken with my patch, just more intermittent.
The current image is only suspended when a new image is set. Maybe if we can suspend the VideoFrameContainer for the duration of the read() call from consuming a new image from the video queue, then we will still own everything in the video queue and the ImageContainer will hold on to the buffer that is currently set.
blocking-basecamp: ? → +
Assignee: nobody → tlee
Whiteboard: [caf:blocking]
(In reply to Edwin Flores [:eflores] [:edwin] from comment #8)
> The current image is only suspended when a new image is set. Maybe if we can
> suspend the VideoFrameContainer for the duration of the read() call from
> consuming a new image from the video queue, then we will still own
> everything in the video queue and the ImageContainer will hold on to the
> buffer that is currently set.

I like this solution.  Would you like to make a patch or/and take this bug?
Assignee: tlee → eflores
Edwin, our partners need a patch for this today. Is that possible?
Priority: -- → P1
Should have a patch up for review soon.
Comment on attachment 685319 [details] [diff] [review]
Drop frames instead of seeking when falling behind

cpearce, is does this match to the skip keyframe change you suggested?
Attachment #685319 - Flags: review?(chris.double) → review?(cpearce)
Comment on attachment 685319 [details] [diff] [review]
Drop frames instead of seeking when falling behind

Review of attachment 685319 [details] [diff] [review]:
-----------------------------------------------------------------

Yes.
Attachment #685319 - Flags: review?(cpearce) → review+
Oops, seems to have cleared that field for some reason
Priority: -- → P1
Attached patch Lower AmpleVideoFrames (obsolete) — Splinter Review
Attachment #685504 - Flags: review?(chris.double)
Horrifically ugly hack, this patch, but necessary to avoid a crash on shutdown of the OmxDecoder
Comment on attachment 685504 [details] [diff] [review]
Lower AmpleVideoFrames

>+  // Set to the minimum to prevent a race when we finish decoding MPEG4 videos

Can you explain this in more detail please. What is the race and how does lowering the value of ample video frames prevents it?
Flags: needinfo?(eflores)
https://hg.mozilla.org/mozilla-central/rev/76414ce478c3
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Attachment #685504 - Attachment is obsolete: true
Attachment #685504 - Flags: review?(chris.double)
Attachment #685834 - Flags: review?(chris.double)
Flags: needinfo?(eflores)
Comment on attachment 685834 [details] [diff] [review]
Lower AmpleVideoFrames

>+  // Due to a bug in the OMX MPEG4 decoder, we can't own too many video buffers

Change "OMX MPEG4 decoder" to the exact name of the decoder (ie. google.what.ever.mpeg4).

>+  // So we need few enough buffers in the queue that all buffers will be
>+  // returned before OMXCodec begins shutdown.
>+  uint32_t GetAmpleVideoFrames() { return 2; }

Sounds like a wonderful source of intermittent bugs coming up related to timing. r+ for the workaround but we should investigate a better way of dealing with this. Can you raise a bug for fixing or investigating a fix for this so it doesn't get forgotten.
Attachment #685834 - Flags: review?(chris.double) → review+
Edwin: your patch in comment 24 doesn't apply on beta branch. There are significant differences in files so couldn't apply manually. Can you please rebase your patch for beta branch and share. I need to verify that mp4 files are working fine after your changes.
Comment on attachment 682736 [details] [diff] [review]
Fix OMX buffer returning and timed-out issue

Edwin has already addressed this so clearing flag.
Attachment #682736 - Flags: feedback?(chris.double)
(In reply to Edwin Flores [:eflores] [:edwin] from comment #27)
> Created attachment 686239 [details] [diff] [review]
> Lower AmpleVideoFrames (rebased against beta)
Edwin: can you please get this patch landed on beta asap.
Blocks: 798448
New Zealand office is all out on Nov 30th - can someone else land this patch?
(In reply to JP Rosevear [:jpr] from comment #30)
> New Zealand office is all out on Nov 30th - can someone else land this patch?

https://hg.mozilla.org/releases/mozilla-beta/rev/79aae58dfb76
You need to log in before you can comment on or make changes to this bug.