Closed Bug 1141328 Opened 10 years ago Closed 9 years ago

Crash in video playback after pause

Categories

(Firefox OS Graveyard :: Vendcom, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: ggrisco, Unassigned)

Details

(Keywords: crash, Whiteboard: [caf priority: p2][CR 799496][b2g-crash][caf-crash 554][POVB])

Crash Data

Attachments

(8 files)

Found this on reference device (8909).

Steps to reproduce:

1.  Start the video app
2.  Select a video for playback
3.  While playing video, press the home key
4.  Wait a couple of seconds
5.  From homescreen, select the video app again

Video app crashes at this point due to failed check in OMXCodec.cpp.  The current state is FLUSHING (9).

It looks like gecko is not handling INFO_FORMAT_CHANGED at the top of ReadVideo().
Attached file decoded minidump -
Whiteboard: [b2g-crash][caf-crash 554] → [b2g-crash][caf-crash 528]
Whiteboard: [b2g-crash][caf-crash 528] → [CR 799496][b2g-crash][caf-crash 528]
Whiteboard: [CR 799496][b2g-crash][caf-crash 528] → [caf priority: p1][CR 799496][b2g-crash][caf-crash 528]
Keywords: crash
I am doubtful if we'd hit this on flame, nevertheless adding qawanted to check.

In the interim, NI Hema, to re-derelict this to the right person from Core who can investigate the logs.
Flags: needinfo?(hkoka)
Keywords: qawanted
I'll try to check on flame as well (although I might not get to it until tomorrow).
Whiteboard: [caf priority: p1][CR 799496][b2g-crash][caf-crash 528] → [caf priority: p1][CR 799496][b2g-crash][caf-crash 554]
I was unable to reproduce this issue on today's Flame 2.2 and 3.0 tinderbox builds.  Leaving the tag for someone else to try or the original report.

Environmental Variables:
Device: Flame 3.0
BuildID: 20150310055519
Gaia: f6a1fcd30ee6a286f3bca9d0c3cb600e21bfbf69
Gecko: c42e7e3bb0a0
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Environmental Variables:
Device: Flame 2.2
BuildID: 20150310080619
Gaia: 166491b92278dc9e648f8d49ab02d9ca00d74421
Gecko: 1dfff0a01cd3
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
Flags: needinfo?(ktucker)
I could not reproduce this issue on the Flame 3.0 and Flame 2.2

I tried multiple file types including .3gpp .mp4 .ogg .mov .webm and .avi and no crashed occurred when following the steps in Comment 0.

0/20 attempts

Environmental Variables:
Device: Flame 3.0 (Full Flash)(KK)(319mb)
BuildID: 20150310010227
Gaia: 2fb09da0cb9cefad9c6e40f57533fafda6d12557
Gecko: 6686aacf006f
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 39.0a1 (3.0) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Environmental Variables:
Device: Flame 2.2 (Full Flash)(KK)(319mb)
BuildID: 20150310002536
Gaia: 166491b92278dc9e648f8d49ab02d9ca00d74421
Gecko: 1cda026f8996
Gonk: e7c90613521145db090dd24147afd5ceb5703190
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
Flags: needinfo?(ktucker)
I was also unable to reproduce this bug. I had other apps in the background (camera, settings, browser, contacts, etc), played 3gpp, mp4, webm files, and tried with landscape/portrait mode and fullscreen mode and w/ controller. After trying about 30 times on each build, I could not reproduce the crash on Flame Master and 2.2.

Environmental Variables:
Device: Flame Master
Build ID: 20150310055519
Gaia: f6a1fcd30ee6a286f3bca9d0c3cb600e21bfbf69
Gecko: c42e7e3bb0a0
Version: 39.0a1 (Master)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Environmental Variables:
Device: Flame 2.2
BuildID: 20150310104126
Gaia: 51959bfe85815ec13a677d4745af40272044317d
Gecko: 348f59e12ddb
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Keywords: qawanted
On 8909 I can reproduce this 20 out of 20 times, it's very repeatable.  You can see this message in the log (see the attached .extra file, containing log snippet):

01-01 03:29:38.639 18727 30818 D OmxDecoder: Unexpected error when seeking to 12797278

Gecko is printing this message after a call to read() is returning INFO_FORMAT_CHANGED which seems to not be handled well.
Adding Sotaro and Chris Pearce to see if they have any input on this.

There seems to be a couple of unresolved OMXCodec related Bugs: 

https://bugzilla.mozilla.org/show_bug.cgi?id=1101826

https://bugzilla.mozilla.org/show_bug.cgi?id=1099356
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(hkoka)
Flags: needinfo?(cpearce)
Crash in OMX, Steven Lee: is this something someone on your team or other teams in Taipei should handle?
Flags: needinfo?(cpearce) → needinfo?(slee)
(In reply to Hema Koka [:hema] from comment #12)
> Adding Sotaro and Chris Pearce to see if they have any input on this.
> 
> There seems to be a couple of unresolved OMXCodec related Bugs: 
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1101826
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1099356

Bug 1099356 is dup of bug 1101826. It happens because of codeaurora's OML IL component's implementation bug.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to cafbot (PoC: ggrisco) from comment #1)
> Created attachment 8574911 [details]
> EXTRA file attachment -

The following seems the direct cause of the crash. OMXCodec's internal state check detect incorrect internal state.

> 01-01 02:55:08.459  4029  8804 F OMXCodec: frameworks/av/media/libstagefright/OMXCodec.cpp:3203 CHECK(mState == EXECUTING || mState == EXECUTING_TO_IDLE) failed.
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
> (In reply to cafbot (PoC: ggrisco) from comment #1)
> > Created attachment 8574911 [details]
> > EXTRA file attachment -
> 
> The following seems the direct cause of the crash. OMXCodec's internal state
> check detect incorrect internal state.
> 
> > 01-01 02:55:08.459  4029  8804 F OMXCodec: frameworks/av/media/libstagefright/OMXCodec.cpp:3203 CHECK(mState == EXECUTING || mState == EXECUTING_TO_IDLE) failed.

Right.  OMXCodec thinks the state is still FLUSHING when it's expecting EXECUTING or EXECUTING_TO_IDLE.  But I think that *might* be because gecko is not handling the port settings change correctly.  This I'm not sure and would like somebody from gecko video to confirm.
Hi Blake,
Can you also check this bug? Thanks.
Flags: needinfo?(slee) → needinfo?(bwu)
blocking-b2g: 2.2? → 2.2+
(In reply to StevenLee[:slee] from comment #17)
> Hi Blake,
> Can you also check this bug? Thanks.
Sure. I will borrow the 8909 device to check it.
Flags: needinfo?(bwu)
No longer blocks: CAF-v3.0-FL-metabug
(In reply to Greg Grisco from comment #16)
> (In reply to Sotaro Ikeda [:sotaro] from comment #15)
> > (In reply to cafbot (PoC: ggrisco) from comment #1)
> > > Created attachment 8574911 [details]
> > > EXTRA file attachment -
> > 
> > The following seems the direct cause of the crash. OMXCodec's internal state
> > check detect incorrect internal state.
> > 
> > > 01-01 02:55:08.459  4029  8804 F OMXCodec: frameworks/av/media/libstagefright/OMXCodec.cpp:3203 CHECK(mState == EXECUTING || mState == EXECUTING_TO_IDLE) failed.
> 
> Right.  OMXCodec thinks the state is still FLUSHING when it's expecting
> EXECUTING or EXECUTING_TO_IDLE.  But I think that *might* be because gecko
> is not handling the port settings change correctly.  This I'm not sure and
> would like somebody from gecko video to confirm.
Gecko cannot change the port settings which is usually handled in OMXCodec.cpp. What Gecko does is trying to read decoded frames via OMXCodec with optional seek. 

For error INFO_FORMAT_CHANGED, Gecko just simply[1] set video format[2] and re-read frame[3] again.
[1]https://dxr.mozilla.org/mozilla-central/source/dom/media/omx/OmxDecoder.cpp?from=OmxDecoder.cpp&case=true#684
[2]https://dxr.mozilla.org/mozilla-central/source/dom/media/omx/OmxDecoder.cpp?from=OmxDecoder.cpp&case=true#395 (This will not access OMXCodec)
[3]https://dxr.mozilla.org/mozilla-central/source/dom/media/omx/OmxDecoder.cpp?from=OmxDecoder.cpp&case=true#689
I noticed two states, FlUSHING and PAUSING, in OMXCodec are added and implemented by CAF[1] to save power for pause. So for other platforms, we should not hit this problem. 
Possibly Gecko's current read/seek flow may not fit with the state change in CAF's OMX IL implementation and might hit some timing issue I think.
From OMX state change, we would need CAF's help to look into further.  

[1]https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/media/libstagefright/OMXCodec.cpp?h=LA.AF.1.1.1&id=386444cd93366c92ff4a7d9807f8c9e2cab3a826

Hi mvines,
Someone from your team can help to check it?
Flags: needinfo?(mvines)
Flags: needinfo?(mvines) → needinfo?(ggrisco)
I'm asking for some help from our video experts.  Will let you know any information I can find out.
Flags: needinfo?(ggrisco)
Flags: needinfo?(ggrisco)
Whiteboard: [caf priority: p1][CR 799496][b2g-crash][caf-crash 554] → [caf priority: p2][CR 799496][b2g-crash][caf-crash 554]
Component: Video/Audio → Vendcom
Product: Core → Firefox OS
On Flame, the issue cannot be reproduced.
Port reconfig (onPortSettingsChanged) is always done before the first read / seek (whichs change state to flushing) arrives. Therefore the state CHECK in onPortSettingsChanged always pass (mState == EXECUTING). Somehow in your case, Port reconfig and seek run in parallel. You may need to use some mutex to protect the critical section.

Munro
(In reply to Munro Chiang from comment #22)
> On Flame, the issue cannot be reproduced.
> Port reconfig (onPortSettingsChanged) is always done before the first read /
> seek (whichs change state to flushing) arrives. Therefore the state CHECK in
> onPortSettingsChanged always pass (mState == EXECUTING). Somehow in your
> case, Port reconfig and seek run in parallel. You may need to use some mutex
> to protect the critical section.
> 
I think you meant some lock / synchronization mechanism, and it doesn't have to be "mutex", right? AFAIK, mutex has some significant cost and it results in system calls to kernel which has some overhead.
(In reply to Munro Chiang from comment #22)
> On Flame, the issue cannot be reproduced.
In this case, maybe we can revisit to see if we should mark this one as a blocker.
blocking-b2g: 2.2+ → 2.2?
The particular issue that I reported here seems to be coming from audio port settings changing after the resume of playback.  The port settings change comes only with new SoftAAC decoder on Lollipop.  On Lollipop, upstream Android uses NuPlayer which can handle the port settings change, but the older frameworks can't.

We verified the audio issue by removing the audio track from the clip and seeing the crash go away.

I'm trying to find a solution for this now.
Flags: needinfo?(ggrisco)
Hi Kevin, yes, it doesn't need to be mutex. But mutex is widely used in OMXCodec for synchronization.

I will keep digging into this problem.
Whiteboard: [caf priority: p2][CR 799496][b2g-crash][caf-crash 554] → [caf priority: p2][CR 799496][b2g-crash][caf-crash 554][POVB]
Setting assignee to Munro per comment 26.
Assignee: nobody → mchiang
blocking-b2g: 2.2? → 2.2+
No longer blocks: CAF-v2.2-metabug
blocking-b2g: 2.2+ → ---
Unassigning Munro as this is POVB and will be fixed by CAF.
Assignee: mchiang → nobody
We fixed this in CAF code, so marking RESOLVED.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Greg, 
Would it be ok to let us know which patch in the CAF fix this bug?
Thanks!
Flags: needinfo?(ggrisco)
(In reply to Blake Wu [:bwu][:blakewu] from comment #36)
> Greg, 
> Would it be ok to let us know which patch in the CAF fix this bug?
> Thanks!

Here's the main change:

https://www.codeaurora.org/cgit/external/gigabyte/platform/frameworks/av/commit/media/libstagefright/codecs/aacdec?h=caf/LF.BR.1.2.3&id=a6f98b53e1530b3fc4ec2b07356ca57eec747acf

The issue is that the new SoftAAC decoder in L generates port reconfiguration mid-stream which the old framework can't handle (it is handled in NuPlayer though).  So we decided to just re-use the SoftAAC decoder from KK (the above commit).  

This seemed to solve the problem, however in last couple builds we are seeing many crashes in the exact same place.  Port reconfig is still being generated, although not as easy to reproduce now.  It's something that I'm currently working on.
Flags: needinfo?(ggrisco)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: