[A/V] OMX crashed while youtube streaming

RESOLVED FIXED in Firefox OS v1.1hd

Status

Firefox OS
General
P1
critical
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: leo.bugzilla.gecko, Assigned: sotaro)

Tracking

({crash})

unspecified
1.1 QE3 (26jun)
ARM
Gonk (Firefox OS)
crash

Firefox Tracking Flags

(blocking-b2g:leo+, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

Details

(Whiteboard: [b2g-crash] [POVB], crash signature)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
1. Start playing any video on YouTube.
2. Go back and start another one.

Sometimes OMXCodec crashes while it changes its state to IDLE.

I'll attach a dump.
(Reporter)

Comment 1

5 years ago
Created attachment 759647 [details]
dump file
(Reporter)

Updated

5 years ago
blocking-b2g: --- → leo?

Updated

5 years ago
Severity: major → critical
Crash Signature: [@ __libc_android_abort | __android_log_assert | android::OMXCodec::onStateChange]
Keywords: crash
Whiteboard: [b2g-crash]

Updated

5 years ago
Blocks: 877024

Updated

5 years ago
Whiteboard: [b2g-crash] → [b2g-crash][YouTubeCertBlocker+]

Updated

5 years ago
Whiteboard: [b2g-crash][YouTubeCertBlocker+] → [b2g-crash]
(Assignee)

Comment 2

5 years ago
Judging from attachment 759647 [details], the following check failure seems possible. If that is the case, a buffer was not correctly returned to OMXCodec.

-------------------------------------------------
                CHECK_EQ(
                    countBuffersWeOwn(mPortBuffers[kPortIndexInput]),
                    mPortBuffers[kPortIndexInput].size());

                CHECK_EQ(
                    countBuffersWeOwn(mPortBuffers[kPortIndexOutput]),
                    mPortBuffers[kPortIndexOutput].size());
(Assignee)

Comment 3

5 years ago
(In reply to leo.bugzilla.gecko from comment #0)
> 1. Start any video play on youtube.
> 2. Back and start another one.
> 
> Sometime OMXcodec crashed while it change its state to IDLE.
> 
> I'll attach dump.

To test this, which data connection is used? WLAN or 3G? And how often does the crash happen?
leo - can you help address the questions on comment 3?
Flags: needinfo?(leo.bugzilla.gecko)
(Assignee)

Comment 5

5 years ago
An additional question: when the crash happens, how is it recognized from the UI?
(Reporter)

Comment 6

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #3)
> To test this, which data connection is used? WLAN or 3G? And how often does
> the crash happen?

It's connected over 3G, and the crash is actually not frequent.
It seems to happen less than 10% of the time.
Flags: needinfo?(leo.bugzilla.gecko)
(Reporter)

Comment 7

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> Additional question. When the crash happens how it is recognized from UI?

The device shows a crash popup and the app disappears.
(In reply to leo.bugzilla.gecko from comment #7)
> (In reply to Sotaro Ikeda [:sotaro] from comment #5)
> > Additional question. When the crash happens how it is recognized from UI?
> 
> Device shows crashed popup and app disappeared.

Can you include a crash report URL for that crash also?
(Reporter)

Comment 9

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #2)
> From attachment 759647 [details], following check failure seems possible. If
> it is the case, buffer was not correctly returned to OMXCodec.
> 
> -------------------------------------------------
>                 CHECK_EQ(
>                     countBuffersWeOwn(mPortBuffers[kPortIndexInput]),
>                     mPortBuffers[kPortIndexInput].size());
> 
>                 CHECK_EQ(
>                     countBuffersWeOwn(mPortBuffers[kPortIndexOutput]),
>                     mPortBuffers[kPortIndexOutput].size());


It could also be this line.
(In my build, the call stack in the dump file points exactly here.)

 CHECK_EQ((int)mState, (int)EXECUTING_TO_IDLE);

Is it possible for Gecko to change the OMX state from Paused to Idle?
(Reporter)

Comment 10

5 years ago
(In reply to Jason Smith [:jsmith] from comment #8)
> Can you include a crash report URL for that crash also?

When I received the report of this problem, the URL was this one.

http://www.youtube.com/watch?v=yumfm_eQNtE
(Assignee)

Comment 11

5 years ago
(In reply to leo.bugzilla.gecko from comment #9)
> (In reply to Sotaro Ikeda [:sotaro] from comment #2)
> 
> It can be this line also.
> (in my build, call stack in dump file is exactly here)
> 
>  CHECK_EQ((int)mState, (int)EXECUTING_TO_IDLE);
> 
> Is it possible for Gecko to change OMX state from Paused to idle?

From the code, that transition should be prevented. But the check failure indicates that the state transition is not handled correctly in OMXCodec.
(Assignee)

Comment 12

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #3)
> (In reply to leo.bugzilla.gecko from comment #0)
> > 1. Start any video play on youtube.
> > 2. Back and start another one.

How long do you play the video normally?
(Assignee)

Comment 13

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> (In reply to leo.bugzilla.gecko from comment #9)
> > (In reply to Sotaro Ikeda [:sotaro] from comment #2)
> > 
> > It can be this line also.
> > (in my build, call stack in dump file is exactly here)
> > 
> >  CHECK_EQ((int)mState, (int)EXECUTING_TO_IDLE);
> > 
> > Is it possible for Gecko to change OMX state from Paused to idle?
> 
> From the code, it should be prevented the transition. But the check failure
> says that state transition is not correctly handled in OMXCodec.

Diego, do you have any idea about it?
Flags: needinfo?(dwilson)
(Reporter)

Comment 14

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
> How long do you play the video normally?

I'm not sure...
Probably for about 2 to 3 minutes.
(Reporter)

Comment 15

5 years ago
I'm not sure whether this is the same crash or not,
but I found a 100% reproducible case.

STR
1. Go to YouTube in the Browser app.
2. Play any content.
3. Pause the content.
4. Go back to the list.

Then the crash occurs every time.
Leo - Can you get a crash report ID for the crash you hit in comment 15?

You can do this by following the directions here - https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Getting_crashes_off_the_Device.
(Reporter)

Comment 17

5 years ago
(In reply to Jason Smith [:jsmith] from comment #16)
> Leo - Can you get a crash report ID for the crash you hit in comment 15?
> 
> You can do this by following the directions here -
> https://wiki.mozilla.org/B2G/QA/
> Tips_And_Tricks#Getting_crashes_off_the_Device.

I'm not sure whether it was reported correctly or not.
Here's the ID.

00410060-7cc8-7e7f-19092128-50c7f5a2
I'm not finding a crash report at that ID - https://crash-stats.mozilla.com/report/index/00410060-7cc8-7e7f-19092128-50c7f5a2. I'm getting page not found.

Comment 19

5 years ago
(In reply to leo.bugzilla.gecko from comment #17)
> 00410060-7cc8-7e7f-19092128-50c7f5a2
You need to connect your device to WiFi to get this crash submitted.
(Reporter)

Comment 20

5 years ago
(In reply to Scoobidiver from comment #19)
> (In reply to leo.bugzilla.gecko from comment #17)
> > 00410060-7cc8-7e7f-19092128-50c7f5a2
> You need to connect your device to WiFi to get this crash submitted.

Yes... I think it was not uploaded because I'm using a 3G connection.
I'm trying to reproduce the problem and upload a crash report.
(Reporter)

Comment 21

5 years ago
Hmm... the crash doesn't occur with the same procedure.
I'm checking whether there's another precondition.

Comment 22

5 years ago
(In reply to leo.bugzilla.gecko from comment #20)
> (In reply to Scoobidiver from comment #19)
> > (In reply to leo.bugzilla.gecko from comment #17)
> > > 00410060-7cc8-7e7f-19092128-50c7f5a2
> > You need to connect your device to WiFi to get this crash submitted.
> 
> Yes......I think it's not uploaded because I'm using 3G connection.
> I'm trying to reproduce problem and upload crash report.

Note that it should be enough to connect to wifi after you crashed and any pending crashes should be submitted. (And apart from being in the submitted/ instead of pending/ subdirectory, the submitted crash also gets a modified crash ID that ends with 6 digits representing the submission date - so if one knows that, it's easy to see that the ID you provided is from a crash that hasn't been submitted correctly.)
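
The rule of thumb above (a submitted crash ID ends with six digits for the submission date) can be sketched as a small check. This is only an illustration: looksSubmitted is a hypothetical helper, not part of any Mozilla tool, and it tests nothing beyond "the last six characters are decimal digits".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of any Mozilla tool. Returns true when the
// last six characters of the crash ID are all decimal digits, which is how
// the comment above describes the date suffix of a submitted crash.
bool looksSubmitted(const std::string& crashId) {
    const std::size_t kDateDigits = 6;
    if (crashId.size() < kDateDigits) {
        return false;
    }
    for (std::size_t i = crashId.size() - kDateDigits; i < crashId.size(); ++i) {
        // Cast to unsigned char so std::isdigit is well defined for any byte.
        if (!std::isdigit(static_cast<unsigned char>(crashId[i]))) {
            return false;
        }
    }
    return true;
}
```

For the ID from comment 17, 00410060-7cc8-7e7f-19092128-50c7f5a2, the last six characters are "c7f5a2", so the check returns false. A pending ID could still end in six digits by coincidence, so a true result is only a hint.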
(Assignee)

Comment 23

5 years ago
> > It can be this line also.
> > (in my build, call stack in dump file is exactly here)
> > 
> >  CHECK_EQ((int)mState, (int)EXECUTING_TO_IDLE);

I cannot reproduce the crash. From the source code, the following scenario seems possible.

[precondition] OMXCodec is in the EXECUTING state
- 1. OMXCodec::stop() is called.
- 2. OMXCodec's state changes to EXECUTING_TO_IDLE.
- 3. OMXCodec asks the OMX IL component to set its state to Idle.
- 4. OMXCodec detects an error somewhere around data handling
       and changes its state to ERROR.
- 5. OMXCodec receives the command-complete event for the state change to Idle,
       but OMXCodec's state is not EXECUTING_TO_IDLE.
(Reporter)

Updated

5 years ago
Priority: -- → P1
Target Milestone: --- → 1.1 QE3 (24jun)
(Reporter)

Comment 24

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #23)
> > > It can be this line also.
> > > (in my build, call stack in dump file is exactly here)
> > > 
> > >  CHECK_EQ((int)mState, (int)EXECUTING_TO_IDLE);
> 
> I cannot reproduce the crash. From the source code, the following scenario
> seems possible.
> 
> [precondition] OMXCodec is in the EXECUTING state
> - 1. OMXCodec::stop() is called.
> - 2. OMXCodec's state changes to EXECUTING_TO_IDLE.
> - 3. OMXCodec asks the OMX IL component to set its state to Idle.
> - 4. OMXCodec detects an error somewhere around data handling
>        and changes its state to ERROR.
> - 5. OMXCodec receives the command-complete event for the state change to Idle,
>        but OMXCodec's state is not EXECUTING_TO_IDLE.

In that case the crash does not occur, because there is a conditional branch just above the check.

if(mState == ERROR) {
    CODEC_LOGV("mState in ERROR when moving from Executing to Idle\n");
} else {
    CHECK_EQ((int)mState, (int)EXECUTING_TO_IDLE);
}
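
To make the argument concrete, here is a minimal toy model of the sequence from comment 23 combined with the branch quoted above. It is a sketch, not the real android::OMXCodec: ToyCodec, its methods, and the State enum are hypothetical names, and an exception stands in for the fatal CHECK_EQ.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified stand-in for the OMXCodec states named in this bug.
enum class State { Executing, ExecutingToIdle, Idle, Error };

// Toy model only; none of this is the real libstagefright code.
struct ToyCodec {
    State state = State::Executing;

    // Steps 1-3: stop() moves the codec to EXECUTING_TO_IDLE.
    void stop() { state = State::ExecutingToIdle; }

    // Step 4: an error is detected while the transition is still pending.
    void onError() { state = State::Error; }

    // Step 5: "command complete" arrives for the change to Idle.
    // Returns false when the ERROR branch is taken and the check is skipped.
    // Throws where the real code would fail the CHECK_EQ.
    bool onIdleComplete() {
        if (state == State::Error) {
            return false;  // only logs in the quoted code, no assertion
        }
        if (state != State::ExecutingToIdle) {
            throw std::logic_error("state is not EXECUTING_TO_IDLE");
        }
        state = State::Idle;
        return true;
    }
};
```

In this model, stop() followed by onError() and then onIdleComplete() returns false without throwing, which matches the point that the ERROR path alone does not explain the crash. The check only fires when the state is something other than ERROR or EXECUTING_TO_IDLE.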
I think my test team can reproduce it. Someone suggested a libstagefright patch [1] so we're testing it out now. 

[1] https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/?id=f3690ee250f92c4810d19dc27ba7c208e408ae16
Flags: needinfo?(dwilson)
(Reporter)

Updated

5 years ago
blocking-b2g: leo? → leo+
(Assignee)

Comment 26

5 years ago
I intentionally added a delay to MediaStreamSource::readAt() and succeeded in generating a similar fault. According to the logcat, the fault happened at the following code.

> CHECK_EQ(
>      countBuffersWeOwn(mPortBuffers[kPortIndexInput]),
>      mPortBuffers[kPortIndexInput].size());

The related logcat lines follow.
-------------------------------

E OMXCodec: [OMX.google.aac.decoder] in error state, check omx il state and decide whether to free or skip
E OMXCodec: [OMX.google.aac.decoder] OMX IL is in state 3
F OMXCodec: frameworks/base/media/libstagefright/OMXCodec.cpp:3744 CHECK_EQ( countBuffersWeOwn(mPortBuffers[kPortIndexInput]),mPortBuffers[kPortIndexInput].size()) failed: 3 vs. 4
F libc    : Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1)
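
The "3 vs. 4" in the log can be read as "4 input buffers exist, but only 3 are back with the codec". A minimal sketch of that invariant follows. The types are toy stand-ins: ToyBuffer, Owner, and allBuffersReturned are hypothetical, and only the name countBuffersWeOwn is borrowed from the CHECK_EQ above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ownership tag for illustration only.
enum class Owner { Us, Component };

struct ToyBuffer {
    Owner owner;
};

// Counts the buffers currently held by the codec side.
std::size_t countBuffersWeOwn(const std::vector<ToyBuffer>& buffers) {
    std::size_t owned = 0;
    for (const ToyBuffer& buffer : buffers) {
        if (buffer.owner == Owner::Us) {
            ++owned;
        }
    }
    return owned;
}

// The condition the CHECK_EQ expresses: every buffer on the port has been
// returned before the port is torn down.
bool allBuffersReturned(const std::vector<ToyBuffer>& buffers) {
    return countBuffersWeOwn(buffers) == buffers.size();
}
```

With four buffers of which one is still tagged Owner::Component, countBuffersWeOwn returns 3 while size() is 4, so allBuffersReturned is false. That is the same mismatch the fatal log line reports.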
(Assignee)

Comment 27

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #26)
> I intentionally added a delay to MediaStreamSource::readAt() and succeeded
> to generate similar fault. From the logcat, fault happend at following code.

I reproduced the fault with the following STR:
[1] Start the Browser app.
[2] Play back an H.264 video in the Browser app (not on the YouTube site).
[3] Wait until the video freezes.
[4] When the video is frozen and only audio is playing, refresh the H.264 video site.
(Assignee)

Updated

5 years ago
Assignee: nobody → sotaro.ikeda.g
(Assignee)

Comment 28

5 years ago
(In reply to Diego Wilson [:diego] from comment #25)
> I think my test team can reproduce it. Someone suggested a libstagefright
> patch [1] so we're testing it out now. 
> 
> [1]
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/
> ?id=f3690ee250f92c4810d19dc27ba7c208e408ae16

By applying the change the crash at comment #26 if fixed. I confirmed on MozBuild v1.1 leo.
(Assignee)

Comment 29

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> By applying the change the crash at comment #26 if fixed.

Correction: 
the crash at comment #26 is fixed.
(Assignee)

Comment 30

5 years ago
(In reply to Diego Wilson [:diego] from comment #25)
> I think my test team can reproduce it. Someone suggested a libstagefright
> patch [1] so we're testing it out now. 
> 
> [1]
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/
> ?id=f3690ee250f92c4810d19dc27ba7c208e408ae16

Leo, is the patch already enabled on leo's ROM?
(Reporter)

Comment 31

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> (In reply to Diego Wilson [:diego] from comment #25)
> > I think my test team can reproduce it. Someone suggested a libstagefright
> > patch [1] so we're testing it out now. 
> > 
> > [1]
> > https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/
> > ?id=f3690ee250f92c4810d19dc27ba7c208e408ae16
> 
> Leo, is the patch already enabled on leo's ROM?

Not yet.
I will apply it ASAP.
(This problem was reported to me by a tester, but I cannot reproduce it yet. :( )
(Reporter)

Comment 32

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> (In reply to Diego Wilson [:diego] from comment #25)
> > I think my test team can reproduce it. Someone suggested a libstagefright
> > patch [1] so we're testing it out now. 
> > 
> > [1]
> > https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/
> > ?id=f3690ee250f92c4810d19dc27ba7c208e408ae16
> 
> Leo, is the patch already enabled on leo's ROM?

The patch is applied on LG side.
(In reply to leo.bugzilla.gecko from comment #32)
> (In reply to Sotaro Ikeda [:sotaro] from comment #30)
> > (In reply to Diego Wilson [:diego] from comment #25)
> > > I think my test team can reproduce it. Someone suggested a libstagefright
> > > patch [1] so we're testing it out now. 
> > > 
> > > [1]
> > > https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/
> > > ?id=f3690ee250f92c4810d19dc27ba7c208e408ae16
> > 
> > Leo, is the patch already enabled on leo's ROM?
> 
> The patch is applied on LG side.

Did it solve the issue?
(Reporter)

Comment 34

5 years ago
(In reply to Diego Wilson [:diego] from comment #33)
> Did it solve the issue?

I couldn't reproduce this problem either,
so I will ask the tester who reported it.
(In reply to Diego Wilson [:diego] from comment #25)
> I think my test team can reproduce it. Someone suggested a libstagefright
> patch [1] so we're testing it out now. 
> 
> [1]
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/commit/
> ?id=f3690ee250f92c4810d19dc27ba7c208e408ae16

Landing this patch in pre-production partner repos but not upstreaming to Mozilla puts us out of sync and prevents applicable testing. Please request review immediately if we think this patch needs to end up in 1.1.
(In reply to Alex Keybl [:akeybl] from comment #35)
> Landing this patch in pre-production partner repos but not upstreaming to
> Mozilla puts us out of sync and prevents applicable testing. Please request
> review immediately if we think this patch needs to end up in 1.1.

FYI this is a CAF Android patch that has not landed on CAF 1.1 gonk. I was waiting on leo.bugzilla.gecko@gmail.com to report back before porting it.
(Reporter)

Comment 37

5 years ago
(In reply to Diego Wilson [:diego] from comment #36)
> FYI this is a CAF Android patch that has not landed on CAF 1.1 gonk. I was
> waiting on leo.bugzilla.gecko@gmail.com to report back before porting it.

Testing will start next Monday. I think I can give you feedback after a week.
Leo, has this been tested with the new release?

Thanks
Flags: needinfo?(leo.bugzilla.gecko)
marking POVB per comment 36
Whiteboard: [b2g-crash] → [b2g-crash] [POVB]
(Reporter)

Comment 40

5 years ago
(In reply to Wayne Chang [:wchang] from comment #38)
> Leo, has this been tested with the new release?
> 
> Thanks

Yes.

I applied the patch on the LG side for testing,
and the new version is being tested.
Flags: needinfo?(leo.bugzilla.gecko)
(Reporter)

Comment 41

5 years ago
I think the patch works.

I have not received any crash reports during testing.
(Reporter)

Comment 42

5 years ago
And I also confirm that the patch is applied in CAF (AU149)
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Comment 44

5 years ago
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/base/commit/
> ?h=b2g/ics_strawberry&id=bb9f8618ff7a1dfb5627b3c13219f6583a16e470

The above change is enabled in the b2g/ics_strawberry branch. MozBuild uses that branch on leo and helix. hamachi also uses the ICS_STRAWBERRY platform, but uses the b2g/ics_strawberry_v1 branch on MozBuild.
(Assignee)

Updated

5 years ago
status-b2g18: --- → fixed
status-b2g18-v1.0.0: --- → wontfix
status-b2g18-v1.0.1: --- → wontfix
status-b2g-v1.1hd: --- → fixed