1341360 - [Fennec] Crash in libstagefright.so

Reporter

Description

•

8 years ago

[Tracking Requested - why for this release]: This bug was filed from the Socorro interface and is report bp-a8f63bf6-9355-4801-8e51-337dc2170221. ============================================================= crashes in libstagefright.so are increasing in the 52.0b cycle - they are now accounting for 17% of crashes in 52.0b6 (but were just ~8% in 51.0b).

Milan Sreckovic [:milan] (needinfo for best results)

Comment 1

•

8 years ago

This bug 1292144 as well? Note that volume is dropping, perhaps it's moved to one of the signatures tracked here?

Comment 2

•

8 years ago

Marking as blocking 52, we should probably find out what causes this spike.

tracking-firefox52: ? → blocking

[:philipp]

Reporter

Comment 3

•

8 years ago

the numbers regarding crash volume in comment #0 were based on all crash reports with a signature starting with libstagefright.so*. when looking into correlations for the top signatures in 52.0b they all seem to be related to a particular device/chipset: * libstagefright.so@0xcd7f1 (100.0% in signature vs 00.84% overall) android_model = Lenovo TAB 2 A10-70F * libstagefright.so@0x11d8e6 (100.0% in signature vs 00.52% overall) android_model = SM-G355M * libstagefright.so@0x869ed (100.0% in signature vs 01.72% overall) android_hardware = mt6755 (OPPO) * libstagefright.so@0xdd51d (100.0% in signature vs 02.66% overall) android_hardware = mt6752 (Sony) * libstagefright.so@0x1ea722 (100.0% in signature vs 05.29% overall) android_hardware = mt6582

tracking-firefox52: blocking → ?

Summary: Crash in libstagefright.so@0xcd7f1 → Crash in libstagefright.so

Julien Cristau [:jcristau]

Updated

•

8 years ago

tracking-firefox52: ? → blocking

Milan Sreckovic [:milan] (needinfo for best results)

Comment 4

•

8 years ago

It would make sense for the addresses in the module to be version/device specific; they may very well all be custom :)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 5

•

8 years ago

Blake, John, this is another media-related crash that has apparently spiked on Beta.

Flags: needinfo?(jolin)

Flags: needinfo?(bwu)

Astley Chen (inactive)

Updated

•

8 years ago

status-firefox51: --- → affected

Component: General → Audio/Video

Summary: Crash in libstagefright.so → [Fennec] Crash in libstagefright.so

John Lin [:jhlin][:jolin]

Assignee

Comment 6

•

8 years ago

Most, if not all, of collected logcat dumps show the visited page creates and uses many codec simultaneously. Looks like a video gallery page with more than a few video element could exhaust mediaserver and bring it down. This in turn will mess IPC/binder up and confuse sanity check code in ACodec, then abort browser process. James, does Fennec have some way to constrain the number of codecs in use?

Flags: needinfo?(jolin)

Flags: needinfo?(bwu)

John Lin [:jhlin][:jolin]

Assignee

Updated

•

8 years ago

Updated

•

8 years ago

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 7

•

8 years ago

Ok, we had similar problems before OOP decoding too. Eugen was working on some patches to limit decoders, but I don't know the status there.

Flags: needinfo?(snorp) → needinfo?(esawin)

Eugen Sawin [:esawin]

Comment 8

•

8 years ago

We can limit the number of supported decoder pairs via the "media.decoder.limit" pref since version 50. Currently, we set the limit to 1 by default on devices running Android 4.2 and earlier [1]. [1] https://searchfox.org/mozilla-central/source/dom/media/MediaPrefs.h#185

Flags: needinfo?(esawin)

John Lin [:jhlin][:jolin]

Assignee

Comment 9

•

8 years ago

For Android API 18 and later, hard-coding a limit on devices that support multiple decoder instances seems too extreme to me. How about rejecting software decoder by comparing the return value of MediaCodec.getName() [1] with "OMX.google*" and "OMX.*" like what stagefright does [2]? [1] https://developer.android.com/reference/android/media/MediaCodec.html#getName() [2] http://androidxref.com/5.0.0_r2/xref/frameworks/av/media/libstagefright/OMXCodec.cpp#147

Flags: needinfo?(snorp)

Flags: needinfo?(esawin)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 10

•

8 years ago

I like that idea. It seems like we already talked about something like that? I think we wanted to reject software vp9 in some cases.

Flags: needinfo?(snorp)

Eugen Sawin [:esawin]

Comment 11

•

8 years ago

Either way, we should only impose those limits on affected devices. Each of those signatures seems to be specific to 1-2 SoCs.

Flags: needinfo?(esawin)

John Lin [:jhlin][:jolin]

Assignee

Comment 12

•

8 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #10) > I like that idea. It seems like we already talked about something like that? > I think we wanted to reject software vp9 in some cases. I found - bug 1232911, which rejects VPX for specific devices, and - bug 1286738, which rejects software VP9 to make YouTube serve H.264 to Fennec Both use predefined conditions rather than running status. It seems another mechanism that rejects during initialization is needed for this bug. Will upload the patch later.

John Lin [:jhlin][:jolin]

Assignee

Comment 13

•

8 years ago

(In reply to Eugen Sawin [:esawin] from comment #11) > Either way, we should only impose those limits on affected devices. Each of > those signatures seems to be specific to 1-2 SoCs. The majority of crashes are mt*(MTK), few are Exynos(Samsung) and qcom (Qualcomm), and a handful of some that I don't recognize. Imposing the limits on MTK devices only should significantly reduce the number of crashes, but applying this universally seems safer to me.

Gerry Chang [:gchang]

Comment 14

•

8 years ago

We don't have the plan to have dot release for 51. Mark 51 won't fix.

status-firefox51: affected → wontfix

Astley Chen (inactive)

Comment 15

•

8 years ago

52 RC build is around the corner, it would be great if we can have a new build for beta users as trial to see whether crash rate is trending down or not.

Severity: critical → blocker

Blake Wu [:bwu][:blakewu]

Comment 16

•

8 years ago

Some crashes result from ACodec : frameworks/av/media/libstagefright/ACodec.cpp:1762 CHECK_EQ( metaData->eType,kMetadataBufferTypeGrallocSource) failed: 0 vs. 1

Julien Cristau [:jcristau]

Comment 17

•

8 years ago

After looking at this in crash stats, it seems the spike started around January 23/24 on fennec 50.1.0 (which was released over a month before that), and then followed on 51.0.2 and 51.0.3 when that was rolled out. So it seems like this was caused by other factors, and I'm not convinced 52 is actually any worse off than previous versions.

Julien Cristau [:jcristau]

Comment 18

•

8 years ago

I'm removing the blocker flag as I'm not convinced this is actually much worse in 52 than 50 or 51. It *does* look like something we should try to figure out and fix with high priority, but at this point I don't think holding the release is necessary or useful.

tracking-firefox52: blocking → +

John Lin [:jhlin][:jolin]

Assignee

Comment 19

•

8 years ago

Turns out merely rejecting software decoder is not sufficient. After applying the patch I can still crash mediaserver process when visiting the page. Also browsed with Chrome and it looks just fine. However when examining the log, I noticed that it doesn't use software decoder either. Will dig deeper to see if it has other resource management tricks.

Blake Wu [:bwu][:blakewu]

Updated

•

8 years ago

Assignee: nobody → jolin

Priority: -- → P1

John Lin [:jhlin][:jolin]

Assignee

Comment 20

•

8 years ago

I have the patch using only hardware video decoder that seems reduce the chances of crash on my Z3C/Android 5.1.1. But it cannot stop mediaserver process crash from triggering assert/abort code. That looks like a known issue to Google which was fixed in Mashmallow [1]. Since OOP decoding is going to be on in 53, it seems to me that spending more time on this one is not a good investment of efforts. James and Blake, what do you think? Should we uplift the patch, or just use the existing pref? FWIW, I vote for the later. :P [1] https://android.googlesource.com/platform/frameworks/av/+/48a31bf3f1c1ed5953a4e64f71cdf528f3a38ee5

Flags: needinfo?(snorp)

Flags: needinfo?(bwu)

Blake Wu [:bwu][:blakewu]

Comment 21

•

8 years ago

SGTM. Let's uplift to 53 first. From the crash reports, it looks most cashes comes from low end phones. I am going to get some of those phones and we can verify your patch as well. And agree that we also need to make OOP more stable to mitigate this kind of crashes.

Flags: needinfo?(bwu)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 22

•

8 years ago

Yup, sounds fine. Thanks!

Flags: needinfo?(snorp)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 23

•

8 years ago

(In reply to John Lin [:jolin][:jhlin] from comment #20) > ... > > Since OOP decoding is going to be on in 53, it seems to me that spending > more time on this one is not a good investment of efforts. This is a surprise - I thought out of process decoding is only enabled by default where we have the compositor process, which is Windows.

John Lin [:jhlin][:jolin]

Assignee

Comment 24

•

8 years ago

(In reply to Milan Sreckovic [:milan] from comment #23) > (In reply to John Lin [:jolin][:jhlin] from comment #20) > > ... > > > > Since OOP decoding is going to be on in 53, it seems to me that spending > > more time on this one is not a good investment of efforts. > > This is a surprise - I thought out of process decoding is only enabled by > default where we have the compositor process, which is Windows. Fennec has its own OOP decoding implementation. Please see bug 1257777.

Liz Henry (:lizzard) (relman/hg->git project)

Comment 27

•

8 years ago

What patch are you aiming to uplift, here?

Flags: needinfo?(jolin)

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

8 years ago

status-firefox53: --- → affected

status-firefox54: --- → ?

status-firefox55: --- → ?

John Lin [:jhlin][:jolin]

Assignee

Comment 28

•

8 years ago

Will send the patch for reviewing later. Thanks for the reminder.

Flags: needinfo?(jolin)

John Lin [:jhlin][:jolin]

Assignee

Comment 29

•

8 years ago

Attached patch Query the component names and reject SW decoders. — Details — Splinter Review

MediaCodec.createDecoderByType() allocates SW decoder automatically when it fails to get a HW one. To avoid that, use MediaCodecList to query decoder names ourselves and call MediaCodec.createByCodecName() instead.

Attachment #8846927 - Flags: review?(esawin)

Eugen Sawin [:esawin]

Comment 30

•

8 years ago

Comment on attachment 8846927 [details] [diff] [review] Query the component names and reject SW decoders. Review of attachment 8846927 [details] [diff] [review]: ----------------------------------------------------------------- Just some nits/suggestions. ::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/HardwareCodecCapabilityUtils.java @@ +45,5 @@ > + return !findDecoderNamesForMimeType(aMimeType, false /* HW only */, true /* 1st only */).isEmpty(); > + } > + > + private static List<String> findDecoderNamesForMimeType(String aMimeType, > + boolean aHardwareOnly, Alternatively, we could add an enum (e.g., HARDWARE_ONLY, HARDWARE_SOFTWARE) instead to improve readability without inline comments. @@ +46,5 @@ > + } > + > + private static List<String> findDecoderNamesForMimeType(String aMimeType, > + boolean aHardwareOnly, > + boolean aFirstOnly) { Do we expect retrieving the whole list to cause a performance regression? I'm not sure whether adding the aFirstOnly flag is warranted. @@ +63,5 @@ > for (String mimeType : info.getSupportedTypes()) { > if (mimeType.equals(aMimeType)) { > + names.add(name); > + if (aFirstOnly) { > + break iterate; Please move the break into the outer loop (cond: aFirstOnly && names.size()) to avoid the label.

Attachment #8846927 - Flags: review?(esawin) → review+

Julien Cristau [:jcristau]

Comment 32

•

8 years ago

Mass wontfix for bugs affecting firefox 52.

status-firefox52: affected → wontfix

Blake Wu [:bwu][:blakewu]

Comment 33

•

8 years ago

We put our resource to ship OOP in 54 to mitigate libstagefright crashes.

status-firefox53: affected → wontfix

Marco Castelluccio [:marco]

Comment 34

•

8 years ago

These signatures dropped significantly recently. Looking at the URLs, before April 20 we were crashing really often on some specific porn websites (xnxx.com, xvideos.com), with ~20000 reports in one week; we are now crashing way more rarely on those websites (~800 reports over the past week). From 11 to 18, these two websites accounted for ~75% of crash reports where the URL field is not empty (~10% of all crash reports).

Emma Humphries ☕️🎸🧞‍♀️✨ (she/they) [:emceeaich] (Pacific Time) use needinfo

Comment 35

•

8 years ago

This is an assigned P1 bug without activity in two weeks. If you intend to continue working on this bug for the current release/iteration/sprint, remove the 'stale-bug' keyword. Otherwise we'll reset the priority of the bug back to '--' on Monday, August 28th.

Keywords: stale-bug

Blake Wu [:bwu][:blakewu]

Comment 36

•

7 years ago

Mass change P1->P2 to align with new Mozilla triage process.

Priority: P1 → P2

Blake Wu [:bwu][:blakewu]

Comment 37

•

7 years ago

With out-of-process decoding(Bug 1333323), it is not a problem anymore.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INVALID

Blake Wu [:bwu][:blakewu]

Updated

•

7 years ago

Resolution: INVALID → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox54: ? → fixed

status-firefox55: ? → ---

Keywords: stale-bug

BMO Automation

Updated

•

4 years ago

Product: Firefox for Android → Firefox for Android Graveyard