Open Bug 802827 Opened 7 years ago Updated 4 years ago

Crash in nsMediaPluginHost::DestroyDecoder @ libstagefrighthw.so@0xca3 on devices with NVIDIA Tegra/Tegra 3 GPUs

Categories

(Core :: Audio/Video: Playback, defect, P5, critical)

17 Branch
ARM
Android
defect

Tracking

()

mozilla20
Tracking Status
firefox17 + wontfix
firefox18 + wontfix
firefox19 + wontfix
firefox20 - affected
firefox21 --- affected
fennec + ---

People

(Reporter: aaronmt, Assigned: bjacob)

Details

(Keywords: crash, regression, reproducible, Whiteboard: [native-crash], [hwdecoder] [leave open])

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-39be186f-0e74-4a5e-ae82-53e532121017.
============================================================= 

I am able to reproduce this crash rather easily on mozilla-central with my Transformer Prime (TF201), Nightly (10/17).

bp-39be186f-0e74-4a5e-ae82-53e532121017

STR

i) http://people.mozilla.com/~atrain/mobile/tests/media.html
ii) Start playing the very bottom Big Buck bunny video
iii) Quit Fennec during playback

--
Asus Transformer Prime (TF201, Android 4.1.1)
Nightly (10/17)
Hardware: All → ARM
It's #9 top crasher in 17.0b2.

Here is the breakdown by device in 17.0 Beta over the last week:
Acer A500 	45
Asus Transformer 	35
Asus Transformer TF101 	34
Samsung GT-P7510 	29
Asus Transformer Pad TF300T 	28
NVidia VegaBean Beta 5 	22
Samsung GT-P7500 	13
HTC One X 	12
Lenovo K1 	12
Unknown Malata SMBA1002 	10
Unknown Notion Ink ADAM 	8
Acer A501 	8
Samsung GT-P7310 	8
Acer A200 	6
Samsung GT-P7300 	5
Acer A100 	4
Motorola MB860 	2
Toshiba ac100 	2
NVidia VegaBean Beta 6 	2
Asus Transformer Prime TF201 	1
tracking-fennec: --- → ?
Keywords: regression, topcrash
Summary: Crash in [@ @0x0 | libstagefrighthw.so@0xca3] → Crash in nsMediaPluginHost::DestroyDecoder @ libstagefrighthw.so@0xca3 on devices with NVIDIA Tegra/Tegra 3 GPUs
Version: Trunk → 17 Branch
Edwin, can you see if you can duplicate this on your One X?
QA Contact: eflores
Assigning to Chris to follow up once there are (hopefully) STR and logs from a device.  Anything we track for 17 at this point must have an assignee so if you're not the right person, please hand off to another person who is not 'nobody'.
Assignee: nobody → chris.double
QA Contact: eflores
Assuming this is a hw decoder problem given the GPU comment..

The plan is to land Bug 806369 (create a blacklist) to deal with this blocker issue here for Firefox 17.  The root cause fix will be addressed after Firefox 17.
Depends on: 806369
Whiteboard: [native-crash] → [native-crash], [hwdecoder]
Assignee: chris.double → nchen
Jim has one of these and is going to take a look
tracking-fennec: ? → 17+
Aaron - Can you still reproduce this bug? If so, could you show it to Kats?
(In reply to Mark Finkle (:mfinkle) from comment #6)
> Aaron - Can you still reproduce this bug? If so, could you show it to Kats?

Yes (TF201, CM10 (4.1), Nightly 11/01).
I still can't seem to reproduce this bug. I installed a recent CM10 build (cm10-20121026-unofficial-tf201) and nightly (11/05). Quitting Fennec while playing the video clip did not produce a crash.
We're at the stage in 17 beta cycle where we won't be able to take any more speculative fixes so it looks like this will have to remain unfixed in 17.
(In reply to Lukas Blakk [:lsblakk] from comment #9)
> it looks like this will have to remain unfixed in 17.
Unless some devices (model or GPU) are blocked for StageFright decoding.
Who is responsible for the blocking of devices to prevent SW & HW decoding crashes?
It seems that bug 806369 will partially take care of these crashes, except for Samsung devices (18.5% of crashes) that are whitelisted for StageFright on ICS.
Jim can't reproduce, bouncing back over to Chris
Assignee: nchen → chris.double
I don't have a device to test this on so can't reproduce either.
Despite the Samsung device whitelisting for StageFright in 17.0b5, it still happens on non-Samsung devices and is currently the #2 top crasher in 17.0b5.
I suggest blocking the devices that are crashing until someone gets a device they can reproduce on. Was an answer made to comment 10:

"Who is responsible for the blocking of devices to prevent SW & HW decoding crashes?"

I assume the android team.
(In reply to Chris Double (:doublec) from comment #15)
> I suggest blocking the devices that are crashing until someone gets a device
> they can reproduce on.
As said in comment 14, they still crash despite the blocklisting for StageFright decoding on ICS for all devices except Samsung ones.
(In reply to Scoobidiver from comment #16)
> (In reply to Chris Double (:doublec) from comment #15)
> As said in comment 14, they still crash despite the blocklisting for
> StageFright decoding on ICS for all devices except Samsung ones.

If they're crashing, they're not blacklisted. This is because blacklisting prevents loading the code that crashes. Therefore either these devices are not blacklisted or the blacklist is broken.
(In reply to Chris Double (:doublec) from comment #17)
> (In reply to Scoobidiver from comment #16)
> > (In reply to Chris Double (:doublec) from comment #15)
> > As said in comment 14, they still crash despite the blocklisting for
> > StageFright decoding on ICS for all devices except Samsung ones.
> 
> If they're crashing, they're not blacklisted. This is because blacklisting
> prevents loading the code that crashes. Therefore either these devices are
> not blacklisted or the blacklist is broken.

Either way, this sounds like a new (separate) bug against the patch in Bug 806369 until we know more.  

In terms of who would "own" that, doublec is correct: the Android folks with the help of bjacob own implementing & debugging the blacklist code (i.e. making sure that devices get blacklisted/whitelisted correctly).
no longer tracking 17 because we have a whitelist in place
tracking-fennec: 17+ → +
It's #11 top crasher in 17.0, #4 in 18.0b4, #11 in 19.0a2, and #7 in 20.0a1.

(In reply to Brad Lassey [:blassey] from comment #19)
> no longer tracking 17 because we have a whitelist in place
The StageFright whitelist/blacklist doesn't work properly according to correlations per device in 17.0:
Device                  Android version  Status               Crashes in one day
Acer A500               4.0.3/4.1.2      blacklist/whitelist  55
Asus Transformer        4.0.3            blacklist            34
Samsung GT-P7500        3.2/4.0.4        blacklist/whitelist  22
Samsung GT-P7510        3.2/4.0.4        blacklist/whitelist  21
Asus Transformer TF101  4.0.3/4.1.1      blacklist/whitelist  18
Samsung GT-P7300        3.2              blacklist            13
NVidia VegaBean Beta 6  4.0.3            blacklist            13
Lenovo K1               4.1.2            whitelist            10
HTC One X               4.0.4/4.2.1      blacklist/whitelist  7
tracking-fennec: + → ?
Is there a way to tell if these crashers have stagefright.force-enabled turned on? Unlikely but would be good to know.
(In reply to Chris Double (:doublec) from comment #21)
> Is there a way to tell if these crashers have stagefright.force-enabled
> turned on? Unlikely but would be good to know.

There isn't a way at the moment, but it's easy to add such an annotation in AppNotes (also telling if Stagefright was used at all during a session, and whether it was successfully initialized or failed/denied). That's what ScopedGfxFeatureReporter does. Patch coming.
(In reply to Scoobidiver from comment #1)
> It's #9 top crasher in 17.0b2.
> 
> Here is the breakdown by device in 17.0 Beta over the last week:

Two questions:

 - how do you obtain this breakdown? (how can I reproduce/tweak it)
 - can you obtain specific Model/Product/Hardware/Manufacturer Android strings?
(In reply to Benoit Jacob [:bjacob] from comment #22)
> That's what ScopedGfxFeatureReporter does. Patch coming.
In a dedicated bug, please.
Add it also for other graphics features like WebGL anti-aliasing that can be forced.

(In reply to Benoit Jacob [:bjacob] from comment #23)
>  - how do you obtain this breakdown? (how can I reproduce/tweak it)
The first and fourth columns are obtained with https://crash-analysis.mozilla.com/rkaiser/2012-12-15/2012-12-15.fennecandroid.release.17.0.devices.html#sigs
The second column is obtained by manually checking crash reports sorted by OS: https://crash-stats.mozilla.com/report/list?signature=%400x0%20|%20libstagefrighthw.so%400xca3
The third column is deduced from https://wiki.mozilla.org/Blocklisting/Blocked_Graphics_Drivers#On_Android_2

>  - can you obtain specific Model/Product/Hardware/Manufacturer Android strings?
You should ask Robert Kaiser to improve his tools.
This crash is still reproducible on all channels.
A few remarks / questions.

1. The crash report linked in comment 0,

https://crash-stats.mozilla.com/report/index/39be186f-0e74-4a5e-ae82-53e532121017

is on Android 4.1:

     Android API Version	16 (REL)

That certainly explains why blacklisting is not taking effect there: we explicitly don't blacklist stagefright on Android 4.1 and newer, which we identify by the Android API Version being >= 16.

Note that in the OS Version field, this crash report *lies*: it mentions version 4.0.3. That shouldn't be trusted, then.

So, do we know how many of the crashes discussed here are on Android API Version 16 and newer? Unfortunately, the crashdata CSV files don't have Android API Version.

Because of the above-mentioned inconsistency between Android API Version and the OS Version string, I don't know if I can trust the version numbers in comment 20. Do you know how these version numbers are obtained?

2. Here is a breakdown of recent crashes with Android Model/Hardware/Product/Manufacturer strings, suitable to establish blacklist rules:

$ cat a.csv | grep libstagefrighthw.so@0xca3 | sed 's/^.*\(Model[^|]*\)|.*$/\1/g' | sort | uniq -c | sort -rn
    252 Model: A500, Product: picasso, Manufacturer: Acer, Hardware: picasso' 
    148 Model: Transformer, Product: US_epad, Manufacturer: asus, Hardware: ventana' 
     73 Model: GT-P7500, Product: GT-P7500, Manufacturer: samsung, Hardware: p3' 
     69 Model: GT-P7510, Product: GT-P7510, Manufacturer: samsung, Hardware: p3' 
     66 Model: Transformer TF101, Product: US_epad, Manufacturer: asus, Hardware: ventana' 
     61 Model: folio100, Product: betelgeuse, Manufacturer: toshiba, Hardware: betelgeuse' 
     47 Model: A200, Product: picasso_e, Manufacturer: Acer, Hardware: picasso_e' 
     44 Model: HTC One X, Product: endeavoru, Manufacturer: HTC, Hardware: endeavoru' 
     38 Model: AT100, Product: unknown, Manufacturer: TOSHIBA, Hardware: antares' 
     31 Model: GT-P7300, Product: GT-P7300, Manufacturer: samsung, Hardware: p3' 
     29 Model: K1, Product: cm_k1, Manufacturer: Lenovo, Hardware: ventana' 
     27 Model: VegaBean Beta 6, Product: full_shuttle, Manufacturer: NVidia, Hardware: harmony' 
     21 Model: LG-P990, Product: lge_star, Manufacturer: LGE, Hardware: star' 
     21 Model: A100, Product: vangogh, Manufacturer: Acer, Hardware: vangogh' 
     20 Model: Ideapad K1, Product: k1, Manufacturer: unknown, Hardware: ventana' 
     17 Model: Malata SMBA1002, Product: drh_smba1002, Manufacturer: unknown, Hardware: harmony' 
     16 Model: A501, Product: full_a501, Manufacturer: Acer, Hardware: picasso' 
     15 Model: GT-P7310, Product: GT-P7310, Manufacturer: samsung, Hardware: p3' 
     14 Model: ThinkPadTablet, Product: ThinkPadTablet, Manufacturer: LENOVO, Hardware: ventana' 
     14 Model: A500, Product: full_a500, Manufacturer: Acer, Hardware: ventana' 
     12 Model: A500, Product: full_a500, Manufacturer: Acer, Hardware: picasso' 
     10 Model: vangogh, Product: thrive, Manufacturer: acer, Hardware: vangogh' 
     10 Model: Transformer TF101, Product: full_tf101, Manufacturer: asus, Hardware: ventana' 
      9 Model: Transformer Prime TF201, Product: EeePad, Manufacturer: asus, Hardware: cardhu' 
      9 Model: tervigon, Product: tervigon, Manufacturer: Motorola, Hardware: harmony' 
      7 Model: ViewPad 10S, Product: ViewPad 10S, Manufacturer: ViewSonic, Hardware: harmony' 
      7 Model: Notion Ink ADAM, Product: full_adam_3g, Manufacturer: unknown, Hardware: harmony' 
      6 Model: SCH-I905, Product: SCH-I905, Manufacturer: samsung, Hardware: p3' 
      6 Model: HTC One X, Product: tg_endeavoru, Manufacturer: unknown, Hardware: endeavoru' 
      6 Model: galaxy tab gt-p7500, Product: nexus, Manufacturer: samsung, Hardware: p3' 
      6 Model: ac100, Product: paz00, Manufacturer: toshiba, Hardware: paz00' 
      5 Model: Transformer Pad TF300T, Product: EeePad, Manufacturer: asus, Hardware: cardhu' 
      5 Model: EndeavorU, Product: endeavoru, Manufacturer: HTC, Hardware: endeavoru' 
      5 Model: a200, Product: picasso_e, Manufacturer: Acer, Hardware: picasso_e' 
      4 Model: Transformer Pad, Product: EeePad, Manufacturer: Asus, Hardware: ventana' 
      4 Model: EPAD, Product: US_epad, Manufacturer: asus, Hardware: ventana' 
      3 Model: VegaBean Beta 2.2, Product: full_shuttle, Manufacturer: NVidia, Hardware: harmony' 
      3 Model: SGH-I927, Product: SGH-I927, Manufacturer: samsung, Hardware: n1' 
      3 Model: Notion Ink ADAM, Product: cm_adam, Manufacturer: unknown, Hardware: harmony' 
      3 Model: Notion Ink ADAM, Product: cm_adam_3g, Manufacturer: unknown, Hardware: harmony' 
      2 Model: Notion Ink ADAM, Product: aokp_adam_3g, Manufacturer: unknown, Hardware: harmony' 
      2 Model: MD_LIFETAB_P9516, Product: CM_P9516v61, Manufacturer: MEDION, Hardware: ventana' 
      2 Model: MB860, Product: olyatt, Manufacturer: motorola, Hardware: olympus' 
      2 Model: LG-SU880, Product: x2_skt_kr, Manufacturer: LGE, Hardware: bssq' 
      2 Model: L-06C, Product: lge_startablet, Manufacturer: LGE, Hardware: startablet' 
      2 Model: A501, Product: picasso, Manufacturer: Acer, Hardware: picasso' 
      2 Model: A501, Product: full_a501, Manufacturer: Acer, Hardware: ventana' 
      1 Model: VegaBean Beta 8, Product: full_shuttle, Manufacturer: NVidia, Hardware: harmony' 
      1 Model: 'VegaBean Alpha 2', Product: 'full_shuttle', Manufacturer: 'NVidia', Hardware: 'harmony''. 
      1 Model: 'Transformer', Product: 'US_epad', Manufacturer: 'asus', Hardware: 'ventana''. 
      1 Model: SGH-T989, Product: SGH-T989, Manufacturer: Samsung, Hardware: harmony' 
      1 Model: Notion Ink ADAM, Product: aokp_adam, Manufacturer: unknown, Hardware: harmony' 
      1 Model: 'Malata SMBA1002', Product: 'drh_smba1002', Manufacturer: 'unknown', Hardware: 'harmony''. 
      1 Model: Iconia A500, Product: full_a500, Manufacturer: Acer, Hardware: ventana' 
      1 Model: Iconia A500, Product: full_a500, Manufacturer: Acer, Hardware: picasso' 
      1 Model: GT-P7510, Product: full_GT-P7510, Manufacturer: samsung, Hardware: p3' 
      1 Model: GT-P7100, Product: GT-P7100, Manufacturer: samsung, Hardware: p3' 
      1 Model: Folio 100, Product: betelgeuse, Manufacturer: toshiba, Hardware: betelgeuse' 
      1 Model: A500, Product: Picasso, Manufacturer: Acer, Hardware: picasso' 

Note: this is using crash data from a few days in early December, the last one being Dec. 13, as many crashdata CSV files are currently missing (see bug 819881). Hopefully that is recent enough.

3. Next steps:
  - need to know if the problem is that we should blacklist Android 4.1 devices, or if the problem is that we fail to blacklist Android 4.0 devices. See above point 1.; this is pretty much blocking everything else.
  - If we need to add more Android 4.1 blacklist rules, we need to construct such rules from the data in point 2 above; for that we need to find factors in that table that have a strong correlation with this crash. The Hardware field is doing not too badly there:

bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: picasso' | wc -l
336
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: picasso' | wc -l
3666
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: ventana' | wc -l
314
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: ventana' | wc -l
2162
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: p3' | wc -l
202
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: p3' | wc -l
3315
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: betelgeuse' | wc -l
62
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: betelgeuse' | wc -l
77
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: picasso_e' | wc -l
52
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: picasso_e' | wc -l
738
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: endeavoru' | wc -l
55
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: endeavoru' | wc -l
3236
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Hardware: antares' | wc -l
38
bjacob:~/crash-stats$ cat a.csv | grep 'Hardware: antares' | wc -l
622


But maybe we can find something even better.
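
The grep counts above boil down to one ratio per 'Hardware' value: reports with this signature divided by all reports for that hardware. Here is a minimal Python sketch that redoes that arithmetic using only the counts quoted in this comment; the names `COUNTS` and `signature_share` are mine and do not come from any Mozilla tool.

```python
# Counts copied from the grep output in this comment:
# (reports with the libstagefrighthw.so@0xca3 signature, all reports) per 'Hardware' value.
COUNTS = {
    "picasso": (336, 3666),
    "ventana": (314, 2162),
    "p3": (202, 3315),
    "betelgeuse": (62, 77),
    "picasso_e": (52, 738),
    "endeavoru": (55, 3236),
    "antares": (38, 622),
}


def signature_share(counts):
    """Return {hardware: fraction of its reports that carry the signature}."""
    return {hw: sig / total for hw, (sig, total) in counts.items()}


if __name__ == "__main__":
    shares = signature_share(COUNTS)
    # List the hardware values from most to least affected.
    for hw, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{hw:12s} {share:6.1%}")
```

One caveat when reading these numbers: grep 'Hardware: picasso' also matches the 'picasso_e' lines, so the picasso row includes the picasso_e counts (the 336 matches 284 picasso plus 52 picasso_e entries in the table in point 2).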
(In reply to Scoobidiver from comment #24)
> (In reply to Benoit Jacob [:bjacob] from comment #22)
> > That's what ScopedGfxFeatureReporter does. Patch coming.
> In a dedicated bug, please.

OK.

> Add it also for other graphics features like WebGL anti-aliasing that can be
> forced.

This is already the case. Forced is indicated by an exclamation mark as in "WebGL!".
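
For anyone scanning crash reports for forced features, the exclamation-mark convention can be checked mechanically. This is only a sketch: it assumes the notes are whitespace-separated tokens, and the sample string and the `forced_features` helper are hypothetical, not part of Socorro or Gecko.

```python
def forced_features(app_notes):
    """Return the feature names whose token ends with '!' (the forced marker)."""
    return [
        token[:-1]
        for token in app_notes.split()
        if token.endswith("!") and len(token) > 1
    ]


if __name__ == "__main__":
    # Hypothetical notes string, for illustration only.
    print(forced_features("WebGL! SomeOtherFeature"))
```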
Depends on: 819881
Depends on: 822767
Filed bug 822767 about getting Android API Version in pub-crashdata.
(In reply to Benoit Jacob [:bjacob] from comment #26)
> I don't know if I can trust the version numbers in comment 20. Do you know how these version
> numbers are obtained?
I've updated comment 20 by manually checking crash reports, and the blacklist/whitelist works as expected:
Device                  API level  Status     Crashes in one day
Acer A500               16/17      whitelist  55
Asus Transformer        16         whitelist  34
Samsung GT-P7500        16         whitelist  22
Samsung GT-P7510        16         whitelist  21
Asus Transformer TF101  16         whitelist  18
Samsung GT-P7300        16         whitelist  13
NVidia VegaBean Beta 6  16         whitelist  13
Lenovo K1               16         whitelist  10
HTC One X               16         whitelist  7
tracking-fennec: ? → 20+
(In reply to Scoobidiver from comment #29)

Thanks, that saves me time! So we've been talking with Brad and think that we should try finding the actual Android version instead of the Android API level; looking into that now.
Requesting review from 2 people as I suppose that we're time-constrained with the impending realization of the Mayan prophecy. Joe for GfxInfo aspects, Brad for Android aspects.

This patch makes us use Build.VERSION.RELEASE instead of Build.VERSION.SDK_INT. I don't know for sure that this will have the correct value on the devices that we have this crash on; but it *should* from the 'os version' strings we see in the crash reports.

Testing needed.

A little complication was that OperatingSystemVersion() really wants to return an integer, not a string; so we make up an integer by mashing together 8-bit version number components. Thankfully SplitDriverVersion exists, doing the parsing for us.
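
To illustrate the idea of packing a dotted version string into one integer, here is a small Python sketch. It is not the patch itself: the real code relies on SplitDriverVersion, while the number of components (three), the clamping to 255 and the name `pack_version` are assumptions made for this example.

```python
def pack_version(release, parts=3):
    """Pack a dotted version such as '4.1.2' into one integer, 8 bits per component."""
    fields = release.split(".")[:parts]
    packed = 0
    for i in range(parts):
        # Missing components ("4.1") and non-numeric ones are treated as 0 (assumption).
        text = fields[i] if i < len(fields) else "0"
        value = int(text) if text.isdigit() else 0
        # Clamp to 8 bits so one component can never spill into the next.
        packed = (packed << 8) | min(value, 255)
    return packed


if __name__ == "__main__":
    for release in ("2.3.6", "4.0.3", "4.0.4", "4.1", "4.1.1"):
        print(release, hex(pack_version(release)))
```

The point of the packing is that plain integer comparison then orders versions correctly, for example `pack_version("4.0.4") < pack_version("4.1")`.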
Attachment #694579 - Flags: review?(joe)
Attachment #694579 - Flags: review?(blassey.bugs)
Comment on attachment 694579 [details] [diff] [review]
Use the Android version RELEASE string, instead of the SDK_INT number

Review of attachment 694579 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/android/GfxInfo.cpp
@@ +349,5 @@
>  
>      if (aFeature == FEATURE_STAGEFRIGHT) {
>        NS_LossyConvertUTF16toASCII cManufacturer(mManufacturer);
>        NS_LossyConvertUTF16toASCII cModel(mModel);
> +      if (CompareVersions(mOSVersion.get(), "4.0.0") < 0) /* Before version 4.0 */

Comments are probably not necessary any more.
Attachment #694579 - Flags: review?(joe) → review+
This is how we'll know if people force-enabled stagefright in a crash report; will also confirm if stagefright was used at all, and whether it was blacklisted.
Attachment #694977 - Flags: review?(joe)
So the 3rd and last thing we need to do here is blacklist the devices that are actually running Android 4.1+ and are still crashing.

Here's a query from CSV files (accumulated crash data from Dec. 10 to Dec. 19, so 10 days) filtering only Android 4.1+ crashes, by parsing the 'os version' string which actually seems to work well enough:

bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep \\:4\\.1\\.[0-9]\\/ | sed 's/^.*\(Model[^|]*\)|.*$/\1/g' | sort | uniq -c | sort -rn
    383 Model: A500, Product: picasso, Manufacturer: Acer, Hardware: picasso' 
     97 Model: K1, Product: cm_k1, Manufacturer: Lenovo, Hardware: ventana' 
     76 Model: AT100, Product: unknown, Manufacturer: TOSHIBA, Hardware: antares' 
     47 Model: Ideapad K1, Product: k1, Manufacturer: unknown, Hardware: ventana' 
     46 Model: A500, Product: full_a500, Manufacturer: Acer, Hardware: picasso' 
     41 Model: A501, Product: full_a501, Manufacturer: Acer, Hardware: picasso' 
     36 Model: A500, Product: full_a500, Manufacturer: Acer, Hardware: ventana' 
     28 Model: a200, Product: picasso_e, Manufacturer: Acer, Hardware: picasso_e' 
     27 Model: Malata SMBA1002, Product: drh_smba1002, Manufacturer: unknown, Hardware: harmony' 
     23 Model: Transformer TF101, Product: full_tf101, Manufacturer: asus, Hardware: ventana' 
     20 Model: tervigon, Product: tervigon, Manufacturer: Motorola, Hardware: harmony' 
     20 Model: HTC One X, Product: tg_endeavoru, Manufacturer: unknown, Hardware: endeavoru' 
     18 Model: A501, Product: full_a501, Manufacturer: Acer, Hardware: ventana' 
     11 Model: Notion Ink ADAM, Product: aokp_adam_3g, Manufacturer: unknown, Hardware: harmony' 
      9 Model: Notion Ink ADAM, Product: cm_adam_3g, Manufacturer: unknown, Hardware: harmony' 
      7 Model: Notion Ink ADAM, Product: full_adam_3g, Manufacturer: unknown, Hardware: harmony' 
      7 Model: Malata SMBA1002, Product: full_smba1002, Manufacturer: Malata, Hardware: harmony' 
      7 Model: Iconia A500, Product: full_a500, Manufacturer: Acer, Hardware: picasso' 
      7 Model: GT-I9103, Product: GT-I9103, Manufacturer: samsung, Hardware: n1' 
      6 Model: SGH-T989, Product: SGH-T989, Manufacturer: Samsung, Hardware: harmony' 
      5 Model: Notion Ink ADAM, Product: cm_adam, Manufacturer: unknown, Hardware: harmony' 
      5 Model: Iconia A500, Product: full_a500, Manufacturer: Acer, Hardware: ventana' 
      5 Model: GT-I9100, Product: k1, Manufacturer: unknown, Hardware: ventana' 
      3 Model: GT-P7510, Product: full_GT-P7510, Manufacturer: samsung, Hardware: p3' 
      3 Model: BRUNO P. SMBA1002, Product: drh_smba1002, Manufacturer: unknown, Hardware: harmony' 
      2 Model: Notion Ink ADAM, Product: full_adam, Manufacturer: unknown, Hardware: harmony' 
      2 Model: EndeavorU, Product: endeavoru, Manufacturer: HTC, Hardware: endeavoru' 
      1 Model: Notion Ink ADAM, Product: aokp_adam, Manufacturer: unknown, Hardware: harmony' 
      1 Model: 'Malata SMBA1002', Product: 'drh_smba1002', Manufacturer: 'unknown', Hardware: 'harmony''. 
      1 Model: K1, Product: IdeaPad_Tablet_K1, Manufacturer: LENOVO, Hardware: ventana' 
      1 Model: Iconia Tab A510, Product: a510_nothrills, Manufacturer: Acer, Hardware: picasso_m' 
      1 Model: A500, Product: unknown, Manufacturer: Acer, Hardware: antares' 
      1 Model: 'A500', Product: 'picasso', Manufacturer: 'Acer', Hardware: 'picasso''. 
      1 Model: A500, Product: a500_ww_gen1, Manufacturer: Acer, Hardware: antares' 
      1 Model: A500, Product: A500, Manufacturer: Acer, Hardware: picasso'
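
For reference, the same grep/sed/sort/uniq pipeline can be written as a short Python sketch. The two regular expressions mirror the grep and sed patterns above; everything else is an assumption for illustration: the sample rows are made up (the real crashdata CSV layout is not shown in this bug), and `count_devices` is a hypothetical helper.

```python
import re
from collections import Counter

SIGNATURE = "libstagefrighthw.so@0xca3"
ANDROID_41 = re.compile(r":4\.1\.[0-9]/")           # same pattern as the grep above
DEVICE_FIELD = re.compile(r"^.*(Model[^|]*)\|.*$")  # same capture as the sed above

# HYPOTHETICAL rows, only shaped so that the patterns have something to match.
SAMPLE = [
    "@0x0 | libstagefrighthw.so@0xca3\tbrand/product/device:4.1.2/BUILD\tModel: A500, Product: picasso, Manufacturer: Acer, Hardware: picasso' | note",
    "@0x0 | libstagefrighthw.so@0xca3\tbrand/product/device:4.1.1/BUILD\tModel: A500, Product: picasso, Manufacturer: Acer, Hardware: picasso' | note",
    "@0x0 | libstagefrighthw.so@0xca3\tbrand/product/device:4.1.1/BUILD\tModel: K1, Product: cm_k1, Manufacturer: Lenovo, Hardware: ventana' | note",
    "@0x0 | libstagefrighthw.so@0xca3\tbrand/product/device:4.0.3/BUILD\tModel: A500, Product: picasso, Manufacturer: Acer, Hardware: picasso' | note",
    "SomeOtherSignature\tbrand/product/device:4.1.2/BUILD\tModel: A500, Product: picasso, Manufacturer: Acer, Hardware: picasso' | note",
]


def count_devices(lines):
    """Count device strings among Android 4.1.x reports that carry the signature."""
    counts = Counter()
    for line in lines:
        if SIGNATURE not in line or not ANDROID_41.search(line):
            continue
        match = DEVICE_FIELD.match(line)
        if match:  # unlike sed, rows without a device field are skipped, not echoed
            # Drop the trailing quote and space that show up in the sed output.
            counts[match.group(1).rstrip("' ")] += 1
    return counts.most_common()  # highest count first, like `sort -rn`


if __name__ == "__main__":
    for device, n in count_devices(SAMPLE):
        print(f"{n:7d} {device}")
```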
Attachment #694977 - Flags: review?(joe) → review+
So it looks like the right way to blacklist this stuff is really by the 'Hardware' field, as this gives relatively few blacklist rules to cover most devices, and very high correlations with this crash.

Let's first compute the overall prevalence of this crash:

bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Android' | wc -l
2969
bjacob:~/crash-stats$ cat a.csv | grep 'Android' | wc -l
418370
bjacob:~/crash-stats$ echo "scale=5;2969/418370" | bc
.00709

So 0.7 % ( = 0.007) of our Android crashes have this signature (over past 10 days).

Now let's compute the prevalence of this crash for the 'Hardware' values giving it, filtering only the ones that gave at least 10 crashes over the past 10 days:

bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep \\:4\\.1\\.[0-9]\\/ | sed 's/^.*Hardware\:\ \([^|]*\)|.*$/\1/g' | sort | uniq -c | sed 's/^\ *//g' | grep ^[0-9][0-9] | cut -d ' ' -f 2 > hardwares
bjacob:~/crash-stats$ cat hardwares | (while read FOO; do echo -n "$FOO : " && echo "scale=5;(`cat a.csv | grep libstagefrighthw.so@0xca3 | grep "$FOO" | wc -l`/`cat a.csv | grep "$FOO" | wc -l`)/0.00709" | bc; done)
antares' : 6.14245
endeavoru' : 2.19322
harmony' : 87.33004
picasso' : 25.92383
picasso_e' : 11.47531
ventana' : 20.97884

These numbers tell how many times more likely than average you are to have this crash if you have a particular 'Hardware' field.

So all of these 'Hardware' values are strongly correlated with this signature, from 'endeavoru', which is over-represented 2.19 times, to 'harmony', which is over-represented 87 times.

So we just have to add blacklist rules for these 'Hardware' fields.

Let's get this added as downloadable blacklist entries to get this deployed to users asap... filing bug.
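
To make the over-representation numbers more concrete, multiplying each ratio by the baseline gives back the fraction of that hardware's crash reports that carry this signature. The sketch below uses only the numbers printed above; `implied_share` is a name made up for this example.

```python
BASELINE = 0.00709  # 2969 / 418370, as printed by bc above (bc truncates at scale=5)

# Over-representation ratios copied from the output above.
OVER_REPRESENTATION = {
    "antares": 6.14245,
    "endeavoru": 2.19322,
    "harmony": 87.33004,
    "picasso": 25.92383,
    "picasso_e": 11.47531,
    "ventana": 20.97884,
}


def implied_share(ratio, baseline=BASELINE):
    """Fraction of a hardware's crash reports with this signature: ratio * baseline."""
    return ratio * baseline


if __name__ == "__main__":
    for hw, ratio in sorted(OVER_REPRESENTATION.items(), key=lambda kv: -kv[1]):
        print(f"{hw:10s} {implied_share(ratio):6.1%}")
```

Read this way, roughly 62% of all crash reports from 'harmony' hardware in that period carry this signature, against about 1.6% for 'endeavoru'.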
Depends on: 824118
Filed bug 824118 for the blocklisting request.
My above computation was slightly flawed as 1) in computing overall prevalence I didn't restrict to Android 4.1+ and 2) in the hardware listing I took only Android 4.1 and not 4.2.

Corrected computation for Android 4.1 and 4.2:

bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep 'Android' | grep \\:4\\.[12]\\.[0-9]\\/ | wc -l
958
bjacob:~/crash-stats$ cat a.csv | grep 'Android' | grep \\:4\\.[12]\\.[0-9]\\/ | wc -l
101160
bjacob:~/crash-stats$ echo "scale=5;958/101160" | bc
.00947
bjacob:~/crash-stats$ cat a.csv | grep libstagefrighthw.so@0xca3 | grep \\:4\\.[12]\\.[0-9]\\/ | sed 's/^.*Hardware\:\ \([^|]*\)|.*$/\1/g' | sort | uniq -c | sed 's/^\ *//g' | grep ^[0-9][0-9] | cut -d ' ' -f 2 > hardwares
bjacob:~/crash-stats$ cat hardwares | (while read FOO; do echo -n "$FOO : " && echo "scale=5;(`cat a.csv | grep libstagefrighthw.so@0xca3 | grep "$FOO" | wc -l`/`cat a.csv | grep "$FOO" | wc -l`)/0.00947" | bc; done)
antares' : 4.59873
endeavoru' : 1.64202
harmony' : 65.38225
picasso' : 19.40865
picasso_e' : 8.59134
ventana' : 15.70644

So now 'endeavoru' is only at 1.6 times over-representation, which borders on not being enough to warrant blacklisting. By default I'll still maintain the blacklisting request. The other values are still well into the range where blacklisting is a no-brainer.
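
A quick arithmetic check on the two runs, as a hedged Python sketch (the `rescale` helper is hypothetical). In both loops the per-hardware greps are not restricted by Android version, so only the baseline changed, and each corrected ratio should simply be the earlier ratio multiplied by 0.00709/0.00947.

```python
OLD_BASELINE = 0.00709  # all Android crashes
NEW_BASELINE = 0.00947  # Android 4.1 and 4.2 crashes only

# Ratios copied from the two bc outputs in this bug.
FIRST_RUN = {
    "antares": 6.14245, "endeavoru": 2.19322, "harmony": 87.33004,
    "picasso": 25.92383, "picasso_e": 11.47531, "ventana": 20.97884,
}
SECOND_RUN = {
    "antares": 4.59873, "endeavoru": 1.64202, "harmony": 65.38225,
    "picasso": 19.40865, "picasso_e": 8.59134, "ventana": 15.70644,
}


def rescale(ratio, old=OLD_BASELINE, new=NEW_BASELINE):
    """Convert a ratio computed against the old baseline to the new baseline."""
    return ratio * old / new


if __name__ == "__main__":
    for hw, ratio in FIRST_RUN.items():
        print(f"{hw:10s} rescaled={rescale(ratio):9.5f} reported={SECOND_RUN[hw]:9.5f}")
```

The rescaled and reported values agree to within rounding, which means the ranking of the 'Hardware' values is the same in both runs; only the absolute level of over-representation dropped.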
Attachment #694579 - Flags: review?(blassey.bugs)
https://hg.mozilla.org/integration/mozilla-inbound/rev/215db511edcb
https://hg.mozilla.org/integration/mozilla-inbound/rev/3d2a5dd6fcec

Please leave open until bug 824118 is fixed and we confirm from crash-stats that it's really fixed.
Whiteboard: [native-crash], [hwdecoder] → [native-crash], [hwdecoder] [leave open]
Assignee: chris.double → bjacob
Target Milestone: --- → mozilla20
Comment on attachment 694579 [details] [diff] [review]
Use the Android version RELEASE string, instead of the SDK_INT number

[Approval Request Comment]
Bug caused by (feature/regressing bug #): We run into these system crashes since we do H264 decoding
User impact if declined: Android top-crasher on many Android 4.0 devices (pre-4.0 devices are correctly blacklisted; 4.1+ devices still crashing are handled separately in bug 824118)
Testing completed (on m-c, etc.): just landed
Risk to taking this patch (and alternatives if risky): doesn't seem too risky, but I admit it's nontrivial code, it does things with stack arrays, etc. So it has a nonzero risk (of, say, crashing). That said, I've written the code carefully and tested it carefully too.
String or UUID changes made by this patch: none
Attachment #694579 - Flags: approval-mozilla-beta?
Attachment #694579 - Flags: approval-mozilla-aurora?
Note: I'm going on vacation so you may have to find someone else to land this on channels.
Attachment #694579 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 694579 [details] [diff] [review]
Use the Android version RELEASE string, instead of the SDK_INT number

We understand this to be too risky to consider approving for our final betas, as per the risk analysis in comment 39. We will wontfix this for Fx18 and resolve the issue in Fx19 with the Aurora landings.
Attachment #694579 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Summary: We are still seeing crashes on 4.0+ and earlier at a volume that suggests the patch is not making a dent in the crash rate.

Details:
The patch was pushed on the 22nd; looking at this, there are at least 17 crashes after the build of the 22nd:
https://crash-stats.mozilla.com/report/list?range_value=14&range_unit=days&date=2012-12-31&signature=%400x0%20|%20libstagefrighthw.so%400xca3&version=FennecAndroid%3A20.0a1

The OS looks like a custom OS build (not stock Android, but CyanogenMod or otherwise) on 4.0
Re: comment 43: that's interesting. Clicking on the reports, I see 2 kinds of reports:

Case #1: Manufacturer: asus, Hardware: ventana
Example: https://crash-stats.mozilla.com/report/index/36f04183-3d17-44f2-8662-59d9f2121230

Since this has 'Hardware: ventana', it will be blocked by bug 824118 anyway. It should be blocked by the current code though; I don't understand why it's not.

Case #2: Manufacturer: samsung, Hardware: p3
Example: https://crash-stats.mozilla.com/report/index/f33031eb-6173-44fc-8931-041fb2121230

For this one, there is a clear reason why it's not blacklisted by the current code: we are explicitly whitelisting 4.0 devices with 'Manufacturer: samsung'. Perhaps we should not; or perhaps we should blacklist those that have 'Hardware: p3'.
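
To make the two cases easier to reason about, here is a Python sketch of the version-based rule as it is described in this bug: pre-4.0 is blacklisted, 4.0 is blacklisted except for Samsung, and 4.1 and newer is not blacklisted. It is not the GfxInfo code; `parse_release`, `stagefright_blocked` and the case-insensitive manufacturer match are assumptions for illustration.

```python
def parse_release(release):
    """Turn '4.0.4' into (4, 0, 4); missing or non-numeric parts become 0."""
    parts = (release.split(".") + ["0", "0", "0"])[:3]
    return tuple(int(p) if p.isdigit() else 0 for p in parts)


def stagefright_blocked(release, manufacturer):
    """Return True if the rule described in this bug would block Stagefright."""
    version = parse_release(release)
    if version < (4, 0, 0):
        return True  # before 4.0: blacklisted
    if version < (4, 1, 0):
        # 4.0.x: blacklisted unless the manufacturer is Samsung.
        # Matching case-insensitively is an assumption of this sketch.
        return manufacturer.lower() != "samsung"
    return False  # 4.1 and newer: not blacklisted by the version rule


if __name__ == "__main__":
    print(stagefright_blocked("4.0.3", "asus"))     # Case #1: expected to be blocked
    print(stagefright_blocked("4.0.4", "samsung"))  # Case #2: whitelisted
```

Under this reading, Case #1 should already be blocked, which is why the asus/ventana report is surprising, while Case #2 passes through the Samsung whitelist as observed.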
marking 18 as won't fix because this was denied beta approval
The fix of bug 824118 that should partially fix it hasn't landed yet.
I lost track of this bug; re-reading the comments here, it seems that we are just waiting for bug 824118 to be fixed?
(In reply to Benoit Jacob [:bjacob] from comment #49)
> I lost track of this bug; re-reading the comments here, it seems that we are
> just waiting for bug 824118 to be fixed?

I believe so, yes. I'm going to bring up the fact that QA needs to test bug 824118 as part of today's channel meeting.
Given the fallout from bug 824118, and the lack of an in-product fix, we won't be able to resolve this for FF19.
As this led to bug 838845, I suspect we will not get to this for FF20 but will leave it tracked for now so it stays on our radar.
At this point we will untrack and leave it with the graphics team to decide on either making bug 824118 version specific or work on bug 838845 since we can't use the current blocklist.
Note that those crashes might have replaced Flash crashes. If the replacement is one-to-one or less, I don't think gfx blocklisting (which allows Flash again) is the right solution.
Scoobidiver, does that mean that you have seen Flash in all the affected stacks you checked?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #55)
> Scoobidiver, does that mean that you have seen Flash in all the affected
> stacks you checked?
I see libflashplayer.so loaded in some crash reports (correlations per module don't exist for Fennec), but I thought StageFright decoding was meant to replace Flash for some kinds of videos where both Flash and HTML5 video content are offered.
So if videos crash with both Flash and HTML5, disabling StageFright decoding won't change the volume of crashes; it will only hide them behind NP_Shutdown.
tracking-fennec: 20+ → ?
tracking-fennec: ? → +
Depends on: 862523
Here is a crash with p3 HW not blocklisted: bp-147dc20f-7c01-4db1-8d18-59b1c2130425.
(In reply to Scoobidiver from comment #57)
> Here is a crash with p3 HW not blocklisted:
> bp-147dc20f-7c01-4db1-8d18-59b1c2130425.
Note that Harmony HW is expected for Galaxy Tab 10.1 (see http://pdadb.net/index.php?m=cpu&id=a20t&c=nvidia_tegra_2_250_t20).
This bug and bug 785022 are #14 crasher in 21.0 and 22.0b1.
Keywords: topcrash
filter on [mass-p5]
Priority: -- → P5
Component: Audio/Video → Audio/Video: Playback