Bug 1337290 - Crash in java.lang.Exception: Error loading gecko libraries at org.mozilla.gecko.mozglue.GeckoLoader.loadGeckoLibsNative(Native Method)
Status: RESOLVED FIXED
Keywords: crash, topcrash
Product: Firefox for Android
Classification: Client Software
Component: General
Version: 51 Branch
Platform: Unspecified Unspecified
Importance: -- critical
Target Milestone: Firefox 54
Assigned To: Jim Chen [:jchen] [:darchons]
:
: Sebastian Kaspari (:sebastian)
Mentors:
Depends on: 1337861
Blocks:
Reported: 2017-02-07 00:26 PST by [:philipp]
Modified: 2017-02-24 03:52 PST
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Attachments
Detect ARM emulation for some x86 devices (v1) (3.99 KB, patch)
2017-02-08 09:28 PST, Jim Chen [:jchen] [:darchons]
snorp: review+
1. Detect ARM emulation for some x86 devices (v1.1) (4.32 KB, patch)
2017-02-08 16:47 PST, Jim Chen [:jchen] [:darchons]
nchen: review+
gchang: approval‑mozilla‑aurora+
2. Clear cache dir if loading libs failed (v1) (5.46 KB, patch)
2017-02-08 16:47 PST, Jim Chen [:jchen] [:darchons]
snorp: review+

Description [:philipp] 2017-02-07 00:26:37 PST
This bug was filed from the Socorro interface and is 
report bp-6a023933-f4c2-4e2a-9bd2-92c232170207.
=============================================================

[Tracking Requested - why for this release]:
This is a top startup crash on Android after the update to 51.0.2, accounting for 45% of all crashes. User comments say it crashes repeatedly on startup.

This seems to primarily affect tablet devices:
https://crash-stats.mozilla.com/search/?signature=%3Djava.lang.Exception%3A%20Error%20loading%20gecko%20libraries%20at%20org.mozilla.gecko.mozglue.GeckoLoader.loadGeckoLibsNative%28Native%20Method%29&version=51.0.2&_sort=-date&_facets=user_comments&_facets=uptime&_facets=useragent_locale&_facets=cpu_arch&_facets=android_device&_facets=android_model&_facets=android_brand&_facets=android_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-android_model
Comment 1 Sylvestre Ledru [:sylvestre] 2017-02-07 00:32:25 PST
guys, can you help with this? Thanks
Comment 3 Sylvestre Ledru [:sylvestre] 2017-02-07 01:13:57 PST
I disabled updates for:
* Lenovo Mobile Lenovo YOGA Tablet Pro-1050L/Yoga Tablet 2 – YT2
* Lenovo Mobile LIFETAB S1034X – lifetab_s1034x
* Lenovo Mobile LIFETAB P891X – lifetab_p891x
* Lenovo Mobile P1035X – P1035X
* Symphony Teleca NXA8QC116 – NXA8QC116
* Asus ZenFone 2 (ZE551ML) – Z00A_1
* Symphony Teleca NXA116QC164 – NXA116QC164
Comment 4 Sebastian Kaspari (:sebastian) 2017-02-07 01:55:24 PST
I'll try to reproduce with one of my devices - although it looks like something for the platform team or glandium.

Currently 50.0.1 and 51.0.2 each account for roughly 43% of the crashes. Did we add more than the patch from bug 1318667 for the 51.0.2 release?
Comment 5 Sylvestre Ledru [:sylvestre] 2017-02-07 02:07:13 PST
We took bug 1335134 (hsts), bug 1294650 (websense), bug 1333516 (geoloc) and bug 1333423 (e10s).
We didn't expect any regressions from these patches.
Comment 6 Liz Henry (:lizzard) (needinfo? me) 2017-02-07 09:01:10 PST
snorp, glandium, can either of you help with this new top crash?
Comment 7 Kevin Brosnan [:kbrosnan] 2017-02-07 10:48:14 PST
We reprocessed crashes this morning. The reprocessing may be the cause of this spike.
Comment 8 Jim Chen [:jchen] [:darchons] 2017-02-07 18:36:02 PST
A majority of the crashes seem to be us running ARM APKs on x86 boards (Intel Bay Trail board).
Comment 9 Sebastian Kaspari (:sebastian) 2017-02-08 00:01:28 PST
See bug 1337318: Testing with an x86 tablet I noticed that you can still run an ARM APK on x86 devices - despite our check to prevent that. At least my device will pretend to be an ARM device when running an ARM APK so our check fails. I'll take bug 1337318 and see if I can update the check to work on those devices too.

However if we see so many crashes then this won't be just sideloads I guess?
Comment 10 Marco Castelluccio [:marco] 2017-02-08 05:01:52 PST
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> We reprocessed crashes this morning. The reprocessing may be the cause of
> this spike.

Kevin, can you elaborate? Which crashes did you reprocess? The spike is on Feb 6. If you have reprocessed, why have you reprocessed only a single day?
Comment 11 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-08 08:14:46 PST
Most of these crashes seem to be caused by an x86 device running an arm build. This can happen if the arm build has a higher version number in the play store than the x86 one. If we only blocked the x86 builds due to earlier bugs (like 1318667), it could cause this. Given that the spike seems to occur about the same time we blocked stuff for bug 1318667, I think that's what's going on. I think we should unblock all of the x86 stuff now that the fix for 1318667 is out.
Comment 12 Sylvestre Ledru [:sylvestre] 2017-02-08 09:07:56 PST
We blocked just a few devices, not an arch.
I am not sure how this would be related?!
Comment 13 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-08 09:19:34 PST
OK, the problem is that the versions currently in the play store are just hosed. You can see the following message in the 'APK' section now:

Some devices with Native platforms containing any of [x86] are eligible to receive version 2015469652, which is better optimized for the device's Native Platform, but actually receive version 2015469657 because it has a higher version code and the device supports Native platforms containing any of [armeabi-v7a] either directly (e.g. ARMv7 devices support a superset of ARMv5TE instructions) or indirectly (e.g. some x86 devices support ARMv7 or ARMv5TE via native code translation). This would occur when
API levels in range 15+ and
Screen layouts containing any of [small, normal, large, xlarge] and
OpenGL ES versions in range 2.0+ and
Features containing all of [android.hardware.TOUCHSCREEN, android.hardware.WIFI].
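The Play Store message above describes a selection rule: among the APKs a device is able to run, the one with the highest version code wins. A minimal sketch of that rule follows, using the two version codes from the message. The class, the `Apk` type and the `pick` method are hypothetical names made up for illustration, and the real Play Store logic also weighs API level, screen layout and features, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.List;

final class ApkSelectionSketch {
    /** Hypothetical model of one published APK: version code plus native ABI. */
    static final class Apk {
        final long versionCode;
        final String abi;

        Apk(long versionCode, String abi) {
            this.versionCode = versionCode;
            this.abi = abi;
        }
    }

    /**
     * Returns the highest version code among the APKs whose ABI the device
     * reports as supported, or -1 if none is compatible.
     */
    static long pick(List<Apk> published, List<String> deviceAbis) {
        long best = -1;
        for (Apk apk : published) {
            if (deviceAbis.contains(apk.abi) && apk.versionCode > best) {
                best = apk.versionCode;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Apk> published = Arrays.asList(
                new Apk(2015469652L, "x86"),
                new Apk(2015469657L, "armeabi-v7a"));
        // An x86 tablet with ARM translation reports both ABIs as supported.
        List<String> tablet = Arrays.asList("x86", "armeabi-v7a");
        System.out.println(pick(published, tablet)); // the ARM build, 2015469657
    }
}
```

Under this rule an x86 device that does not advertise ARM support still receives the x86 build, which fits the observation that only devices with ARM translation were affected.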
Comment 14 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-08 09:19:56 PST
Whoops, see comment #13
Comment 15 Sylvestre Ledru [:sylvestre] 2017-02-08 09:26:16 PST
OK, thanks. 
Looking at the previous uploads, x86 always had the biggest version id (example: 51.0 x86: 2015466284 / arm: 2015466281).
Here, x86 (2015469652) is smaller than arm (2015469657).

Needinfo'ing a bunch of releng folks to help.
Comment 16 Jim Chen [:jchen] [:darchons] 2017-02-08 09:28:20 PST
Created attachment 8834988
Detect ARM emulation for some x86 devices (v1)

Some x86 devices set the CPU ABI to ARM (and even change /proc/cpuinfo)
as part of emulating ARM. In that case, we check the kernel release
string to find out whether it's really x86 or not.
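The check described in this comment can be sketched as a pure function of two strings. This is not the attached patch: the class and method names are hypothetical, and the `"x86"` substring test on the kernel release is an assumed marker chosen for illustration. On Android the inputs would presumably come from something like `Build.CPU_ABI` and `android.system.Os.uname().release` (the latter is available from API 21).

```java
final class X86DetectionSketch {
    /**
     * Returns true if the device looks like x86, even when the reported ABI
     * has been switched to ARM by an emulation layer.
     */
    static boolean looksLikeX86(String cpuAbi, String kernelRelease) {
        // Straightforward case: the platform reports an x86 ABI.
        if (cpuAbi != null && cpuAbi.startsWith("x86")) {
            return true;
        }
        // Fallback: the ABI claims ARM, so look at the kernel release string.
        // The "x86" marker is an assumption, not the string the patch checks.
        return kernelRelease != null && kernelRelease.contains("x86");
    }

    public static void main(String[] args) {
        // Hypothetical kernel release strings, for illustration only.
        System.out.println(looksLikeX86("x86", null));
        System.out.println(looksLikeX86("armeabi-v7a", "3.10.20-x86_64-example"));
        System.out.println(looksLikeX86("armeabi-v7a", "3.4.0-example"));
    }
}
```

Passing the strings in as parameters keeps the heuristic testable off-device. A kernel release check of this kind only helps on builds that actually put an architecture marker in that string.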
Comment 17 Sylvestre Ledru [:sylvestre] 2017-02-08 09:52:06 PST
Rail says that https://bugzilla.mozilla.org/show_bug.cgi?id=1337027 caused it.
Comment 18 Rail Aliiev [:rail] ⌚️ET 2017-02-08 09:53:58 PST
I believe this is related to https://bugzilla.mozilla.org/show_bug.cgi?id=1337027#c3 - we restarted the ARM build, which caused the build system to regenerate the version code, which is based on when the build was scheduled.

To fix this we would need to ship another release. :/
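The effect of the rebuild shows up when the version codes quoted in this bug are split into a high part and three low bits. The layout used here (a build-time counter above three per-APK flag bits) is an assumption inferred from those four numbers; the bug itself does not state it, and the names below are hypothetical.

```java
final class VersionCodeSketch {
    /** Assumed build-time counter: everything above the three low bits. */
    static long buildCounter(long versionCode) {
        return versionCode >>> 3;
    }

    /** Assumed per-APK flags: the three low bits. */
    static int flags(long versionCode) {
        return (int) (versionCode & 7);
    }

    public static void main(String[] args) {
        long[] codes = {
            2015466284L, // 51.0 x86
            2015466281L, // 51.0 arm
            2015469652L, // 51.0.2 x86
            2015469657L, // 51.0.2 arm (rebuilt)
        };
        for (long code : codes) {
            System.out.println(code + " counter=" + buildCounter(code)
                    + " flags=" + flags(code));
        }
    }
}
```

Under that assumption the two 51.0 codes share one counter value and the x86 flag bits keep x86 on top, while the rebuilt 51.0.2 ARM code has a counter one step higher than its x86 sibling, which outweighs the flag bits.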
Comment 19 Ritu Kothari (:ritu) 2017-02-08 10:38:16 PST
Hi Sebastian, I'd like to include this patch (not r+'d yet) in the 51.0.3 build which I will gtb in the next hour or so. Could you/snorp please review this patch asap? Also, do you believe this patch is safe for inclusion in the 51.0.3 dot release? We need a dot release pushed asap to mitigate two problems, one of which is mentioned in comments 16/17/18. Thanks!
Comment 20 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-08 10:42:22 PST
Comment on attachment 8834988
Detect ARM emulation for some x86 devices (v1)

Review of attachment 8834988:
-----------------------------------------------------------------

You should probably rebase this on the change I pushed yesterday, otherwise looks good modulo the nit.

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/HardwareUtils.java
@@ +87,5 @@
>      public static boolean isX86System() {
> +        if (Build.CPU_ABI != null && Build.CPU_ABI.startsWith("x86")) {
> +            return true;
> +        }
> +        if (Build.VERSION.SDK_INT >= 21) {

Do we know for sure that these shenanigans are only being done on 21+?
Comment 21 Sebastian Kaspari (:sebastian) 2017-02-08 10:47:08 PST
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #20)
> Do we know for sure that these shenanigans are only being done on 21+?

No! I have an affected device (Asus TF103CG) and it's running API 19.
Comment 22 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-08 10:52:27 PST
I don't think we should push Jim's patch to release, since we only need to fix the play store badness to get around the crash.
Comment 23 Jim Chen [:jchen] [:darchons] 2017-02-08 11:36:20 PST
This particular crash is affecting (I think) certain builds of Android-x86 with API 21+ (and I think the kernel release string check only works for those particular builds).
Comment 24 Jim Chen [:jchen] [:darchons] 2017-02-08 16:47:04 PST
Created attachment 8835205
1. Detect ARM emulation for some x86 devices (v1.1)

Rebased patch.
Comment 25 Jim Chen [:jchen] [:darchons] 2017-02-08 16:47:24 PST
Created attachment 8835206
2. Clear cache dir if loading libs failed (v1)

If Gecko libs failed to load the first time, clear the cache dir and try
again. Only crash if we fail to load the libs a second time.
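The retry behaviour described here can be sketched as follows. This is an illustration under stated assumptions rather than the attached patch: `Loader`, `deleteTree` and `loadWithRetry` are hypothetical names, and whether the real code removes the cache directory itself or only its contents is not stated in this bug.

```java
import java.io.File;

final class RetryLoadSketch {
    /** Hypothetical stand-in for the native library loading step. */
    interface Loader {
        void load() throws Exception;
    }

    /** Recursively deletes a file or directory; missing paths are ignored. */
    static void deleteTree(File path) {
        File[] children = path.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteTree(child);
            }
        }
        path.delete();
    }

    /**
     * Tries to load once; on failure clears the cache directory and tries a
     * second time. A second failure propagates to the caller.
     */
    static void loadWithRetry(Loader loader, File cacheDir) throws Exception {
        try {
            loader.load();
        } catch (Exception firstFailure) {
            // Cached files may be stale or corrupt; drop them and retry once.
            deleteTree(cacheDir);
            loader.load();
        }
    }
}
```

A caller would pass the real loading step as the `Loader`. Limiting the retry to a single attempt keeps a persistently broken install from looping on startup.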
Comment 26 Mike Hommey [:glandium] 2017-02-08 17:17:20 PST
Comment on attachment 8835206
2. Clear cache dir if loading libs failed (v1)

Review of attachment 8835206:
-----------------------------------------------------------------

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoThread.java
@@ +426,5 @@
> +            loadGeckoLibs(context, resourcePath);
> +
> +        } catch (final Exception e) {
> +            // Cannot load libs; try clearing the cached files.
> +            Log.w(LOGTAG, "Clearing cache after load libs exception", e);

Considering we're supposed to overwrite the cache when the crc doesn't match, why would that change anything?
If we're not doing that, then there's a problem in the logic handling the cache, and you're just papering over it.
Comment 27 Jim Chen [:jchen] [:darchons] 2017-02-08 19:22:47 PST
Well we check the APK CRC against the .crc file, which doesn't say much about the integrity of the cached .so file.
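The point about the .crc file can be illustrated with a small sketch. It assumes the cache check compares only a recorded CRC value with the CRC of the APK entry, and it uses `java.util.zip.CRC32` purely for demonstration; the class and method names are hypothetical and the byte arrays are made-up sample data.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class CacheCheckSketch {
    /** CRC-32 of a byte array, via the standard library. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /**
     * Assumed cache check: the CRC recorded next to the cached library is
     * compared with the CRC of the APK entry. The cached bytes are not read.
     */
    static boolean cacheLooksCurrent(long recordedCrc, long apkEntryCrc) {
        return recordedCrc == apkEntryCrc;
    }

    public static void main(String[] args) {
        byte[] inApk = "library bytes".getBytes(StandardCharsets.UTF_8);
        byte[] onDisk = "library bytez".getBytes(StandardCharsets.UTF_8); // corrupted copy

        long apkEntryCrc = crcOf(inApk);
        long recordedCrc = apkEntryCrc; // written when the cache was populated

        // The cache check passes even though the cached bytes are damaged.
        System.out.println(cacheLooksCurrent(recordedCrc, apkEntryCrc));
        // Hashing the cached bytes would expose the damage.
        System.out.println(crcOf(onDisk) == recordedCrc);
    }
}
```

The check detects a changed APK but not a damaged cached file.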
Comment 28 Mike Hommey [:glandium] 2017-02-08 20:23:56 PST
(In reply to Jim Chen [:jchen] [:darchons] from comment #27)
> Well we check the APK CRC against the .crc file, which doesn't say much
> about the integrity of the cached .so file.

For one, that's not the problem that is being discussed in this bug, and it still feels like the wrong level to be hardening for this.
Comment 29 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-09 06:52:55 PST
Comment on attachment 8835206
2. Clear cache dir if loading libs failed (v1)

Review of attachment 8835206:
-----------------------------------------------------------------

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoThread.java
@@ +426,5 @@
> +            loadGeckoLibs(context, resourcePath);
> +
> +        } catch (final Exception e) {
> +            // Cannot load libs; try clearing the cached files.
> +            Log.w(LOGTAG, "Clearing cache after load libs exception", e);

We don't actually compute the crc of the file on disk when we load the library, as that would likely be pretty slow. We only compare the crc that's stored next to it with the crc in the APK in order to invalidate the cache.

@@ +427,5 @@
> +
> +        } catch (final Exception e) {
> +            // Cannot load libs; try clearing the cached files.
> +            Log.w(LOGTAG, "Clearing cache after load libs exception", e);
> +            FileUtils.delTree(GeckoLoader.getCacheDir(context),

I think we should try to get some telemetry on how often this runs. If it runs often then that would be really bad.
Comment 30 Liz Henry (:lizzard) (needinfo? me) 2017-02-09 08:29:17 PST
Re-enabled updates for the tablets which were excluded in Comment 3.
Comment 31 James Willcox (:snorp) (jwillcox@mozilla.com) 2017-02-09 09:03:06 PST
Also it looks like the versions are correct in the Play console now with the new release, so we should see the volume on this crash drop significantly.
Comment 32 Pulsebot 2017-02-15 14:13:56 PST
Pushed by nchen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a792d73dc345
1. Detect ARM emulation for some x86 devices; r=snorp
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0daea02c01f
2. Clear cache dir if loading libs failed; r=snorp
Comment 34 Ryan VanderMeulen [:RyanVM] 2017-02-16 15:51:11 PST
Please request Aurora/Beta approval on this when you get a chance.
Comment 35 Jim Chen [:jchen] [:darchons] 2017-02-23 08:57:19 PST
Comment on attachment 8835205
1. Detect ARM emulation for some x86 devices (v1.1)

The most pressing issue was fixed through the Play Store, so I think Aurora is enough for these patches; they can ride the train to Beta.

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Startup crash on certain x86 Android devices
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not really
[Why is the change risky/not risky?]: One patch is specific to certain affected x86 devices; the other patch is an attempt to recover from a certain crash; so neither patch has a potentially far reaching impact.
[String changes made/needed]: None
Comment 36 Gerry Chang [:gchang] 2017-02-23 23:08:16 PST
Comment on attachment 8835205
1. Detect ARM emulation for some x86 devices (v1.1)

Fix a startup crash on certain x86 Android devices. Aurora53+.