Crash in java.lang.Exception: Error loading gecko libraries at org.mozilla.gecko.mozglue.GeckoLoader.loadGeckoLibsNative(Native Method)

RESOLVED FIXED in Firefox 53

Status

Firefox for Android
General
--
critical
RESOLVED FIXED
6 months ago
5 months ago

People

(Reporter: philipp, Assigned: jchen)

Tracking

({crash, topcrash})

51 Branch
Firefox 54
crash, topcrash
Points:
---

Firefox Tracking Flags

(firefox51 wontfix, firefox52 wontfix, firefox53 fixed, firefox54 fixed, fennec53+)

Details

(crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

6 months ago
This bug was filed from the Socorro interface and is 
report bp-6a023933-f4c2-4e2a-9bd2-92c232170207.
=============================================================

[Tracking Requested - why for this release]:
This is a top startup crash on Android after the update to 51.0.2, accounting for 45% of all crashes. User comments say it crashes repeatedly on startup.

This seems to primarily affect tablet devices:
https://crash-stats.mozilla.com/search/?signature=%3Djava.lang.Exception%3A%20Error%20loading%20gecko%20libraries%20at%20org.mozilla.gecko.mozglue.GeckoLoader.loadGeckoLibsNative%28Native%20Method%29&version=51.0.2&_sort=-date&_facets=user_comments&_facets=uptime&_facets=useragent_locale&_facets=cpu_arch&_facets=android_device&_facets=android_model&_facets=android_brand&_facets=android_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-android_model
Guys, can you help with this? Thanks.
Flags: needinfo?(s.kaspari)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(esawin)
This is not a new regression but it seems that it is spiking:
https://crash-stats.mozilla.com/signature/?signature=java.lang.Exception%3A%20Error%20loading%20gecko%20libraries%20at%20org.mozilla.gecko.mozglue.GeckoLoader.loadGeckoLibsNative%28Native%20Method%29#graphs
status-firefox52: --- → affected
status-firefox53: --- → affected
status-firefox54: --- → ?
I disabled updates for:
* Lenovo Mobile: Lenovo YOGA Tablet Pro-1050L/Yoga Tablet 2 (YT2)
* Lenovo Mobile: LIFETAB S1034X (lifetab_s1034x)
* Lenovo Mobile: LIFETAB P891X (lifetab_p891x)
* Lenovo Mobile: P1035X (P1035X)
* Symphony Teleca: NXA8QC116 (NXA8QC116)
* Asus: ZenFone 2 (ZE551ML) (Z00A_1)
* Symphony Teleca: NXA116QC164 (NXA116QC164)
I'll try to reproduce with one of my devices - although it looks like something for the platform team or glandium.

Currently 50.0.1 and 51.0.2 each account for roughly 43% of the crashes. Did we add more than the patch from bug 1318667 for the 51.0.2 release?
Flags: needinfo?(s.kaspari)
We took bug 1335134 (hsts), bug 1294650 (websense), bug 1333516 (geoloc) and bug 1333423 (e10s).
We didn't expect any regressions from these patches.
snorp, glandium, can either of you help with this new top crash?
Flags: needinfo?(snorp)
We reprocessed crashes this morning. The reprocessing may be the cause of this spike.
(Assignee)

Comment 8

6 months ago
A majority of the crashes seem to come from running ARM APKs on x86 boards (Intel Bay Trail boards).
See bug 1337318: Testing with an x86 tablet I noticed that you can still run an ARM APK on x86 devices - despite our check to prevent that. At least my device will pretend to be an ARM device when running an ARM APK so our check fails. I'll take bug 1337318 and see if I can update the check to work on those devices too.

However, if we see so many crashes, then this won't be just sideloads, I guess?
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> We reprocessed crashes this morning. The reprocessing may be the cause of
> this spike.

Kevin, can you elaborate? Which crashes did you reprocess? The spike is on Feb 6. If you have reprocessed, why have you reprocessed only a single day?
Flags: needinfo?(kbrosnan)
Keywords: topcrash
Most of these crashes seem to be caused by an x86 device running an arm build. This can happen if the arm build has a higher version number in the play store than the x86 one. If we only blocked the x86 builds due to earlier bugs (like 1318667), it could cause this. Given that the spike seems to occur about the same time we blocked stuff for bug 1318667, I think that's what's going on. I think we should unblock all of the x86 stuff now that the fix for 1318667 is out.
Flags: needinfo?(snorp) → needinfo?(sledru)
We blocked just a few devices, not an arch.
I am not sure how this would be related?!
Flags: needinfo?(sledru)
OK, the problem is that the versions currently in the play store are just hosed. You can see the following message in the 'APK' section now:

Some devices with Native platforms containing any of [x86] are eligible to receive version 2015469652, which is better optimized for the device's Native Platform, but actually receive version 2015469657 because it has a higher version code and the device supports Native platforms containing any of [armeabi-v7a] either directly (e.g. ARMv7 devices support a superset of ARMv5TE instructions) or indirectly (e.g. some x86 devices support ARMv7 or ARMv5TE via native code translation). This would occur when
API levels in range 15+ and
Screen layouts containing any of [small, normal, large, xlarge] and
OpenGL ES versions in range 2.0+ and
Features containing all of [android.hardware.TOUCHSCREEN, android.hardware.WIFI].
Whoops, see comment #13
Flags: needinfo?(sledru)
OK, thanks. 
Looking at the previous uploads, x86 always had the higher version code (example: 51.0 x86: 2015466284 / arm: 2015466281).
Here, x86 (2015469652) is lower than arm (2015469657).
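
To make the version-code problem concrete, here is a minimal Java sketch of the selection rule as the Play console message describes it: among the APKs whose native platform the device supports, the one with the highest version code wins. The `Apk` class and the `pick` helper are hypothetical names used only for illustration, not Play Store or Fennec code; the version codes are the ones quoted above.

```java
import java.util.Arrays;
import java.util.List;

public class ApkSelection {
    /** Hypothetical model of one uploaded APK: its ABI and its version code. */
    static final class Apk {
        final String abi;
        final long versionCode;

        Apk(String abi, long versionCode) {
            this.abi = abi;
            this.versionCode = versionCode;
        }
    }

    /**
     * Returns the APK with the highest version code among those whose ABI
     * the device claims to support, or null if none is compatible.
     */
    static Apk pick(List<Apk> apks, List<String> deviceAbis) {
        Apk best = null;
        for (Apk apk : apks) {
            if (!deviceAbis.contains(apk.abi)) {
                continue; // this APK cannot be installed on the device
            }
            if (best == null || apk.versionCode > best.versionCode) {
                best = apk;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // An x86 tablet that also reports ARM support through native code translation.
        List<String> device = Arrays.asList("x86", "armeabi-v7a");

        // Earlier upload (51.0): the x86 code is the higher one.
        List<Apk> previous = Arrays.asList(
                new Apk("x86", 2015466284L), new Apk("armeabi-v7a", 2015466281L));
        // Upload discussed in this bug: the ARM code is the higher one.
        List<Apk> current = Arrays.asList(
                new Apk("x86", 2015469652L), new Apk("armeabi-v7a", 2015469657L));

        System.out.println("previous: " + pick(previous, device).abi);
        System.out.println("current:  " + pick(current, device).abi);
    }
}
```

With the earlier codes the x86 APK is picked; with the codes from this upload the same device gets the ARM APK, which matches the crash pattern on x86 tablets.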

Needinfo'ing a bunch of releng folks to help.
Flags: needinfo?(sledru)
Flags: needinfo?(rail)
Flags: needinfo?(jlund)
Flags: needinfo?(catlee)
(Assignee)

Comment 16

6 months ago
Created attachment 8834988 [details] [diff] [review]
Detect ARM emulation for some x86 devices (v1)

Some x86 devices set the CPU ABI to ARM (and even change /proc/cpuinfo)
as part of emulating ARM. In that case, we check the kernel release
string to find out whether it's really x86 or not.
Attachment #8834988 - Flags: review?(snorp)
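
A rough sketch of the idea in this patch, written as plain Java so it runs off-device. On Android the inputs would come from `Build.CPU_ABI` and the kernel release string; here both are passed in as parameters. The `"x86"` substring test on the release string is an assumption for illustration, since the exact marker the patch looks for is not quoted in this bug, and the sample release strings are made up.

```java
public class X86Detection {
    /**
     * Guesses whether the hardware is really x86.
     *
     * @param cpuAbi        the ABI the system reports (may be null, may claim ARM under emulation)
     * @param kernelRelease the kernel release string, e.g. as reported by uname (may be null)
     */
    static boolean isX86System(String cpuAbi, String kernelRelease) {
        // Straightforward case: the system reports an x86 ABI.
        if (cpuAbi != null && cpuAbi.startsWith("x86")) {
            return true;
        }
        // Fallback: the ABI may be masked by ARM emulation, so inspect the kernel release string.
        // ASSUMPTION: affected kernels mention "x86" somewhere in that string.
        return kernelRelease != null && kernelRelease.contains("x86");
    }

    public static void main(String[] args) {
        // Hypothetical inputs for the three interesting cases.
        System.out.println(isX86System("x86", "3.10.20-example"));
        System.out.println(isX86System("armeabi-v7a", "3.10.20-x86_64-example"));
        System.out.println(isX86System("armeabi-v7a", "3.4.0-example"));
    }
}
```

A check like this is a heuristic: it only helps on devices whose kernel release string actually carries such a marker.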
rail says that https://bugzilla.mozilla.org/show_bug.cgi?id=1337027 caused it
Depends on: 1337861
I believe this is related to https://bugzilla.mozilla.org/show_bug.cgi?id=1337027#c3 - we restarted the ARM build, which caused the build system to regenerate the version code, which is based on when the build was scheduled.

To fix this we would need to ship another release. :/
Flags: needinfo?(rail)

Comment 19

6 months ago
Hi Sebastian, I'd like to include this patch (not r+'d yet) in the 51.0.3 build, which I will gtb in the next hour or so. Could you/snorp please review this patch asap? Also, do you believe this patch is safe for inclusion in the 51.0.3 dot release? We need a dot release pushed asap to mitigate two problems, one of which is mentioned in comments 16/17/18. Thanks!
Flags: needinfo?(s.kaspari)
Comment on attachment 8834988 [details] [diff] [review]
Detect ARM emulation for some x86 devices (v1)

Review of attachment 8834988 [details] [diff] [review]:
-----------------------------------------------------------------

You should probably rebase this on the change I pushed yesterday, otherwise looks good modulo the nit.

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/HardwareUtils.java
@@ +87,5 @@
>      public static boolean isX86System() {
> +        if (Build.CPU_ABI != null && Build.CPU_ABI.startsWith("x86")) {
> +            return true;
> +        }
> +        if (Build.VERSION.SDK_INT >= 21) {

Do we know for sure that these shenanigans are only being done on 21+?
Attachment #8834988 - Flags: review?(snorp) → review+
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #20)
> Do we know for sure that these shenanigans are only being done on 21+?

No! I have an affected device (Asus TF103CG) and it's running API 19.
Flags: needinfo?(s.kaspari)
I don't think we should push Jim's patch to release, since we only need to fix the play store badness to get around the crash.
(Assignee)

Comment 23

6 months ago
This particular crash is affecting (I think) certain builds of Android-x86 with API 21+ (and I think the kernel release string check only works for those particular builds).
Flags: needinfo?(kbrosnan)
(Assignee)

Comment 24

6 months ago
Created attachment 8835205 [details] [diff] [review]
1. Detect ARM emulation for some x86 devices (v1.1)

Rebased patch.
Attachment #8835205 - Flags: review+
(Assignee)

Updated

6 months ago
Attachment #8834988 - Attachment is obsolete: true
(Assignee)

Comment 25

6 months ago
Created attachment 8835206 [details] [diff] [review]
2. Clear cache dir if loading libs failed (v1)

If Gecko libs failed to load the first time, clear the cache dir and try
again. Only crash if we fail to load the libs a second time.
Attachment #8835206 - Flags: review?(snorp)
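
The retry described in this patch can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the patch itself: `LibLoader`, `loadWithRetry` and this `delTree` are hypothetical stand-ins for the real loading step and file utilities, and the sketch deletes the cache directory itself along with its contents.

```java
import java.io.File;

public class RetryLoad {
    /** Hypothetical stand-in for the real library loading step. */
    interface LibLoader {
        void load() throws Exception;
    }

    /** Recursively deletes a file or directory; returns true if it is gone afterwards. */
    static boolean delTree(File file) {
        File[] children = file.listFiles(); // null for plain files or unreadable directories
        if (children != null) {
            for (File child : children) {
                delTree(child);
            }
        }
        return file.delete() || !file.exists();
    }

    /**
     * Tries to load once; on failure clears the cache dir and tries again.
     * A failure on the second attempt propagates to the caller.
     */
    static void loadWithRetry(LibLoader loader, File cacheDir) throws Exception {
        try {
            loader.load();
        } catch (Exception e) {
            // First attempt failed: drop possibly corrupt cached files, then retry once.
            delTree(cacheDir);
            loader.load();
        }
    }

    public static void main(String[] args) throws Exception {
        File cacheDir = new File(System.getProperty("java.io.tmpdir"), "retry-load-demo");
        cacheDir.mkdirs();

        final int[] attempts = {0};
        // Simulated loader that fails on the first call and succeeds on the second.
        loadWithRetry(() -> {
            if (attempts[0]++ == 0) {
                throw new Exception("simulated load failure");
            }
        }, cacheDir);

        System.out.println("attempts: " + attempts[0]);
    }
}
```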
Comment on attachment 8835206 [details] [diff] [review]
2. Clear cache dir if loading libs failed (v1)

Review of attachment 8835206 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoThread.java
@@ +426,5 @@
> +            loadGeckoLibs(context, resourcePath);
> +
> +        } catch (final Exception e) {
> +            // Cannot load libs; try clearing the cached files.
> +            Log.w(LOGTAG, "Clearing cache after load libs exception", e);

Considering we're supposed to overwrite the cache when the crc doesn't match, why would that change anything?
If we're not doing that, then there's a problem in the logic handling the cache, and you're just papering over it.
(Assignee)

Comment 27

6 months ago
Well we check the APK CRC against the .crc file, which doesn't say much about the integrity of the cached .so file.
(In reply to Jim Chen [:jchen] [:darchons] from comment #27)
> Well we check the APK CRC against the .crc file, which doesn't say much
> about the integrity of the cached .so file.

For one, that's not the problem that is being discussed in this bug, and it still feels like the wrong level to be hardening for this.
Flags: needinfo?(mh+mozilla)
Comment on attachment 8835206 [details] [diff] [review]
2. Clear cache dir if loading libs failed (v1)

Review of attachment 8835206 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoThread.java
@@ +426,5 @@
> +            loadGeckoLibs(context, resourcePath);
> +
> +        } catch (final Exception e) {
> +            // Cannot load libs; try clearing the cached files.
> +            Log.w(LOGTAG, "Clearing cache after load libs exception", e);

We don't actually compute the crc of the file on disk when we load the library, as that would likely be pretty slow. We only compare the crc that's stored next to it with the crc in the APK in order to invalidate the cache.

@@ +427,5 @@
> +
> +        } catch (final Exception e) {
> +            // Cannot load libs; try clearing the cached files.
> +            Log.w(LOGTAG, "Clearing cache after load libs exception", e);
> +            FileUtils.delTree(GeckoLoader.getCacheDir(context),

I think we should try to get some telemetry on how often this runs. If it runs often then that would be really bad.
Attachment #8835206 - Flags: review?(snorp) → review+
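
The point made in this review about the CRC check can be illustrated with a small Java sketch: comparing the stored CRC with the APK's CRC is cheap, but it says nothing about the bytes of the cached file. The method names `cacheLooksValid` and `crcOf` and the sample data are hypothetical and only model the behaviour described in the comments above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class CrcCacheCheck {
    /** Cheap check: compares the CRC recorded next to the cached file with the CRC in the APK. */
    static boolean cacheLooksValid(long storedCrc, long apkEntryCrc) {
        return storedCrc == apkEntryCrc;
    }

    /** Expensive check: computes the CRC of the cached bytes themselves. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] original = "library contents".getBytes(StandardCharsets.UTF_8);
        long apkEntryCrc = crcOf(original);
        long storedCrc = apkEntryCrc; // recorded when the file was first extracted

        // Simulate on-disk corruption of the cached file after extraction.
        byte[] onDisk = original.clone();
        onDisk[0] ^= 0x01;

        System.out.println("cheap check passes: " + cacheLooksValid(storedCrc, apkEntryCrc));
        System.out.println("content still matches: " + (crcOf(onDisk) == apkEntryCrc));
    }
}
```

In this model the cheap check still passes after the file is corrupted, which is the case the clear-and-retry patch is meant to recover from.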
Re-enabled updates for the tablets which were excluded in Comment 3.
Also it looks like the versions are correct in the Play console now with the new release, so we should see the volume on this crash drop significantly.
Flags: needinfo?(jlund)
Flags: needinfo?(esawin)
Flags: needinfo?(catlee)

Comment 32

5 months ago
Pushed by nchen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a792d73dc345
1. Detect ARM emulation for some x86 devices; r=snorp
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0daea02c01f
2. Clear cache dir if loading libs failed; r=snorp

Comment 33

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a792d73dc345
https://hg.mozilla.org/mozilla-central/rev/e0daea02c01f
Status: NEW → RESOLVED
Last Resolved: 5 months ago
status-firefox54: ? → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Please request Aurora/Beta approval on this when you get a chance.
Assignee: nobody → nchen
Flags: needinfo?(nchen)
(Assignee)

Comment 35

5 months ago
Comment on attachment 8835205 [details] [diff] [review]
1. Detect ARM emulation for some x86 devices (v1.1)

The most pressing issue was fixed through the Play Store, so I think Aurora is enough for these patches; they can ride the train to Beta.

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Startup crash on certain x86 Android devices
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not really
[Why is the change risky/not risky?]: One patch is specific to certain affected x86 devices; the other patch is an attempt to recover from a certain crash; so neither patch has a potentially far reaching impact.
[String changes made/needed]: None
Flags: needinfo?(nchen)
Attachment #8835205 - Flags: approval-mozilla-aurora?
tracking-fennec: --- → ?
status-firefox51: affected → wontfix
status-firefox52: affected → wontfix
tracking-firefox51: ? → ---
Comment on attachment 8835205 [details] [diff] [review]
1. Detect ARM emulation for some x86 devices (v1.1)

Fix a startup crash on certain x86 Android devices. Aurora53+.
Attachment #8835205 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 37

5 months ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/3cc59d37b9f3
https://hg.mozilla.org/releases/mozilla-aurora/rev/caa026ea3982
status-firefox53: affected → fixed
tracking-fennec: ? → 53+