Closed Bug 1322029 Opened 3 years ago Closed 3 years ago

Startup crashes on ZTE Max Duo devices after update to Firefox 50.0.2

Categories

(Firefox for Android :: General, defect, critical)

50 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Firefox 54
Tracking Status
firefox50 - wontfix
firefox51 --- wontfix
firefox52 + wontfix
firefox-esr52 --- unaffected
firefox53 + wontfix
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: philipp, Assigned: snorp)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files)

This bug was filed from the Socorro interface and is 
report bp-1e559988-6f48-47fb-a8cb-51f0c2161201.
=============================================================

[Tracking Requested - why for this release]:
this signature is spiking up after the firefox 50.0.2 release, where it's the #2 topcrash at the moment accounting for 3.64% of all crashes. it's a newly introduced startup crash with zte models Z963VL (CDMA variant), Z963U (Imperial Max), Z962BL (GSM variant). 

judging from a user report at sumo https://support.mozilla.org/questions/1149216 and many comments in crash reports, firefox seems to crash persistently on launch for users of affected devices:
https://crash-stats.mozilla.com/signature/?android_model=%3DZ962BL&android_model=%3DZ963U&android_model=%3DZ963VL&version=50.0.2&signature=java.lang.UnsatisfiedLinkError%3A%20at%20java.lang.Runtime.loadLibrary%28Runtime.java%29#comments
Flags: needinfo?(kbrosnan)
Last time we saw a spike on this users had installed the x86 build on an ARM device bug 1196299. Though this does not seem to be the case for this crash according to https://crash-stats.mozilla.com/rawdumps/f2aa82c4-0630-4da4-ba3a-a26e22161201.json

Seems like the device is available from Walmart.
Flags: needinfo?(snorp)
Flags: needinfo?(nchen)
Flags: needinfo?(kbrosnan)
> nativeLibraryDirectories=[/data/app/org.mozilla.firefox-1/lib/arm64, /vendor/lib64, /system/lib64]

Do we have one of these devices? I'm a little concerned we may not be finding libmozglue in one of these 64-bit lib directories.
Flags: needinfo?(nchen)
Uh yeah, those paths are weird. Jim please try to get one of these devices so we can figure out what's going on.
Flags: needinfo?(snorp) → needinfo?(nchen)
This signature is still showing up in Socorro. Have we had any luck reproducing this internally?
I still need to get a device. If it's high priority I can go do that soon.
Flags: needinfo?(snorp)
Flags: needinfo?(nchen)
Flags: needinfo?(nchen)
It's the #19 top crash on 50.1 right now, and is nearly always a startup crash. 
It won't hold back the 51 release, but it does seem valuable to get the device and investigate this to try to fix it. Sebastian, can you find a owner for this issue?
Flags: needinfo?(s.kaspari)
Redirecting to snorp: Not a front-end thing. I'd re-flag jchen, but I can't. :)
Flags: needinfo?(s.kaspari) → needinfo?(snorp)
I'll take it.
Assignee: nobody → nchen
Snorp said he could get a device faster.
Assignee: nchen → snorp
Flags: needinfo?(nchen)
ping'd snorp, he's planning on purchasing a device this week.
I have the device, but have so far not reproduced the crash on either 50 or 51.
From the logcat of https://crash-stats.mozilla.com/rawdumps/325ec77e-c348-4543-9f48-926d42170202.json 

02-02 12:13:53.620 13640 13640 E GeckoLoader: Couldn't load /mnt/expand/493f17dc-282c-416e-bce4-b35d4c4ea406/user/0/org.mozilla.firefox/files/lib/libmozglue.so: java.lang.UnsatisfiedLinkError: dlopen failed: "/mnt/expand/493f17dc-282c-416e-bce4-b35d4c4ea406/user/0/org.mozilla.firefox/files/lib/libmozglue.so" is 32-bit instead of 64-bit

Is the binary mismatch interesting?
I'm testing using the Z962BL. Maybe there is a system update that's required? The 'check updates' button on this device fails to connect to the update server...
(In reply to Kevin Brosnan [:kbrosnan] from comment #12)
> From the logcat of
> https://crash-stats.mozilla.com/rawdumps/325ec77e-c348-4543-9f48-
> 926d42170202.json 
> 
> 02-02 12:13:53.620 13640 13640 E GeckoLoader: Couldn't load
> /mnt/expand/493f17dc-282c-416e-bce4-b35d4c4ea406/user/0/org.mozilla.firefox/
> files/lib/libmozglue.so: java.lang.UnsatisfiedLinkError: dlopen failed:
> "/mnt/expand/493f17dc-282c-416e-bce4-b35d4c4ea406/user/0/org.mozilla.firefox/
> files/lib/libmozglue.so" is 32-bit instead of 64-bit
> 
> Is the binary mismatch interesting?

Ah, I don't think I've read that log in detail before. What appears to be happening is that libmozglue.so is totally missing from our datadir. This is supposed to happen when the app is installed. We have a fallback where we extract it from the APK ourselves. Loading that library results in the 64-bit mismatch. Apparently the first library you load dictates if you will be able to load 32-bit or 64-bit libraries from then on. Once you load a 64-bit one, you can no longer load any 32-bit libraries. It seems that is somehow happening in this case, though we have no 64-bit libraries in our application. There is a good explanation of how this works here: https://corbt.com/posts/2015/09/18/mixing-32-and-64bit-dependencies-in-android.html

I think this is either an OS bug or some kind of corrupted install. I'll continue to investigate.
Flags: needinfo?(snorp)
I noticed that all crashes have the 64bit ABI in Build.CPU_ABI, but Build.CPU_ABI2 is blank. This is normally set to 'armeabi-v7a', which is the 32-bit ABI. A look at the Android code that sets this[0] yields:

static {
132        /*
133         * Adjusts CPU_ABI and CPU_ABI2 depending on whether or not a given process is 64 bit.
134         * 32 bit processes will always see 32 bit ABIs in these fields for backward
135         * compatibility.
136         */
137        final String[] abiList;
138        if (VMRuntime.getRuntime().is64Bit()) {
139            abiList = SUPPORTED_64_BIT_ABIS;
140        } else {
141            abiList = SUPPORTED_32_BIT_ABIS;
142        }
143
144        CPU_ABI = abiList[0];
145        if (abiList.length > 1) {
146            CPU_ABI2 = abiList[1];
147        } else {
148            CPU_ABI2 = "";
149        }
150    }

So if you are running a 64-bit VM, you get whatever ABI list is in SUPPORTED_64_BIT_ABIS. A couple lines higher, this is defined as:

127    public static final String[] SUPPORTED_64_BIT_ABIS =
128            getStringList("ro.product.cpu.abilist64", ",");

This just reads the "ro.product.cpu.abilist64" system property and splits it into an array, delimited by comma. On the ZTE device (and other devices I've tried), this simply yields "arm64-v8a". If we're on a 32bit VM, we get the 32bit ABI list, which yields expected results ("armeabi-v7a,armeabi"). I'm pretty sure what is happening here is that we're getting started in a 64-bit VM, when normally we are not. The question now is how does that happen?

[0] http://androidxref.com/7.1.1_r6/xref/frameworks/base/core/java/android/os/Build.java#131
Another interesting thing. In my logcats, I see the following reported:

D/GeckoSharedPrefs(12583): Current version = 0, prefs version = 2
D/GeckoSharedPrefs(12583): Performing migration

This means it's the first startup on the device (or the user cleared the app data).
I am seeing the above message in consecutive crashes from the same submitter email, and they are too close together for the user to have cleared the app data. More evidence that the installation has gone bad.
If the process dies very quickly, SharedPreferences might not make it to disk.
These patches won't fix the bug, but they will:

1) Gracefully bail out if we are starting Fennec on a 64-bit-only device
2) Add a log message when starting GeckoThread, which will help determine where the mozglue load is coming from
3) Add the process name to the crash report. I think this one will be the most useful, as we will be able to figure out 
what unexpected process is trying to load gecko libs.
Comment on attachment 8832995 [details] [diff] [review]
Don't allow Fennec to start if 32-bit ABI unsupported

Review of attachment 8832995 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/SharedSurfaceEGL.cpp
@@ +234,5 @@
>  }
>  
>  SharedSurface_SurfaceTexture::~SharedSurface_SurfaceTexture()
>  {
> +    GLContextProviderEGL::DestroyEGLSurface(mEglSurface);

Do we need this?

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/HardwareUtils.java
@@ +84,5 @@
>       */
>      public static boolean isSupportedSystem() {
>          // See http://developer.android.com/ndk/guides/abis.html
> +        boolean isSystemARM = Build.CPU_ABI != null && Build.CPU_ABI.equals("armeabi-v7a");
> +        boolean isSystemX86 = Build.CPU_ABI != null && Build.CPU_ABI.equals("x86");

I think we should be using SUPPORTED_32_BIT_ABIS [1].

[1] https://developer.android.com/reference/android/os/Build.html#SUPPORTED_ABIS
Attachment #8832995 - Flags: review?(nchen) → review+
Attachment #8832996 - Flags: review?(nchen) → review+
Comment on attachment 8832997 [details] [diff] [review]
Put process name into Android crash reports

Review of attachment 8832997 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/CrashHandler.java
@@ +195,2 @@
>          } catch (final IOException e) {
> +            return null;

Don't need this line.
Attachment #8832997 - Flags: review?(nchen) → review+
(In reply to Jim Chen [:jchen] [:darchons] from comment #23)
> Comment on attachment 8832995 [details] [diff] [review]
> Don't allow Fennec to start if 32-bit ABI unsupported
> 
> Review of attachment 8832995 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/gl/SharedSurfaceEGL.cpp
> @@ +234,5 @@
> >  }
> >  
> >  SharedSurface_SurfaceTexture::~SharedSurface_SurfaceTexture()
> >  {
> > +    GLContextProviderEGL::DestroyEGLSurface(mEglSurface);
> 
> Do we need this?

Whoops, no. This is from another set of patches I'm working on. Removed.

> 
> :::
> mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/HardwareUtils.
> java
> @@ +84,5 @@
> >       */
> >      public static boolean isSupportedSystem() {
> >          // See http://developer.android.com/ndk/guides/abis.html
> > +        boolean isSystemARM = Build.CPU_ABI != null && Build.CPU_ABI.equals("armeabi-v7a");
> > +        boolean isSystemX86 = Build.CPU_ABI != null && Build.CPU_ABI.equals("x86");
> 
> I think we should be using SUPPORTED_32_BIT_ABIS [1].
> 
> [1]
> https://developer.android.com/reference/android/os/Build.html#SUPPORTED_ABIS

That's only in API 21, so we can't use it (unconditionally, at least).
(In reply to Jim Chen [:jchen] [:darchons] from comment #24)
> Comment on attachment 8832997 [details] [diff] [review]
> Put process name into Android crash reports
> 
> Review of attachment 8832997 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> :::
> mobile/android/geckoview/src/main/java/org/mozilla/gecko/CrashHandler.java
> @@ +195,2 @@
> >          } catch (final IOException e) {
> > +            return null;
> 
> Don't need this line.

Yup, copy/paste error. Thanks.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #25)
> (In reply to Jim Chen [:jchen] [:darchons] from comment #23)
> > Comment on attachment 8832995 [details] [diff] [review]
> > Don't allow Fennec to start if 32-bit ABI unsupported
> > 
> > :::
> > mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/HardwareUtils.
> > java
> > @@ +84,5 @@
> > >       */
> > >      public static boolean isSupportedSystem() {
> > >          // See http://developer.android.com/ndk/guides/abis.html
> > > +        boolean isSystemARM = Build.CPU_ABI != null && Build.CPU_ABI.equals("armeabi-v7a");
> > > +        boolean isSystemX86 = Build.CPU_ABI != null && Build.CPU_ABI.equals("x86");
> > 
> > I think we should be using SUPPORTED_32_BIT_ABIS [1].
> > 
> > [1]
> > https://developer.android.com/reference/android/os/Build.html#SUPPORTED_ABIS
> 
> That's only in API 21, so we can't use it (unconditionally, at least).

Yeah I meant we should use it when API >= 21.
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/435b30acbb5e
Don't allow Fennec to start if 32-bit ABI unsupported r=jchen
https://hg.mozilla.org/integration/mozilla-inbound/rev/a1259007c6cc
Add log message when running GeckoThread on Android r=jchen
https://hg.mozilla.org/integration/mozilla-inbound/rev/99769e92f4a9
Put process name into Android crash reports r=jchen
Hi James, should we uplift this fix to aurora53 and beta52? Thanks!
Flags: needinfo?(snorp)
I think we do want to uplift these, as we probably won't get any relevant crashes on Nightly to debug this.
Flags: needinfo?(snorp)
Comment on attachment 8832995 [details] [diff] [review]
Don't allow Fennec to start if 32-bit ABI unsupported

Approval Request Comment
[Feature/Bug causing the regression]: Unknown
[User impact if declined]: Crashes
[Is this code covered by automated tests?]: No, other than smoketest
[Has the fix been verified in Nightly?]: No, we cannot reproduce
[Needs manual test from QE? If yes, steps to reproduce]:  No
[List of other uplifts needed for the feature/fix]: All of the patches in this bug
[Is the change risky?]: Somewhat, because it changes how we check for architecture compatibility at startup
[Why is the change risky/not risky?]:
[String changes made/needed]: No
Attachment #8832995 - Flags: approval-mozilla-beta?
Attachment #8832995 - Flags: approval-mozilla-aurora?
Comment on attachment 8832995 [details] [diff] [review]
Don't allow Fennec to start if 32-bit ABI unsupported

Prevent a top android crash. Let's take this on aurora and beta.
Attachment #8832995 - Flags: approval-mozilla-beta?
Attachment #8832995 - Flags: approval-mozilla-beta+
Attachment #8832995 - Flags: approval-mozilla-aurora?
Attachment #8832995 - Flags: approval-mozilla-aurora+
This bug isn't fixed, I just pushed some patches to help debug it. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This could be caused by this Android bug [1], which only affects Android 6.0 (and not 6.0.1). It's triggered when an app is installed onto an "adopted storage," and I think could result in the wrong ABI being used. A workaround is to disable installing to external storage through the "android:installLocation" attribute.

[1] https://android.googlesource.com/platform/frameworks/base/+/bbcb331
Snorp, anything actionable here (from the results of your diagnostic patches?) The crash rate still looks high.
Flags: needinfo?(snorp)
We can try limiting the install location. I'll put up a patch that does that.
Flags: needinfo?(snorp)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #40)
> Snorp, anything actionable here (from the results of your diagnostic
> patches?) The crash rate still looks high.

Actually I'm a little uncomfortable with the proposed workaround from Jim, as it may prevent people from installing Firefox even on unaffected devices. We also have no idea how many people have Firefox installed on external media.

I'm only seeing 4 crashes with this signature in the last 7 days, which certainly seems low. Liz, which signature are you looking at?
Flags: needinfo?(lhenry)
Looking here, https://crash-stats.mozilla.com/signature/?signature=java.lang.UnsatisfiedLinkError%3A%20at%20java.lang.Runtime.loadLibrary(Runtime.java)  There are 2900 crashes in the last week on Fennec 52.0.1 (the #16 top crash for Fennec). Maybe it won't be a problem for 53, as I don't see *any* crashes for that signature in 53.   

I also notice now that I dug into the signature summary page that for nearly every one of these crashes, the OS is blank rather than FennecAndroid. That seems odd.
Flags: needinfo?(lhenry)
Hmm, the link in comment #43 is truncated, but I managed to find the right thing. Definitely still getting a bunch of these, and like 99% of them are API 23 with Fennec installed to external storage.

It would be great if we could apply this workaround *only* on API 23, but I really don't think we have that ability right now. I'll put a patch that just disallows external install everywhere.
Comment on attachment 8855372 [details]
Bug 1322029 - Disallow installing Fennec to external storage

https://reviewboard.mozilla.org/r/124250/#review130034
Attachment #8855372 - Flags: review?(nchen) → review+
Pushed by jwillcox@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8a8e46482e90
Disallow installing Fennec to external storage r=jchen
Comment on attachment 8855372 [details]
Bug 1322029 - Disallow installing Fennec to external storage

Approval Request Comment
[Feature/Bug causing the regression]: Bug in Android 6.0
[User impact if declined]: Startup crashes on Android 6.0 for some users who have moved Fennec to external storage
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: No
[Needs manual test from QE? If yes, steps to reproduce]: Maybe. The "Move to SD card" option in the Android app settings for Firefox should no longer appear.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Unlikely
[Why is the change risky/not risky?]: This change reverts us to the default android setting of not allowing external storage for an app. The only potential problem I see is if Android reacts badly for folks who have already installed an older build onto the SD card. I was not able to test this scenario as I have no devices with an SD card. My expectation is that either Android will move the app to the internal storage or leave it as-is -- either way, we're no worse off.
[String changes made/needed]: None
Attachment #8855372 - Flags: approval-mozilla-beta?
Attachment #8855372 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/8a8e46482e90
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
But, are you changing things for all devices ? It seems like this crash only affects ZTE Max Duo users. We may be changing things for a much bigger set of users (who may have Fennec installed on an SD card). Since we don't know what will happen for that possibly bigger set of people, it seems best to keep this on trunk for now until we have more testing.  I may be misunderstanding the fix though.
Flags: needinfo?(snorp)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #50)
> But, are you changing things for all devices ? It seems like this crash only
> affects ZTE Max Duo users. We may be changing things for a much bigger set
> of users (who may have Fennec installed on an SD card). Since we don't know
> what will happen for that possibly bigger set of people, it seems best to
> keep this on trunk for now until we have more testing.  I may be
> misunderstanding the fix though.

We are changing it for all devices, since there's no way to do this for just the ZTE Max Duo without doing a separate build. We think it probably affects more devices than that anyway, it's just that the ZTE has the most users stuck on Android 6.0.

I'm fine keeping it on trunk for now.
Flags: needinfo?(snorp)
Attachment #8855372 - Flags: approval-mozilla-beta?
Attachment #8855372 - Flags: approval-mozilla-beta-
Attachment #8855372 - Flags: approval-mozilla-aurora?
Attachment #8855372 - Flags: approval-mozilla-aurora-
See Also: → 1387384
Mehh
So, you guys think Firefox for Android is a small app that can fit on the internal storage of every phone?
50MB app + 45MB data + 66MB cache = more than 160MB.
Not to mention rendering performance is much worse than Chromium-based browsers.

Is this how Mozilla "fix" bugs, by disabling important features?
What a weird way to fix something that "only affects Android 6.0" when installing over "adopted storage".

So, my suggestion for users that need FFA for something is to stay at v54. Otherwise, tap Uninstall button.
Hi there

I'm also affected by not being able to move Firefox(v.57, 196MB) to external storage (sd-card) any more. Android 6.0.1. (bug 1430652) and having big struggle keeping enough (internal) disk-space. If I understood correctly android 6.0.1 and later versions are not affected anymore.

So I would be very happy to hear a rough estimation for the following question:
Is there a chance that this manifest attribute android:installLocation can be set back to "auto" again, 
within the next 3 month? within the next two years?

(https://developer.android.com/guide/topics/data/install-location.html)

And in case 6.0 will be widely spread for long time:
Did you consider the option to check android version and storage location at App startup and return an error message "you can not use external storage on this android version" before the app crashes?

It's because I have to decide what browser I will use. Using an old version for long time is not an option for me. And I guess if in the foreseeable future it isn't not possible to move current version on SD again, I'd have to change. and others will too. (this is meant supportive, not trying to force you) If there where sooo many crash reports, this means a lot of people used this option (to store Firefox on the Sdcard). don't know if there is a statistic available about this.

Thx for the nice work!
Hi there

I'm also affected by not being able to move Firefox(v.57, 196MB) to external storage (sd-card) any more. Android 6.0.1. (bug 1430652) and having big struggle keeping enough (internal) disk-space. If I understood correctly android 6.0.1 and later versions are not affected anymore.

So I would be very happy to hear a rough estimation for the following question:
Is there a chance that this manifest attribute android:installLocation can be set back to "auto" again, 
within the next 3 month? within the next two years?

(https://developer.android.com/guide/topics/data/install-location.html)

And in case 6.0 will be widely spread for long time:
Did you consider the option to check android version and storage location at App startup and return an error message "you can not use external storage on this android version" before the app crashes?

It's because I have to decide what browser I will use. Using an old version for long time is not an option for me. And I guess if in the foreseeable future it isn't not possible to move current version on SD again, I'd have to change. and others will too. (this is meant supportive, not trying to force you) If there where sooo many crash reports, this means a lot of people used this option (to store Firefox on the Sdcard). don't know if there is a statistic available about this.

Thx for the nice work!
You need to log in before you can comment on or make changes to this bug.