Closed Bug 1835231 Opened 1 year ago Closed 1 year ago

Fenix crashes on startup on Android 5 devices

Categories

(Fenix :: General, defect, P1)

Firefox 115
ARM
Android
defect

Tracking

(firefox113 unaffected, firefox114+ verified, firefox115+ verified)

VERIFIED FIXED
115 Branch

People

(Reporter: mlobontiuroman, Assigned: zmckenney)

References

(Regression)

Details

(Keywords: crash, regression)

Attachments

(6 files)

Attached video crash.mp4

Steps to reproduce

  1. Install Fenix Beta 114.0b8, or Beta 114.0b9, or the latest Nightly 115.0a1, on a device with Android 5.
  2. Open Fenix.

Expected behavior

Fenix can be opened.

Actual behavior

Fenix crashes. Cannot open about:crashes to get the details.

Device information

  • Firefox version: Nightly 115.0a1 from 5/26, Beta 114.0b8, Beta 114.0b9
  • Android devices: Samsung Galaxy Tab A6 (Android 5.1.1), and Xiaomi mi4i (Android 5.0.2)
  • NOT reproducible on RC 113.2.0
Severity: -- → S3
Keywords: crash
Attached file crash.txt

I've added a crash log; maybe it helps.

Hello, the issue is only reproducible with:

  • 114.0b9 and 114.0b8.

We were not able to reproduce it with:

  • 114.0b1, 114.0b3, 114.0b4, 114.0b7;
  • 113.0b8, 113.0b9;
  • RC 113.2.0.

Tested with:

  • Huawei MediaPad M2 (Android 5.1.1)
  • Samsung Galaxy Tab A6 (Android 5.1.1)

This crash is now reproducible also on RC 114.0 build 2, with Samsung Galaxy Tab A6 (Android 5.1.1).

Duplicate of this bug: 1836043

An old LG Leon (Android 5.1.1) cannot run the latest Nightly 115.0a1 either. Firefox just fails to start, and there is no crash report.

Severity: S3 → S2
Priority: -- → P1

The bug is marked as tracked for firefox114 (beta). However, the bug still isn't assigned.

:amoya, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(amoya)
Assignee: nobody → zmckenney
Component: Crash Reporting → General
Flags: needinfo?(amoya)
Keywords: regression

First update while I continue to investigate this issue:

This is occurring across both x86 and AArch64 for API level 22 (with API level 21 tested only on x86 but also confirmed). The crash signature does not appear the same as the interposer issue we have seen before when running debug local builds of GV in Fenix. Unfortunately, the error given is not helpful yet.

I'm doing a manual bisection of GV local builds to find which revision caused the crashing and will update as soon as I have answers.

I checked a lot of builds today, and there are two crashes that can occur, both involving the interposer work done previously. In the earlier interposer crashes we would see a call to get or set an environment variable from some library, and it would crash on startup. The stack trace would show that a native library (it could be any of them) had called one of these methods, so we would know to look at the interposer.

In these crashes, the only stack trace we could gather was:

2023-06-01 11:06:21.977 17661-17690 libc org.mozilla.fenix.debug A Fatal signal 11 (SIGSEGV), code 128, fault addr 0x0 in tid 17690 (Gecko)

There was some difficulty getting the tombstones from the CI builds, which were failing as well, but this is what the backtrace with symbols showed:

backtrace:
      #00 pc 0000007568e1106c  <unknown>
      #01 pc 0000000000502250  /data/app/~~lJMmSH_Uxma-2dA9gAtNMg==/org.mozilla.geckoview_example-DF6N6UOkyAofIMBPgoUCxg==/lib/arm64/libxul.so!libxul.so (offset 0x502000) (BuildId: 4200bd12824e0527d37033861c395d5d68ef562f)

It seems the patch that fixed the interposer, which allows a direct libc lookup when libmozglue has not been linked, is now crashing on Android 5.0 and 5.1 (which only report the generic libc error message above). The crash location is HERE.

Backing out both interposer revisions fixes the issue on Android 5.0 and 5.1 on an updated default branch.

Flags: needinfo?(gsvelto)

Looping in Alexandre as well in case he can help.

It was a bit unclear how to get an Android build that reproduces the crash, but I think I have something. I'll see how much I can help there.

06-02 12:33:54.760  1183  1183 I DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
06-02 12:33:54.760  1183  1183 I DEBUG   : Build fingerprint: 'Android/sdk_google_phone_x86/generic_x86:5.1.1/LMY48X/6695563:userdebug/test-keys'
06-02 12:33:54.760  1183  1183 I DEBUG   : Revision: '0'
06-02 12:33:54.760  1183  1183 I DEBUG   : ABI: 'x86'
06-02 12:33:54.760  1183  1183 I DEBUG   : pid: 4531, tid: 4551, name: Gecko  >>> org.mozilla.geckoview_example <<<
06-02 12:33:54.760  1183  1183 I DEBUG   : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x63626974
06-02 12:33:54.761  1183  1183 I DEBUG   :     eax 6362696c  ebx ae410bd4  ecx 93af7d42  edx 00000000
06-02 12:33:54.761  1183  1183 I DEBUG   :     esi b77d23dc  edi ae281ba7
06-02 12:33:54.762  1183  1183 I DEBUG   :     xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000006f  xss 0000007b
06-02 12:33:54.762  1183  1183 I DEBUG   :     eip ae308f34  ebp a0bff988  esp a0bff960  flags 00010292
06-02 12:33:54.762  1183  1183 I DEBUG   : 
06-02 12:33:54.762  1183  1183 I DEBUG   : backtrace:
06-02 12:33:54.762  1183  1183 I DEBUG   :     #00 pc 00020f34  /data/app/org.mozilla.geckoview_example-1/lib/x86/libmozglue.so
06-02 12:33:54.762  1183  1183 I DEBUG   :     #01 pc 0002ddef  /data/app/org.mozilla.geckoview_example-1/lib/x86/libmozglue.so
06-02 12:33:54.762  1183  1183 I DEBUG   :     #02 pc 0002bdac  /data/app/org.mozilla.geckoview_example-1/lib/x86/libmozglue.so
06-02 12:33:54.762  1183  1183 I DEBUG   :     #03 pc 00b9acca  /data/dalvik-cache/x86/data@app@org.mozilla.geckoview_example-1@base.apk@classes.dex
06-02 12:33:54.798  1183  1183 I DEBUG   : 
06-02 12:33:54.798  1183  1183 I DEBUG   : Tombstone written to: /data/tombstones/tombstone_00
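
One possible reading of the register dump above (an interpretation, not something stated in the log): eax holds 0x6362696c, and the fault address 0x63626974 is exactly eax + 8. Read in x86 little-endian byte order, the four bytes of eax are printable ASCII. The sketch below decodes a 32-bit value that way; the helper names u32_to_ascii_le and fault_offset are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the four bytes of `value`, lowest byte first (the order in which
 * x86 stores them in memory), into `out` as a NUL-terminated string.
 * Bytes outside the printable ASCII range are shown as '.'. */
static void u32_to_ascii_le(uint32_t value, char out[5]) {
  assert(out != NULL);
  for (int i = 0; i < 4; i++) {
    unsigned char b = (unsigned char)((value >> (8 * i)) & 0xffu);
    out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
  }
  out[4] = '\0';
}

/* Distance between the faulting address and a base register value. */
static uint32_t fault_offset(uint32_t fault_addr, uint32_t base) {
  return fault_addr - base;
}
```

Calling u32_to_ascii_le(0x6362696c, buf) yields the text "libc", and fault_offset(0x63626974, 0x6362696c) is 8. That would be consistent with string data being dereferenced as if it were a pointer, although the log alone does not prove this.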

I am setting it as a release blocker since we officially support Android 5 and have a few users.

At least with void* handle = __wrap_dlopen("libc.so", RTLD_LAZY); geckoview_example opens.

Attachment #9337246 - Attachment description: WIP: Bug 1835231 - Use dlopen() wrapper for Android <= 22 → Bug 1835231 - Use dlopen() wrapper for Android <= 22 r?gsvelto!
Pushed by alissy@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2aad669b353d Use dlopen() wrapper for Android <= 22 r=gsvelto

I am setting the tracking flag back from blocking to +. We have a temporary mitigation in place for next week (blocking installs and updates for Android 5 devices), and a real fix is incoming that we will test in Nightly and Beta next week and can include in our weekly Android update of the app.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

Flagging this early while I continue testing this patch across API levels. So far I've found that on API levels 28 and 29 it no longer crashes at launch, but I'm getting a 100% reproducible crash during regular browsing. It looks like we're crashing in libc for multiple different reasons (GPU process crash, Android UI crash). Without the patch this does not appear to occur (or it could potentially be flaky; I need to continue testing without the patch). Below is the STR. I've attached a video of it occurring, and I'm also attaching stack traces.

STR in GV example:

  • Download diff for revision
  • Update default branch
  • git apply your_diff_file.patch
  • Set up testing device: Pixel 6 Pro - Arm64 - API 28
  • ./mach build
  • Build and run GV example
  • Browse to slickdeals.net
  • When the iframe for "Sign in with Google" pops up click the X to close

(In reply to Zac McKenney [:zmckenney] from comment #20)

Your stack shows an unrelated crash:

2023-06-02 15:25:27.985 17581-17627 MOZ_Assert org.mozilla.geckoview_example A Assertion failure: sideBits == hit.mNode->GetFixedPosSides() (Fixed position side bits do not match), at /Users/mozilla/StudioProjects/gecko/gfx/layers/apz/src/WRHitTester.cpp:230

Flags: needinfo?(gsvelto)
Flags: needinfo?(zmckenney)

I agree that this appears unrelated, which is why I had to triple-check myself. I was finding multiple different reasons for the crash as well; I'll add another stack trace for an issue that at first glance also seems unrelated. Unfortunately, checking out default, doing a clean mach build, and then running GV example never crashes, but as soon as I make just the changes in the patch for this bug, the crash is 100% reproducible.

Maybe someone else could confirm that their default checkout does not crash and then run my STR? I really didn't believe it to be related either, and I began writing a new bug before realizing that it was only occurring for me with this patch. If someone else is able to confirm, that would help.

Flags: needinfo?(zmckenney) → needinfo?(lissyx+mozillians)
Flags: needinfo?(lissyx+mozillians)

You do realize that on API levels > 22, the current patch does not make any changes? We still call dlopen() directly (with one level of indirection maybe, but it's static inline).
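
The behaviour described here can be sketched roughly as follows. This is not the landed patch: the stand-in loaders, the names pick_loader and open_libc_for, and the runtime check are all assumptions made for illustration, and the cut-off of 22 is taken only from the patch title. The stand-ins return a tag instead of calling a real dlopen() so that the selection logic can be run on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Tags returned by the stand-in loaders so the choice is observable. */
enum loader_kind { LOADER_WRAPPER, LOADER_SYSTEM };

typedef enum loader_kind (*loader_fn)(const char *name);

/* Hypothetical stand-in for the dlopen() wrapper (__wrap_dlopen). */
static enum loader_kind wrapped_loader(const char *name) {
  assert(name != NULL);
  return LOADER_WRAPPER;
}

/* Hypothetical stand-in for the system dlopen(). */
static enum loader_kind system_loader(const char *name) {
  assert(name != NULL);
  return LOADER_SYSTEM;
}

/* Assumed cut-off, taken from the patch title: Android <= 22. */
enum { LAST_API_USING_WRAPPER = 22 };

/* One level of indirection, kept static inline so it costs nothing
 * on API levels where the system loader is used directly. */
static inline loader_fn pick_loader(int api_level) {
  return api_level <= LAST_API_USING_WRAPPER ? wrapped_loader : system_loader;
}

static inline enum loader_kind open_libc_for(int api_level) {
  return pick_loader(api_level)("libc.so");
}
```

Under these assumptions, open_libc_for(21) and open_libc_for(22) go through the wrapper, while open_libc_for(28) takes the same system path as before the patch.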

Either way, it's the weekend and I am attending an event, so I can't investigate this right now.

Sorry for the added confusion. I'm on family vacation as well, but now that the patch has officially landed I tested to see whether it still crashes, and it no longer does. I recognize that the change was only supposed to take effect on API levels <= 22, and I'm not sure why this was happening before when I applied the diff, but it does appear unrelated.

No problem, good to know it was just a simple mistake :)

Verified on the latest Fenix Nightly 116.0a1 from 6/6, and Beta 115.0b1 with the following devices:

  • Huawei MediaPad M2 (Android 5.1.1), and
  • Samsung Galaxy Tab A6 (Android 5.1.1).

Both apps could be opened and used; no crash occurred.

Zac, could you request uplift to mozilla-release please? Thanks

Flags: needinfo?(zmckenney)

Or Alexandre, as the patch author, Thanks

Flags: needinfo?(zmckenney) → needinfo?(lissyx+mozillians)

Comment on attachment 9337246 [details]
Bug 1835231 - Use dlopen() wrapper for Android <= 22 r?gsvelto!

Beta/Release Uplift Approval Request

  • User impact if declined: instant crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: install, try to start
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): easy fix
  • String changes made/needed: no
  • Is Android affected?: Yes
Flags: needinfo?(lissyx+mozillians)
Attachment #9337246 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9337246 [details]
Bug 1835231 - Use dlopen() wrapper for Android <= 22 r?gsvelto!

Beta/Release Uplift Approval Request

  • User impact if declined: instant crash
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: install, try to run
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): easy fix
  • String changes made/needed: no
  • Is Android affected?: Yes
Attachment #9337246 - Flags: approval-mozilla-release?
Attachment #9337246 - Flags: approval-mozilla-beta?

Comment on attachment 9337246 [details]
Bug 1835231 - Use dlopen() wrapper for Android <= 22 r?gsvelto!

Approved for our 114.0.1 release, thanks.

Attachment #9337246 - Flags: approval-mozilla-release? → approval-mozilla-release+

Verified as fixed on the latest RC 114.1.0, on the latest Fenix Nightly 116.0a1 from 06/09, and on Beta 115.0b3, with the following devices:

  • Huawei MediaPad M2 (Android 5.1.1)
  • Samsung Galaxy Tab A6 (Android 5.1.1)
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Duplicate of this bug: 1837879
Duplicate of this bug: 1839071
No longer duplicate of this bug: 1839071
See Also: → 1861724