1851704 - Crash in [@ java.lang.NoSuchFieldException: at java.lang.Class.getDeclaredField(Class.java)] org.mozilla.fenix.wallpapers.WallpapersUseCases$DefaultInitializeWallpaperUseCase

Chris Peterson [:cpeterson]

Reporter

Description

•

2 years ago

•

Edited

Crash report: https://crash-stats.mozilla.org/report/index/83d0b7d9-fca4-42ad-ac01-196840230905

Exception: java.lang.NoSuchFieldException
Message: _decision

Frame   Module  Function  Source
0   java.lang.Class   getDeclaredField  Class.java:929
1   java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl   <init>  AtomicIntegerFieldUpdater.java:251
2   java.util.concurrent.atomic.AtomicIntegerFieldUpdater   newUpdater  AtomicIntegerFieldUpdater.java:49 
3   kotlinx.coroutines.DispatchedCoroutine  <clinit>  Builders.common.kt:5
4   kotlinx.coroutines.BuildersKt   withContext   :90 
5   org.mozilla.fenix.wallpapers.WallpapersUseCases$DefaultInitializeWallpaperUseCase   invoke  WallpapersUseCases.kt:439 
6   org.mozilla.fenix.FenixApplication$downloadWallpapers$1   invokeSuspend   FenixApplication.kt:53
7   kotlin.coroutines.jvm.internal.BaseContinuationImpl   resumeWith  ContinuationImpl.kt:9 
8   kotlinx.coroutines.DispatchedTask   run   DispatchedTask.kt:112 
9   kotlinx.coroutines.scheduling.CoroutineScheduler$Worker   run   CoroutineScheduler.kt:96

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criteria:

Top 10 AArch64 and ARM crashes on beta
Top 10 AArch64 and ARM crashes on release

For more information, please visit BugBot documentation.

Keywords: topcrash

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

status-firefox117: affected → fix-optional

Matt Tighe [:matt-tighe]

Comment 2

•

2 years ago

This looks to be a known issue in the stack: https://github.com/Kotlin/kotlinx.coroutines/issues/1900

Chris Peterson [:cpeterson]

Reporter

Comment 3

•

2 years ago

please check your proguard rules.
If your rules are in order, but some users still see the crash, it's likely to be an Android-specific issue and it's worth reporting to Google issue tracker

Do we use ProGuard, Redex, or any other dex optimizers that cause symbol-renaming issues like this?

Emilio Cobos Álvarez [:emilio]

Comment 4

•

2 years ago

Joe is this on your team's radar?

status-firefox118: affected → fix-optional

Flags: needinfo?(jmahon)

Joe M [:jmahon]

Comment 5

•

2 years ago

Wasn't actively being pursued. Adding it to our working board.

Component: General → Homepage

Flags: needinfo?(jmahon)

Whiteboard: [fxdroid]

Jira Integration Bot

Updated

•

2 years ago

See Also: → https://mozilla-hub.atlassian.net/browse/FXDROID-851

Jonathan Almeida [:jonalmeida]

Comment 6

•

2 years ago

•

Edited

(In reply to Chris Peterson [:cpeterson] from comment #3)

please check your proguard rules.
If your rules are in order, but some users still see the crash, it's likely to be an Android-specific issue and it's worth reporting to Google issue tracker

Do we use ProGuard, Redex, or any other dex optimizers that cause symbol-renaming issues like this?

Yes, we should look into ensuring that we are using the appropriate keep rules to avoid this crash if possible, however, this happening in the same release and API version 21 as bug 1844964 is suspect that we are suffering from another root cause that is masked by these two different crash reports.

Comment 7

•

2 years ago

Note that this bug might look like a regression in August, but more likely is that it's an existing crash that was misreported and buried in the "EMPTY: no frame data available" crash reports (bug 1847429). willkg added a Socorro workaround for these "empty" crash reports and started generating proper crash signatures for reports received after August 1.

Keywords: regression, regressionwindow-wanted

Jonathan Almeida [:jonalmeida]

Comment 8

•

2 years ago

This sadly seems like a WONTFIX issue. Looking at it with fresh eyes after the bug 1844964 investigation, and I no longer think it's related.

From this spicy thread in the coroutine library (Kotlin/kotlinx.coroutines#490), this is a known issue and there is no consensus on a solution as it's a bug on the Samsung Android API 21 devices that Samsung has not responded to.

What can we do? Not much without taking some incredible amount of energy to rewrite a Samsung-only API 21 patch which I'm not eager to do.

I agree with comment 7, that this is probably not a new crash because nothing has changed in this code path either.

See Also: 1844964 → https://github.com/Kotlin/kotlinx.coroutines/issues/490

Chris Peterson [:cpeterson]

Reporter

Comment 9

•

2 years ago

Jonathan, if we can't avoid hitting this exception in Kotlin code, can we catch and ignore it lower in the call stack in WallpapersUseCases$DefaultInitializeWallpaperUseCase or FenixApplication$downloadWallpapers?

Also, if this exception has been a known Kotlin issue since 2018, why did we only start hitting it in Fx 117? Did we rewrite our wallpapers code to use coroutines in 117?

Frame   Module  Function  Source
0   java.lang.Class   getDeclaredField  Class.java:929
1   java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl   <init>  AtomicIntegerFieldUpdater.java:251
2   java.util.concurrent.atomic.AtomicIntegerFieldUpdater   newUpdater  AtomicIntegerFieldUpdater.java:49 
3   kotlinx.coroutines.DispatchedCoroutine  <clinit>  Builders.common.kt:5
4   kotlinx.coroutines.BuildersKt   withContext   :90 
5   org.mozilla.fenix.wallpapers.WallpapersUseCases$DefaultInitializeWallpaperUseCase   invoke  WallpapersUseCases.kt:439 
6   org.mozilla.fenix.FenixApplication$downloadWallpapers$1   invokeSuspend   FenixApplication.kt:53
7   kotlin.coroutines.jvm.internal.BaseContinuationImpl   resumeWith  ContinuationImpl.kt:9 
8   kotlinx.coroutines.DispatchedTask   run   DispatchedTask.kt:112 
9   kotlinx.coroutines.scheduling.CoroutineScheduler$Worker   run   CoroutineScheduler.kt:96

status-firefox117: fix-optional → wontfix

status-firefox120: --- → ?

Flags: needinfo?(jonalmeida942)

Matt Tighe [:matt-tighe]

Comment 10

•

2 years ago

•

Edited

IIUC we can catch the exception in Kotlin code if we just want this specific crash to go away. However, that would imply a few things:

Users on those devices won't have access to a feature without any indication as to why. Our solve here would be to basically silently fail downloading wallpapers forever.
The underlying issue that Samsung has caused here is liable to appear in other places throughout the app, so we either
1. Litter the codebase with exception handling (and random dead spots in the app for users)
2. Undertake a pretty massive (and hacky, ultimately) technical initiative to solve it more generally

IMO the short-term solution is not worth the damage to the user experience or the long-term health of our code, and the long-term investment is not worth the effort given the (shrinking) size of the device segment of Samsung users on Android 5.0.x.

Jonathan Almeida [:jonalmeida]

Comment 11

•

2 years ago

•

Edited

(In reply to Matt Tighe [:matt-tighe] from comment #10)

The underlying issue that Samsung has caused here is liable to appear in other places throughout the app, so we either

Thanks Matt!

Chris, I want to emphasize this point of Matt's: we will just be moving the problem to the next place in the code base where we need to use AtomicIntegerFieldUpdater in any of our libraries (everything kotlin with a suspend function).

(In reply to Chris Peterson [:cpeterson] from comment #9)

Also, if this exception has been a known Kotlin issue since 2018, why did we only start hitting it in Fx 117? Did we rewrite our wallpapers code to use coroutines in 117?

That's a good question, I forgot to reconsider that again.

The kotlinx.coroutines 1.7.0 update had a new API implementation for Channel (coroutine API that is used internally) which is faster, but touches on this buggy code in Samsung API 21 devices whenever it needs to access the java Atomic*FieldUpdater. One of the library author's have a better explanation of the problem here.

I don't think backing it out is feasible because we have to do this update eventually (our dependencies are also updating their dependencies to it too).

The wordpress app seems to have faced this issue too and updated their min SDK version to 24. Bumping up to API 22 alone would solve a lot of bugs like this for us (even in our native rust components).

Flags: needinfo?(jonalmeida942)

Joe M [:jmahon]

Comment 12

•

2 years ago

Seems like the simplest and most maintainable option would indeed be to bump the minSDK version. I just added an investigation task to help us determine if there are considerable downsides to doing so, let's circle back to this after doing that investigation, before closing this as WONTFIX.

I generally get nervous about WONTFIX'ing unresolved crashes, because those crashes will still continue to pop up in our logs, so it just perpetuates noise that makes it a little harder to notice other emerging issues. (that would be my main personal justification for catching and handling this one, even if it leaves the users with a broken experience - but your counterarguments to doing so are solid, as well).

Chris Peterson [:cpeterson]

Reporter

Comment 13

•

2 years ago

Seems like the simplest and most maintainable option would indeed be to bump the minSDK version.

Since this crash only affects users with API 21, bumping the minimum API to 22 would leave these users with an unsupported (and still crashing) Firefox.

This telemetry query says that only 0.13% of Fenix users have API 21 (vs 2.37% with API 22):

https://sql.telemetry.mozilla.org/queries/80030#198699

Dianna Smith [:diannaS]

Updated

•

1 year ago

status-firefox118: fix-optional → wontfix

status-firefox119: affected → wontfix

status-firefox120: ? → fix-optional

status-firefox121: --- → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

status-firefox120: fix-optional → wontfix

status-firefox121: affected → wontfix

status-firefox122: --- → fix-optional

twhite

Updated

•

1 year ago

Priority: P1 → P2

Chris Peterson [:cpeterson]

Reporter

Updated

•

1 year ago

Updated

•

1 year ago

Updated

•

1 year ago

status-firefox122: fix-optional → wontfix

Chris Peterson [:cpeterson]

Reporter

Updated

•

1 year ago

status-firefox123: --- → affected

status-firefox124: --- → affected

status-firefox125: --- → ?

Priority: P2 → P3

Hardware: Unspecified → ARM

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

status-firefox123: affected → wontfix

status-firefox124: affected → wontfix

status-firefox125: ? → fix-optional

status-firefox126: --- → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

1 year ago

status-firefox125: fix-optional → wontfix

status-firefox126: affected → wontfix

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

1 year ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

BugBot [:suhaib / :marco/ :calixte]

Comment 15

•

1 year ago

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit BugBot documentation.

Severity: S2 → S3