Crash in [@ java.lang.NoSuchFieldException: at java.lang.Class.getDeclaredField(Class.java)] org.mozilla.fenix.wallpapers.WallpapersUseCases$DefaultInitializeWallpaperUseCase
Categories
(Firefox for Android :: Homepage, defect, P3)
Tracking
()
People
(Reporter: cpeterson, Unassigned)
References
Details
(Keywords: crash, Whiteboard: [fxdroid])
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/83d0b7d9-fca4-42ad-ac01-196840230905
Exception: java.lang.NoSuchFieldException
Message: _decision
Frame Module Function Source
0 java.lang.Class getDeclaredField Class.java:929
1 java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl <init> AtomicIntegerFieldUpdater.java:251
2 java.util.concurrent.atomic.AtomicIntegerFieldUpdater newUpdater AtomicIntegerFieldUpdater.java:49
3 kotlinx.coroutines.DispatchedCoroutine <clinit> Builders.common.kt:5
4 kotlinx.coroutines.BuildersKt withContext :90
5 org.mozilla.fenix.wallpapers.WallpapersUseCases$DefaultInitializeWallpaperUseCase invoke WallpapersUseCases.kt:439
6 org.mozilla.fenix.FenixApplication$downloadWallpapers$1 invokeSuspend FenixApplication.kt:53
7 kotlin.coroutines.jvm.internal.BaseContinuationImpl resumeWith ContinuationImpl.kt:9
8 kotlinx.coroutines.DispatchedTask run DispatchedTask.kt:112
9 kotlinx.coroutines.scheduling.CoroutineScheduler$Worker run CoroutineScheduler.kt:96
Comment 1•2 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 10 AArch64 and ARM crashes on beta
- Top 10 AArch64 and ARM crashes on release
For more information, please visit BugBot documentation.
Updated•2 years ago
|
Comment 2•2 years ago
|
||
This looks to be a known issue in the stack: https://github.com/Kotlin/kotlinx.coroutines/issues/1900
| Reporter | ||
Comment 3•2 years ago
|
||
please check your proguard rules.
If your rules are in order, but some users still see the crash, it's likely to be an Android-specific issue and it's worth reporting to Google issue tracker
Do we use ProGuard, Redex, or any other dex optimizers that cause symbol-renaming issues like this?
Comment 5•2 years ago
|
||
Wasn't actively being pursued. Adding it to our working board.
Updated•2 years ago
|
Comment 6•2 years ago
•
|
||
(In reply to Chris Peterson [:cpeterson] from comment #3)
please check your proguard rules.
If your rules are in order, but some users still see the crash, it's likely to be an Android-specific issue and it's worth reporting to Google issue trackerDo we use ProGuard, Redex, or any other dex optimizers that cause symbol-renaming issues like this?
Yes, we should look into ensuring that we are using the appropriate keep rules to avoid this crash if possible, however, this happening in the same release and API version 21 as bug 1844964 is suspect that we are suffering from another root cause that is masked by these two different crash reports.
| Reporter | ||
Comment 7•2 years ago
|
||
Note that this bug might look like a regression in August, but more likely is that it's an existing crash that was misreported and buried in the "EMPTY: no frame data available" crash reports (bug 1847429). willkg added a Socorro workaround for these "empty" crash reports and started generating proper crash signatures for reports received after August 1.
Comment 8•2 years ago
|
||
This sadly seems like a WONTFIX issue. Looking at it with fresh eyes after the bug 1844964 investigation, and I no longer think it's related.
From this spicy thread in the coroutine library (Kotlin/kotlinx.coroutines#490), this is a known issue and there is no consensus on a solution as it's a bug on the Samsung Android API 21 devices that Samsung has not responded to.
What can we do? Not much without taking some incredible amount of energy to rewrite a Samsung-only API 21 patch which I'm not eager to do.
I agree with comment 7, that this is probably not a new crash because nothing has changed in this code path either.
| Reporter | ||
Comment 9•2 years ago
|
||
Jonathan, if we can't avoid hitting this exception in Kotlin code, can we catch and ignore it lower in the call stack in WallpapersUseCases$DefaultInitializeWallpaperUseCase or FenixApplication$downloadWallpapers?
Also, if this exception has been a known Kotlin issue since 2018, why did we only start hitting it in Fx 117? Did we rewrite our wallpapers code to use coroutines in 117?
Frame Module Function Source
0 java.lang.Class getDeclaredField Class.java:929
1 java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl <init> AtomicIntegerFieldUpdater.java:251
2 java.util.concurrent.atomic.AtomicIntegerFieldUpdater newUpdater AtomicIntegerFieldUpdater.java:49
3 kotlinx.coroutines.DispatchedCoroutine <clinit> Builders.common.kt:5
4 kotlinx.coroutines.BuildersKt withContext :90
5 org.mozilla.fenix.wallpapers.WallpapersUseCases$DefaultInitializeWallpaperUseCase invoke WallpapersUseCases.kt:439
6 org.mozilla.fenix.FenixApplication$downloadWallpapers$1 invokeSuspend FenixApplication.kt:53
7 kotlin.coroutines.jvm.internal.BaseContinuationImpl resumeWith ContinuationImpl.kt:9
8 kotlinx.coroutines.DispatchedTask run DispatchedTask.kt:112
9 kotlinx.coroutines.scheduling.CoroutineScheduler$Worker run CoroutineScheduler.kt:96
Comment 10•2 years ago
•
|
||
IIUC we can catch the exception in Kotlin code if we just want this specific crash to go away. However, that would imply a few things:
- Users on those devices won't have access to a feature without any indication as to why. Our solve here would be to basically silently fail downloading wallpapers forever.
- The underlying issue that Samsung has caused here is liable to appear in other places throughout the app, so we either
- Litter the codebase with exception handling (and random dead spots in the app for users)
- Undertake a pretty massive (and hacky, ultimately) technical initiative to solve it more generally
IMO the short-term solution is not worth the damage to the user experience or the long-term health of our code, and the long-term investment is not worth the effort given the (shrinking) size of the device segment of Samsung users on Android 5.0.x.
Comment 11•2 years ago
•
|
||
(In reply to Matt Tighe [:matt-tighe] from comment #10)
- The underlying issue that Samsung has caused here is liable to appear in other places throughout the app, so we either
Thanks Matt!
Chris, I want to emphasize this point of Matt's: we will just be moving the problem to the next place in the code base where we need to use AtomicIntegerFieldUpdater in any of our libraries (everything kotlin with a suspend function).
(In reply to Chris Peterson [:cpeterson] from comment #9)
Also, if this exception has been a known Kotlin issue since 2018, why did we only start hitting it in Fx 117? Did we rewrite our wallpapers code to use coroutines in 117?
That's a good question, I forgot to reconsider that again.
The kotlinx.coroutines 1.7.0 update had a new API implementation for Channel (coroutine API that is used internally) which is faster, but touches on this buggy code in Samsung API 21 devices whenever it needs to access the java Atomic*FieldUpdater. One of the library author's have a better explanation of the problem here.
I don't think backing it out is feasible because we have to do this update eventually (our dependencies are also updating their dependencies to it too).
The wordpress app seems to have faced this issue too and updated their min SDK version to 24. Bumping up to API 22 alone would solve a lot of bugs like this for us (even in our native rust components).
Comment 12•2 years ago
|
||
Seems like the simplest and most maintainable option would indeed be to bump the minSDK version. I just added an investigation task to help us determine if there are considerable downsides to doing so, let's circle back to this after doing that investigation, before closing this as WONTFIX.
I generally get nervous about WONTFIX'ing unresolved crashes, because those crashes will still continue to pop up in our logs, so it just perpetuates noise that makes it a little harder to notice other emerging issues. (that would be my main personal justification for catching and handling this one, even if it leaves the users with a broken experience - but your counterarguments to doing so are solid, as well).
| Reporter | ||
Comment 13•2 years ago
|
||
Seems like the simplest and most maintainable option would indeed be to bump the minSDK version.
Since this crash only affects users with API 21, bumping the minimum API to 22 would leave these users with an unsupported (and still crashing) Firefox.
This telemetry query says that only 0.13% of Fenix users have API 21 (vs 2.37% with API 22):
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
| Reporter | ||
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 14•1 year ago
|
||
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 15•1 year ago
|
||
Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit BugBot documentation.
Description
•