Crash in [@ GeckoAppShellSupport::ReportJavaCrash]
Categories
(Firefox for Android :: Crash Reporting, defect, P2)
People
(Reporter: cpeterson, Unassigned)
References
Details
(Keywords: crash, regression, regressionwindow-wanted)
Crash Data
Attachments
(1 file)
Crash report: https://crash-stats.mozilla.org/report/index/f370ce7f-32a3-40cb-9ddd-c24ca0230823
Looks like there was a spike in these crash reporter crashes in Fx116.
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so GeckoAppShellSupport::ReportJavaCrash widget/android/nsAppShell.cpp:234
0 libxul.so mozilla::jni::NativeStub<mozilla::java::GeckoAppShell::ReportJavaCrash_t, GeckoAppShellSupport, mozilla::jni::Args<mozilla::jni::Ref<mozilla::jni::TypedObject<_jthrowable*>, _jthrowable*> const&, mozilla::jni::StringParam const&> >::Wrap<&GeckoAppShellSupport::ReportJavaCrash> widget/android/jni/Natives.h:1462
1 base.odex base.odex@0x4a9503
2 base.art] base.art]@0x4765f6
3 base.art] base.art]@0x409b6
4 system@framework@boot.art system@framework@boot.art@0x176faa
5 libart.so libart.so@0x40d775
6 libart.so libart.so@0x3e7415
7 base.art] base.art]@0x48637a
8 libart.so libart.so@0x3fce7e
Comment 1•2 years ago
The first build ID that seems to be involved in the spike is 20230622214511 on the nightly channel. The issue seems to be caused by the Java side though: we call abortThroughJava(), which triggers a whole lot of machinery on the JVM side, and then we crash because that code throws an exception (on the Java side) and there's no handler to catch it. So this would have been an existing crash, just with a different signature.
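For illustration, here's a minimal sketch in plain Java of the abort-through-Java pattern. The class name and exception are hypothetical stand-ins (the real code lives in GeckoLoader, see comment 7); the point is that if the current thread's uncaught exception handler is missing or itself throws, the exception unwinds back through JNI with nothing left to catch it:

public final class AbortThroughJavaSketch {
    // Hypothetical stand-in for GeckoLoader.abort().
    public static void abort(final String message) {
        final Thread thread = Thread.currentThread();
        final Thread.UncaughtExceptionHandler handler =
                thread.getUncaughtExceptionHandler();
        final RuntimeException exc = new RuntimeException("Abort: " + message);
        if (handler != null) {
            // On Android the default handler normally terminates the process
            // and never returns; a broken or missing handler falls through.
            handler.uncaughtException(thread, exc);
        }
        // If we get here, the exception propagates back into the native
        // caller, which has no try/catch for it.
        throw exc;
    }
}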
Comment 2•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on release
For more information, please visit BugBot documentation.
Comment 3•2 years ago
Is it possible that bug 1689358 is related? I'll also note that a large number of the Fenix 117 crash reports we've seen in Socorro so far have been of the EMPTY variety for various reasons, at roughly 10x the volume of the first useful crash signature.
Comment 4•2 years ago
Bug 1689358 only affects how child process crashes are written; it changed neither the exception handler nor how main process crashes are dealt with. The crashes here appear to all be child process crashes and their minidumps are fully written, so the changes in bug 1689358 are doing their job. The question is why we're throwing exceptions that aren't caught. I read in another bug that there were changes in how Java crashes are being reported, and maybe that's the problem.
Regarding the EMPTY crash signatures, I see two problems that appear to be unrelated. The first problem is within the [@ EMPTY: no frame data available ] crashes. These aren't native crashes; they're all uncaught Java exceptions. If you check them they all have the JavaException annotation set, but they don't seem to have the JavaStackTrace annotation. That annotation is used to generate the crash signature, and in its absence Socorro reports the lack of a minidump. This is likely a problem in the Java crash reporter, which is responsible for populating the annotations. That code was touched in bug 1550206 and the changes went into version 116, so it might be affecting the results.
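As an illustration of what the missing piece looks like, here's a minimal sketch (not the actual GeckoView code) of how a Java crash reporter can serialize a Throwable into a JavaStackTrace-style annotation; the annotation store below is a hypothetical stand-in:

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

public final class JavaStackTraceSketch {
    // Hypothetical stand-in for the real crash annotation store.
    private static final Map<String, String> ANNOTATIONS = new HashMap<>();

    public static void annotate(final Throwable throwable) {
        // Render the full Java stack into a string.
        final StringWriter sw = new StringWriter();
        throwable.printStackTrace(new PrintWriter(sw));
        ANNOTATIONS.put("JavaException", throwable.getClass().getName());
        ANNOTATIONS.put("JavaStackTrace", sw.toString());
    }
}

If the annotate() step is skipped, Socorro has no Java stack to build a signature from and files the report under the EMPTY bucket.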
Finally, the [@ EMPTY: no frame data available; EmptyMinidump ] crashes are 100% main process crashes. They're cases where the Breakpad minidump writer failed to generate a proper minidump. Their volume didn't change across major versions; it's a known problem that I hope to solve with out-of-process (OOP) minidump generation.
Comment 5•2 years ago
Here's a graph of the crash reports with empty minidumps. As you can see there are no major changes across versions, and they're all main process crashes, so they're unaffected by my changes or the Java-side ones.
Comment 6•2 years ago
FYI the issue with the missing stack traces is bug 1847429.
Comment 7•2 years ago
I went through these crashes again and found a few useful things. A significant minority of these crashes have a call to mozalloc_abort() in the stack; that is, they're OOMs. This call delegates to abortThroughJava(), which in turn calls the Java method GeckoLoader.abort(). The latter finds the uncaught exception handler for the current thread and fires an AbortException.
Given that fixes were introduced in the Fenix crash reporting machinery - and presumably in the exception handler too - it's possible that these crashes previously went unreported. The crash itself seems to be caused by no exception handler actually catching an exception, but we don't know whether that's an AbortException or another exception thrown from the uncaught exception handler itself.
To make progress here we should figure out which exception ultimately causes the crash, possibly by adding a crash annotation with the name/type of the exception right before crashing here.
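A minimal sketch of that instrumentation, assuming a hypothetical recordAnnotation() helper in place of the real crash-annotation API: wrap the default uncaught exception handler so the exception's class name is recorded before the process goes down:

public final class ExceptionTypeRecorder implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler delegate;

    public ExceptionTypeRecorder(final Thread.UncaughtExceptionHandler delegate) {
        this.delegate = delegate;
    }

    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new ExceptionTypeRecorder(Thread.getDefaultUncaughtExceptionHandler()));
    }

    @Override
    public void uncaughtException(final Thread thread, final Throwable throwable) {
        // Record the exception type first, so the annotation survives even if
        // the delegate handler terminates the process.
        recordAnnotation("UncaughtExceptionType", throwable.getClass().getName());
        if (delegate != null) {
            delegate.uncaughtException(thread, throwable);
        }
    }

    // Hypothetical; the real code would call into the crash reporter's
    // annotation API rather than printing.
    private static void recordAnnotation(final String key, final String value) {
        System.err.println(key + "=" + value);
    }
}

With something like this in place the crash reports would tell us whether we're looking at an AbortException or something thrown by the handler itself.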
Comment 8•2 years ago
Hey Gabriele, we're not seeing any crashes in 119 and 120 beta. Can you clarify what is happening here?
Comment 9•2 years ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 10•2 years ago
I've had a quick glance and found a nightly crash on version 120 and a beta one on version 119. The volume is low on both channels, but I'd chalk that up to fewer users experiencing those kinds of crashes.