Open Bug 1849794 Opened 2 years ago Updated 1 year ago

Crash in [@ GeckoAppShellSupport::ReportJavaCrash]

Categories

(Firefox for Android :: Crash Reporting, defect, P2)

Unspecified
Android
defect

Tracking


Tracking Status
firefox116 --- wontfix
firefox117 --- wontfix
firefox118 --- wontfix
firefox119 --- wontfix
firefox120 --- wontfix

People

(Reporter: cpeterson, Unassigned)

References

Details

(Keywords: crash, regression, regressionwindow-wanted)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/f370ce7f-32a3-40cb-9ddd-c24ca0230823

Looks like there was a spike in these crash reporter crashes in Fx116.

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libxul.so  GeckoAppShellSupport::ReportJavaCrash  widget/android/nsAppShell.cpp:234
0  libxul.so  mozilla::jni::NativeStub<mozilla::java::GeckoAppShell::ReportJavaCrash_t, GeckoAppShellSupport, mozilla::jni::Args<mozilla::jni::Ref<mozilla::jni::TypedObject<_jthrowable*>, _jthrowable*> const&, mozilla::jni::StringParam const&> >::Wrap<&GeckoAppShellSupport::ReportJavaCrash  widget/android/jni/Natives.h:1462
1  base.odex  base.odex@0x4a9503  
2  base.art]  base.art]@0x4765f6  
3  base.art]  base.art]@0x409b6  
4  system@framework@boot.art  system@framework@boot.art@0x176faa  
5  libart.so  libart.so@0x40d775  
6  libart.so  libart.so@0x3e7415  
7  base.art]  base.art]@0x48637a  
8  libart.so  libart.so@0x3fce7e  

The first buildid that seems to be involved in the spike is 20230622214511 in the nightly channel. The issue seems to be caused by the Java side though: we call abortThroughJava(), which triggers a whole lot of machinery on the JVM side, and then we crash because that code throws an exception (on the Java side) with no handler left to catch it. So this would have been an existing crash, just with a different signature.
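For context, the chain from an uncaught Java exception to this native signature looks roughly like the sketch below. This is an illustration only, not the actual Fenix/GeckoView source; the native reportJavaCrash entry point is an assumption inferred from the crashing frame (GeckoAppShellSupport::ReportJavaCrash in widget/android/nsAppShell.cpp).

import java.io.PrintWriter;
import java.io.StringWriter;

// Sketch only: an uncaught-exception handler forwards the throwable to a
// native method which deliberately aborts so Breakpad can write a minidump;
// that abort is what shows up as GeckoAppShellSupport::ReportJavaCrash.
final class JavaCrashForwarder implements Thread.UncaughtExceptionHandler {
    // Assumed JNI entry point; the real binding is generated code
    // (see the Natives.h frame in the stack above).
    private static native void reportJavaCrash(Throwable exc, String stackTrace);

    @Override
    public void uncaughtException(final Thread thread, final Throwable exc) {
        final StringWriter sw = new StringWriter();
        exc.printStackTrace(new PrintWriter(sw));
        // If anything in here throws, there is no handler left to catch it,
        // which is how an existing crash can resurface under a new signature.
        reportJavaCrash(exc, sw.toString());
    }

    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new JavaCrashForwarder());
    }
}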

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on release

For more information, please visit BugBot documentation.

Keywords: topcrash

Is it possible that bug 1689358 is related? I'll also note that a large number of the Fenix 117 crash reports we've seen in Socorro so far have been of the EMPTY variety for various reasons, at roughly 10x the volume of the first useful crash signature.

Flags: needinfo?(gsvelto)

Bug 1689358 only affects how child process crashes are written; it didn't change the exception handler or how main process crashes are dealt with. The crashes here appear to all be child process crashes and the minidumps are fully written, so the changes in bug 1689358 are doing their job. The question is why we're throwing exceptions that aren't caught. I read in another bug that there were changes in how Java crashes were being reported, and maybe that's the problem.

Regarding the EMPTY crash signatures, I see two problems that appear to be unrelated. The first problem is with the [@ EMPTY: no frame data available ] crashes. These aren't native crashes; they're all uncaught Java exceptions. If you check the reports with that signature, they all have the JavaException annotation set, but they don't seem to have the JavaStackTrace annotation. That annotation is used to generate the crash signature, and in its absence Socorro falls back to reporting the lack of a minidump. This is likely a problem in the Java crash reporter, which is responsible for populating the annotations. I see that code was touched in bug 1550206 and the changes went into version 116, so maybe that's something that could affect the results.
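To make the dependency concrete, here is a minimal sketch of the annotation step being described. The plain map below is a stand-in for the real crash annotation API, and the JavaException contents are simplified; only the annotation names come from this thread.

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;

// Sketch only: build the two annotations an uncaught Java exception report
// should carry. Socorro derives the crash signature from JavaStackTrace, so a
// report that only carries JavaException ends up bucketed as
// [@ EMPTY: no frame data available ].
final class JavaExceptionAnnotations {
    static Map<String, String> build(final Throwable exc) {
        final StringWriter sw = new StringWriter();
        exc.printStackTrace(new PrintWriter(sw));

        final Map<String, String> annotations = new HashMap<>();
        annotations.put("JavaException", exc.getClass().getName() + ": " + exc.getMessage());
        annotations.put("JavaStackTrace", sw.toString());
        return annotations;
    }
}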

Finally, the [@ EMPTY: no frame data available; EmptyMinidump] crashes are 100% main process crashes. They're cases where the Breakpad minidump writer failed to generate a proper minidump. Their volume didn't change across major versions; it's a known problem that I will hopefully solve with out-of-process (OOP) minidump generation.

Flags: needinfo?(gsvelto)

Here's a graph of the crash reports with empty minidumps. As you can see, there are no major changes across versions, and they're all main process crashes, so they're unaffected by my changes or the Java ones.

FYI the issue with the missing stack traces is bug 1847429.

I went through these crashes again and found a few useful things. A significant minority of these crashes has a call to mozalloc_abort() in the stack; that is, they're OOMs. This call delegates to abortThroughJava(), which in turn calls the Java method GeckoLoader.abort(). The latter finds the uncaught exception handler for the current thread and fires an AbortException.
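In code, that chain looks roughly like the sketch below. This is an illustration only; the class and exception names come from this comment, not from the real GeckoLoader source.

// Sketch of the abort-through-Java mechanism described above.
final class GeckoLoaderSketch {
    static class AbortException extends RuntimeException {
        AbortException(final String message) {
            super(message);
        }
    }

    // Called from native code: mozalloc_abort() -> abortThroughJava() -> abort().
    static void abort(final String message) {
        final Thread thread = Thread.currentThread();
        final Thread.UncaughtExceptionHandler handler = thread.getUncaughtExceptionHandler();
        if (handler != null) {
            // Hand a synthetic AbortException to the uncaught exception handler.
            // If the handler itself throws, nothing is left to catch that second
            // exception and the process dies, as seen in this bug.
            handler.uncaughtException(thread, new AbortException(message));
        }
    }
}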

Given that fixes were introduced in the Fenix crash reporting machinery - and presumably the exception handler too - it's possible that the crashes here were previously going unreported. The crash itself seems to be caused by no exception handler actually catching an exception, but we don't know if it's an AbortException or another one that might have been thrown from the uncaught exception handler itself.

To make some progress here we should figure out which exception is the cause of the ultimate crash, possibly by adding a crash annotation with the name/type of the exception right before crashing here.
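A minimal sketch of what that could look like: wrap the existing handler so the exception's class name is recorded before any further handling. The static field and the annotation name are stand-ins, not the real crash-reporter API.

// Sketch of the suggestion above; the annotation plumbing is a stand-in.
final class ExceptionTypeRecorder implements Thread.UncaughtExceptionHandler {
    // Stand-in for a crash annotation (made-up name "JavaCrashExceptionType");
    // the real code would call the crash reporter's annotation API instead.
    static volatile String lastExceptionType = "";

    private final Thread.UncaughtExceptionHandler inner;

    ExceptionTypeRecorder(final Thread.UncaughtExceptionHandler inner) {
        this.inner = inner;
    }

    @Override
    public void uncaughtException(final Thread thread, final Throwable exc) {
        // Record the type before delegating, so it is available even if the
        // wrapped handler throws and the process crashes right after.
        lastExceptionType = exc.getClass().getName();
        if (inner != null) {
            inner.uncaughtException(thread, exc);
        }
    }

    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new ExceptionTypeRecorder(Thread.getDefaultUncaughtExceptionHandler()));
    }
}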

Hey Gabriele, we're not seeing any crashes in 119 and 120 beta. Can you clarify what is happening here?

Flags: needinfo?(gsvelto)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash

I've had a quick glance and I could find a nightly crash on version 120 and a beta one for version 119. The volume is low on both channels, but I'd chalk it up to fewer users experiencing those kinds of crashes.

Flags: needinfo?(gsvelto)

changing to S3 given the lower volume of crashes

Severity: S2 → S3
