Closed Bug 1847429 Opened 1 year ago Closed 1 year ago

[@ EMPTY: no frame data available ] instead of Java signature for crash reports from Android

Categories

(Socorro :: Signature, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: robwu, Assigned: willkg)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Crash Data

Attachments

(3 files)

In bug 1847372, I pinpointed a reliable OOM crash trigger and saw the crash happen on all recent versions (Release 116, Beta 117, Nightly 118). Strangely, the most recent entry in crash-stats is associated with Firefox 111:

When I tried to trigger a crash report, I was unable to do so due to a regression that broke crash reporting in 115 and 116: bug 1838389. This bug is fixed in Nightly 117.

After running the STR from bug 1847372, I got a crash report, but its signature is [@ EMPTY: no frame data available ]:

Both reports have the Java Stack Trace field populated, which feeds my suspicion that this is a bug in Socorro rather than the client side.

Is this a duplicate of bug 1245570? ("crash in EMPTY: no crashing thread identified; no frame data available (Firefox for Android only)")

See Also: → 1245570

That other bug is much older, and it was not clearly actionable.

I filed this one because of a specific actionable task: figure out why two similar crashes appear to have different crash signatures. Because both reports carry overlapping metadata from which the signature could be extracted, I think Socorro is the first place to look, but I wouldn't completely rule out a (Firefox for Android) client issue either.

I glanced at the crash report in question and it's weird it picked up that signature. I'll grab this to look into further this week.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Type: task → defect
Priority: -- → P2

What's going on is that there's no JavaStackTrace annotation in bp-3f3611e7-97e4-4db1-8061-0541b0230806 and that's what signature generation uses to generate signatures for Java crash reports.

Rob: Any idea why this crash report is missing JavaStackTrace?

Flags: needinfo?(rob)

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #4)

> What's going on is that there's no JavaStackTrace annotation in bp-3f3611e7-97e4-4db1-8061-0541b0230806 and that's what signature generation uses to generate signatures for Java crash reports.
>
> Rob: Any idea why this crash report is missing JavaStackTrace?

Probably the same reason as bug 1838389 (example pasted below). In that bug, an NPE was fixed by wrapping the logic in a try-catch that returns null when the exception is caught: https://github.com/mozilla-mobile/firefox-android/commit/884a6086756fd35320e49a9d80768a646492477c

getExceptionStackTrace is used here to populate JavaStackTrace: https://github.com/mozilla-mobile/firefox-android/blob/884a6086756fd35320e49a9d80768a646492477c/android-components/components/lib/crash/src/main/java/mozilla/components/lib/crash/service/MozillaSocorroService.kt#L286-L299

Here is an example of a stack trace that causes throwable.getStacktraceAsString to raise an error, copy-pasted from about:crashes. The issue occurred when I tried to submit the crash report from bug 1847372 on Beta.

ddf4650b-cf4a-431c-b461-d920a70eda9e
java.lang.NullPointerException: Attempt to invoke virtual method 'java.lang.String java.lang.Object.toString()' on a null object reference
 * New Sentry Instance: https://sentry.io/organizations/mozilla/issues/?project=6295551&query=4414637fdbc2433eb352dca9124104e2
 * New Sentry Instance: https://sentry.io/organizations/mozilla/issues/?project=6295551&query=34b5fd80ceb041f0adfbfa4aa6a298d9
----
java.lang.NullPointerException: Attempt to invoke virtual method 'java.lang.String java.lang.Object.toString()' on a null object reference
	at java.lang.String.valueOf(String.java:3657)
	at java.lang.StringBuilder.append(StringBuilder.java:132)
	at java.lang.Throwable.printEnclosedStackTrace(Throwable.java:717)
	at java.lang.Throwable.printStackTrace(Throwable.java:682)
	at java.lang.Throwable.printStackTrace(Throwable.java:743)
	at mozilla.components.support.base.ext.ThrowableKt.getStacktraceAsString$default(Throwable.kt:19)
	at mozilla.components.lib.crash.service.MozillaSocorroService.sendCrashData(MozillaSocorroService.kt:607)
	at mozilla.components.lib.crash.service.MozillaSocorroService.sendReport$lib_crash_release(MozillaSocorroService.kt:285)
	at mozilla.components.lib.crash.service.MozillaSocorroService.report(MozillaSocorroService.kt:5)
	at mozilla.components.lib.crash.CrashReporter$submitReport$2.invokeSuspend(CrashReporter.kt:70)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:9)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:112)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:4)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:3)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:96)
	Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@337a98a, Dispatchers.IO]
Flags: needinfo?(rob)

That sounds like an issue you should raise in the fenix crash reporter.

Currently, Socorro needs a JavaStackTrace value for signature generation. Changing that is a project and covered in bug #1693863.

I don't think there's anything I can do here. Unassigning myself.

Assignee: willkg → nobody
Status: ASSIGNED → NEW
Depends on: 1693863

Oops--the bug for the "let's rethink signatures for Java" is bug #1541120.

Depends on: 1541120
No longer depends on: 1693863

Bug 1541120 looks like a larger-scope issue than this one. That one is about being smarter than extracting the signature from JavaStackTrace.
https://crash-stats.mozilla.org/signature/?product=Fenix&signature=EMPTY%3A%20no%20frame%20data%20available#reports (on Nightly 118.0a1 alone, there are 160 such reported crashes in the past 7 days).

In this bug, JavaStackTrace is null, but JavaException is not (MozillaSocorroService.kt sets both at the same time, but the value may sometimes be null as seen in bug 1838389).

What would it take to extract the signature from JavaException when JavaStackTrace is null?

FYI:

Currently, Socorro requires JavaStackTrace to generate a signature. If the crash report doesn't contain a JavaStackTrace, that's a bug with the relevant crash reporter that should get figured out.

Your idea of changing signature generation to factor in JavaException seems reasonable, but it's a much bigger project than a "well, why don't we just ..." because of the way signature generation is implemented. Looks like this affects < 350 crash reports out of 1 million for Fenix in the last month. Unless there's some serious urgency here, I'm not going to get to fixing this any time soon.

The data on Crash Stats is available via APIs. You can unblock your work by writing scripts to manipulate the data to get what you want to see out of it. I have a set of utility commands to make that easier:

https://github.com/willkg/crashstats-tools

Hope that helps!
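
As an illustration of that kind of script, here is a minimal sketch that counts reports with a given signature per product through the public SuperSearch API. The endpoint and parameter names reflect my understanding of the Crash Stats API and should be checked against its documentation; the helper names are made up for this example, and crashstats-tools wraps the same API more robustly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public SuperSearch endpoint; verify against the Crash Stats API docs.
API_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"


def build_query_url(signature, start_date, facet="product"):
    """Return a SuperSearch URL that facets reports with exactly this signature."""
    params = [
        ("signature", "=" + signature),  # leading "=" asks for an exact match
        ("date", ">=" + start_date),
        ("_facets", facet),
        ("_results_number", "0"),  # only the facet counts are needed
    ]
    return API_URL + "?" + urlencode(params)


def facet_counts(response, facet="product"):
    """Map each facet term (for example a product name) to its report count."""
    buckets = response.get("facets", {}).get(facet, [])
    return {bucket["term"]: bucket["count"] for bucket in buckets}


def fetch_counts(signature, start_date):
    """Run the query against Crash Stats (needs network access)."""
    with urlopen(build_query_url(signature, start_date)) as resp:
        return facet_counts(json.load(resp))
```

Calling fetch_counts("EMPTY: no frame data available", "2023-08-01") would return a dict keyed by product name, assuming the response has the facets layout sketched here.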

Component: Processor → Signature

FYI the [@ EMPTY: no frame data available] is currently Fenix' top crasher. It might be worth putting the signature here since those are all Java exceptions missing the Java stack trace, but since it looks like a native crash it's confusing people.

Crash Signature: [@ EMPTY: no frame data available]

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #10)

> Currently, Socorro requires JavaStackTrace to generate a signature. If the crash report doesn't contain a JavaStackTrace, that's a bug with the relevant crash reporter that should get figured out.

Will, is there any Socorro work to be done in this bug? Or can I move this bug to the Fenix::Crash Reporting component and use it to investigate what Fenix client changes might be needed to fix crash reports without a JavaStackTrace?

Flags: needinfo?(willkg)
See Also: 1245570

There should definitely be a bug/issue for Fenix and maybe android-components about why there is a JavaException, but no JavaStackTrace.

Since comment #10, it looks like the number of crash reports this affects has increased dramatically and this is now a top crasher signature. I don't think we should move this bug to Fenix::Crash Reporting. I should probably grab this and figure out what I can do about it in socorro.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Flags: needinfo?(willkg)

btw, I suspect the new crashes are bug 1846306. I looked in Sentry for top crash signatures that aren't in Socorro and found that bug. It's the top Sentry crash signature over the last 30 days, by both number of crash events and number of affected users.

The crash volume spike started around August 16, which happened to be the release date for the Fenix 116.0.3 dot release.

See Also: → 1846306

116.0.3 included a crash reporter fix to help diagnose bug 1846306.

That fix minimally disrupts things. It should only affect crash reports where we have a JavaException but no JavaStackTrace. It generates a signature just like it would have if there were a JavaStackTrace, with the mild caveat that it does the right thing by not including line numbers. The current JavaStackTrace-using code includes line numbers for non-.java files; that issue is covered in bug #1851202.
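
The behavior described above can be sketched roughly as follows. This is an illustration only, not Socorro's actual implementation: it assumes the JavaException annotation is a JSON document with a Sentry-style layout (exception, values, stacktrace, frames), and both the function name and the exact signature format are hypothetical.

```python
import json

EMPTY_SIGNATURE = "EMPTY: no frame data available"


def signature_from_java_exception(raw_annotation):
    """Build a signature from a JavaException annotation (hypothetical sketch).

    Assumed layout, which may not match the real schema:
    {"exception": {"values": [{"module": ..., "type": ...,
                               "stacktrace": {"frames": [{...}, ...]}}]}}
    """
    try:
        data = json.loads(raw_annotation)
        value = data["exception"]["values"][0]
        top_frame = value["stacktrace"]["frames"][0]
        exception_class = ".".join(
            part for part in (value.get("module"), value.get("type")) if part
        )
        method = ".".join(
            part for part in (top_frame.get("module"), top_frame.get("function")) if part
        )
        filename = top_frame.get("filename")
    except (TypeError, ValueError, KeyError, IndexError, AttributeError):
        # Missing or malformed annotation: nothing to build a signature from.
        return EMPTY_SIGNATURE

    # Line numbers are deliberately omitted from the signature.
    location = f"({filename})" if filename else ""
    return f"{exception_class}: at {method}{location}"
```

A report with neither annotation still falls through to the EMPTY signature, which matches the cases that remain after reprocessing.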

I'll try to get it to production next week. Once I do, I can reprocess all the existing crash reports with the problem and they'll pick up new signatures.

willkg merged PR #6464: "bug 1847429: implement signature generation for JavaException" in b60b65b.

This will automatically deploy to the stage environment. I'll test it there and (hopefully) deploy it next week to production.

No longer depends on: 1541120
See Also: → 1541120

Also, since this involves signature generation changes, I'll write an intent-to-ship email on stability and crash-reporting-wg mailing lists before pushing it to production.

I checked stage this morning and the change looks good:

$ supersearchfacet --host=https://crash-stats.allizom.org \
    --_facets=product \
    --signature='=EMPTY: no frame data available' \
    --relative-range=2w --period=daily --format=markdown
| date | -- | Fenix | Focus | total | notes |
|---|---|---|---|---|---|
| 2023-08-22 00:00:00 | 0 | 242 | 5 | 247 | |
| 2023-08-23 00:00:00 | 0 | 271 | 9 | 280 | |
| 2023-08-24 00:00:00 | 0 | 274 | 4 | 278 | |
| 2023-08-25 00:00:00 | 0 | 295 | 4 | 299 | |
| 2023-08-26 00:00:00 | 0 | 280 | 1 | 281 | |
| 2023-08-27 00:00:00 | 0 | 310 | 4 | 314 | |
| 2023-08-28 00:00:00 | 0 | 293 | 2 | 295 | |
| 2023-08-29 00:00:00 | 0 | 355 | 9 | 364 | |
| 2023-08-30 00:00:00 | 0 | 351 | 7 | 358 | |
| 2023-08-31 00:00:00 | 0 | 282 | 7 | 289 | |
| 2023-09-01 00:00:00 | 0 | 331 | 5 | 336 | <-- landed fix late afternoon |
| 2023-09-02 00:00:00 | 0 | 11 | 0 | 11 | |
| 2023-09-03 00:00:00 | 0 | 8 | 0 | 8 | |
| 2023-09-04 00:00:00 | 0 | 4 | 0 | 4 | |
| 2023-09-05 00:00:00 | 0 | 4 | 0 | 4 | |

Currently, there are around 50k crash reports since August 1st with this signature that will change signatures when I reprocess them.

I emailed the stability and crash-reporting-wg mailing lists with the intended deploy and reprocessing.

Thanks for this Will!

I deployed this to prod just now in bug #1851648. I'm reprocessing crash reports from 2023-08-01 through now.

I reprocessed the crash reports in that list. There are still 7,101 Fenix crash reports since 2023-08-01 which have "EMPTY: no frame data available". I spot checked those and they don't have a JavaStackTrace annotation, a JavaException annotation, or a minidump, so ... I think that's the best we're going to do for now.
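
To make the remaining cases concrete, here is a small sketch that buckets crash reports by which piece of data a signature could be built from. The dict shape, the has_minidump flag, and the priority order are all assumptions made for illustration; they are not taken from Socorro's code.

```python
from collections import Counter

EMPTY_SIGNATURE = "EMPTY: no frame data available"


def signature_source(report):
    """Return which part of a crash report a signature could come from.

    `report` is a hypothetical dict of annotations plus a `has_minidump`
    flag; the priority order is an assumption based on this discussion.
    """
    if report.get("JavaStackTrace"):
        return "JavaStackTrace"
    if report.get("JavaException"):
        return "JavaException"
    if report.get("has_minidump"):
        return "minidump"
    return None  # nothing usable, so the report keeps the EMPTY signature


def tally_sources(reports):
    """Count how many reports fall into each bucket."""
    return Counter(signature_source(r) or EMPTY_SIGNATURE for r in reports)
```

Reports counted under the EMPTY bucket are the ones that cannot be improved on the Socorro side.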

When we redo signature generation for Java crash reports, we can include information from other annotations like CrashType or something like that which adds some information and differentiates between crash reports that have no frame data.

Marking this as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Attachment #9351658 - Attachment description: list of crash ids reprocessed → list of crash ids reprocessed: 2023-08-01 to now
Attachment #9351714 - Attachment description: second list of crash ids reprocessed 2023-07-01 - 2023-08-01 → second list of crash ids reprocessed: 2023-07-01 - 2023-08-01

I did a first round of reprocessing for crash reports >= 2023-08-01. We went from 51,320 to 7,232.

Before:

$ supersearchfacet --signature='=EMPTY: no frame data available' --date='>=2023-08-01' \
    --_facets=product --format=markdown
| product | count |
|---|---|
| Fenix | 50344 |
| Focus | 947 |
| ReferenceBrowser | 29 |
| total | 51320 |

After:

$ supersearchfacet --signature='=EMPTY: no frame data available' --date='>=2023-08-01' \
    --_facets=product --format=markdown
| product | count |
|---|---|
| Fenix | 7104 |
| Focus | 127 |
| ReferenceBrowser | 1 |
| total | 7232 |
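
As a quick sanity check on the before and after counts, the arithmetic works out as follows. This uses only the numbers listed above and assumes the whole difference is due to reprocessing.

```python
before = {"Fenix": 50344, "Focus": 947, "ReferenceBrowser": 29}
after = {"Fenix": 7104, "Focus": 127, "ReferenceBrowser": 1}

total_before = sum(before.values())  # 51320
total_after = sum(after.values())  # 7232
moved = total_before - total_after  # reports that no longer have the EMPTY signature
share = moved / total_before

print(f"{moved} of {total_before} reports ({share:.1%}) no longer have the EMPTY signature")
# prints: 44088 of 51320 reports (85.9%) no longer have the EMPTY signature
```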

At Chris' behest, I did a second round of reprocessing for crash reports >= 2023-07-01 and < 2023-08-01. The count stayed at 3,527, so it looks like those weren't affected.

$ supersearchfacet --signature='=EMPTY: no frame data available' --date='>=2023-07-01' --date='<2023-08-01' \
    --_facets=product --format=markdown
| product | count |
|---|---|
| Fenix | 3493 |
| Focus | 34 |
| total | 3527 |

It looks like none of them have a JavaStackTrace or JavaException.

$ supersearch --signature='=EMPTY: no frame data available' --date='>=2023-07-01' --date='<2023-08-01' --num=all \
    | wc -l
3527
$ supersearch --signature='=EMPTY: no frame data available' --date='>=2023-07-01' --date='<2023-08-01' \
    --crash_report_keys=JavaStackTrace --crash_report_keys=JavaException --num=all \
    | wc -l
0

One interesting bit about JavaException is that it's a bit of a misnomer. It's actually a stack trace, just in a different format compared to JavaStackTrace. Anyway, I feel like the remaining fixes need to happen in Fenix' crash handler. We can close this bug as fixed and open a new one in Fenix crash handler to make sure it tries harder to populate at least one of the two annotations.

Depends on: 1851898

I filed bug 1851898 to fix the Fenix crash reporter.

Should crash reports include both JavaException and JavaStackTrace annotations? Or prefer JavaException? Bug 1792902 asks if we should retire JavaStackTrace now that we have JavaException.

Currently, Socorro depends on JavaStackTrace. We'd need to figure out what's involved in changing that and then change it. I wrote up bug #1851903 for that work. Until that work is completed, crash reports need to include at least JavaStackTrace.

See Also: → 1863336