Open Bug 1644486 Opened 4 years ago Updated 9 months ago

Crash in [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] for Android

Categories

(Fenix :: Crash Reporting, defect, P3)

Tracking

Tracking Status
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox100 --- wontfix
firefox101 --- wontfix
firefox102 --- wontfix
firefox106 --- wontfix
firefox107 --- wontfix
firefox108 --- wontfix

People

(Reporter: fluffyemily, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, Whiteboard: [geckoview:2022h2?])

Crash Data

This bug is for crash report bp-abe9b05f-1857-42c0-9836-04e6c0200609.

I may have identified this bug as being related to mozilla.components.browser.engine.gecko.fetch.GeckoViewFetchClient or geckoview.GeckoWebExecutor.fetch.

For some reason this one does not show up properly in either Sentry or Socorro but the Play Store has good crash reports, all in the area of Fetch.

Here is one trace:

java.lang.IllegalArgumentException:
  at org.mozilla.geckoview.GeckoWebExecutor.fetch (GeckoWebExecutor.java:12)
  at mozilla.components.browser.engine.gecko.fetch.GeckoViewFetchClient.fetch (GeckoViewFetchClient.kt:70)
  at mozilla.components.feature.downloads.AbstractFetchDownloadService.performDownload$feature_downloads_release (AbstractFetchDownloadService.kt:12)
  at mozilla.components.feature.downloads.AbstractFetchDownloadService.startDownloadJob$feature_downloads_release (AbstractFetchDownloadService.kt:3)
  at mozilla.components.feature.downloads.AbstractFetchDownloadService$onStartCommand$1.invokeSuspend (AbstractFetchDownloadService.kt:5)
  at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:2)
  at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:19)
  at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely (CoroutineScheduler.kt:1)
  at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run (CoroutineScheduler.kt:14)

More at https://play.google.com/apps/publish/?account=7083182635971239206#AndroidMetricsErrorsPlace:p=org.mozilla.firefox_beta&appid=4972447553788559254&appVersion=2015744455,2015744453,2015744451,2015744449&clusterName=apps/org.mozilla.firefox_beta/clusters/f47f35ec&detailsAppVersion=2015744455,2015744453,2015744451,2015744449&detailsSpan=7

If these Fetch crashes are indeed bundled under the NativeCodeCrash in Sentry then we may want to address this ASAP with an update because the volume has increased 5x in the past day.

Assignee: nobody → agi
Severity: -- → S2
Priority: -- → P1

I don't think Fetch is solely responsible for this. We have 2200 reports in the last week for this in nightly (https://crash-stats.mozilla.org/topcrashers/?product=Fenix&version=0.0a1), but the Play Console doesn't show any crashes for Fetch in the same timeframe.

We have seen a huge reduction in the incidents of this bug since June 7.

Crash Signature: [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] → [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified]
Depends on: 1655196
OS: Unspecified → Android
Summary: Crash in [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] → Crash in [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] for fenix
Whiteboard: [geckoview:m83]
Severity: S2 → S3
Priority: P1 → P2

Mostly waiting for Bug 1666733 at this point.

Assignee: agi → nobody

Gabriele, do we have any new information about this?

Flags: needinfo?(gsvelto)

Sadly not yet. We've implemented error reporting for minidump generation in bug 1666733, but it's enabled only with the oxidized minidump generator. Said minidump generator is only implemented for x86 and x86-64 at the moment, and the ARM/AArch64 implementation in bug 1689358 is stuck because we first need to upgrade the libc crate in Gecko. So sadly I must report that this has stalled for now, but I hope we'll be able to move it forward soon(ish).

Flags: needinfo?(gsvelto)

This is a signature change caused by switching Socorro's stack walker to the new oxidized version. On the topic of oxidation, we haven't enabled ARM minidump generation yet (and thus proper error recording), but we should be able to do it before the end of January.

Crash Signature: [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified] → [@ EMPTY: no crashing thread identified; EmptyMinidump] [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified]

This crash is blowing up in nightly right now. Raising priority.

Looks like the increase started with 20220209095640.

Severity: S3 → S2
Priority: P2 → P1

Enabling the GPU process landed within that nightly, so it might be related: https://hg.mozilla.org/mozilla-central/rev/f93a4ff5c045531102de678c93951deb095137d6

Whiteboard: [geckoview:m99]

I can reliably reproduce this crash on a low-end device (Samsung A5) doing this:

  • pm clear org.mozilla.fenix
  • Open Fenix, load cnn.com, put Fenix to background
  • Open Gmail, load a few emails
  • Go back to Fenix -> "Sorry Fenix has crashed" tab

If I disable the GPU process, I don't get a crash. My guess is that we're not handling GPU process kills correctly.
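The reproduction steps above could be partly scripted over adb. The following is only a sketch that builds the command lines and prints them. The Gmail package name, the use of `monkey` to start an app, and the `am start` VIEW intent for loading the page are assumptions that are not part of the original steps, and "load a few emails" still has to be done by hand.

```python
import shlex

FENIX = "org.mozilla.fenix"        # package name taken from the steps above
GMAIL = "com.google.android.gm"    # assumption: Gmail's package name
HOME_KEY = "KEYCODE_HOME"          # key event used to background the app


def adb_shell(*args):
    """Build an `adb shell ...` invocation as an argument list."""
    return ["adb", "shell", *args]


def launch(package):
    """Start an app's launcher activity by sending one monkey event (assumed approach)."""
    return adb_shell("monkey", "-p", package,
                     "-c", "android.intent.category.LAUNCHER", "1")


def repro_commands(url="https://cnn.com"):
    """Return the adb commands for the repro, in the order of the steps above."""
    return [
        adb_shell("pm", "clear", FENIX),                       # wipe app data
        adb_shell("am", "start", "-a", "android.intent.action.VIEW",
                  "-d", url, FENIX),                           # open Fenix on the page
        adb_shell("input", "keyevent", HOME_KEY),              # put Fenix in the background
        launch(GMAIL),                                         # open Gmail (emails loaded manually)
        launch(FENIX),                                         # return to Fenix, look for the crash tab
    ]


if __name__ == "__main__":
    # Only print the commands; running them needs a connected device.
    for cmd in repro_commands():
        print(shlex.join(cmd))
```

Each printed line can be pasted into a terminal with a device attached, with pauses between steps so the low-end device has time to come under memory pressure.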

We rolled back Bug 1331109 which should hopefully make this crash go away.

Opened Bug 1755375 to handle GPU process kills correctly. Not handling them is what caused the spike in crashes in this bug.

Priority: P1 → P2
Whiteboard: [geckoview:m99]
See Also: → 1757854
Crash Signature: [@ EMPTY: no crashing thread identified; EmptyMinidump] [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified] → [@ EMPTY: no crashing thread identified] [@ EMPTY: no crashing thread identified; EmptyMinidump] [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified; MissingThreadList]
Component: General → Stability
Product: GeckoView → Fenix
Summary: Crash in [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] for fenix → Crash in [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] for Android

FYI we'll soon have detailed error information about the most common error that causes failures in minidump generation on Android. Given enough nightly users, a couple of weeks from now we should get to the bottom of this.

(In reply to Gabriele Svelto [:gsvelto] from comment #16)

FYI we'll soon have detailed error information about the most common error that causes failures in minidump generation on Android. Given enough nightly users, a couple of weeks from now we should get to the bottom of this.

Hi Gabriele, do you see any changes in Android minidump errors? AFAICT, Socorro reports roughly the same number of "EmptyMinidump" crash reports from Android Nightly 102.0a1 (933) as 101.0a1 (954):

https://crash-stats.mozilla.org/search/?release_channel=nightly&signature=EmptyMinidump&product=Focus&product=Fenix&date=%3E%3D2021-11-23T17%3A22%3A00.000Z&date=%3C2022-05-23T17%3A22%3A00.000Z&_facets=product&_facets=version&_facets=signature&_facets=build_id&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version

Flags: needinfo?(gsvelto)

Yes, we finally have the reasons why we're failing to write out the minidumps, see this.

So out of the 34 recent crashes, one third have a "No threads left to suspend (out of X)" error and another third have an "Error during init phase: IO error for file /proc/<pid>/auxv: Permission denied (os error 13)" error.

Regarding the first error it's probably happening because it's too late to write a minidump. I wonder if we could experiment with sending a SIGSTOP instead or use ptrace() with PTRACE_ATTACH. Regarding the latter error we can probably do away with the contents of the auxiliary vector and still write a mostly complete minidump. Additionally we might ptrace the auxiliary vector directly out of the crashed process if reading the corresponding /proc file fails. I'll file bugs for both.

Edit: We already use PTRACE_ATTACH on every thread. Sending it to the PID only stops that thread, not the whole process, but it's possible to SIGSTOP a process and then attach once the threads have already been stopped.

Flags: needinfo?(gsvelto)

Filed minidump-writer issue #27 for handling the auxiliary vector; that should be a relatively easy fix.
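To illustrate the idea of tolerating an unreadable auxiliary vector, here is a minimal Python sketch (not the minidump-writer code, which is written in Rust). It parses /proc/<pid>/auxv as a sequence of (type, value) word pairs ending with an AT_NULL entry, and returns None instead of failing when the file cannot be read. The function names, the 64-bit default word size, and the little-endian layout are assumptions made for the example.

```python
import struct

AT_NULL = 0  # entry type that terminates the ELF auxiliary vector


def parse_auxv(data, word_size=8):
    """Parse raw auxv bytes into {type: value}, stopping at AT_NULL.

    Assumes little-endian words of `word_size` bytes (8 or 4).
    """
    fmt = "<QQ" if word_size == 8 else "<II"
    entry_size = struct.calcsize(fmt)
    usable = len(data) - len(data) % entry_size  # ignore a trailing partial entry
    entries = {}
    for a_type, a_val in struct.iter_unpack(fmt, data[:usable]):
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


def read_auxv(pid, word_size=8):
    """Return the parsed auxv for `pid`, or None if the file is unreadable.

    A caller could carry on writing a dump without these values when the
    result is None, rather than aborting the whole dump.
    """
    try:
        with open(f"/proc/{pid}/auxv", "rb") as f:
            return parse_auxv(f.read(), word_size)
    except OSError:  # covers "Permission denied (os error 13)" and missing files
        return None
```

For example, `read_auxv(os.getpid())` on Linux returns a dictionary keyed by AT_* constants, while a PID whose auxv file is missing or unreadable yields None.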

... and filed minidump-writer issue #28 for the thread suspension problem.
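The "SIGSTOP first, then attach" idea can be sketched in Python for a child process. This only demonstrates stopping the whole process and confirming the stop. The ptrace attach itself is omitted because the Python standard library has no ptrace binding, and `os.waitpid` only works here because the target is our own child, which is not the situation a crash reporter is in. The function names are made up for the example.

```python
import os
import signal
import subprocess
import sys


def stop_and_confirm(proc):
    """Send SIGSTOP to a child process and wait until it is reported stopped."""
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid return for stopped (not only exited) children.
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def demo():
    """Spawn a sleeping child, stop it, then resume and clean it up."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    try:
        stopped = stop_and_confirm(child)
        # A dump writer would attach to each thread at this point,
        # now that every thread in the process is stopped.
        os.kill(child.pid, signal.SIGCONT)
        return stopped
    finally:
        child.kill()
        child.wait()


if __name__ == "__main__":
    print("child stopped:", demo())
```

Unlike PTRACE_ATTACH on a single thread ID, SIGSTOP is delivered to the process as a whole, so every thread is stopped before any attach begins.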

Whiteboard: [geckoview:2022h2?]
Depends on: 1588530
Depends on: 1793784

Signature change

Crash Signature: [@ EMPTY: no crashing thread identified] [@ EMPTY: no crashing thread identified; EmptyMinidump] [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified; MissingThreadList] → [@ EMPTY: no crashing thread identified] [@ EMPTY: no crashing thread identified; EmptyMinidump] [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] [@ EMPTY: no crashing thread identified; MissingThreadList] [@ EMPTY: no frame data ava…
See Also: → 1710940
Component: Stability → Crash Reporting
See Also: → 1245570
Duplicate of this bug: 1803899

Gab, should bug 1360392 be merged into this one?

No, they're different issues as the fixes are different for mobile and desktop. Unfortunately the crash signatures are the same as we can't tell them apart.

No longer duplicate of this bug: 1803899

ok, sorry, i missed the information in the subject!

Dropping priority from P2 to P3 because this bug is not currently actionable for the Android engineering team. We're waiting for the new ARM minidump writer in bug 1689358.

Priority: P2 → P3

Any new priority on this for mobile now that bug 1689358 is resolved?

Flags: needinfo?(royang)

Yeah, I'm now actively working on bug 1620998, which should allow me to eliminate the last part of the minidump generation pipeline that causes these failures. It will take a few months as it's quite a bit of work, but it's being actively worked on.

thanks gsvelto!

Flags: needinfo?(royang)