Crash in [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER] for Android
Categories
(Fenix :: Crash Reporting, defect, P3)
Tracking
(firefox78 wontfix, firefox79 wontfix, firefox80 wontfix, firefox96 wontfix, firefox97 wontfix, firefox98 wontfix, firefox100 wontfix, firefox101 wontfix, firefox102 wontfix, firefox106 wontfix, firefox107 wontfix, firefox108 wontfix)
People
(Reporter: fluffyemily, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: crash, Whiteboard: [geckoview:2022h2?])
Crash Data
This bug is for crash report bp-abe9b05f-1857-42c0-9836-04e6c0200609.
Comment 1•4 years ago
I may have identified this bug as being related to mozilla.components.browser.engine.gecko.fetch.GeckoViewFetchClient or geckoview.GeckoWebExecutor.fetch.
For some reason this one does not show up properly in either Sentry or Socorro but the Play Store has good crash reports, all in the area of Fetch.
Here is one trace:
java.lang.IllegalArgumentException:
at org.mozilla.geckoview.GeckoWebExecutor.fetch (GeckoWebExecutor.java:12)
at mozilla.components.browser.engine.gecko.fetch.GeckoViewFetchClient.fetch (GeckoViewFetchClient.kt:70)
at mozilla.components.feature.downloads.AbstractFetchDownloadService.performDownload$feature_downloads_release (AbstractFetchDownloadService.kt:12)
at mozilla.components.feature.downloads.AbstractFetchDownloadService.startDownloadJob$feature_downloads_release (AbstractFetchDownloadService.kt:3)
at mozilla.components.feature.downloads.AbstractFetchDownloadService$onStartCommand$1.invokeSuspend (AbstractFetchDownloadService.kt:5)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith (ContinuationImpl.kt:2)
at kotlinx.coroutines.DispatchedTask.run (DispatchedTask.kt:19)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely (CoroutineScheduler.kt:1)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run (CoroutineScheduler.kt:14)
Comment 2•4 years ago
If these Fetch crashes are indeed bundled under the NativeCodeCrash in Sentry then we may want to address this ASAP with an update because the volume has increased 5x in the past day.
Comment 3•4 years ago
I don't think Fetch is solely responsible for this. We have 2200 reports in the last week for this in nightly (https://crash-stats.mozilla.org/topcrashers/?product=Fenix&version=0.0a1), but the Play Console doesn't show any crashes for Fetch in the same timeframe.
Comment 4•4 years ago
We have seen a huge reduction in the incidents of this bug since June 7.
Comment 8•4 years ago
Gabriele, do we have any new information about this?
Comment 9•4 years ago
Sadly not yet. We've implemented error reporting for minidump generation in bug 1666733, but it's enabled only with the oxidized minidump generator. That generator is only implemented for x86 and x86-64 at the moment, and the ARM/AArch64 implementation in bug 1689358 is stuck because we first need to upgrade the libc crate in Gecko. So sadly I must report that this has stalled for now, but I hope we'll be able to move it forward soon(ish).
Comment 10•3 years ago
This is a signature change caused by switching Socorro's stack walker to the new oxidized version. On the topic of oxidation, we haven't enabled ARM minidump generation yet - and thus proper error recording - but we should be able to do it before the end of January.
Comment 11•3 years ago
This crash is blowing up in nightly right now. Raising priority.
Looks like the increase started with 20220209095640.
Comment 12•3 years ago
Enabling the GPU process landed within that nightly, so it might be related: https://hg.mozilla.org/mozilla-central/rev/f93a4ff5c045531102de678c93951deb095137d6
Comment 13•3 years ago
I can reliably reproduce this crash on a low-end device (Samsung A5) by doing this:
- pm clear org.mozilla.fenix
- Open Fenix, load cnn.com, put Fenix in the background
- Open Gmail, load a few emails
- Go back to Fenix -> "Sorry Fenix has crashed" tab
If I disable the GPU process, I don't get a crash. My guess is that we're not handling GPU process kills correctly.
Comment 14•3 years ago
We rolled back Bug 1331109 which should hopefully make this crash go away.
Comment 15•3 years ago
Opened Bug 1755375 to handle GPU process kills correctly, which caused the spike in crashes in this bug.
Comment 16•3 years ago
FYI we'll soon have detailed error information about the most common error that causes failures in minidump generations on Android. Given enough nightly users a couple of weeks from now we should get to the bottom of this.
Comment 17•3 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #16)
FYI we'll soon have detailed error information about the most common error that causes failures in minidump generations on Android. Given enough nightly users a couple of weeks from now we should get to the bottom of this.
Hi Gabriele, do you see any changes in Android minidump errors? AFAICT, Socorro reports roughly the same number of "EmptyMinidump" crash reports from Android Nightly 102.0a1 (933) as 101.0a1 (954):
Comment 18•3 years ago
Yes, we finally have the reasons why we're failing to write out the minidumps, see this.
So out of the 34 recent crashes, one third have a "No threads left to suspend (out of X)" error and another third have an "Error during init phase: IO error for file /proc/<pid>/auxv: Permission denied (os error 13)" error.
Regarding the first error, it's probably happening because it's too late to write a minidump. I wonder if we could experiment with sending a SIGSTOP instead or use ptrace() with PTRACE_ATTACH. Regarding the latter error, we can probably do away with the contents of the auxiliary vector and still write a mostly complete minidump. Additionally we might ptrace the auxiliary vector directly out of the crashed process if reading the corresponding /proc file fails. I'll file bugs for both.
Edit: We already use PTRACE_ATTACH on every thread, sending it to the PID only stops that thread not the whole process but it's possible to SIGSTOP a process and then attach when the threads have already been stopped.
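For illustration, the "do away with the auxiliary vector" idea from this comment could look roughly like the sketch below. It is not minidump-writer's actual code: the function names `parse_auxv` and `read_auxv_or_empty` are hypothetical, and it only assumes the Linux auxv layout of native-word (type, value) pairs terminated by an AT_NULL entry. A permission error is treated as "no auxv available" so the caller can carry on writing the rest of the minidump.

```rust
use std::fs;
use std::io;

/// Entry type that terminates the auxiliary vector (AT_NULL in <elf.h>).
const AT_NULL: usize = 0;
/// Size of one native word; auxv entries are two words each.
const WORD: usize = std::mem::size_of::<usize>();

/// Decode one native-endian word from the start of `bytes`.
fn read_word(bytes: &[u8]) -> usize {
    let mut word = [0u8; WORD];
    word.copy_from_slice(&bytes[..WORD]);
    usize::from_ne_bytes(word)
}

/// Parse raw auxv bytes into (type, value) pairs.
/// Stops at AT_NULL and ignores a trailing incomplete pair.
fn parse_auxv(bytes: &[u8]) -> Vec<(usize, usize)> {
    let mut entries = Vec::new();
    for pair in bytes.chunks_exact(2 * WORD) {
        let key = read_word(pair);
        let value = read_word(&pair[WORD..]);
        if key == AT_NULL {
            break;
        }
        entries.push((key, value));
    }
    entries
}

/// Hypothetical fallback: read an auxv file, but map "permission denied"
/// to an empty vector instead of failing. Other I/O errors still propagate.
fn read_auxv_or_empty(path: &str) -> io::Result<Vec<(usize, usize)>> {
    match fs::read(path) {
        Ok(bytes) => Ok(parse_auxv(&bytes)),
        Err(e) if e.kind() == io::ErrorKind::PermissionDenied => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}

fn main() {
    // Reading our own auxv normally succeeds on Linux; elsewhere this
    // reports the I/O error instead.
    match read_auxv_or_empty("/proc/self/auxv") {
        Ok(entries) => println!("read {} auxv entries", entries.len()),
        Err(e) => println!("could not read auxv: {}", e),
    }
}
```

Whether an empty vector is acceptable depends on which fields of the minidump rely on auxv data, so a real fix would need to decide what to omit or how to recover it another way (for example via ptrace, as suggested above).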
Comment 19•3 years ago
Filed minidump-writer issue #27 for handling the auxiliary vector, that should be a relatively easy fix.
Comment 20•3 years ago
... and filed minidump-writer issue #28 for the thread suspension problem.
Comment 21•2 years ago
Signature change
Comment 23•2 years ago
Gab, should bug 1360392 be merged into this one?
Comment 24•2 years ago
No, they're different issues as the fixes are different for mobile and desktop. Unfortunately the crash signatures are the same as we can't tell them apart.
Comment 25•2 years ago
OK, sorry, I missed the information in the subject!
Comment 26•2 years ago
Dropping priority from P2 to P3 because this bug is not currently actionable for the Android engineering team. We're waiting for the new ARM minidump writer in bug 1689358.
Comment 27•1 year ago
Any new priority on this for mobile now that bug 1689358 is resolved?
Comment 28•1 year ago
Yeah, I'm now actively working on bug 1620998, which should allow me to eliminate the last part of the minidump generation pipeline that causes these failures. It will take a few months as it's quite a bit of work, but it's being actively worked on.
Comment hidden (offtopic)
Comment hidden (offtopic)
Comment hidden (obsolete)
Comment hidden (offtopic)
Comment 34•2 months ago
Denis, please stop adding links to all these individual reports. We have access to all of them already and it isn't helping to resolve this problem. Per comment 28, the root cause of this specific issue is already understood and being worked on.
Comment 35•2 months ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #34)
Denis, please stop adding links to all these individual reports. We have access to all of them already and it isn't helping to resolve this problem. Per comment 28, the root cause of this specific issue is already understood and being worked on.
Thanks for the feedback!
OK, if it doesn't add any value, then of course I won't do it in future. It also saves me work.