Investigate to determine why MOZ_CRASH_UNSAFE_PRINTF was not reported in launch crash
Categories
(GeckoView :: Core, task, P2)
Tracking
(Not tracked)
People
(Reporter: zmckenney, Assigned: zmckenney)
Details
(Whiteboard: [geckoview:m111][geckoview:m112][geckoview:m113])
In Bug 1807716 it appears MOZ_CRASH_UNSAFE_PRINTF was not caught by the crash reporter, so we had no reporting on a launch crash. This ticket is to investigate why this wasn't reported and whether we are currently capturing other variations of MOZ_CRASH_* without issue.
Comment 2 • 1 year ago (Assignee)
Investigation Results
Problem 1.) When a user opened the app they could crash AFTER the CrashReporter was created (at initializeGlean() in FenixApplication). Logging in the native code shows that MOZ_CRASH_UNSAFE_PRINTF properly asserts and hands off to MOZ_Crash, which completes as expected. Higher in the stack, however, this crash was not caught, recorded to file, or reported in my testing (more details below). Crash in Glean was here.
Problem 2.) When a user opened the app they could crash BEFORE the CrashReporter was created if a crash file was found. This occurred as soon as GleanCrashReporterService was created, because the file is parsed and added to CrashMetrics.crashCount in AC (Android Components) via this line. That add() function in turn calls the native code, which causes the crash.
I suspect that when we merged this PR to move the engine warmup above initializeGlean, it "fixed" the initializeGlean crash: if the user did not have a crash file at next launch, they would no longer see the app crashing. If the user DID have a crash file (whether due to Problem 1 or not), they would encounter Problem 2, which would not be reported because it occurred before CrashReporter had been created. Also note that if at any point the user got a new crash file (for example by going to about:crashparent), they would be stuck crashing without reporting.
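The ordering argument behind Problem 2 can be illustrated with a small toy model. This is only a sketch: the step names and the couldBeReported helper are hypothetical and do not come from Fenix or Android Components. It captures a single idea, that a reporter can only observe crashes that happen after it has been set up. As Problem 1 shows, crashing after the reporter exists is necessary but not sufficient for a report to actually be sent.

```java
import java.util.List;

// Toy model only: the step names are illustrative assumptions,
// not the real Fenix startup sequence.
final class StartupOrderSketch {

    /**
     * Returns true if a crash during crashingStep happens after the
     * reporter was set up in reporterStep, i.e. the reporter at least
     * had a chance to see it.
     */
    static boolean couldBeReported(List<String> steps, String reporterStep, String crashingStep) {
        int reporterIndex = steps.indexOf(reporterStep);
        int crashIndex = steps.indexOf(crashingStep);
        if (reporterIndex < 0 || crashIndex < 0) {
            throw new IllegalArgumentException("unknown step");
        }
        // Crashes at or before the reporter's own setup step are invisible to it.
        return crashIndex > reporterIndex;
    }

    public static void main(String[] args) {
        // Hypothetical ordering matching the description of Problem 2.
        List<String> steps = List.of(
                "createGleanCrashReporterService", // parses an existing crash file
                "createCrashReporter",
                "initializeGlean");

        // Problem 2: the crash happens before the reporter exists.
        System.out.println(couldBeReported(steps, "createCrashReporter", "createGleanCrashReporterService"));
        // Problem 1: the crash happens after the reporter exists.
        System.out.println(couldBeReported(steps, "createCrashReporter", "initializeGlean"));
    }
}
```

In this model the first call yields false and the second yields true, which matches the observation that a leftover crash file leads to a crash that no reporter is in place to see.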
Extra Details
If a user updated after we pushed the PR fix above to Nightly (and had no crash file), they would be able to navigate to pages that would break (e.g. ign.com), but those crashes would still be reported.
A potential reason Problem 1 did not get reported is that the process was killed and a DeadObjectException was thrown:
2023-01-30 22:05:51.449 1339-1366 BootReceiver system_process I Copying /data/tombstones/tombstone_14 to DropBox (SYSTEM_TOMBSTONE)
2023-01-30 22:05:51.456 1339-20581 ActivityManager system_process W Exception thrown during pause
android.os.DeadObjectException
    at android.os.BinderProxy.transactNative(Native Method)
    at android.os.BinderProxy.transact(Binder.java:764)
    at android.app.IApplicationThread$Stub$Proxy.schedulePauseActivity(IApplicationThread.java:1079)
    at com.android.server.am.ActivityStack.startPausingLocked(ActivityStack.java:1347)
    at com.android.server.am.ActivityStack.finishActivityLocked(ActivityStack.java:3779)
    at com.android.server.am.ActivityStack.finishActivityLocked(ActivityStack.java:3721)
    at com.android.server.am.ActivityStack.finishTopRunningActivityLocked(ActivityStack.java:3602)
    at com.android.server.am.ActivityStackSupervisor.finishTopRunningActivityLocked(ActivityStackSupervisor.java:2124)
    at com.android.server.am.AppErrors.handleAppCrashLocked(AppErrors.java:668)
    at com.android.server.am.AppErrors.makeAppCrashingLocked(AppErrors.java:500)
    at com.android.server.am.AppErrors.crashApplicationInner(AppErrors.java:376)
    at com.android.server.am.AppErrors.crashApplication(AppErrors.java:321)
    at com.android.server.am.ActivityManagerService.handleApplicationCrashInner(ActivityManagerService.java:14375)
    at com.android.server.am.NativeCrashListener$NativeCrashReporter.run(NativeCrashListener.java:85)
Comment 3 • 1 year ago (Assignee)
A better and more final answer to "why MOZ_CRASH_UNSAFE_PRINTF was not reported in launch crash" is that LaunchCrashHandlerService had not yet been created when the crash occurred in libmozglue.
Adding a log in MinidumpCallback(), which invokes the launch crash handler service here, and adding a MOZ_CRASH_UNSAFE_PRINTF() in loadGeckoLibs() here validates this.
Comment 4 • 3 months ago
The Android team has not been keeping our P1 bug list up to date, so we're resetting all our P1 bugs to P2 to avoid signalling that we're actively working on bugs that we're not. The BMO documentation https://wiki.mozilla.org/BMO/UserGuide/BugFields#priority says P1 means "fix in the current release cycle" and P2 means "fix in the next release cycle or the following (nightly + 1 or nightly + 2)".
If you are actively working on this bug and expect to ship it in Fx 122 or 123, then please restore the priority back to P1.