1808616 - Investigate to determine why MOZ_CRASH_UNSAFE_PRINTF was not reported in launch crash

Assignee

Description

•

1 year ago

In Bug 1807716 it appears MOZ_CRASH_UNSAFE_PRINTF was not caught by the crash reporter and we had no reporting on a launch crash. This ticket is to investigate why this wasn't reported and whether we are currently capturing other variations of MOZ_CRASH_* without issue.

Zac McKenney [:zmckenney]

Assignee

Updated

•

1 year ago

Component: Crash Reporting → Core

Product: Fenix → GeckoView

Chris Peterson [:cpeterson]

Comment 1

•

1 year ago

111

Severity: -- → N/A

Rank: 210

Priority: -- → P2

Whiteboard: [geckoview:m111]

Chris Peterson [:cpeterson]

Updated

•

1 year ago

Rank: 210 → 111

Zac McKenney [:zmckenney]

Assignee

Updated

•

1 year ago

Assignee: nobody → zmckenney

Zac McKenney [:zmckenney]

Assignee

Comment 2

•

1 year ago

Investigation Results

Problem 1.) When a user opened the app, it could crash AFTER the CrashReporter was created (at initializeGlean() in FenixApplication). Logging added to the native code shows that MOZ_CRASH_UNSAFE_PRINTF asserts properly and hands off to MOZ_Crash, which completes as expected. Higher in the stack, however, this crash was not caught, recorded to file, or reported in my testing (more details below). Crash in Glean was here.

Problem 2.) When a user opened the app, it could crash BEFORE the CrashReporter was created if an existing crash file was found. This occurred as soon as GleanCrashReporterService was created: the file is parsed and the result is added to CrashMetrics.crashCount in AC via this line. That add() function in turn calls the native code, which causes the crash.

I suspect that when we merged this PR to move the engine warmup above initializeGlean, it "fixed" the initializeGlean crash: a user with no crash file at next launch would no longer see the app crash. A user who DID have a crash file (whether due to Problem 1 or not) would hit Problem 2, which would not be reported because it happened before the CrashReporter was created. Also note that if at any point the user got a new crash file (for example by going to about:crashparent), they would be stuck crashing without reporting.
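To make the ordering argument above easier to follow, here is a rough model of the launch sequence. It is only an illustrative sketch, not Fenix or Android Components code: LaunchOrderSketch, launch(), relaunch(), and the Outcome values are hypothetical names, and the native crash is reduced to a boolean check. It assumes the startup order described above (GleanCrashReporterService, then CrashReporter, then initializeGlean()) and that the crash file stays on disk after a crash that happens before the reporter exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the launch order discussed in this comment.
final class LaunchOrderSketch {
    enum Outcome { STARTED, CRASH_BEFORE_REPORTER, CRASH_AFTER_REPORTER }

    // crashFileExists: a crash file is present at launch.
    // warmupBeforeGlean: the PR that moves engine warmup above initializeGlean() is applied.
    static Outcome launch(boolean crashFileExists, boolean warmupBeforeGlean) {
        // Step 1: GleanCrashReporterService is created. An existing crash file is
        // parsed and counted, which reaches the crashing native code (Problem 2).
        if (crashFileExists) {
            return Outcome.CRASH_BEFORE_REPORTER;
        }
        // Step 2: the CrashReporter is created here.
        // Step 3: initializeGlean() runs. Without the PR this is where Problem 1 occurred.
        if (!warmupBeforeGlean) {
            return Outcome.CRASH_AFTER_REPORTER;
        }
        return Outcome.STARTED;
    }

    // Assumption: the crash file is never cleared, so every relaunch sees the same state.
    static List<Outcome> relaunch(boolean crashFileExists, boolean warmupBeforeGlean, int times) {
        List<Outcome> outcomes = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            outcomes.add(launch(crashFileExists, warmupBeforeGlean));
        }
        return outcomes;
    }

    public static void main(String[] args) {
        System.out.println(relaunch(false, true, 2)); // no crash file, PR applied
        System.out.println(relaunch(true, true, 3));  // crash file present, PR applied
    }
}
```

Under these assumptions, a user with the PR and no crash file starts normally, while a user with a crash file hits the pre-reporter crash on every launch, which matches the "stuck crashing without reporting" behavior described above.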

Extra Details

If a user updated after we pushed the PR fix above to Nightly (and had no crash file), they would be able to navigate to pages which would break (e.g. ign.com), but those crashes would still be reported.

A potential reason Problem 1 was not reported is that the process was killed and a DeadObjectException was thrown:

2023-01-30 22:05:51.449  1339-1366  BootReceiver  system_process  I  Copying /data/tombstones/tombstone_14 to DropBox (SYSTEM_TOMBSTONE)
2023-01-30 22:05:51.456  1339-20581  ActivityManager  system_process  W  Exception thrown during pause
    android.os.DeadObjectException
        at android.os.BinderProxy.transactNative(Native Method)
        at android.os.BinderProxy.transact(Binder.java:764)
        at android.app.IApplicationThread$Stub$Proxy.schedulePauseActivity(IApplicationThread.java:1079)
        at com.android.server.am.ActivityStack.startPausingLocked(ActivityStack.java:1347)
        at com.android.server.am.ActivityStack.finishActivityLocked(ActivityStack.java:3779)
        at com.android.server.am.ActivityStack.finishActivityLocked(ActivityStack.java:3721)
        at com.android.server.am.ActivityStack.finishTopRunningActivityLocked(ActivityStack.java:3602)
        at com.android.server.am.ActivityStackSupervisor.finishTopRunningActivityLocked(ActivityStackSupervisor.java:2124)
        at com.android.server.am.AppErrors.handleAppCrashLocked(AppErrors.java:668)
        at com.android.server.am.AppErrors.makeAppCrashingLocked(AppErrors.java:500)
        at com.android.server.am.AppErrors.crashApplicationInner(AppErrors.java:376)
        at com.android.server.am.AppErrors.crashApplication(AppErrors.java:321)
        at com.android.server.am.ActivityManagerService.handleApplicationCrashInner(ActivityManagerService.java:14375)
        at com.android.server.am.NativeCrashListener$NativeCrashReporter.run(NativeCrashListener.java:85)

Zac McKenney [:zmckenney]

Assignee

Updated

•

1 year ago

Whiteboard: [geckoview:m111] → [geckoview:m111][geckoview:m112]

Zac McKenney [:zmckenney]

Assignee

Updated

•

1 year ago

Priority: P2 → P1

Zac McKenney [:zmckenney]

Assignee

Comment 3

•

1 year ago

A better and more final answer to "why MOZ_CRASH_UNSAFE_PRINTF was not reported in launch crash" is that LaunchCrashHandlerService had not yet been created when the crash occurred in libmozglue.

Adding a log in MinidumpCallback(), which invokes the launch crash handler service here, and adding a MOZ_CRASH_UNSAFE_PRINTF() in loadGeckoLibs() here, validates this.
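The finding above can be pictured with a small stand-in for the callback. This is a hedged sketch only: CrashCallbackSketch, LaunchCrashHandler, registerHandler(), and the log strings are hypothetical and do not reflect the real MinidumpCallback() or LaunchCrashHandlerService code. It also assumes the callback itself runs and simply finds no handler to invoke, which the comment above does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a crash callback that forwards to a handler service.
final class CrashCallbackSketch {
    interface LaunchCrashHandler {
        void onCrash(String reason);
    }

    // Stays null until the (hypothetical) handler service has been created.
    private LaunchCrashHandler handler;
    final List<String> log = new ArrayList<>();

    void registerHandler(LaunchCrashHandler newHandler) {
        handler = newHandler;
    }

    // Returns true only if the crash was handed to a registered handler.
    boolean minidumpCallback(String reason) {
        log.add("callback invoked: " + reason);
        if (handler == null) {
            log.add("no handler registered, crash not reported");
            return false;
        }
        handler.onCrash(reason);
        return true;
    }

    public static void main(String[] args) {
        CrashCallbackSketch sketch = new CrashCallbackSketch();
        // A crash during library loading arrives before the handler exists.
        sketch.minidumpCallback("crash in loadGeckoLibs()");
        sketch.registerHandler(reason -> sketch.log.add("handler reported: " + reason));
        sketch.minidumpCallback("crash after service creation");
        sketch.log.forEach(System.out::println);
    }
}
```

In this model the first crash is logged by the callback but never reaches a handler, which is the same kind of evidence the added log and the test MOZ_CRASH_UNSAFE_PRINTF() were used to gather.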

Zac McKenney [:zmckenney]

Assignee

Updated

•

1 year ago

Whiteboard: [geckoview:m111][geckoview:m112] → [geckoview:m111][geckoview:m112][geckoview:m113]

Comment 4

•

3 months ago

The Android team has not been keeping our P1 bug list up to date, so we're resetting all our P1 bugs to P2 to avoid signalling that we're actively working on bugs that we're not. The BMO documentation https://wiki.mozilla.org/BMO/UserGuide/BugFields#priority says P1 means "fix in the current release cycle" and P2 means "fix in the next release cycle or the following (nightly + 1 or nightly + 2)".

If you are actively working on this bug and expect to ship it in Fx 122 or 123, then please restore the priority back to P1.

Priority: P1 → P2

Categories: GeckoView :: Core, task, P2

Tracking: Not tracked

People: Reporter: zmckenney, Assigned: zmckenney

Whiteboard: [geckoview:m111][geckoview:m112][geckoview:m113]

Security: public
