Crash reporter dialog not working on Android 4.2

VERIFIED FIXED in Firefox 17

Status

()

VERIFIED FIXED
6 years ago
2 years ago

People

(Reporter: kats, Assigned: kats)

Tracking

({reproducible})

unspecified
Firefox 19
ARM
Android
reproducible
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox17+ verified, firefox18+ verified, firefox19 verified, firefox-esr17 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

There are reports that the crash reporter dialog isn't showing (and we're not getting crash stats) on Android 4.2. See for example:

https://bugzilla.mozilla.org/show_bug.cgi?id=802510#c27
https://bugzilla.mozilla.org/show_bug.cgi?id=811657#c1
https://bugzilla.mozilla.org/show_bug.cgi?id=811657#c3

Comment 1

6 years ago
Aaron is going to verify whether or not we see a crash reporter for another reproducible crash after bug 802510 is resolved. This may be another mobile issue we need to resolve for FF17.
Assignee: nobody → bugmail.mozilla
tracking-firefox17: --- → ?
tracking-firefox18: --- → ?
QA Contact: general → aaron.train
On Firefox Beta, bug 804600 comment #1 has a reproducible crash (with 17 affected) and I don't see a crash reporter. Also, looking for a link to the latest CrashMe I can try out.
Confirming - Nexus 7 (Android 4.2), no crash reporter (bug 804600 comment #1); while my Galaxy Note II (Android 4.1), invokes the crash reporter.
Summary: Crash reporter dialog may not be working on Android 4.2 → Crash reporter dialog not working on Android 4.2
Also confirming via CrashMe http://code.google.com/p/crashme/

Updated

6 years ago
tracking-firefox17: ? → +
tracking-firefox18: ? → +

Updated

6 years ago
Keywords: reproducible
Hardware: All → ARM

Comment 5

6 years ago
As a note for landing once resolved - we'll want to immediately land to mozilla-aurora and mozilla-beta (with a possible re-spin), followed by landing to mozilla-release once we're confident in the change (merge to release is happening today already).
I'm doing a local build now with the crash reporter enabled so I can debug this better. In the meantime I looked at the code and see that we invoke the crash reporter by doing an exec call to /system/bin/am and broadcasting the reportCrash intent back to ourselves. [1]

I ran that exec command directly on the shell on a GN 4.2 device and it does bring up the crash reporter dialog, so that part is working properly. So something is going wrong before that point, or (more likely) the am start timing/implementation changed and the intent is getting lost somehow while exec'ing am start.

[1] http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#698
As far as I can tell this is happening because the multi-user support in 4.2 broke some things with respect to starting processes. Specifically, the exec call we do at [1] runs but the am command fails. I see this in the logcat output:

11-15 01:29:37.771 D/AndroidRuntime( 6089): Calling main entry com.android.commands.am.Am
11-15 01:29:37.771 D/dalvikvm( 6089): Note: class Landroid/app/ActivityManagerNative; has 156 unimplemented (abstract) methods
11-15 01:29:37.779 I/AndroidRuntime( 6089): VM exiting with result code 1.
11-15 01:29:37.779 W/ActivityManager(  387): Permission Denial: startActivity asks to run as user -2 but is calling from user 0; this requires android.permission.INTERACT_ACROSS_USERS_FULL

This happens even if I add the INTERACT_ACROSS_USERS_FULL permission to Fennec. There's very little documentation about the multi-user stuff so I don't know how or why it's tripping us up yet.

[1] http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#698
Created attachment 682124 [details] [diff] [review]
Patch

Thanks to a bunch of help from the people in the deep dive we were able to come up with an explanation as to why this is happening - basically the fork call is losing the user id and so the am call gets run as the wrong user. The attached patch gets the user id in java on startup and stashes it in an env variable so that it can be read at the point of the crash. This patch should be backwards compatible to pre-4.2 as well since it will just not set the env variable and the crash reporter code path will fall back to the old code. I haven't tested that yet though.

Requesting double review since this patch will likely get uplifted to 17 and it's better to be safe than sorry.
Attachment #682124 - Flags: review?(snorp)
Attachment #682124 - Flags: review?(blassey.bugs)
Comment on attachment 682124 [details] [diff] [review]
Patch

Review of attachment 682124 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +696,5 @@
>  #else
> +    // on Android 4.2 and above there is a user associated
> +    // with the current process that gets lost when we fork
> +    // so we need to explicitly pass it to am
> +    char* androidUserId = getenv("MOZ_ANDROID_USER_SERIAL_NUMBER");

It'd be better if you could get this env var before this point, since this is in a callback from a signal handler and thus the process is in a potentially unsafe state.
Comment on attachment 682124 [details] [diff] [review]
Patch

Review of attachment 682124 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +701,3 @@
>      // Invoke the reportCrash activity using am
> +    if (androidUserId) {
> +      (void) execlp("/system/bin/am",

Would arguably be nicer to build the args procedurally so there wasn't so much duplicated stuff, but whatevs.
Attachment #682124 - Flags: review?(snorp) → review+
Comment on attachment 682124 [details] [diff] [review]
Patch

Review of attachment 682124 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoAppShell.java
@@ +448,5 @@
>          f = context.getCacheDir();
>          GeckoAppShell.putenv("CACHE_DIRECTORY=" + f.getPath());
>  
> +        if (Build.VERSION.SDK_INT >= 17) {
> +            android.os.UserManager um = (android.os.UserManager)context.getSystemService(Context.USER_SERVICE);

please use reflection to get the UserManager so we don't require the 17 SDK
Attachment #682124 - Flags: review?(snorp)
Attachment #682124 - Flags: review?(blassey.bugs)
Attachment #682124 - Flags: review-
Attachment #682124 - Flags: review+
Created attachment 682147 [details] [diff] [review]
Patch v2

Now with more reflection. I also do the getenv call before registering the handler so it should be safer.
Attachment #682124 - Attachment is obsolete: true
Attachment #682124 - Flags: review?(snorp)
Attachment #682147 - Flags: review?(snorp)
Attachment #682147 - Flags: review?(blassey.bugs)
Comment on attachment 682147 [details] [diff] [review]
Patch v2

Review of attachment 682147 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, and worked fine on my nexus 7 with 4.2
Attachment #682147 - Flags: review?(snorp) → review+
I also tested it on a Galaxy Nexus running 4.2 with a single user account. It still needs to be tested on a device running pre-4.2 software, which I will do shortly.

https://tbpl.mozilla.org/?tree=Try&rev=57e08143a586 to ensure it builds on all platforms and with SDK version 16 (default on TBPL).
(In reply to Kartikaya Gupta (:kats) from comment #14)
> I also tested it on a Galaxy Nexus running 4.2 with a single user account.
> It still needs to be tested on a device running pre-4.2 software, which I
> will do shortly.
> 

Tested on a One X running 4.1.2, crashed FF using kill -6 and the crash reporter came up as expected.
Comment on attachment 682147 [details] [diff] [review]
Patch v2

Review of attachment 682147 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoAppShell.java
@@ +459,5 @@
> +                Log.d(LOGTAG, "Unable to obtain user manager service on a device with SDK version " + Build.VERSION.SDK_INT);
> +            }
> +        }
> +        */
> +        try {

check  if (Build.VERSION.SDK_INT >= 17)  before entering this try block
Attachment #682147 - Flags: review?(blassey.bugs) → review+
How risky is this patch for pre-4.2 users? How about post-4.2 users?
(In reply to Brad Lassey [:blassey] from comment #16)
> check  if (Build.VERSION.SDK_INT >= 17)  before entering this try block

If I do that the code will get compiled out entirely since we're building with SDK_INT == 16. It's a compile-time constant.
(In reply to Brad Lassey [:blassey] from comment #16)
> check  if (Build.VERSION.SDK_INT >= 17)  before entering this try block

Landed without this change since it didn't seem to work with the change and I'd rather not make more changes on top of the tested patch.

https://hg.mozilla.org/integration/mozilla-inbound/rev/dc61a415f3e9
(In reply to Alex Keybl [:akeybl] from comment #17)
> How risky is this patch for pre-4.2 users? How about post-4.2 users?

So for pre-4.2 users it should be pretty safe. For post-4.2 users it's a little less safe, mostly because we're not really sure what other changes android has in 4.2 and if other use cases will expose bugs. However I still think there's a relatively low chance of regressing existing behaviour, the risk is more than we don't fix the problem completely and the crash reporter still doesn't show up in some cases.
Comment on attachment 682147 [details] [diff] [review]
Patch v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none
User impact if declined: On android 4.2 and up the crash reporter doesn't appear when fennec crashes. (We also don't get any crash data)
Testing completed (on m-c, etc.): on local builds
Risk to taking this patch (and alternatives if risky): see previous comment. All the changes should affect android only
String or UUID changes made by this patch: none
Attachment #682147 - Flags: approval-mozilla-beta?
Attachment #682147 - Flags: approval-mozilla-aurora?
Comment on attachment 682147 [details] [diff] [review]
Patch v2

[Triage Comment]
Low risk, enables us to grab crash reports in Android 4.2. Approving for all branches - we're going to spin up a beta 7 for this changeset.
Attachment #682147 - Flags: approval-mozilla-release+
Attachment #682147 - Flags: approval-mozilla-beta?
Attachment #682147 - Flags: approval-mozilla-beta+
Attachment #682147 - Flags: approval-mozilla-aurora?
Attachment #682147 - Flags: approval-mozilla-aurora+
Comment on attachment 682147 [details] [diff] [review]
Patch v2

For completness's sake, let's also land on mozilla-esr17 just in case.
Attachment #682147 - Flags: approval-mozilla-esr17+
I also just want to say, the turnaround on this bug was fantastic, and precluded the need for 17.0.1.

\o/ \o/ to Kats, Aaron, Brad, and Ted for nailing this down.
Landed on aurora, beta, and release. I don't have a clone of esr17 but ehsan volunteered to clone and land the patch there (thanks ehsan!)

https://hg.mozilla.org/releases/mozilla-aurora/rev/ea8661c0067f
https://hg.mozilla.org/releases/mozilla-beta/rev/000e362a815a
https://hg.mozilla.org/releases/mozilla-release/rev/0fa38190b64e
status-firefox17: --- → fixed
status-firefox18: --- → fixed

Comment 26

6 years ago
https://hg.mozilla.org/releases/mozilla-esr17/rev/04d267fd04b1
status-firefox-esr17: --- → fixed
https://hg.mozilla.org/mozilla-central/rev/dc61a415f3e9
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 19
QA verified this morning on mozilla-beta (17.0b7), mozilla-aurora (18.0a2) and mozilla-inbound yesterday (19.0)
Status: RESOLVED → VERIFIED
status-firefox17: fixed → verified
status-firefox18: fixed → verified
status-firefox19: --- → verified

Updated

4 years ago
Depends on: 1030265

Updated

4 years ago
No longer depends on: 1030265
You need to log in before you can comment on or make changes to this bug.