Suppress crash dialog for out-of-process decoder service crashes

VERIFIED FIXED in Firefox 54

Status

()

P1
normal
VERIFIED FIXED
2 years ago
3 months ago

People

(Reporter: snorp, Assigned: jhlin)

Tracking

Trunk
Firefox 55
Points:
---

Firefox Tracking Flags

(firefox54 verified, firefox55 verified)

Details

Attachments

(1 attachment)

I see 'Nightly has stopped' dialogs in Nightly sometimes while playing video, and I'm fairly sure what's happening is that he decoder service crashed. We need to find a way to suppress those dialogs.
Flags: needinfo?(jolin)
(Assignee)

Comment 2

2 years ago
Following previous comment:
From what I can tell, [1] handles Java crash by assigning uncaught exception handler. Since it's possible to intercept exceptions by try/catch, this should be not a problem.

On the other hand, for native crashes [2] Android uses signal handler to collect and dump relative info and then send them through socket to activity manager, which shows the dialog box for the crashing process (except for 'persistent' processes). It's unlikely for Fennec to be persistent, so what I'm going to do next is refreshing my memory of UNIX signals and see how we can hijack them. Also, we might want to collect the coredump/tombstone by ourselves.
Priority: -- → P1
John, it should be enough to just use sigaction to make SEGV a noop. There are some examples of hooking up to that in mozglue here: https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/ElfLoader.cpp#1099

In fact maybe we just want to use mozglue for this? Decoder process already uses it for shmem, right?
(Assignee)

Comment 4

2 years ago
It looks like Android also shows crash dialog box for signals other than SEGV [1] so I guess we need to intercept those too.


[1] http://androidxref.com/7.1.1_r6/xref/bionic/linker/debugger.cpp#312
Comment hidden (mozreview-request)

Comment 7

2 years ago
mozreview-review
Comment on attachment 8857388 [details]
Bug 1343541 - restore signal actions changed by Android debugger.

https://reviewboard.mozilla.org/r/129384/#review133252

::: mozglue/android/APKOpen.cpp:490
(Diff revision 1)
>  }
> +
> +extern "C" APKOPEN_EXPORT void MOZ_JNICALL
> +Java_org_mozilla_gecko_mozglue_GeckoLoader_suppressCrashDialog(JNIEnv *jenv, jclass jc)
> +{
> +  // Restore all singal actions changed by Android debugger.

Please assert that this doesn't run on Gecko processes. Please also fix the typos for signal here and in the commit message.
Attachment #8857388 - Flags: review?(mh+mozilla)
Comment hidden (mozreview-request)

Comment 9

a year ago
mozreview-review
Comment on attachment 8857388 [details]
Bug 1343541 - restore signal actions changed by Android debugger.

https://reviewboard.mozilla.org/r/129384/#review134060

::: mozglue/android/APKOpen.cpp:488
(Diff revision 2)
> +static bool
> +IsMediaProcess()

I guess that's good enough, but maybe we could instead check whether Gecko's crash reporter was initialized (which if that happened, would have called the getLibraryMapping function in this file.

::: mozglue/android/APKOpen.cpp:499
(Diff revision 2)
> +  FILE* f = fopen(str, "r");
> +  if (f) {
> +    fgets(str, sizeof(str), f);
> +    fclose(f);
> +    const size_t strLen = strlen(str);
> +    const char* suffix = ":media";

const char suffix[] = ...

::: mozglue/android/APKOpen.cpp:500
(Diff revision 2)
> +  if (f) {
> +    fgets(str, sizeof(str), f);
> +    fclose(f);
> +    const size_t strLen = strlen(str);
> +    const char* suffix = ":media";
> +    const size_t suffixLen = strlen(suffix);

sizeof(suffix) + 1
Attachment #8857388 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 10

a year ago
mozreview-review-reply
Comment on attachment 8857388 [details]
Bug 1343541 - restore signal actions changed by Android debugger.

https://reviewboard.mozilla.org/r/129384/#review134060

> sizeof(suffix) + 1

I suppose you mean `sizeof(suffix) - 1`?
err, yeah.
Comment hidden (mozreview-request)
(Assignee)

Comment 13

a year ago
mozreview-review-reply
Comment on attachment 8857388 [details]
Bug 1343541 - restore signal actions changed by Android debugger.

https://reviewboard.mozilla.org/r/129384/#review134060

> I guess that's good enough, but maybe we could instead check whether Gecko's crash reporter was initialized (which if that happened, would have called the getLibraryMapping function in this file.

Thanks for the suggestion but I will keep this media process only for now and change the assertion if there is another process wants to suppress crash dialog in the future.

Comment 14

a year ago
Pushed by jolin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/103dceccaf84
restore signal actions changed by Android debugger. r=glandium

Comment 15

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/103dceccaf84
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
(Assignee)

Comment 16

a year ago
Comment on attachment 8857388 [details]
Bug 1343541 - restore signal actions changed by Android debugger.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1257777
[User impact if declined]: It shows 'Fennec has stopped' dialog when playing video.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: yes.
 1. Watch a video
 2. Open CMD or PowerShell use the command line "adb shell ps |FINDSTR :media” for CMD or "adb shell ps |sls :media” for PowerShell
    There will be one process that will be displayed. For example:
    u0_a89 11086 2410 993952 52964 ffffffff 00000000 S org.mozilla.firefox:media  
 3. Use commandline "adb shell kill -ABRT <PID>", where <PID> is the PID of above process (11086 in the example)
 4. There will a short pause (may show loading icon) then the video will resume.
    Before the fix, there will be a dialog box pop up saying "Firefox has stopped". After the fix you won't see the dialog. 
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: low risk
[Why is the change risky/not risky?]: it makes crashing OOP decoder process behave as default.
[String changes made/needed]: none
Attachment #8857388 - Flags: approval-mozilla-beta?
Hi Ioana, could you help find someone to verify if this issue was fixed as expected on a latest Nightly build? Thanks!
Flags: qe-verify+
Flags: needinfo?(ioana.chiorean)

Comment 18

a year ago
Yes, I will let Bogdan look over it as he did some work on this side.
Flags: needinfo?(ioana.chiorean) → needinfo?(bogdan.surd)
Hello,

 @John could this be verified without needing to root your device? Following the instructions you provided I can't use the kill command without su access. I would need to rood a device in order to verify this. Thank you.
Flags: needinfo?(jolin)
Assignee: nobody → jolin
(Assignee)

Comment 20

a year ago
(In reply to Bogdan Surd, QA [:bogdan] from comment #19)
> Hello,
> 
>  @John could this be verified without needing to root your device? Following
> the instructions you provided I can't use the kill command without su
> access. I would need to rood a device in order to verify this. Thank you.

Unfortunately super user access is needed to use the 'kill' command to send signal to other user's processes. So it can only be verified on rooted devices.
Flags: needinfo?(jolin)
Hi :Bogdan,
Any thoughts on comment #20?
Hello,

 Sorry for the late response, I've been out of office. Was and still am trying to root a device in order to verify this, I will give it another try this Thursday after we finish with this Beta release. If I am not able to successfully root a device we will not be able to properly verify this. I will get back to you this Thursday with a response.

Thank you for your patience.
Hello,

 I was not able to rood a device but I have tried the steps in Comment 16 on a OnePlus One (Android 6.0.1) rooted device from a colleague of mine and I kept getting the "Operation not permitted" error as on a non rooted device. I did check the root access permissions for both apps and ADB in the developer option menu.
(Assignee)

Comment 24

a year ago
(In reply to Bogdan Surd, QA [:BogdanS] from comment #23)
> Hello,
> 
>  I was not able to rood a device but I have tried the steps in Comment 16 on
> a OnePlus One (Android 6.0.1) rooted device from a colleague of mine and I
> kept getting the "Operation not permitted" error as on a non rooted device.
> I did check the root access permissions for both apps and ADB in the
> developer option menu.

Hi Bodgan, what's the last char of 'adb shell' prompt? If it's '$', please try the following steps:
1. 'adb root'. It should say 'restarting adbd as root' or 'adbd is already running as root'
2. 'adb shell'. It should start a shell prompt
3. 'su' in the shell prompt. If the prompt changes to '#' then you can use the 'kill' command. If it says something like '/system/bin/sh: su: not found' or other error, then I'm sorry that you need another device. :(
Hello,

 I am sorry to inform you that we are not able to verify this, we don't have any other device that is rooted and we were unable to properly root a device.
Flags: needinfo?(bogdan.surd)
Comment on attachment 8857388 [details]
Bug 1343541 - restore signal actions changed by Android debugger.

Suppress the crash dialog to avoid confusion. Beta 54+. Should be in 54 beta 6.
Attachment #8857388 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 27

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/71ee4cad7943
status-firefox54: affected → fixed
Hello,

 I've managed to successfully root a device and verified this in both nightly and beta. After killing the process the video would stop and resume from where it left off. No other dialog box would open. 
 Sorry for not being able to verify this faster.
status-firefox54: fixed → verified
status-firefox55: fixed → verified
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.