Closed Bug 944612 Opened 11 years ago Closed 10 years ago

Odinmonkey (ARM): Intermittent SIGSEGV crashes, ElfLoader sigaction wrapper drops install of asm.js handler when it gives up installing its own handler

Categories

(Core :: JavaScript Engine: JIT, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: dougc, Assigned: dougc)

References

Details

Attachments

(1 file, 3 obsolete files)

There are intermittent crashes running asm.js code on ARM/Android.  Bug 886736 added code to give up on installing the ElfLoader SIGSEGV handler if it detected slow handling.  The detection algorithm can be confused by system delays, is more likely to note slow handling on slower ARMv6 devices, but this is also reproducible on a Nexus 4, explaining the intermittent nature of the crashes.  The ElfLoader sigaction wrapper is always installed, and it catches the asm.js SIGSEGV handler installation, saving the handler, but when the ElfLoader handler is not also installed an asm.js related fault does not call the asm.js handler and leads to a crash.

A proof of concept workaround is to delay installation of the wrapper until the ElfLoader SIGSEGV handler has been installed.

Bug 874708 comment 12 notes that some system libraries redefine the handler within the scope of some of their code.  This might be mitigated in part by retaining the wrapper and ElfLoader handler, for the purpose of ensuring that the sigaction flags are compatible with the asm.js handler.  However in the multi-threaded code this would still be unreliable, so perhaps a list of problematic systems is needed for which asm.js compilation would be disabled.  Might the wrapper be retained and extended to report on unexpected attempts to redefine the SIGSEGV handler to help build this list?

This cause of crashes might account for some of the crashes reported in bug 867228, but only explains Android/ElfLoader specific crashes.
Sorry, just fixing the patch description.
Attachment #8340252 - Attachment is obsolete: true
Blocks: 864220
Blocks: 867228
Comment on attachment 8340253 [details] [diff] [review]
Avoid installation of the ElfLoader sigaction wrapper when it gives up installing its SIGSEGV handler

This patch appears to also resolve bug 980498.  The problem is causing intermittent crashes.  Feedback welcomed?
Attachment #8340253 - Flags: feedback?(mh+mozilla)
Comment on attachment 8340253 [details] [diff] [review]
Avoid installation of the ElfLoader sigaction wrapper when it gives up installing its SIGSEGV handler

Review of attachment 8340253 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/ElfLoader.cpp
@@ +847,5 @@
>    dbg->r_state = r_debug::RT_CONSISTENT;
>    dbg->r_brk();
>  }
>  
> +#if defined(ANDROID) && (defined(__i386__) || defined(__arm__))

This would break non android builds. Please leave this alone, especially the endif. It's also unrelated to the bug.

@@ +881,5 @@
>  
>  /* Replace the first instructions of the given function with a jump
>   * to the given new function. */
>  template <typename T>
> +static void

Whether diversion worked or not is still something that should be taken into account, even if it's delayed, so keep it a bool.

@@ +1026,5 @@
>    action.sa_flags = SA_SIGINFO | SA_NODEFER | SA_ONSTACK;
>    registeredHandler = !sys_sigaction(SIGSEGV, &action, nullptr);
> +
> +  if (registeredHandler)
> +    Divert(sigaction, __wrap_sigaction);

The diversion should be done before setting the handler, and the loader should be disabled when the diversion doesn't work.

All in all, a simpler fix is to change __wrap_sigaction to test registeredHandler.
Attachment #8340253 - Flags: feedback?(mh+mozilla) → feedback-
Blocks: 980498
Thank you for the feedback.  The patch has been reworked as suggested.

There are a few other changes to the initialization code to better restore the prior handler in some of the bail out paths - happy to remove these?

The original patch was based on the result of Divert being determined at compile time.  I guess you must expect it might be a dynamic result in future?

It's a simple patch that corrects crashes so would it be appropriate to request it be uplifted after it has been proven in nightly?

Geoff: it would be interesting to confirm if this also fixes bug 980498?
Assignee: nobody → dtc-moz
Attachment #8340253 - Attachment is obsolete: true
Attachment #8393430 - Flags: review?(mh+mozilla)
Attachment #8393430 - Flags: feedback?(gbrown)
Comment on attachment 8393430 [details] [diff] [review]
Guard the ElfLoader sigaction wrapper against the SIGSEGV handler not being installed.

Review of attachment 8393430 [details] [diff] [review]:
-----------------------------------------------------------------

This patch also fixes bug 980498 (I verified .dmp files are created on SIGSEGV, in the Android 2.3 emulator).
Attachment #8393430 - Flags: feedback?(gbrown) → feedback+
Comment on attachment 8393430 [details] [diff] [review]
Guard the ElfLoader sigaction wrapper against the SIGSEGV handler not being installed.

Review of attachment 8393430 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/ElfLoader.cpp
@@ +998,5 @@
>                                      PROT_READ | PROT_WRITE,
>                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
> +  if (stackPtr.get() == MAP_FAILED) {
> +    /* Attempt to restore the original segfault signal handler. */
> +    sys_sigaction(SIGSEGV, &this->action, nullptr);

Should just move stackPtr.Assign and this test above sys_sigaction. This would avoid having to restore at all.

@@ +1008,5 @@
>    mprotect(stackPtr, stackPtr.GetLength(), PROT_NONE);
>    data->crash_int = 123;
>    stackPtr.Assign(MAP_FAILED, 0);
> +  /* Restore the original segfault signal handler. */
> +  sys_sigaction(SIGSEGV, &this->action, nullptr);

I guess you moved this for safety. Then you should move it above stackPtr.Assign(MAP_FAILED, 0);
Attachment #8393430 - Flags: review?(mh+mozilla) → feedback+
Thank you again for the feedback, these were good suggestions and have been applied.

gbrown: Thank you for testing.
Attachment #8393430 - Attachment is obsolete: true
Attachment #8393906 - Flags: review?(mh+mozilla)
Attachment #8393906 - Flags: review?(mh+mozilla) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5e6978476efa
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: