Do not use raise in UnixExceptionHandler

RESOLVED FIXED in Firefox 52

Status

()

Core
JavaScript Engine
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: nbp, Assigned: ehoogeveen)

Tracking

unspecified
mozilla52
Points:
---

Firefox Tracking Flags

(firefox52 fixed)

Details

Attachments

(1 attachment)

15:21:57 from decoder:
> did we change anything about how signals are handled in the JS engine
> i keep seeing these stack frames now for many crashes or assertion failures:
> # 00    raise
> # 01    js::UnixExceptionHandler
> # 02    AsmJSFaultHandler<(Signal)0>
> # 03    <signal handler called>
> + the original stack then
> and i ve never seen that before

From the source of both SEGV handler that we have in SpiderMonkey, I can notice that we use different mechanism for forwarding the issue.  Apparently "raise" [1] causes the SEGV-handler to remain on the stack, as opposed to the sigaction [2] call made in the bound-check handler used by Wasm/Asm.js.

[1] http://searchfox.org/mozilla-central/rev/e8dd21b1673c07307068303ab3e99360c78f4d12/js/src/ds/MemoryProtectionExceptionHandler.cpp#253
[2] http://searchfox.org/mozilla-central/rev/e8dd21b1673c07307068303ab3e99360c78f4d12/js/src/asmjs/WasmSignalHandlers.cpp#1197
Yeah, I'm not sure why raise() is used instead of what WasmSignalHandlers.cpp is doing.

Unrelated, I noticed that sHandlingException won't have effect unless SA_NODEFER is passed to sigaction().

Also, didn't we discuss making this feature #ifdef NIGHTLY?
(Assignee)

Comment 2

2 years ago
I did this because "It is not safe to return normally from the handler for a program error signal, because the behavior of the program when the handler function returns is not defined after a program error." according to [1], though I don't know what the POSIX standard has to say about it. If returning is the right thing to do then I'll prepare a patch!

> Unrelated, I noticed that sHandlingException won't have effect unless SA_NODEFER is passed to sigaction().

Yeah, that's why I didn't guard against reentrance in the initial patch, but I wasn't sure about it.

> Also, didn't we discuss making this feature #ifdef NIGHTLY?

We did, but I thought the decision was that it wasn't necessary. Sorry if I misinterpreted that!

(unfortunately, or perhaps fortunately the crash we were tracking seems to have been resolved by unrelated work, so now all that code doesn't seem to be gaining us anything in practice :( )

[1] https://www.gnu.org/software/libc/manual/html_node/Handler-Returns.html
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)
> I did this because "It is not safe to return normally from the handler for a
> program error signal, because the behavior of the program when the handler
> function returns is not defined after a program error." according to [1],
> though I don't know what the POSIX standard has to say about it. If
> returning is the right thing to do then I'll prepare a patch!

Yeah, you're right that it's UB according to the spec, but so are the asm.js fixups and PC redirection we're doing for OOB SIGSEGVs so I think we (and probably all debuggers and JVMs :) are already relying on what happens when you resume from a SIGSEGV.

> > Also, didn't we discuss making this feature #ifdef NIGHTLY?
> 
> We did, but I thought the decision was that it wasn't necessary. Sorry if I
> misinterpreted that!
> 
> (unfortunately, or perhaps fortunately the crash we were tracking seems to
> have been resolved by unrelated work, so now all that code doesn't seem to
> be gaining us anything in practice :( )

Oh, maybe I misunderstood.  Exciting to hear the bug might've been fixed!  Do you have the bug #?  Jan I suppose you're already considering this question of what to do here (release, #ifdef NIGHTLY, remove)?
Flags: needinfo?(jdemooij)
(Assignee)

Comment 4

2 years ago
(In reply to Luke Wagner [:luke] from comment #3)
> Oh, maybe I misunderstood.  Exciting to hear the bug might've been fixed! 
> Do you have the bug #?

Not a clue unfortunately - we can only infer that the bug is fixed by the fact that we haven't gotten any annotated crashes, and none of the memory exception crashes look related (they're usually memsets, not memcpys). It's still possible that the exception handler isn't even working, though that seems unlikely!

Regarding what to do next, we can 1) do nothing, 2) install the exception handler only on Nightly or 3) remove the code again (I'd rather not, but if no one is using it..). Orthogonal to that, we can switch AssemblerBuffer back to a normal mozilla::Vector (which will leave PageProtectingVector and the exception handlers with no users for now).
(Assignee)

Comment 5

2 years ago
Oh, maybe I misunderstood: the bug where we've been tracking the crashes is bug 1124397.
Did the drop in crashes coincide precisely with the landing of MemoryProtectionExceptionHandler?  Because if so, it seems possible we're simply losing those crash reports by breakpad not getting called.  A way to test that would be to disable MemoryProtectionExceptionHandler and see if the crashes come back.
(Assignee)

Comment 7

2 years ago
Created attachment 8799871 [details] [diff] [review]
Return from the unix signal handler rather than re-raising.

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #4)
> Regarding what to do next, we can 1) do nothing

To clarify, I didn't mean "don't fix this bug", I just meant not making the exception handler Nightly only :)

Luke, this replaces the raise call with a call to sigaction like the wasm signal handler uses. Technically we should have already restored the signal handler at this point (since it's the first thing we do when we receive a signal), but setting the same signal handler twice can't hurt. I also added SA_NODEFER to match the wasm signal handlers (and since we already check for reentry).

I'll leave the decision of whether to make the signal handlers Nightly only for another patch or bug.
Assignee: nobody → emanuel.hoogeveen
Status: NEW → ASSIGNED
Attachment #8799871 - Flags: review?(luke)
(Assignee)

Comment 8

2 years ago
(In reply to Luke Wagner [:luke] from comment #6)
> Did the drop in crashes coincide precisely with the landing of
> MemoryProtectionExceptionHandler?  Because if so, it seems possible we're
> simply losing those crash reports by breakpad not getting called.  A way to
> test that would be to disable MemoryProtectionExceptionHandler and see if
> the crashes come back.

It's a possibility but.. even on Windows? The exception handler is so simple there. I agree that we need to investigate - Jan says he'll look at the crashes again tomorrow.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #8)
Yeah, I was thinking the same thing (this is one of the few areas where I feel the Windows API got it right-er than POSIX); maybe it's just a coincidence.
Attachment #8799871 - Flags: review?(luke) → review+
(Assignee)

Comment 10

2 years ago
Thanks!

I confirmed locally that it still works, try looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b348cd341e3bd5b52a988111ff23f5964842a3d3
Keywords: checkin-needed

Comment 11

2 years ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/860edbf262f7
Return from the unix signal handler rather than re-raising. r=luke
Keywords: checkin-needed
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #7)
> Luke, this replaces the raise call with a call to sigaction like the wasm
> signal handler uses. Technically we should have already restored the signal
> handler at this point (since it's the first thing we do when we receive a
> signal), but setting the same signal handler twice can't hurt. I also added
> SA_NODEFER to match the wasm signal handlers (and since we already check for
> reentry).

That's one of the thing I did not understood from the current code.  It does not sounds safe from my point of view to do that if the UnixExceptionHandler is not the last SEGV handler registered.  So I wonder if we have risks of accidentally remove the AsmJS OOB handler.

Also, I did not understood the condition with the atomic which guards that we do not re-enter, as I did not found the location where we reset the flag to re-enter this SEGV handler a second time, if the SEGV is handled by one of the previous SEGV handler registered.
Flags: needinfo?(emanuel.hoogeveen)
(Assignee)

Comment 13

2 years ago
This signal handler is installed early during startup - after the crash reporter, but before the wasm signal handler. As a result it's always called *after* the wasm signal handler, and we don't expect to be able to recover from a SEGV at that point. The signal handler it restores should always be the crash reporter or the default signal handler.

That's why we make sure to only run it once - if we somehow recover and enter the exception handler again, someone is probably doing some pretty hefty debugging, so losing the annotation is no great loss.
Flags: needinfo?(emanuel.hoogeveen)
(Assignee)

Comment 14

2 years ago
As a heads up, we figured out that the crashes aren't getting annotated because *no* SpiderMonkey crashes get annotated: bug 1309573.

Comment 15

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/860edbf262f7
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox52: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52

Updated

2 years ago
Flags: needinfo?(jdemooij)
You need to log in before you can comment on or make changes to this bug.