Closed Bug 1436811 Opened 6 years ago Closed 6 years ago

Baldr: reset stack pointer in signal handler on Trap::StackOverflow

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1428453

People

(Reporter: jolesen, Unassigned)

Details

Currently, the over-recursion stack overflow check for wasm functions will jump to out-of-line code after each function that resets the stack pointer before jumping to the trap handler. This is to make sure the trap handler has enough stack space to run.

Now that we have frame pointers everywhere, this functionality can be moved to the signal handler:

prologue:
    push TLSReg
    push %rbp
    mov %rsp, %rbp
    sub #frame, %rsp
stack_check:
    cmp %rsp, limit(TLSReg)
    jb good
    ud2             ; TrapSite(StackOverflow)
good:
    ...


SIGILL signal handler:
    When detecting a StackOverflow trap, pop the last frame that failed to allocate by using the frame pointer:
    %rsp = %rbp + 3*8   ; return address, TLS pointer, frame pointer.
    %rbp = *%rbp

This saves precious executable memory since the stack overflow trap no longer needs its own trap stub after each function.

This would also be a first step towards using stack guard pages instead of explicit stack overflow checking: When we get a SIGSEGV from a stack guard page, the signal handler can pop enough frames to make space for the trap handler to run. This is possible with a frame pointer.
I think you're right, but the change I was planning (when converting StackOverflow from an OldTrap to new-style Trap) was to just have a ud2 that is jumped over by the stack overflow branch.  (New traps never have out-of-line paths.)  I got hung up on converting the bad-indirect-call-signature trap (because it's within the prologue) but this stack overflow trap should be easy to convert from old to new so I can do that soon if it helps.
The question is where to we rewind the stack pointer after it overflowed. If the function failed to allocate a MB of frame, the trap handler won't be able to run without rewinding the sp.
Oh sorry, I missed that.  So on linux, we use SA_ONSTACK which means we're save to run the signal handler.  On OSX, we simply use a separate thread which waits on debug ports.  On Windows, we apparently do nothing today, but I just discovered SetThreadStackGuarantee() which could help (seems like breakpad should be setting that regardless...).

Regardless, though, I worry that we'll run into OS problems if SP shoots past the end-of-stack when we ud2.  Actually, that seems like a vague concern today if a random (e.g., profiling) signal interrupts PC after the increment and before the stack-overflow check.

It seems like what we should do is:
 - if the framePushed is small (<100 bytes or so), don't bother fixing up sp; we have a large amount of space before the actual end of native stack
 - if the framePushed is large, do the more expensive `(tls->stackLimit - sp) > framePushed)` test *before* incrementing sp, and fault if this fails
Thus, there is never any fixup and probably the large-framePushed case overhead is amortized by the actual call.

Thoughts?
Also, once we use ud2 and remove the ool path, it seems good to move this into GenerateFunctionPrologue() (passing in a new 'needsOverflowCheck' bool).
(In reply to Luke Wagner [:luke] from comment #3)
> Oh sorry, I missed that.  So on linux, we use SA_ONSTACK which means we're
> save to run the signal handler.  On OSX, we simply use a separate thread
> which waits on debug ports.  On Windows, we apparently do nothing today, but
> I just discovered SetThreadStackGuarantee() which could help (seems like
> breakpad should be setting that regardless...).

Ooh, interesting. Breakpad uses SetUnhandledExceptionFilter() [1] to forward unhandled exceptions to a dedicated thread, so it should be safe (like OSX), but our vectored exception handlers are a different story. SetThreadStackGuarantee() only applies to the calling thread so we should probably set it on every thread that could trigger an exception we want to handle.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms680634.aspx
(In reply to Luke Wagner [:luke] from comment #3)
I think testing `(tls->stackLimit - sp) > framePushed)` in large frames is a good idea. This also avoids the potential integer overflow when a huge frame is larger than the numerical value of the stack pointer so the stack pointer adjustment would wrap around. (I don't know if any OS places stacks that close to the null page).

Moving the whole thing into GenerateFunctionPrologue() would certainly make things easier for Cretonne, where modeling this cleanly is tricky.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #5)
Yeah.  On IRC, Ted said we are known to handle stack overflows on Windows.  I guess maybe it's usually "ok" or there is a default value that is big enough?

(In reply to Jakob Stoklund Olesen [:jolesen] from comment #6)
Ok, great.  I can do this (and bump it ahead of other patches in my priority queue) or you can if you'd like.
(In reply to Luke Wagner [:luke] from comment #7)
> Ok, great.  I can do this (and bump it ahead of other patches in my priority
> queue) or you can if you'd like.

I'll implement my original plan [1] which is essentially the small-frame case handled Cretonne-side. This unblocks me and other users of Cretonne. Then I can simply stop using Cretonne's stack_check instruction once GenerateFunctionPrologue() handles the big frames.

[1] https://github.com/stoklund/cretonne/issues/234
To keep the patch sequence contained, I put the patch in bug 1428453.  Should be able to land soon.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.