Closed Bug 749226 Opened 12 years ago Closed 12 years ago

IonMonkey: Repeated bailouts can cause stack overflow

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jandem, Assigned: dvander)

References

Details

Attachments

(1 file, 5 obsolete files)

Currently after a bailout we call js::Interpret to finish the current frame. The interpreter can then enter Ion again (via OSR). This means that instead of a single frame, there can be multiple js::Interpret + Ion frames on the C stack.

With chunked compilation, we have to bailout every time we reach an uncompiled chunk. If the script is huge, this can quickly lead to recursion errors: every time we reach a new chunk and re-enter Ion we will push a new js::Interpret and Ion frame.
We want to ensure that every time we want to execute Ion code, we remove the existing C++ js::Interpret frame which is very large. Two ideas:

(1) We could make Interpret a static function and introduce a wrapper responsible for calling it. js::Interpret would return immediately if it needed to execute Ion code, and the wrapper would be responsible for entering Ion, then re-entering the interpreter after. The wrapper function would need JS_NEVER_INLINE.

(2) We could limit the problem to just bailouts by returning all the way out of ThunkToInterpreter as soon as we want to OSR. Then we'd push a lightweight frame in-between the bailout handler and new OSR code, so when the OSR code returns, we'd call back into ThunkToInterpreter.
(3) We could discard the EnterIon frames which are stack on top of each others each time you enter & bailout.  In such case ThunkToInterpreter should look if this is the last frame and update the enter-frames before returning to EnterIon and returning to the Interpreter.
(In reply to Nicolas B. Pierron [:pierron] from comment #2)
> (3) We could discard the EnterIon frames which are stack on top of each
> others each time you enter & bailout.  In such case ThunkToInterpreter
> should look if this is the last frame and update the enter-frames before
> returning to EnterIon and returning to the Interpreter.

Could you explain this in more detail? Does it reduce Interpret activations?
(In reply to David Anderson [:dvander] from comment #3)
> (In reply to Nicolas B. Pierron [:pierron] from comment #2)
> > (3) We could discard the EnterIon frames which are stack on top of each
> > others each time you enter & bailout.  In such case ThunkToInterpreter
> > should look if this is the last frame and update the enter-frames before
> > returning to EnterIon and returning to the Interpreter.
> 
> Could you explain this in more detail? Does it reduce Interpret activations?

When we bailout from the only JSFrame remaining on top of EnterIon, we can detect in ThunkToInterpreter that the IonActivation is now empty and that we will no longer return any JIT-code except for returning directly to the interpreter which lead us to call EnterIon.

The idea is to get rid of ( Interpreter | EnterIon | Trampoline | ThunkToInterpreter ) and update the parent StackFrame with the information recovered by the bailout.  Then we can continue in the parent interpreter.

This does not solve the problem of recursive functions, but in such case we expect to consume stack space.
Okay - so, this would only work when the entry Ion JS frame bails?
(In reply to David Anderson [:dvander] from comment #5)
> Okay - so, this would only work when the entry Ion JS frame bails?

Yes.

I think this can solve the current issue if there is only one function which is not calling it-self.
(1) doesn't work - we still blow out of the stack, it just allows 10X more frames. We really need to escape out of the inner Interpret activations. I'll start working on (2) now.
Assignee: general → dvander
Status: NEW → ASSIGNED
Current plan of attack: Interpret() + JSINTERP_BAILOUT will immediately exit with a special code upon OSR. ThunkToInterpret will immediately exit with this code as well. Then:

 (1) Pop OSR fp (might need to plop it on the C stack, too)
 (2) Push a lightweight IonFrame_Bridge
 (3) Perform OSR with the old fp

A few tricky cases to handle:
 (1) StackIter will need to learn about bridge frames so it can read intervening JSStackFrames
 (2) When we remove our own frame due to bailout, we must remove a parent bridge frame as well.

Since this will involve a significant amount of assembly changing in the bailout handler, I'm hoping to make most of that code shared since it's basically identical on every platform.

In theory this all should minimize stack explosion and be flexible enough to solve other problems as well (like ion->interp->ion calls, or intermixing with frames from future JITs).
Attached patch WIP v1 (obsolete) — Splinter Review
hugely hacky proof of concept, performs OSR inline in a bailout (and then crashes later). needs work to make IonFrameIterator work across these new OSR frames, and for bailouts inside inlined OSR frames to not push additional OSR frames.
Attached patch WIP v2 (obsolete) — Splinter Review
Passes *an* test. It turns out that we can't just remove forbidOsr_ for testing because that leads to infinite loops (Nicolas has hot-fixed this in another bug - here I make forbidOsr a countdown mechanism for testing).

Now we introduce a new IonFrame_Osr when we OSR from within a bailout. Bailouts always look for this frame before calling ThunkToInterpreter, and remove it if it's there.

The last step is making StackIter recognize IonFrame_Osr and correctly handling the interleaving of interp/Ion frames.
Attachment #620921 - Attachment is obsolete: true
Comment on attachment 620938 [details] [diff] [review]
WIP v2

Review of attachment 620938 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/Bailouts.cpp
@@ +204,5 @@
> +    do {
> +        ++iter;
> +        if (iter.isScripted())
> +            return false;
> +    } while (!iter.done());

++iter;
for(; !iter.done(); ++iter) {
    if (iter.isScripted())
        return false;
}

I am not sure we want to use anything else after !done(), even if isScripted is working.

::: js/src/ion/IonFrames.cpp
@@ +498,5 @@
>      ++it;
> +
> +    // If the top frame is not scripted, then we've re-used a frame from
> +    // Interpret() after bailing out in that initial frame.
> +    if (!it.isScripted()) {

This can be a rectifier frame followed by a VM call.

function g(a) { print(a) }
function f() { g() }

This add extra complexity for the iteration process, would be good to find a way to make it simple, like changing the done function to return true on OSR and on Entry frames.

By the way, I don't understand in what cases this can happen if the OSR is supposed to be an equivalent of the Entry frame.
Attached patch WIP v3 (obsolete) — Splinter Review
Down to a few test failures. This patch is reduced in scope, and now only supports the easy case (EnterIon with a, bailout at a, OSR at a). Will test with chunked compilation tomorrow.

IonFrame_Osr still exists, since it makes pushing a callee-token relatively uncomplicated.
Attachment #620938 - Attachment is obsolete: true
> > +
> > +    // If the top frame is not scripted, then we've re-used a frame from
> > +    // Interpret() after bailing out in that initial frame.
> > +    if (!it.isScripted()) {
> 
> This can be a rectifier frame followed by a VM call.
> 
> function g(a) { print(a) }
> function f() { g() }
> 

A rectifier frame can't be followed by a VM call (I think), it will immediately go to a scripted call first. If it were possible - that would mean the existing code is broken?
(In reply to David Anderson [:dvander] from comment #13)
> > > +
> > > +    // If the top frame is not scripted, then we've re-used a frame from
> > > +    // Interpret() after bailing out in that initial frame.
> > > +    if (!it.isScripted()) {
> > 
> > This can be a rectifier frame followed by a VM call.
> > 
> > function g(a) { print(a) }
> > function f() { g() }
> > 
> 
> A rectifier frame can't be followed by a VM call (I think), it will
> immediately go to a scripted call first. If it were possible - that would
> mean the existing code is broken?

You are right, I mess my mind between bailouts and VM calls.  GetPCScript is only supposed to be called under a VM call in which case the stack contains:

Rectifier
JS Frame
VM call
C++ call

The iterator assume it starts at an Exit frame registered by masm.linkExitFrame.

About the OSR frame, I would prefer if the OSR frame could be an Entry-like frame, for example:

Entry
JS Frame
Bailout Exit Frame
OSR Frame
JS Frame

At the moment this is the simplest solution because it does not involve replacement of any existing frames.  I won't go for a modification of existing frames before we changed the way we iterate over ion frames and use a shift between frame pointers only (no sizeof(frame) involved).

This will consume stack space but would be a good first step for this issue.
Attached patch passes tests (obsolete) — Splinter Review
Cleans up previous nits. Implements a combination of ideas (2) and (3), re-using the existing |fp| in a bailout if possible, but shortcutting out of the interpreter rather than EnterIon. 

re: OSR acting like an entry frame, that's sort of what it does. It's like a cross between a JS frame (it has no argv) and an entry frame. But right now it's just a way to easily push the callee token. If chunked compilation needs the full, more advanced fix, the osr frame will probably end up more like an entry frame.
Attachment #621905 - Attachment is obsolete: true
Attachment #622193 - Flags: review?(nicolas.b.pierron)
Comment on attachment 622193 [details] [diff] [review]
passes tests

Review of attachment 622193 [details] [diff] [review]:
-----------------------------------------------------------------

This solution is a mix of (2) & (3), it seems to me that (2) and (3) are 2 different problems (different stack layout).  Using an OSR frame after the bailout of the entry frame (by re-using the entry frame) cause an invariant failure (fp.runningInIon() =≠=> the top frame is an Ion frame) which implies modification of GetPcScript.

(2) is about not re-entering the Ion frames with EnterIon if the Interpreter is just on top of an Ion frame (called after a bailout).
(3) is about removing the EnterIon calls from the stack when a bailout of the entry frame happens, then re-using the interpreter loop and fp which called EnterIon.

The current implementation is about not re-entering Ion frames with EnterIon if the Interpreter is just on top of an Ion/Entry frame (called after a bailout), while re-using the Entry frame if possible.

note: Currently fp.runningInIon() does not imply that the we have an IonFrame (except for the top-frame), which explain why StackIter has a loop to check if we are done or not.  Solving (3) alone should ensure that we can remove the loop and the  “if (ionFrames_.done()) { … }”  from StackIter.

::: js/src/ion/Bailouts.cpp
@@ +231,5 @@
>          return BAILOUT_RETURN_FATAL_ERROR;
>      activation->setBailout(br);
>  
>      StackFrame *fp;
> +    if (IsInitialFrame(in.fp())) {

Add Spew in each branch, such as we can relate bugs to this choice which is a huge pivot in the way the stack is manipulated.

::: js/src/ion/IonFrameIterator.h
@@ +71,4 @@
>      IonFrame_Bailed_Rectifier,
> +
> +    // An exit frame is necessary for transitioning from a JS frame into C++.
> +    IonFrame_Exit,

// It is never created in a frame descriptor but is updated by linkExitFrame macro assembler function.

::: js/src/ion/IonFrames.cpp
@@ +496,5 @@
>      ++it;
> +
> +    // If the top frame is not scripted, then we've re-used a frame from
> +    // Interpret() after bailing out in that initial frame.
> +    if (!it.isScripted()) {

Replace this test by JS_ASSERT(it.isScripted()).  This case should never happen.

::: js/src/ion/IonSpewer.h
@@ +162,5 @@
>  
>  #endif /* DEBUG */
>  
> +template <IonSpewChannel Channel>
> +class AutoDisableSpew

nice.

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +289,4 @@
>  
> +        // Test for an exception
> +        masm.as_cmp(r0, Imm8(0));
> +        masm.ma_b(&exception, Assembler::Zero);

masm.tst ?

@@ +292,5 @@
> +        masm.ma_b(&exception, Assembler::Zero);
> +
> +        // Test for OSR
> +        masm.as_cmp(r0, Imm8(Interpret_OSR));
> +        masm.ma_b(&osr, Assembler::Zero);

Assembler::Equal

@@ +295,5 @@
> +        masm.as_cmp(r0, Imm8(Interpret_OSR));
> +        masm.ma_b(&osr, Assembler::Zero);
> +
> +        // Return to the caller
> +        masm.ma_pop(pc);

masm.ret()

::: js/src/vm/Stack.cpp
@@ +1176,5 @@
>                  continue;
>              }
>  
>  #ifdef JS_ION
> +            while (fp_->runningInIon()) {

knowing that fp_ is not changed, this is confusing.

@@ +1182,5 @@
>  
>                  while (!ionFrames_.done() && !ionFrames_.isScripted())
>                      ++ionFrames_;
>  
>                  if (ionFrames_.done()) {

invert the condition and exchange the else-then parts and remove the break & the while.
Attachment #622193 - Flags: review?(nicolas.b.pierron)
> This solution is a mix of (2) & (3), it seems to me that (2) and (3) are 2
> different problems

I don't think they're separate problems. This patch is a general solution: the bailout process must be iterative, not recursive. However I think it's worth pursuing (3) in addition to the general solution, to avoid making the runningInIon invariants more confusing. With the patch as-is, runningInIon doesn't guarantee a scripted Ion frame, which is a little annoying.
Attached patch poc leaving EnterIon (obsolete) — Splinter Review
proof-of-concept, untested, needs a little cleanup. i'm not yet convinced about adding this approach, it's a little hairy to propagate up this reverse-OSR flag and the only benefit is avoiding one-line complexity in StackIter.
Comment on attachment 622307 [details] [diff] [review]
poc leaving EnterIon

Review of attachment 622307 [details] [diff] [review]:
-----------------------------------------------------------------

To be honest this look like a really small addition compared to your previous patch, and it should restore the invariant that fp->runningInIon implies that there is an active Ion frame on the C stack.

::: js/src/jsinterp.cpp
@@ +474,5 @@
>              return false;
> +        if (status == ion::Method_Compiled) {
> +            InterpretStatus status = ion::Cannon(cx, fp);
> +            if (status == Interpret_OSR)
> +                return Interpret(cx, fp);

This will stack Interpreter frame on top of each other unless the compiler is removing the tail call here by a goto to the beginning of the function.

::: js/src/jsinterp.h
@@ +238,5 @@
> +enum InterpretStatus
> +{
> +    Interpret_Error    = 0, /* interpreter had an error */
> +    Interpret_Ok       = 1, /* interpreter executed successfully */
> +    Interpret_OSR      = 2  /* when mode=BAILOUT and we should OSR into Ion */

I can suggest (if you want) to have 2 Interpret_OSR, this is will avoid confusing interpretation mixing (2) and (3).  You can even give them the same value to avoid additional check as they are not supposed to overlap.

Interpret_OSR_Ion2Int = 2, /* When we bailout the entry frame we should OSR into the Interpreter */
Interpret_OSR_Int2Ion = 2  /* when mode=BAILOUT and we should OSR into Ion */

::: js/src/vm/Stack.cpp
@@ +684,3 @@
>       */
>      FrameRegs *regs = cx->maybeRegs();
> +    if (regs && report != DONT_REPORT_ERROR) {

This modification should no longer be necessary.
Comment on attachment 622307 [details] [diff] [review]
poc leaving EnterIon

Review of attachment 622307 [details] [diff] [review]:
-----------------------------------------------------------------

Oops, forgot to comment this …

::: js/src/vm/Stack.cpp
@@ +1187,5 @@
>  
>                  while (!ionFrames_.done() && !ionFrames_.isScripted())
>                      ++ionFrames_;
>  
>                  if (ionFrames_.done()) {

You can remove this if statement now and the while statement above it, because we now have the garantee that there is an Ion frame if fp->runningInIon() is set.
Attached patch v2Splinter Review
I kept the DONT_REPORT_ERROR test since I think it's a little safer.

re: Interpret(cx, fp) - this is actually correct (it bit me, too). The entry frame is the stopping point, not the starting point - the interpreter always begins interpreting from regs.fp().
Attachment #622193 - Attachment is obsolete: true
Attachment #622307 - Attachment is obsolete: true
Attachment #622606 - Flags: review?(nicolas.b.pierron)
Comment on attachment 622606 [details] [diff] [review]
v2

Marty for ARM changes
Attachment #622606 - Flags: review?(mrosenberg)
Comment on attachment 622606 [details] [diff] [review]
v2

Review of attachment 622606 [details] [diff] [review]:
-----------------------------------------------------------------

This patch handle 2 ways to remove interpreter/ion frames interleaving.

1/ Called (2) in this bug.  This case happens when the bailed out frame is not the initial frame which entered IonCode.  A new frame is created and the interpreter is called (as done before), except that the interpreter has been instrumented to return to discard the interpreter stack frame, and start an ion frame instead.  This avoid interleaving of Interpreter and Ion frames when we still have active Ion frames.

2/ Called (3) in this bug.  This case happens when the bailed out frame is the initial frame which entered IonCode.  This means the frame before it is the Entry frame or a rectifier frame followed by an entry frame.  So instead of creating a new frame, the bailout reuse the entry frame to store properties of the bailed ion frame and return to the interpreter with a magic value which is then interpreted in the interpreter to resume its execution.

These 2 ways enforce that fp->runningInIon() there is at least one active JS frame in the activation corresponding to the current fp.  In addition, it return as soon as possible which avoid “zombie” (dead) interpreter/ion stack frames which goal are only to return the result of the next function.

::: js/src/ion/Bailouts.cpp
@@ +231,5 @@
>          return BAILOUT_RETURN_FATAL_ERROR;
>      activation->setBailout(br);
>  
>      StackFrame *fp;
> +    if (IsInitialFrame(in.fp())) {

nit: Add spew in each branch.

@@ +508,5 @@
>          pc += GET_JUMP_OFFSET(pc);
>      if (JSOp(*pc) == JSOP_LOOPENTRY)
>          cx->regs().pc = GetNextPc(pc);
>  
> +    if (activation->entryfp() == br->entryfp()) {

nit: or Add spew here.

::: js/src/jsinterp.cpp
@@ +1845,5 @@
>          if (status == ion::Method_Compiled) {
> +            // Return back to Ion to perform inline OSR, but only if this frame
> +            // is the entry fp to EnterIon.
> +            if (interpMode == JSINTERP_BAILOUT && regs.fp()->runningInIon()) {
> +                JS_ASSERT(entryFrame == regs.fp());

Are you sure you are dealing with the previous frames and not the current frame ?  It seems like the condition of this branch cannot be accomplished since the current interpreter frame is no longer running in ion because we bailout with a new frame.  You might want to check if the previous frame is running in ion instead.

@@ +1847,5 @@
> +            // is the entry fp to EnterIon.
> +            if (interpMode == JSINTERP_BAILOUT && regs.fp()->runningInIon()) {
> +                JS_ASSERT(entryFrame == regs.fp());
> +                return Interpret_OSR;
> +            }

Would it be possible to have the same for the other entry point in IonCode ?

::: js/src/vm/Stack.cpp
@@ +1184,5 @@
>  #ifdef JS_ION
>              if (fp_->runningInIon()) {
>                  ionFrames_ = ion::IonFrameIterator(ionActivations_);
>  
>                  while (!ionFrames_.done() && !ionFrames_.isScripted())

nit: You can remove the  “!ionFrames_.done() && ”  part of the condition.  Which should assert that your are not done when calling isScripted.

::: testing/testsuite-targets.mk
@@ +228,5 @@
>  jstestbrowser:
>  	$(MAKE) -C $(DEPTH)/config
>  	$(MAKE) -C $(DEPTH)/js/src/config
>  	$(MAKE) stage-jstests
> +	$(call RUN_REFTEST,$(DIST)/$(TESTS_PATH)/jstests.list --extra-profile-file=$(DIST)/$(TESTS_PATH)/user.js)

Related ?
Attachment #622606 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 622606 [details] [diff] [review]
v2

Review of attachment 622606 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/arm/Trampoline-arm.cpp
@@ +292,5 @@
> +        masm.ma_b(&exception, Assembler::Zero);
> +        masm.ma_pop(pc);
> +    }
> +
> +    masm.bind(&osr);

Did you want to have a branch to this label from somewhere?
Attachment #622606 - Flags: review?(mrosenberg) → review+
> > +            if (interpMode == JSINTERP_BAILOUT && regs.fp()->runningInIon()) {
> > +                JS_ASSERT(entryFrame == regs.fp());
> 
> Are you sure you are dealing with the previous frames and not the current
> frame ?  It seems like the condition of this branch cannot be accomplished

With the special cases added elsewhere, this is now unreachable. Since chunked compilation is postponed I'm not going to pursue (yet) the generic fix. I'm landing this patch somewhat incomplete because it does fix known issues with gczeal and --ion-eager.

http://hg.mozilla.org/projects/ionmonkey/rev/d3593b87e0af
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.