Closed Bug 775818 Opened 12 years ago Closed 12 years ago

IonMonkey: Add JM inline cache for calls to Ion code

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dvander, Assigned: dvander)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [ion:p1:fx18])

Attachments

(2 files, 6 obsolete files)

We've been trying to balance heuristics for when a function should be compiled in Ion versus JM. The problem is, an inner function might become hot faster than a function which calls it, and transitioning from JM to Ion is extremely expensive.

So far the heuristics have been hit or miss. By making calls from JM to Ion fast, we should be able to relax them. 

Disabling the heuristics, disabling inlining, and writing a microbenchmark, I get the following times:

JM->JM: 89ms
JM->IM: 289ms
IM->IM: 145ms

And then for posterity, some v8 measurements:
CS->CS: 120ms
V8->CS: 110ms

(I suspect Ion to Ion calls are slow because we need two loads and a guard, and also push more stack words than other JITs. It's not clear how much that matters on benchmarks though.)

In this bug, I would like to create an IonActivation as part of EnterMethodJIT, then add a new trampoline that we can call via a new call IC from JM. If that's not fast enough, we can start shaving instructions off that path (which should not be too difficult).
Attached patch WIP v0 (obsolete) — Splinter Review
With this, our JM->Ion time is identical to Ion->Ion time, which is kind of surprising. 2X might be good enough. I'll clean up the patch so it passes tests and try SunSpider soon.

(Note, this patch changes two heuristics: we can call into Ion functions when the caller is not Ion-compilable, and the call count heuristic is dropped.)
Attached patch WIP v1 (obsolete) — Splinter Review
Handles NULL IonScripts by chaining an Ion stub into a generic call stub. The reverse seems a little harder so I'll hold off on that until we know we need it.
Attachment #644580 - Attachment is obsolete: true
Attached patch WIP v2 (obsolete) — Splinter Review
Bailouts are working now... this adds a new join point in the call IC path, so Ion can completely avoid pushing a StackFrame. We now flag the existing frame so that StackIter knows to find an IonActivation, and store the outgoing prevpc into the IonActivation.

TODO: error handling, recompilation handling, cross-platform support.
Attachment #645095 - Attachment is obsolete: true
Attached patch WIP v3 (obsolete) — Splinter Review
Exception handling, fix a bunch of test failures (forgot a call to setupFallibleABICall).
Attachment #645163 - Attachment is obsolete: true
Attached patch WIP v4 (obsolete) — Splinter Review
x64 support, recompilation handling.
Attachment #645171 - Attachment is obsolete: true
Attached patch WIP v5 (obsolete) — Splinter Review
Okay this is starting to near completion... only 1-2 tests failures remaining (one is a crash in sunspider so I can't test that yet).

Baseline v8 results:
 Richards: 11374 
 DeltaBlue: 12880
 Crypto: 12879
 RayTrace: 7055
 EarleyBoyer: 12943
 RegExp: 1668
 Splay: 11531
 NavierStokes: 18513
 ----
 Score (version 7): 9408

W/ Patch, and removing the call count heuristic:
 Richards: 10919
 DeltaBlue: 12609
 Crypto: 11271
 RayTrace: 8701
 EarleyBoyer: 11975
 RegExp: 1661
 Splay: 11792
 NavierStokes: 18383
 ----
 Score (version 7): 9346

W/ patch, keeping the call count heuristic:
 Richards: 11060
 DeltaBlue: 12269
 Crypto: 12183
 RayTrace: 8983
 EarleyBoyer: 12076
 RegExp: 1647
 Splay: 11865
 NavierStokes: 18642
 ----
 Score (version 7): 9481
Attachment #645621 - Attachment is obsolete: true
Attached patch v0Splinter Review
This is about a 2ms win on SunSpider. math-partial-sums regresses 4ms because we run it more in Ion now, but we can get that back (bugs are on file).

Jan, could you review my JM changes? Note I don't have ARM support working yet, will do that tomorrow in a separate patch.
Attachment #645625 - Attachment is obsolete: true
Attachment #646492 - Flags: review?(jdemooij)
Comment on attachment 646492 [details] [diff] [review]
v0

Nicolas, could you review the stack changes outside JM?
Attachment #646492 - Flags: review?(nicolas.b.pierron)
Comment on attachment 646492 [details] [diff] [review]
v0

Review of attachment 646492 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, r=me on the JM changes with nits addressed.

::: js/src/methodjit/InvokeHelpers.cpp
@@ +261,5 @@
>      // is likely not worth the overhead.
>      if (!callee->hasAnalysis() || !callee->analysis()->hasLoops())
>          return true;
>  
> +    //// If we make a ton of JM -> Ion calls, use JM.

Nit: s/ //// / //

::: js/src/methodjit/MonoIC.cpp
@@ +554,5 @@
>      }
>  
> +    bool generateIonStub()
> +    {
> +        JS_ASSERT(!f.regs.inlined());

Why does this hold? Is there a check somewhere that's not part of the patch? Asking since we'd have to use f.pc() instead of f.regs.pc if f.regs.inlined() can be true.

@@ +581,5 @@
> +
> +        RegisterID t0 = regs.takeAnyReg().reg();
> +        RegisterID t1 = Registers::ClobberInCall;
> +
> +        masm.move(ImmPtr(f.regs.pc), t1);

You don't need t1 if you change the storePtr below to

masm.storePtr(ImmPtr(f.regs.pc), Address(t0, ion::IonActivation::offsetOfPrevPc()));

@@ +632,5 @@
> +        RegisterID stackPushed = regs.takeAnyReg().reg();
> +        masm.move(Imm32(staticPushed - sizeof(uintptr_t)), stackPushed);
> +
> +        /*
> +         * Check whether argc < nags, because if so, we need to pad the stack.

Nit: s/nags/nargs

@@ +656,5 @@
> +            Label loop = masm.label();
> +            masm.subPtr(Imm32(sizeof(Value)), Registers::StackPointer);
> +            masm.storeValue(UndefinedValue(), Address(Registers::StackPointer, 0));
> +            masm.sub32(Imm32(1), t0);
> +            Jump test = masm.branchTest32(Assembler::NonZero, t0, t0);

You can avoid the test with:

Jump test = masm.branchSub32(Assembler::NonZero, Imm32(1), t0);

@@ +675,5 @@
> +
> +            /* Copy the argument onto the native stack. */
> +#ifdef JS_NUNBOX32
> +            masm.load32(Address(t0, 0), t1);
> +            masm.store32(t1, Address(Registers::StackPointer, 0));

It would be good if we could get rid of the two sub instructions. There is a push(Address) method so the following should work, where x is a constant offset from t0 that's different for each argument:

masm.push(Address(t0, x));
masm.push(Address(t0, x + 4));

::: js/src/methodjit/MonoIC.h
@@ +232,5 @@
> +    }
> +    JSC::CodeLocationJump oolJump() {
> +        return slowPathStart.jumpAtOffset(oolJumpOffset);
> +    }
> +    JSC::CodeLocationJump lastOolJump() {

Nit: looks like this method is never called, can we remove it + the lastOolJump_ member?
Attachment #646492 - Flags: review?(jdemooij) → review+
Comment on attachment 646492 [details] [diff] [review]
v0

Review of attachment 646492 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, fix the StackIter and fix the few nits and check on ARM.

r=me once fixed.

::: js/src/ion/Bailouts.cpp
@@ +255,5 @@
>          // work).
> +        // 
> +        // Note: If the entry frame is a placeholder (a stub frame pushed for
> +        // JM -> Ion calls), then cannot re-use it as it does not have enough
> +        // slots.

nit: then >we< cannot ?

::: js/src/ion/IonFrames.cpp
@@ +334,5 @@
>  {
> +    while (activation_ && !activation_->entryfp()) {
> +        top_ = activation_->prevIonTop();
> +        activation_ = activation_->prev();
> +    }

nit: Create a small function skipEmptyActivation with a comment indicating where such frame can be created and call it from the constructors and from the operator++.

::: js/src/methodjit/InvokeHelpers.cpp
@@ +261,5 @@
>      // is likely not worth the overhead.
>      if (!callee->hasAnalysis() || !callee->analysis()->hasLoops())
>          return true;
>  
> +    //// If we make a ton of JM -> Ion calls, use JM.

jan: s,////,//, is a valid unambiguous sed expression ;)

::: js/src/methodjit/MonoIC.cpp
@@ +670,5 @@
> +        /* Copy all remaining arguments. */
> +        for (size_t i = 0; i < argc + 1; i++) {
> +            /* Bump the stack, and advance to the next argument. */
> +            masm.subPtr(Imm32(sizeof(Value)), Registers::StackPointer);
> +            masm.subPtr(Imm32(sizeof(Value)), t0);

comment: ok, to keep my idea clear, the JM (StackFrame) stack is growing in the opposite direction but the arguments are also in reverse order. right?

@@ +696,5 @@
> +
> +        /* Call into Ion. */
> +        masm.loadPtr(Address(ionScript, ion::IonScript::offsetOfMethod()), t0);
> +        masm.loadPtr(Address(t0, ion::IonCode::OffsetOfCode()), t0);
> +        masm.call(t0);

This won't work on ARM because the Ion masm.call is different than the JM masm.call.
You might have to do a

masm.push(pc)

before which as opposed of the common sense should work because when you push the pc it in fact push the pc + 8 which makes it good for a return address, ask Marty for confirmation.

::: js/src/vm/Stack.cpp
@@ +1490,5 @@
>              ++ionActivations_;
>              popFrame();
>              settleOnNewState();
> +        } else {
> +            JS_ASSERT(fp_->callingIntoIon());

You need to modify StackIter::settleOnNewState() to accept frame callingIntoIon.

nit: hum … I think this function is lacking some return statements.
Attachment #646492 - Flags: review?(nicolas.b.pierron) → review+
Whiteboard: [ion:p1:fx18]
(In reply to Jan de Mooij (:jandem) from comment #10)
> > +    bool generateIonStub()
> > +    {
> > +        JS_ASSERT(!f.regs.inlined());
> 
> Why does this hold? Is there a check somewhere that's not part of the patch?
> Asking since we'd have to use f.pc() instead of f.regs.pc if
> f.regs.inlined() can be true.

According to Brian, JM+TI does not use the Call IC during inlining.

> > +        masm.move(ImmPtr(f.regs.pc), t1);
> 
> You don't need t1 if you change the storePtr below to

Ah, I did this because it gets moved twice and on x64 that's two very large instructions.

> It would be good if we could get rid of the two sub instructions. There is a
> push(Address) method so the following should work, where x is a constant
> offset from t0 that's different for each argument:
> 
> masm.push(Address(t0, x));
> masm.push(Address(t0, x + 4));

Thanks, this is much cleaner.

> 
> ::: js/src/methodjit/MonoIC.h
> @@ +232,5 @@
> > +    }
> > +    JSC::CodeLocationJump oolJump() {
> > +        return slowPathStart.jumpAtOffset(oolJumpOffset);
> > +    }
> > +    JSC::CodeLocationJump lastOolJump() {
> 
> Nit: looks like this method is never called, can we remove it + the
> lastOolJump_ member?

Whoops, the fact that it is unused is a bug, I meant for generateFullCallStub to link to it.
(In reply to David Anderson [:dvander] from comment #12)
> (In reply to Jan de Mooij (:jandem) from comment #10)
> > > +    bool generateIonStub()
> > > +    {
> > > +        JS_ASSERT(!f.regs.inlined());
> > 
> > Why does this hold? Is there a check somewhere that's not part of the patch?
> > Asking since we'd have to use f.pc() instead of f.regs.pc if
> > f.regs.inlined() can be true.
> 
> According to Brian, JM+TI does not use the Call IC during inlining.

Not quite, JM+TI will use call ICs within inlined frames, but those ICs cannot generate native stubs.  Rather than bailing out, JM+TI just marks the innermost function as uninlineable (which will cause the calling script to be invalidated if it is currently inlined).  The relevant code, in generateNativeStub:

        /*
         * Native stubs are not generated for inline frames. The overhead of
         * bailing out from the IC is far greater than the time saved by
         * inlining the parent frame in the first place, so mark the immediate
         * caller as uninlineable.
         */
        if (f.script()->function()) {
            f.script()->uninlineable = true;
            MarkTypeObjectFlags(cx, f.script()->function(), types::OBJECT_FLAG_UNINLINEABLE);
        }
(In reply to Brian Hackett (:bhackett) from comment #13)
> (In reply to David Anderson [:dvander] from comment #12)
> > (In reply to Jan de Mooij (:jandem) from comment #10)
> > > > +    bool generateIonStub()
> > > > +    {
> > > > +        JS_ASSERT(!f.regs.inlined());
> > > 
> > > Why does this hold? Is there a check somewhere that's not part of the patch?
> > > Asking since we'd have to use f.pc() instead of f.regs.pc if
> > > f.regs.inlined() can be true.
> > 
> > According to Brian, JM+TI does not use the Call IC during inlining.
> 
> Not quite, JM+TI will use call ICs within inlined frames, but those ICs
> cannot generate native stubs.  Rather than bailing out, JM+TI just marks the
> innermost function as uninlineable (which will cause the calling script to
> be invalidated if it is currently inlined).  The relevant code, in
> generateNativeStub:

Ah, ok. Is it a problem if I generate these stubs within inlined frames? I'm worried that otherwise I'll accidentally forbid innocent functions from being inlined.
(In reply to David Anderson [:dvander] from comment #14)
> Ah, ok. Is it a problem if I generate these stubs within inlined frames? I'm
> worried that otherwise I'll accidentally forbid innocent functions from
> being inlined.

Yeah, you should do the same thing as the native stub and make sure you don't generate any Ion-calling stubs from inline frames.  I wouldn't worry about the effects on JM inlining, as in most cases JM won't do any inlining anyways, since IM compilation is triggered when we'd normally start doing JM inlining.  After IM lands, JM inlining should probably just be ripped right out, would simplify the VM somewhat.
(In reply to Nicolas B. Pierron [:pierron] from comment #11)
> 
> nit: Create a small function skipEmptyActivation with a comment indicating
> where such frame can be created and call it from the constructors and from
> the operator++.

Sure.

> comment: ok, to keep my idea clear, the JM (StackFrame) stack is growing in
> the opposite direction but the arguments are also in reverse order. right?

Right.

> >              ++ionActivations_;
> >              popFrame();
> >              settleOnNewState();
> > +        } else {
> > +            JS_ASSERT(fp_->callingIntoIon());
> 
> You need to modify StackIter::settleOnNewState() to accept frame
> callingIntoIon.

Thanks, fixed.
Is this FIXED?
It looks like this managed to stick and was a slight win on the benchmarks (maybe 2%/4ms on sunspider, and ~100 points v8). I'll take it, since the final patch removed the Ion heuristics.

http://hg.mozilla.org/projects/ionmonkey/rev/b46621aba6fd
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: