Last Comment Bug 775818 - IonMonkey: Add JM inline cache for calls to Ion code
: IonMonkey: Add JM inline cache for calls to Ion code
Status: RESOLVED FIXED
[ion:p1:fx18]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: ---
Assigned To: David Anderson [:dvander]
:
:
Mentors:
Depends on:
Blocks: IonSpeed 777561
  Show dependency treegraph
 
Reported: 2012-07-19 19:26 PDT by David Anderson [:dvander]
Modified: 2012-08-02 14:10 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
microbenchmark (387 bytes, application/javascript)
2012-07-19 19:27 PDT, David Anderson [:dvander]
no flags Details
WIP v0 (14.52 KB, patch)
2012-07-20 20:10 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
WIP v1 (14.52 KB, patch)
2012-07-23 15:16 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
WIP v2 (28.97 KB, patch)
2012-07-23 18:35 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
WIP v3 (29.21 KB, patch)
2012-07-23 19:22 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
WIP v4 (33.82 KB, patch)
2012-07-24 19:23 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
WIP v5 (30.28 KB, patch)
2012-07-24 20:00 PDT, David Anderson [:dvander]
no flags Details | Diff | Splinter Review
v0 (30.11 KB, patch)
2012-07-27 02:14 PDT, David Anderson [:dvander]
jdemooij: review+
nicolas.b.pierron: review+
Details | Diff | Splinter Review

Description David Anderson [:dvander] 2012-07-19 19:26:36 PDT
We've been trying to balance heuristics for when a function should be compiled in Ion versus JM. The problem is, an inner function might become hot faster than a function which calls it, and transitioning from JM to Ion is extremely expensive.

So far the heuristics have been hit or miss. By making calls from JM to Ion fast, we should be able to relax them. 

Disabling the heuristics, disabling inlining, and writing a microbenchmark, I get the following times:

JM->JM: 89ms
JM->IM: 289ms
IM->IM: 145ms

And then for posterity, some v8 measurements:
CS->CS: 120ms
V8->CS: 110ms

(I suspect Ion to Ion calls are slow because we need two loads and a guard, and also push more stack words than other JITs. It's not clear how much that matters on benchmarks though.)

In this bug, I would like to create an IonActivation as part of EnterMethodJIT, then add a new trampoline that we can call via a new call IC from JM. If that's not fast enough, we can start shaving instructions off that path (which should not be too difficult).
Comment 1 David Anderson [:dvander] 2012-07-19 19:27:46 PDT
Created attachment 644140 [details]
microbenchmark
Comment 2 David Anderson [:dvander] 2012-07-20 20:10:03 PDT
Created attachment 644580 [details] [diff] [review]
WIP v0

With this, our JM->Ion time is identical to Ion->Ion time, which is kind of surprising. 2X might be good enough. I'll clean up the patch so it passes tests and try SunSpider soon.

(Note, this patch changes two heuristics: we can call into Ion functions when the caller is not Ion-compilable, and the call count heuristic is dropped.)
Comment 3 David Anderson [:dvander] 2012-07-23 15:16:37 PDT
Created attachment 645095 [details] [diff] [review]
WIP v1

Handles NULL IonScripts by chaining an Ion stub into a generic call stub. The reverse seems a little harder so I'll hold off on that until we know we need it.
Comment 4 David Anderson [:dvander] 2012-07-23 18:35:10 PDT
Created attachment 645163 [details] [diff] [review]
WIP v2

Bailouts are working now... this adds a new join point in the call IC path, so Ion can completely avoid pushing a StackFrame. We now flag the existing frame so that StackIter knows to find an IonActivation, and store the outgoing prevpc into the IonActivation.

TODO: error handling, recompilation handling, cross-platform support.
Comment 5 David Anderson [:dvander] 2012-07-23 19:22:50 PDT
Created attachment 645171 [details] [diff] [review]
WIP v3

Exception handling, fix a bunch of test failures (forgot a call to setupFallibleABICall).
Comment 6 David Anderson [:dvander] 2012-07-24 19:23:53 PDT
Created attachment 645621 [details] [diff] [review]
WIP v4

x64 support, recompilation handling.
Comment 7 David Anderson [:dvander] 2012-07-24 20:00:12 PDT
Created attachment 645625 [details] [diff] [review]
WIP v5

Okay this is starting to near completion... only 1-2 tests failures remaining (one is a crash in sunspider so I can't test that yet).

Baseline v8 results:
 Richards: 11374 
 DeltaBlue: 12880
 Crypto: 12879
 RayTrace: 7055
 EarleyBoyer: 12943
 RegExp: 1668
 Splay: 11531
 NavierStokes: 18513
 ----
 Score (version 7): 9408

W/ Patch, and removing the call count heuristic:
 Richards: 10919
 DeltaBlue: 12609
 Crypto: 11271
 RayTrace: 8701
 EarleyBoyer: 11975
 RegExp: 1661
 Splay: 11792
 NavierStokes: 18383
 ----
 Score (version 7): 9346

W/ patch, keeping the call count heuristic:
 Richards: 11060
 DeltaBlue: 12269
 Crypto: 12183
 RayTrace: 8983
 EarleyBoyer: 12076
 RegExp: 1647
 Splay: 11865
 NavierStokes: 18642
 ----
 Score (version 7): 9481
Comment 8 David Anderson [:dvander] 2012-07-27 02:14:38 PDT
Created attachment 646492 [details] [diff] [review]
v0

This is about a 2ms win on SunSpider. math-partial-sums regresses 4ms because we run it more in Ion now, but we can get that back (bugs are on file).

Jan, could you review my JM changes? Note I don't have ARM support working yet, will do that tomorrow in a separate patch.
Comment 9 David Anderson [:dvander] 2012-07-27 02:16:26 PDT
Comment on attachment 646492 [details] [diff] [review]
v0

Nicolas, could you review the stack changes outside JM?
Comment 10 Jan de Mooij [:jandem] 2012-07-27 06:39:02 PDT
Comment on attachment 646492 [details] [diff] [review]
v0

Review of attachment 646492 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, r=me on the JM changes with nits addressed.

::: js/src/methodjit/InvokeHelpers.cpp
@@ +261,5 @@
>      // is likely not worth the overhead.
>      if (!callee->hasAnalysis() || !callee->analysis()->hasLoops())
>          return true;
>  
> +    //// If we make a ton of JM -> Ion calls, use JM.

Nit: s/ //// / //

::: js/src/methodjit/MonoIC.cpp
@@ +554,5 @@
>      }
>  
> +    bool generateIonStub()
> +    {
> +        JS_ASSERT(!f.regs.inlined());

Why does this hold? Is there a check somewhere that's not part of the patch? Asking since we'd have to use f.pc() instead of f.regs.pc if f.regs.inlined() can be true.

@@ +581,5 @@
> +
> +        RegisterID t0 = regs.takeAnyReg().reg();
> +        RegisterID t1 = Registers::ClobberInCall;
> +
> +        masm.move(ImmPtr(f.regs.pc), t1);

You don't need t1 if you change the storePtr below to

masm.storePtr(ImmPtr(f.regs.pc), Address(t0, ion::IonActivation::offsetOfPrevPc()));

@@ +632,5 @@
> +        RegisterID stackPushed = regs.takeAnyReg().reg();
> +        masm.move(Imm32(staticPushed - sizeof(uintptr_t)), stackPushed);
> +
> +        /*
> +         * Check whether argc < nags, because if so, we need to pad the stack.

Nit: s/nags/nargs

@@ +656,5 @@
> +            Label loop = masm.label();
> +            masm.subPtr(Imm32(sizeof(Value)), Registers::StackPointer);
> +            masm.storeValue(UndefinedValue(), Address(Registers::StackPointer, 0));
> +            masm.sub32(Imm32(1), t0);
> +            Jump test = masm.branchTest32(Assembler::NonZero, t0, t0);

You can avoid the test with:

Jump test = masm.branchSub32(Assembler::NonZero, Imm32(1), t0);

@@ +675,5 @@
> +
> +            /* Copy the argument onto the native stack. */
> +#ifdef JS_NUNBOX32
> +            masm.load32(Address(t0, 0), t1);
> +            masm.store32(t1, Address(Registers::StackPointer, 0));

It would be good if we could get rid of the two sub instructions. There is a push(Address) method so the following should work, where x is a constant offset from t0 that's different for each argument:

masm.push(Address(t0, x));
masm.push(Address(t0, x + 4));

::: js/src/methodjit/MonoIC.h
@@ +232,5 @@
> +    }
> +    JSC::CodeLocationJump oolJump() {
> +        return slowPathStart.jumpAtOffset(oolJumpOffset);
> +    }
> +    JSC::CodeLocationJump lastOolJump() {

Nit: looks like this method is never called, can we remove it + the lastOolJump_ member?
Comment 11 Nicolas B. Pierron [:nbp] 2012-07-27 12:44:10 PDT
Comment on attachment 646492 [details] [diff] [review]
v0

Review of attachment 646492 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, fix the StackIter and fix the few nits and check on ARM.

r=me once fixed.

::: js/src/ion/Bailouts.cpp
@@ +255,5 @@
>          // work).
> +        // 
> +        // Note: If the entry frame is a placeholder (a stub frame pushed for
> +        // JM -> Ion calls), then cannot re-use it as it does not have enough
> +        // slots.

nit: then >we< cannot ?

::: js/src/ion/IonFrames.cpp
@@ +334,5 @@
>  {
> +    while (activation_ && !activation_->entryfp()) {
> +        top_ = activation_->prevIonTop();
> +        activation_ = activation_->prev();
> +    }

nit: Create a small function skipEmptyActivation with a comment indicating where such frame can be created and call it from the constructors and from the operator++.

::: js/src/methodjit/InvokeHelpers.cpp
@@ +261,5 @@
>      // is likely not worth the overhead.
>      if (!callee->hasAnalysis() || !callee->analysis()->hasLoops())
>          return true;
>  
> +    //// If we make a ton of JM -> Ion calls, use JM.

jan: s,////,//, is a valid unambiguous sed expression ;)

::: js/src/methodjit/MonoIC.cpp
@@ +670,5 @@
> +        /* Copy all remaining arguments. */
> +        for (size_t i = 0; i < argc + 1; i++) {
> +            /* Bump the stack, and advance to the next argument. */
> +            masm.subPtr(Imm32(sizeof(Value)), Registers::StackPointer);
> +            masm.subPtr(Imm32(sizeof(Value)), t0);

comment: ok, to keep my idea clear, the JM (StackFrame) stack is growing in the opposite direction but the arguments are also in reverse order. right?

@@ +696,5 @@
> +
> +        /* Call into Ion. */
> +        masm.loadPtr(Address(ionScript, ion::IonScript::offsetOfMethod()), t0);
> +        masm.loadPtr(Address(t0, ion::IonCode::OffsetOfCode()), t0);
> +        masm.call(t0);

This won't work on ARM because the Ion masm.call is different than the JM masm.call.
You might have to do a

masm.push(pc)

before which as opposed of the common sense should work because when you push the pc it in fact push the pc + 8 which makes it good for a return address, ask Marty for confirmation.

::: js/src/vm/Stack.cpp
@@ +1490,5 @@
>              ++ionActivations_;
>              popFrame();
>              settleOnNewState();
> +        } else {
> +            JS_ASSERT(fp_->callingIntoIon());

You need to modify StackIter::settleOnNewState() to accept frame callingIntoIon.

nit: hum … I think this function is lacking some return statements.
Comment 12 David Anderson [:dvander] 2012-07-27 18:57:08 PDT
(In reply to Jan de Mooij (:jandem) from comment #10)
> > +    bool generateIonStub()
> > +    {
> > +        JS_ASSERT(!f.regs.inlined());
> 
> Why does this hold? Is there a check somewhere that's not part of the patch?
> Asking since we'd have to use f.pc() instead of f.regs.pc if
> f.regs.inlined() can be true.

According to Brian, JM+TI does not use the Call IC during inlining.

> > +        masm.move(ImmPtr(f.regs.pc), t1);
> 
> You don't need t1 if you change the storePtr below to

Ah, I did this because it gets moved twice and on x64 that's two very large instructions.

> It would be good if we could get rid of the two sub instructions. There is a
> push(Address) method so the following should work, where x is a constant
> offset from t0 that's different for each argument:
> 
> masm.push(Address(t0, x));
> masm.push(Address(t0, x + 4));

Thanks, this is much cleaner.

> 
> ::: js/src/methodjit/MonoIC.h
> @@ +232,5 @@
> > +    }
> > +    JSC::CodeLocationJump oolJump() {
> > +        return slowPathStart.jumpAtOffset(oolJumpOffset);
> > +    }
> > +    JSC::CodeLocationJump lastOolJump() {
> 
> Nit: looks like this method is never called, can we remove it + the
> lastOolJump_ member?

Whoops, the fact that it is unused is a bug, I meant for generateFullCallStub to link to it.
Comment 13 Brian Hackett (:bhackett) 2012-07-27 19:03:50 PDT
(In reply to David Anderson [:dvander] from comment #12)
> (In reply to Jan de Mooij (:jandem) from comment #10)
> > > +    bool generateIonStub()
> > > +    {
> > > +        JS_ASSERT(!f.regs.inlined());
> > 
> > Why does this hold? Is there a check somewhere that's not part of the patch?
> > Asking since we'd have to use f.pc() instead of f.regs.pc if
> > f.regs.inlined() can be true.
> 
> According to Brian, JM+TI does not use the Call IC during inlining.

Not quite, JM+TI will use call ICs within inlined frames, but those ICs cannot generate native stubs.  Rather than bailing out, JM+TI just marks the innermost function as uninlineable (which will cause the calling script to be invalidated if it is currently inlined).  The relevant code, in generateNativeStub:

        /*
         * Native stubs are not generated for inline frames. The overhead of
         * bailing out from the IC is far greater than the time saved by
         * inlining the parent frame in the first place, so mark the immediate
         * caller as uninlineable.
         */
        if (f.script()->function()) {
            f.script()->uninlineable = true;
            MarkTypeObjectFlags(cx, f.script()->function(), types::OBJECT_FLAG_UNINLINEABLE);
        }
Comment 14 David Anderson [:dvander] 2012-07-27 19:11:46 PDT
(In reply to Brian Hackett (:bhackett) from comment #13)
> (In reply to David Anderson [:dvander] from comment #12)
> > (In reply to Jan de Mooij (:jandem) from comment #10)
> > > > +    bool generateIonStub()
> > > > +    {
> > > > +        JS_ASSERT(!f.regs.inlined());
> > > 
> > > Why does this hold? Is there a check somewhere that's not part of the patch?
> > > Asking since we'd have to use f.pc() instead of f.regs.pc if
> > > f.regs.inlined() can be true.
> > 
> > According to Brian, JM+TI does not use the Call IC during inlining.
> 
> Not quite, JM+TI will use call ICs within inlined frames, but those ICs
> cannot generate native stubs.  Rather than bailing out, JM+TI just marks the
> innermost function as uninlineable (which will cause the calling script to
> be invalidated if it is currently inlined).  The relevant code, in
> generateNativeStub:

Ah, ok. Is it a problem if I generate these stubs within inlined frames? I'm worried that otherwise I'll accidentally forbid innocent functions from being inlined.
Comment 15 Brian Hackett (:bhackett) 2012-07-27 19:22:57 PDT
(In reply to David Anderson [:dvander] from comment #14)
> Ah, ok. Is it a problem if I generate these stubs within inlined frames? I'm
> worried that otherwise I'll accidentally forbid innocent functions from
> being inlined.

Yeah, you should do the same thing as the native stub and make sure you don't generate any Ion-calling stubs from inline frames.  I wouldn't worry about the effects on JM inlining, as in most cases JM won't do any inlining anyways, since IM compilation is triggered when we'd normally start doing JM inlining.  After IM lands, JM inlining should probably just be ripped right out, would simplify the VM somewhat.
Comment 16 David Anderson [:dvander] 2012-07-30 14:31:27 PDT
(In reply to Nicolas B. Pierron [:pierron] from comment #11)
> 
> nit: Create a small function skipEmptyActivation with a comment indicating
> where such frame can be created and call it from the constructors and from
> the operator++.

Sure.

> comment: ok, to keep my idea clear, the JM (StackFrame) stack is growing in
> the opposite direction but the arguments are also in reverse order. right?

Right.

> >              ++ionActivations_;
> >              popFrame();
> >              settleOnNewState();
> > +        } else {
> > +            JS_ASSERT(fp_->callingIntoIon());
> 
> You need to modify StackIter::settleOnNewState() to accept frame
> callingIntoIon.

Thanks, fixed.
Comment 17 Paul Wright 2012-07-31 07:49:24 PDT
Is this FIXED?
Comment 18 David Anderson [:dvander] 2012-08-02 14:10:20 PDT
It looks like this managed to stick and was a slight win on the benchmarks (maybe 2%/4ms on sunspider, and ~100 points v8). I'll take it, since the final patch removed the Ion heuristics.

http://hg.mozilla.org/projects/ionmonkey/rev/b46621aba6fd

Note You need to log in before you can comment on or make changes to this bug.