Closed Bug 859609 Opened 7 years ago Closed 7 years ago

IonMonkey: JSOP_CALL: Inline lambda functions.

Categories

(Core :: JavaScript Engine, defect)


Tracking


RESOLVED FIXED
mozilla24

People

(Reporter: nbp, Assigned: djvj)

References

(Depends on 1 open bug)

Details

Attachments

(5 files, 1 obsolete file)

Attached patch Inline calls of lambda function. (obsolete) — Splinter Review
When a lambda function is seen, it is likely created with a scope chain, which causes us to forbid inlining of the function.  In addition, TI only finds a TypeObject and not a singleton, which causes IonBuilder to discard any inlining options.

The attached patch is a WIP which attempts to inline lambdas declared in the same context.
This patch works on PdfJS but does not show any improvement; on the contrary, it seems to reduce performance by inlining too much instead of executing the generated code.

The reason I made this patch in the first place was to help the escape analysis when only non-escaping lambdas capture the scope chain, which causes us to allocate the scope chain when it could otherwise live on the stack.
Perhaps the issue is related to the inlining heuristic which is too eager to do inlining, thus bloating the code and reducing performance.

Also, can you explain if your patch affects direct calls to lambdas, i.e.

    (function (x) { return x*x; })(10)

Is the function inlined to get more or less the inline evaluation of 10*10?
(In reply to Marc Feeley from comment #1)
> Perhaps the issue is related to the inlining heuristic which is too eager to
> do inlining, thus bloating the code and reducing performance.

Last time I checked, the information provided by TI was not good enough.  My guess was that we were allocating a new lambda every time, to at least capture the scope chain, but based on Brian's claim we should not have that.

This patch hacks its way into the call/inlining path, as the lambda is a uniquely known function.

> Also, can you explain if your patch affects direct calls to lambdas, i.e.
> 
>     (function (x) { return x*x; })(10)

js> function f() { return (function (x) { return x*x; })(10); }
js> dis(f)
flags:
loc     op
-----   --
main:
00000:  lambda (function (x) { return x*x; })
00005:  undefined
00006:  notearg
00007:  int8 10
00009:  notearg
00010:  call 1
00013:  return
00014:  stop

As long as the rest of the method can compile, yes it will.  This patch looks for JSOP_LAMBDA and bakes the lambda into the JSOP_CALL as a substitute for the type information.

> Is the function inlined to get more or less the inline evaluation of 10*10 ?

Yes, it should be.  I have not tried it to confirm, but the bytecode looks exactly like the case I was trying to solve in pdfjs:16934 (getToken).
I am taking this bug.  Nicholas, let me know if I'm hijacking something and we can talk about it.

My plan is to enable inlining of functions which use their scope objects, and then proceed to enable inlining of functions which contain JSOP_LAMBDA.  That should allow inlining a much larger set of functions, and generally be useful for improving the performance of code which uses higher-order functions such as |forEach| and |map| with callbacks that use aliased vars.
Assignee: general → kvijayan
(In reply to Kannan Vijayan [:djvj] from comment #3)
> I am taking this bug.  Nicholas, let me know if I'm hijacking something and
> we can talk about it.

I don't know if Marc planned on working on it or not, but I guess not yet.

I made this change in the hope of seeing what was needed to inline the getToken function at pdfjs:16934.  Inlining such functions into their parent would likely increase the compilation cost, but as soon as we have any resume operations we can see that the lambda is not returned, and that nothing else can capture the scope chain, which would imply that we can allocate the scope chain on the stack and thus have better type information on the loop variable.
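To illustrate the pattern described above, here is a minimal sketch (not the actual pdf.js code; |sumTokens| and |read| are made-up names) of a non-escaping lambda capturing an outer variable through the scope chain:

```javascript
// Hypothetical illustration of the pattern discussed above: |read| captures
// |total| via the scope chain, but it is only called locally and never
// stored or returned, so it does not escape.  If the compiler can prove
// that, the captured binding could live on the stack instead of a
// heap-allocated scope-chain object, giving better type information.
function sumTokens(tokens) {
  var total = 0;
  var read = function (t) { total += t; };  // non-escaping lambda
  for (var i = 0; i < tokens.length; i++)
    read(tokens[i]);
  return total;
}
```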

Feel free to work on it.

> My plan is to enable inlining of functions which use their scope objects. 
> And then proceed to enable inlining of functions which contain JSOP_LAMBDA. 
> That should allow for inlining a much larger set of functions, and generally
> be useful for improving performance of code which uses higher-order
> functions such as |forEach| and |map| with callbacks that use aliased vars.

Great.
Depends on: 879655
This patch allows for inlining of heavyweight functions, as well as recognizing and inlining "monomorphic" lambda call sites where the callees are all clones of a single template function.
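As a rough illustration of what a "monomorphic" lambda call site means here (the names below are made up, not taken from the patch): every closure reaching the call site is a clone of one template function, so the call can still be inlined even though each clone is a distinct object:

```javascript
// Hypothetical example: |apply| only ever sees clones of the single
// template lambda inside |makeAdder|, so its call site is "monomorphic"
// in the sense used above.
function makeAdder(k) {
  return function (x) { return x + k; };  // one template, many clones
}
function apply(f, v) {
  return f(v);  // every |f| here is a clone of the same template function
}
var r = apply(makeAdder(1), 10) + apply(makeAdder(2), 10);  // 11 + 12
```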
Attachment #734933 - Attachment is obsolete: true
Attachment #759752 - Flags: review?(hv1989)
Comment on attachment 759752 [details] [diff] [review]
Inline heavyweight functions

Review of attachment 759752 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. These are all nits, except for the removal of IonBuilder::lazyArguments_ (that is available through inlineCallInfo_ now).

::: js/src/ion/IonBuilder.cpp
@@ +190,5 @@
> +            }
> +            if (obj->toFunction()->isInterpreted() && !obj->toFunction()->getOrCreateScript(cx))
> +                return false;
> +            if (!targets.append(obj))
> +                return false;

You can lose the "if" here, since it is pre-allocated.

@@ +201,5 @@
> +            }
> +            if (!typeObj->interpretedFunction) {
> +                targets.clear();
> +                return true;
> +            }

Can you collapse both "if"s into one?

@@ +203,5 @@
> +                targets.clear();
> +                return true;
> +            }
> +            if (!targets.append(typeObj->interpretedFunction))
> +                return false;

You can lose the "if" here, since it is pre-allocated.

@@ +243,5 @@
>      return true;
>  }
>  
>  bool
> +IonBuilder::canInlineTarget(JSFunction *target)

nice

@@ +4703,5 @@
>      }
>  
>      int calleeDepth = -((int)argc + 2);
>  
> +    // Acquire known ctagall target if existent.

ctagall??

::: js/src/ion/IonBuilder.h
@@ +625,5 @@
>      bool nonStringIteration_;
>  
>      // If this script can use a lazy arguments object, it will be pre-created
>      // here.
>      MInstruction *lazyArguments_;

This isn't needed anymore, since that info is the same as inlineCallInfo_->argv(). So remove this everywhere

@@ +628,5 @@
>      // here.
>      MInstruction *lazyArguments_;
> +
> +    // If this is an inline builder, the call info for the builder.
> +    CallInfo *inlineCallInfo_;

Can we make a const CallInfo * out of this? So that we don't adjust it anymore after inlining?

@@ +730,5 @@
>      }
>  
> +    bool lambda() const {
> +        return lambda_;
> +    }

isLambda()

::: js/src/ion/MIR.h
@@ +6182,5 @@
>      }
> +
> +    TypePolicy *typePolicy() {
> +        return this;
> +    }

Good find :P
Attachment #759752 - Flags: review?(hv1989) → review+
(In reply to Hannes Verschore [:h4writer] from comment #6)
> Comment on attachment 759752 [details] [diff] [review]
> Inline heavyweight functions
> 
> Review of attachment 759752 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice. These are all nits, except for the removal of IonBuilder::lazyArguments_
> (that is available through inlineCallInfo_ now).
...
> ::: js/src/ion/IonBuilder.h
> @@ +625,5 @@
> >      bool nonStringIteration_;
> >  
> >      // If this script can use a lazy arguments object, it will be pre-created
> >      // here.
> >      MInstruction *lazyArguments_;
> 
> This isn't needed anymore, since that info is the same as
> inlineCallInfo_->argv(). So remove this everywhere

We still need lazyArguments_.

The argv() in the inline call info is a vector of the argument definitions.  lazyArguments_ is a MIR instruction that constructs the arguments object (for needsArgsObj functions), and that happens inside the function's jitcode, not outside.
Oh, I meant Vector<MDefinition *, 0, IonAllocPolicy> inlinedArguments_; Sorry
Comments addressed, tested on try (linux and linux64), and pushed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1684c32be328

Try Run:
https://tbpl.mozilla.org/?tree=Try&rev=a911dccd8ad0
https://hg.mozilla.org/mozilla-central/rev/1684c32be328
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Thanks so much Kannan!

I did some quick benchmarking and the performance of tail-recursive nested functions is much improved.  For example (see attached test programs):

% ./js-release-old tail-calls-toplevel.js
463
% ./js-release-new tail-calls-toplevel.js
463
% ./js-release-old tail-calls-nested.js
955
% ./js-release-new tail-calls-nested.js
562

The performance of tail-recursive toplevel functions hasn't changed, but tail-recursive nested functions are 41% faster than before your patch.  Nested functions are however not as fast as toplevel functions.
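The attached test programs are not reproduced here, but the two shapes being compared look roughly like this (hypothetical names and bodies):

```javascript
// "Toplevel": the recursive call targets a top-level function, which was
// already eligible for inlining before the patch.
function countdownToplevel(n) {
  return n === 0 ? 0 : countdownToplevel(n - 1);
}

// "Nested": the recursive function reads |step| from its enclosing scope,
// so before this patch it was not eligible for inlining.
function countdownNested(n) {
  var step = 1;  // aliased by the inner function
  function loop(k) {
    return k === 0 ? 0 : loop(k - step);
  }
  return loop(n);
}
```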

I tried my earley benchmark and got this:

% ./js-release-old earley1.js
950
% ./js-release-new earley1.js
899
% ./js-release-old earley1-ll.js 
519
% ./js-release-new earley1-ll.js 
519

That means earley has gotten about 5% faster without lambda-lifting (yay!), but with lambda-lifting there is another 42% speed boost we can expect to get.
Attached file test program
Attached file test program #2
Attached file test program #3
Attached file test program #4
(In reply to Marc Feeley from comment #11)
> Thanks so much Kannan!
> 
> I did some quick benchmarking and the performance of tail-recursive nested
> functions is much improved.  For example (see attached test programs):
> 
...
> 
> Which means earley has gotten about 5% faster without lambda-lifting (yay!),
> but with lambda-lifting there is another 42% speed boost we can expect to
> get.

Thanks for the feedback Marc!  The top-level calls didn't improve because top-level functions were already eligible for inlining.  However, nested functions which accessed their parent scope's variables were not previously eligible for inlining, and now they are, hence the speedup for those.

I was aiming for gains in code that uses higher-order functions with callbacks, e.g.:

function sum(arr) {
  var s = 0;
  arr.forEach(function (i) { s += i; });
  return s;
}

Good to hear about gains elsewhere as well.

The remaining bit of perf differential exists because we have to force aliased-variable accesses to go through an explicit scope object.  In the above example, setting |s| from inside the callback is an explicit object write, not a local slot write.  Ion can't optimize |s| into a register, and we do an extra memory read+write for every iteration of the loop.
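To make the difference concrete, here is a sketch contrasting the aliased write with a version that keeps |s| purely local (assuming, as described above, that only captured variables are forced through an explicit scope object):

```javascript
// Aliased: |s| is captured by the callback, so each |s += i| is a write
// through the explicit scope object rather than a local slot write.
function sumAliased(arr) {
  var s = 0;
  arr.forEach(function (i) { s += i; });
  return s;
}

// Non-aliased: nothing captures |s|, so it can stay in a register for
// the whole loop.
function sumLocal(arr) {
  var s = 0;
  for (var i = 0; i < arr.length; i++)
    s += arr[i];
  return s;
}
```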

I'm investigating ways of eliminating the scope object, but it'll be tricky.  In particular because if the callback function escapes, we really need to be able to reconstitute the explicit scope object.  And the callback function can escape in really subtle ways (e.g. it calls another function which uses |fn.caller| to get a ref to the callback).
Depends on: 883973
Duplicate of this bug: 885697