Closed Bug 876733
Opened 11 years ago; closed 9 months ago
Large performance gap between calling toplevel and nested functions
Categories: Core :: JavaScript Engine (enhancement)
Status: RESOLVED INCOMPLETE
People: Reporter: feeley; Assignee: Unassigned
Attachments: 1 file (621 bytes, application/x-javascript)
There seems to be a large difference in performance between calling functions defined at toplevel and calling functions nested inside other functions. Here is my test program (tail.js):
function test1(n) {
function f1(n, x) {
if (n > 0)
return f1(n-1, x+n);
else
return x;
}
return f1(n, 0);
}
function test2(n) {
return f2(n, 0);
}
function f2(n, x) {
if (n > 0)
return f2(n-1, x+n);
else
return x;
}
function go(n, test) {
var r;
while (n !== 0) {
r = test(1000);
n--;
}
return r;
}
go(1000, test1); // dry runs
go(1000, test2);
var start = new Date();
go(100000, test1);
var end = new Date();
print("test1: " + (end - start));
var start = new Date();
go(100000, test2);
var end = new Date();
print("test2: " + (end - start));
Here are the run times using js and also d8:
% js tail.js
test1: 974
test2: 478
% d8 tail.js
test1: 1807
test2: 499
So for js it is a factor of 2 faster to use a global definition (f2) rather than a local one (f1). For d8 it is almost 4 times faster.
I have looked at the disassembly of f1 and f2 and these functions differ in only one bytecode, the recursive call:
Disassembly of function f1:
flags:
loc op
----- --
main:
00000: getarg 0
00003: zero
00004: gt
00005: ifeq 44 (+39)
00010: callaliasedvar "f1" (hops = 0, slot = 2) ***only difference***
00019: undefined
00020: notearg
00021: getarg 0
00024: one
00025: sub
00026: notearg
00027: getarg 1
00030: getarg 0
00033: add
00034: notearg
00035: call 2
00038: return
00039: goto 48 (+9)
00044: getarg 1
00047: return
00048: stop
Disassembly of function f2:
flags:
loc op
----- --
main:
00000: getarg 0
00003: zero
00004: gt
00005: ifeq 40 (+35)
00010: callgname "f2" ***only difference***
00015: undefined
00016: notearg
00017: getarg 0
00020: one
00021: sub
00022: notearg
00023: getarg 1
00026: getarg 0
00029: add
00030: notearg
00031: call 2
00034: return
00035: goto 44 (+9)
00040: getarg 1
00043: return
00044: stop
How can this large performance difference be explained? Is there an optimization for callgname that is not done for callaliasedvar?
Updated•11 years ago
Flags: needinfo?(jdemooij)

Comment 1•11 years ago
As far as I know, we do not track type information on scope chain slots, so the callaliasedvar does not have any insight into the callee, which probably causes us to produce a generic call instead of a specialized call.
Also, we currently forbid inlining of lambdas because they hold a different scope chain. I made a patch for this, but it would not work here because we read the lambda from the scope chain.
I guess the first step would be to extract this info from the baseline compiler ICs.
I made a similar patch for the setter & getter ICs a few weeks ago …
Comment 2•11 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> I made a similar patch for setter & getter IC a few weeks ago …
See the attachments I made on Bug 846648 and Bug 859609. The other thing to note is that currently we have no way to freeze on the baseline ICs. If we extract a type set from the baseline ICs, then freezing on it would mean that if we extend the baseline ICs we should then invalidate Ion's code.
Reporter
Comment 3•11 years ago
I'm not clear on what you mean by "freezing" the baseline ICs. Can you elaborate?
Comment 4•11 years ago
IonMonkey is able to inline test2 -> f2 -> f2 directly in the "go" function (so it even inlines the recursive call once). It's not inlining anything for test1/f1.
test1 and test2 are called 100000 times. Every time we call test1, we have to create a "call object" (to hold closed-over locals/arguments) and clone f1. This lambda cloning is required by the spec, and not doing it would be observable: functions can be compared using == or ===, functions can have extra properties, etc. test2 does not have this problem.
Also, because f2 is always the same function (defined in the global scope), it has a singleton type: when we call f2 we know exactly which function we are going to call. f1 can't have a singleton type because the lambda cloning creates many different function objects.
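To make this concrete, here is a minimal standalone sketch (the names are purely illustrative, not from the test case) of why the cloning is observable: the nested lambda yields a fresh function object on every evaluation of the enclosing function, while a toplevel function stays a single object.
// Hypothetical sketch: outer()/inner()/toplevel() are made-up names.
function outer() {
  function inner(x) { return x + 1; }  // cloned each time outer() runs
  return inner;
}
print(outer() === outer());            // false: two distinct clones of the same lambda
function toplevel(x) { return x + 1; } // defined once at the global scope
var g1 = toplevel, g2 = toplevel;
print(g1 === g2);                      // true: always the same singleton function object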
As Nicolas suggested, what we could do for f1 is look at the baseline ICs and notice that they are always calling the same JSScript. We can then guard on that before inlining, but due to the extra guards it's going to be slower than f2.
Flags: needinfo?(jdemooij)
Reporter
Comment 5•11 years ago
This argument is weak because a trivial analysis shows that f1 is never used in a == or === test (simply by observing that f1 is only used in the function position of calls).
I think this case (nested functions with no free variables being only used in the function position of calls) occurs often, so it might be worthwhile to invest some time in handling it efficiently.
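For illustration, a hedged example of that pattern (the names are made up): a nested helper with no free variables that is only ever used in call position, so nothing could observe whether it was cloned or not.
// Sketch of the common pattern described above; square() never escapes and is
// never compared, so a fresh clone per sumSquares call is unobservable.
function sumSquares(arr) {
  function square(x) { return x * x; }
  var total = 0;
  for (var i = 0; i < arr.length; i++)
    total += square(arr[i]);
  return total;
}
print(sumSquares([1, 2, 3])); // 14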
Comment 6•11 years ago
(In reply to Marc Feeley from comment #3)
> I'm not clear about what you mean with "freeze" the baseline ICs. Can you
> elaborate?
Currently IonBuilder checks for external types by asking type inference. When we use the type inference functions, we implicitly freeze on the data they return. That way we can safely avoid guards while keeping a bound on the lifetime of a script.
For example, if the result of a getter is known by type inference to be an int, we can avoid baking in a guard and instead add a constraint on the type set to ensure that no new types are added (freezing). As soon as a new type is added to the type set, we invalidate the code which assumed we could avoid doing any guards. The invalidation process patches the return path of all running instances on the stack and prevents reentry into the code.
If we extract this information from the baseline compiler, then we only get a StackTypeSet which is only known by IonBuilder. So even if we were to freeze on it, nothing would be able to modify its content, and so nothing would be able to invalidate the script through this TypeSet. Currently it is unsafe to rely on the information provided by the baseline ICs, as we cannot invalidate when it becomes wrong.
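As a script-level illustration of the freeze-and-invalidate idea (a sketch only; it does not show the engine's actual heuristics or thresholds):
// Ion may compile get() under the frozen assumption that obj.x is always an
// int32; adding a new type to that type set later forces invalidation of the
// compiled code instead of a per-call guard.
var obj = { x: 1 };
function get(o) { return o.x; }
for (var i = 0; i < 100000; i++)
  get(obj);        // warm up; compiled code relies on the frozen type set
obj.x = "hello";   // a new type flows into x: the frozen assumption is broken
print(get(obj));   // still correct, because the optimized code was invalidated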
(In reply to Marc Feeley from comment #5)
> This argument is weak because a trivial analysis shows that f1 is never used
> in a == or === test (simply by observing that f1 is only used in the
> function position of calls).
Sadly JavaScript is a very expressive language where we can "inspect" the call stack … so one can do f.caller == g.caller to compare multiple calls to f1 without naming it directly.
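For example, here is a hedged sketch (non-strict code, and assuming the engine exposes Function.prototype.caller) of how the cloned f1 can be observed and compared without ever being named outside its enclosing function:
// test1_spied and spy are hypothetical names for this sketch only.
var seen = [];
function spy() {
  seen.push(spy.caller);        // grabs whichever clone of f1 called us
}
function test1_spied(n) {
  function f1(n, x) {
    spy();
    return n > 0 ? f1(n - 1, x + n) : x;
  }
  return f1(n, 0);
}
test1_spied(1);
test1_spied(1);
print(seen[0] === seen[2]);     // false: each test1_spied call clones a new f1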
We can potentially do the same thing we do for f.arguments and disable the optimization on f1 if f1 is ever accessed through f.caller (in addition to the static analysis that you are suggesting). This is wrong the first time we do it in the JIT, but as long as we have a good fallback path which can emulate this information, it would be good enough.
Welcome :)
Assignee
Updated•10 years ago
Assignee: general → nobody

Updated•2 years ago
Severity: normal → S3
Updated•9 months ago
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → INCOMPLETE