Closed Bug 876733 Opened 11 years ago Closed 9 months ago

Large performance gap between calling toplevel and nested functions

Categories

(Core :: JavaScript Engine, enhancement)

21 Branch
x86
macOS
enhancement

Tracking


RESOLVED INCOMPLETE

People

(Reporter: feeley, Unassigned)


Attachments

(1 file)

621 bytes, application/x-javascript
Attached file test program
There seems to be a large difference in performance between calling functions defined at toplevel and calling functions nested inside other functions. Here is my test program (tail.js):

function test1(n) {
  function f1(n, x) {
    if (n > 0)
      return f1(n-1, x+n);
    else
      return x;
  }
  return f1(n, 0);
}

function test2(n) {
  return f2(n, 0);
}

function f2(n, x) {
  if (n > 0)
    return f2(n-1, x+n);
  else
    return x;
}

function go(n, test) {
  var r;
  while (n !== 0) {
    r = test(1000);
    n--;
  }
  return r;
}

go(1000, test1); // dry runs
go(1000, test2);

var start = new Date();
go(100000, test1);
var end = new Date();
print("test1: " + (end - start));

var start = new Date();
go(100000, test2);
var end = new Date();
print("test2: " + (end - start));

Here is the run time using js and also d8:

% js tail.js
test1: 974
test2: 478
% d8 tail.js
test1: 1807
test2: 499

So for js it is a factor of 2 faster to use a global definition (f2) rather than a local one (f1). For d8 it is almost 4 times faster.

I have looked at the disassembly of f1 and f2 and these functions differ in only one bytecode, the recursive call:

Disassembly of function f1:

flags:
loc     op
-----   --
main:
00000:  getarg 0
00003:  zero
00004:  gt
00005:  ifeq 44 (+39)
00010:  callaliasedvar "f1" (hops = 0, slot = 2)   ***only difference***
00019:  undefined
00020:  notearg
00021:  getarg 0
00024:  one
00025:  sub
00026:  notearg
00027:  getarg 1
00030:  getarg 0
00033:  add
00034:  notearg
00035:  call 2
00038:  return
00039:  goto 48 (+9)
00044:  getarg 1
00047:  return
00048:  stop

Disassembly of function f2:

flags:
loc     op
-----   --
main:
00000:  getarg 0
00003:  zero
00004:  gt
00005:  ifeq 40 (+35)
00010:  callgname "f2"   ***only difference***
00015:  undefined
00016:  notearg
00017:  getarg 0
00020:  one
00021:  sub
00022:  notearg
00023:  getarg 1
00026:  getarg 0
00029:  add
00030:  notearg
00031:  call 2
00034:  return
00035:  goto 44 (+9)
00040:  getarg 1
00043:  return
00044:  stop

How can this large performance difference be explained? Is there an optimization for callgname that is not done for callaliasedvar?
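(Editorial note, not part of the original report: one way to reproduce the disassembly above is the SpiderMonkey shell's dis() helper, which is typically only available in debug shell builds, and whose exact opcode names vary between versions. The test1dis helper below is hypothetical, added only to leak a clone of the nested function so it can be passed to dis().)

load("tail.js");    // defines test1, test2, f2, go
dis(f2);            // the recursive call site compiles to callgname "f2"

// f1 only exists inside test1, so leak a clone of it for inspection:
function test1dis(n) {
  function f1(n, x) {
    if (n > 0)
      return f1(n-1, x+n);
    else
      return x;
  }
  return f1;        // return the clone instead of calling it
}
dis(test1dis(0));   // the recursive call goes through the scope chain
                    // (callaliasedvar), as in the listing above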
Flags: needinfo?(jdemooij)
As far as I know, we do not track type information on the scope chain slots, so the callaliasedvar does not have any insight into the callee, which probably causes us to produce a generic call instead of a specialized call. Also, we currently forbid the inlining of lambdas because they hold a different scope chain. I made a patch for this one, but it would not work here because we read the lambda from the scope chain. I guess the first step would be to extract this info from the baseline compiler ICs. I made a similar patch for the setter & getter ICs a few weeks ago …
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> I made a similar patch for the setter & getter ICs a few weeks ago …

See the attachments I made on Bug 846648 and on Bug 859609.

The other thing to note is that we currently have no way to freeze on the baseline ICs. If we extract a type set from the baseline ICs, then freezing on it would mean that whenever the baseline ICs are extended we would have to invalidate Ion's code.
I'm not clear on what you mean by "freezing" the baseline ICs. Can you elaborate?
IonMonkey is able to inline test2 -> f2 -> f2 directly into the "go" function (so it even inlines the recursive call once). It's not inlining anything for test1/f1. test1 and test2 are called 100000 times. Every time we call test1, we have to create a "call object" (to hold the closed-over locals/arguments) and clone f1. This lambda cloning is required by the spec, and not doing it is observable: the clones can be compared using == or ===, functions can have extra properties attached to them, etc. test2 does not have this problem. Also, because f2 is always the same function (defined in the global scope), it has a singleton type: when we call f2 we know exactly which function we are going to call. f1 can't have a singleton type because the lambda cloning creates many different function objects. As Nicolas suggested, what we could do for f1 is look at the baseline ICs and notice that they are always calling the same JSScript. We can then guard on that before inlining, but due to the extra guards it's going to be slower than f2.
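(Editorial note: a standalone sketch of why the cloning is observable. test1Leak is a hypothetical variant of test1, and f2 refers to the global function from tail.js.)

// Variant of test1 that leaks its inner function.
function test1Leak(n) {
  function f1(n, x) { return n > 0 ? f1(n-1, x+n) : x; }
  test1Leak.last = f1;   // expose the current clone
  return f1(n, 0);
}

test1Leak(10);
var a = test1Leak.last;
test1Leak(10);
var b = test1Leak.last;
print(a === b);          // false: each call to test1Leak creates a fresh f1 object
print(f2 === f2);        // true: the toplevel f2 is always the same object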
Flags: needinfo?(jdemooij)
This argument is weak because a trivial analysis shows that f1 is never used in a == or === test (simply by observing that f1 is only used in the function position of calls). I think this case (nested functions with no free variables being only used in the function position of calls) occurs often, so it might be worthwhile to invest some time in handling it efficiently.
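(Editorial note: a small illustration of the distinction such an analysis would draw; the function names are illustrative, not from any existing patch.)

// Call position only: every occurrence of "inner" is a callee, so no
// program can tell whether a fresh clone is created per invocation.
function sumTo(n) {
  function inner(n, x) { return n > 0 ? inner(n-1, x+n) : x; }
  return inner(n, 0);
}

// Escaping use: "inner" is returned and later compared, so the per-call
// cloning mandated by the spec is observable here.
function makeInner() {
  function inner() { return 1; }
  return inner;
}
print(makeInner() === makeInner());   // false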
(In reply to Marc Feeley from comment #3)
> I'm not clear on what you mean by "freezing" the baseline ICs. Can you
> elaborate?

Currently IonBuilder checks for external types by asking type inference. When we use the type inference functions, we implicitly freeze on the data they return. That way we can safely avoid guards while keeping a bound on the lifetime of a script. For example, if type inference says that the result of a getter is an int, we can avoid baking in a guard and instead add a constraint on the type set to ensure that no new types are added (freezing). As soon as a new type is added to the type set, we invalidate the code which assumed that it could avoid doing any guards. The invalidation process patches the return path of all running instances on the stack and prevents re-entry into the code.

If we extract this information from the baseline compiler, then we only get a StackTypeSet which is only known by IonBuilder. So even if we were to freeze on it, nothing would be able to modify its content, and so nothing would be able to invalidate the script through this TypeSet. Currently it is unsafe to rely on the information provided by the baseline ICs, as we cannot invalidate the code when that information becomes wrong.

(In reply to Marc Feeley from comment #5)
> This argument is weak because a trivial analysis shows that f1 is never used
> in a == or === test (simply by observing that f1 is only used in the
> function position of calls).

Sadly JavaScript is a very expressive language in which we can "inspect" the call stack … so one can do "f.caller == g.caller" to compare multiple calls to f1 without naming it directly. We can potentially do the same thing as we do for f.arguments: disable the optimization on f1 if f1 is ever accessed through f.caller (in addition to the static analysis that you are suggesting). This is wrong the first time we do it in the JIT, but as long as we have a good fallback path which can emulate this information, this would be good enough.

Welcome :)
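(Editorial note: a standalone sketch of the caller-inspection hazard described above. The probe helper and test1Probe are hypothetical names, and Function.prototype.caller is non-standard and unavailable in strict mode.)

var seen = [];

function probe() {
  // Non-standard: records whichever function invoked probe().
  seen.push(probe.caller);
}

function test1Probe(n) {
  function f1(n, x) {
    probe();
    return n > 0 ? f1(n-1, x+n) : x;
  }
  return f1(n, 0);
}

test1Probe(1);
test1Probe(1);
print(seen[0] === seen[1]);  // true: same clone within one test1Probe call
print(seen[0] === seen[2]);  // false: a different clone of f1 per call,
                             // observed without ever naming f1 outside test1Probe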
Assignee: general → nobody
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → INCOMPLETE