Closed Bug 1483869 Opened 6 years ago Closed 1 year ago

Consider rewriting Function.prototype.bind

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking


RESOLVED FIXED
112 Branch
Tracking Status
firefox112 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 6 open bugs)

Details

(Whiteboard: [sp3])

Attachments

(12 files)

48 bytes, text/x-phabricator-request
Details | Review
Our self-hosted Function.prototype.bind implementation has a number of issues:

a) Performance: we optimize some common few-argument cases, but there are perf cliffs, especially when there are many (bound) arguments or when we bind() more than a few different functions (because type information will be bad).

b) Correctness: bound functions are pretty different from normal functions in the spec. Like proxies, when we call a bound function, we should not switch to the bound function's realm (bound functions don't have a realm per spec) - this is bug 1297179. Same-compartment realms will fix this for proxies, but we can/should also fix this for bound functions.

I think the following design should work:

1) When binding a scripted function (or a bound function for one), we create an "optimized bound function", a JSFunction* that is native but (just like Wasm functions!) has a JIT entry. This JIT entry will point to a per-runtime bound-function trampoline where we add the bound arguments and call the bound function. This ensures calling (scripted) bound functions in JIT code will be very fast.

2) When binding something that's not a scripted function (native function, proxy, etc), we create an object that's not a JSFunction (we could call it NativeBoundFunction or UnoptimizedBoundFunction). This object's Class will have call/construct hooks and will do everything in C++. This will make it a lot easier to fix b).

3) Inlining bound function calls in Ion works with the current implementation, but only when there are a few different bound function targets. I think the new design could do this for 1) by using Baseline IC information; it would be more robust.
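To make the intended semantics concrete, here is a hypothetical userland sketch (names like myBind are made up for illustration; this is not the engine code) of the behavior both the JIT-entry path in 1) and the C++ call/construct hooks in 2) have to preserve:

```javascript
// Hypothetical userland approximation of bound-function semantics.
// Real bound functions also get the right .length, .name, and [[Prototype]].
function myBind(target, boundThis, ...boundArgs) {
  function bound(...args) {
    if (new.target !== undefined) {
      // Constructor calls ignore boundThis; bound args are still prepended.
      // Per spec, when newTarget is the bound function itself, the target
      // is substituted so .prototype comes from the right constructor.
      const nt = new.target === bound ? target : new.target;
      return Reflect.construct(target, [...boundArgs, ...args], nt);
    }
    return Reflect.apply(target, boundThis, [...boundArgs, ...args]);
  }
  return bound;
}

function add(x, y, z) { return x + y + z; }
const add1 = myBind(add, null, 1);
console.log(add1(2, 3)); // 6, same as add.bind(null, 1)(2, 3)
```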
See Also: → 1483871
(In reply to Jan de Mooij [:jandem] from comment #0)
> 2) When binding something that's not a scripted function (native function,
> proxy, etc), we create an object that's not a JSFunction (we could call it
> NativeBoundFunction or UnoptimizedBoundFunction). This object's Class will
> have call/construct hooks and will do everything in C++. This will make it a
> lot easier to fix b).

Actually this could just be the same native function but without a JIT entry - when the JIT calls a JSNative we always know the target so it's easy enough to not do the realm-entering for the bound function case.

One annoying thing here is that we probably need a new function AllocKind with 4 reserved slots (target, boundThis, boundArgs, length).
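Those four slots back directly observable state. A quick illustration of the spec-mandated length and name (nothing implementation-specific):

```javascript
// .length is max(0, target.length - number of bound args), and .name
// gets a "bound " prefix; both are computed when bind() runs.
function f(a, b, c) { return a; }
const g = f.bind(null, 1);
console.log(g.length); // 2
console.log(g.name);   // "bound f"
```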
I for one would be happy to see the atrocities in builtin/Function.js go!

Jan, have you looked at the various optimizations V8 did around this early last year to see if there's something in there that we could apply, too? I remember that they had multiple steps to their optimizations. I don't know if any of them are relevant to our architecture, however.
Maybe I misunderstood a) to only refer to invoking bound functions, but we have to make sure .bind() calls are also fast. I seem to remember a lot of frameworks using that function unreasonably often.
(In reply to Tom Schuster [:evilpie] from comment #3)
> Maybe I misunderstood a) to only refer to invoking bound functions, but we
> have to make sure .bind() calls are also fast. I seem to remember a lot of
> frameworks using that function unreasonably often.

Oh, that is a good point! I spent a ton of time optimizing .bind() itself for the self-hosted implementation. I doubt that it'd be too hard to do as well or better perf-wise with a native implementation, though: we do still have a call to a JSNative in there, after all.
(In reply to Till Schneidereit [:till] from comment #4)
> we do still have a call to a JSNative in there, after all.

All parts of |FunctionBind| should run in jitted Assembler code nowadays, including the call to |_FinishBoundFunctionInit|.
Priority: -- → P3
(In reply to André Bargull [:anba] from comment #5)
> All parts of |FunctionBind| should run in jitted Assembler code nowadays,
> including the call to |_FinishBoundFunctionInit|.

Oh, good—I hadn't realized that!
(In reply to Tom Schuster [:evilpie] from comment #3)
> Maybe I misunderstood a) to only refer to invoking bound functions, but we
> have to make sure .bind() calls are also fast. I seem to remember a lot of
> frameworks using that function unreasonably often.

I expect Ion will be able to inline the "create a bound function" intrinsic, similar to _FinishBoundFunctionInit. It should actually be a bit faster in Baseline/Interpreter/Ion because we no longer need to create a separate CallObject.
I've been experimenting a bit with this, and it can be a pretty massive win, especially for Baseline. The micro-benchmark below improves from 752 ms to 110 ms with --no-ion. If I pass 10 arguments to |h|, the numbers are 1614 ms before, 130 ms after.

With Ion enabled it's still slower (8 ms => 37 ms) because we need to fix inlining. One option is to have bind() always return singletons, but that's not great when bind is used in a loop. I think ideally we would have a (canonicalFunction + numBoundArgs) => ObjectGroup table, but I don't like the complexity of that. In any case, regressing the current best-case Ion perf a bit to fix other perf cliffs might be worth it.

I'll see if I can get this stable enough tomorrow to get some Talos (Speedometer) numbers.

function g(x, y, z) {
    return x + y + z;
}
function f() {
    var h = g.bind(null); // no extra bound arguments
    var res = 0;
    var t = new Date;
    for (var i = 0; i < 3000000; i++)
        res = h(1, 2, 3);
    print(new Date - t); // |print| is a JS shell builtin
    print(res);
}
f();
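A hypothetical companion benchmark (not from the original comment; g2 and stressBindAllocation are made-up names) that stresses bind() allocation itself, the bind-in-a-loop pattern discussed above, rather than repeated calls through one binding. It uses Date.now/console.log so it also runs outside the JS shell:

```javascript
// Allocates a fresh bound function on every iteration, so it measures
// the cost of bind() itself plus one call through each new binding.
function g2(x, y, z) { return x + y + z; }
function stressBindAllocation() {
  var res = 0;
  var t = Date.now();
  for (var i = 0; i < 1000000; i++) {
    var h = g2.bind(null, 1);
    res = h(2, 3);
  }
  console.log(Date.now() - t);
  return res;
}
console.log(stressBindAllocation()); // 6
```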
Severity: normal → S3
See Also: → 1812316
Blocks: 1812316
See Also: 1812316

This is just removing the JIT inlining code, to simplify the next patch.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

This patch rewrites Function.prototype.bind. Instead of using a self-hosted
implementation that uses JS lambda functions, use a separate BoundFunctionObject
type and JSClass with call/construct hooks.

Using a type other than JSFunction matches the spec (and other implementations)
better. It also better matches the WrappedFunctionObject implementation we use for
shadow realms, and it lets us remove some complexity from JSFunction.

One of the main reasons for doing this is that it uses less memory: the self-hosted
implementation allocated a JSFunction, a CallObject, and an ArrayObject (for bound
arguments). The new implementation allocates only a BoundFunctionObject when we're
binding <= 3 arguments (the common case).

This patch stack improves Speedometer's React-TodoMVC by a few percent. This test
allocates a lot of bound functions but has relatively few calls, so this is likely
due to reduced GC pressure.

The new implementation has a few minor differences in various corner cases with toSource
and the debugger. Another patch in this stack changes a few tests to account for this.

Depends on D170565

Now that bound functions use a JSClass that's not JSFunction, we need to give
them a JSProtoKey to make (xray) wrapper calls work. This fixes a lot of xpcshell
and browser test failures.

The old implementation used JSProto_Function. This patch adds JSProto_BoundFunction
and (as much as possible) tries to use the same code paths for it.

Depends on D170567

  • Debugger-onNativeCall-07.js: the old Debugger.Object.name implementation was incorrect for bound functions. We now correctly get "bound call" instead of just "call" without the prefix.

  • debug/Object-boundTargetFunction-02.js, debug/Object-script-lazy.js: isBoundFunction used to return a boolean only when isFunction was true; it now always returns a boolean, so we get false instead of undefined.

  • heap-analysis/byteSize-of-object.js: bound functions are now a bit bigger, but we don't have a separate CallObject/ArrayObject anymore.

  • saved-stacks/capture-first-frame-with-principals.js: just use a different self-hosted function (array.map).

  • non262/extensions/uneval/function-bind.js: for non-standard and deprecated toSource, we now always return the same string. This should be okay because we no longer expose this to content.

Depends on D170568

Similar to the old implementation, add some fast paths for the common cases.

Depends on D170569

We almost always have Function.prototype as proto. This fast path optimizes
that case.

Depends on D170570

This cache has a high hit rate so it's worth it to avoid some slow(er) StringBuffer
and atomization code.

Depends on D170572
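For context on why a cache pays off here: the "bound " prefix is applied per bind, so chains accumulate prefixes and the concatenation runs on every bind() call. A quick illustration of the spec behavior:

```javascript
// Each bind prepends another "bound " to the name, so the prefixing
// (and hence the cache) is exercised once per bind, including chains.
function f() {}
const once = f.bind();
const twice = once.bind();
console.log(once.name);  // "bound f"
console.log(twice.name); // "bound bound f"
```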

In Ion we can now allocate the object in JIT code so the VM function only
has to initialize it. This is faster than allocating the object in C++.

Depends on D170573

For now we only handle standard non-constructor calls.

Depends on D170574

List of follow-up optimization ideas I'll file bugs for and look into after this lands:

  • In BoundFunctionObject::call, reuse incoming args if we're only binding |this|. Needs measurement, but probably common.
  • If a bind() site is always binding a function with the same shape/name etc, we can probably avoid a VM call.
  • In tryAttachBoundFunction, try to do a shape guard instead of a class guard with the optimistic class guard optimization.
  • For megamorphic bound calls, consider using a loop that handles any numBoundArgs. Currently harder to implement in Ion.
  • Bound function state is immutable. We could communicate this to MLoadFixedSlot so the JIT can optimize this better.
  • Support constructor calls to bound functions in tryAttachCallHook and for call-bound-scripted.
  • Look into bound-scripted FunApply, this shows up in Speedometer/React.
  • Investigate trial-inlining for standard calls.
  • Consider flattening chains of bound functions to enable easier call optimizations. React-TodoMVC has a lot of this.
  • See if scalar replacement of bind would help.
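On the flattening idea specifically: chained bound functions are observably equivalent to a single flattened binding, because only the innermost boundThis survives and bound args concatenate in bind order. A small illustration (base and ctx are made-up names):

```javascript
// twice behaves exactly like the flattened base.bind(ctx, 1, 2):
// the outer bind's thisArg is ignored and its args go after the inner ones.
function base(a, b, c) { return [this, a, b, c]; }
const ctx = { tag: "ctx" };
const once = base.bind(ctx, 1);
const twice = once.bind({ tag: "ignored" }, 2);
console.log(twice(3)); // this === ctx; args are [1, 2, 3]
```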
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2977375e714e
part 1 - Remove FinishBoundFunctionInit optimization. r=iain
https://hg.mozilla.org/integration/autoland/rev/634f0d65b4ad
part 2 - Rewrite bound function implementation. r=iain
https://hg.mozilla.org/integration/autoland/rev/c5e8334fc103
part 3 - Remove now empty builtin/Function.js file. r=iain
https://hg.mozilla.org/integration/autoland/rev/6d2e26b555f2
part 4 - Make wrappers work correctly with bound functions. r=iain,peterv
https://hg.mozilla.org/integration/autoland/rev/0b80b352475a
part 5 - Fix shell tests. r=iain
https://hg.mozilla.org/integration/autoland/rev/f09bfbdf49f2
part 6 - Optimize initialization of .length and .name. r=iain
https://hg.mozilla.org/integration/autoland/rev/f802273d0d57
part 7 - Optimize bound function allocation. r=iain
https://hg.mozilla.org/integration/autoland/rev/0b4fddb14c07
part 8 - Inline and tidy up some functions. r=iain
https://hg.mozilla.org/integration/autoland/rev/6ccef534e6ac
part 9 - Add cache for AppendBoundFunctionPrefix. r=iain
https://hg.mozilla.org/integration/autoland/rev/70e235a3de72
part 10 - Optimize Function.prototype.bind in the JITs. r=iain
https://hg.mozilla.org/integration/autoland/rev/985db4d90a4e
part 11 - Optimize calls to bound scripted functions in Baseline. r=iain
https://hg.mozilla.org/integration/autoland/rev/443570c2740f
part 12 - Transpile calls to bound functions. r=iain
Regressions: 1819090
Regressions: 1819651

For posterity, this improved Speedometer's React-TodoMVC/CompletingAllItems by ~5-6% (?) and also improved some other React and React-Redux subtests a bit. Matrix-React got a little faster as well (clearer if you look at the 50/100 iterations subtests on AWFY).

(In reply to Jan de Mooij [:jandem] from comment #21)

List of follow-up optimization ideas I'll file bugs for and look into after this lands:

  • If a bind() site is always binding a function with the same shape/name etc, we can probably avoid a VM call.

Jeff and Markus have profile data that indicates that this could be a valuable optimization. We're spending a solid chunk of ReactTodoMVC in the shell in FunctionBindImpl; if we can make that go away by precomputing and storing the name/etc and initializing the bound function in jit code, Jeff estimates that it would cut the difference between us and V8 by 40%.

(More discussion in the sp3 channel.)

(In reply to Iain Ireland [:iain] from comment #25)

Jeff and Markus have profile data that indicates that this could be a valuable optimization. We're spending a solid chunk of ReactTodoMVC in the shell in FunctionBindImpl; if we can make that go away by precomputing and storing the name/etc and initializing the bound function in jit code, Jeff estimates that it would cut the difference between us and V8 by 40%.

Here's where V8 does something similar by lowering JSCreateBoundFunction:
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/js-create-lowering.cc;l=880;drc=6bacc8c983672cddc6d9e043472925cd60c27464

Whiteboard: [sp3]
Blocks: 1820763

Jan, while we did not observe a significant speedup in the DevTools React frontend, we noticed a significant memory improvement.

reload-inspector:content-process memory is especially interesting, as it translates to a significant reduction in memory usage of the DevTools Inspector while reloading the test page. It surely has a notable impact on the typical developer workflow of reloading a page with the inspector open!
All other "object-with-stacks" tests are reporting a reduction in objects potentially being leaked in many tests.
I imagine that you reduced the number of internal JS objects related to bound functions? Or fixed an internal leak?

== Change summary for alert #37361 (as of Mon, 27 Feb 2023 18:54:03 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
24% reload-netmonitor:content-process objects-with-no-stacks linux1804-64-tsan-qr 35.71 -> 44.42
10% damp console.log-in-loop-content-process-longstring macosx1015-64-shippable-qr e10s fission stylo webrender-sw 46.80 -> 51.25
7% damp complicated.inspector.close.DAMP windows10-64-shippable-qr e10s fission stylo webrender-sw 16.33 -> 17.42
6% damp complicated.inspector.close.DAMP windows10-64-shippable-qr e10s fission stylo webrender 16.30 -> 17.24
2% damp console.log-in-loop-content-process-array windows10-64-shippable-qr e10s fission stylo webrender 137.19 -> 139.99

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
100% reload-debugger:content-process objects-with-no-stacks linux1804-64-tsan-qr 4.00 -> 0.00
69% reload-inspector:parent-process objects-with-stacks linux1804-64-qr 94.93 -> 29.00
68% toolbox:parent-process objects-with-stacks linux1804-64-shippable-qr 8.92 -> 2.83
67% reload-inspector:parent-process objects-with-stacks linux1804-64-tsan-qr 89.00 -> 29.00
67% target:parent-process objects-with-stacks linux1804-64-qr 3.00 -> 1.00
... ... ... ... ...
2% reload-no-devtools:parent-process objects-with-no-stacks linux1804-64-qr 3,316.64 -> 3,234.67

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=37361

(In reply to Alexandre Poirot [:ochameau] from comment #28)

All other "object-with-stacks" tests are reporting a reduction in objects potentially being leaked in many tests.
I imagine that you reduced the number of internal JS objects related to bound functions? Or fixed an internal leak?

Probably the former: we used to allocate multiple (3 or so) JS objects for a bound function and now we allocate just one. The numbers seem to reflect that.

Thanks for posting this, very interesting :)

Blocks: 1355096
Blocks: 1824085
Blocks: 1824088
Blocks: 1816838
Regressions: 1838587
Blocks: 1815823
Blocks: 1816220