Closed Bug 1000780 Opened 10 years ago Closed 8 years ago

Self-host Function.prototype.bind

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: jandem, Assigned: till)

References

Details

Attachments

(8 files, 11 obsolete files)

4.47 KB, application/x-javascript
Details
1.07 KB, text/plain
Details
1.08 KB, text/plain
Details
3.84 KB, patch
jandem
: review+
Details | Diff | Splinter Review
2.93 KB, patch
jandem
: review+
Details | Diff | Splinter Review
6.53 KB, patch
till
: review+
Details | Diff | Splinter Review
2.56 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
69.58 KB, patch
jandem
: review+
Details | Diff | Splinter Review
Our current fun.bind implementation is really inefficient. Both creating and calling bound functions is slow (calling a C++ native that calls back into JS)

With GGC this became even worse, see bug 945555.

Both V8 and JSC self-host function.bind, so it's definitely possible. This should at least avoid the JS -> C++ -> JS calls. I'd like to prototype this and see how it turns out.
Till was interested in doing this so assigning this bug to him.
Assignee: jdemooij → till
Depends on: 889456
Attached patch The not-selfhosted way (obsolete) — Splinter Review
So while doing this in self-hosted code, might look like a way to challenge the self-hosted infrastructure. I don't think it is a requirement. E.g. this would be a way to do in using the inlineNativeCall infrastructure. (Just like FUNCALL/FUNAPPLY/some Setters/some Getters)

My request for feedback is not on the patch itself. But to know it make sense to land this, before landing a selfhosted version? This patch can land almost immediately. No need for extra opcodes in baseline/ionmonkey.

And we still can do the selfhosted version, since it was whispered in the first comment it would also speedup .bind() function itself. This patch only increases speed of calling a function that has been "bound" to normal speed.

given "var f1 = foo.bind(undefined);"

"f1(1,2,3);" is now as fast as "foo(1,2,3)"

So just requesting feedback if I need to look to land this already. Or if it wouldn't make sense...
Attachment #8412602 - Flags: feedback?(till)
Comment on attachment 8412602 [details] [diff] [review]
The not-selfhosted way

Review of attachment 8412602 [details] [diff] [review]:
-----------------------------------------------------------------

A patch this simple? Land it already! Seriously, getting this in before the uplift would be nice, if it doesn't regress anything. Also, this might be useful even with a self-hosted version (see below).

I have a WIP of a self-hosted version, but I'm not entirely sure if it'll be ready before the next uplift. Preliminary numbers are pretty encouraging, though: Performance of the binding operation increases 20-fold(!) for all use cases, and invocation performance changes along the following lines:

- result of .bind(thisArg) called without args: 5x faster
- result of .bind(thisArg) called with up to 5 args: 3x faster
- result of .bind(thisArg, 1, 2, 3, 4, 5) (or less args) called without args: 2.2x _worse_
- result of .bind(thisArg, 1, 2, 3, 4, 5) (or less args) called with up to 5 args: 4.5x _worse_

We should be able to just forward the last two cases to the current native implementation, so we should get no regressions in the worst case and very nice improvements in the best case.

The WIP is very rough, and it is theoretically possible that I'm overlooking some correctness issue fixing which would introduce overhead, but I don't believe that's the case.
Attachment #8412602 - Flags: feedback?(till) → feedback+
Results for WIP patch I'll attach in a moment:

--- binding performance ---
Native bind, 0 args: 165.385986328125
Self-h bind, 0 args: 6.8369140625
Native bind, 5 args: 135.77197265625
Self-h bind, 5 args: 7.486083984375
--- calling performance ---
call native, 0,0 args: 11.538818359375
call self-h, 0,0 args: 2.615966796875
call native, 2,3 args: 12.444091796875
call self-h, 2,3 args: 3.988037109375
call native, 0,5 args: 11.9580078125
call self-h, 0,5 args: 2.47412109375
call native, 5,0 args: 12.2880859375
call self-h, 5,0 args: 3.634033203125
This WIP patch mostly works, but has some severe issues still. I'm sharing it to discuss potential solutions for those issues.

Abstract: issues include not properly dealing with exceptions thrown in bound function, binding performance being negatively affected by making the bound function constructible, not having a way of detecting whether the bound function is called as a ctor, and not having a fast way of changing the bound function's name.

As shown in comment 4, results (for less than 8 arguments in total) with this patch are excellent: performance of .bind itself improves ~20x, that of calling the bound function ~3-4x. I believe that this should at least be very close to final performance for the cases of not having any bound arguments (but potentially arguments to the bound function) and having bound arguments, but no arguments to the bound function. For the case where both bound args and args to the bound function are present, things just might not work out.

The reason is that, to get around creating a temporary array, I re-use an array that's created once during self-hosting initialization (and then cloned over into each compartment upon first usage). This works nicely, but it's invalid in the general case: the bound function could throw an exception, preventing cleanup of this array. It has to have its length set to 0 to ensure that it doesn't keep any of the arguments alive. One thing we could do is to catch these exceptions and introduce a mechanism to rethrow them with the original source info in place. I tend to think that this might just not be worth it, under the assumption that it would very rarely apply, and even then, probably not in performance-sensitive code.

The next issue is invocation of bound functions as ctors: the patch doesn't contain it, but it's possible to make self-hosted functions constructible using the MakeConstructible intrinsic. That isn't inlined and hence rather slow: the binding operation takes roughly 4x as long. That's still much better than what we have now, but I think we can do something very simple: bound functions, while constructible, don't have a .prototype property. That means we can just set the isSelfHostedConstructor flag on the function, and deal with doing the actual construction work inside the self-hosted code. I guess just setting that flag should be easily inlineable, and hence not cost us any meaningful amounts of performance.

Then there's the bit about detecting whether the bound function is called as a ctor. Jandem, you said that that is fairly slow in Ion right now, but you could make it faster. I can introduce an intrinsic for it, but having that inlineable in the JIT, and fast at that, would be very valuable.

Finally, we need to change the bound function's name to that of the original. This should be straight-forward in an intrinsic, but since we need it to be fast, we need to inline it, too. My guess is that that shouldn't be ridiculously hard, but I really don't know.
Attachment #8413428 - Flags: feedback?(jdemooij)
Depends on: 1002473
Comment on attachment 8413428 [details] [diff] [review]
Self-host Function.prototype.bind. wip

Looks great. We just discussed comment 5 on IRC so I'll just f+ this.
Attachment #8413428 - Flags: feedback?(jdemooij) → feedback+
Depends on: 1003149
This mostly works, with 22 jstests and 3 jit-tests failing. The two remaining known issues are:
- boundFunction.length isn't set to the correct value (which is originalFunction.length - number of bound arguments)
- `new BoundFunction() instanceof OriginalFunction` throws a typeerror: "'prototype' property of BoundFunction is not an object"

The first of these is hard to fix without fixing bug 911142: js::fun_resolve returns the number of parameters for the script, which we obviously can't change.

I have no idea yet what the second is about.


Additionally, performance is pretty abysmal with this version: to detect whether the bound function is invoked as a constructor, we have to create a ScriptFrameIter. Doing that throws us into a very slow path, and makes us go from about 3x-4x faster than the native bind to about 3x slower.

I'll put this down for a while and pick it up again once bug 1002473 is fixed.
Attachment #8413428 - Attachment is obsolete: true
Oh, but without the IsConstructing part, numbers are pretty good: ~15x faster bind operation and, as said above, 3x-4x faster calling of bound functions. These numbers should be pretty final with a fast IsConstructing.
Depends on: 911142
Comment on attachment 8412602 [details] [diff] [review]
The not-selfhosted way

This should probably go into a different bug. Mind filing one?
Flags: needinfo?(hv1989)
(In reply to Florian Bender from comment #9)
> This should probably go into a different bug. Mind filing one?

It did: bug 1001850
Flags: needinfo?(hv1989)
Depends on: 1084019
Blocks: 1093158
I think this is good to go, except for being blocked by bug 911142. I'll request a review for just that part once it works.

Numbers look pretty good:
- binding performance 6 to 8 times better than the native implementation
- calling performance within 20% of unbound and hence 2 to 3 times better than before
- construction performance within 25% of before
- stack space usage way down: the test case from bug 1093158 results in about 10 times more iterations than before, which is roughly half of the unbound case

Some foot notes on these results:
I optimized what I think should be the vast majority of use cases by throwing lots of gnarly switch statements and almost-identical functions at them. Specifically, the cases where 0,1, or 2 arguments are bound get their own bound function. Each of these in turn has its own set of switch statements optimizing the cases where the bound function is called with 0-5 arguments. "Sets of switch statements" because two are needed: one for the function being invoked as a ctor, one for normal calls.

Functions with more than 2 bound arguments or more than 5 call arguments are still optimized when called without any arguments and are roughly as fast as the unbound version (for calls) or only slightly slower than the bound version before this patch (for ctors).

Cases not covered by these optimizations are slower, but not devastatingly so:
- calling a function bound with 2 arguments, called with 3 is about as fast as calling an unbound one with 5 arguments; one with 3 bound arguments, called with 2 is about 120% slower (which means it's about 20% slower than without this patch)
- same cases but for constructing: about 300% slower than unbound, 100% slower than before this patch

We don't have telemetry, but I'm pretty confident that these slower cases are almost never hit in practive.


A few interesting observations:

I had to move away from using rest args and destructuring calls in most places because just touching anything (like the length) on rest args or using a destructuring call in any branch of a function, even one that's never hit, induces substantial performance hits. I haven't been able to figure out why that is: no IONFLAGS gave me any insights. That's probably because I'm too unfamiliar with reading their output, of course.

Using bound functions as constructors is weirdly slow. I again haven't been able to figure out what's causing this, and haven't been able to create a usefully reduced test case, either. Would be great if someone with more JIT experience could look into that, as I think these cases shouldn't be any slower than the call cases (or, don't have more of an overhead over the unbound case, rather).

There's a performance fault in the current bind implementation that causes calls with 0 (bound or call) arguments to be slower than calls with any (bound or call) arguments. Should something prevent this patch from landing soon, it might be worth looking into that.
Attachment #8517438 - Flags: review?(jdemooij)
Attachment #8414493 - Attachment is obsolete: true
Comment on attachment 8517438 [details] [diff] [review]
Self-host Function.prototype.bind

Review of attachment 8517438 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks for doing this! Can you file bugs for remaining performance issues and anything that's blocking us from simplifying the code?

::: js/src/builtin/Function.js
@@ +7,5 @@
> +        ThrowError(JSMSG_INCOMPATIBLE_PROTO, 'Function', 'bind', this);
> +    var boundFun;
> +    switch (boundArgs.length) {
> +        case 0:
> +            boundFun = bind_bindFunction0(this, thisArg);

We don't indent switch cases with 2 spaces instead of 4, like we do in C++ code?

@@ +72,5 @@
> +        var callArgsCount = arguments.length;
> +        var args = NewDenseArray(callArgsCount);
> +        for (var i = 0; i < callArgsCount; i++)
> +            UnsafePutElements(args, i, arguments[i]);
> +        if (_IsConstructing)

This should be a call, _IsConstructing(). Maybe we need some tests for this if nothing caught it.

@@ +118,5 @@
> +        var args = NewDenseArray(callArgsCount + 1);
> +        UnsafePutElements(args, bound1);
> +        for (var i = 0; i < callArgsCount; i++)
> +            UnsafePutElements(args, i + 1, arguments[i]);
> +        if (_IsConstructing)

Same here.

@@ +164,5 @@
> +        var args = NewDenseArray(callArgsCount + 2);
> +        UnsafePutElements(args, bound1, bound2);
> +        for (var i = 0; i < callArgsCount; i++)
> +            UnsafePutElements(args, i + 2, arguments[i]);
> +        if (_IsConstructing)

Same here.

@@ +193,5 @@
> +            return bind_callFunctionInScopeN(fun, thisArg, combinedArgs);
> +    };
> +}
> +
> +function bind_callFunctionInScopeN(fun, thisArg, a) {

What does the "InScope" part mean? I was wondering why this function has InScope in the name and constructFunctionN doesn't.

::: js/src/jsfun.cpp
@@ -1674,5 @@
>  #endif
>      JS_FN(js_toString_str,   fun_toString,   0,0),
>      JS_FN(js_apply_str,      js_fun_apply,   2,0),
>      JS_FN(js_call_str,       js_fun_call,    1,0),
> -    JS_FN("bind",            fun_bind,       1,0),

We can remove fun_bind itself and a lot of other code, but we can do that in a follow-up patch, of course.
Attachment #8517438 - Flags: review?(jdemooij) → review+
Blocks: 1097976
Till, any idea when you might get back to this?
Flags: needinfo?(till)
This mostly works and supports retrieving the target function, bound receiver, and the bound arguments for the debugger.

It fails in jit-test/tests/basic/bug630865-5.js: that test adds a `prototype` member on the bound function and then constructs the function. We're supposed to look up the `prototype` member on the target function, not the bound one, but because we don't special-case bound functions anymore, we do the latter in js::CreateThisForFunction called by RunState::maybeCreateThisForConstructor.

I'll attach a second patch next that fixes that, but introduces a new issue.
Attachment #8517438 - Attachment is obsolete: true
With this patch, the issue described in the previous comment is fixed, but a new one occurs: in RunState::maybeCreateThisForConstructor we now special-case bound functions and resolve their target function before continuing. This works if the target function is interpreted. It doesn't work, however, if the target function is native: we get an exception in JSFunction::getOrCreateScript, with the following stack:

0 JSFunction::getOrCreateScript
1 CreateThisForFunctionWithProto
2 CreateThisForFunction
3 RunState::maybeCreateThisForConstructor

Which makes sense, because all of this expects an interpreted function but we now have a native one. I'm not sure how to change this, and where it should be changed.
Flags: needinfo?(till)
Attachment #8535894 - Flags: feedback?(jdemooij)
Comment on attachment 8535894 [details] [diff] [review]
part 2: Special-case bound functions in RunState::maybeCreateThisForConstructor (will be merged into part 1)

Review of attachment 8535894 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Till Schneidereit [:till] from comment #15)
> Which makes sense, because all of this expects an interpreted function but
> we now have a native one. I'm not sure how to change this, and where it
> should be changed.

The bound function doesn't really care about its |this| value in the _IsConstructing() case, right? I think CreateThisForFunction can just return an arbitrary object in this case, like |callee| or the global. The bound function will then use "new target()" and construct the real |this|.
Attachment #8535894 - Flags: feedback?(jdemooij)
Drive-by comment:
I haven't read the patches, so I don't know if the approach is different, but someone has recently found a subtle bug in the F.p.bind polyfill that was available on MDN related to native functions which don't have a 'prototype' property. Analysis can be found here https://gist.github.com/jacomyal/4b7ae101a1cf6b985c60

Maybe just something to be aware of. Perhaps add a test for this case if there isn't one yet.
(In reply to David Bruant from comment #17)
> Drive-by comment:
> I haven't read the patches, so I don't know if the approach is different,
> but someone has recently found a subtle bug in the F.p.bind polyfill that
> was available on MDN related to native functions which don't have a
> 'prototype' property. Analysis can be found here
> https://gist.github.com/jacomyal/4b7ae101a1cf6b985c60
> 
> Maybe just something to be aware of. Perhaps add a test for this case if
> there isn't one yet.

Thanks for the heads-up. Luckily, this being self-hosted code we can do slightly more about this.

Specifically, `instanceof` is supposed to use the target function's `prototype` property. That isn't entirely straight-forward to do without giving the bound function a prototype, too. That, however, is not what we're supposed to do: `'prototype' in (function(){}).bind(null) === false`

This is supported internally by special-casing and essentially unwrapping the target function in the `instanceof` operator.

Coincidentally, the last few comments here are about issues with precisely that: when constructing a scripted function, the engine enters a path where it allocates an object for the constructor to run on. For bound non-scripted functions, this caused crashes because that path doesn't expect the constructor to be non-scripted. Which is only relevant if .bind is self-hosted and thus scripted, of course.

All in all, we've got this covered, but it complicates things a bit.
@jandem, this got long, so just so you know: there are two questions hidden at the very end :)

Ok, so AFAICT this works in all aspects. However, there are issues.

First the good news:

Binding performance is about 2.5x better than with the current implementation.

Call performance in the optimized cases is about on par or slightly better. In any case, there's not too much headroom there compared to unbound function calls.

Construction in the optimized cases is about 2x better and can be improved.


Now the bad news:

Binding performance is about 3x worse than it could be. The reason is that setting the `length` property is slow. In the attached patch, that happens in intrinsic_FinishBoundFunctionInit in SelfHosting.cpp, because that's faster than doing it in JS. You can compare by commenting out lines 257-261 in SelfHosting.cpp and activating lines 46-47 in builtin/Function.js. I think the best solution here (which would be really nice for other usages, too) is to inline intrinsic_DefineDataProperty.

Call performance falls off a very deep cliff if the optimized path isn't hit. As an example, calling a function with 2 bound arguments and 7 call arguments takes 360ms, compared to 13ms with the current implementation. A slowdown of more than 25x seems hardly acceptable to me, even for cases we think will be rare.

Construction performance falls off a cliff, too. This one isn't quite as steep at only 10x worse, but I think that's largely because construction isn't as fast in the optimized cases.

Which brings me to: construction is way slower than it could be because we currently create a bogus object in CreatethisForFunction (and presumably inlined code in the JIT somewhere). It would be great if we could instead not ever create this object, obviously.


Now about that performance cliff. What's happening there is that we create a temporary dense array, copy bound and call arguments into it and then call either Function#apply or _ConstructFunction, a new intrinsic.


Jandem, do you think there's anything we can optimize about this? Perhaps by adding an inlined intrinsic for calling/constructing with arguments passed in as two separate lists?

Also, how much effort do you think inlining _DefineDataProperty will be? I think there was a WIP patch for that somewhere, but can't find it at the moment.
Attachment #8537217 - Flags: feedback?(jdemooij)
Attachment #8535655 - Attachment is obsolete: true
Ok, I just realized that it might make sense to slightly repurpose CallOrConstructBoundFunction for the slow path. I'll look into that, and then we can see about maybe inlining that.
Sorry for the delay.

(In reply to Till Schneidereit [:till] from comment #19)
> Binding performance is about 2.5x better than with the current
> implementation.

Nice!

> Binding performance is about 3x worse than it could be. The reason is that
> setting the `length` property is slow. In the attached patch, that happens
> in intrinsic_FinishBoundFunctionInit in SelfHosting.cpp, because that's
> faster than doing it in JS. You can compare by commenting out lines 257-261
> in SelfHosting.cpp and activating lines 46-47 in builtin/Function.js. I
> think the best solution here (which would be really nice for other usages,
> too) is to inline intrinsic_DefineDataProperty.

I think we had a similar situation with the RegExp match result object, it's an array with "index" and "input" properties and defining those each time was slow. What we do there is we have a template object (see createMatchResultTemplateObject) that we clone in CreateRegExpMatchResult. Then setting the properties is simply storing a Value to a slot and that's easy to inline... Could we do something similar here?

Inlining DefineDataProperty seems hard but I'd have to investigate what the fast path needs to do exactly.

(In reply to Till Schneidereit [:till] from comment #20)
> Ok, I just realized that it might make sense to slightly repurpose
> CallOrConstructBoundFunction for the slow path. I'll look into that, and
> then we can see about maybe inlining that.

OK, hopefully that will address the second problem :)
Comment on attachment 8537217 [details] [diff] [review]
Self-host Function.prototype.bind

Review of attachment 8537217 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Function.js
@@ +33,5 @@
> +    var L;
> +    // Step 7.
> +    if (targetHasLength) {
> +        // Steps a-d.i.
> +        var targetLen = target.length|0;

ToInteger(arg) is not equivalent to arg|0.

js> f=Object.defineProperty(a=>a, "length", {value:1/0})
a=>a
js> f.length
Infinity
js> f.bind().length
0 // should be Infinity
Flags: needinfo?(till)
(In reply to ziyunfei from comment #22)
> ToInteger(arg) is not equivalent to arg|0.

Thanks, good catch.

Note that this wouldn't have mattered before bug 911142 landed as we don't have functions with real parameter counts above 31bit. (Probably quite a bit less, in fact). With `Function#length` configurable, this becomes observable, so I'll fix it.
Flags: needinfo?(till)
Attachment #8537217 - Flags: feedback?(jdemooij)
No longer blocks: shumway-fb2
I spent quite a bit of time fixing issues with this and optimizing it some more. I'm not sure we can really land it in this state, but it's much better than before. Crucially, I changed things such that we don't create a superfluous object when constructing, setting the `this` value to JS_UNINITIALIZED_LEXICAL instead. This makes the optimized cases of constructing go from ~0.8x as fast as currently to about 5x as fast. (No, I don't know why the difference is this dramatic.)

On the v8 team's recent bind benchmark, this patch improves performance 10x, to 200ms, so that's pretty nice. My own benchmark, which tests quite a bit more stuff, shows pretty different numbers, and we do much better on it right now. I'm not sure what the fundamental difference is. The following numbers are all for that benchmark, which I'll attach next.

Binding performance is between 12% and 45% faster, depending on the number of bound arguments and whether we hit an optimized path or not. However, most time during binding is spent resolving the length property on the bound function. If we had a faster way to define that property, binding would be about 4x to 5x as fast as it is with this patch.

Unfortunately calling performance is about 0.55x as fast even in the best case (no bound arguments, no call arguments). This is for cases where we can currently inline the bound function, but there are many such cases. In the worst case, where we miss all fast paths and have to manually create an array joining the bound and call args, calls are 0.02x(!) as fast, i.e. my test case goes from 53ms to 2327ms.

The same pattern holds for constructing performance, but less bad: the non-optimized cases are 0.11x as fast (620ms to 5545ms).

One thing I realized is that we might be able to do much better here: we could in theory keep the same inlining optimization we have for the C++ implementation, by checking `isBoundFunction` in MCallOptimize, instead of checking the JSNative identity. However, this doesn't work right now, because we don't keep the exact JSFunction around in case of script-sharing clones. Instead, we only have the canonical function. It's possible to mark that as being a bound function, too, but then we still don't have the environment, which we need to retrieve the things (i.e., `this` and bound arguments) we closed over. If we can still get at the environment somehow, we could inline this and get the best of both worlds.


Finally, this doesn't yet pass all tests. We need a flag on JSFunction, which I hope efaust will free up in bug 1234702. Also, there are test failures around async stack retrieval, which I don't understand and will talk to devtools people about. I don't expect either of these issues to make too much of a difference to the fundamental setup or most of the actual implementation here.
Attachment #8703417 - Flags: feedback?(jdemooij)
Attachment #8537217 - Attachment is obsolete: true
The benchmark mentioned in the last comment. The gc() calls are in there mostly to ensure that the bound function target is tenured. For the self-hosted implementation that doesn't make a difference, but the current implementation fails to inline otherwise.
Attachment #8412602 - Attachment is obsolete: true
Attachment #8413425 - Attachment is obsolete: true
Attachment #8535894 - Attachment is obsolete: true
efaust just made me realize that I don't need to resolve the name immediately: instead, I can store it in extended slot 1, and change the resolve hook to special-case bound functions. That should speed up .bind() by ~4x without the need to do anything about the property definition being slow.
Comment on attachment 8703417 [details] [diff] [review]
Self-host Function.prototype.bind, v4

Some ways to improve performance:

(1) We can't inline functions containing arguments[i] in Ion. Maybe we can fix this by using apply to call a helper function.

(2) The slowest case (2 + 8 arguments) is slow because we don't inline the apply call. We emit JSOP_CALL for the callFunction(std_Function_apply...) but we really want JSOP_FUNAPPLY. A simple hack is to check funNode->name() in emitSelfHostedCallFunction, and emit JSOP_FUNAPPLY if needed.

(3) After that, the biggest bottleneck is growing the array in self-hosted code. We can fix that by replacing `var args = [];` with `var args = std_Array(callArgsCount + 2);`

Also, would it be possible to use a single, global array for this, instead of creating a new one on each call? We're passing the array to the builtin fun_apply, so we know it won't escape.
Attachment #8703417 - Flags: feedback?(jdemooij) → feedback+
(In reply to Jan de Mooij [:jandem] from comment #29)
> Also, would it be possible to use a single, global array for this, instead
> of creating a new one on each call? We're passing the array to the builtin
> fun_apply, so we know it won't escape.

A problem with this is that we want to empty it after the call, to avoid holding onto GC things, so we'd probably need a try-catch or something..
This means that self-hosted code can ensure that even codepaths that might not yet have been excercised have baked-in intrinsics by ensuring that the intrinsic has been cloned into the current compartment.
Attachment #8705998 - Flags: review?(jdemooij)
Waldo reviewed this in bug 1234702.
Attachment #8703417 - Attachment is obsolete: true
Attachment #8706000 - Flags: review+
Turns out the tests can be fixed much easier than we discussed yesterday by just changing them to not use Function#bind.
Attachment #8706001 - Flags: review?(nfitzgerald)
This needs to get a final try run once the trees reopen, but I think it should be ready to go.

Performance for the slow paths still isn't great, but much better than before, and the fast paths are really fast.

Final performance tally:
- Function#bind itself is 7x to 8x as fast (700-760ms to 80-110ms)
- Calling bound functions is
  - as fast as before for the cases optimized by both approaches
  - about 10x to 12x as slow for the cases that were optimized by the old approch and not the new one
    (10x for more than two bound args, and 12x for more than 9 args in total)
  - about 40x as fast for the cases that weren't optimized by the old approach, but are by the new one
    (e.g. the benchmark from http://benediktmeurer.de/2015/12/25/a-new-approach-to-function-prototype-bind/, which takes 100ms now)
- Constructing bound functions is
  - 5x as fast for the optimized cases
  - 7x as slow for the non-optimized cases
    (could be optimized with an ion-inlining for the _ConstructFunction intrinsic)
Attachment #8706003 - Flags: review?(jdemooij)
Fixes two asserts, sorry for the churn.
Attachment #8706112 - Flags: review?(jdemooij)
Attachment #8706003 - Attachment is obsolete: true
Attachment #8706003 - Flags: review?(jdemooij)
Attachment #8706001 - Flags: review?(nfitzgerald) → review+
Bah, yet another issue fixed. Try is closed right now, but I'm reasonably confident in this version. Changes are required becaues on 32bit Linux I had weird test failures, that I'm fairly hopeful won't be there anymore in the new version.

It even simplifies the JS code and makes a few cases faster.
Attachment #8706159 - Flags: review?(jdemooij)
Attachment #8706112 - Attachment is obsolete: true
Attachment #8706112 - Flags: review?(jdemooij)
Comment on attachment 8705998 [details] [diff] [review]
Part 1: Bake in already-cloned intrinsic functions even if the callsite doesn't have type information

Review of attachment 8705998 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #8705998 - Flags: review?(jdemooij) → review+
Comment on attachment 8705999 [details] [diff] [review]
Part 2: Emit JSOP_FUNAPPLY when using std_Function_apply in self-hosted code

Review of attachment 8705999 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/frontend/BytecodeEmitter.cpp
@@ +7230,3 @@
>          return false;
>  
>      checkTypeSet(pn->getOp());

Nit: use callOp here as well (although there's no difference between CALL and FUNAPPLY here).
Attachment #8705999 - Flags: review?(jdemooij) → review+
Comment on attachment 8706159 [details] [diff] [review]
Part 5: Self-host Function.prototype.bind. v7

Review of attachment 8706159 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice! Thanks for doing this.

We should add some of these benchmarks to AWFY-assorted, to make sure we don't accidentally regress bound function perf.

::: js/src/builtin/Function.js
@@ +256,5 @@
> +        return callContentFunction(fun, thisArg, a[0], a[1], a[2], a[3], a[4], a[5]);
> +      case 7:
> +        return callContentFunction(fun, thisArg, a[0], a[1], a[2], a[3], a[4], a[5], a[6]);
> +      case 8:
> +          return callContentFunction(fun, thisArg, a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);

Nit: indentation is off here and below (tabs?)

@@ +281,5 @@
> +        return new fun(a[0], a[1], a[2], a[3], a[4], a[5]);
> +      case 7:
> +        return new fun(a[0], a[1], a[2], a[3], a[4], a[5], a[6]);
> +      case 8:
> +          return new fun(a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);

And here.

@@ +293,5 @@
> +          return new fun(a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7], a[8], a[9], a[10], a[11]);
> +      default:
> +        assert(a.length !== 0,
> +               "bound function construction without args should be handled by caller");
> +        return _ConstructFunction(fun, a);

So is this still faster than `new fun(...a)`? Or is the problem that spread calls prevent Ion-compilation of the whole function? We should probably work on that instead of speeding up _ConstructFunction.

::: js/src/jit/IonBuilder.cpp
@@ +6269,5 @@
>          current->add(magic);
>          return magic;
>      }
>  
> +    if (target->isBoundFunction()) {

Nit: no {}

::: js/src/jit/VMFunctions.cpp
@@ +85,5 @@
> +            // JS_UNINITIALIZED_LEXICAL in
> +            // js::RunState::maybeCreateThisForConstructor.
> +            if (thisv.whyMagic() == JS_UNINITIALIZED_LEXICAL)
> +                thisv.setMagic(JS_IS_CONSTRUCTING);
> +            MOZ_ASSERT(thisv.whyMagic() == JS_IS_CONSTRUCTING);

efaust fixed this on m-i in bug 1236548, so we no longer need this hunk. (Also, thisv isn't passed to Construct so we don't need to change it.)

::: js/src/jsfun.cpp
@@ +1294,5 @@
>      return isDefault;
>  }
>  
> +static const js::Value&
> +BoundFunctionEnvironmentSlotValue(const JSFunction* fun, uint32_t slotIndex) {

Nit: { on next line

@@ +1304,3 @@
>  
> +bool
> +JSFunction::getLength(JSContext* cx, uint16_t* length)

After seeing the fun_resolve changes, I wonder if this still does the right thing for bound functions? Should we assert the function is not bound?

::: js/src/jsfun.h
@@ +452,5 @@
>          MOZ_ASSERT(!hasUncompiledScript());
>          return u.i.s.script_;
>      }
>  
> +    inline bool getLength(JSContext* cx, uint16_t* length);

Nit: remove the 'inline'

::: js/src/jsobjinlines.h
@@ -427,5 @@
>  
>  inline bool
>  JSObject::isBoundFunction() const
>  {
> -    return hasAllFlags(js::BaseShape::BOUND_FUNCTION);

We can now remove BaseShape::BOUND_FUNCTION too, right?

::: js/src/jsscript.cpp
@@ +3408,5 @@
>  
>      gc::AllocKind allocKind = srcFun->getAllocKind();
> +    uint16_t flags = srcFun->flags();
> +    if (srcFun->isSelfHostedBuiltin()) {
> +        allocKind = gc::AllocKind::FUNCTION_EXTENDED;

So srcFun->getAllocKind() is not guaranteed to be FUNCTION_EXTENDED?

::: js/src/vm/Interpreter.cpp
@@ +303,5 @@
>      RootedAtom name(cx, atom == cx->names().empty ? nullptr : atom);
>  
>      RootedFunction ctor(cx);
>      if (!cx->runtime()->createLazySelfHostedFunctionClone(cx, selfHostedName, name,
> +                                                          /* nargs = */ 0,

Why do we change !!derived to 0?
Attachment #8706159 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/6aee1612510b7b5b7b02581e99abfa7f6cb74601
Bug 1000780 - Part 1: Bake in already-cloned intrinsic functions even if the callsite doesn't have type information. r=jandem

https://hg.mozilla.org/integration/mozilla-inbound/rev/2defd29d0df9074e5067ff45a561bb429c8db03f
Bug 1000780 - Part 2: Emit JSOP_FUNAPPLY when using std_Function_apply in self-hosted code. r=jandem

https://hg.mozilla.org/integration/mozilla-inbound/rev/ef34a9cec9703aacaacd74d1331876c21256d48b
Bug 1000780 - Part 3: Free up JSFunction flag. r=jwalden+bmo

https://hg.mozilla.org/integration/mozilla-inbound/rev/ccfd55971698728b83ab65acad04c5ab2faab39d
Bug 1000780 - Part 4: Remove Function#bind usage from async stack tests. r=fitzgen

https://hg.mozilla.org/integration/mozilla-inbound/rev/f29f1d9a3cd31eb7a12eaab889a8a138c79d5d98
Bug 1000780 - Part 5: Self-host Function.prototype.bind. r=jandem
(In reply to Jan de Mooij [:jandem] from comment #47)
> > +        return _ConstructFunction(fun, a);
> 
> So is this still faster than `new fun(...a)`? Or is the problem that spread
> calls prevent Ion-compilation of the whole function? We should probably work
> on that instead of speeding up _ConstructFunction.

It's a correctness issue: ...a can have side effects, so it's not safe to use here.

> efaust fixed this on m-i in bug 1236548, so we no longer need this hunk.
> (Also, thisv isn't passed to Construct so we don't need to change it.)

\o/

> > +JSFunction::getLength(JSContext* cx, uint16_t* length)
> 
> After seeing the fun_resolve changes, I wonder if this still does the right
> thing for bound functions? Should we assert the function is not bound?

Good call, done.

> > +    inline bool getLength(JSContext* cx, uint16_t* length);
> 
> Nit: remove the 'inline'

Pushed as part 6, overlooked it before. :(

> We can now remove BaseShape::BOUND_FUNCTION too, right?

Yep, done.

> So srcFun->getAllocKind() is not guaranteed to be FUNCTION_EXTENDED?

Added a comment explaining why that is.

> >      if (!cx->runtime()->createLazySelfHostedFunctionClone(cx, selfHostedName, name,
> > +                                                          /* nargs = */ 0,
> 
> Why do we change !!derived to 0?

Because ...args don't count towards argc anymore, as they shouldn't according to spec.
According to AWFY, these patches regressed:
ss-fasta
kraken-crypto-ccm
kraken-crypto-sha256-iterative
octane-deltablue
misc-apply-array-headroom
misc-basic-array-forof
misc-bugs-652377-jslint-on-jslint
(In reply to Guilherme Lima from comment #51)
> According to AWFY, these patches regressed:
> ss-fasta
> kraken-crypto-ccm
> kraken-crypto-sha256-iterative
> octane-deltablue
> misc-apply-array-headroom
> misc-basic-array-forof
> misc-bugs-652377-jslint-on-jslint

Thanks for the report. Turns out it's entirely valid to reference `undefined` in self-hosted code and we don't need to bail out every single time that happens, forever ...
Depends on: 1242662
Depends on: 1243943
Blocks: 1257349
Depends on: 1280818
You need to log in before you can comment on or make changes to this bug.