Closed Bug 826148 Opened 7 years ago Closed 7 years ago

Implement callsite cloning for specially-marked functions

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: shu, Assigned: shu)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 7 obsolete files)

In the self-hosted ParallelArray code, there are times when we want a JS function to have more sensitivity. This patch implements 'callsite cloning', whereby callees marked as |cloneAtCallsite| are cloned at callsites, which grants extra TI sensitivity and better compiled code from the JITs.
Attachment #697311 - Flags: review?(luke)
Attached patch Part 2: TI changes (obsolete) — Splinter Review
Assignee: general → shu
Attachment #697312 - Flags: review?(bhackett1024)
Attached patch Part 3: Jaeger IC (obsolete) — Splinter Review
Attachment #697313 - Flags: review?(bhackett1024)
Attached patch Part 4: Ion IC (obsolete) — Splinter Review
Attachment #697314 - Flags: review?(dvander)
Note that there are currently no users of this functionality nor a way to actually set the cloneAtCallsite flag. Those will be forthcoming when we land the new ParallelArray implementation.
Comment on attachment 697312 [details] [diff] [review]
Part 2: TI changes

Review of attachment 697312 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsinfer.cpp
@@ +1261,5 @@
> +{
> +    /*
> +     * To avoid computing the callee PC at the callsite when we clone to
> +     * propagate the cloned function type to this point, we do not monitor in
> +     * the interpreter and simply clone again when doing analysis.

This comment is a bit hard to follow.  Maybe just 'Clone called functions at appropriate callsites to match interpreter behavior' and assert fun->isCloneAtCallsite().
Attachment #697312 - Flags: review?(bhackett1024) → review+
Comment on attachment 697313 [details] [diff] [review]
Part 3: Jaeger IC

Review of attachment 697313 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine, though there are cases I think where the cloning will not occur.  First, SlowCall and SlowNew use InvokeKernel directly, these are used in such cases as failed fun.apply speculation and disabled call ICs.  Second, the full call stub used for polymorphic call sites will invoke jitcode attached to scripts without checking if they need cloning, is there a mechanism elsewhere to prevent this?  I don't think this would actually be undesirable, as there shouldn't be any issues besides loss of precision with directly invoking functions which want to be cloned at each callsite.
Attachment #697313 - Flags: review?(bhackett1024) → review+
(In reply to Brian Hackett (:bhackett) from comment #7)
> Comment on attachment 697313 [details] [diff] [review]
> Part 3: Jaeger IC
> 
> Review of attachment 697313 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks fine, though there are cases I think where the cloning will not
> occur.  First, SlowCall and SlowNew use InvokeKernel directly, these are
> used in such cases as failed fun.apply speculation and disabled call ICs. 
> Second, the full call stub used for polymorphic call sites will invoke
> jitcode attached to scripts without checking if they need cloning, is there
> a mechanism elsewhere to prevent this?  I don't think this would actually be
> undesirable, as there shouldn't be any issues besides loss of precision with
> directly invoking functions which want to be cloned at each callsite.

Thanks for the catch. This actually highlights a bug in the jsinfer changes -- if marking a function as 'clone at callsite' is merely a hint to try to provide extra sensitivity where we can, then we need to propagate types in |TypeConstraintCall| and |TypeConstraintPropagateThis| to both the original and the clone.

That said, I will add cloning to the places you pointed out.
Attached patch Part 3.1: Jaeger errata (obsolete) — Splinter Review
Add callsite cloning to SlowCall and SlowNew.

Disables poly inline when any callee is marked as clone-at-callsite.

Note that the clone-at-callsite flag is not a hint currently -- it will be cloned at the callsite every time. I will put off making it fall back to the original function until we see a case where that's needed. This way we can avoid analyzing the original function at all during TI propagation.
Attachment #697736 - Flags: review?(bhackett1024)
Blocks: 826626
Filed bug 826148 to add this to the baseline compiler.

Shu, especially since this is not a hint, you should assert in js::RunScript (and maybe also js::Invoke) that the callee has been cloned.
(In reply to Jan de Mooij [:jandem] from comment #10)
> Filed bug 826148 to add this to the baseline compiler.
> 
> Shu, especially since this is not a hint, you should assert in js::RunScript
> (and maybe also js::Invoke) that the callee has been cloned.

Great suggestion. This made me realize I didn't implement callsite cloning for accessors and funcall and funapply.
Attachment #697311 - Attachment is obsolete: true
Attachment #697311 - Flags: review?(luke)
Attachment #698163 - Flags: review?(luke)
Attached patch Part 3.1: Jaeger and TI errata (obsolete) — Splinter Review
Attachment #697736 - Attachment is obsolete: true
Attachment #697736 - Flags: review?(bhackett1024)
Attachment #698164 - Flags: review?(bhackett1024)
I'm kindof worried about adding a non-trivial amount of global engine complexity just for the parallel array code.  What is the observed speedup in the parallel array code?

Second: won't it be semantically visible for client JS to see that their functions have been cloned?  That is, function objects have visible identities (via === or sticking properties on the function objects) and cloning will turn one identity into many.
(In reply to Luke Wagner [:luke] from comment #14)
> I'm kindof worried about adding a non-trivial amount of global engine
> complexity just for the parallel array code.  What is the observed speedup
> in the parallel array code?
> 

This isn't for speedup per se -- self-hosted PAs don't work without the extra sensitivity this provides, as Ion bails out / constantly recompiles. The sensitivity problem is a general one for writing pure-JS collections.

> Second: won't it be semantically visible for client JS to see that their
> functions have been cloned?  That is, function objects have visible
> identities (via === or sticking properties on the function objects) and
> cloning will turn one identity into many.

This cloning is done at the callsite only, so is invisible to client JS. So, if I have |o.f(x)|, the cloning of |f| is done at the last second, during dispatch. The clone never escapes to script. Any non-calls like |o.f| gets the original.
(In reply to Shu-yu Guo [:shu] from comment #15)
> This isn't for speedup per se -- self-hosted PAs don't work without the
> extra sensitivity this provides, as Ion bails out / constantly recompiles.

That sounds like a more fundamental problem with TI/Ion.  As in, if this was just some random functional-looking JS, we wouldn't want it to have these problems.

> The sensitivity problem is a general one for writing pure-JS collections.

But, in general, we'll have a problem deciding when to clone since cloning has a very high cost, so this doesn't seem like a general solution.

> > Second: won't it be semantically visible for client JS to see that their
> > functions have been cloned?  That is, function objects have visible
> > identities (via === or sticking properties on the function objects) and
> > cloning will turn one identity into many.
> 
> This cloning is done at the callsite only, so is invisible to client JS. So,
> if I have |o.f(x)|, the cloning of |f| is done at the last second, during
> dispatch. The clone never escapes to script. Any non-calls like |o.f| gets
> the original.

The identity of a called function is indeed visible; consider:

  function f() { assertEq(f.x, 4); f.x = 5 }
  f.x = 4;
  f();
  assertEq(f.x, 5);

Same issues with arguments.caller.  I guess we could store, in the stack frame, some notion of the "uncloned function", but that sounds like more complexity all around.
I think that after bug 807464 we'll be able to revisit this approach and do the call site specialization directly within Ion when inlining the call, rather than requiring it to happen so early as with this approach.  That should allow reverting all this stuff, but bug 807464 depends on the baseline compiler and is months away, and it doesn't make much sense to gate ParallelArray on that change if indeed performance is abysmal without any context sensitivity.

The semantics issues seem like they can be resolved at a few chokepoints in the VM, just going from the cloned function back to the original when reading functions from the stack during JSOP_CALLEE, arguments.* and function.* accesses.  The JITs stub all this stuff.

Also, re: comment 9, shu, I think it's a mistake to treat the cloning as anything other than a hint.  JS functions can be invoked in all sorts of ways, not just fun.call and fun.apply but higher order builtins like Array.map, getters, setters and all sorts of stuff.  Stuff that the JIT does like generic polymorphic dispatch when the callee of a given callsite is not known makes things even more complicated.  Doing the cloning as a hint turns these into potential performance issues rather than potential security issues.
Err, that should be bug 804676, not bug 807464.
So is the basic problem here that we have some 'map' function that is first getting called:
  a -> map -> b
such that 'map' and 'b' get inlined into 'a' and then later 'map' gets called
  c -> map -> d
which throws away all the 'a' code, and then later we call 'a' again and need to recompile and so on for a bunch more callers of 'map'?

If so, then yes it does seem like bug 804676 would address this problem since there is no longer a single set of types shared by all inlined calls of 'map' (right?).

Also agree with last para of comment 17.
(In reply to Luke Wagner [:luke] from comment #19)
> If so, then yes it does seem like bug 804676 would address this problem
> since there is no longer a single set of types shared by all inlined calls
> of 'map' (right?).

After bug 804676 the type analysis will be done separately for each context in which the call was inlined to.  There will still be a single collection of type sets for the types that have been observed at property accesses etc. within the call, independent of context.  This latter bit could potentially degrade precision, but will I think be largely mitigated by intersecting the observed types with the information that is known about the inlining context.  e.g.

function read(x) {
  return x[0];
}

function foo(arrayOfInts) {
  return read(arrayOfInts);
}

function bar(arrayOfStrings) {
  return read(arrayOfStrings);
}

There is a single set of observed types for read which includes int and string for the getelem.  If we have good information about the element types of arrayOfInts and arrayOfStrings though, when inlining read into foo/bar we can figure out exactly what types can be read from the getelem.

(This is getting a little far afield, just describing how this stuff should work down the road.)
(In reply to Luke Wagner [:luke] from comment #16)
> (In reply to Shu-yu Guo [:shu] from comment #15)
> > This isn't for speedup per se -- self-hosted PAs don't work without the
> > extra sensitivity this provides, as Ion bails out / constantly recompiles.
> 
> That sounds like a more fundamental problem with TI/Ion.  As in, if this was
> just some random functional-looking JS, we wouldn't want it to have these
> problems.
> 

The set of thread-safe Ion-compiled code is a subset of general Ion-compiled code. Without cloning we get ICs, which are uncompilable for parallel execution because they mutate VM state. And so, without the monomorphism we get from cloning, we can't support nested parallelism (calling map from within a map), which is stipulated by contract.

> > The sensitivity problem is a general one for writing pure-JS collections.
> 
> But, in general, we'll have a problem deciding when to clone since cloning
> has a very high cost, so this doesn't seem like a general solution.
> 

Agreed, an adaptive solution is needed.

> > > Second: won't it be semantically visible for client JS to see that their
> > > functions have been cloned?  That is, function objects have visible
> > > identities (via === or sticking properties on the function objects) and
> > > cloning will turn one identity into many.
> > 
> > This cloning is done at the callsite only, so is invisible to client JS. So,
> > if I have |o.f(x)|, the cloning of |f| is done at the last second, during
> > dispatch. The clone never escapes to script. Any non-calls like |o.f| gets
> > the original.
> 
> The identity of a called function is indeed visible; consider:
> 
>   function f() { assertEq(f.x, 4); f.x = 5 }
>   f.x = 4;
>   f();
>   assertEq(f.x, 5);
> 

This example isn't an issue since f is a GETGNAME, so it should get the original function.

> Same issues with arguments.caller.  I guess we could store, in the stack
> frame, some notion of the "uncloned function", but that sounds like more
> complexity all around.

function.caller though, is just crazy. That definitely breaks...

Re: making it a hint, I'm convinced, it needs to be a hint.

What's your take on it then, Luke? We can't wait for bug 804676.
Here's the following short-term plan:

1. Making the bit only a hint, to do it where possible. What's currently implemented is: direct calls, accessors, and call/apply, each of which was done out of necessity to get PAs working).

2. function.caller can be fixed in the slowest way imaginable: iterating through the clone table to extract the original. But this fix is entirely local within fun_getProperty and whatever JIT stubs are used, and I'm very, very okay with the cost of function.caller being slow.

In the coming months when the baseline compiler and bug 804676 lands, all of this can of course then happily go away.
(In reply to Shu-yu Guo [:shu] from comment #21)
> Without cloning we get ICs, which are uncompilable for parallel
> execution because they mutate VM state.

Ahhh, so that's why.  So this whole bug isn't performance-related at all, then.

> What's your take on it then, Luke? We can't wait for bug 804676.

Agreed.  Given that a better long-term solution is coming down the pipes and this is a hint, not a correctness requirement, it seems fine as a short-term spot-fix.
Re comment 22, storing a backlink from the clone to the original sounds better.  You can get the extra slot by making the clone be an ExtendedFunction.
Regarding point 1. above, it will be cleaner to limit it only to direct calls to reduce complexity and maintenance burden.
Comment on attachment 698163 [details] [diff] [review]
Part 1: Function flag and interpreter changes v2

Review of attachment 698163 [details] [diff] [review]:
-----------------------------------------------------------------

Almost there

::: js/src/jscntxt.cpp
@@ +263,5 @@
> +    NumCloneSpewChannels
> +};
> +
> +static bool
> +IsCloneSpewActive(CloneSpewChannel channel)

The spew doesn't seem significant enough to commit to trunk, especially since it's only for PA code and there are only two fprintf's below.  Can you remove it from the patch?

@@ +290,5 @@
> +js::CloneFunctionAtCallsite(JSContext *cx, HandleFunction fun, HandleScript script, jsbytecode *pc)
> +{
> +    JS_ASSERT(cx->typeInferenceEnabled());
> +    JS_ASSERT(types::UseNewTypeForClone(fun));
> +    JS_ASSERT(!fun->nonLazyScript()->enclosingStaticScope());

Both callers of this function have to explicitly guard and initializeLazyScript anyhow, so can you just explicitly do that here and remove it from both callers?

::: js/src/jsfun.h
@@ +41,5 @@
>                                         must be constructible but not decompilable. */
>          HAS_REST         = 0x0400,  /* function has a rest (...) parameter */
>          HAS_DEFAULTS     = 0x0800,  /* function has at least one default parameter */
>          INTERPRETED_LAZY = 0x1000,  /* function is interpreted but doesn't have a script yet */
> +        CALLSITE_CLONE   = 0x2000,  /* function is cloned anew at each call site. */

How about expanding on this comment a bit to explain that this bit is intended to be temporary and is only set for ParallelArray self-hosted functions (referencing bug 826148).

::: js/src/jsinterp.cpp
@@ +399,5 @@
>      JS_ASSERT_IF(construct, !fun->isNativeConstructor());
>      if (fun->isNative())
>          return CallJSNative(cx, fun->native(), args);
>  
> +    JS_ASSERT_IF(CanCloneAtCallsite(cx), !fun->isCloneAtCallsite());

What about the JS_CallFunctionValue path into InvokeKernel?

Instead, could you put the cloning in InvokeKernel, allowing you to remove the changes to js_fun_apply/call and Shape::set/get.
Attachment #698163 - Flags: review?(luke)
> ::: js/src/jsinterp.cpp
> @@ +399,5 @@
> >      JS_ASSERT_IF(construct, !fun->isNativeConstructor());
> >      if (fun->isNative())
> >          return CallJSNative(cx, fun->native(), args);
> >  
> > +    JS_ASSERT_IF(CanCloneAtCallsite(cx), !fun->isCloneAtCallsite());
> 
> What about the JS_CallFunctionValue path into InvokeKernel?
> 
> Instead, could you put the cloning in InvokeKernel, allowing you to remove
> the changes to js_fun_apply/call and Shape::set/get.

js_fun_apply/call and Shape::set/get changes are going to be gone in any case, as I'll limit the cloning to only direct calls. The reason why it's not done in InvokeKernel, though, is that there are JIT paths that call that directly that already have the callsite script/pc computed.
(In reply to Luke Wagner [:luke] from comment #26)
> Comment on attachment 698163 [details] [diff] [review]
> Part 1: Function flag and interpreter changes v2
> 
> Review of attachment 698163 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Almost there
> 
> ::: js/src/jscntxt.cpp
> @@ +263,5 @@
> > +    NumCloneSpewChannels
> > +};
> > +
> > +static bool
> > +IsCloneSpewActive(CloneSpewChannel channel)
> 
> The spew doesn't seem significant enough to commit to trunk, especially
> since it's only for PA code and there are only two fprintf's below.  Can you
> remove it from the patch?
> 
> @@ +290,5 @@
> > +js::CloneFunctionAtCallsite(JSContext *cx, HandleFunction fun, HandleScript script, jsbytecode *pc)
> > +{
> > +    JS_ASSERT(cx->typeInferenceEnabled());
> > +    JS_ASSERT(types::UseNewTypeForClone(fun));
> > +    JS_ASSERT(!fun->nonLazyScript()->enclosingStaticScope());
> 
> Both callers of this function have to explicitly guard and
> initializeLazyScript anyhow, so can you just explicitly do that here and
> remove it from both callers?
> 

Ah, that's tricky. That's done because for lazy functions, the flags are not copied over until after the script has been initialized. So we don't know if we should even call into this function until the lazy script has been initialized.
Comment on attachment 698163 [details] [diff] [review]
Part 1: Function flag and interpreter changes v2

Alright, then the rest were nits.
Attachment #698163 - Flags: review+
Applied luke's comments. Carrying over r+.
Attachment #698163 - Attachment is obsolete: true
Attachment #698992 - Flags: review+
v2, now propagates type information in the Call and PropagateThis constraints to both the clone and the original, as the flag is now only a hint.
Attachment #697312 - Attachment is obsolete: true
Attachment #698994 - Flags: review?(bhackett1024)
Carrying over r+.
Attachment #697313 - Attachment is obsolete: true
Attachment #698164 - Attachment is obsolete: true
Attachment #698164 - Flags: review?(bhackett1024)
Attachment #698995 - Flags: review+
v2, now only bothers to callsite clone on direct calls.
Attachment #697314 - Attachment is obsolete: true
Attachment #697314 - Flags: review?(dvander)
Attachment #698996 - Flags: review?(dvander)
Comment on attachment 698994 [details] [diff] [review]
Part 2: TI changes

Review of attachment 698994 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsinfer.cpp
@@ +688,5 @@
>  
>      const char *kind() { return "call"; }
>  
>      void newType(JSContext *cx, TypeSet *source, Type type);
> +    bool newTypeTail(JSContext *cx, HandleFunction callee, HandleScript script);

Can you name this newCallee instead?

@@ +772,5 @@
>  
>      const char *kind() { return "propagatethis"; }
>  
>      void newType(JSContext *cx, TypeSet *source, Type type);
> +    bool newTypeTail(JSContext *cx, HandleFunction callee);

Ditto.
Attachment #698994 - Flags: review?(bhackett1024) → review+
Depends on: PJS
Comment on attachment 698996 [details] [diff] [review]
Part 4: Ion IC v2

Review of attachment 698996 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/CodeGenerator.cpp
@@ +3897,5 @@
> +                                masm.labelForPatch(), liveRegs,
> +                                callee, mir->block()->info().script(), mir->callPc(), output);
> +
> +    JS_ASSERT(!mir->resumePoint());
> +    cache.setIdempotent();

You don't have to set this here, since idempotency only matters for GetProp caches.

::: js/src/ion/IonBuilder.cpp
@@ +152,5 @@
>  
>      return obj->toFunction();
>  }
>  
> +

nit: accidental newline

::: js/src/ion/MIR.h
@@ +1171,5 @@
>      CompilerRootFunction target_;
>      // Original value of argc from the bytecode.
>      uint32_t numActualArgs_;
> +    // The location of the call.
> +    jsbytecode *pc_;

You don't need to explicitly track the pc here, since an MCall instruction always has a resume point, you can use call->resumePoint()->pc()
Attachment #698996 - Flags: review?(dvander) → review+
Blocks: PJS
No longer depends on: PJS
You need to log in before you can comment on or make changes to this bug.