Closed Bug 1087468 Opened 5 years ago Closed 5 years ago

Inline forEach when we can remove the scope chain.

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: nbp, Assigned: nbp)

References

Details

Attachments

(2 files, 1 obsolete file)

In Bug 1073033, we are looking at using escape analysis for removing the scope chain.  This means that if all inner functions are non-escaped and inlined, then we can remove the allocation of the scope chain and use registers instead of slots of the scope chain.

One of the case where this work would find a lot of interest, is for code which are similar to

  function sumAll(arr) {
    var sum = 0;
    arr.forEach(function (x) { sum += x; });
    return sum;
  }

Unfortunately, in such cases we are not yet inlining forEach, the reason being that it contains a for-loop.  The reason is that inlining functions with for-loops cause regressions.

I suggest to refine the heuristics such as we can still inline functions which are containing a for-loop under the condition that one of the arguments of the function is a MLambda.  This way we should be able to cover all cases where the scope chain optimization matters.
(In reply to Nicolas B. Pierron [:nbp] from comment #0)
> I suggest to refine the heuristics such as we can still inline functions
> which are containing a for-loop under the condition that one of the
> arguments of the function is a MLambda.  This way we should be able to cover
> all cases where the scope chain optimization matters.

Going *purely* from the description here: all the various typed array methods, once we start self-hosting them, are going to want to be cloned at callsite (probably) and likely inlinable.  Some like find/findIndex (bug 1078975) will take lambdas.  Others (set) will not.  Most to all of them will contain for-loops, I expect.  So it'd be good if the design isn't narrowly limited to *only* methods that take lambdas.
Doing so bring a tiny improvement of 0.2% on the test case from comment 0,
which I guess is caused by MFunctionEnvironment::foldsTo as it removes one
indirection.

Also, this case is likely to be less frequent that just inlining loops, and
I cannot spot any regressions on octane.
Attachment #8509751 - Flags: review?(hv1989)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #1)
> So it'd be good if the design isn't
> narrowly limited to *only* methods that take lambdas.

I agree. Which tests regress if we inline functions with loops? Maybe now's a good time to fix that instead of adding more and more heuristics.
Comment on attachment 8509751 [details] [diff] [review]
IonMonkey: Inline functions which have loop if they are given an inner function.

Review of attachment 8509751 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jan de Mooij [:jandem] from comment #3)
> I agree. Which tests regress if we inline functions with loops? Maybe now's
> a good time to fix that instead of adding more and more heuristics.

(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> Doing so bring a tiny improvement of 0.2% on the test case from comment 0,
> which I guess is caused by MFunctionEnvironment::foldsTo as it removes one
> indirection.

You didn't mean 20%, but 0.2% improvement???

An improvement of 0.2% on such a dedicated benchmark seems small and doesn't really justifies this heuristic, imho. Also we don't know what causes these regressions with inlining big functions with loops. So we don't know if we will hit it with MLambda. (And a 0.2% improvement won't offset the possible regression a big function with loops will have).

I also think the best way forward would be to investigate the loop inlining problem and find the issue and try to solve it. That way we will have
1) Less heuristics
2) Inlining loops of big functions
3) This forEach inlining for free
Attachment #8509751 - Flags: review?(hv1989)
Inlining function with loops is useless unless you can guarantee that you have more information, and that such information is valuable for optimizing functions.  Knowing that if the function contains a loop it likely got compiled since, I cannot guarantee that for any loop we would bring additional information to optimize the loop even more.  On the other hand, I can say if we give a lambda as argument of such function, then there is a non-neglectable likelyhood that it would be called and not escaped.  In which case Bug 1073033 will optimize such function.

So, unless you have a better idea for providing better information to every for loop, feel free to fix it and remove this check.

(In reply to Hannes Verschore [:h4writer] from comment #4)
> Comment on attachment 8509751 [details] [diff] [review]
> IonMonkey: Inline functions which have loop if they are given an inner
> function.
> 
> Review of attachment 8509751 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (In reply to Jan de Mooij [:jandem] from comment #3)
> > I agree. Which tests regress if we inline functions with loops? Maybe now's
> > a good time to fix that instead of adding more and more heuristics.
> 
> (In reply to Nicolas B. Pierron [:nbp] from comment #2)
> > Doing so bring a tiny improvement of 0.2% on the test case from comment 0,
> > which I guess is caused by MFunctionEnvironment::foldsTo as it removes one
> > indirection.
> 
> You didn't mean 20%, but 0.2% improvement???

Yes because the pointer indirection is an invariant of the loop and it is lifted outside the loop when we do not inline the function. The  previous test was increasing the array size cause at every outer loop iteration, which minimize the impact of loop invariants.  If I test with the ideal workload, then I see a 8% improvement.
These results highlight that inlining all loops cause major changes in our benchmark scores:

  Richards:        29k  ->  21k
  NavierStokes:    31k  ->  38k
  PdfJS:           21k  ->  15k
  MandreelLatency: 46k  ->  27k

Including constant in addition to lambdas, make us keep the NavierStokes improvement without loosing on other benchmarks.

Don't take me wrong, I hate heuristics which have no reason of being, but both constants and lambda are logical choices.

 - Constants might be useful for filtering out branches and removing bounds-check.
 - Lambdas are useful if we can remove the scope chain.
Attachment #8509751 - Attachment is obsolete: true
Attachment #8515232 - Flags: review?(hv1989)
Comment on attachment 8515232 [details] [diff] [review]
IonMonkey: Inline functions with loops based on their arguments.

Review of attachment 8515232 [details] [diff] [review]:
-----------------------------------------------------------------

I'm gonna allow this exception on the loop inlining rule, since evidence indeed shows an improvement on workload AND
we currently have no data suggesting this will cause regressions.

BUT if we get reports of regressions caused by this (like the ones we see when allowing inlining loops always). I will ask to revert this and suggest bug 1087468 comment 3 again. We need to find and fix the reason when/why loops inlining causes regressions.

::: js/src/jit/IonBuilder.cpp
@@ +4436,5 @@
>  
> +            if (targetScript->hasLoops()) {
> +                // Usualy, we are not inlining function which have loops because
> +                // it is likely already compiled and the cost of the call would
> +                // be neglectable compared to the cost of the loop.

This is incorrect. We don't inline loops, because we have unexplained regressions because of it. We actually want this heuristic to go away since we will have better types at the callee. I think it is bad to reason away that inlining loops in big functions would be bad. Theoretically it shouldn't!

So please update this part to:
// Currently we have regressions when allowing to inline big functions that contain loops. The reason isn't well understood yet and therefore it is disallowed.
Attachment #8515232 - Flags: review?(hv1989) → review+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #10)
> Backed out for SM(ggc) permafail.
> https://hg.mozilla.org/integration/mozilla-inbound/rev/ccaeff47aa20
> 
> https://treeherder.mozilla.org/ui/logviewer.
> html#?job_id=3528823&repo=mozilla-inbound

I cannot reproduce this issue locally, both are taking exactly the same time, with/without compacting GC.

name: ~/mozilla/make.sh dbg-gcc48-run --ion-eager --ion-offthread-compile=off
user: 13.35s  system: 0.07s  total: 13.42s (100%)

name: ~/mozilla/make.sh dbg-cgc-gcc48-run --ion-eager --ion-offthread-compile=off
user: 13.73s  system: 0.06s  total: 13.75s (100%)

So, I would expect the timeout to appear both with/without the compacting GC.

Also, It sounds like the test case is no longer checking what it was expecting to check.  The test case is doing some stack overflow which is harder to reach since we are now inlining Array.prototype.some and the lambda.

We can modify the test case, in such a way that we prevent the current optimization, and to keep it failing as it was, or maybe we should just remove this test case.
Fixing the test case to prevent the inlining caused by the current optimization show:

name: ~/mozilla/make.sh dbg-cgc-gcc48-run --ion-eager --ion-offthread-compile=off
user: 10.04s  system: 0.05s  total: 10.08s (100%)

Adding 26 extra arguments, fill-up the stack and cause the stack overflow to appear faster:

name: ~/mozilla/make.sh dbg-cgc-gcc48-run --ion-eager --ion-offthread-compile=off
user: 2.97s  system: 0.03s  total: 3.02s (99%)


Note: The test case is faster when we inhibit this optimization, as by inlining, we reduce the number of stack frame pushed on the stack, thus we need more time to fill the stack up to the point where we do a stack overflow.
This regresses deltablue with 5%-7%. So the big function with loop inlining also hits when having a constant/lambda :(. We definitely need to find out why "inlining big functions with loops" is cursed!

http://arewefastyet.com/#machine=11&view=single&suite=octane&subtest=DeltaBlue&start=1415069471&end=1415126055
http://arewefastyet.com/#machine=12&view=single&suite=octane&subtest=DeltaBlue&start=1415069471&end=1415126055
https://hg.mozilla.org/mozilla-central/rev/d6cfdeca8a4c
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Depends on: 1094108
You need to log in before you can comment on or make changes to this bug.