Open
Bug 1377710
Opened 7 years ago
Updated 6 months ago
Compiling lots of unused branches can be expensive
Categories
(Core :: JavaScript Engine, enhancement, P5)
Core
JavaScript Engine
Tracking
()
NEW
Performance Impact | low |
People
(Reporter: ehsan.akhgari, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(6 files, 2 obsolete files)
7.31 KB,
text/html
|
Details | |
19.67 KB,
patch
|
Details | Diff | Splinter Review | |
1.29 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
5.57 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
7.92 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
10.74 KB,
patch
|
tcampbell
:
review+
|
Details | Diff | Splinter Review |
See the microbenchmark. Nightly on my machine gets around 550-850ms (the numbers can vary vastly), Chrome Dev channel gets around 5-8ms.
We seem to be spending a lot of time in PruneUnusedBranches() and MDefinition::justReplaceAllUsesWith().
Note that I wrote this microbenchmark while trying to see if another patch improves the speed of another function, so I don't necessarily know whether optimizing this case is interesting or not but I figured I should file this.
Comment 1•7 years ago
|
||
I should be the right person to look into this issue, but I am currently focusing on enabling the bytecode cache.
Flags: needinfo?(tcampbell)
Comment 2•7 years ago
|
||
I'm seeing a completely different profile now. What I see is in https://perfht.ml/2uKD2Rq (on Win 10 2017-07-04 nightly) is that we bailout from Ion while looking up foo() and get stuck in baseline. I'll investigate this more as I look at other issues with IonBuilder perf.
Clearing ni? for now as this was not related to specific realworld perf.
Flags: needinfo?(tcampbell)
Comment 3•7 years ago
|
||
Oops.. I was reading non-symbolicated profile wrong. Behavior is the same as before.
Updated•7 years ago
|
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Comment 4•7 years ago
|
||
I can reproduce this issue in the JS shell. Disabling Branch Pruning causes the time to go from 550ms to 30ms. Which is still a lot of time for such a simple function.
Comment 5•7 years ago
|
||
Attachment #8892125 -
Flags: review?(jdemooij)
Comment 6•7 years ago
|
||
Attachment #8892126 -
Flags: review?(sstangl)
Comment 7•7 years ago
|
||
This patch add extra MIR flags which are only used for the analysis of
Branch Pruning.
Previously branch pruning had a loop to check for the usage of Phi, by
visiting the linked list of uses, and iterate for at most 128 iterations
throught the list of uses.
This rewrite is caching the is-used state on the flags of the MPhi, and
benefit from the Post-order visit of the graph to only propagate the is-used
state one edge at a time. In addition, we still need to loop over the
back-edges, until we reached the fix-point where all the MPhi have the
is-used state. (which should be definitely less costly)
The previous computation had a way higher complexity, but was only applied
on the dead branches, where this code has a tighter loop which runs on all
blocks (even the live one), but with a smaller complexity overall.
Attachment #8892131 -
Flags: review?(tcampbell)
Comment 8•7 years ago
|
||
Counting Phi nodes as part of the set of instructions is not helpful, mostly
because Phi are converted into move group in the previous blocks. Also Mphi
cannot be effectful, so there is no need to go through the overhead of
checking it.
Attachment #8892136 -
Flags: review?(tcampbell)
Comment 9•7 years ago
|
||
Previously the cost of each block was computed when one candidate was
detected, which might have caused the code to re-visit the blocks multiple
times.
This pre-compute the stats for each basic blocks in tight and predictible
loops, and reduce a lot the remaining overhead seen while running Branch Pruning.
Attachment #8892137 -
Flags: review?(tcampbell)
Comment 10•7 years ago
|
||
Attachment #8892138 -
Flags: review+
Comment 11•7 years ago
|
||
Comment on attachment 8892125 [details] [diff] [review]
Branch Pruning: Check the compile info associated with the resume point.
Review of attachment 8892125 [details] [diff] [review]:
-----------------------------------------------------------------
Why is this better, considering MResumePoint::isObservableOperand just calls CompileInfo::isObservableSlot anyway? Will a later patch fix this?
MResumePoint::isObservableOperand is not inlined so it will be slower..
Comment 12•7 years ago
|
||
Responding to Comment 11, note that it basically reverts Bug 1342016, which was a performance improvement. I do believe reverting those lines would be slower.
Comment 13•7 years ago
|
||
This fails jsapi-tests.
> 8:03.41 c:\Users\tcampbell\Projects\gecko.ion\js\src\jsapi-tests/testJitMinimalFunc.h(49): error C2664: 'js::jit::CompileInfo::CompileInfo(js::jit::CompileInfo &&)': cannot convert argument 1 from 'int' to 'const js::jit::CompileInfo &'
> 8:03.41 c:\Users\tcampbell\Projects\gecko.ion\js\src\jsapi-tests/testJitMinimalFunc.h(49): note: Reason: cannot convert from 'int' to 'const js::jit::CompileInfo'
> 8:03.41 c:\Users\tcampbell\Projects\gecko.ion\js\src\jsapi-tests/testJitMinimalFunc.h(49): note: No constructor could take the source type, or constructor overload resolution was ambiguous
Comment 14•7 years ago
|
||
Comment on attachment 8892125 [details] [diff] [review]
Branch Pruning: Check the compile info associated with the resume point.
Review of attachment 8892125 [details] [diff] [review]:
-----------------------------------------------------------------
Clearing review for now - we should avoid replacing an inlined call with an out-of-line call.
Attachment #8892125 -
Flags: review?(jdemooij)
Updated•7 years ago
|
Attachment #8892125 -
Attachment is obsolete: true
Comment 15•7 years ago
|
||
(In reply to Ted Campbell [:tcampbell] from comment #13)
> This fails jsapi-tests.
>
> > 8:03.41 c:\Users\tcampbell\Projects\gecko.ion\js\src\jsapi-tests/testJitMinimalFunc.h(49): error C2664: 'js::jit::CompileInfo::CompileInfo(js::jit::CompileInfo &&)': cannot convert argument 1 from 'int' to 'const js::jit::CompileInfo &'
Fixed locally as part of the modification of the CompileInfo, which now requires a TempAllocator& argument.
Comment 16•7 years ago
|
||
Comment on attachment 8892126 [details] [diff] [review]
Cache isObservableSlot and isRecoverableSlot on the CompileInfo.
Review of attachment 8892126 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/CompileInfo.h
@@ +600,5 @@
>
> + // Recall whether the slots have to be kept for stack iteration or if they
> + // have to be recovered on bailout.
> + Vector<bool, 0, JitAllocPolicy> isObservable_;
> + Vector<bool, 0, JitAllocPolicy> isRecoverable_;
How much memory do these things wind up consuming? What's the tradeoff?
::: js/src/jit/MIR-inl.h
@@ +4,5 @@
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#ifndef jit_MIR_inl_h
> +#define jit_MIR_inl_h
Per Bug 1388045 Comment 2, could these just go in MIR.h, as part of that bug?
Comment 17•7 years ago
|
||
(In reply to Sean Stangl [:sstangl] from comment #16)
> ::: js/src/jit/CompileInfo.h
> @@ +600,5 @@
> >
> > + // Recall whether the slots have to be kept for stack iteration or if they
> > + // have to be recovered on bailout.
> > + Vector<bool, 0, JitAllocPolicy> isObservable_;
> > + Vector<bool, 0, JitAllocPolicy> isRecoverable_;
>
> How much memory do these things wind up consuming? What's the tradeoff?
They are only allocated once, without resize, so these only consumes ~2 byte per slot. This can be quite large for functions with tons of locals, but currently we consume way more space with all the resume points that rely on this info. Thus these 2 allocations per inlined JSScript are by far neglectable.
The trade-off is between (before) call, cache misses and branch miss-predictions, vs. (after) one cache miss and few neglectable allocations.
> ::: js/src/jit/MIR-inl.h
> @@ +4,5 @@
> > + * License, v. 2.0. If a copy of the MPL was not distributed with this
> > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> > +
> > +#ifndef jit_MIR_inl_h
> > +#define jit_MIR_inl_h
>
> Per Bug 1388045 Comment 2, could these just go in MIR.h, as part of that bug?
If needed, yes.
Comment 18•7 years ago
|
||
Comment on attachment 8892137 [details] [diff] [review]
Branch Pruning: Pre-compute and cache the cost of basic blocks.
Review of attachment 8892137 [details] [diff] [review]:
-----------------------------------------------------------------
Saving the instruction traversal seems like a nice win.
The whole traversal seems rough and I wonder why we don't use real dominator information. Incremental improvement on existing code already, so I won't hold up this patch.
::: js/src/jit/IonAnalysis.cpp
@@ +298,5 @@
> + size_t numInst;
> + // Number of effectful instructions per block.
> + size_t numEffectful;
> + // Number of predecessors (excluding OSR blocks).
> + size_t numPred;
Not sure we gain a lot by caching numPred/numSucc, but is fine this way.
@@ +334,5 @@
> + auto& info = infos[bid];
> + info.numInst = numInst;
> + info.numEffectful = numEffectful;
> + info.numPred = block->numPredecessors();
> + info.numPred = block->numSuccessors();
TYPO: info.numSucc
@@ +421,5 @@
> // Iterate over the approximated set of dominated blocks and count
> // the number of instructions which are dominated. Note that this
> // approximation has issues with OSR blocks, but this should not be
> // a big deal.
> + if (!CollectBlocksInfos(mir, graph, infos))
Move before comment maybe?
@@ +435,1 @@
> // Iterate over dominated blocks, and visit exit blocks as well.
I'm not sure why we don't use BuildDominatorTree information rather than this rough approximation?
Attachment #8892137 -
Flags: review?(tcampbell) → review+
Comment 19•7 years ago
|
||
Comment on attachment 8892136 [details] [diff] [review]
Branch Pruning: Do not count Phi nodes as part of the weight of basic blocks.
Review of attachment 8892136 [details] [diff] [review]:
-----------------------------------------------------------------
Ignoring PHIs seems like the right call. The LLVM inliner seems to treat PHIs as zero-cost as well.
Do we need to tweak |JitOptions.branchPruningInstFactor| based on average PHI density?
Attachment #8892136 -
Flags: review?(tcampbell) → review+
Comment 20•7 years ago
|
||
Comment on attachment 8892131 [details] [diff] [review]
Branch Pruning: Use MIR flags to track usage of Phi instructions.
Review of attachment 8892131 [details] [diff] [review]:
-----------------------------------------------------------------
I like the with general approach of having analysis flags to avoid repeated work.
As we discussed on IRC, I'm having trouble following the BPPredRemoved logic. At the very least this transform needs better commenting to explain the invariant. I think you are also investigating if there is a hole or not.
Will be nice to get rid of another quadratic analysis :)
::: js/src/jit/IonAnalysis.cpp
@@ +93,5 @@
> + op->setInWorklist();
> + if (!worklist.append(op->toPhi()))
> + return false;
> + }
> +
B) If BPPredRemoved is set, we check all pred blocks for their marked status. Is this guaranteed to only have the expected blocks? I see either i) BPPredRemoved is uneeded, ii) there is a potential edge case, or iii) this is all very correct but needs documentation about why.
@@ +109,5 @@
> + // operands would be marked in the following loop.
> + phi->getOperand(p)->setUseRemovedUnchecked();
> + }
> + }
> +
Can the block above and below be shared?
@@ +200,5 @@
> // between the |block| and its successor |succ|.
> MDefinition* def = phi->getOperand(predIndex);
> if (def->isUseRemoved())
> continue;
>
A) Set on Phi without indicated which predIndex we are.
@@ +517,5 @@
> if (!block->isMarked() && !block->unreachable())
> continue;
>
> + if (!FlagAllOperandsAsHavingRemovedUses(mir, block))
> + return false;
Do other users of FlagAllOperandsAsHavingRemovedUses need to be fixed too?
::: js/src/jit/MIR.h
@@ +211,5 @@
> + \
> + /* Used for branch pruning optimizarion. They mirror usage of resume point
> + * and other instructions on phi instructions to cache the analysis.
> + */ \
> + _(BPUsed) \
I'm not a fan of have local analysis local flags on MIR types. I wish we had a better system managing these sorts of IR analysis temporary metadata. Maybe explicitly mention they only are valid within branch pruning pass?
Attachment #8892131 -
Flags: review?(tcampbell)
Comment 21•7 years ago
|
||
Comment on attachment 8892126 [details] [diff] [review]
Cache isObservableSlot and isRecoverableSlot on the CompileInfo.
Patch looks OK -- would just like to get rid of MIR-inl.h. Will re-review with that changed (addressed in a previous comment, also possibly addressed by another bug).
Attachment #8892126 -
Flags: review?(sstangl)
Comment 22•7 years ago
|
||
This patch duplicates the EliminatePhis loop which flags all unused phi
instructions. This is slightly too conservative about the observability in
case of the presence of try blocks, which should be fixed in a follow-up
issue.
In the mean time, this patch solves the issue by flagging all Phi
non-observable & non-used phis as being unused.
Thus, FlagPhiInputsAsHavingRemovedUses only checks for phi which are lacking
the Unused flag, to set the UseRemoved flag on the operand.
FlagUnusedPhi is equivalent to the previous patch, and does not try to do
it in the same loop, and it re-uses the existing logic for removing
redundant Phis, and also for flag phis as unused.
Checking the speed of the original test case, this is now executing in
29.8ms instead of 30ms when branch pruning is disabled. (comment 4)
Attachment #8899562 -
Flags: review?(tcampbell)
Updated•7 years ago
|
Attachment #8892131 -
Attachment is obsolete: true
Comment 23•7 years ago
|
||
Comment on attachment 8899562 [details] [diff] [review]
Flag unused Phis before flagging phis as having removed uses.
Review of attachment 8899562 [details] [diff] [review]:
-----------------------------------------------------------------
This is a lot clearer :) Nice compile time improvement too.
Please add a comment near FlagAllOperandsAsHavingRemovedUses that FlagUnusedPhis should be called first.
::: js/src/jit/IonAnalysis.cpp
@@ +1221,5 @@
> // use from an actual instruction.
> + //
> + // At the end of this phase, every Phi which is not flag as being
> + // unused is observed. Thus, the removal of of used Phi should
> + // introduce the addition of UseRemoved flag to any of its operands,
s/not flag as/not flagged as/
s/of of/of/
s/operands,/operands./
Attachment #8899562 -
Flags: review?(tcampbell) → review+
Comment 24•7 years ago
|
||
(In reply to Ted Campbell [:tcampbell] from comment #18)
> @@ +435,1 @@
> > // Iterate over dominated blocks, and visit exit blocks as well.
>
> I'm not sure why we don't use BuildDominatorTree information rather than
> this rough approximation?
Computing the dominator tree would be as costly as iterating over the dominated set, by counting that we have a balanced number of predecessors and successors.
Moreover, after branch pruning, the dominator computation would no longer be valid as branch pruning might remove edges, causing more block to dominates others.
(In reply to Ted Campbell [:tcampbell] from comment #19)
> Do we need to tweak |JitOptions.branchPruningInstFactor| based on average
> PHI density?
Technically, yes.
In practice, I do not think that this matter much knowing this is the small factor, and that this was a way to discard large single blocks with no Phi (octane/mandreel if I recall correctly). I might come back to increase this number in a follow-up bug if this causes performance issues on our benchmarks.
Comment 25•7 years ago
|
||
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9985a01db87
Branch Pruning: Do not count Phi nodes as part of the weight of basic blocks. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/be2075c60ec8
Branch Pruning: Pre-compute and cache the cost of basic blocks. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/e5a213c7b80e
Flag unused Phis before flagging Phi inputs as having removed uses. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/35c735592ec2
Branch Pruning: Add pathological test case. r=nbp
Comment 26•7 years ago
|
||
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3b8f85ad004
FlagUnusedPhi should not assert that Unused flag does not exists as it is now re-run multiple times. r=jandem
Comment 27•7 years ago
|
||
Backed out for asserting at js/src/jit/IonAnalysis.cpp:2835 while running js/src/jit/IonAnalysis.cpp:283:
https://hg.mozilla.org/integration/mozilla-inbound/rev/801f38f144051335ce76152cb123cc8ea688e218
https://hg.mozilla.org/integration/mozilla-inbound/rev/16a30269573302a5724ba30bffff326844ed20a4
https://hg.mozilla.org/integration/mozilla-inbound/rev/59ac198566094b6394a8bdccff13ec2069b5bcdc
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d9598ac42647e2d58f6baabeb0cf321aab5e6a2
https://hg.mozilla.org/integration/mozilla-inbound/rev/c15951f74658fbd01d307cf973f1ebaa87ac7758
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=e3b8f85ad0049a8ff809d1f5672182196f186d2c&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log assertion: https://treeherder.mozilla.org/logviewer.html#?job_id=126438203&repo=mozilla-inbound
Assertion failure: !block->isMarked(), at /home/worker/workspace/build/src/js/src/jit/IonAnalysis.cpp:2835
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/gc/bug-1282986.js | Assertion failure: !block->isMarked(), at /home/worker/workspace/build/src/js/src/jit/IonAnalysis.cpp:2835 (code -11, args "--baseline-eager") [4.5 s]
Failure log js/src/jit-test/tests/ion/branch-pruning-abuse.js: https://treeherder.mozilla.org/logviewer.html#?job_id=126438171&repo=mozilla-inbound
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/ion/branch-pruning-abuse.js | Timeout (code -9, args "--ion-eager --ion-offthread-compile=off") [150.0 s]
TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/ion/branch-pruning-abuse.js | Timeout (code -9, args "--ion-eager --ion-offthread-compile=off --non-writable-jitcode --ion-check-range-analysis --ion-extra-checks --no-sse3 --no-threads") [150.0 s]
Flags: needinfo?(nicolas.b.pierron)
Comment 28•7 years ago
|
||
(In reply to Ted Campbell [:tcampbell] from comment #23)
>
> This is a lot clearer :) Nice compile time improvement too.
Hum … apparently this is not the case, as I fixed the typo from comment 18.
We seems to now be spending a lot of time under the justReplaceAllUsesWith, which is under FlagUnusedPhi. (~190 ms on the test case)
Note, that the compilation time issue compared to v8, is that we have resume points for each block entry, and in this case we spam the graph with tons of resume point capturing hundreds of locals. As we add these resume points, we also add Phis, to simplify the graph construction.
I will look if we have any simple way to avoid adding these Phis node eagerly in IonBuilder.
I will look if there is a way to avoid generating Phis under IonBuilder.
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #27)
> Failure log assertion:
> https://treeherder.mozilla.org/logviewer.html#?job_id=126438203&repo=mozilla-
> inbound
> Assertion failure: !block->isMarked(), at
> /home/worker/workspace/build/src/js/src/jit/IonAnalysis.cpp:2835
> TEST-UNEXPECTED-FAIL | js/src/jit-test/tests/gc/bug-1282986.js | Assertion
> failure: !block->isMarked(), at
> /home/worker/workspace/build/src/js/src/jit/IonAnalysis.cpp:2835 (code -11,
> args "--baseline-eager") [4.5 s]
Note, this is highly intermittent, and only emerged recently. I now managed to reproduce it under rr, and I will investigate.
Comment 29•7 years ago
|
||
Comment on attachment 8899562 [details] [diff] [review]
Flag unused Phis before flagging phis as having removed uses.
Review of attachment 8899562 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/IonAnalysis.cpp
@@ +469,5 @@
> + Observability observability = graph.hasTryBlock()
> + ? ConservativeObservability
> + : AggressiveObservability;
> + if (!FlagUnusedPhis(mir, graph, observability))
> + return true;
self-nit: This should be "return false;"
This caused the various intermittent "Assertion failures !block->isMarked()" failures under OOMTest function.
@@ +2339,5 @@
> + Observability observability = graph.hasTryBlock()
> + ? ConservativeObservability
> + : AggressiveObservability;
> + if (!FlagUnusedPhis(mir, graph, observability))
> + return true;
self-nit: same here.
Updated•7 years ago
|
status-firefox57:
--- → wontfix
status-firefox58:
--- → fix-optional
Keywords: perf
Priority: -- → P3
Whiteboard: [qf:p3]
Comment 30•7 years ago
|
||
status-firefox59:
--- → ?
Updated•5 years ago
|
Assignee: nicolas.b.pierron → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(nicolas.b.pierron)
Priority: P3 → P5
Updated•3 years ago
|
Performance Impact: --- → P3
Whiteboard: [qf:p3]
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•