Closed
Bug 1286816
Opened 8 years ago
Closed 8 years ago
Wasm baseline: Optimize boolean evaluation for control
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
Details
Attachments
(6 files, 9 obsolete files)
4.42 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
14.72 KB,
patch
|
h4writer
:
feedback+
|
Details | Diff | Splinter Review |
5.73 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
2.44 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
16.61 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
24.82 KB,
patch
|
h4writer
:
review+
|
Details | Diff | Splinter Review |
Two entangled problems:
(1) Currently the baseline compiler always evaluates the boolean expression into a register, then tests that register. This is silly (except on MIPS, maybe). It does not actually take a lot to cover most interesting cases of boolean evaluation for control.
(2) Currently the baseline compiler will usually evaluate a conditional branch to a label as a conditional branch across code that pops the execution stack + jumps. This is silly if there is no cleanup code.
Assignee | ||
Comment 1•8 years ago
|
||
Refactor and clean up comparison handling to make later changes simpler.
Assignee | ||
Comment 2•8 years ago
|
||
Add a simple token queue so that we can do one-token lookahead. This should be rolled into the WasmBinaryIterator, where it would be more efficient.
Assignee | ||
Comment 3•8 years ago
|
||
WIP boolean evaluation for control: if, at a compare node, we see a conditional branch as the next token then defer the evaluation so that when the branch is handled it can absorb the compare too.
Updated•8 years ago
|
Priority: -- → P2
Assignee | ||
Updated•8 years ago
|
Priority: P2 → P3
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Assignee | ||
Comment 4•8 years ago
|
||
Comment on attachment 8770929 [details] [diff] [review]
bug1286816-token-queue.patch
Moved to bug 1316614.
Attachment #8770929 -
Attachment is obsolete: true
Assignee | ||
Comment 5•8 years ago
|
||
Attachment #8770927 -
Attachment is obsolete: true
Assignee | ||
Comment 6•8 years ago
|
||
Attachment #8770930 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Priority: P3 → P1
Assignee | ||
Comment 7•8 years ago
|
||
Clean up how we emit code for comparisons by making the code generation table-driven instead of switch-driven. These tables will be used heavily in the patch that implements the optimization (the next patch on this bug).
Attachment #8809446 -
Attachment is obsolete: true
Attachment #8812124 -
Flags: review?(hv1989)
Assignee | ||
Comment 8•8 years ago
|
||
Implement optimizations of boolean expressions used for control. This is good for performance and good for code density.
The idea here is that when we see certain boolean operators (currently comparisons and Eqz) we will look ahead to the next opcode, and if that opcode is a conditional branch we know how to handle (BrIf, If, and Select) then we will stash away the information about the boolean operator and not generate code yet.
The code generators for conditional branches will then use generalized machinery that make use of the stashed information to avoid generating boolean result values but instead fold the boolean operator into the branching directly.
On AngryBots, 84% of boolean operators (measuring statically) are optimized by this optimization.
Code generation is then divided into two, with a setup step (corresponding to popping operands) and a branch step (corresponding to testing and branching), because both If and Select need to perform additional work between the two steps. Information that pertains to both of the steps is stored in a BranchState structure. You will see that the logic in BrIf, If, and Select remains unchanged as a result of this division of labor.
Additionally, this patch has two tweaks:
- in int32 comparisons where the lhs is a constant, we do not move that
constant to a register but use it directly
- if a BrIf has no stack to pop the branch is direct to the target rather
than an inverted branch around code that pops the stack and branches to
the target (Select and If did not have this issue)
Attachment #8809447 -
Attachment is obsolete: true
Attachment #8812125 -
Flags: review?(hv1989)
Assignee | ||
Comment 9•8 years ago
|
||
Test cases that test that things continue to work in situations where the optimization is supposed to kick in. (See the next patch for some ideas about how to test that the optimization does kick in.)
Attachment #8812128 -
Flags: review?(hv1989)
Assignee | ||
Comment 10•8 years ago
|
||
This is a framework - not ready for landing - that is used to measure whether optimizations do kick in as they are supposed to, and which can be used to profile the efficacy of the optimization.
(It's not ready for landing primarily because test cases start failing if the baseline compiler is not enabled, so some kind of guarding is needed for that.)
We can sort-of safeguard that optimizations work with benchmarks, but it seemed like here we have an opportunity to check it directly, so I explored it a bit. For other optimizations in the baseline compiler we may be able to do similar things, but really not all that often.
What I'd like to know is whether you think something like this is useful, or if it's a distraction. And any other thoughts you may have.
Attachment #8812129 -
Flags: feedback?(hv1989)
Assignee | ||
Comment 11•8 years ago
|
||
Assignee | ||
Comment 12•8 years ago
|
||
Needs an Arm64 stub for the MacroAssembler interface branch64(Cond,Register64,Register64,Label*).
Comment 13•8 years ago
|
||
Comment on attachment 8812124 [details] [diff] [review]
bug1286816-refactor-comparisons-v3.patch
Review of attachment 8812124 [details] [diff] [review]:
-----------------------------------------------------------------
I'm not totally convinced a table-driven approach is better?
Adding a function that can translate the CmpOp to Assembler::Condition seems better. (E.g. something like JSOpToCondition).
That way the compiler will warn us if we add a new cases that aren't considered in those switches.
Instead of relying on a comment in the code. And needing to have NumCmpIntBinop etc...
Also we have InvertCondition we can use for the "notTaken" case. That way we don't have to keep "notTaken" in the table.
::: js/src/wasm/WasmBaselineCompile.cpp
@@ +503,5 @@
> + CMP_GE,
> + CMP_ULT, // I32, I64
> + CMP_ULE,
> + CMP_UGT,
> + CMP_UGE
Might make sense to add the comments inline in the code, instead of after. Since at first glance I thought CMP_EQ and CMP_ULT were special.
// Binary
// I32, I64, F32, F64
CMP_EQ,
CMP_NE,
CMP_LT,
CMP_LE,
CMP_GT,
CMP_GE,
// I32, I64
CMP_ULT,
CMP_ULE,
CMP_UGT,
CMP_UGE
@@ +3785,5 @@
> + { Assembler::Below, Assembler::AboveOrEqual }, // ULT
> + { Assembler::BelowOrEqual, Assembler::Above }, // ULE
> + { Assembler::Above, Assembler::BelowOrEqual }, // UGT
> + { Assembler::AboveOrEqual, Assembler::Below } // UGE
> +};
Assembler::InvertCondition exists for the second row ;). Please use that.
@@ +3795,5 @@
> + { Assembler::DoubleNotEqualOrUnordered, Assembler::DoubleEqual },
> + { Assembler::DoubleLessThan, Assembler::DoubleGreaterThanOrEqualOrUnordered },
> + { Assembler::DoubleLessThanOrEqual, Assembler::DoubleGreaterThanOrUnordered },
> + { Assembler::DoubleGreaterThan, Assembler::DoubleLessThanOrEqualOrUnordered },
> + { Assembler::DoubleGreaterThanOrEqual, Assembler::DoubleLessThanOrUnordered }
Don't think it would be bad to add InvertCondition for Assembler::DoubleCondition.
Attachment #8812124 -
Flags: review?(hv1989)
Comment 14•8 years ago
|
||
Comment on attachment 8812128 [details] [diff] [review]
bug1286816-test-cases.patch
Review of attachment 8812128 [details] [diff] [review]:
-----------------------------------------------------------------
tests++
Attachment #8812128 -
Flags: review?(hv1989) → review+
Updated•8 years ago
|
Attachment #8812129 -
Flags: feedback?(hv1989) → feedback+
Comment 15•8 years ago
|
||
Comment on attachment 8812125 [details] [diff] [review]
bug1286816-boolean-evaluation-v3.patch
Review of attachment 8812125 [details] [diff] [review]:
-----------------------------------------------------------------
Clearing for now waiting for updated patch. Design already looks nice.
Attachment #8812125 -
Flags: review?(hv1989)
Assignee | ||
Comment 16•8 years ago
|
||
Add Assembler::InvertCondition(DoubleCondition) for x86, x64, and ARM - MIPS already has this. (I forgot about None but will check that.)
Attachment #8814000 -
Flags: review?(hv1989)
Assignee | ||
Comment 17•8 years ago
|
||
Add a branch64(reg64,reg64) stub for ARM64.
Attachment #8814001 -
Flags: review?(hv1989)
Assignee | ||
Comment 18•8 years ago
|
||
Refactor comparisons, using standard Condition codes rather than the CMP_ values.
Attachment #8812124 -
Attachment is obsolete: true
Attachment #8814003 -
Flags: review?(hv1989)
Comment 19•8 years ago
|
||
Comment on attachment 8814000 [details] [diff] [review]
bug1286816-invertcondition-double.patch
Review of attachment 8814000 [details] [diff] [review]:
-----------------------------------------------------------------
Looks nice! Thanks!
Attachment #8814000 -
Flags: review?(hv1989) → review+
Assignee | ||
Comment 20•8 years ago
|
||
Implement boolean optimization for control using InvertCondition rather than the old table. (The consequence of having type-specific condition codes shows up in that there are type-specific "latent" variables for storing the condition codes and an extra template to handle the type specificity. Otherwise the changes are superficial at this level.)
Attachment #8812125 -
Attachment is obsolete: true
Attachment #8814005 -
Flags: review?(hv1989)
Comment 21•8 years ago
|
||
Comment on attachment 8814001 [details] [diff] [review]
bug1286816-arm64-branch64.patch
Review of attachment 8814001 [details] [diff] [review]:
-----------------------------------------------------------------
Nice :D
Attachment #8814001 -
Flags: review?(hv1989) → review+
Comment 22•8 years ago
|
||
Comment on attachment 8814003 [details] [diff] [review]
bug1286816-refactor-comparisons-v4.patch
Review of attachment 8814003 [details] [diff] [review]:
-----------------------------------------------------------------
+1
Attachment #8814003 -
Flags: review?(hv1989) → review+
Assignee | ||
Comment 23•8 years ago
|
||
Re the optimization instrumentation, that needs more work before it can land (it needs to only run when the code is baseline-compiled) and I may spin that off as a new bug, where I'll also try to capture the IRC discussion.
Comment 24•8 years ago
|
||
Comment on attachment 8814005 [details] [diff] [review]
bug1286816-boolean-evaluation-v4.patch
Review of attachment 8814005 [details] [diff] [review]:
-----------------------------------------------------------------
I mostly went through the logical part. Some feedback.
I'll need another round, but with these comments fixed I expect only nits anymore.
::: js/src/wasm/WasmBaselineCompile.cpp
@@ +3752,5 @@
> + void jumpConditionalWithJoinReg(BranchState* b, Cond cond, Lhs lhs, Rhs rhs)
> + {
> + AnyReg r = popJoinRegUnlessVoid(b->resultType);
> +
> + if (b->framePushed != BranchState::NoPop && willPopStackBeforeBranch(b->framePushed)) {
Shouldn't this be || instead of &&?
@@ +5006,5 @@
> +BaseCompiler::emitBranchSetup(BranchState* b)
> +{
> + switch (latentOp_) {
> + case LatentOp::None: {
> + b->i32.lhs = popI32NotJoinReg(b->resultType);
For every LatentOp can we set the corresponding "Imm" rhs already?
That way "emitBranchPerform" would be universal for every latentOp_ ...
I.e. set b->i32.rhsImm = true and b->i32.imm = 0 here.
@@ +5046,5 @@
> + }
> + case LatentOp::Eqz: {
> + switch (latentType_) {
> + case ValType::I32: {
> + b->i32.lhs = popI32NotJoinReg(b->resultType);
I.e. set b->i32.rhsImm = true and b->i32.imm = 0 here.
@@ +5050,5 @@
> + b->i32.lhs = popI32NotJoinReg(b->resultType);
> + break;
> + }
> + case ValType::I64: {
> + b->i64.lhs = popI64NotJoinReg(b->resultType);
I.e. set b->i64.rhsImm = true and b->i64.imm = 0 here.
@@ +5272,5 @@
> if (!pushControl(&endLabel, &elseLabel))
> return false;
>
> + if (!deadCode_)
> + emitBranchPerform(&b);
There is a bug here:
emitBranchPerform resets the "latentOp_", which is not reset when "deadCode_" is true. As result the if/select/... could assume there is. Can you add a testcase for this?
Attachment #8814005 -
Flags: review?(hv1989)
Comment 25•8 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #23)
> Re the optimization instrumentation, that needs more work before it can land
> (it needs to only run when the code is baseline-compiled) and I may spin
> that off as a new bug, where I'll also try to capture the IRC discussion.
Makes sense
Assignee | ||
Comment 26•8 years ago
|
||
(In reply to Hannes Verschore [:h4writer] from comment #24)
> Comment on attachment 8814005 [details] [diff] [review]
> bug1286816-boolean-evaluation-v4.patch
>
> ::: js/src/wasm/WasmBaselineCompile.cpp
> @@ +3752,5 @@
> > + void jumpConditionalWithJoinReg(BranchState* b, Cond cond, Lhs lhs, Rhs rhs)
> > + {
> > + AnyReg r = popJoinRegUnlessVoid(b->resultType);
> > +
> > + if (b->framePushed != BranchState::NoPop && willPopStackBeforeBranch(b->framePushed)) {
>
> Shouldn't this be || instead of &&?
No, && is right: if framePushed == NoPop then it doesn't matter what the other conjunct is; framePushed has to be different from NoPop AND there has to be something to pop for the first arm to be used, which performs the popping.
> @@ +5006,5 @@
> > +BaseCompiler::emitBranchSetup(BranchState* b)
> > +{
> > + switch (latentOp_) {
> > + case LatentOp::None: {
> > + b->i32.lhs = popI32NotJoinReg(b->resultType);
>
> For every LatentOp can we set the corresponding "Imm" rhs already?
> That way "emitBranchPerform" would be universal for every latentOp_ ...
>
> I.e. set b->i32.rhsImm = true and b->i32.imm = 0 here.
Fixed. Nice cleanup.
> @@ +5272,5 @@
> > if (!pushControl(&endLabel, &elseLabel))
> > return false;
> >
> > + if (!deadCode_)
> > + emitBranchPerform(&b);
>
> There is a bug here:
> emitBranchPerform resets the "latentOp_", which is not reset when
> "deadCode_" is true. As result the if/select/... could assume there is. Can
> you add a testcase for this?
Nice find. I have cleaned this up by (a) introducing and documenting a method resetLatentOp that must be called if emitBranchSetup / emitBranchPerform are not called because deadCode_ is true and (b) adding assertions in sniffLatentWhatever that check that the latentOp_ field is None. Part (b) is not as good as a test case, I'm still working on that, but it will catch some future mistakes.
Attachment #8814005 -
Attachment is obsolete: true
Attachment #8814406 -
Flags: review?(hv1989)
Assignee | ||
Comment 27•8 years ago
|
||
FWIW there's a bug here somewhere, the emscripten box2d benchmark no longer terminates with this optimization (x64 release build, Mac).
Assignee | ||
Comment 28•8 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #27)
> FWIW there's a bug here somewhere, the emscripten box2d benchmark no longer
> terminates with this optimization (x64 release build, Mac).
Easy fix in emitBranchSetup, it must set the type when synthesizing an operation in the "None" case.
Assignee | ||
Comment 29•8 years ago
|
||
Comment on attachment 8814406 [details] [diff] [review]
bug1286816-boolean-evaluation-v5.patch
New patch forthcoming.
Attachment #8814406 -
Flags: review?(hv1989)
Assignee | ||
Comment 30•8 years ago
|
||
Ran the embenchen benchmarks (in asm_vs_wasm), results are generally very good, with minor improvements across the suite. The times are measured at the shell (/usr/bin/time), which will tend to give an edge to the baseline compiler because the runtime ratio is diluted by startup overhead and faster compilation time:
Without the optimization
Baldr Rabaldr Ratio
box2d 1.06 1.78 .595
bullet 1.32 2.19 .602
conditionals 0.32 1.01 .316
copy 0.62 3.05 .203
corrections 1.00 1.57 .636
fannkuch 2.26 5.06 .446
fasta 0.78 2.17 .359
ifs 0.23 0.38 .605
linpack .8222 1.0493 .783
lua_binarytrees 4.03 6.35 .634
lua_scimark 4.145 7.352 .563
memops 0.92 3.50 .262
primes 1.17 1.44 .812
skinning 0.84 2.21 .380
zlib 1.20 2.08 .576
With the optimization
Baldr Rabaldr Ratio
box2d 1.09 1.72 .633 +
bullet 1.27 2.15 .590 -
conditionals 0.35 1.06 .330 +
copy 0.62 2.96 .209 +
corrections 1.02 1.51 .675 +
fannkuch 2.22 4.63 .479 +
fasta 0.78 2.14 .364 +
ifs 0.23 0.34 .676 +
linpack .8219 1.0293 .798 +
lua_binarytrees 4.01 6.06 .661 +
lua_scimark 4.110 7.331 .560 -
memops 0.92 3.51 .262
primes 1.16 1.18 .983 +
skinning 0.84 2.20 .381 +
zlib 1.20 2.00 .600 +
For bullet and scimark: though they are slower here a second run produced numbers that had a better ratio than without the optimization (.627 and .580 respectively). There's probably plenty of noise, but the trend is good.
Assignee | ||
Comment 31•8 years ago
|
||
This is the same patch as in comment #26, except that two bugs that crept into the rewritten emitBranchSetup / emitBranchPerform logic have been fixed (basically, failing to properly initialize variables along all paths).
Attachment #8814406 -
Attachment is obsolete: true
Attachment #8814823 -
Flags: review?(hv1989)
Updated•8 years ago
|
Attachment #8814823 -
Flags: review?(hv1989) → review+
Assignee | ||
Comment 32•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/b023c7d0e8b2e00960b134793dc71fd25b74dc3d
Bug 1286816 - branch64(reg,reg) stub for arm64. r=h4writer
https://hg.mozilla.org/integration/mozilla-inbound/rev/119af218c29ab502f82d60b4340bdb5f734bf6fd
Bug 1286816 - InvertCondition(DoubleCondition). r=h4writer
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c9d3d8df078b513149aa0228621dad461abe94d
Bug 1286816 - wasm baseline, simplify code generation of branches. r=h4writer
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b4ee4fa6f1939f34dd8ee4c46f49855d613fd3c
Bug 1286816 - wasm baseline, test cases for boolean optimization for control. r=h4writer
https://hg.mozilla.org/integration/mozilla-inbound/rev/c7dc06c5f56aa301af64092ce75033a6e7d9b0a4
Bug 1286816 - wasm baseline, optimize boolean expressions for control. r=h4writer
Assignee | ||
Comment 33•8 years ago
|
||
Optimization instrumentation forked off as bug 1320645.
Comment 34•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b023c7d0e8b2
https://hg.mozilla.org/mozilla-central/rev/119af218c29a
https://hg.mozilla.org/mozilla-central/rev/3c9d3d8df078
https://hg.mozilla.org/mozilla-central/rev/2b4ee4fa6f19
https://hg.mozilla.org/mozilla-central/rev/c7dc06c5f56a
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in
before you can comment on or make changes to this bug.
Description
•