Wasm baseline: Optimize boolean evaluation for control

RESOLVED FIXED in Firefox 53

Status

Type: defect
Priority: P1
Severity: normal
Resolution: RESOLVED FIXED
Reported: 3 years ago
Last modified: 2 years ago

People

(Reporter: lth, Assigned: lth)

Tracking

(Blocks 1 bug)

Version: unspecified
Target Milestone: mozilla53
Firefox Tracking Flags: firefox53 fixed


Attachments

(6 attachments, 9 obsolete attachments)

- 4.42 KB, patch; h4writer: review+
- 14.72 KB, patch; h4writer: feedback+
- 5.73 KB, patch; h4writer: review+
- 2.44 KB, patch; h4writer: review+
- 16.61 KB, patch; h4writer: review+
- 24.82 KB, patch; h4writer: review+
(Assignee)

Description

3 years ago
Two entangled problems:

(1) Currently the baseline compiler always evaluates the boolean expression into a register, then tests that register.  This is silly (except on MIPS, maybe).  It does not actually take a lot to cover most interesting cases of boolean evaluation for control.

(2) Currently the baseline compiler will usually evaluate a conditional branch to a label as a conditional branch across code that pops the execution stack + jumps.  This is silly if there is no cleanup code.
(Assignee)

Comment 1

3 years ago
Refactor and clean up comparison handling to make later changes simpler.
(Assignee)

Comment 2

3 years ago
Posted patch bug1286816-token-queue.patch (obsolete) — Splinter Review
Add a simple token queue so that we can do one-token lookahead.  This should be rolled into the WasmBinaryIterator, where it would be more efficient.
(Assignee)

Comment 3

3 years ago
WIP boolean evaluation for control: if, at a compare node, we see a conditional branch as the next token then defer the evaluation so that when the branch is handled it can absorb the compare too.
Priority: -- → P2
(Assignee)

Updated

3 years ago
Priority: P2 → P3
(Assignee)

Updated

3 years ago
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
(Assignee)

Updated

3 years ago
Depends on: 1316614
(Assignee)

Comment 4

3 years ago
Comment on attachment 8770929 [details] [diff] [review]
bug1286816-token-queue.patch

Moved to bug 1316614.
Attachment #8770929 - Attachment is obsolete: true
(Assignee)

Comment 5

3 years ago
Attachment #8770927 - Attachment is obsolete: true
(Assignee)

Comment 6

3 years ago
Attachment #8770930 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Blocks: wasm-baseline-perf
No longer blocks: wasm
(Assignee)

Updated

3 years ago
See Also: → 1316804
(Assignee)

Updated

3 years ago
Priority: P3 → P1
(Assignee)

Comment 7

2 years ago
Clean up how we emit code for comparisons by making the code generation table-driven instead of switch-driven.  These tables will be used heavily in the patch that implements the optimization (the next patch on this bug).
Attachment #8809446 - Attachment is obsolete: true
Attachment #8812124 - Flags: review?(hv1989)
(Assignee)

Comment 8

2 years ago
Implement optimizations of boolean expressions used for control.  This is good for performance and good for code density.

The idea here is that when we see certain boolean operators (currently comparisons and Eqz) we will look ahead to the next opcode, and if that opcode is a conditional branch we know how to handle (BrIf, If, and Select) then we will stash away the information about the boolean operator and not generate code yet.

The code generators for conditional branches then use generalized machinery that makes use of the stashed information to avoid generating a boolean result value, instead folding the boolean operator directly into the branch.

On AngryBots, 84% of boolean operators (measuring statically) are optimized by this optimization.

Code generation is then divided into two steps: a setup step (corresponding to popping operands) and a branch step (corresponding to testing and branching), because both If and Select need to perform additional work between the two steps.  Information that pertains to both steps is stored in a BranchState structure.  You will see that the logic in BrIf, If, and Select remains unchanged as a result of this division of labor.

Additionally, this patch has two tweaks:

- in int32 comparisons where the lhs is a constant, we do not move that
  constant to a register but use it directly

- if a BrIf has no stack to pop the branch is direct to the target rather
  than an inverted branch around code that pops the stack and branches to
  the target (Select and If did not have this issue)
Attachment #8809447 - Attachment is obsolete: true
Attachment #8812125 - Flags: review?(hv1989)
(Assignee)

Comment 9

2 years ago
Test cases that test that things continue to work in situations where the optimization is supposed to kick in.  (See the next patch for some ideas about how to test that the optimization does kick in.)
Attachment #8812128 - Flags: review?(hv1989)
(Assignee)

Comment 10

2 years ago
This is a framework - not ready for landing - that is used to measure whether optimizations do kick in as they are supposed to, and which can be used to profile the efficacy of the optimization.

(It's not ready for landing primarily because test cases start failing if the baseline compiler is not enabled, so some kind of guarding is needed for that.)

We can sort-of safeguard that optimizations work with benchmarks, but it seemed like here we have an opportunity to check it directly, so I explored it a bit.  For other optimizations in the baseline compiler we may be able to do similar things, but really not all that often.

What I'd like to know is whether you think something like this is useful, or if it's a distraction.  And any other thoughts you may have.
Attachment #8812129 - Flags: feedback?(hv1989)
(Assignee)

Comment 12

2 years ago
Needs an Arm64 stub for the MacroAssembler interface branch64(Cond,Register64,Register64,Label*).
Comment on attachment 8812124 [details] [diff] [review]
bug1286816-refactor-comparisons-v3.patch

Review of attachment 8812124 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not totally convinced a table-driven approach is better?

Adding a function that can translate the CmpOp to Assembler::Condition seems better. (E.g. something like JSOpToCondition).
That way the compiler will warn us if we add new cases that aren't considered in those switches, instead of relying on a comment in the code, and we don't need NumCmpIntBinop etc.
Also, we have InvertCondition we can use for the "notTaken" case. That way we don't have to keep "notTaken" in the table.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +503,5 @@
> +        CMP_GE,
> +        CMP_ULT,                // I32, I64
> +        CMP_ULE,
> +        CMP_UGT,
> +        CMP_UGE

Might make sense to add the comments inline in the code, instead of after. Since at first glance I thought CMP_EQ and CMP_ULT were special.

        // Binary
        // I32, I64, F32, F64
        CMP_EQ,
        CMP_NE,
        CMP_LT,
        CMP_LE,
        CMP_GT,
        CMP_GE,
        // I32, I64
        CMP_ULT,
        CMP_ULE,
        CMP_UGT,
        CMP_UGE

@@ +3785,5 @@
> +    { Assembler::Below,              Assembler::AboveOrEqual },       // ULT
> +    { Assembler::BelowOrEqual,       Assembler::Above },              // ULE
> +    { Assembler::Above,              Assembler::BelowOrEqual },       // UGT
> +    { Assembler::AboveOrEqual,       Assembler::Below }               // UGE
> +};

Assembler::InvertCondition exists for the second row ;). Please use that.

@@ +3795,5 @@
> +    { Assembler::DoubleNotEqualOrUnordered, Assembler::DoubleEqual },
> +    { Assembler::DoubleLessThan,            Assembler::DoubleGreaterThanOrEqualOrUnordered },
> +    { Assembler::DoubleLessThanOrEqual,     Assembler::DoubleGreaterThanOrUnordered },
> +    { Assembler::DoubleGreaterThan,         Assembler::DoubleLessThanOrEqualOrUnordered },
> +    { Assembler::DoubleGreaterThanOrEqual,  Assembler::DoubleLessThanOrUnordered }

Don't think it would be bad to add InvertCondition for Assembler::DoubleCondition.
Attachment #8812124 - Flags: review?(hv1989)
Comment on attachment 8812128 [details] [diff] [review]
bug1286816-test-cases.patch

Review of attachment 8812128 [details] [diff] [review]:
-----------------------------------------------------------------

tests++
Attachment #8812128 - Flags: review?(hv1989) → review+
Attachment #8812129 - Flags: feedback?(hv1989) → feedback+
Comment on attachment 8812125 [details] [diff] [review]
bug1286816-boolean-evaluation-v3.patch

Review of attachment 8812125 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing the review flag for now while waiting for the updated patch. The design already looks nice.
Attachment #8812125 - Flags: review?(hv1989)
(Assignee)

Comment 16

2 years ago
Add Assembler::InvertCondition(DoubleCondition) for x86, x64, and ARM - MIPS already has this.  (I forgot about None but will check that.)
Attachment #8814000 - Flags: review?(hv1989)
(Assignee)

Comment 17

2 years ago
Add a branch64(reg64,reg64) stub for ARM64.
Attachment #8814001 - Flags: review?(hv1989)
(Assignee)

Comment 18

2 years ago
Refactor comparisons, using standard Condition codes rather than the CMP_ values.
Attachment #8812124 - Attachment is obsolete: true
Attachment #8814003 - Flags: review?(hv1989)
Comment on attachment 8814000 [details] [diff] [review]
bug1286816-invertcondition-double.patch

Review of attachment 8814000 [details] [diff] [review]:
-----------------------------------------------------------------

Looks nice! Thanks!
Attachment #8814000 - Flags: review?(hv1989) → review+
(Assignee)

Comment 20

2 years ago
Implement boolean optimization for control using InvertCondition rather than the old table.  (The consequence of having type-specific condition codes shows up in that there are type-specific "latent" variables for storing the condition codes and an extra template to handle the type specificity.  Otherwise the changes are superficial at this level.)
Attachment #8812125 - Attachment is obsolete: true
Attachment #8814005 - Flags: review?(hv1989)
Comment on attachment 8814001 [details] [diff] [review]
bug1286816-arm64-branch64.patch

Review of attachment 8814001 [details] [diff] [review]:
-----------------------------------------------------------------

Nice :D
Attachment #8814001 - Flags: review?(hv1989) → review+
Comment on attachment 8814003 [details] [diff] [review]
bug1286816-refactor-comparisons-v4.patch

Review of attachment 8814003 [details] [diff] [review]:
-----------------------------------------------------------------

+1
Attachment #8814003 - Flags: review?(hv1989) → review+
(Assignee)

Comment 23

2 years ago
Re the optimization instrumentation, that needs more work before it can land (it needs to only run when the code is baseline-compiled) and I may spin that off as a new bug, where I'll also try to capture the IRC discussion.
Comment on attachment 8814005 [details] [diff] [review]
bug1286816-boolean-evaluation-v4.patch

Review of attachment 8814005 [details] [diff] [review]:
-----------------------------------------------------------------

I mostly went through the logical part. Some feedback.
I'll need another round, but with these comments fixed I expect only nits after that.

::: js/src/wasm/WasmBaselineCompile.cpp
@@ +3752,5 @@
> +    void jumpConditionalWithJoinReg(BranchState* b, Cond cond, Lhs lhs, Rhs rhs)
> +    {
> +        AnyReg r = popJoinRegUnlessVoid(b->resultType);
> +
> +        if (b->framePushed != BranchState::NoPop && willPopStackBeforeBranch(b->framePushed)) {

Shouldn't this be || instead of &&?

@@ +5006,5 @@
> +BaseCompiler::emitBranchSetup(BranchState* b)
> +{
> +    switch (latentOp_) {
> +      case LatentOp::None: {
> +        b->i32.lhs = popI32NotJoinReg(b->resultType);

For every LatentOp can we set the corresponding "Imm" rhs already?
That way "emitBranchPerform" would be universal for every latentOp_ ...

I.e. set b->i32.rhsImm = true and b->i32.imm = 0 here.

@@ +5046,5 @@
> +      }
> +      case LatentOp::Eqz: {
> +        switch (latentType_) {
> +          case ValType::I32: {
> +            b->i32.lhs = popI32NotJoinReg(b->resultType);

I.e. set b->i32.rhsImm = true and b->i32.imm = 0 here.

@@ +5050,5 @@
> +            b->i32.lhs = popI32NotJoinReg(b->resultType);
> +            break;
> +          }
> +          case ValType::I64: {
> +            b->i64.lhs = popI64NotJoinReg(b->resultType);

I.e. set b->i64.rhsImm = true and b->i64.imm = 0 here.

@@ +5272,5 @@
>      if (!pushControl(&endLabel, &elseLabel))
>          return false;
>  
> +    if (!deadCode_)
> +        emitBranchPerform(&b);

There is a bug here:
emitBranchPerform resets "latentOp_", which is not reset when "deadCode_" is true. As a result, a later if/select/... could assume there is a latent op. Can you add a testcase for this?
Attachment #8814005 - Flags: review?(hv1989)
(In reply to Lars T Hansen [:lth] from comment #23)
> Re the optimization instrumentation, that needs more work before it can land
> (it needs to only run when the code is baseline-compiled) and I may spin
> that off as a new bug, where I'll also try to capture the IRC discussion.

Makes sense
(Assignee)

Comment 26

2 years ago
(In reply to Hannes Verschore [:h4writer] from comment #24)
> Comment on attachment 8814005 [details] [diff] [review]
> bug1286816-boolean-evaluation-v4.patch
> 
> ::: js/src/wasm/WasmBaselineCompile.cpp
> @@ +3752,5 @@
> > +    void jumpConditionalWithJoinReg(BranchState* b, Cond cond, Lhs lhs, Rhs rhs)
> > +    {
> > +        AnyReg r = popJoinRegUnlessVoid(b->resultType);
> > +
> > +        if (b->framePushed != BranchState::NoPop && willPopStackBeforeBranch(b->framePushed)) {
> 
> Shouldn't this be || instead of &&?

No, && is right: if framePushed == NoPop then it doesn't matter what the other conjunct is; framePushed has to be different from NoPop AND there has to be something to pop for the first arm to be used, which performs the popping.

> @@ +5006,5 @@
> > +BaseCompiler::emitBranchSetup(BranchState* b)
> > +{
> > +    switch (latentOp_) {
> > +      case LatentOp::None: {
> > +        b->i32.lhs = popI32NotJoinReg(b->resultType);
> 
> For every LatentOp can we set the corresponding "Imm" rhs already?
> That way "emitBranchPerform" would be universal for every latentOp_ ...
> 
> I.e. set b->i32.rhsImm = true and b->i32.imm = 0 here.

Fixed.  Nice cleanup.

> @@ +5272,5 @@
> >      if (!pushControl(&endLabel, &elseLabel))
> >          return false;
> >  
> > +    if (!deadCode_)
> > +        emitBranchPerform(&b);
> 
> There is a bug here:
> emitBranchPerform resets the "latentOp_", which is not reset when
> "deadCode_" is true. As result the if/select/... could assume there is. Can
> you add a testcase for this?

Nice find.  I have cleaned this up by (a) introducing and documenting a method resetLatentOp that must be called if emitBranchSetup / emitBranchPerform are not called because deadCode_ is true, and (b) adding assertions in sniffLatentWhatever that check that the latentOp_ field is None.  Part (b) is not as good as a test case; I'm still working on that, but it will catch some future mistakes.
Attachment #8814005 - Attachment is obsolete: true
Attachment #8814406 - Flags: review?(hv1989)
(Assignee)

Comment 27

2 years ago
FWIW there's a bug here somewhere: the emscripten box2d benchmark no longer terminates with this optimization (x64 release build, Mac).
(Assignee)

Comment 28

2 years ago
(In reply to Lars T Hansen [:lth] from comment #27)
> FWIW there's a bug here somewhere, the emscripten box2d benchmark no longer
> terminates with this optimization (x64 release build, Mac).

Easy fix in emitBranchSetup: it must set the type when synthesizing an operation in the "None" case.
(Assignee)

Comment 29

2 years ago
Comment on attachment 8814406 [details] [diff] [review]
bug1286816-boolean-evaluation-v5.patch

New patch forthcoming.
Attachment #8814406 - Flags: review?(hv1989)
(Assignee)

Comment 30

2 years ago
Ran the embenchen benchmarks (in asm_vs_wasm), results are generally very good, with minor improvements across the suite.  The times are measured at the shell (/usr/bin/time), which will tend to give an edge to the baseline compiler because the runtime ratio is diluted by startup overhead and faster compilation time:

Without the optimization

                Baldr  Rabaldr  Ratio

box2d           1.06    1.78    .595
bullet          1.32    2.19    .602
conditionals    0.32    1.01    .316
copy            0.62    3.05    .203
corrections     1.00    1.57    .636
fannkuch        2.26    5.06    .446
fasta           0.78    2.17    .359
ifs             0.23    0.38    .605
linpack         .8222   1.0493  .783
lua_binarytrees 4.03    6.35    .634
lua_scimark     4.145   7.352   .563
memops          0.92    3.50    .262
primes          1.17    1.44    .812
skinning        0.84    2.21    .380
zlib            1.20    2.08    .576

With the optimization

                Baldr  Rabaldr  Ratio

box2d           1.09    1.72    .633 +
bullet          1.27    2.15    .590 -
conditionals    0.35    1.06    .330 +
copy            0.62    2.96    .209 +
corrections     1.02    1.51    .675 +
fannkuch        2.22    4.63    .479 +
fasta           0.78    2.14    .364 +
ifs             0.23    0.34    .676 +
linpack         .8219   1.0293  .798 +
lua_binarytrees 4.01    6.06    .661 +
lua_scimark     4.110   7.331   .560 -
memops          0.92    3.51    .262
primes          1.16    1.18    .983 +
skinning        0.84    2.20    .381 +
zlib            1.20    2.00    .600 +

For bullet and scimark: though they are slower here, a second run produced ratios better than without the optimization (.627 and .580 respectively).  There's probably plenty of noise, but the trend is good.
(Assignee)

Comment 31

2 years ago
This is the same patch as in comment #26, except that two bugs that crept into the rewritten emitBranchSetup / emitBranchPerform logic have been fixed (basically, failing to properly initialize variables along all paths).
Attachment #8814406 - Attachment is obsolete: true
Attachment #8814823 - Flags: review?(hv1989)
Attachment #8814823 - Flags: review?(hv1989) → review+
(Assignee)

Comment 33

2 years ago
Optimization instrumentation forked off as bug 1320645.
Depends on: 1324989