542905 - cse chains should be cleared more selectively in case of labels

Reporter

Description

•

16 years ago

We currently flush the CSE state at every label, because the label could be the tail of a control-flow diamond, and definitions we put inside the diamond go dead at that point.

      a
     / \
    |   b
    |   |
     \ /
      *

In *, b is dead, but a is still alive. Since we don't have an explicit way to just kill that one (now dead) cse-able instruction b, we clear the entire table.

There are several ways we could fix this. We can explicitly destroy cse-entries as we merge register allocator states. Might be tricky. A more explicit way would be to communicate scopes to nanojit and give it an understanding of the control-flow structure.

In the alternative we could explicitly disable entering values into the cse table while compiling such control flow splits, either in a global way (cse_off()) or in a scoped way. The latter is probably more desirable for tamarin which uses complex control flows, where ours is usually just a sequence of small diamonds.

     |
    / \
    \ /
     |
     |
    / \
    \ /
     |
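To make the cost of the current behaviour concrete, here is a minimal sketch of a CSE table that is flushed at every label. It is a toy model, not nanojit code: `Expr`, `FlushingCse`, and `hitsAfterDiamond` are hypothetical names invented for illustration, and instruction ids stand in for LIns pointers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical toy expression: opcode plus two operand ids (not a nanojit type).
struct Expr {
    std::string op;
    int lhs, rhs;
    bool operator<(const Expr& o) const {
        return std::tie(op, lhs, rhs) < std::tie(o.op, o.lhs, o.rhs);
    }
};

// Hypothetical CSE table that models "clear everything at every label".
class FlushingCse {
public:
    // Returns the id of an equivalent earlier instruction if one is recorded;
    // otherwise records `id` as the defining instruction and returns it.
    int emit(const Expr& e, int id) {
        auto it = table_.find(e);
        if (it != table_.end()) { ++hits_; return it->second; }
        table_.emplace(e, id);
        return id;
    }
    void label() { table_.clear(); }   // any label discards every entry
    int hits() const { return hits_; }
private:
    std::map<Expr, int> table_;
    int hits_ = 0;
};

// Emits the diamond from the description and counts CSE hits after the join.
int hitsAfterDiamond() {
    FlushingCse cse;
    cse.emit({"add", 1, 2}, 10);   // a: defined before the split
    cse.emit({"mul", 3, 4}, 11);   // b: defined inside one arm
    assert(cse.hits() == 0);       // nothing has been reused yet
    cse.label();                   // join point '*'
    cse.emit({"add", 1, 2}, 12);   // a again: safe to reuse, but the entry is gone
    cse.emit({"mul", 3, 4}, 13);   // b again: must not be reused
    return cse.hits();
}
```

In this model `hitsAfterDiamond()` returns 0: the flush correctly prevents reuse of b, but it also throws away a, which dominates the join and could have been reused.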

Nicholas Nethercote [inactive]

Comment 1

•

16 years ago

Hmm. The whole diamond support in NJ is really primitive. That could make this difficult.

Nicholas Nethercote [inactive]

Comment 2

•

16 years ago

Also, CSE is occasionally a big win (I think one of the SunSpider crypto ones goes 2x faster or 4x faster or something with it) but most of the time doesn't help, and it's one of the more expensive parts of NJ compilation.

Edwin Smith

Comment 3

•

16 years ago

Could we maintain a stack of CSE tables that represents the dominators of the current block, as code is emitted? At a branch, push an entry, and at the label of the branch, pop it. I'm handwaving a bit, but if this modeling of dominators is accurate, then we win.

For this to be robust, branch patching probably cannot be allowed by mutating a LIns* directly; it should be a call that flows through the LirWriter pipeline so each stage can see what is being patched.
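A rough sketch of the stack-of-tables idea, under the same handwaving: it assumes forward branches nest properly, so that each label closes the most recently opened branch. `Expr`, `ScopedCse`, `branch()`, `label()`, and `diamondReusesOnlyDominatingDefs` are hypothetical names for illustration and do not correspond to any nanojit API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical toy expression: opcode plus two operand ids (not a nanojit type).
struct Expr {
    std::string op;
    int lhs, rhs;
    bool operator<(const Expr& o) const {
        return std::tie(op, lhs, rhs) < std::tie(o.op, o.lhs, o.rhs);
    }
};

// Hypothetical stack of CSE tables; each level models one dominating region.
class ScopedCse {
public:
    ScopedCse() : scopes_(1) {}        // outermost scope: always-executed code

    // Search from the innermost scope outwards; on a miss, record the
    // expression in the innermost scope only.
    int emit(const Expr& e, int id) {
        for (auto s = scopes_.rbegin(); s != scopes_.rend(); ++s) {
            auto it = s->find(e);
            if (it != s->end()) return it->second;
        }
        scopes_.back().emplace(e, id);
        return id;
    }
    void branch() { scopes_.emplace_back(); }   // enter conditionally executed code
    void label() {                              // target of that branch: leave it
        assert(scopes_.size() > 1);             // never pop the outermost scope
        scopes_.pop_back();
    }
private:
    std::vector<std::map<Expr, int>> scopes_;
};

// Checks that definitions before the branch survive the label and that
// definitions made inside the branch do not.
bool diamondReusesOnlyDominatingDefs() {
    ScopedCse cse;
    cse.emit({"add", 1, 2}, 10);                              // a, before the branch
    cse.branch();
    cse.emit({"mul", 3, 4}, 11);                              // b, inside the arm
    bool aReusedInside = cse.emit({"add", 1, 2}, 12) == 10;   // a is visible inside
    cse.label();
    bool aReusedAfter = cse.emit({"add", 1, 2}, 13) == 10;    // a survives the join
    bool bNotReused = cse.emit({"mul", 3, 4}, 14) == 14;      // b was popped
    return aReusedInside && aReusedAfter && bNotReused;
}
```

Popping a scope discards only the entries created since the matching branch, so nothing has to be copied on entry. Lookups pay for this by walking every live scope.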

Nicholas Nethercote [inactive]

Comment 4

•

15 years ago

Does anyone have some example code where the current approach causes problems? One reason I ask is that CSE is one of the more expensive parts of compilation and so we have to be careful about adding new bits to it.

Julian Seward [:jseward]

Comment 5

•

15 years ago

(In reply to comment #3)
> Could we maintain a stack of CSE tables that represents the dominators of the
> current block, as code is emitted? at a branch, push an entry, and at the
> label of the branch, pop it.

Right; that was my first thought too. Except it's a bit overly pessimistic. At the end of the diamond we do need to pop the topmost table (available-expressions set), but we can add to the one underneath any new expressions which get defined by phi-nodes associated with (in other words, immediately following) the diamond.

Nicholas Nethercote [inactive]

Updated

•

15 years ago

Depends on: 545270

Nicholas Nethercote [inactive]

Comment 6

•

15 years ago

> In the alternative we could explicitly disable entering values into the cse
> table while compiling such control flow splits, either in a global way
> (cse_off()) or in a scoped way. The latter is probably more desirable for
> tamarin which uses complex control flows, where ours is usually just a sequence
> of small diamonds.

I think this (the scoped version) sounds like it'll provide the best cost/benefit trade-off.

It's nice to talk about diamonds but we only have jumps and labels, which makes things a bit harder. Here's what I think is about the simplest possible algorithm that works:

- If we hit a label for a backward jump, clear everything. I think we don't want to CSE any value defined before a loop without also having a 'live' for it after the loop. This doesn't seem worth bothering with because currently nothing of interest is computed before we enter a loop? Maybe if we did loop-hoisting (bug 545406) it might be worthwhile but I don't see that happening for a while.

- If we hit a forward jump, set the 'cse_off' bit, and record the jump in a list. While 'cse_off' is set we are in a section of code that isn't always executed, so we can perform CSE using expressions that are already stored as available, but we can't mark new expressions as available.

- If we hit a label for a forward jump, remove that jump from the jump list. If the jump list becomes empty as a result, clear the 'cse_off' bit -- we're now back in always-executed code. In order to actually get the forward jump when we hit the label, we'll need to incorporate LIns::setTarget() somehow into the writer pipeline (like what Ed said in comment 3).

- As for working out load/store aliases, I think there'd be little change -- if a not-always-taken path contains a store that invalidates a load, then we just have to be conservative and invalidate that load.

This is a fairly minor change to CSE and the writer pipeline, and no change to LIR, but it'll greatly increase the scope for CSE.

It does miss CSE opportunities for values produced and consumed within a diamond. For TM at least I think that'll hardly matter, because our diamonds are small. If TR's diamonds are bigger then more opportunities might be missed, but it'll still be much better than the current approach of trashing the entire CSE state.
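The steps above can be sketched roughly as follows. This is a toy model rather than the eventual patch: `Expr`, `SuspendableCse`, and all of its methods are hypothetical names, jumps are identified by plain integers, and the caller is assumed to know which pending jumps a label resolves (the role the thread assigns to routing LIns::setTarget() through the writer pipeline). Load/store aliasing is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical toy expression: opcode plus two operand ids (not a nanojit type).
struct Expr {
    std::string op;
    int lhs, rhs;
    bool operator<(const Expr& o) const {
        return std::tie(op, lhs, rhs) < std::tie(o.op, o.lhs, o.rhs);
    }
};

// Hypothetical CSE table with a 'cse_off' state derived from the jump list.
class SuspendableCse {
public:
    int emit(const Expr& e, int id) {
        auto it = table_.find(e);
        if (it != table_.end()) return it->second;   // reuse is always allowed
        if (!cseOff()) table_.emplace(e, id);        // record only in always-executed code
        return id;
    }
    // A forward jump whose label has not been emitted yet: CSE goes "off".
    void forwardJump(int jumpId) { pending_.push_back(jumpId); }
    // A label that resolves the given forward jumps; CSE comes back "on"
    // once no forward jump is left unresolved.
    void forwardLabel(const std::vector<int>& jumpIds) {
        for (int j : jumpIds)
            pending_.erase(std::remove(pending_.begin(), pending_.end(), j),
                           pending_.end());
    }
    // A label targeted by a backward jump (loop head): clear everything.
    void backwardLabel() { table_.clear(); }
    bool cseOff() const { return !pending_.empty(); }
private:
    std::map<Expr, int> table_;
    std::vector<int> pending_;   // unresolved forward jumps
};

// Walks one small diamond and checks which definitions can be reused.
bool smallDiamondBehaves() {
    SuspendableCse cse;
    cse.emit({"add", 1, 2}, 10);                             // a, always executed
    cse.forwardJump(100);                                    // split
    assert(cse.cseOff());                                    // now in maybe-executed code
    cse.emit({"mul", 3, 4}, 11);                             // b is not recorded
    bool aReusedInside = cse.emit({"add", 1, 2}, 12) == 10;  // existing entries still usable
    cse.forwardLabel({100});                                 // join
    assert(!cse.cseOff());                                   // back in always-executed code
    bool aReusedAfter = cse.emit({"add", 1, 2}, 13) == 10;
    bool bNotReused = cse.emit({"mul", 3, 4}, 14) == 14;
    return aReusedInside && aReusedAfter && bNotReused;
}
```

Because entries are never added while a forward jump is pending, nothing has to be removed at the join, which is what keeps this variant cheap. The price is the limitation noted above: a value defined and then recomputed inside the same diamond is not CSE'd.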

Edwin Smith

Comment 7

•

15 years ago

Slight tweak: I think it's a label list rather than a jump list. Say you have two jumps to the same label (e.g. from a short-circuit && expression); once you see the label, remove it from the list and turn CSE back on. Intuitively this is Righteous, because once you've seen the last label (no forward branches left to patch), you must be in code that's always executed.

For TR compiling whole methods at a time, we probably get hosed by labels clearing CSE, but we'd probably also get killed by not supporting CSE inside an if statement, e.g.

function foo() {
  if () {
    /* tons of code */
  }
}

I think the way we'd use this is for inlined primitives that expand into small diamond flows, but we'd want to leave CSE working as-is for the large-scale control flow structure of the original program. Hmm...

Nicholas Nethercote [inactive]

Comment 8

•

15 years ago

(In reply to comment #7)
> slight tweak: I think it's a label list rather than a jump list.

Conceptually they're equivalent. In practice a label list won't work. When you switch on 'cse_off' you have the jump at hand, but not the label, because the label doesn't exist yet.

> for TR compiling whole methods at a time, we probably get hosed by labels
> clearing CSE, but we'd probably also get killed by not supporting CSE inside an
> if statement, e.g.
>
> function foo() {
>   if () {
>     /* tons of code */
>   }
> }

Oh, you're talking about the case where more code is maybe-taken than definitely-taken? Hmm, that does complicate things. That points towards the stack-of-Cse-tables idea, but I'm having trouble seeing how to implement that efficiently -- seems like it requires lots of copying.

Nicholas Nethercote [inactive]

Comment 9

•

15 years ago

Attached patch: draft patch

I implemented the algorithm from comment 6. It appears to work, but does remarkably little for TM code. Scanning through the diffs of the generated code I could only see that lots of 32-bit immediates were being CSE'd that weren't previously. But that's not very interesting, as most 32-bit immediates get folded into other operations anyway. I couldn't see any cases where anything more complex was CSE'd.

Instruction counts were barely changed, in some cases slightly worse (e.g. 0.2%) because CseFilter's hash tables are more stressed than before.

Given this result, I'm disinclined to work on this any more, at least until something changes.

Nicholas Nethercote [inactive]

Comment 10

•

15 years ago

Oh, a downside of the patch is that names given to immediates with addName() become less-than-useless. E.g. you give '2' the name JSVAL_DOUBLE and then every use of 2 gets that name even when it's not appropriate. This already happens a bit but with the patch it's much worse.

Edwin Smith

Updated

•

15 years ago

Blocks: 563944

Edwin Smith

Updated

•

15 years ago

Whiteboard: PACMAN

Target Milestone: --- → Future

draft patch (15 years ago, Nicholas Nethercote [inactive]; 23.92 KB, patch)
Simple patch to suspend/resume CSE around synthetic control-flow diamonds (15 years ago, William Maddox; 7.21 KB, patch; edwsmith: review+)
Simple patch to suspend/resume CSE around synthetic control-flow diamonds (v2) (15 years ago, William Maddox; 5.61 KB, patch; n.nethercote: review+, edwsmith: feedback+)
Tamarin use case for CseFilter::suspend() (15 years ago, William Maddox; 8.43 KB, patch)
Tamarin helper methods: CodegenLIR::suspendCSE() and CodegenLIR::resumeCSE() (15 years ago, William Maddox; 3.86 KB, patch; edwsmith: review+)