Closed Bug 518493 Opened 15 years ago Closed 15 years ago

TM merge: handleLoopCarriedExprs changes from TM do not work with Tamarin

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P2)

x86_64
macOS
defect

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 525293
flash10.1

People

(Reporter: rreitmai, Assigned: rreitmai)

References

Details

Bug 514110 (tamarin bug 515822) introduced a change to handleLoopCarriedExprs that is incompatible with Tamarin.

In Tamarin the original code is maintained, while the merged version is guarded by #ifdef TM_MERGE.

We need to resolve this issue.
Assignee: nobody → rreitmai

Needs investigation and further testing on other platforms.

Many of the Date tests fail, including:

test/acceptance/ecma3/Date/e15_9_5_10_5.abc
Blocks: 503556
Summary: handleLoopCarriedExprs changes from TM do not work with Tamarin → TM merge: handleLoopCarriedExprs changes from TM do not work with Tamarin
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → flash10.1
Repros on Mac x64, but could be a problem on other platforms too.
OS: All → Mac OS X
Hardware: All → x86_64
Brain dump of what I've gleaned from this so far.

At the very bottom of the LIR there are qparam(N, 1) where each N is a saved register. qparam(RBX) gets assigned AR entry #26 (FP - 104).

After that there is a live(qparam(0, 0)), which is equivalent to live(qparam(RDI)). TR uses this to store the |MethodEnv*| as the first incoming parameter to the method. The qparam(RDI) gets assigned to RBX.

Now there is this control flow:

Block 0 starts a loop and branches to Block 1 or Block 2
Block 1 has LIR_ret
Block 2 branches out further, and eventually jumps to Block 0 or out of the loop. Let's call the "out of loop block" Block N. We start assembling here and read these blocks in decreasing order.

An instruction in one of Block 2's successors needs to be spilled, and gets AR entry #40 (FP-160). This spill doesn't last long and entry #40 is freed.

The assembler keeps walking up and eventually sees the LIR_ret in Block 1. At this point, RBX holds the |MethodEnv*| from qparam(RDI) and RDI holds a LIR_quad from an earlier call. The register state is forcefully zeroed in asm_ret() (to differentiate this from eviction).

Further up in Block 1 there are two LIR_calls. The second one the assembler reads wants to pass param0 in qparam(RDI), so RDI=qparam(RDI).

Now the assembler sees a forward-referencing LIR_jt, meaning it's time to call unionRegisterState. There's a conflict for RDI: Block 1 wants RDI=qparam(RDI); the other side wants RDI=live(quad), some other value that's being kept live. So RDI is evicted with qparam(RDI) as the victim. *It does not have an AR entry yet*, and gets slot #40, which is free.

At this point, the assembler is wrong. It has compiled a loop where one branch clobbers a spill location that is live-out on a dominating block. The JIT'd code, after one iteration of the loop, passes an invalid pointer to some method call and dies.

It seems to me that trying to pin pending lives to registers is just wrong: if the first spilling use occurs in anything other than the first-read branch of a loop, the allocator has no knowledge of which spill locations are safe. I'm not sure, though, since I don't have a full grasp of the register allocator, and if this were really the case it seems like it would blow up much more often.

I will try and test-case this with lirasm tomorrow.
live(qparam(RDI)) should have caused a stack slot to be assigned to qparam(RDI) that is alive across the whole loop. Assigning it a register as well would be fine, but a stack slot must still be assigned, because at the bottom of the loop we cannot guarantee that qparam(RDI) won't spill.

possible solutions:

 * LIR_live must reserve spill space for a value even if we also want it to assign a register to it.

 * change the releaseRegisters() to re-assign registers to instructions that are supposed to be "live" according to LIR_live, like we do at loop bottoms (* see below)

 * we could track and save/merge stack area allocations the same way we track and save/merge register allocations.  if we did this it would notice the collision in entry #40 and resolve it somehow.  however this seems like a very heavy solution.

the general pattern for this problem is:

   x = ...       -- start of x live range
   do {
      ... = x    -- end of x live range
      ...
   } while       -- end of loop

x's live range *must* extend to the end of the loop.  LIR_live accomplishes that if it reserves stack space for x.

Any expression needs either a register or a stack location assigned to it to keep it alive. If we don't want to force something to spill to the stack just to keep it alive, we might be able to modify the register allocator slightly:

We have various points in the register allocator where we clear the allocation state when we know we're at an unreachable point, by calling releaseRegisters(): for example, just after a return or just after an unconditional jump. If at these points we reassigned registers to expressions referenced by LIR_live, as we do at loop bottoms, then we avoid having to assign stack space to them to keep them alive.

njn, gal, dvander, rick, please critique this idea :-)
(In reply to comment #4)
>  * LIR_live must reserve spill space for a value even if we also want it to
> assign a register to it.

I barely understand LIR_live, but this doesn't sound that hard.  Perhaps we could have two variants of LIR_live, one that requires a register and one that also requires a spill slot?  Or add two boolean fields to LIR_live, one indicating whether it needs a register and one indicating whether it needs a stack slot?
 
>  * change the releaseRegisters() to re-assign registers to instructions that
> are supposed to be "live" according to LIR_live, like we do at loop bottoms (*
> see below)
> 
>  * we could track and save/merge stack area allocations the same way we track
> and save/merge register allocations.  if we did this it would notice the
> collision in entry #40 and resolve it somehow.  however this seems like a very
> heavy solution.

Very heavy.  intersectRegisterState() accounts for a non-negligible fraction of compile-time, at least on X64.  Trying to do something similar with the stack, which has many more entries (particularly once we enlarge it, as per bug 473769) sounds expensive.
The variant I landed in bug 525293 (which this bug should probably be duped to, assuming it sticks), and of which edwsmith is presently testing a Tamarin variant, is much simpler: it just does *both* a findMemFor() and a findRegsFor(). The findMemFor() is non-optional, and causes (so we hypothesize) early allocation of the spill slot, which is then shared among any edges that happen to spill.
Dupe'ing. Reopen if you think it deserves additional work; I'm happy with this solution.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Status: RESOLVED → VERIFIED