Closed Bug 513615 Opened 11 years ago Closed 6 years ago

nanojit: premature register allocation state updates

Categories

(Core Graveyard :: Nanojit, defect)

x86
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: njn, Unassigned)

References

Details

During register allocation, Nanojit records the current state of registers and the stack -- which registers are free, active, etc.  Unfortunately, it pervasively updates this state prematurely in a way that can cause subtle bugs.
Consider a debug output line like this:

   mov ecx,8(eax)         eax(state) esi(sp)

The register state printed is that just before the 'mov' is executed
(which I'll call the "pre-state").  The other possibility would be to
print the post-state, in which case there'd be an "ecx(something)"
entry as well.  (Note that we're using "mov dst,src" syntax.)  The
choice between pre-state and post-state looks arbitrary, but either
way you have to be consistent.

Now, the way this gets printed is that when asm_output() is called (as
it is from all the codegen macros like LD, ALU, etc) it prints the
generated native instruction, then follows it with the current register
state in Assembler::_allocator.  So, if we are printing the pre-state, we
need to make sure _allocator contains the pre-state.

However, we are also generating the code, and for that we need the post-state!
In other words, we've prematurely updated to the pre-state.  freeRsrcOf() is a prime example of this, it calls _allocator.retire(r) prematurely.

In most cases the code generation is very simple and doesn't involve looking at the register state, so the premature update doesn't matter.  But this isn't guaranteed.  We've already seen at least one bug caused by this (bug 495239, the solution to which fixed the symptoms but not this underlying cause).  Ed says he's also seen some other bugs that were probably caused by this.  It also causes the debug output to be wrong in some cases, such as this one:

     js_NewEmptyArray1 = call #js_NewEmptyArray ( cx JSVAL_TO_OBJECT(pval) )
              mov edx,166076544      ebx(ld21) esi(sp) edi(ld22)
              mov ecx,-8(ebp)        ebx(ld21) esi(sp) edi(ld22)
              call js_NewEmptyArray  ebx(ld21) esi(sp) edi(ld22)
              mov edx,-12(ebp)       ebx(ld21) esi(sp) edi(ld22)  <= restore $var21
              mov ecx,-8(ebp)        edx($var21) ebx(ld21) esi(sp) edi(ld22)  <= restore cx
              mov -8(ebp),eax        eax(js_NewEmptyArray1) ecx(cx) edx($var21) ebx(ld21) esi(sp) edi(ld22)

(Think forwards now, ie. the order the code is executed in.  Also note
that the register state shown on a line applies before the asm
instruction on that same line executes.)

After the 'call', %eax is defined and holds js_NewEmptyArray1.
However, we don't see that until the last line.  The movs on the 2nd
and 3rd last lines don't show that because they are generated by
evictScratchRegs() after the register state for eax
has been prematurely cleared.  (Come to think of it, the state isn't
shown properly for ecx and edx after the first two movs, but I think
that's a separate issue.)

If we switched to printing the post-state I think that would avoid a
lot of these problems.  (I also think it would be more readable, as I
find the current choice of printing the pre-state _after_ the
instruction to be confusing.)  Unfortunately, doing that switch is
difficult as it requires changing every single place where code is
generated.

So I have a plan to do it gradually by parameterising all
the macros like LD with a bool that indicates if the _allocator state
is the pre- or post-state, and have asm_output() act accordingly, eg.
by adding a "pre:" or "post:" prefix.  Then I can change one
asm_*() function at a time from pre-state to post-state.  Once they
are all post-states, I can remove the bool parameter again.  I plan to do this
over a number of patches, I think trying to do it in a single big-bang patch
is a bad idea -- too hard to get right and too hard to review.
Blocks: 513616
(In reply to comment #0)
> entry as well.  (Note that we're using "mov dst,src" syntax.)  The
> choice between pre-state and post-state looks arbitrary, but either
> way you have to be consistent.

Minor hair splitting (in favour of your cleanups): The concept of 
"register allocation state _at_ an instruction" is bogus and confusing.
Register allocation state (the virtual-to-real mapping, and the 
liveness of virtual and real regs)  only exists _between_ instructions.
(eg, it's confusing to say register X is live at insn Y.  It might be
live immediately before or immediately afterwards, and it's clear what
that means, but "at" is just confusing.)
Whew, I thought I was going nuts trying to read the "after (on the right)" state as post-instruction state. Looking forward to this bug being fixed.

/be
Fixing this requires potentially modifying every single code generation
point in all five back-ends.  This is a large and difficult task so I want
to do it in pieces, which will be filed in bugs that block this one.  Doing
it in pieces will require some temporary scaffolding code, which isn't
ideal, but trying to do it big-bang feels like a terrible idea.
Depends on: 516347
Hmm, this is a subtle one.  I've started doing some work to fix things up as comment 0 suggests (can be seen in bug 516347).  I've been using a workflow like this (eg. in asm_arith()):

- begin (S0==post-state)
- do some state manipulations to determine where regs are, etc (S0 -> S1)
- codegen
- more state manipulations (S1 -> S2==pre-state)
    
We want to print S0 to the right of the generated code.  But if regstate printing is tied to codegen (as is currently is) then we unavoidably end up printing S1.  And S1 is not the post-state, but rather an intermediate regstate.

Therefore we have to separate regstate printing from codegen, something like this:

- begin (S0==post-state)
- debug_print(S0)
- do some state manipulations to determine where regs are, etc (S0 -> S1)
- codegen
- more state manipulations (S1 -> S2==pre-state)

Consider an example, where the instruction is "x = add y, z" and it is the only use of y and z:

- begin (S0: x in GpRegs, y unused, z unused)
- debug_print(S0)
- do some state manipulations (S1: x in GpRegs, y in GpRegs, z in GpRegs)
- codegen
- more state manipulations (S2: x is unused, y in GpRegs, z in GpRegs)

The debug_print(S0) step will print something like "eax(x)", which is what we want.  Without separating regstate printing from codegen we would get something like "eax(x) ebx(y) ecx(z)", which is correct in a way but doesn't make it clear that ebx(y) and ecx(z) are dead (they will be omitted from the subsequent post-states though).

Another thing I've found when it comes to printing the regstate:  if a single LIns causes multiple asm instructions to be generated, it's best to avoid printing the regstate in the middle, because it'll be wrong and/or confusing, as it'll undoubtedly be in some kind of intermediate state.  The best thing to do is probably only print the regstate between each LIns.  This can be done in gen(), which saves the back-ends from having to add lines equivalent to debug_print() in all the asm_*() functions, which is nice.

Once we've decided to separate debug printing from codegen, the next question is this:  what state should the regstate hold when codegen occurs?  Our answer can be arbitrary, so long as its safe w.r.t. codegen.  In the above workflow I use neither the post-state (S0) nor the pre-state (S2), but rather an intermediate state (S1).  I suggest that this is a good regstate to use... you can see it's some kind of union of S0 and S2, ie. if a register is allocated in either S0 or S2, then it's allocated in S1.  This seems good because it seems safe -- if the codegen has to allocate any new registers, it won't take any that it's not allowed to.  And it's as descriptive as possible of which registers are involved by the codegen.  It might be overly conservative, but probably not in a significant way, because most codegen snippets don't involve the registers.

Anyway, I'll work on a patch that incorporates all of the above and see how it looks.
Another good thing about separating the regstate printing from codegen:  it avoids the need for the temporary boolean pre/post parameter in all the codegen macros.
+1, great idea.
(In reply to comment #4)
> Therefore we have to separate regstate printing from codegen

Bug 512181 does this.
Depends on: 512181
Depends on: 531347
Depends on: 535705
Depends on: 535706
Depends on: 535707
Depends on: 535708
Depends on: 535709
When all the back-ends are done, we'll be able to remove Assembler::prepResultReg() and Assembler::freeRsrcOf().
Duplicate of this bug: 515897
Depends on: 547298
Assignee: nnethercote → general
Edwin, is this still wanted for Nanojit?
Yes, thanks, I changed component to Nanojit.
Assignee: general → nobody
Component: JavaScript Engine → Nanojit
QA Contact: general → nanojit
Product: Core → Core Graveyard
Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276).

I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.