513615 - nanojit: premature register allocation state updates

Reporter

Description

•

15 years ago

During register allocation, Nanojit records the current state of registers and the stack -- which registers are free, active, etc.  Unfortunately, it pervasively updates this state prematurely in a way that can cause subtle bugs.
Consider a debug output line like this:

   mov ecx,8(eax)         eax(state) esi(sp)

The register state printed is that just before the 'mov' is executed
(which I'll call the "pre-state").  The other possibility would be to
print the post-state, in which case there'd be an "ecx(something)"
entry as well.  (Note that we're using "mov dst,src" syntax.)  The
choice between pre-state and post-state looks arbitrary, but either
way you have to be consistent.

Now, the way this gets printed is that when asm_output() is called (as
it is from all the codegen macros like LD, ALU, etc) it prints the
generated native instruction, then follows it with the current register
state in Assembler::_allocator.  So, if we are printing the pre-state, we
need to make sure _allocator contains the pre-state.

However, we are also generating the code, and for that we need the post-state!
In other words, we've prematurely updated to the pre-state.  freeRsrcOf() is a prime example of this, it calls _allocator.retire(r) prematurely.

In most cases the code generation is very simple and doesn't involve looking at the register state, so the premature update doesn't matter.  But this isn't guaranteed.  We've already seen at least one bug caused by this (bug 495239, the solution to which fixed the symptoms but not this underlying cause).  Ed says he's also seen some other bugs that were probably caused by this.  It also causes the debug output to be wrong in some cases, such as this one:

     js_NewEmptyArray1 = call #js_NewEmptyArray ( cx JSVAL_TO_OBJECT(pval) )
              mov edx,166076544      ebx(ld21) esi(sp) edi(ld22)
              mov ecx,-8(ebp)        ebx(ld21) esi(sp) edi(ld22)
              call js_NewEmptyArray  ebx(ld21) esi(sp) edi(ld22)
              mov edx,-12(ebp)       ebx(ld21) esi(sp) edi(ld22)  <= restore $var21
              mov ecx,-8(ebp)        edx($var21) ebx(ld21) esi(sp) edi(ld22)  <= restore cx
              mov -8(ebp),eax        eax(js_NewEmptyArray1) ecx(cx) edx($var21) ebx(ld21) esi(sp) edi(ld22)

(Think forwards now, ie. the order the code is executed in.  Also note
that the register state shown on a line applies before the asm
instruction on that same line executes.)

After the 'call', %eax is defined and holds js_NewEmptyArray1.
However, we don't see that until the last line.  The movs on the 2nd
and 3rd last lines don't show that because they are generated by
evictScratchRegs() after the register state for eax
has been prematurely cleared.  (Come to think of it, the state isn't
shown properly for ecx and edx after the first two movs, but I think
that's a separate issue.)

If we switched to printing the post-state I think that would avoid a
lot of these problems.  (I also think it would be more readable, as I
find the current choice of printing the pre-state _after_ the
instruction to be confusing.)  Unfortunately, doing that switch is
difficult as it requires changing every single place where code is
generated.

So I have a plan to do it gradually by parameterising all
the macros like LD with a bool that indicates if the _allocator state
is the pre- or post-state, and have asm_output() act accordingly, eg.
by adding a "pre:" or "post:" prefix.  Then I can change one
asm_*() function at a time from pre-state to post-state.  Once they
are all post-states, I can remove the bool parameter again.  I plan to do this
over a number of patches, I think trying to do it in a single big-bang patch
is a bad idea -- too hard to get right and too hard to review.

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Blocks: 513616

Julian Seward [:jseward]

Comment 1

•

15 years ago

(In reply to comment #0)
> entry as well.  (Note that we're using "mov dst,src" syntax.)  The
> choice between pre-state and post-state looks arbitrary, but either
> way you have to be consistent.

Minor hair splitting (in favour of your cleanups): The concept of 
"register allocation state _at_ an instruction" is bogus and confusing.
Register allocation state (the virtual-to-real mapping, and the 
liveness of virtual and real regs)  only exists _between_ instructions.
(eg, it's confusing to say register X is live at insn Y.  It might be
live immediately before or immediately afterwards, and it's clear what
that means, but "at" is just confusing.)

Brendan Eich [:brendan]

Comment 2

•

15 years ago

Whew, I thought I was going nuts trying to read the "after (on the right)" state as post-instruction state. Looking forward to this bug being fixed.

/be

Nicholas Nethercote [inactive]

Reporter

Comment 3

•

15 years ago

Fixing this requires potentially modifying every single code generation
point in all five back-ends.  This is a large and difficult task so I want
to do it in pieces, which will be filed in bugs that block this one.  Doing
it in pieces will require some temporary scaffolding code, which isn't
ideal, but trying to do it big-bang feels like a terrible idea.

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 516347

Nicholas Nethercote [inactive]

Reporter

Comment 4

•

15 years ago

Hmm, this is a subtle one.  I've started doing some work to fix things up as comment 0 suggests (can be seen in bug 516347).  I've been using a workflow like this (eg. in asm_arith()):

- begin (S0==post-state)
- do some state manipulations to determine where regs are, etc (S0 -> S1)
- codegen
- more state manipulations (S1 -> S2==pre-state)
    
We want to print S0 to the right of the generated code.  But if regstate printing is tied to codegen (as is currently is) then we unavoidably end up printing S1.  And S1 is not the post-state, but rather an intermediate regstate.

Therefore we have to separate regstate printing from codegen, something like this:

- begin (S0==post-state)
- debug_print(S0)
- do some state manipulations to determine where regs are, etc (S0 -> S1)
- codegen
- more state manipulations (S1 -> S2==pre-state)

Consider an example, where the instruction is "x = add y, z" and it is the only use of y and z:

- begin (S0: x in GpRegs, y unused, z unused)
- debug_print(S0)
- do some state manipulations (S1: x in GpRegs, y in GpRegs, z in GpRegs)
- codegen
- more state manipulations (S2: x is unused, y in GpRegs, z in GpRegs)

The debug_print(S0) step will print something like "eax(x)", which is what we want.  Without separating regstate printing from codegen we would get something like "eax(x) ebx(y) ecx(z)", which is correct in a way but doesn't make it clear that ebx(y) and ecx(z) are dead (they will be omitted from the subsequent post-states though).

Another thing I've found when it comes to printing the regstate:  if a single LIns causes multiple asm instructions to be generated, it's best to avoid printing the regstate in the middle, because it'll be wrong and/or confusing, as it'll undoubtedly be in some kind of intermediate state.  The best thing to do is probably only print the regstate between each LIns.  This can be done in gen(), which saves the back-ends from having to add lines equivalent to debug_print() in all the asm_*() functions, which is nice.

Once we've decided to separate debug printing from codegen, the next question is this:  what state should the regstate hold when codegen occurs?  Our answer can be arbitrary, so long as its safe w.r.t. codegen.  In the above workflow I use neither the post-state (S0) nor the pre-state (S2), but rather an intermediate state (S1).  I suggest that this is a good regstate to use... you can see it's some kind of union of S0 and S2, ie. if a register is allocated in either S0 or S2, then it's allocated in S1.  This seems good because it seems safe -- if the codegen has to allocate any new registers, it won't take any that it's not allowed to.  And it's as descriptive as possible of which registers are involved by the codegen.  It might be overly conservative, but probably not in a significant way, because most codegen snippets don't involve the registers.

Anyway, I'll work on a patch that incorporates all of the above and see how it looks.

Nicholas Nethercote [inactive]

Reporter

Comment 5

•

15 years ago

Another good thing about separating the regstate printing from codegen:  it avoids the need for the temporary boolean pre/post parameter in all the codegen macros.

Edwin Smith

Comment 6

•

15 years ago

+1, great idea.

Nicholas Nethercote [inactive]

Reporter

Comment 7

•

15 years ago

(In reply to comment #4)
> Therefore we have to separate regstate printing from codegen

Bug 512181 does this.

Depends on: 512181

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 531347

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 535705

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 535706

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 535707

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 535708

Nicholas Nethercote [inactive]

Reporter

Updated

•

15 years ago

Depends on: 535709

Nicholas Nethercote [inactive]

Reporter

Comment 8

•

15 years ago

When all the back-ends are done, we'll be able to remove Assembler::prepResultReg() and Assembler::freeRsrcOf().

Nicholas Nethercote [inactive]

Reporter

Updated

•

14 years ago

Depends on: 547298

Nicholas Nethercote [inactive]

Reporter

Updated

•

13 years ago

Assignee: nnethercote → general

Ryan VanderMeulen [:RyanVM]

Comment 10

•

13 years ago

Edwin, is this still wanted for Nanojit?

Edwin Smith

Comment 11

•

13 years ago

Yes, thanks, I changed component to Nanojit.

Assignee: general → nobody

Component: JavaScript Engine → Nanojit

QA Contact: general → nanojit

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Product: Core → Core Graveyard

Nicholas Nethercote [inactive]

Reporter

Comment 12

•

10 years ago

Nanojit has been dead for several years. Its Bugzilla component has been moved to the graveyard (bug 984276).

I checked all the open bugs. They're all uninteresting, so I'm WONTFIXing them all. Apologies for the bugspam.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → WONTFIX

Bugzilla

Quick Search

nanojit: premature register allocation state updates

Categories

(Core Graveyard :: Nanojit, defect)

Tracking

(Not tracked)

People

(Reporter: n.nethercote, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Updated

Updated

Updated

Updated

Updated

Comment 8

Updated

Updated

Comment 10

Comment 11

Updated

Comment 12