Closed Bug 535681 Opened 15 years ago Closed 15 years ago

nanojit: make iterating over all active registers faster

Categories

(Core Graveyard :: Nanojit, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file)

Nanojit frequently iterates over all active registers.  This accounts for a non-negligible fraction of compile time.  (And see bug 524890 for a surprising consequence of this.)  The code always looks something like this:

  for (Register r = FirstReg; r <= LastReg; r = nextreg(r)) {
      if (LIns* ins = _allocator.getActive(r)) { ... }
  }

getActive(r) is just an array lookup, ie. _allocator.active[r].

A lot of the time only a small fraction of the registers are active, and we don't need to do random accesses in active[] very much.  This points towards changing active[] into a list of (Register, LIns*) pairs.

I have started experimenting with this but it turns out to be trickier than I expected because often when we are iterating over the registers we are also doing things like evicting them, ie. we are effectively modifying the list as we iterate over it.  Also, intersectRegisterState() and unionRegisterState() don't follow the simple pattern above.  I need to do some more work on it.
Blocks: 524890
If we have a bitmask of the registers we want to iterate over, we could use a variation on nRegisterAllocFromSet() which uses CPU's bit-first-set instruction.

Currently, RegAlloc.free is the only mask we maintain, but if we have a "managed" mask, then we can always get active = (~free)&managed, or vice-versa:  free = (~active)&managed.  (do we need "active" or "free" more often?)

call site                iteration set
---------                -------------
ReleaseRegisters()       active
evictScratchRegs()       active & GpRegs
evictAllActiveRegs()     active
evictSomeActiveRegs()    active & regs
intersectRegState()      cur.active | saved.active  (iterate hi->lo)
unionRegisterState()     cur.active | saved.active  (iterate hi->lo)
assignSaved()            saved.active & ~skip
findVictim()             allow & active

if we are careful, modifying the list as we iterate should be a matter of zeroing out the 1 bits in the mask as we visit them.

           lo->hi       hi->lo
           ------       ------
intel      bsf          bsr
ppc        cntzlw
arm        clz
sparc      ?

if the cpu doesn't have a reversed bitscan instruction we're sort of stuck.  We could use a 256-entry lookup table with 4 loads and stores to build the reversed mask or just do the slow scan.  even skipping ahead one register at a time via an index and a bitmask should be marginally better than loading from the active[] array each time.
(In reply to comment #2)
> if the cpu doesn't have a reversed bitscan instruction we're sort of stuck.

see http://graphics.stanford.edu/~seander/bithacks.html for various approaches.
'free' is a bitmask but 'active' is an array of LIns*.  So I think bit scanning is a red herring.

I've made some progress on changing 'active' into a hybrid array/list structure. I've augmented each entry with a 'next' pointer so a list can be threaded through the active entries -- addActive() and removeActive() are then responsible for updating the pointers appropriately and so are marginally slower.  It's still a register-indexed array so random accesses are still O(1), but you can traverse the active registers in O(n) time where n is the number of active registers.

I should make more progress with it on Monday.
Depends on: 536098
I didn't say it in the bit-scanning proposal but the gist of the idea is to have RegAlloc have a new bitmask, so that we indeed can do bit scanning on (masks of) the active array.
I just marked bug 524890 as WONTFIX.  The likelihood of this bug making a noticeable difference seems smaller now, and what hacking I've done with the hybrid array/list structure has made things a lot more complicated.  It doesn't seem worth the effort.

I'm not planning to do any more here, so I'm marking this as WONTFIX as well.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Resolution: FIXED → WONTFIX
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: