nanojit: make iterating over all active registers faster


Status

RESOLVED WONTFIX
9 years ago
5 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

9 years ago
Nanojit frequently iterates over all active registers.  This accounts for a non-negligible fraction of compile time.  (And see bug 524890 for a surprising consequence of this.)  The code always looks something like this:

  for (Register r = FirstReg; r <= LastReg; r = nextreg(r)) {
      if (LIns* ins = _allocator.getActive(r)) { ... }
  }

getActive(r) is just an array lookup, i.e. _allocator.active[r].

A lot of the time only a small fraction of the registers are active, and we don't need to do random accesses in active[] very much.  This points towards changing active[] into a list of (Register, LIns*) pairs.

I have started experimenting with this, but it turns out to be trickier than I expected: often when we are iterating over the registers we are also doing things like evicting them, i.e. we are effectively modifying the list as we iterate over it.  Also, intersectRegisterState() and unionRegisterState() don't follow the simple pattern above.  I need to do some more work on it.
(Assignee)

Comment 1

9 years ago
Created attachment 418292 [details] [diff] [review]
very incomplete patch
(Assignee)

Updated

9 years ago
Blocks: 524890

Comment 2

9 years ago
If we have a bitmask of the registers we want to iterate over, we could use a variation on nRegisterAllocFromSet() which uses the CPU's find-first-set instruction.

Currently, RegAlloc.free is the only mask we maintain, but if we have a "managed" mask, then we can always get active = (~free)&managed, or vice-versa:  free = (~active)&managed.  (do we need "active" or "free" more often?)

call site                iteration set
---------                -------------
ReleaseRegisters()       active
evictScratchRegs()       active & GpRegs
evictAllActiveRegs()     active
evictSomeActiveRegs()    active & regs
intersectRegisterState() cur.active | saved.active  (iterate hi->lo)
unionRegisterState()     cur.active | saved.active  (iterate hi->lo)
assignSaved()            saved.active & ~skip
findVictim()             allow & active

If we are careful, modifying the list as we iterate should be a matter of zeroing out the 1 bits in the mask as we visit them.

           lo->hi       hi->lo
           ------       ------
intel      bsf          bsr
ppc        cntlzw
arm        clz
sparc      ?

If the CPU doesn't have a reversed bitscan instruction we're sort of stuck.  We could use a 256-entry lookup table with 4 loads and stores to build the reversed mask, or just do the slow scan.  Even skipping ahead one register at a time via an index and a bitmask should be marginally better than loading from the active[] array each time.
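As a sketch of the proposed mask-driven iteration (assumed shapes and names, not the actual nanojit API): lo->hi traversal repeatedly isolates the lowest set bit and clears it, which is exactly the "zero out the 1 bits as we visit them" behavior described above. Real code would use bsf/ctz intrinsics (e.g. GCC's __builtin_ctz) where available; the loop here is the portable fallback.

```cpp
#include <cassert>
#include <cstdint>

using RegisterMask = uint32_t;

// Portable find-first-set.  Precondition: m != 0.
inline int lowestSetBit(RegisterMask m) {
    int r = 0;
    while ((m & 1) == 0) { m >>= 1; ++r; }
    return r;
}

// Visit each set bit of 'active' from lowest to highest, clearing
// each bit after it is visited.
template <typename Visit>
void forEachReg(RegisterMask active, Visit visit) {
    while (active) {
        int r = lowestSetBit(active);
        active &= active - 1;  // clear the bit we just visited
        visit(r);
    }
}
```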

Comment 3

9 years ago
(In reply to comment #2)
> If the CPU doesn't have a reversed bitscan instruction we're sort of stuck.

See http://graphics.stanford.edu/~seander/bithacks.html for various approaches.
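One of the bithacks-style fallbacks, for illustration: smear the highest set bit downward with shift-or folding, then use the classic parallel popcount. This is the generic technique from that page, not code from any patch here.

```cpp
#include <cassert>
#include <cstdint>

// Classic parallel bit-count (Hacker's Delight / bithacks).
inline int popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    return int((v * 0x01010101u) >> 24);
}

// Index of the highest set bit.  Precondition: m != 0.
inline int highestSetBit(uint32_t m) {
    m |= m >> 1;  m |= m >> 2;  m |= m >> 4;
    m |= m >> 8;  m |= m >> 16;      // m is now all 1s up to the top bit
    return popcount32(m) - 1;
}
```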
(Assignee)

Comment 4

9 years ago
'free' is a bitmask but 'active' is an array of LIns*.  So I think bit scanning is a red herring.

I've made some progress on changing 'active' into a hybrid array/list structure. I've augmented each entry with a 'next' pointer so a list can be threaded through the active entries -- addActive() and removeActive() are then responsible for updating the pointers appropriately and so are marginally slower.  It's still a register-indexed array so random accesses are still O(1), but you can traverse the active registers in O(n) time where n is the number of active registers.

I should make more progress with it on Monday.
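A rough sketch of the hybrid structure described above (field and function names are guesses, not the actual patch): the register-indexed array keeps getActive() at O(1), while a 'next' index threaded through the live entries lets traversal skip inactive registers entirely.

```cpp
#include <cassert>
#include <cstddef>

struct LIns {};            // stand-in for nanojit's LIns
const int NumRegs = 8;     // illustrative; real targets have more
const int NoReg   = -1;

struct RegAlloc {
    struct Entry { LIns* ins; int next; };
    Entry entries[NumRegs];   // register-indexed, so random access stays O(1)
    int head = NoReg;         // first active register, or NoReg

    RegAlloc() { for (auto& e : entries) e = { nullptr, NoReg }; }

    LIns* getActive(int r) const { return entries[r].ins; }

    void addActive(int r, LIns* ins) {
        entries[r] = { ins, head };   // link onto the front of the active list
        head = r;
    }

    // Unlink r.  Precondition: r is active.  With a singly linked list this
    // is O(n) in the number of active registers (a prev link would be O(1)),
    // which is the "marginally slower" cost mentioned above.
    void removeActive(int r) {
        int* link = &head;
        while (*link != r) link = &entries[*link].next;
        *link = entries[r].next;
        entries[r] = { nullptr, NoReg };
    }

    // O(n) in the number of *active* registers, not the total register count.
    template <typename Visit>
    void forEachActive(Visit visit) const {
        for (int r = head; r != NoReg; r = entries[r].next)
            visit(r, entries[r].ins);
    }
};
```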
(Assignee)

Updated

9 years ago
Depends on: 536098

Comment 5

9 years ago
I didn't say it explicitly in the bit-scanning proposal, but the gist of the idea is to give RegAlloc a new bitmask, so that we can indeed do bit scanning on (masks of) the active array.
(Assignee)

Comment 6

9 years ago
I just marked bug 524890 as WONTFIX.  The likelihood of this bug making a noticeable difference seems smaller now, and what hacking I've done with the hybrid array/list structure has made things a lot more complicated.  It doesn't seem worth the effort.

I'm not planning to do any more here, so I'm marking this as WONTFIX as well.
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(Assignee)

Updated

9 years ago
Resolution: FIXED → WONTFIX
Component: Nanojit → Nanojit
Product: Core → Core Graveyard