nanojit: make iterating over all active registers faster


Status

RESOLVED WONTFIX
9 years ago
5 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

9 years ago
Nanojit frequently iterates over all active registers.  This accounts for a non-negligible fraction of compile time.  (And see bug 524890 for a surprising consequence of this.)  The code always looks something like this:

  for (Register r = FirstReg; r <= LastReg; r = nextreg(r)) {
      if (LIns* ins = _allocator.getActive(r)) { ... }
  }

getActive(r) is just an array lookup, i.e. _allocator.active[r].

A lot of the time only a small fraction of the registers are active, and we don't need to do random accesses in active[] very much.  This points towards changing active[] into a list of (Register, LIns*) pairs.

I have started experimenting with this, but it turns out to be trickier than I expected: often when we are iterating over the registers we are also doing things like evicting them, i.e. we are effectively modifying the list as we iterate over it.  Also, intersectRegisterState() and unionRegisterState() don't follow the simple pattern above.  I need to do some more work on it.
(Assignee)

Comment 1

9 years ago
Created attachment 418292 [details] [diff] [review]
very incomplete patch
(Assignee)

Updated

9 years ago
Blocks: 524890

Comment 2

9 years ago
If we have a bitmask of the registers we want to iterate over, we could use a variation on nRegisterAllocFromSet() which uses the CPU's find-first-set instruction.

Currently, RegAlloc.free is the only mask we maintain, but if we have a "managed" mask, then we can always get active = (~free)&managed, or vice-versa:  free = (~active)&managed.  (do we need "active" or "free" more often?)

call site                iteration set
---------                -------------
ReleaseRegisters()       active
evictScratchRegs()       active & GpRegs
evictAllActiveRegs()     active
evictSomeActiveRegs()    active & regs
intersectRegisterState() cur.active | saved.active  (iterate hi->lo)
unionRegisterState()     cur.active | saved.active  (iterate hi->lo)
assignSaved()            saved.active & ~skip
findVictim()             allow & active

If we are careful, modifying the list as we iterate should be a matter of zeroing out the 1 bits in the mask as we visit them.

           lo->hi       hi->lo
           ------       ------
intel      bsf          bsr
ppc        cntlzw
arm        clz
sparc      ?

If the CPU doesn't have a reversed bitscan instruction we're sort of stuck.  We could use a 256-entry lookup table with 4 loads and stores to build the reversed mask, or just do the slow scan.  Even skipping ahead one register at a time via an index and a bitmask should be marginally better than loading from the active[] array each time.
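As a sketch of the proposed mask-driven iteration (assumed shapes and names, not the actual nanojit API): lo->hi traversal repeatedly isolates the lowest set bit and clears it, which is exactly the "zero out the 1 bits as we visit them" behavior described above. Real code would use bsf/ctz intrinsics (e.g. GCC's __builtin_ctz) where available; the loop here is the portable fallback.

```cpp
#include <cassert>
#include <cstdint>

using RegisterMask = uint32_t;

// Portable find-first-set.  Precondition: m != 0.
inline int lowestSetBit(RegisterMask m) {
    int r = 0;
    while ((m & 1) == 0) { m >>= 1; ++r; }
    return r;
}

// Visit each set bit of 'active' from lowest to highest, clearing
// each bit after it is visited.
template <typename Visit>
void forEachReg(RegisterMask active, Visit visit) {
    while (active) {
        int r = lowestSetBit(active);
        active &= active - 1;  // clear the bit we just visited
        visit(r);
    }
}
```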

Comment 3

9 years ago
(In reply to comment #2)
> If the CPU doesn't have a reversed bitscan instruction we're sort of stuck.

See http://graphics.stanford.edu/~seander/bithacks.html for various approaches.
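One of the bithacks-style fallbacks, for illustration: smear the highest set bit downward with shift-or folding, then use the classic parallel popcount. This is the generic technique from that page, not code from any patch here.

```cpp
#include <cassert>
#include <cstdint>

// Classic parallel bit-count (Hacker's Delight / bithacks).
inline int popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    return int((v * 0x01010101u) >> 24);
}

// Index of the highest set bit.  Precondition: m != 0.
inline int highestSetBit(uint32_t m) {
    m |= m >> 1;  m |= m >> 2;  m |= m >> 4;
    m |= m >> 8;  m |= m >> 16;      // m is now all 1s up to the top bit
    return popcount32(m) - 1;
}
```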
(Assignee)

Comment 4

9 years ago
'free' is a bitmask but 'active' is an array of LIns*.  So I think bit scanning is a red herring.

I've made some progress on changing 'active' into a hybrid array/list structure. I've augmented each entry with a 'next' pointer so a list can be threaded through the active entries -- addActive() and removeActive() are then responsible for updating the pointers appropriately and so are marginally slower.  It's still a register-indexed array so random accesses are still O(1), but you can traverse the active registers in O(n) time where n is the number of active registers.

I should make more progress with it on Monday.
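A rough sketch of the hybrid structure described above (field and function names are guesses, not the actual patch): the register-indexed array keeps getActive() at O(1), while a 'next' index threaded through the live entries lets traversal skip inactive registers entirely.

```cpp
#include <cassert>
#include <cstddef>

struct LIns {};            // stand-in for nanojit's LIns
const int NumRegs = 8;     // illustrative; real targets have more
const int NoReg   = -1;

struct RegAlloc {
    struct Entry { LIns* ins; int next; };
    Entry entries[NumRegs];   // register-indexed, so random access stays O(1)
    int head = NoReg;         // first active register, or NoReg

    RegAlloc() { for (auto& e : entries) e = { nullptr, NoReg }; }

    LIns* getActive(int r) const { return entries[r].ins; }

    void addActive(int r, LIns* ins) {
        entries[r] = { ins, head };   // link onto the front of the active list
        head = r;
    }

    // Unlink r.  Precondition: r is active.  With a singly linked list this
    // is O(n) in the number of active registers (a prev link would be O(1)),
    // which is the "marginally slower" cost mentioned above.
    void removeActive(int r) {
        int* link = &head;
        while (*link != r) link = &entries[*link].next;
        *link = entries[r].next;
        entries[r] = { nullptr, NoReg };
    }

    // O(n) in the number of *active* registers, not the total register count.
    template <typename Visit>
    void forEachActive(Visit visit) const {
        for (int r = head; r != NoReg; r = entries[r].next)
            visit(r, entries[r].ins);
    }
};
```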
(Assignee)

Updated

9 years ago
Depends on: 536098

Comment 5

9 years ago
I didn't say it explicitly in the bit-scanning proposal, but the gist of the idea is to give RegAlloc a new bitmask, so that we can indeed do bit scanning on (masks of) the active array.
(Assignee)

Comment 6

9 years ago
I just marked bug 524890 as WONTFIX.  The likelihood of this bug making a noticeable difference seems smaller now, and what hacking I've done with the hybrid array/list structure has made things a lot more complicated.  It doesn't seem worth the effort.

I'm not planning to do any more here, so I'm marking this as WONTFIX as well.
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(Assignee)

Updated

9 years ago
Resolution: FIXED → WONTFIX
Component: Nanojit → Nanojit
Product: Core → Core Graveyard