Closed Bug 445568 Opened 16 years ago Closed 3 years ago

Consider aligning opcode cases on 4/8/16-byte boundaries

Categories

(Core :: JavaScript Engine, defect)

All
macOS
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: dmandelin, Unassigned)

Details

While working on call-threaded dispatch, I accidentally observed that adding NOPs, and even additional functional instructions, could speed things up by 3-7%. I think the speedup might come from alignment changes, although I really don't know. (I suspect some of it comes from increased path lengths between control flow ops.) I do know that Intel recommends aligning branch targets on 16-byte boundaries, because then the first fetch after the branch brings in a full 16 bytes of instructions starting at the target. (Conversely, if the branch target is 15 bytes past a 16-byte boundary, the first fetch brings in only 1 byte that will be executed and 15 "wasted" bytes.)
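The fetch arithmetic above can be sketched in a few lines of C. This is only an illustration, assuming the front end fetches instructions in aligned blocks of a power-of-two size (16 bytes in the example above); the function names are hypothetical and are not taken from SpiderMonkey.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of the first fetch block that lie at or after the branch target.
 * Assumption: instructions are fetched in aligned blocks of block_bytes,
 * and block_bytes is a power of two (16 in the example above). */
unsigned useful_bytes_in_first_fetch(uintptr_t target, unsigned block_bytes)
{
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
    unsigned offset = (unsigned)(target & (uintptr_t)(block_bytes - 1));
    return block_bytes - offset;
}

/* Bytes of the first fetch block that precede the target and are
 * therefore fetched but never executed on this path. */
unsigned wasted_bytes_in_first_fetch(uintptr_t target, unsigned block_bytes)
{
    return block_bytes - useful_bytes_in_first_fetch(target, block_bytes);
}
```

For a target on a 16-byte boundary the first function returns 16 and the second 0; for a target 15 bytes past a boundary they return 1 and 15, matching the worst case described above.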

Regular SpiderMonkey (SM) might be able to benefit from this. With GCC, you can use

  asm(".align N")

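to align on 2^N-byte boundaries. (This holds on targets where the assembler treats the .align argument as a power of two, as on macOS; on some other targets, such as x86 ELF, the argument is a byte count, and .p2align is the directive that always takes a power of two.) In my experiments, I got the best results with .align 3 (8-byte boundaries), but my benchmark is very small and I am currently using an unreliable experimental system. Presumably it would be easier to get definitive measurements with trunk SM.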
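A minimal sketch of what this could look like in a switch-based interpreter follows. It is a toy stack machine, not SpiderMonkey's interpreter: the opcodes, the run function, and the ALIGN_OPCODE_CASE macro are all hypothetical. It uses .p2align 3 rather than .align 3 so that the argument means 2^3 = 8 bytes on every target, and it emits the directive only when building with a GCC-compatible compiler for x86. Because the directive sits just after each case label, the padding NOPs are executed on entry, and the compiler remains free to reorder the cases, so this does not guarantee aligned branch targets. GCC's -falign-labels and -falign-jumps options are another way to request alignment without touching the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pad the instruction stream to an 8-byte boundary.
 * Only emitted for GCC-compatible compilers targeting x86; a no-op elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define ALIGN_OPCODE_CASE() __asm__ volatile(".p2align 3")
#else
#define ALIGN_OPCODE_CASE() ((void)0)
#endif

/* Toy opcode set for illustration only. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs a tiny stack-machine program and returns the top of the stack
 * (0 if the stack is empty, -1 on an unknown opcode). */
int run(const int *code, size_t len)
{
    int stack[16];
    size_t sp = 0;
    size_t pc = 0;

    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:
            ALIGN_OPCODE_CASE();
            assert(sp < 16 && pc < len);
            stack[sp++] = code[pc++];   /* operand follows the opcode */
            break;
        case OP_ADD:
            ALIGN_OPCODE_CASE();
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            ALIGN_OPCODE_CASE();
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            ALIGN_OPCODE_CASE();
            return sp ? stack[sp - 1] : 0;
        default:
            return -1;
        }
    }
    return sp ? stack[sp - 1] : 0;
}
```

Whether any given alignment helps is an empirical question; the same program can be timed with the macro defined as .p2align 2, 3, or 4, or as a no-op, to compare.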
Really want an owner for this (thanks, dmandelin, for filing it) -- Igor, are you around this week? Cc'ing others who might be able to help.

/be
Assignee: general → nobody

Old bug, no longer valid.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID