Closed
Bug 445568
Opened 16 years ago
Closed 3 years ago
Consider aligning opcode cases on 4/8/16-byte boundaries
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
INVALID
People
(Reporter: dmandelin, Unassigned)
Details
While working on call-threaded dispatch, I accidentally observed that adding NOPs, and even additional functional instructions, could speed things up by 3-7%. I think it might have been from alignment changes, although I really don't know. (I suspect some of it is from increased path lengths between control flow ops.)

I do know that Intel recommends aligning branch targets on 16-byte boundaries, because then the first fetch from the target brings in a full 16 bytes of useful instructions. (Conversely, if the branch target is 15 bytes past a 16-byte boundary, the first fetch will bring in only 1 byte that will be executed and 15 "wasted" bytes.) Regular SM might be able to benefit from this.

With GCC, you can use asm(".align N") to align on 2^N-byte boundaries on targets where .align takes a power of two; asm(".p2align N") is the unambiguous GNU as spelling for 2^N bytes. In my experiments, I got the best results with .align 3 (8-byte boundaries), but my benchmark is very small and I am currently using an unreliable experimental system. Presumably it would be easier to get definitive measurements with trunk SM.
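The fetch arithmetic in the description can be sketched in a few lines of C. This is only an illustration of the reasoning, not code from SpiderMonkey: the function names and the FETCH_BLOCK constant are hypothetical, and the 16-byte fetch size is taken from the report as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed instruction fetch size in bytes (the 16-byte figure from the report). */
#define FETCH_BLOCK 16u

/* Bytes of the first fetch that lie at or after the branch target,
 * i.e. bytes that can actually be executed. */
unsigned useful_bytes_in_first_fetch(uintptr_t target)
{
    return FETCH_BLOCK - (unsigned)(target % FETCH_BLOCK);
}

/* Padding bytes an assembler would have to emit to round `addr` up to
 * a 2^log2_align-byte boundary (what a power-of-two align directive does). */
unsigned padding_for_alignment(uintptr_t addr, unsigned log2_align)
{
    assert(log2_align < 16);
    uintptr_t mask = ((uintptr_t)1 << log2_align) - 1;
    return (unsigned)(((uintptr_t)0 - addr) & mask);
}
```

For a target 15 bytes past a 16-byte boundary, useful_bytes_in_first_fetch returns 1, which matches the "1 useful byte, 15 wasted" case above. padding_for_alignment shows the cost side: 8-byte alignment never needs more than 7 bytes of padding per opcode case, while 16-byte alignment can need up to 15.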
Comment 1•16 years ago
Really want an owner for this (thanks, dmandelin, for filing it) -- Igor, are you around this week? Cc'ing others who might be able to help. /be
Updated•10 years ago
Assignee: general → nobody
Comment 2•3 years ago
Old bug, no longer valid.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INVALID