
Consider aligning opcode cases on 4/8/16-byte boundaries

Status: NEW
Assignee: Unassigned
Product: Core
Component: JavaScript Engine
Opened: 10 years ago
Last updated: 4 years ago
Reporter: dmandelin
Version: Trunk
Hardware: All
OS: Mac OS X
Firefox Tracking Flags: not tracked

Description (Reporter, 10 years ago)
While working on call-threaded dispatch, I accidentally observed that adding NOPs, and even additional functional instructions, could speed things up by 3-7%. I think this might be due to alignment changes, although I really don't know. (I suspect some of it comes from increased path lengths between control-flow ops.) I do know that Intel recommends aligning branch targets on 16-byte boundaries, because then the first fetch from the target delivers a full 16-byte block of useful instruction bytes. (Conversely, if the branch target is 15 bytes past a 16-byte boundary, the first fetch delivers only 1 byte that will be executed and 15 "wasted" bytes.)
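
To make the fetch arithmetic above concrete, here is a minimal sketch in C. It assumes instruction fetch works in aligned blocks of a power-of-two size (16 bytes in the description above); the function names are hypothetical and only illustrate the calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes in the first fetch block that lie at or after the
 * branch target. Assumes fetch blocks are aligned to fetch_size and
 * that fetch_size is a power of two (an assumption of this sketch). */
unsigned useful_bytes_in_first_fetch(uintptr_t target, unsigned fetch_size)
{
    assert(fetch_size != 0 && (fetch_size & (fetch_size - 1)) == 0);
    return fetch_size - (unsigned)(target & (fetch_size - 1));
}

/* Number of padding bytes needed in front of a target so that it
 * lands on the next align-byte boundary (0 if already aligned). */
unsigned padding_to_boundary(uintptr_t target, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (unsigned)((align - (target & (align - 1))) & (align - 1));
}
```

For a target 15 bytes past a 16-byte boundary, the first function returns 1 and the second returns 1: a single byte of padding would move the target onto the next boundary.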

Regular SM might be able to benefit from this. On GCC, you can use

  asm(".align N")

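to align on 2^N-byte boundaries. (That is the meaning of .align in the Mac OS X assembler; on x86 ELF targets GNU as treats the .align argument as a byte count, while .p2align N means 2^N bytes on every target.) In my experiments, I got the best results with .align 3 (8-byte boundaries), but my benchmark is very small and I am currently working on an unreliable experimental system. Presumably it would be easier to get definitive measurements with trunk SM.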
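
As an illustration only, here is a sketch of what this could look like in a switch-based interpreter loop. The toy opcodes and the ALIGN_HERE macro are hypothetical and are not SpiderMonkey code. The sketch uses .p2align 3 rather than .align 3 so that it means 8 bytes on any GNU-style assembler. Note the caveats: the directive pads at the point where the compiler emits the asm statement, which is after the case label, so it aligns the code that follows rather than the jump target itself, and the compiler remains free to rearrange the surrounding code. Whether it helps has to be measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Ask the assembler to pad to an 8-byte (2^3) boundary at this point. */
#define ALIGN_HERE() __asm__ volatile(".p2align 3")

int run(const uint8_t *pc)
{
    int stack[16];
    int sp = 0;

    for (;;) {
        switch (*pc++) {
        case OP_PUSH:              /* push the next byte as an operand */
            ALIGN_HERE();
            assert(sp < 16);
            stack[sp++] = *pc++;
            break;
        case OP_ADD:               /* pop two values, push their sum */
            ALIGN_HERE();
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:              /* return the top of the stack */
            ALIGN_HERE();
            assert(sp >= 1);
            return stack[sp - 1];
        default:                   /* unknown opcode */
            return -1;
        }
    }
}
```

A less intrusive way to experiment is GCC's -falign-labels=N and -falign-jumps=N options, which request alignment without touching the source.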
Really want an owner for this (thanks, dmandelin, for filing it) -- Igor, are you around this week? Cc'ing others who might be able to help.

/be

Updated 4 years ago (by Assignee)
Assignee: general → nobody