Closed Bug 513514 Opened 15 years ago Closed 14 years ago

nanojit: make hint() faster

Categories

(Core Graveyard :: Nanojit, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: gal, Assigned: n.nethercote)

References

Details

(Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(4 files, 3 obsolete files)

It shows up in some profiles clocking in at about 1% total time for aes. The long series of if statements is painful, especially the isCmp part. We probably want to put this in some kind of table since most branches only depend on op.
The if-then-else chains are painful, as you say, but the actions for each alternative are different -- usually an assignment or masking, but sometimes (eg. the LIR_param case) depending on other values.  They also vary across back-ends so doing it via the usual LIRopcode.tbl mechanism wouldn't work.

There's also the issue that some of the back-ends mask the result with regs.free in hint() but not all of them do.  That's easier to make consistent, though.
Here's another suggestion: get rid of hint() altogether.  I turned it off and it barely made a difference;  it may have even caused a 1--2ms speedup.

Why?  On x86, at least, it's because hint() almost never makes any useful contribution.  There are three cases:

- 'prefer' ends up the same as 'allow'.  This can happen because the 'allow' set passed to findRegFor() often already has been chosen carefully and hint() is effectively repeating that choosing.  Eg. for calls we do prepResultReg(ins, rmask(EAX)) and then hint() sees that it's a LIR_call and so suggests rmask(EAX).  Alternatively, the opcode is one that hint() doesn't treat specially.  Either way, I call this the "I second your fine decision!" case.

- In the remaining cases, 'prefer' usually ends up having no overlap with 'free', so we just return 'allow' anyway.  I call this the "I give up!" case.  

- In the remaining cases, 'prefer' overlaps with 'free' and is more specific than 'allow'.  I call this the "I have something useful to add!" case.

For 3d-raytrace (for which bug 516042 tells us hint() accounts for 0.9% of run-time) here are the proportions of the above cases:

- "I second your fine decision!"    65%
- "I give up!"                      33%
- "I have something useful to add!"  2%
Great statistics. Thanks. If you focus on the "I have something useful to add" part, is there any specific op that we contribute something useful for? Maybe there is one or two cases where we do something useful and we can drop the rest (or we just drop them all if there is no easy cheap hints we might want to keep).
Looking some more, the main cause of the "I second your fine decision!" case is
findSpecificRegFor() -- ie. when we've already narrowed 'allow' down to a
single register.  There's no pointing trying to apply hints in that case.

Nb: it's interesting to see that in the NativeX64 backend hint() is a no-op.
Here's the stats for 3d-raytrace.  On each line is:

- the number of occurrences for this pattern
- the opcode
- the value of 'allow' at the end of hint()
- the value of 'prefer' at the end of hint()


1102 eq      allow(eax ecx edx ebx esi edi) prefer(eax ecx edx ebx) 

 346 int     allow(eax edx ebx esi edi) prefer(eax edx) 

 236 icall   allow(eax edx ebx esi edi) prefer(eax) 

 216 icall   allow(eax ecx edx ebx esi edi) prefer(eax) 

 114 iparam  allow(eax ecx edx ebx esi edi) prefer(eax) 

  87 iparam  allow(eax ecx edx esi edi) prefer(eax) 

  68 icall   allow(eax ecx edx ebx edi) prefer(eax) 

  53 fcall   allow(xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 f0) prefer(f0) 

  26 flt     allow(eax ecx edx ebx esi edi) prefer(eax ecx edx ebx) 

  13 int     allow(eax ecx edx esi edi) prefer(eax ecx edx) 

  11 fgt     allow(eax ecx edx ebx esi edi) prefer(eax ecx edx ebx) 

   3 flt     allow(eax ecx edx ebx esi) prefer(eax ecx edx ebx) 

   3 flt     allow(ecx edx ebx esi edi) prefer(ecx edx ebx) 

   1 int     allow(eax ecx edx ebx esi edi) prefer(eax ecx edx) 


In other words, it's spread across all the hint() cases.

I guess these cases mostly correspond to times when findRegFor() is called not on the instruction we're looking at, but one of its operands, and so we haven't already narrowed 'allow' down sensibly?
Can we narrow down when hint is called? findRegForOperand()?
We already have findSpecificRegFor(), but it just calls findRegFor().  If we clone findRegFor() and specialise for the single-register-allowed case we can skip the hint calls.  We can also then specialise registerAlloc() for the single-register-allowed case.

But I just tried all that and didn't get a noticeable speedup.  Perhaps not surprising -- we're looking at minor speedups to a functions that account for less than 1% of SS time.
(In reply to comment #2)

> - 'prefer' ends up the same as 'allow'.  This can happen because the 'allow'
> set passed to findRegFor() often already has been chosen carefully and hint()
> is effectively repeating that choosing.  Eg. for calls we do prepResultReg(ins,
> rmask(EAX)) and then hint() sees that it's a LIR_call and so suggests
> rmask(EAX).  Alternatively, the opcode is one that hint() doesn't treat
> specially.  Either way, I call this the "I second your fine decision!" case.

> - "I second your fine decision!"    65%
> - "I give up!"                      33%
> - "I have something useful to add!"  2%

this is surprising -- since the code is assembled bottom up, the purpose
of hint is to ask for a result to be in the register that the opcode will
have to put it in anyway.  e.g. calls to into EAX.

the hint matters at the point the result is used; usually a result can
be used by any register, and i'd expect hint to matter.

wonder why we're not seeing that?  maybe because the result is often just
passed to another function as a register param (and forced into ecx or edx) thus constraining both "ends" of the lifetime?

re: X64 hint() being a no-op:  lazyness on my part.  but based on this data you're making a good case to just drop it completely.

could you post a patch that adds the instrumentation?  i'd like to run it on some tamarin tests.  if our results aren't much different then dropping it sounds fine.
Attached patch instrumentation patch (obsolete) — Splinter Review
It's very primitive, it just does a different printf for each case.  I pipe the output to a file and then use search-and-replace to count them :)
(In reply to comment #8)
> (In reply to comment #2)
> 
> > - 'prefer' ends up the same as 'allow'.  This can happen because the 'allow'
> > set passed to findRegFor() often already has been chosen carefully and hint()
> > is effectively repeating that choosing.  Eg. for calls we do prepResultReg(ins,
> > rmask(EAX)) and then hint() sees that it's a LIR_call and so suggests
> > rmask(EAX).  Alternatively, the opcode is one that hint() doesn't treat
> > specially.  Either way, I call this the "I second your fine decision!" case.
> 
> this is surprising -- since the code is assembled bottom up, the purpose
> of hint is to ask for a result to be in the register that the opcode will
> have to put it in anyway.  e.g. calls to into EAX.
> 
> the hint matters at the point the result is used; usually a result can
> be used by any register, and i'd expect hint to matter.
> 
> wonder why we're not seeing that?  maybe because the result is often just
> passed to another function as a register param (and forced into ecx or edx)
> thus constraining both "ends" of the lifetime?

I distinguished between the two sub-cases described above.  The case where the opcode is one that hint() doesn't treat specially dominates, accounting for 60% of all calls to hint().  The case where the opcode is handled by hint() but it just gives the same result as 'allow' account for 5% of cases.

In other words:
1. The majority of cases (60%) hint doesn't even try to add anything.
2. Most of the time when it does try (33%) it restricts things too much, giving no overlap with 'free'.
3. Sometimes it tries, doesn't restrict things too much, but the end result is the same as 'allow' (5%)
4. Occasionally it tries, doesn't restrict things too much, and the end result is an improvement (2%)

These numbers are for 3d-raytrace, but I tried several other SS ones and got similar results.
Attached patch better patchSplinter Review
This one prints "case 1", "case 2", "case 3", "case 4" corresponding to the four cases in my previous comment.
Attachment #398237 - Attachment is obsolete: true
(In reply to comment #8)
> could you post a patch that adds the instrumentation?  i'd like to run it on
> some tamarin tests.  if our results aren't much different then dropping it
> sounds fine.

Just remember that the acid test is to simply disable hint() and see if it makes a difference.  I just tried it again on x86 on SunSpider and the timings were unchanged.
yup, agreed.
Actually, the hint hint() gives us is also not particular useful except when we are getting pretty close to the definition. Currently we pass state as param0, and we keep shuffling it into ECX across the entire trace because thats where param0 would want it. We do so while calling a bunch of fastcalls, leading to constant spilling of ECX. I think we could fix both problems here at once with a simple isnear(current, op) in hint with an early out. isnear can be as easy as op in [current, current+10]. If there is a page boundary, then we just miss the hint. That would bypass hinting until we get close to the op and there is a chance that its not going to get spilled, and it also shortcuts consulting the heuristics.
Assignee: general → gal
Summary: TM: turn Assembler::hint() into a table → TM: only consult Assembler::hint() if the live range is reasonably short
Attached patch mildy hackish patch (obsolete) — Splinter Review
This patch eliminates about 90% of the calls to hint based on the distance heuristics. No measurable perf impact.

whale:src gal$ cat /tmp/x | grep "1" | wc -l
  234600
whale:src gal$ cat /tmp/x | grep "2" | wc -l
   24949
Attachment #398520 - Attachment is obsolete: true
Attachment #398526 - Attachment is obsolete: true
If it's relevant for all back ends, it might be better to put the test at the hint() call sites instead of within hint().
I can't test the heuristics on other platforms. I just leave it to other backend authors. We only really care about x86-* and arm.
I just stumbled across this ... any activity of interest?
Whiteboard: has-patch
Grab it if you think it makes sense or lets wont-fix it otherwise. I picked 8 randomly without measurements. The patch was motivated by looking at code that moves something into ECX for a shift op but is so far away that it can't possibly help anything anyway.
Attachment #398527 - Flags: feedback?(edwsmith)
Whiteboard: has-patch → PACMAN, has-patch
Component: JavaScript Engine → Nanojit
OS: Mac OS X → All
Target Milestone: --- → Future
Attachment #398527 - Flags: feedback?(edwsmith)
(In reply to comment #22)
>
> Results on TR x86 with hinting disabled showed no changes outside "noise"

Last time I measured the same was true for TM, but if you look at the generated code it is a bit worse, which makes me reluctant to drop hint().

It's the chain of tests in Nativei386.cpp:hint() that sucks so much.  If we changed it to use a table lookup that could help a lot.  LIR_param is more difficult because the hinted regSet isn't static... maybe there could be a special table value that means "run some extra code to get the regSet".
Yeah.  I think in the end, something like Gal's patch is a great heuristic - we find most of the nearby opportunities at low cost (pointer subtraction) and shrug about the rest.  combined with your suggestions it would speed up compilation.  especially if a sizable % of references are to faraway definitions (stretch).

static counts of R<->R and R<->M moves, and compile times, might be more useful for evaluating this than runtime performance benchmarks.
Adds a hints[] table which can be used to specify preferred register sets
for particular opcodes on a per-backend basis.  PREFER_SPECIAL is an escape
hatch that lets more complicated reasoning be applied, it's used in all the
backends for LIR_paramp.  So for most cases there is now a single
conditional test, whereas on i386 there were many.  This saves a tiny bit on
TM/SunSpider (1.001x fewer instructions).

On X64 I actually implemented some hints and got good results, close to 2%
fewer instructions on trace (about 0.5% overall).  regexp-dna was particular
good, something like 5% fewer instructions.

I haven't done anything with the "isnear" heuristic.  If we're just 
interested in reducing the cost of doing the hints (and not interested in 
changing the generated code, modulo the additions of hints on X64) then this
patch is enough.
Attachment #452996 - Flags: review?(edwsmith)
Comment on attachment 452996 [details] [diff] [review]
patch implementing table-based hinting

jit-time improvements are definitely wanted.

Could the hint table be static-const instead?  looks like it always holds constant values.
Attachment #452996 - Flags: review?(edwsmith) → review+
(In reply to comment #26)
> (From update of attachment 452996 [details] [diff] [review])
> jit-time improvements are definitely wanted.
> 
> Could the hint table be static-const instead?  looks like it always holds
> constant values.

oh, i now see that this would be a pain, because LIR opcode numbering would need to be rigid, unless hints were encoded into LIRopcodes.tbl, which would also be a pain.  nevermind.
Gal, I'm going to rename this bug "make hint() faster" and land my patch.  If you want to pursue your "only hint if the instruction is close" approach can you open a new bug?  Thanks.
Assignee: gal → nnethercote
Status: NEW → ASSIGNED
Summary: TM: only consult Assembler::hint() if the live range is reasonably short → nanojit: make hint() faster
http://hg.mozilla.org/projects/nanojit-central/rev/f95a1857f8fe
Whiteboard: PACMAN, has-patch → PACMAN, has-patch, fixed-in-nanojit
Sounds like you solved the problem I measured. I am happy with the solution.
http://hg.mozilla.org/tracemonkey/rev/2a59a7d20362
Whiteboard: PACMAN, has-patch, fixed-in-nanojit → PACMAN, has-patch, fixed-in-nanojit, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/2a59a7d20362
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 581702
TR: http://hg.mozilla.org/tamarin-redux/rev/003b98740a2a
Whiteboard: PACMAN, has-patch, fixed-in-nanojit, fixed-in-tracemonkey → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: