Closed
Bug 513514
Opened 15 years ago
Closed 15 years ago
nanojit: make hint() faster
Categories
(Core Graveyard :: Nanojit, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Future
People
(Reporter: gal, Assigned: n.nethercote)
References
Details
(Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)
Attachments
(4 files, 3 obsolete files)
1.98 KB,
patch
|
Details | Diff | Splinter Review | |
8.31 KB,
patch
|
Details | Diff | Splinter Review | |
34.36 KB,
text/plain
|
Details | |
17.04 KB,
patch
|
edwsmith
:
review+
|
Details | Diff | Splinter Review |
It shows up in some profiles clocking in at about 1% total time for aes. The long series of if statements is painful, especially the isCmp part. We probably want to put this in some kind of table since most branches only depend on op.
Assignee | ||
Comment 1•15 years ago
|
||
The if-then-else chains are painful, as you say, but the actions for each alternative are different -- usually an assignment or masking, but sometimes (eg. the LIR_param case) depending on other values. They also vary across back-ends so doing it via the usual LIRopcode.tbl mechanism wouldn't work.
There's also the issue that some of the back-ends mask the result with regs.free in hint() but not all of them do. That's easier to make consistent, though.
Assignee | ||
Comment 2•15 years ago
|
||
Here's another suggestion: get rid of hint() altogether. I turned it off and it barely made a difference; it may have even caused a 1--2ms speedup.
Why? On x86, at least, it's because hint() almost never makes any useful contribution. There are three cases:
- 'prefer' ends up the same as 'allow'. This can happen because the 'allow' set passed to findRegFor() often already has been chosen carefully and hint() is effectively repeating that choosing. Eg. for calls we do prepResultReg(ins, rmask(EAX)) and then hint() sees that it's a LIR_call and so suggests rmask(EAX). Alternatively, the opcode is one that hint() doesn't treat specially. Either way, I call this the "I second your fine decision!" case.
- In the remaining cases, 'prefer' usually ends up having no overlap with 'free', so we just return 'allow' anyway. I call this the "I give up!" case.
- In the remaining cases, 'prefer' overlaps with 'free' and is more specific than 'allow'. I call this the "I have something useful to add!" case.
For 3d-raytrace (for which bug 516042 tells us hint() accounts for 0.9% of run-time) here are the proportions of the above cases:
- "I second your fine decision!" 65%
- "I give up!" 33%
- "I have something useful to add!" 2%
Reporter | ||
Comment 3•15 years ago
|
||
Great statistics. Thanks. If you focus on the "I have something useful to add" part, is there any specific op that we contribute something useful for? Maybe there is one or two cases where we do something useful and we can drop the rest (or we just drop them all if there is no easy cheap hints we might want to keep).
Assignee | ||
Comment 4•15 years ago
|
||
Looking some more, the main cause of the "I second your fine decision!" case is
findSpecificRegFor() -- ie. when we've already narrowed 'allow' down to a
single register. There's no pointing trying to apply hints in that case.
Nb: it's interesting to see that in the NativeX64 backend hint() is a no-op.
Assignee | ||
Comment 5•15 years ago
|
||
Here's the stats for 3d-raytrace. On each line is:
- the number of occurrences for this pattern
- the opcode
- the value of 'allow' at the end of hint()
- the value of 'prefer' at the end of hint()
1102 eq allow(eax ecx edx ebx esi edi) prefer(eax ecx edx ebx)
346 int allow(eax edx ebx esi edi) prefer(eax edx)
236 icall allow(eax edx ebx esi edi) prefer(eax)
216 icall allow(eax ecx edx ebx esi edi) prefer(eax)
114 iparam allow(eax ecx edx ebx esi edi) prefer(eax)
87 iparam allow(eax ecx edx esi edi) prefer(eax)
68 icall allow(eax ecx edx ebx edi) prefer(eax)
53 fcall allow(xmm0 xmm1 xmm2 xmm3 xmm4 xmm5 xmm6 xmm7 f0) prefer(f0)
26 flt allow(eax ecx edx ebx esi edi) prefer(eax ecx edx ebx)
13 int allow(eax ecx edx esi edi) prefer(eax ecx edx)
11 fgt allow(eax ecx edx ebx esi edi) prefer(eax ecx edx ebx)
3 flt allow(eax ecx edx ebx esi) prefer(eax ecx edx ebx)
3 flt allow(ecx edx ebx esi edi) prefer(ecx edx ebx)
1 int allow(eax ecx edx ebx esi edi) prefer(eax ecx edx)
In other words, it's spread across all the hint() cases.
I guess these cases mostly correspond to times when findRegFor() is called not on the instruction we're looking at, but one of its operands, and so we haven't already narrowed 'allow' down sensibly?
Reporter | ||
Comment 6•15 years ago
|
||
Can we narrow down when hint is called? findRegForOperand()?
Assignee | ||
Comment 7•15 years ago
|
||
We already have findSpecificRegFor(), but it just calls findRegFor(). If we clone findRegFor() and specialise for the single-register-allowed case we can skip the hint calls. We can also then specialise registerAlloc() for the single-register-allowed case.
But I just tried all that and didn't get a noticeable speedup. Perhaps not surprising -- we're looking at minor speedups to a functions that account for less than 1% of SS time.
Comment 8•15 years ago
|
||
(In reply to comment #2)
> - 'prefer' ends up the same as 'allow'. This can happen because the 'allow'
> set passed to findRegFor() often already has been chosen carefully and hint()
> is effectively repeating that choosing. Eg. for calls we do prepResultReg(ins,
> rmask(EAX)) and then hint() sees that it's a LIR_call and so suggests
> rmask(EAX). Alternatively, the opcode is one that hint() doesn't treat
> specially. Either way, I call this the "I second your fine decision!" case.
> - "I second your fine decision!" 65%
> - "I give up!" 33%
> - "I have something useful to add!" 2%
this is surprising -- since the code is assembled bottom up, the purpose
of hint is to ask for a result to be in the register that the opcode will
have to put it in anyway. e.g. calls to into EAX.
the hint matters at the point the result is used; usually a result can
be used by any register, and i'd expect hint to matter.
wonder why we're not seeing that? maybe because the result is often just
passed to another function as a register param (and forced into ecx or edx) thus constraining both "ends" of the lifetime?
re: X64 hint() being a no-op: lazyness on my part. but based on this data you're making a good case to just drop it completely.
could you post a patch that adds the instrumentation? i'd like to run it on some tamarin tests. if our results aren't much different then dropping it sounds fine.
Assignee | ||
Comment 9•15 years ago
|
||
It's very primitive, it just does a different printf for each case. I pipe the output to a file and then use search-and-replace to count them :)
Assignee | ||
Comment 10•15 years ago
|
||
(In reply to comment #8)
> (In reply to comment #2)
>
> > - 'prefer' ends up the same as 'allow'. This can happen because the 'allow'
> > set passed to findRegFor() often already has been chosen carefully and hint()
> > is effectively repeating that choosing. Eg. for calls we do prepResultReg(ins,
> > rmask(EAX)) and then hint() sees that it's a LIR_call and so suggests
> > rmask(EAX). Alternatively, the opcode is one that hint() doesn't treat
> > specially. Either way, I call this the "I second your fine decision!" case.
>
> this is surprising -- since the code is assembled bottom up, the purpose
> of hint is to ask for a result to be in the register that the opcode will
> have to put it in anyway. e.g. calls to into EAX.
>
> the hint matters at the point the result is used; usually a result can
> be used by any register, and i'd expect hint to matter.
>
> wonder why we're not seeing that? maybe because the result is often just
> passed to another function as a register param (and forced into ecx or edx)
> thus constraining both "ends" of the lifetime?
I distinguished between the two sub-cases described above. The case where the opcode is one that hint() doesn't treat specially dominates, accounting for 60% of all calls to hint(). The case where the opcode is handled by hint() but it just gives the same result as 'allow' account for 5% of cases.
In other words:
1. The majority of cases (60%) hint doesn't even try to add anything.
2. Most of the time when it does try (33%) it restricts things too much, giving no overlap with 'free'.
3. Sometimes it tries, doesn't restrict things too much, but the end result is the same as 'allow' (5%)
4. Occasionally it tries, doesn't restrict things too much, and the end result is an improvement (2%)
These numbers are for 3d-raytrace, but I tried several other SS ones and got similar results.
Assignee | ||
Comment 11•15 years ago
|
||
This one prints "case 1", "case 2", "case 3", "case 4" corresponding to the four cases in my previous comment.
Attachment #398237 -
Attachment is obsolete: true
Assignee | ||
Comment 12•15 years ago
|
||
(In reply to comment #8)
> could you post a patch that adds the instrumentation? i'd like to run it on
> some tamarin tests. if our results aren't much different then dropping it
> sounds fine.
Just remember that the acid test is to simply disable hint() and see if it makes a difference. I just tried it again on x86 on SunSpider and the timings were unchanged.
Comment 13•15 years ago
|
||
yup, agreed.
Reporter | ||
Comment 14•15 years ago
|
||
Actually, the hint hint() gives us is also not particular useful except when we are getting pretty close to the definition. Currently we pass state as param0, and we keep shuffling it into ECX across the entire trace because thats where param0 would want it. We do so while calling a bunch of fastcalls, leading to constant spilling of ECX. I think we could fix both problems here at once with a simple isnear(current, op) in hint with an early out. isnear can be as easy as op in [current, current+10]. If there is a page boundary, then we just miss the hint. That would bypass hinting until we get close to the op and there is a chance that its not going to get spilled, and it also shortcuts consulting the heuristics.
Reporter | ||
Updated•15 years ago
|
Assignee: general → gal
Summary: TM: turn Assembler::hint() into a table → TM: only consult Assembler::hint() if the live range is reasonably short
Reporter | ||
Comment 15•15 years ago
|
||
Reporter | ||
Comment 16•15 years ago
|
||
This patch eliminates about 90% of the calls to hint based on the distance heuristics. No measurable perf impact.
whale:src gal$ cat /tmp/x | grep "1" | wc -l
234600
whale:src gal$ cat /tmp/x | grep "2" | wc -l
24949
Attachment #398520 -
Attachment is obsolete: true
Reporter | ||
Comment 17•15 years ago
|
||
Attachment #398526 -
Attachment is obsolete: true
Assignee | ||
Comment 18•15 years ago
|
||
If it's relevant for all back ends, it might be better to put the test at the hint() call sites instead of within hint().
Reporter | ||
Comment 19•15 years ago
|
||
I can't test the heuristics on other platforms. I just leave it to other backend authors. We only really care about x86-* and arm.
Comment 20•15 years ago
|
||
I just stumbled across this ... any activity of interest?
Updated•15 years ago
|
Whiteboard: has-patch
Reporter | ||
Comment 21•15 years ago
|
||
Grab it if you think it makes sense or lets wont-fix it otherwise. I picked 8 randomly without measurements. The patch was motivated by looking at code that moves something into ECX for a shift op but is so far away that it can't possibly help anything anyway.
Updated•15 years ago
|
Attachment #398527 -
Flags: feedback?(edwsmith)
Updated•15 years ago
|
Whiteboard: has-patch → PACMAN, has-patch
Updated•15 years ago
|
Component: JavaScript Engine → Nanojit
OS: Mac OS X → All
Updated•15 years ago
|
Target Milestone: --- → Future
Updated•15 years ago
|
Attachment #398527 -
Flags: feedback?(edwsmith)
Comment 22•15 years ago
|
||
Assignee | ||
Comment 23•15 years ago
|
||
(In reply to comment #22)
>
> Results on TR x86 with hinting disabled showed no changes outside "noise"
Last time I measured the same was true for TM, but if you look at the generated code it is a bit worse, which makes me reluctant to drop hint().
It's the chain of tests in Nativei386.cpp:hint() that sucks so much. If we changed it to use a table lookup that could help a lot. LIR_param is more difficult because the hinted regSet isn't static... maybe there could be a special table value that means "run some extra code to get the regSet".
Comment 24•15 years ago
|
||
Yeah. I think in the end, something like Gal's patch is a great heuristic - we find most of the nearby opportunities at low cost (pointer subtraction) and shrug about the rest. combined with your suggestions it would speed up compilation. especially if a sizable % of references are to faraway definitions (stretch).
static counts of R<->R and R<->M moves, and compile times, might be more useful for evaluating this than runtime performance benchmarks.
Assignee | ||
Comment 25•15 years ago
|
||
Adds a hints[] table which can be used to specify preferred register sets
for particular opcodes on a per-backend basis. PREFER_SPECIAL is an escape
hatch that lets more complicated reasoning be applied, it's used in all the
backends for LIR_paramp. So for most cases there is now a single
conditional test, whereas on i386 there were many. This saves a tiny bit on
TM/SunSpider (1.001x fewer instructions).
On X64 I actually implemented some hints and got good results, close to 2%
fewer instructions on trace (about 0.5% overall). regexp-dna was particular
good, something like 5% fewer instructions.
I haven't done anything with the "isnear" heuristic. If we're just
interested in reducing the cost of doing the hints (and not interested in
changing the generated code, modulo the additions of hints on X64) then this
patch is enough.
Attachment #452996 -
Flags: review?(edwsmith)
Comment 26•15 years ago
|
||
Comment on attachment 452996 [details] [diff] [review]
patch implementing table-based hinting
jit-time improvements are definitely wanted.
Could the hint table be static-const instead? looks like it always holds constant values.
Attachment #452996 -
Flags: review?(edwsmith) → review+
Comment 27•15 years ago
|
||
(In reply to comment #26)
> (From update of attachment 452996 [details] [diff] [review])
> jit-time improvements are definitely wanted.
>
> Could the hint table be static-const instead? looks like it always holds
> constant values.
oh, i now see that this would be a pain, because LIR opcode numbering would need to be rigid, unless hints were encoded into LIRopcodes.tbl, which would also be a pain. nevermind.
Assignee | ||
Comment 28•15 years ago
|
||
Gal, I'm going to rename this bug "make hint() faster" and land my patch. If you want to pursue your "only hint if the instruction is close" approach can you open a new bug? Thanks.
Assignee: gal → nnethercote
Status: NEW → ASSIGNED
Summary: TM: only consult Assembler::hint() if the live range is reasonably short → nanojit: make hint() faster
Assignee | ||
Comment 29•15 years ago
|
||
Whiteboard: PACMAN, has-patch → PACMAN, has-patch, fixed-in-nanojit
Reporter | ||
Comment 30•15 years ago
|
||
Sounds like you solved the problem I measured. I am happy with the solution.
Assignee | ||
Comment 31•15 years ago
|
||
Whiteboard: PACMAN, has-patch, fixed-in-nanojit → PACMAN, has-patch, fixed-in-nanojit, fixed-in-tracemonkey
Comment 32•15 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 33•15 years ago
|
||
Whiteboard: PACMAN, has-patch, fixed-in-nanojit, fixed-in-tracemonkey → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Updated•11 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•