Bug 514062
Opened 16 years ago
Closed 16 years ago
TM/nanojit: detailed profiling data for nanojit compile-time
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
Andreas keeps saying we need to do more profiling...
Below are Shark profiling results for every benchmark in SunSpider and V8.
This testing was done on TraceMonkey r32111, with SunSpider's --shark20
option.
-----------------------------------------------------------------------------
Analysis of results
-----------------------------------------------------------------------------
Given that some programs are longer running than others, if we want to
reduce SunSpider's overall running time, 3d-raytrace is easily the most
important, then date-format-xparb. In comparison crypto-* also have high
compile times proportionally but they are much shorter-running -- all the
crypto-* ones combined take about 2/3 the time of 3d-raytrace.
Having said that, we tend to see the same functions coming up again and
again. In rough order of importance, they are:
- findRegFor
- StackFilter::read
- CseFilter::{ins2,ins1,...}
- intersectRegisterState
- releaseRegisters
- LirReader::read
- hashcode
- gen
- hint
Note that the insXYZ() ones will tend to be underreported because there are
so many of them. Any change which improves many or all of them will have an
outsize effect.
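To make the under-reporting concrete, here is a minimal sketch that folds every Class::insXYZ entry into one bucket per class. It uses the crypto-md5 self-times from the table below; the helper name group_ins_methods is made up for illustration and is not part of Shark or nanojit.

```python
import re
from collections import defaultdict

# Self-times for crypto-md5, copied from the results table
# (percent of total run time, from Shark's bottom-up view).
CRYPTO_MD5 = {
    "StackFilter::read": 1.7,
    "CseFilter::ins2": 1.4,
    "CseFilter::ins1": 1.4,
    "findRegFor": 1.3,
    "hashcode": 1.1,
    "grow": 1.0,
    "LirReader::read": 0.8,
    "gen": 0.7,
    "releaseRegisters": 0.6,
    "LirBufWriter::insStorei": 0.5,
    "CseFilter::insImm": 0.5,
    "registerAlloc": 0.5,
    "hint": 0.4,
    "asm_store32": 0.4,
    "LirWriter::insStorei": 0.4,
    "imm64f": 0.4,
}


def group_ins_methods(self_times):
    """Sum all Class::insXYZ entries into a single 'Class::ins*' bucket.

    Hypothetical helper: returns (name, percent) pairs, largest first.
    """
    grouped = defaultdict(float)
    for name, pct in self_times.items():
        key = re.sub(r"::ins\w+$", "::ins*", name)
        grouped[key] += pct
    # Round to one decimal place to hide floating-point noise from the sums.
    totals = [(name, round(pct, 1)) for name, pct in grouped.items()]
    return sorted(totals, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, pct in group_ins_methods(CRYPTO_MD5):
        print(f"{pct:4.1f}%  {name}")
```

On this row the combined CseFilter::ins* bucket comes to 3.3%, which puts it ahead of StackFilter::read at 1.7%, even though no single insXYZ function tops the list.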
NJ compile-time improvements will improve SunSpider results much more than
V8, because interpretation still accounts for a large fraction of
V8 execution time.
-----------------------------------------------------------------------------
Legend for the table
-----------------------------------------------------------------------------
1st column is the program name.
2nd column is the time spent in nanojit::compile() and its children (from
Shark's top-down view), which covers roughly half of NJ compile-time and is
a good indicator.
3rd column shows the most significant NJ functions for each test; these are
self-times (from Shark's bottom-up view). Generally I showed ones that
account for 0.4% or more; if there are none that significant, the most
significant one is shown.
Programs are sorted according to the nanojit::compile time, except that
all-of-SunSpider and all-of-v8 results are at the top.
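The selection rule for the 3rd column can be sketched as follows. This is only an illustration of the rule as described in the legend, not the procedure actually used (the table was assembled by hand from Shark output, and a few rows deviate from the rule). The function name significant and the "gen" values in the usage lines are hypothetical.

```python
def significant(self_times, threshold=0.4):
    """Pick the functions to list for one benchmark.

    Keeps every function whose self-time is at or above `threshold`
    percent; if none qualifies, falls back to the single largest entry.
    """
    ranked = sorted(self_times.items(), key=lambda kv: kv[1], reverse=True)
    picked = [(name, pct) for name, pct in ranked if pct >= threshold]
    return picked or ranked[:1]


if __name__ == "__main__":
    # access-nbody values from the table, plus a hypothetical 0.3% entry
    # that falls below the cut-off.
    print(significant({
        "findRegFor": 0.6,
        "CseFilter::ins2": 0.5,
        "StackFilter::read": 0.4,
        "gen": 0.3,  # hypothetical value, for illustration only
    }))
    # v8-regexp: nothing reaches 0.4%, so only the top entry is kept.
    print(significant({"findRegFor": 0.2, "gen": 0.1}))  # "gen" is hypothetical
```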
-----------------------------------------------------------------------------
program                  nanojit::compile  significant functions
-----------------------------------------------------------------------------
all-of-SunSpider         3.8%              0.5%: findRegFor
                                           0.5%: StackFilter::read
                                           0.4%: intersectRegisterState
                                           0.3%: CseFilter::ins2
                                           0.3%: releaseRegisters
                                           0.3%: LirReader::read
                                           0.3%: hashcode
all-of-v8                1.4%              0.2%: findRegFor
                                           0.2%: intersectRegisterState
                                           0.2%: CseFilter::ins2
                                           0.2%: releaseRegisters
                                           0.2%: StackFilter::read
-----------------------------------------------------------------------------
3d-raytrace              17.7%             2.3%: findRegFor
                                           2.0%: intersectRegisterState
                                           1.8%: releaseRegisters
                                           1.4%: CseFilter::ins2
                                           1.2%: StackFilter::read
                                           1.2%: hashcode
                                           1.0%: gen
                                           0.9%: hint
                                           0.9%: grow
                                           0.9%: LirReader::read
                                           0.9%: CseFilter::insGuard
                                           0.6%: nRegisterAllocFromSet
                                           0.5%: asm_load
crypto-md5               9.5%              1.7%: StackFilter::read
                                           1.4%: CseFilter::ins2
                                           1.4%: CseFilter::ins1
                                           1.3%: findRegFor
                                           1.1%: hashcode
                                           1.0%: grow
                                           0.8%: LirReader::read
                                           0.7%: gen
                                           0.6%: releaseRegisters
                                           0.5%: LirBufWriter::insStorei
                                           0.5%: CseFilter::insImm
                                           0.5%: registerAlloc
                                           0.4%: hint
                                           0.4%: asm_store32
                                           0.4%: LirWriter::insStorei
                                           0.4%: imm64f
date-format-xparb        7.1%              1.0%: findRegFor
                                           0.7%: StackFilter::read
                                           0.6%: intersectRegisterState
                                           0.5%: releaseRegisters
                                           0.5%: CseFilter::ins2
                                           0.4%: LirReader::read
                                           0.4%: gen
                                           0.4%: CseFilter::insImm
crypto-aes               6.4%              0.7%: StackFilter::read
                                           0.7%: findRegFor
                                           0.7%: intersectRegisterState
                                           0.6%: gen
                                           0.5%: releaseRegisters
                                           0.4%: LirReader::read
crypto-sha1              5.5%              0.8%: StackFilter::read
                                           0.8%: findRegFor
                                           0.4%: LirReader::read
                                           0.4%: releaseRegisters
                                           0.4%: intersectRegisterState
access-nbody             4.1%              0.6%: findRegFor
                                           0.5%: CseFilter::ins2
                                           0.4%: StackFilter::read
v8-crypto                3.8%              0.6%: findRegFor
                                           0.6%: CseFilter::ins2
3d-cube                  3.8%              0.4%: intersectRegisterState
                                           0.4%: findRegFor
                                           0.4%: hashcode
math-spectral-norm       3.4%              0.4%: StackFilter::read
                                           0.4%: intersectRegisterState
string-unpack-code       2.1%              0.4%: findRegFor
v8-regexp                1.4%              0.2%: findRegFor
access-fannkuch          1.3%              0.3%: releaseRegisters
v8-earley-boyer          1.0%              0.2%: findRegFor
string-validate-input    1.0%              0.2%: StackFilter::read()
string-base64            0.9%              0.2%: findRegFor
v8-deltablue             0.8%              0.2%: findRegFor
3d-morph                 0.8%              0.2%: findRegFor
math-cordic              0.8%              0.1%: asm_branch
access-nsieve            0.7%              0.1%: CodeAlloc::alloc
math-partial-sums        0.7%              0.2%: findRegFor
v8-richards              0.6%              0.1%: findRegFor
bitops-bits-in-byte      0.5%              0.1%: StackFilter::read()
bitops-nsieve-bits       0.4%              0.1%: findRegFor
bitops-3bit-bits-in-...  0.4%              0.1%: LirReader::read
bitops-bitwise-and       0.3%              0.1%: findRegFor
string-tagcloud          0.3%              0.1%: nRegisterAllocFromSet
v8-raytrace              0.2%              0.1%: CseFilter::ins2
regexp-dna               0.2%              0.1%: StackFilter::read
string-fasta             0.2%              0.1%: findRegFor
date-format-tofte        0.1%              0.0%: ...
v8-splay                 0.0%              0.0%: ...
access-binary-trees      n/a
controlflow-recursive    n/a
Comment 1 (Assignee) • 16 years ago
I plan to look at the important functions. If anyone else wants to work on anything identified here, please let me know so we can co-ordinate and not duplicate effort. Thanks.
Comment 2 (Assignee) • 16 years ago
Bug 513865 improved LirReader::read() slightly, but it didn't make a measurable difference to SunSpider.
Comment 3 (Assignee) • 16 years ago
There are a number of ways findRegFor() can be cloned and specialised:
- Often we know the opcode isn't LIR_alloc. We could have a version that avoids the LIR_alloc test.
- Often we know the Reservation is used. We could have a version that avoids the !resv->used test. (And Cachegrind says that test is not easy to predict: 45,000 mispredicts for 315,000 branches, which is pretty high.)
- Often 'allow' holds a single register (i.e. when called from findSpecificRegFor()). In that case hint() can be skipped, and specialised versions of registerAlloc() can be called.
But each of these would involve lots of code duplication and provide at best a very slight speed-up (i.e. probably not measurable in SunSpider), so it doesn't seem worthwhile.
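For scale, the Cachegrind figures quoted above work out to roughly one mispredict in every seven executions of that branch. A quick check of the arithmetic, using only the two counts from this comment (the function name is made up for illustration):

```python
def mispredict_rate(mispredicts, branches):
    """Fraction of executed branches that were mispredicted."""
    return mispredicts / branches


# Counts quoted above for the !resv->used test in findRegFor().
rate = mispredict_rate(45_000, 315_000)

if __name__ == "__main__":
    print(f"{rate:.1%}")  # prints 14.3%
```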
Comment 4 • 16 years ago
Really cool analysis. I would be comfortable dropping LIR hint based on this. Rick what do you guys think?
Comment 5 (Assignee) • 16 years ago
(In reply to comment #4)
> Really cool analysis. I would be comfortable dropping LIR hint based on this.
> Rick what do you guys think?
Was that comment meant to go on bug 513514?
Comment 6 (Assignee) • 16 years ago
I've hammered on Nanojit compile-time performance long enough. I haven't had a clear SunSpider win for a while now, and the remaining improvements are in the noise. Furthermore, enough changes have occurred that the data in comment 0 is now out-of-date. So there's little point in keeping this bug open.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED