Closed Bug 514062 Opened 16 years ago Closed 16 years ago

TM/nanojit: detailed profiling data for nanojit compile-time

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

Andreas keeps saying we need to do more profiling... Below are Shark profiling results for every benchmark in SunSpider and V8. This testing was done on TraceMonkey r32111, using SunSpider's --shark20 option.

-----------------------------------------------------------------------------
Analysis of results
-----------------------------------------------------------------------------
Some programs run for longer than others. So if we want to reduce SunSpider's overall running time, 3d-raytrace is easily the most important, followed by date-format-xparb. The crypto-* tests also have proportionally high compile times, but they are much shorter-running: all the crypto-* tests combined take about 2/3 the time of 3d-raytrace.

Having said that, the same functions come up again and again. In rough order of importance, they are:

- findRegFor
- StackFilter::read
- CseFilter::{ins2,ins1,...}
- intersectRegisterState
- releaseRegisters
- LirReader::read
- hashcode
- gen
- hint

Note that the insXYZ() functions will tend to be underreported because there are so many of them. Any change that improves many or all of them will have an outsized effect.

NJ compile-time improvements will improve SunSpider results much more than V8, because interpretation currently still accounts for a large fraction of V8 execution time.

-----------------------------------------------------------------------------
Legend for the table
-----------------------------------------------------------------------------
The 1st column is the program name.

The 2nd column is the time spent in nanojit::compile() and its children (from Shark's top-down view). This covers roughly half of NJ compile-time, so it is a good indicator.

The 3rd column shows the most significant NJ functions for each test. These are self-times (from Shark's bottom-up view). Generally I show those that account for 0.4% or more; if none is that significant, the single most significant one is shown.
Programs are sorted by nanojit::compile time, except that the all-of-SunSpider and all-of-v8 results are at the top.

-----------------------------------------------------------------------------
program                  nanojit::compile  significant functions
-----------------------------------------------------------------------------
all-of-SunSpider         3.8%              0.5%: findRegFor
                                           0.5%: StackFilter::read
                                           0.4%: intersectRegisterState
                                           0.3%: CseFilter::ins2
                                           0.3%: releaseRegisters
                                           0.3%: LirReader::read
                                           0.3%: hashcode
all-of-v8                1.4%              0.2%: findRegFor
                                           0.2%: intersectRegisterState
                                           0.2%: CseFilter::ins2
                                           0.2%: releaseRegisters
                                           0.2%: StackFilter::read
-----------------------------------------------------------------------------
3d-raytrace              17.7%             2.3%: findRegFor
                                           2.0%: intersectRegisterState
                                           1.8%: releaseRegisters
                                           1.4%: CseFilter::ins2
                                           1.2%: StackFilter::read
                                           1.2%: hashcode
                                           1.0%: gen
                                           0.9%: hint
                                           0.9%: grow
                                           0.9%: LirReader::read
                                           0.9%: CseFilter::insGuard
                                           0.6%: nRegisterAllocFromSet
                                           0.5%: asm_load
crypto-md5               9.5%              1.7%: StackFilter::read
                                           1.4%: CseFilter::ins2
                                           1.4%: CseFilter::ins1
                                           1.3%: findRegFor
                                           1.1%: hashcode
                                           1.0%: grow
                                           0.8%: LirReader::read
                                           0.7%: gen
                                           0.6%: releaseRegisters
                                           0.5%: LirBufWriter::insStorei
                                           0.5%: CseFilter::insImm
                                           0.5%: registerAlloc
                                           0.4%: hint
                                           0.4%: asm_store32
                                           0.4%: LirWriter::insStorei
                                           0.4%: imm64f
date-format-xparb        7.1%              1.0%: findRegFor
                                           0.7%: StackFilter::read
                                           0.6%: intersectRegisterState
                                           0.5%: releaseRegisters
                                           0.5%: CseFilter::ins2
                                           0.4%: LirReader::read
                                           0.4%: gen
                                           0.4%: CseFilter::insImm
crypto-aes               6.4%              0.7%: StackFilter::read
                                           0.7%: findRegFor
                                           0.7%: intersectRegisterState
                                           0.6%: gen
                                           0.5%: releaseRegisters
                                           0.4%: LirReader::read
crypto-sha1              5.5%              0.8%: StackFilter::read
                                           0.8%: findRegFor
                                           0.4%: LirReader::read
                                           0.4%: releaseRegisters
                                           0.4%: intersectRegisterState
access-nbody             4.1%              0.6%: findRegFor
                                           0.5%: CseFilter::ins2
                                           0.4%: StackFilter::read
v8-crypto                3.8%              0.6%: findRegFor
                                           0.6%: CseFilter::ins2
3d-cube                  3.8%              0.4%: intersectRegisterState
                                           0.4%: findRegFor
                                           0.4%: hashcode
math-spectral-norm       3.4%              0.4%: StackFilter::read
                                           0.4%: intersectRegisterState
string-unpack-code       2.1%              0.4%: findRegFor
v8-regexp                1.4%              0.2%: findRegFor
access-fannkuch          1.3%              0.3%: releaseRegisters
v8-earley-boyer          1.0%              0.2%: findRegFor
string-validate-input    1.0%              0.2%: StackFilter::read
string-base64            0.9%              0.2%: findRegFor
v8-deltablue             0.8%              0.2%: findRegFor
3d-morph                 0.8%              0.2%: findRegFor
math-cordic              0.8%              0.1%: asm_branch
access-nsieve            0.7%              0.1%: CodeAlloc::alloc
math-partial-sums        0.7%              0.2%: findRegFor
v8-richards              0.6%              0.1%: findRegFor
bitops-bits-in-byte      0.5%              0.1%: StackFilter::read
bitops-nsieve-bits       0.4%              0.1%: findRegFor
bitops-3bit-bits-in-...  0.4%              0.1%: LirReader::read
bitops-bitwise-and       0.3%              0.1%: findRegFor
string-tagcloud          0.3%              0.1%: nRegisterAllocFromSet
v8-raytrace              0.2%              0.1%: CseFilter::ins2
regexp-dna               0.2%              0.1%: StackFilter::read
string-fasta             0.2%              0.1%: findRegFor
date-format-tofte        0.1%              0.0%: ...
v8-splay                 0.0%              0.0%: ...
access-binary-trees      n/a
controlflow-recursive    n/a
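Using only the table's figures, the analysis's ranking of 3d-raytrace above the crypto-* tests can be made explicit. As a rough sketch, write T_p for program p's running time and f_p for its nanojit::compile fraction (treated here as a proxy for total NJ compile-time, which is an assumption). Take the estimate above that crypto-md5, crypto-aes and crypto-sha1 together run for about 2/3 as long as 3d-raytrace, and bound each crypto-* fraction by the largest one, f_md5 = 9.5%:

```latex
\begin{aligned}
\sum_{p \in \{\text{md5},\,\text{aes},\,\text{sha1}\}} f_p\,T_p
  &\le f_{\text{md5}} \sum_{p \in \{\text{md5},\,\text{aes},\,\text{sha1}\}} T_p
   \approx 0.095 \cdot \tfrac{2}{3}\,T_{\text{raytrace}}
   \approx 0.063\,T_{\text{raytrace}} \\
f_{\text{raytrace}}\,T_{\text{raytrace}} &= 0.177\,T_{\text{raytrace}}
\end{aligned}
```

So 3d-raytrace spends at least roughly 2.8 times as much absolute time in nanojit::compile as the three crypto-* tests combined.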
I plan to look at the important functions. If anyone else wants to work on anything identified here, please let me know so we can co-ordinate and not duplicate effort. Thanks.
Bug 513865 improved LirReader::read() slightly, but it didn't make a measurable difference to SunSpider.
There are a number of ways findRegFor() could be cloned and specialised:

- Often we know the opcode isn't LIR_alloc. We could have a version that avoids the LIR_alloc test.
- Often we know the Reservation is used. We could have a version that avoids the !resv->used test. (Cachegrind says that test is hard to predict: 45,000 mispredicts for 315,000 branches, roughly 14%, which is pretty high.)
- Often 'allow' holds a single register (i.e. when called from findSpecificRegFor()). In that case hint() can be skipped, and specialised versions of registerAlloc() can be called.

But each of these would involve lots of code duplication and provide at best a very slight speed-up (i.e. probably not measurable in SunSpider), so it doesn't seem worthwhile.
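As a rough illustration of the cloning ideas, one way to get the specialised versions without duplicating source is a template with compile-time flags. This is a hypothetical C++ sketch: `Opcode`, `Reservation`, `Ins`, `RegMask`, `allocFromSet`, `findRegForT`, `findRegForKnownUsed` and `isSingleReg` are simplified stand-ins invented for the example, not nanojit's actual declarations or allocation logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-ins for nanojit's types. These are
// NOT the real nanojit declarations; they only model the two tests that the
// proposed clones would remove.
enum class Opcode { LIR_alloc, LIR_add, LIR_ld };
typedef std::uint32_t RegMask;   // one bit per machine register
const int kNoReg = -1;

struct Reservation {
    bool used = false;
    int reg = kNoReg;
};

struct Ins {
    Opcode op;
    Reservation resv;
};

// True if 'reg' is a real register whose bit is set in 'm'.
inline bool inMask(int reg, RegMask m) {
    return reg != kNoReg && (m & (RegMask(1) << reg)) != 0;
}

// Stand-in for registerAlloc(): take the lowest-numbered register in 'allow'.
inline int allocFromSet(RegMask allow) {
    for (int r = 0; r < 32; ++r)
        if (inMask(r, allow)) return r;
    return kNoReg;
}

// One template body, several clones. When a flag is false, the matching test
// becomes a constant 'false' condition that the optimiser can delete.
template <bool MaybeAlloc, bool MaybeUnused>
int findRegForT(Ins& ins, RegMask allow) {
    if (MaybeAlloc && ins.op == Opcode::LIR_alloc)
        return kNoReg;                  // stack allocs get no register in this model
    if (MaybeUnused && !ins.resv.used) {
        ins.resv.used = true;           // first use: make a fresh reservation
        ins.resv.reg = allocFromSet(allow);
        return ins.resv.reg;
    }
    if (inMask(ins.resv.reg, allow))
        return ins.resv.reg;            // already in an acceptable register
    ins.resv.reg = allocFromSet(allow); // otherwise pick a new one
    return ins.resv.reg;
}

// General entry point: performs every test, like the unspecialised code.
inline int findRegFor(Ins& ins, RegMask allow) {
    return findRegForT<true, true>(ins, allow);
}

// Clone for callers that know the opcode isn't LIR_alloc and the
// reservation is already in use: both tests are compiled out.
inline int findRegForKnownUsed(Ins& ins, RegMask allow) {
    return findRegForT<false, false>(ins, allow);
}

// Third idea: if 'allow' names exactly one register there is no choice to
// make, so a hint() lookup (not modelled here) could be skipped.
inline bool isSingleReg(RegMask m) {
    return m != 0 && (m & (m - 1)) == 0;
}
```

For example, `findRegFor(a, 0x6)` on a fresh non-alloc instruction reserves register 1, and a later `findRegForKnownUsed(a, 0x2)` reuses it without re-running either test. Note that a template removes the duplication from the source but not from the generated code: each instantiation is still a separate copy of the function.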
Really cool analysis. I would be comfortable dropping LIR hint based on this. Rick, what do you guys think?
(In reply to comment #4)
> Really cool analysis. I would be comfortable dropping LIR hint based on this.
> Rick what do you guys think?

Was that comment meant to go on bug 513514?
I've hammered on Nanojit compile-time performance long enough. I haven't had a clear SunSpider win for a while now, and the remaining improvements are in the noise. Furthermore, enough changes have occurred that the data in comment 0 is now out-of-date, so there's little point in keeping this bug open.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED