Closed
Bug 446556
Opened 16 years ago
Closed 10 years ago
Speeding up the hot loop of MMGc PinStackObjects using SSE2 parallel compares
Categories
(Tamarin Graveyard :: Garbage Collection (mmGC), enhancement)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: mohammad.r.haghighat, Unassigned)
Details
(Whiteboard: PACMAN)
Attachments
(1 file)
2.50 KB,
text/plain
|
Details |
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MS-RTC LM 8) Build Identifier: Profiling shows that ZCT::PinStackObjects is one of the hot spots of MMGc. The function contains a loop of the form: while(p < end) { const void *val = GC::Pointer(*p++); if(val < _memStart || val >= _memEnd) continue; ... ... } Profiling also shows that the hot loop is actually the short looped formed through the "continue" statement. Reproducible: Always Steps to Reproduce: 1. For example, run TT on Sunspider/string-fasta under VTune 2. 3. Actual Results: Look at the hot spots in the file GC.cpp Expected Results: You can see the short hot loop. Value profiling indicates that the trip count of this short loop can sometimes be large. Here are the data from vprof on some of the Sunspider benchmarks: bytes scanned to find a match Test avg [min:max] total calls -------------------- ---- ----------- -------- ----- access-binary-trees 703 [0 : 130412] 165280 235 crypto-aes 702 [0 : 128812] 8925832 12714 3d-cube 389 [0 : 130468] 3288704 8445 3d-raytrace 743 [0 : 130060] 5128848 6900 string-fasta 1018 [0 : 130644] 27625244 27142 string-validate-input 859 [0 : 130620] 37867600 44098 More specificaly, we can see that in the case of string-fasta a total of 27M bytes are scanned by this short loop. The average number of bytes to find the next match is 1018 while in several of these benchmarks the maximum of bytes scanned before getting to the next match is 130K. This shows great potentials for benefitting from SSE2 parallel compares. The attached patch is the first attempt for such a solution. The main loop looks like: while(p < end) { const void *val = GC::Pointer(*p++); if ((p = (RCObject**) skipOutOfRange ((uintptr_t*) p, (uintptr_t*) end, _memStart, _memEnd)) >= end) break; if(val < _memStart || val >= _memEnd) continue; ... ... } The new helper function skipOutOfRange gets a pointer and the bounary values for the range, and bumps the pointer to the first location where the value is in the given range or returns end of the list if the list is exhausted and no match is found. The patch also takes care of a shortcoming with the lack of unsigned parallel compares in SSE2. I have not noticed any negative performance impact yet. In string benchmarks where GC kicks in, we get good speedups. Here are some of the improvements on Sunspider: test avm avm2 %sp sunspider/string-fasta.as 422.0 375.0 12.5 sunspider/as3\string-fasta.as 265.0 234.0 13.2 sunspider/string-validate-input.as 828.0 765.0 8.2 sunspider/as3\string-validate-input.as 797.0 734.0 8.6 sunspider/as3\s3d-cube.as 219.0 203.0 7.9
Reporter | ||
Comment 1•16 years ago
|
||
Component: Tracing Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: tracing-vm → nanojit
Updated•15 years ago
|
Component: JIT Compiler (NanoJIT) → Garbage Collection (mmGC)
QA Contact: nanojit → gc
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → Future
Updated•15 years ago
|
OS: Windows XP → All
Priority: P3 → --
Comment 2•10 years ago
|
||
Any update on this bug?
Reporter | ||
Comment 3•10 years ago
|
||
This initial submission of this bug entry is from more than six years ago. I don't know whether or not it's still relevant. For sure, I haven't been working on it for the past several years.
Comment 4•10 years ago
|
||
Bill McCloskey [:billm] - Can you look here or maybe know who will be the right person to review this patch and mainly that we still need this? Thank you.
Severity: normal → enhancement
Flags: needinfo?(wmccloskey)
I think MMgc is dead at this point. This looks like an optimization for conservative stack scanning. SpiderMonkey no longer uses a conservative stack scanner, so I don't think there's any way we could apply this to our current codebase. It looks like a nice optimization though.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•