Closed Bug 446556 Opened 16 years ago Closed 10 years ago

Speeding up the hot loop of MMGc PinStackObjects using SSE2 parallel compares

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), enhancement)

x86
All
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: mohammad.r.haghighat, Unassigned)

Details

(Whiteboard: PACMAN)

Attachments

(1 file)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; MS-RTC LM 8)
Build Identifier: 

Profiling shows that ZCT::PinStackObjects is one of the hot spots of MMGc. The function contains a loop of the form:

while(p < end)
{
    const void *val = GC::Pointer(*p++);	
    if(val < _memStart || val >= _memEnd)
        continue;
    ...
    ...
}

Profiling also shows that the hot loop is actually the short looped formed through the "continue" statement.

Reproducible: Always

Steps to Reproduce:
1. For example, run TT on Sunspider/string-fasta under VTune
2.
3.
Actual Results:  
Look at the hot spots in the file GC.cpp

Expected Results:  
You can see the short hot loop.

Value profiling indicates that the trip count of this short loop can sometimes be large. Here are the data from vprof on some of the Sunspider benchmarks:

                        bytes scanned to find a match
Test                    avg  [min:max]       total  calls
--------------------   ----  -----------  --------  -----
access-binary-trees     703  [0 : 130412]   165280    235
crypto-aes              702  [0 : 128812]  8925832  12714
3d-cube                 389  [0 : 130468]  3288704   8445
3d-raytrace             743  [0 : 130060]  5128848   6900
string-fasta           1018  [0 : 130644] 27625244  27142
string-validate-input   859  [0 : 130620] 37867600  44098

More specificaly, we can see that in the case of string-fasta a total of 27M bytes are scanned by this short loop. The average number of bytes to find the next match is 1018 while in several of these benchmarks the maximum of bytes scanned before getting to the next match is 130K. This shows great potentials for benefitting from SSE2 parallel compares.

The attached patch is the first attempt for such a solution. The main loop looks like:

while(p < end)
{
    const void *val = GC::Pointer(*p++);

    if ((p = (RCObject**) skipOutOfRange ((uintptr_t*) p, (uintptr_t*) end, _memStart, _memEnd)) >= end) break;
	
    if(val < _memStart || val >= _memEnd)
        continue;
    ...
    ...
}

The new helper function skipOutOfRange gets a pointer and the bounary values for the range, and bumps the pointer to the first location where the value is in the given range or returns end of the list if the list is exhausted and no match is found. The patch also takes care of a shortcoming with the lack of unsigned parallel compares in SSE2.

I have not noticed any negative performance impact yet. In string benchmarks where GC kicks in, we get good speedups. Here are some of the improvements on Sunspider:

test                                                   avm    avm2     %sp

sunspider/string-fasta.as                            422.0   375.0    12.5
sunspider/as3\string-fasta.as                        265.0   234.0    13.2
sunspider/string-validate-input.as                   828.0   765.0     8.2
sunspider/as3\string-validate-input.as               797.0   734.0     8.6
sunspider/as3\s3d-cube.as                            219.0   203.0     7.9
Component: Tracing Virtual Machine → JIT Compiler (NanoJIT)
QA Contact: tracing-vm → nanojit
Component: JIT Compiler (NanoJIT) → Garbage Collection (mmGC)
QA Contact: nanojit → gc
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → Future
OS: Windows XP → All
Priority: P3 → --
Whiteboard: PACMAN
Any update on this bug?
This initial submission of this bug entry is from more than six years ago. I don't know whether or not it's still relevant. For sure, I haven't been working on it for the past several years.
Bill McCloskey [:billm] - Can you look here or maybe know who will be the right person to review this patch and mainly that we still need this? Thank you.
Severity: normal → enhancement
Flags: needinfo?(wmccloskey)
I think MMgc is dead at this point. This looks like an optimization for conservative stack scanning. SpiderMonkey no longer uses a conservative stack scanner, so I don't think there's any way we could apply this to our current codebase. It looks like a nice optimization though.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: