Closed
Bug 624512
Opened 14 years ago
Closed 6 years ago
Add a GCSet function to replace fast manual WB/WBRC calls
Categories
(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P2)
Tamarin Graveyard
Garbage Collection (mmGC)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Q1 12 - Brannan
People
(Reporter: treilly, Unassigned)
References
Details
There are two historic justifications for manual write barriers, the smart pointer didn't handle secondary page of large allocations (fixed recently) and that they are faster due to avoiding the container and GC calculation.
The container is calculated in order to call GetGCBits, a derived pointer will work just as well I suspect as an argument to GetGCBits. This may not be true of the fast bits path but I'm pretty sure its true of the !FASTBITS path. In which case we could just enable both paths (ie compile in fastbits and !fastbits which happens to work because I made the fast bits bit shift always be in the header for layout simplicity) and avoid container calculation in the write barrier. Then we could get rid of the write barrier paths that take the container as an arg simplify things.
If we can pull this off it will relieve some of the performance risk from safegc and should result in overall speedups.
Note that in addition to avoiding the container calculation we could experiment with moving the GC * calculation out of the fast path and into the hit path. Ie check object bits first then BarrierActive(). That might be a slow down.
Updated•14 years ago
|
Priority: P2 → --
Target Milestone: flash10.x-Serrano → Future
Reporter | ||
Comment 1•14 years ago
|
||
Assertion was correct, GetGCBits on derived ptrs works but only with FASTBITS turned off.
Reporter | ||
Comment 2•14 years ago
|
||
My thinking here and I haven't had time to whip up a patch is to have a byte[512] for each size class that given (pointer & 0xfff) >> 3 returns the gcbits index (ie the FASTBITS index). Then we can use it to get the bits from a derived pointer and skip the container calc.
Flags: flashplayer-qrb+
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Reporter | ||
Comment 3•14 years ago
|
||
Just skipping the container calculation should speed things up a little another real big cost of the write barrier is hitting the page table to find out if the object is large or small. The bulk majority of write barriers are small and we could actually optimize out the page table hit if we could know statically that the write barrier type was small. One idea I had for that was to strip out the "extra" new form into separate base classes, (ie GCObjectExtra, GCFinalizedObjectExtra etc). Then we could special if sizeof(T) < kLargestAlloc. My theory is removing the container calc and the page table hit from the write barrier will result in a large performance increase across the board.
Comment 4•14 years ago
|
||
Linking to bug #654943 because of the previous comment about splitting objects with and without extra fields.
Regarding the gcbits observation: it's almost certainly the case that we're moving to a world with object headers, and that strategy won't apply there. But it might still be useful in the short term.
Blocks: 654943
Reporter | ||
Comment 5•14 years ago
|
||
That was a non-sensical notion, the write barrier page table hit is on the container, not the type of the payload (which is the only thing the template class knows about).
I'm convinced that operator= can't ever be as fast as we want it and we need a GCSet API, ie:
foo = bar;
becomes
GCSet(foo,bar);
where foo is:
GCMember<Bar> foo;
And GCSet is a member function inherited from the GC base classes.
Thinking of looking into clang to generate a patch to change all operator= uses into such a beast to investigate the perf angle. Obviously such a big API change would have to have a big perf benefit to justify itself. Although operator= can go away and GCSet can just become the tool we use to remove all manual WB/WBRC instances.
Reporter | ||
Updated•14 years ago
|
Summary: Make WriteBarrier avoid container calculation → Add a GCSet function to replace fast manual WB/WBRC calls
Comment 6•14 years ago
|
||
I believe the GCSet() thing would work but the churn is monumental. There are other alternatives here that we should consider.
Recall that the purpose of the barrier is to record the assignments of white pointers into black objects. (We actually only care about the last such assignment in a GC cycle.) There are several strategies available for that, at least these:
- make the black container gray by enqueueing it (what we do, requires finding
the container)
- make the white value gray by enqueueing it (does not require finding the
container)
- record the location of field within the black object, and then later
examine it to see if it has a pointer to a white object
In addition there are overall changes in strategy like allocating objects black (snapshot-at-the-beginning) but I'm a little skeptical about whether that's a good idea given that our marker will be active most of the time.
(Card maps are optimizations to the third technique - cheaply store conservative summaries of locations that need to be examined).
Which technique that is best probably depends on various parameters, like the frequency of black-to-white pointers, the frequency of multiple black-to-white updates in a program (if high then just graying the white pointer would result in a fair amount of floating garbage), the cost of checking whether an object is in fact gray, and so on.
Before going for GCSet we should probably look to the literature too. Pekka Pirinen had a paper in 1998 on barrier techniques for incremental collectors; I belive Tony Hosking had a paper on barrier techniques in general last year but I've not read it.
Comment 7•14 years ago
|
||
All that said, "not finding" the black container means doing extra work later because some assignments would have been into non-black locations.
Comment 8•14 years ago
|
||
Let's give this the priority it deserves.
Priority: -- → P2
Target Milestone: Future → Q1 12 - Brannan
Reporter | ||
Comment 9•14 years ago
|
||
Most of these are for uintptr_t fields carrying different pointer types so doing 608708 makes sense first.
Depends on: 608708
Comment 10•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•