Closed Bug 632453 Opened 13 years ago Closed 13 years ago

Clean up the mark stack

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

(Whiteboard: has-patch)

Attachments

(7 files, 3 obsolete files)

The mark stack is becoming increasingly baroque and inefficient for our purposes.  On the one hand the size field is not used by exactly marked objects (almost all of them) and pushing and popping it is a waste of effort; on the other hand it is probable that the system for exactly scanning GCRoots is going to require more information bits about the mark item to be kept in the item than there is space for.

Also, the stack code needs the inline treatment.

A sketch for a plausible design:

Mark stack items become variable length.  The two low bits contain information about the item.  The low bit is the "last word" bit (if zero this is the last word of the item); the high bit is the "continuation bit" (if zero this is the first word of the item).  That gives eg the stack flushing logic enough to work with.

Specifically,

  00   normal exactly-traced mark item
  01   start of special mark item: high bits contain information, not pointer
  10   last word of some mark item
  11   intermediate word of some mark item

Thus sentinels (which we'll still need) and roots will still be two words but the bulk of the traffic will be one-word items.  We'll make it possible to describe mark items easily because we'll have at least 30 bits available for metainformation.

There's some risk to rewriting the mark stack logic now; we'll have to see how bad it gets once we have a draft of the code.
No functional changes, just refactoring.
Attachment #510747 - Flags: review?(treilly)
Attachment #510747 - Flags: review?(treilly) → review+
Another nice refactoring would be to have a MarkStack abstract type that completely hides GCWorkItem and GCStack.
(In reply to comment #2)
> Another nice refactoring would be to have a MarkStack abstract type that
> completely hides GCWorkItem and GCStack.

I agree.  I will make that my next refactoring exercise.
changeset: 5904:fe46a6af1480
user:      Lars T Hansen <lhansen@adobe.com>
summary:   For 632453 - Clean up the mark stack: Comments and inline treatment for GCStack (r=treilly)

http://hg.mozilla.org/tamarin-redux/rev/fe46a6af1480
This is functionally equivalent to the old code, and inside GCMarkStack we still rely on GCWorkItem, but nobody gets to see that data type at all.  Instead, pushing and popping are in terms of multi-word items, which is where we want to be going.  The semantics of sentinels are the same as before.

I removed the RECURSIVE_MARK #ifdef; the code was already incorrect (it would mark immediately after pushing a sentinel item).  We can restore that later if it still has utility.

I've asked for a review (rather than feedback) but that doesn't mean I'll land this immediately even if it gets R+ - I may choose to let it mature as I work on cleaning things up further.

BTW, one might think there's a small possible performance drain in the new function GC::MarkTopItem (see comment).  However, one test previously on the hot path inside GC::MarkItem has gone away so overall I'd be surprised if it isn't a wash.  The only thing that could be troublesome is if we get less inlining than before (GC::Mark calls GC::MarkTopItem which calls GC::MarkItem; the middle one would normally be inlined into the first one but I've not checked it).

Finally, one can debate whether the "indices" introduced in the mark stack to point to specific items should be uintptr_t, as I have them, or void*.  I figure it doesn't matter much.
Attachment #510989 - Flags: review?(treilly)
RECURSIVE_MARK was made obsolete by the new heap graph stuff and can be nuked.
Builds on the former patch.  This is an intermediate step; it keeps the GCWorkItem structure internal to the mark stack but completely hides details of encoding and sentinels and so on by providing a set of mark item types and exposing those types through getters, setters, and type sniffers.  Once the implementation of the mark stack can support it a couple of types that are currently conflated (interior pieces of objects and roots that are both marked conservatively) will become distinguishable types.

With this patch some optimization opportunities for the fast path are becoming apparent, but have not been exploited much yet.  Some comments address this.
Attachment #512449 - Flags: review?(treilly)
The mark stack internal representation is now a sequence of words, and there can be multiword items.  These are never split across segments; segments don't have to be full.  There's more debug-mode error checking here too.

Remaining work items:

 - Now that we can have more data types on the stack, start distinguishing
   between object pieces and root pieces.

 - Consider whether the large-object and root protection items should
   be rolled into the object and root pieces: for the root protector there's
   a liability because it needs to find any item on the stack representing
   a piece of itself.  This is brittle.  The alternative to merging the items
   is that we can scan the stack for the correct piece (now that the stack has
   proper type information).

 - Optimization work, vetting.
Attachment #512761 - Flags: review?(treilly)
Depends on: 634890
Attachment #510989 - Attachment is obsolete: true
Attachment #510989 - Flags: review?(treilly)
Attachment #512449 - Attachment is obsolete: true
Attachment #512449 - Flags: review?(treilly)
Attachment #512761 - Attachment is obsolete: true
Attachment #512761 - Flags: review?(treilly)
Also some renaming and other cleanup.
Attached patch Cumulative patchSplinter Review
This patch merges the four preceding patches, for easier review.  It sits on top of the patch for bug #634890 and the base TR version is 5933.
Attachment #513118 - Flags: review?(treilly)
Attached file Performance results
Comparative runs with old and new mark stacks on MacPro and MacBookPro, Release build with configure.py, SDK=10.4.

There are no red flags.  There are the usual outliers (globalvar read and write), and then some strange positive outliers where there should be none (lastIndexOf?); I've not investigated but I'm speculating that the more parsimonious use of memory in the new stack might have altered memory use patterns and we avoided a GC where we ran one before.

Basically the new mark stack ought not change performance.  Read/write traffic to the mark stack should be reduced down but the stack is very hot and should be in the cache when the marker's running.  Changes to the marker structure have removed some conditional branches but they were always easy to predict well.  Performance changes should come from secondary effects, such as memory use patterns.
Comment on attachment 513118 [details] [diff] [review]
Cumulative patch

Didn't find anything critically wrong, assorted nits and
conversational items:

Confusing to be told second bit is end bit and see patterns that don't
follow that, then later on read a note that the second bit isn't used

What's with all the underscores, Push_ etc?  Readability?

I would have gone with MarkStack over GCMarkStack (and MarkStack.h)

GC::Mark with no args should be called something like GC::MarkAll

What's with deadItem?  Avoid null checks?

Would love to see a large object deletion test and a large gcroot
deletion selftest (duh).

Crazy perf results!  Be interesting to see the cache stats for the big
wins.
Attachment #513118 - Flags: review?(treilly) → review+
(In reply to comment #15)
> Confusing to be told second bit is end bit and see patterns that don't
> follow that, then later on read a note that the second bit isn't used

I should probably tidy that up, it feels dated already.

(What do you mean patterns that don't follow that?  Remember that the bit is clear if it is on, set if it is off... to allow 00 to be the "start and end" marker for GCObjects that are just one word.)

> What's with all the underscores, Push_ etc?  Readability?

Yes.  A bit unconventional but it helps me, at least.

> I would have gone with MarkStack over GCMarkStack (and MarkStack.h)

I know, but I figure that kind of refactoring/cleanup should come separately from these functional changes.  And GCMarkStack is what we've had so it doesn't harm us to wait a little longer.  If we're going to do a grand renaming let's do it all at once, after the asymmetric GC bits land.

> GC::Mark with no args should be called something like GC::MarkAll

Hm, maybe, it's always been called Mark.  I will consider that a follow-up item.

> What's with deadItem?  Avoid null checks?

Yes.  It's not strictly necessary because dead items would just end up on as non-GCObject items on the slow path; maybe I've over-analyzed this problem.  The underlying problem is that non-GCObject items are at least two words long.  Thus ClearItemAt() could never be used to clear a GCObject item if we were to overwrite the item with a tagged stack item.  (The NULL check is clearly undesirable on the hot path in Pop_GCObject.)  So I went for the general solution that allows any stack item to be cleared, even though in practice, the only items we currently need to clear are multi-word items anyway.  (The other issue is that a "dead item" slot would be variable-length, which complicates several functions that parse stack items.)

I guess that explanation could go into a comment in the code.

> Would love to see a large object deletion test and a large gcroot
> deletion selftest (duh).

Noted.

> Crazy perf results!  Be interesting to see the cache stats for the big
> wins.

No kidding - and on several architectures, too.
Whiteboard: has-patch
Will file a separate bug for some selftest cleanup.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
changeset: 5949:de5eae0d18f6
user:      Lars T Hansen <lhansen@adobe.com>
summary:   632453 - Clean up the mark stack (r=treilly)

http://hg.mozilla.org/tamarin-redux/rev/de5eae0d18f6
changeset: 6296:9c4b988f4c07
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 648249: cleanup handling of markStackSentinel (r=treilly).

http://hg.mozilla.org/tamarin-redux/rev/9c4b988f4c07
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: