Closed Bug 632453 Opened 13 years ago Closed 13 years ago

Clean up the mark stack

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q3 11 - Serrano

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

(Whiteboard: has-patch)

Attachments

(7 files, 3 obsolete files)

Comments and inline treatment for GCStack 13 years ago Lars T Hansen 20.91 KB, patch	treilly : review+	Details \| Diff \| Splinter Review
Step 1: Hide GCWorkItem inside GCMarkStack 13 years ago Lars T Hansen 79.83 KB, patch		Details \| Diff \| Splinter Review
Step 2: Turn the mark stack into a proper ADT 13 years ago Lars T Hansen 71.92 KB, patch		Details \| Diff \| Splinter Review
Step 3: Change the representation of the mark stack 13 years ago Lars T Hansen 57.70 KB, patch		Details \| Diff \| Splinter Review
Step 1: Hide GCWorkItem inside GCMarkStack 13 years ago Lars T Hansen 79.83 KB, patch		Details \| Diff \| Splinter Review
Step 2: Turn the mark stack into a proper ADT 13 years ago Lars T Hansen 71.92 KB, patch		Details \| Diff \| Splinter Review
Step 3: Change the representation of the mark stack 13 years ago Lars T Hansen 57.72 KB, patch		Details \| Diff \| Splinter Review
Step 4: Distinguish large-object and root chunks; tuning 13 years ago Lars T Hansen 60.34 KB, patch		Details \| Diff \| Splinter Review
Cumulative patch 13 years ago Lars T Hansen 110.87 KB, patch	treilly : review+	Details \| Diff \| Splinter Review
Performance results 13 years ago Lars T Hansen 23.05 KB, text/plain		Details

Lars T Hansen

Assignee

Description

•

13 years ago

The mark stack is becoming increasingly baroque and inefficient for our purposes.  On the one hand the size field is not used by exactly marked objects (almost all of them) and pushing and popping it is a waste of effort; on the other hand it is probable that the system for exactly scanning GCRoots is going to require more information bits about the mark item to be kept in the item than there is space for.

Also, the stack code needs the inline treatment.

A sketch for a plausible design:

Mark stack items become variable length.  The two low bits contain information about the item.  The low bit is the "last word" bit (if zero this is the last word of the item); the high bit is the "continuation bit" (if zero this is the first word of the item).  That gives eg the stack flushing logic enough to work with.

Specifically,

  00   normal exactly-traced mark item
  01   start of special mark item: high bits contain information, not pointer
  10   last word of some mark item
  11   intermediate word of some mark item

Thus sentinels (which we'll still need) and roots will still be two words but the bulk of the traffic will be one-word items.  We'll make it possible to describe mark items easily because we'll have at least 30 bits available for metainformation.

There's some risk to rewriting the mark stack logic now; we'll have to see how bad it gets once we have a draft of the code.

Lars T Hansen

Assignee

Comment 1

•

13 years ago

Attached patch Comments and inline treatment for GCStack — Details — Splinter Review

No functional changes, just refactoring.

Attachment #510747 - Flags: review?(treilly)

Tommy Reilly

Updated

•

13 years ago

Attachment #510747 - Flags: review?(treilly) → review+

Tommy Reilly

Comment 2

•

13 years ago

Another nice refactoring would be to have a MarkStack abstract type that completely hides GCWorkItem and GCStack.

Lars T Hansen

Assignee

Comment 3

•

13 years ago

(In reply to comment #2)
> Another nice refactoring would be to have a MarkStack abstract type that
> completely hides GCWorkItem and GCStack.

I agree.  I will make that my next refactoring exercise.

Tamarin Bot

Comment 4

•

13 years ago

changeset: 5904:fe46a6af1480
user:      Lars T Hansen <lhansen@adobe.com>
summary:   For 632453 - Clean up the mark stack: Comments and inline treatment for GCStack (r=treilly)

http://hg.mozilla.org/tamarin-redux/rev/fe46a6af1480

Lars T Hansen

Assignee

Comment 5

•

13 years ago

Attached patch Step 1: Hide GCWorkItem inside GCMarkStack (obsolete) — Details — Splinter Review

This is functionally equivalent to the old code, and inside GCMarkStack we still rely on GCWorkItem, but nobody gets to see that data type at all.  Instead, pushing and popping are in terms of multi-word items, which is where we want to be going.  The semantics of sentinels are the same as before.

I removed the RECURSIVE_MARK #ifdef; the code was already incorrect (it would mark immediately after pushing a sentinel item).  We can restore that later if it still has utility.

I've asked for a review (rather than feedback) but that doesn't mean I'll land this immediately even if it gets R+ - I may choose to let it mature as I work on cleaning things up further.

BTW, one might think there's a small possible performance drain in the new function GC::MarkTopItem (see comment).  However, one test previously on the hot path inside GC::MarkItem has gone away so overall I'd be surprised if it isn't a wash.  The only thing that could be troublesome is if we get less inlining than before (GC::Mark calls GC::MarkTopItem which calls GC::MarkItem; the middle one would normally be inlined into the first one but I've not checked it).

Finally, one can debate whether the "indices" introduced in the mark stack to point to specific items should be uintptr_t, as I have them, or void*.  I figure it doesn't matter much.

Attachment #510989 - Flags: review?(treilly)

Tommy Reilly

Comment 6

•

13 years ago

RECURSIVE_MARK was made obsolete by the new heap graph stuff and can be nuked.

Lars T Hansen

Assignee

Comment 7

•

13 years ago

Attached patch Step 2: Turn the mark stack into a proper ADT (obsolete) — Details — Splinter Review

Builds on the former patch.  This is an intermediate step; it keeps the GCWorkItem structure internal to the mark stack but completely hides details of encoding and sentinels and so on by providing a set of mark item types and exposing those types through getters, setters, and type sniffers.  Once the implementation of the mark stack can support it a couple of types that are currently conflated (interior pieces of objects and roots that are both marked conservatively) will become distinguishable types.

With this patch some optimization opportunities for the fast path are becoming apparent, but have not been exploited much yet.  Some comments address this.

Attachment #512449 - Flags: review?(treilly)

Lars T Hansen

Assignee

Comment 8

•

13 years ago

Attached patch Step 3: Change the representation of the mark stack (obsolete) — Details — Splinter Review

The mark stack internal representation is now a sequence of words, and there can be multiword items.  These are never split across segments; segments don't have to be full.  There's more debug-mode error checking here too.

Remaining work items:

 - Now that we can have more data types on the stack, start distinguishing
   between object pieces and root pieces.

 - Consider whether the large-object and root protection items should
   be rolled into the object and root pieces: for the root protector there's
   a liability because it needs to find any item on the stack representing
   a piece of itself.  This is brittle.  The alternative to merging the items
   is that we can scan the stack for the correct piece (now that the stack has
   proper type information).

 - Optimization work, vetting.

Attachment #512761 - Flags: review?(treilly)

Lars T Hansen

Assignee

Updated

•

13 years ago

Depends on: 634890

Lars T Hansen

Assignee

Updated

•

13 years ago

Attachment #510989 - Attachment is obsolete: true

Attachment #510989 - Flags: review?(treilly)

Lars T Hansen

Assignee

Updated

•

13 years ago

Attachment #512449 - Attachment is obsolete: true

Attachment #512449 - Flags: review?(treilly)

Lars T Hansen

Assignee

Updated

•

13 years ago

Attachment #512761 - Attachment is obsolete: true

Attachment #512761 - Flags: review?(treilly)

Lars T Hansen

Assignee

Comment 9

•

13 years ago

Attached patch Step 1: Hide GCWorkItem inside GCMarkStack — Details — Splinter Review

Lars T Hansen

Assignee

Comment 10

•

13 years ago

Attached patch Step 2: Turn the mark stack into a proper ADT — Details — Splinter Review

Lars T Hansen

Assignee

Comment 11

•

13 years ago

Attached patch Step 3: Change the representation of the mark stack — Details — Splinter Review

Lars T Hansen

Assignee

Comment 12

•

13 years ago

Attached patch Step 4: Distinguish large-object and root chunks; tuning — Details — Splinter Review

Also some renaming and other cleanup.

Lars T Hansen

Assignee

Comment 13

•

13 years ago

Attached patch Cumulative patch — Details — Splinter Review

This patch merges the four preceding patches, for easier review.  It sits on top of the patch for bug #634890 and the base TR version is 5933.

Attachment #513118 - Flags: review?(treilly)

Lars T Hansen

Assignee

Comment 14

•

13 years ago

Attached file Performance results — Details

Comparative runs with old and new mark stacks on MacPro and MacBookPro, Release build with configure.py, SDK=10.4.

There are no red flags.  There are the usual outliers (globalvar read and write), and then some strange positive outliers where there should be none (lastIndexOf?); I've not investigated but I'm speculating that the more parsimonious use of memory in the new stack might have altered memory use patterns and we avoided a GC where we ran one before.

Basically the new mark stack ought not change performance.  Read/write traffic to the mark stack should be reduced down but the stack is very hot and should be in the cache when the marker's running.  Changes to the marker structure have removed some conditional branches but they were always easy to predict well.  Performance changes should come from secondary effects, such as memory use patterns.

Tommy Reilly

Comment 15

•

13 years ago

Comment on attachment 513118 [details] [diff] [review]
Cumulative patch

Didn't find anything critically wrong, assorted nits and
conversational items:

Confusing to be told second bit is end bit and see patterns that don't
follow that, then later on read a note that the second bit isn't used

What's with all the underscores, Push_ etc?  Readability?

I would have gone with MarkStack over GCMarkStack (and MarkStack.h)

GC::Mark with no args should be called something like GC::MarkAll

What's with deadItem?  Avoid null checks?

Would love to see a large object deletion test and a large gcroot
deletion selftest (duh).

Crazy perf results!  Be interesting to see the cache stats for the big
wins.

Attachment #513118 - Flags: review?(treilly) → review+

Lars T Hansen

Assignee

Comment 16

•

13 years ago

(In reply to comment #15)
> Confusing to be told second bit is end bit and see patterns that don't
> follow that, then later on read a note that the second bit isn't used

I should probably tidy that up, it feels dated already.

(What do you mean patterns that don't follow that?  Remember that the bit is clear if it is on, set if it is off... to allow 00 to be the "start and end" marker for GCObjects that are just one word.)

> What's with all the underscores, Push_ etc?  Readability?

Yes.  A bit unconventional but it helps me, at least.

> I would have gone with MarkStack over GCMarkStack (and MarkStack.h)

I know, but I figure that kind of refactoring/cleanup should come separately from these functional changes.  And GCMarkStack is what we've had so it doesn't harm us to wait a little longer.  If we're going to do a grand renaming let's do it all at once, after the asymmetric GC bits land.

> GC::Mark with no args should be called something like GC::MarkAll

Hm, maybe, it's always been called Mark.  I will consider that a follow-up item.

> What's with deadItem?  Avoid null checks?

Yes.  It's not strictly necessary because dead items would just end up on as non-GCObject items on the slow path; maybe I've over-analyzed this problem.  The underlying problem is that non-GCObject items are at least two words long.  Thus ClearItemAt() could never be used to clear a GCObject item if we were to overwrite the item with a tagged stack item.  (The NULL check is clearly undesirable on the hot path in Pop_GCObject.)  So I went for the general solution that allows any stack item to be cleared, even though in practice, the only items we currently need to clear are multi-word items anyway.  (The other issue is that a "dead item" slot would be variable-length, which complicates several functions that parse stack items.)

I guess that explanation could go into a comment in the code.

> Would love to see a large object deletion test and a large gcroot
> deletion selftest (duh).

Noted.

> Crazy perf results!  Be interesting to see the cache stats for the big
> wins.

No kidding - and on several architectures, too.

Lars T Hansen

Assignee

Updated

•

13 years ago

Whiteboard: has-patch

Lars T Hansen

Assignee

Comment 17

•

13 years ago

Will file a separate bug for some selftest cleanup.

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Tamarin Bot

Comment 18

•

13 years ago

changeset: 5949:de5eae0d18f6
user:      Lars T Hansen <lhansen@adobe.com>
summary:   632453 - Clean up the mark stack (r=treilly)

http://hg.mozilla.org/tamarin-redux/rev/de5eae0d18f6

Tamarin Bot

Comment 19

•

13 years ago

changeset: 6296:9c4b988f4c07
user:      Felix S Klock II <fklockii@adobe.com>
summary:   Bug 648249: cleanup handling of markStackSentinel (r=treilly).

http://hg.mozilla.org/tamarin-redux/rev/9c4b988f4c07

You need to log in before you can comment on or make changes to this bug.