Last Comment Bug 658386 - eliminate mLastChild field from PtrInfo
: eliminate mLastChild field from PtrInfo
Status: RESOLVED FIXED
[MemShrink:P2]
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Other Branch
: All All
: -- normal (vote)
: mozilla7
Assigned To: Andrew McCreight [:mccr8]
:
Mentors:
Depends on:
Blocks: 638299 mslim-fx5+
  Show dependency treegraph
 
Reported: 2011-05-19 14:15 PDT by Andrew McCreight [:mccr8]
Modified: 2011-06-13 21:38 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
first attempt at a patch (8.43 KB, patch)
2011-05-19 14:39 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Review
eliminate mLastChild from PtrInfo (9.32 KB, patch)
2011-05-22 12:20 PDT, Andrew McCreight [:mccr8]
no flags Details | Diff | Review
eliminate mLastChild from PtrInfo (9.32 KB, patch)
2011-05-24 14:18 PDT, Andrew McCreight [:mccr8]
peterv: review+
Details | Diff | Review
remove redundant stores of child pointers (3.41 KB, patch)
2011-05-24 14:26 PDT, Andrew McCreight [:mccr8]
peterv: review+
Details | Diff | Review
part 1: eliminate mLastChild from PtrInfo (9.34 KB, patch)
2011-06-10 09:40 PDT, Andrew McCreight [:mccr8]
continuation: review+
Details | Diff | Review
part 2: eliminate redundant stores of PtrInfo child pointers (3.05 KB, patch)
2011-06-10 09:43 PDT, Andrew McCreight [:mccr8]
continuation: review+
Details | Diff | Review

Description Andrew McCreight [:mccr8] 2011-05-19 14:15:22 PDT
Andreas pointed out to me that the mLastChild field isn't really needed in PtrInfo.  Instead, you just look at the mFirstChild field of the next node.  Indeed, this appears to be the standard representation of static graphs.

This saves one pointer per PtrInfo.  When running the MemBuster benchmark, there are about 100k to 300k nodes per cycle collection, so this could save a couple of megs of memory.

Andreas had a patch for this in Bug 548733 a year ago, but that also messes around with EdgePool, I think to lay the groundwork to support insertion (to support WeakMaps, but I guess it wasn't needed in the end?), but I think a much simpler patch can eliminate mLastChild.  I'm currently running a patch on try, I'll clean it up and attach it.
Comment 1 Andrew McCreight [:mccr8] 2011-05-19 14:22:42 PDT
Separately, I think we should consider reducing BlockSize for the NodePool.  Currently there are 32k nodes allocated at a time.  With the total numbers of nodes I'm seeing, there could be 10% to 25% internal fragmentation, which seems like quite a bit.  My guess is that the numbers weren't tuned after the removal of marked JS objects, which eliminates a huge number of nodes.

I'm also not really clear on why we have both mInternalRefs and mRefCount.  I feel like we could steal 2 more bits from the count to distinguish GC and RC objects, then just keep the ref count.  This would save another 4 bytes per node.  It might make some logging a little less informative, but it doesn't look too bad.
Comment 2 Andrew McCreight [:mccr8] 2011-05-19 14:39:03 PDT
Created attachment 533794 [details] [diff] [review]
first attempt at a patch

Store mLastChild in mFirstChild of the next node.  Add an extra node to each block to handle this.  I added getter/setter methods from LastChild, so it can be accessed mostly as before.  I should probably add similar getter/setters for FirstChild so it doesn't look as asymmetric.  I'm also not sure if I should put explicit inline annotations on the new methods or not.
Comment 3 Nicholas Nethercote [:njn] 2011-05-19 16:14:35 PDT
(In reply to comment #1)
> Separately, I think we should consider reducing BlockSize for the NodePool. 
> Currently there are 32k nodes allocated at a time.

I've seen that come up in profiles numerous times.  It did seem like surprisingly large chunk of memory and I wondered if it could be improved... custom allocators are usually suspect in my book.
Comment 4 Andrew McCreight [:mccr8] 2011-05-19 16:54:58 PDT
The patch breaks on the same Moth test on 3 platforms.  I'll have to look into it.
http://tbpl.mozilla.org/?tree=Try&rev=b7df081948ab
Comment 5 Andreas Gal :gal 2011-05-20 12:14:02 PDT
++andrew, thanks for looking into this
Comment 6 Andrew McCreight [:mccr8] 2011-05-22 12:20:44 PDT
Created attachment 534315 [details] [diff] [review]
eliminate mLastChild from PtrInfo

Eliminates the mLastChild pointer from PtrInfo, which will reduce space usage.  The mFirstChild field is made private, and getters and setters are added for the first and last child.

One unfortunate aspect of this implementation is that we are still doing just as many stores as before: we set both the first and last child for every node, while we only need to set one of them, except at the end of a block.  In a conventional implementation of a static graph, all nodes are allocated in a single block, so you can just have a FinishNodes() method that fills out the single dummy node.  In the Fx implementation, there are multiple blocks, which requires one dummy node per block.  Probably the best way to fix this would be to add a reference to the EdgePool to the NodePool builder, and then set the last child of a block in NodePool::Builder::Add when you move to the next block.  I'll try to think about whether the added complexity is worth it or not.  I don't think the extra stores are going to cost any additional cache misses, because the same address basically gets the same value written to it twice in a row.

I did some try runs with this patch.  After removing DEBUG_CC, I had a clean run except for OSX Debug Moth which was having intermittent oranges that showed up on somebody else's TM push, but after updating to the latest version, that Moth ran twice without any orange.
Comment 7 Andrew McCreight [:mccr8] 2011-05-24 14:18:19 PDT
Created attachment 534882 [details] [diff] [review]
eliminate mLastChild from PtrInfo

Should just be some whitespace cleanup compared to the previous version of this patch.
Comment 8 Andrew McCreight [:mccr8] 2011-05-24 14:26:16 PDT
Created attachment 534885 [details] [diff] [review]
remove redundant stores of child pointers

Only set the child of the next node if we are as far as we are going to go in the current block.  This will be the case if we are at the end of the current block (which required adding a new predicate AtBlockEnd for NodePool enumerators), or if there are no more items in the queue.

This change is a bit awkward, because GCGraphBuilder doesn't know where we are in the NodePool, and nsCycleCollector doesn't know where we are in the EdgePool.  To solve this, I test in nsCycleCollector if we should store the last child, then pass the result of the test into Traverse.  There's a fair amount of redundant testing here between GetNext, AtBlockEnd, and the two calls to IsDone, but because it is all inlined hopefully the compiler can do something smart with it.

Part 1 can be used with or without this, depending on whether trading a redundant store to a location in the cache for (possibly) a few comparisons is worth the little bit of complexity.  I'll put this up for review if it passes a single platform try run.
Comment 9 Andrew McCreight [:mccr8] 2011-05-24 22:29:44 PDT
Comment on attachment 534885 [details] [diff] [review]
remove redundant stores of child pointers

Try run on this patch came back green for Linux64 opt and debug.
Comment 10 Peter Van der Beken [:peterv] 2011-06-09 06:16:49 PDT
Comment on attachment 534882 [details] [diff] [review]
eliminate mLastChild from PtrInfo

Review of attachment 534882 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsCycleCollector.cpp
@@ +531,5 @@
> +
> +    // this PtrInfo must be part of a NodePool
> +    EdgePool::Iterator LastChild()
> +    {
> +        return (this+1)->mFirstChild;

Spaces around +.

@@ +542,5 @@
> +
> +    // this PtrInfo must be part of a NodePool
> +    void SetLastChild(EdgePool::Iterator aLastChild)
> +    {
> +        (this+1)->mFirstChild = aLastChild;

Spaces around +.
Comment 11 Peter Van der Beken [:peterv] 2011-06-09 06:34:04 PDT
Comment on attachment 534885 [details] [diff] [review]
remove redundant stores of child pointers

Review of attachment 534885 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsCycleCollector.cpp
@@ +1456,5 @@
>      {
>          return AddNode(s, aParticipant);
>      }
>  #endif
> +    void Traverse(PtrInfo* aPtrInfo, PRBool blockFinalNode);

Arguments are normally prefixed with 'a' in this file, so aBlockFinalNode

@@ +1804,1 @@
>      }

Could avoid doing the queue.IsDone() check for every call to Traverse if you you add a function to the graph builder that does: |mCurrPi->SetLastChild(mEdgeBuilder.Mark());|. Call it in the while-loop if queue.AtBlockEnd() and once after the while-loop if mGraph.mRootCount > 0. Up to you.
Comment 12 Andrew McCreight [:mccr8] 2011-06-09 12:26:56 PDT
In theory the compiler should be able to merge those redundant queue.IsDone() checks, but I'll try refactoring it a bit and see how it looks, thanks.
Comment 13 Andrew McCreight [:mccr8] 2011-06-09 15:31:46 PDT
That actually looks pretty nice.  I'll tryserver it and see how it goes.
>    NodePool::Enumerator queue(mGraph.mNodes);
>    while (!queue.IsDone()) {
>        PtrInfo *pi = queue.GetNext();
>        builder.Traverse(pi);
>        if (queue.AtBlockEnd())
>            builder.SetLastChild();
>    }
>    if (mGraph.mRootCount > 0)
>        builder.SetLastChild();
Comment 14 Andrew McCreight [:mccr8] 2011-06-10 09:40:29 PDT
Created attachment 538540 [details] [diff] [review]
part 1: eliminate mLastChild from PtrInfo

Addressed peterv's comments.
Comment 15 Andrew McCreight [:mccr8] 2011-06-10 09:43:31 PDT
Created attachment 538542 [details] [diff] [review]
part 2: eliminate redundant stores of PtrInfo child pointers

Addressed peterv's comments.

Carrying forward his r+ on both of these patches.  I tried them out on try and it came out okay.
Comment 17 Doug Turner (:dougt) 2011-06-13 20:55:23 PDT
what turned out to be the actual savings by this patch?
Comment 18 Andrew McCreight [:mccr8] 2011-06-13 21:38:07 PDT
It saves 1 pointer per node in the CC graph.  With a few tabs open, there's around 15k to 44k nodes in the graph.  With MemBuster, there are 100k to 300k nodes.  The CC graph only exists for the duration of the cycle collection, so it doesn't exist for that long.

Note You need to log in before you can comment on or make changes to this bug.