Last Comment Bug 702032 - investigate mega blobs in the cycle collector graph
: investigate mega blobs in the cycle collector graph
Status: RESOLVED FIXED
[Snappy]
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on: 702036 702263 702452
Blocks: 698919 698629 701878
  Show dependency treegraph
 
Reported: 2011-11-12 11:17 PST by Andrew McCreight [:mccr8]
Modified: 2012-05-20 12:35 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
huge numbers of nsJSEventListeners in the graph (83.95 KB, application/pdf)
2011-11-12 11:23 PST, Andrew McCreight [:mccr8]
no flags Details
one giant XUL-y blob (559.95 KB, application/pdf)
2011-11-14 15:27 PST, Andrew McCreight [:mccr8]
no flags Details

Description Andrew McCreight [:mccr8] 2011-11-12 11:17:40 PST
If the cycle collector graph is split into disconnected subgraphs, then a large percentage of the total size of the graph is a handful of very large subgraphs.  In one extreme example I came across, there are 741 nodes in subgraphs smaller than 220, then one subgraph with 16231 nodes.

These large blobs have two bad effects on cycle collector pause time.  They directly increase pause time by adding nodes to the graph that maybe don't need to be there, and block efforts to make cycle collection more incremental.

The megablob is so common in these CC graphs that I suspect a lot of it does not actually need to be traversed.  The child-skipping code in nsINode::Traverse is a good example of skipping nodes we know are alive.

I've noticed a few oddities already.

- They contain a lot of XUL (such as nsGenericElement (XUL) and nsXULPrototypeNode).  This seems weird to me.  XUL elements are nsGenericElements, and thus subject to nsINode:Traversal skipping.  However, these seem to often have preserved wrappers, which get added as children, and I wonder if that is causing things to get pulled in that don't need to be.  The nsGenericElement traversal code has this comment, and I wonder if it is still accurate:

  // Always need to traverse script objects, so do that before we check
  // if we're uncollectable.
  NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS

  if (!nsINode::Traverse(tmp, cb)) {
    return NS_SUCCESS_INTERRUPTED_TRAVERSE;
  }

I'm not really sure what the deal is, though.  In some measurements I've done, there are thousands of XUL-y nodes that are reachable both from XPConnect roots and purple roots.

- nsJSEventListeners are added to the graph even when they are holding onto things that don't contain anything.  It seems to be fairly common that mTarget is a node in the current document has no children, and the mContext has two children mGlobalObjectRef and mContext that themselves have no children (presumably these grandchildren of the nsJSEventListeners have no children due to some kind of similar "this is current live" thing).  I've seen almost 600 of these kinds of nsJSEventListeners in the graph before, which means an additional 600 things added to the graph that don't really need to be.

There's probably other stuff, but it is hard to investigate these large knots because it is difficult to display them in any coherent fashion with my visualization scripts.
Comment 1 Andrew McCreight [:mccr8] 2011-11-12 11:23:53 PST
Created attachment 574077 [details]
huge numbers of nsJSEventListeners in the graph

This is a fun-looking picture generated from a CC graph while I was browsing around.  I dubbed it the "bird of prey" because it looks like a Klingon spaceship.

The purple nodes are nsJSEventListeners.  The green nodes are form nodes.  Orange, red and yellow nodes are other misc. HTML nodes.  One disclaimer here is that this is from my experiment with non-JS cycle collections, so it is possible that the nsJSContext would actually reach something, but jst said it was a little surprising there were this many nsJSEventListeners in the graph. There are around 500 nsJSEventListeners in this diagram, and every one seems to connect only to nodes that are cut off.
Comment 2 Olli Pettay [:smaug] 2011-11-12 11:32:05 PST
I'm not sure what " and every one seems to connect only to nodes that are cut off." means.

Would it help if nsJSEventListener didn't have strong ref to mTarget?
That is something we should be able to change easily.
Comment 3 Andrew McCreight [:mccr8] 2011-11-12 12:30:28 PST
Sorry, I'll try to be more concrete.  This nsJSEventListener was put in the purple buffer (because there are no other references to it in the graph):
  0x1496cc840 [rc=1] nsJSEventListener
  > 0x1496aea40 mTarget
  > 0x119d40f90 mContext

The listener's mTarget has no children (perhaps because it is part of the current document), and there are no other references to it in the CC graph:
  0x1496aea40 [rc=5] nsGenericElement (xhtml) a

The listener's mContext does have children:
  0x119d40f90 [rc=40] nsJSContext
  > 0x1288aa478 mGlobalObjectRef
  > 0x1003d2a30 mContext

...but these children have no children:
  0x1288aa478 [rc=4] nsGlobalWindow

  0x1003d2a30 [rc=1] JSContext

There are 39 other references to this nsJSContext in the CC graph, and all but one looked like they were from nsJSContexts.

Removing mTarget from the CC graph would fix the immediate problem.  It seems like the nsJSContexts don't really need to be there in this case, but they are shared so it doesn't really matter.
Comment 4 Olli Pettay [:smaug] 2011-11-12 12:36:18 PST
(In reply to Andrew McCreight [:mccr8] from comment #3)
> Removing mTarget from the CC graph would fix the immediate problem. 
If that is the case, try the patch for Bug 702036.
Comment 5 Andrew McCreight [:mccr8] 2011-11-12 16:25:08 PST
(In reply to Andrew McCreight [:mccr8] from comment #0)
>   // Always need to traverse script objects, so do that before we check
>   // if we're uncollectable.
>   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS
> 
>   if (!nsINode::Traverse(tmp, cb)) {
>     return NS_SUCCESS_INTERRUPTED_TRAVERSE;
>   }

I tried putting the script traversal after the interrupt, and it didn't do anything really noticeable to the graph, so I'll have to think of something else to try.
Comment 6 Olli Pettay [:smaug] 2011-11-14 00:08:40 PST
One thing is that NS_SUCCESS_INTERRUPTED_TRAVERSE isn't as effective as it should be.
We should change it to something which actually removes the object from suspected object set.
IIRC peterv had a patch to do that or something similar.
Comment 7 Andrew McCreight [:mccr8] 2011-11-14 07:03:25 PST
That is a little tricky, because the CC uses a breadth-first traversal.  A node gets added to the graph, then only later gets traversed, so it is already embedded amongst other nodes so it can't be removed easily.
Comment 8 Andrew McCreight [:mccr8] 2011-11-14 08:35:57 PST
(In reply to Andrew McCreight [:mccr8] from comment #0)
>   // Always need to traverse script objects, so do that before we check
>   // if we're uncollectable.
>   NS_IMPL_CYCLE_COLLECTION_TRAVERSE_SCRIPT_OBJECTS

Peter and I concluded that this comment is still accurate.  If we don't trace through every XPCOM->JS edge, then it is possible that the CC will see only non-rooted path through JS to an object x (and thus conclude that x is garbage), when there is a rooted path that was skipped.
Comment 9 Andrew McCreight [:mccr8] 2011-11-14 15:27:13 PST
Created attachment 574452 [details]
one giant XUL-y blob

I found some new layout settings online that is actually able to render these giant graphs in a somewhat reasonable way.  The green nodes are XUL elements, pink is nsXULProtoTypeNode.  The orange ones are nsDocuments that where their children aren't being clipped by the nsCCUncollectableMarker optimization.  There's a lot of XBL in there, too.

This graph is a little clipped off.  I removed all JS nodes, nsJSEventListeners, and nodes with no children, to try to get to the root cause of the giant blobs.

Smaug, do you see anything here that looks like we shouldn't traverse it at every CC?
Comment 10 Andrew McCreight [:mccr8] 2011-11-14 17:23:25 PST
Another thing I'll try looking at is objects that show up in every CC until shutdown.  This is kind of a crude measure, but it may indicate missing opportunities for cutting things from the CC graph.  When I ran it on a series of 9 edge logs I have, almost 8000 nodes were seen in every graph.  To try to avoid the case where an object is freed, then another is reallocated into the same spot between CCs, I also check the label of the object.
Comment 11 Andrew McCreight [:mccr8] 2012-02-16 12:03:09 PST
Smaug's CC patches have gotten rid of the worst of the mega blobs in common usage, and I think we've moved from investigating these to fixing concrete examples of it, so I'm going to mark this as fixed.

Note You need to log in before you can comment on or make changes to this bug.