Closed Bug 653191 Opened 13 years ago Closed 11 years ago

merge JS objects in strongly connected components when building cycle collector graph


(Core :: XPCOM, defect)






(Reporter: mccr8, Assigned: mccr8)


(Blocks 1 open bug)


(Whiteboard: [Snappy:P2])


(9 files, 2 obsolete files)

It is common for more than half of a cycle collector graph to be JS objects.  This is unfortunate because the CC is only interested in JS objects to the extent that they are connected to DOM objects, and it will never free any JS objects.  The JS heap is also very loopy (though it will be somewhat less so after the removal of the parent link), which will impede the effectiveness of the dynamic detection of acyclic objects.

Fortunately, I believe that strongly connected components including only JS nodes can be safely merged into a single node in the CC graph.  This will not improve the initial building of the graph, but will reduce memory usage and speed up subsequent traversals.  I have done some experiments where a simple version of this optimization (merge two JS nodes if they point to each other) was able to eliminate about half of the nodes in the CC graph.  If we also prune out self-loops on JS objects (which will be created by this algorithm), this should also expose further opportunities for pruning acyclic objects.

We cannot easily eliminate DOM nodes from the cycle collector graph due to imperfect unlinking.  (Though possibly we could maintain a separate list of dependent DOM nodes for nodes in the CC graph, that must be freed if the corresponding node in the CC graph is freed.)

In terms of implementation, algorithms for finding SCCs work by performing a DFS of the graph.  Eyeballing the Cheriyan–Mehlhorn/Gabow algorithm, it seems like we should be able to roll this into the construction of the CC graph, if it is transformed from a BFS to a DFS.  It will require some modification to compute SCCs that only contain JS objects, but hopefully that won't be too bad.
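The path-based approach can be sketched roughly like this. This is a minimal Python sketch of the Cheriyan–Mehlhorn/Gabow two-stack algorithm, not the CC implementation; `graph` is assumed to be a dict mapping each node to a list of its children, and the JS-only restriction is omitted.

```python
# Minimal sketch of the Cheriyan-Mehlhorn/Gabow path-based SCC
# algorithm. Two stacks: S holds all nodes not yet assigned to an
# SCC, P holds the nodes whose SCC membership is still open.
def gabow_sccs(graph):
    index = {}          # preorder number of each visited node
    stack_s = []        # nodes not yet assigned to an SCC
    stack_p = []        # boundaries of still-open SCCs
    assigned = set()
    counter = [0]
    sccs = []

    def dfs(v):
        index[v] = counter[0]; counter[0] += 1
        stack_s.append(v)
        stack_p.append(v)
        for w in graph.get(v, []):
            if w not in index:
                dfs(w)
            elif w not in assigned:
                # w is still on S: contract the path back to w
                while index[stack_p[-1]] > index[w]:
                    stack_p.pop()
        if stack_p and stack_p[-1] == v:
            # v is the root of an SCC; pop its members off S
            stack_p.pop()
            scc = []
            while True:
                w = stack_s.pop()
                scc.append(w)
                assigned.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            dfs(v)
    return sccs
```

Note that each SCC is completed (popped off S) only after everything reachable from it has been visited, which is what makes it plausible to fold this into a DFS-based graph build.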

The basic merging algorithm works like this: if nodes A and B are both in a JS-only SCC, merge B into A: all edges that pointed to B, now point to A.  All edges that started at B now start from A.  If B is marked, then mark A.  Delete B and all of its edges from the graph.  (The part about marking is something I didn't think about until I tried to prove that the merging is sound.)  Performing this repeatedly will collapse an entire JS-only SCC into a single node.
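The merging step above can be written as a toy sketch, assuming the graph is a dict of node to set of children and `marked` is the set of marked JS nodes; the names here are illustrative, not from any patch.

```python
# Toy sketch of merging node b into node a, per the rules above.
def merge_nodes(graph, marked, a, b):
    # All edges that started at B now start at A.
    graph[a] |= graph[b]
    del graph[b]
    # All edges that pointed to B now point to A.
    for children in graph.values():
        if b in children:
            children.discard(b)
            children.add(a)
    # If B is marked, mark A (needed for soundness).
    if b in marked:
        marked.discard(b)
        marked.add(a)
```

Edges between A and B become self-loops on A, which is why pruning self-loops afterwards is worthwhile.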
Depends on: 653019
Blocks: 638299
"Theorem: Carrying out the CC algorithm on the transformed graph
produces the same set of garbage nodes as carrying out the CC
algorithm on the transformed graph, minus B."
That should be "as carrying out the CC algorithm on the original graph".
I rigged up my Python script to use the Cheriyan–Mehlhorn/Gabow algorithm to find SCCs, merging together JS nodes in an SCC in a way that may miss things if there are DOM nodes also in the SCC.  Even with that slightly crude approach, it gets rid of about 75% of the total nodes in the CC graph.  That's more than 90% of the JS nodes in the graph.  I should model the effect of eliminating the parent edge on this pruning...
Improving the algorithm so that it always merges JS nodes in an SCC eliminates only a tiny number of additional nodes (<1%).

To figure out why, I did some analysis of the strongly connected components the algorithm finds in the CC graph.  The data is in the following format: number of SCCs / total number of nodes / description

570 / 120747 / mix of ref counted and JS
141 / 1274 / pure ref counted
302 / 2278 / pure JS
168 / 1357 / one or more garbage objects

As you can see, most of the SCCs are mixed, which doesn't really square with my finding that a heuristic grabs most of the loops.  I did a further analysis of the mixed SCCs, and it turns out that almost all of the large SCCs (with 10 or more nodes) are 99% JS nodes, so most of the time you'll end up merging JS nodes.

The numbers look a little funny to me, though.  There are more than 10 SCCs that have exactly 788 nodes in them.  Why that many?  I'm not sure if my SCC implementation is bugged, or this is some weird artifact of the way things are set up in FF.
It looks like condensing only the GCed nodes in an SCC will be difficult to do efficiently, as you can't do it in a single pass.  If the SCC stack looks like GRG, where G is a GCed object and R is an RCed object, then you have to start allocating the mega GC node, then you hit R, but the GC node isn't done yet, so you have to come back to R later.  This is especially bad because these SCCs are 99% JS, so your pass that allocates RCed objects will spend most of its time scanning over nothing.

Alternatively, I think we can condense all nodes together, but then add little stub nodes to the condensed nodes to hold the DOM nodes, which must be kept around because of imperfect unlinking.  How this would work is that all of the nodes are condensed into a single node, taking up all edges in the SCC, globbing together the ref counts appropriately.  Additionally, for each RC node in the SCC allocate a new node with no children.  The condensed node points to all of these new nodes.  If the cycle collector marks the condensed node white, it will also mark all of these nodes white, so that should be okay.  A later possible optimization would be to store pointers to these stub nodes in the condensed node itself.
I rigged up an implementation of this in Python that does condensing of SCCs that is aware of the types of the nodes in the SCC.

- All GC nodes in an SCC are merged into a single node.  If any GC node in the SCC is a root, the merged node is a root.
- RC nodes in an SCC are merged into a second node, which is doubly linked with the GC merge node.  This allows us to count up the number of self-edges in the SCC that point to RC nodes rather than GC nodes, for the purpose of updating the ref count for the node.
- For each RC node beyond the first, we create a dummy node hanging off of the RC node, with no children.  This is to deal with imperfect unlinking.  No nodes aside from the RC master node will point to these nodes.  (This turns death stars into plain old stars.)

All edges to a GC node in the SCC will end up pointing at the GC merge node, and all edges to a RC node in the SCC will end up pointing at the RC merge node.
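The resulting topology for a mixed SCC can be sketched like this. Everything here is illustrative (the names `gc_merge`, `rc_merge`, `dummyN` are hypothetical), and absorbing the SCC's external edges into the merge nodes is omitted.

```python
# Sketch of the node layout after condensing a mixed SCC, following
# the three rules above. Returns the stub topology plus a forwarding
# table sending every original node to its merge node.
def condense_mixed_scc(gc_nodes, rc_nodes):
    gc_merge, rc_merge = 'gc_merge', 'rc_merge'
    # The GC merge node and RC merge node are doubly linked.
    edges = {gc_merge: [rc_merge], rc_merge: [gc_merge]}
    forward = {n: gc_merge for n in gc_nodes}
    forward.update({n: rc_merge for n in rc_nodes})
    # One childless dummy per RC node beyond the first, to model
    # imperfect unlinking; only the RC merge node points at them.
    for i in range(1, len(rc_nodes)):
        d = 'dummy%d' % i
        edges[rc_merge].append(d)
        edges[d] = []
    return edges, forward
```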

I've tried it on a few graphs and it seems to get rid of a lot of nodes.  However, these are WANT_ALL_TRACES graphs, and thus much heavier on JS than usual graphs.  The entire heap (aside from death stars) seems to be merged into a single connected blob by this.  I'll have to generate some edge files without WANT_ALL_TRACES.

One factor I didn't anticipate is the massive growth in the number of multiple edges.  I do clear out all self-edges that are created from condensing SCCs, but if you have one SCC that has a bunch of edges to another SCC, then when you condense both you get a ton of edges from the condensed node to the other.  This makes the heap graph look like the planet Saturn's family reunion.  In my example heaps, I saw up to 3300 or so duplicated edges between two nodes!  A more typical number is 10, but the overall increase in duplicated edges is still pretty severe.  In previous experiments on unaltered graphs, only a couple percent of edges were duplicated, but after condensation around 10% of edges are duplicated.

These edges aren't making the graph worse than it was before the condensation, but it would be nice if we could eliminate some of them.  Though I'll have to reassess how bad things are without WANT_ALL_TRACES.
Removing parent edges has a fairly dramatic effect on JS-heavy graphs (without WANT_ALL_TRACES, so matching the amount of JS the CC would actually see).  With parent edges, SCC condensation removes 65% of the nodes in the graph.  Without parent edges, it only removes 11%.  My hope is still that the reduced effectiveness of SCC optimizations will be counteracted by increased effectiveness of acyclic optimizations.
I wrote a little function to determine how many acyclic GCed nodes there are in the graph.  That is to say, JS nodes whose only children are acyclic JS nodes.  These can be safely pruned from the graph because we never remove JS nodes in the CC.  Acyclic DOM nodes cannot be safely removed due to imperfect unlinking.
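That kind of counting function can be sketched as a fixed-point computation (a rough stand-in for the script, not its actual code; `graph` maps node to children and `is_js` flags JS nodes):

```python
# Count acyclic JS nodes: a JS node is acyclic if all of its children
# are already-acyclic JS nodes (so childless JS nodes qualify first,
# then their parents, and so on). Iterate to a fixed point.
def count_acyclic_js(graph, is_js):
    acyclic = set()
    changed = True
    while changed:
        changed = False
        for node, children in graph.items():
            if node in acyclic or not is_js(node):
                continue
            if all(c in acyclic for c in children):
                acyclic.add(node)
                changed = True
    return len(acyclic)
```

A node on a cycle (even a self-loop) never becomes acyclic, which matches the definition above.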

I did a little study of the interaction of acyclicity and SCC condensation in graphs with and without parent nodes.  This graph has 46k nodes.  First, I got the count of acyclic nodes (without removing any), then I ran SCC condensation to see how many that removed, then I ran the acyclic counter again on the condensed graph.

With parent edges:
1149 (#acyclic) -> 28944 (#condensed) -> 1149 (#acyclic)

As you can see, there are not many acyclic nodes, either before or after condensation.  The SCC condensation captures over 60% of nodes.  The total number of nodes that would be removed by condensation followed by acyclic pruning is 30093.

Look at what happens when the parent edges are removed:
1620 (#acyc) -> 5142 (#condensed) -> 24643 (#acyc)

Even after removing the parent edges, the acyclic pruner is still fairly pathetic.  It removes 50% more nodes than before, but that is still only about 3.5% of nodes.  Removing the parent edge apparently ruined the condenser, which now only eliminates about 20% of the nodes it removed before.  But, the condenser now opens up huge opportunities for acyclic pruning, so the total of condensing plus acyclic pruning is 29785, which is almost as many nodes removed as with the parent edge. (60%+ of nodes)

Thus while the condenser alone is fairly worthless without the parent edge, with acyclic pruning it seems to be worthwhile.

I was hoping that in a parent-edge-free world pruning acyclic nodes present in the pre-condensed graph would get most of the work done, but it does not appear so.  Now I just have to figure out how to incorporate acyclic pruning into the SCC algorithm.
I just realized a pretty cute property of the algorithm.  At the point in the algorithm that it finds an SCC, the only committed edges in the graph that point to any member of the SCC are in the SCC itself.  Thus the internal count we discover when condensing the SCC comes entirely from the SCC itself.  Thus, if at the time we build the SCC the internal count is equal to the ref count, the entire SCC is garbage (assuming the SCC does not include a marked JS object).  This is especially cute because it will catch garbage death stars before we leave the first phase.
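A toy version of that check might look like this, under the observation above that every committed edge into a just-completed SCC comes from inside the SCC itself. The names and data layout are illustrative only.

```python
# "Dead on arrival" check for a just-completed SCC: if no JS node in
# it is marked, and each RC node's internal edge count equals its
# refcount, the whole SCC is garbage. RC nodes are those with an
# entry in `refcount`.
def scc_is_doa(scc, graph, refcount, marked):
    members = set(scc)
    if any(n in marked for n in members):
        return False
    for n in members:
        if n in refcount:  # RC node
            internal = sum(1 for m in members
                             for c in graph[m] if c == n)
            if internal != refcount[n]:
                return False
    return True
```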

Unfortunately, the current algorithm I have actually commits nodes and edges during the calculation of the ref count, but if the edge and node pool data structures were made slightly more sophisticated, by turning them into buffered stacks where we could get rid of the last X allocations on them, then we could peel off the entire last SCC.  To deal with imperfect unlinking, we'd have to push all the DOM nodes in the SCC onto a stack of white nodes to be unlinked later.  (Or perhaps they could be unlinked right away.)

There are not that many of these "dead on arrival" SCCs in one graph I looked at (only 69), but it is something to keep in mind in the future.
About 3-6% of nodes in the shrunken graph are in DoA SCCs, though that may include some double counting of acyclic nodes.  Not a huge percent, but they might be worth pruning if it can be done cheaply.
I figured out a way to modify the algorithm to eliminate acyclic GCed nodes while condensing SCCs.

In a mixed SCC (containing both RC and GC nodes), we need to forward the nodes before we forward the edges, to ensure that we can properly count up the number of internal edges that point to RC nodes, to update the internal count.

In a pure GC SCC, we have to traverse the edges to decide if the condensed SCC will be acyclic or not (the SCC is acyclic iff all edges are either self edges or point to acyclic nodes).  If the SCC is acyclic, we forward all of its nodes to the special acyclic value.  If it isn't, we forward them to a newly allocated node.
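The pure GC case can be sketched as follows (a hypothetical illustration with an `ACYCLIC` sentinel and a `forward` table; not the patch's actual representation):

```python
# Hypothetical sentinel meaning "forwarded to the acyclic value".
ACYCLIC = object()

# Condense a pure GC SCC: it is acyclic iff every outgoing edge is a
# self edge (stays inside the SCC) or points to an acyclic node. If
# so, forward all members to ACYCLIC; otherwise to a fresh merge node.
def condense_pure_gc_scc(scc, graph, forward):
    members = set(scc)
    acyclic = all(c in members or forward.get(c) is ACYCLIC
                  for n in scc for c in graph[n])
    target = ACYCLIC if acyclic else ('merged:%s' % scc[0])
    for n in scc:
        forward[n] = target
    return target
```

Because SCCs are completed in reverse topological order, every external child has already been forwarded by the time its parent SCC is condensed, so the single check above suffices.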

Since the behavior depends on the type of the SCC, we want to determine what kind of SCC it is.  This could be easily done by scanning the nodes, but this would require two passes over the nodes in the case of pure GC SCCs, which is bad.  Instead, we track the location of the last RC node in the SCC stack, and the location of each root in the SCC stack.  This requires a little extra space per node, but by comparing these two locations, we can tell if an RC node falls within the current SCC.
I still think this is pretty cool, and graph compression techniques are useful for visualization (I've been trying to make some sense out of some 700,000 node graphs recently), but given that building the graph takes most of the time in cycle collection, doing this doesn't seem to make any sense.
Closed: 12 years ago
Resolution: --- → WONTFIX
Assignee: nobody → continuation
Resolution: WONTFIX → ---
Attachment #612039 - Attachment description: doesn't crash → compute strongly connected components of JS objects, WIP
Attached patch merge together JS SCCs, WIP (obsolete) — Splinter Review
This patch computes SCCs of JS objects, and uses that information to condense SCCs in the CC graph.  Seems to work okay, but it has some quadratic behavior I need to get rid of before it can be useful, or before I can tell if there are any speed improvements.  I should also figure out how to prune out JS nodes that don't reach C++.  It shouldn't be that hard to do while computing the SCCs.
Attachment #612039 - Attachment is obsolete: true
Attached file example graph
Here's what the compressed graph looks like.  The description of all JS nodes is "GLOBBED NODE n", where n is the number of nodes in the SCC.  You'll note that by far the most common case is 1.  If you look at the center of the star shape on the right in the middle, there's an SCC with 53 nodes.  With a larger CC graph, hopefully it would get more.  I need to check an unmodified graph to make sure that it is actually gathering up everything it should.  The nsIBlockListService implementation looks a little suspect to me.
Whiteboard: [Snappy]
P2 because maybe this will help speed up CCs with lots of JS.
Whiteboard: [Snappy] → [Snappy:P2]
These patches could be simplified, as I introduce things then remove them later.
Attachment #612426 - Attachment is obsolete: true
Given a lack of any speedup, I'm going to put this on hold for the moment in favor of bug 749784.  I think that all the CC<->XPC<->JS goop contributes a lot to the total time, so this bug won't really help.
I'm getting some assertion failures that make it look like that sometimes an nsGlobalWindow holds onto a freed object, which seems bad.  I'd think that nsGlobalWindow::OnFinalize would clear that out when the object gets freed, but maybe not...
That last comment was actually meant for bug 754495.

Anyways, this didn't demonstrate any speedup, and bug 754495 helps with page close CC times (which are when CC pauses are worst) and is much less complex, so I'll close this again.  Bug 749784 would still be nice to do, but it is intimidating.
Closed: 12 years ago → 11 years ago
Resolution: --- → WONTFIX
Blocks: 741417