Closed Bug 641243 Opened 14 years ago Closed 11 years ago

cycle collector model heap includes many acyclic nodes

Categories

(Core :: XPCOM, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mccr8, Unassigned)

References

Details

Attachments

(4 files)

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0 Build Identifier: The cycle collector only needs to look at nodes that are part of a cycle. Right now, though, a huge percentage of visited nodes are members of tiny acyclic subgraphs. Preventing these nodes from being added to the model heap the CC traverses will improve the speed and memory use of the cycle collector. My plan is to come up with an algorithm that will avoid these nodes, implement it in Python to try it out on the real CC heap data I have extracted, then implement it in the CC itself. Reproducible: Always
Blocks: 638299
acyc(0) is the set of all nodes that don't have any children acyc(k+1) is the set of all nodes whose children are in acyc(k) Note that acyc(k) implies acyc(k+1). This chart shows the size of the acyc(k) sets on a real cycle collector heap snapshot. The x axis is k, and the y axis is the size of the set. The rightmost bar is the number of nodes in the entire graph. The idea is that if we have some kind of DFS lookhead of k steps, we should be able to prevent the cycle collector's inclusion of the objects in set acyc(k). acyc(0) includes 26% of nodes acyc(1) includes 53% of nodes acyc(2) includes 61% of nodes acyc(3) includes 70% of nodes acyc(4) includes 73% of nodes The payoff falls off rapidly after that. Potentially, we should be able to look ahead a couple of steps and eliminate 50-70% of these nodes from further consideration. It is possible that a lot of these little acyclic graphs are never actually participants in cycles, and thus could just be removed, but it seems reasonably likely to me that there are just a lot of classes that are cycle participants only rarely, which requires a dynamic approach.
The basic problem here is that the cycle collector does a BFS, starting from the roots. When a node is enqueued, it is also allocated from the NodePool. At that point, it doesn't really seem possible to free it again. Examining the outgoing edges of that node doesn't happen until far later. One idea I had was to do a bounded DFS from a node that is being visited, to see if all of the connected nodes are acyclic. This will potentially cause extra work for nodes that are part of a cycle, and will make the traversal not a pure BFS any longer, which could make locality worse.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Andrew, do you want to do a quick average distance measurement for nodes in the graph? On both sides C++ and JS I would assume that many neighbors are relatively close to each other, reducing L2 and TLB misses. The Node/Edgelist structures are not really great in general. We could come up with a different representation of the graph that is easier to update (i.e. remove nodes from).
Thanks to some advice from Andreas, I was able to make a small modification to GCGraphBuilder::NoteScriptChild that eliminates almost all marked GC nodes. These never have children, so they don't need to be added to the graph. In one run I did, it eliminated all but 2 of the 4765 marked GC nodes, out of about 70k nodes total. As for those remaining 2, I think the JS runtime pushes marked GC nodes as roots sometimes. It seems to happen very rarely, so it probably isn't a big deal.
While recomputing the acyc numbers, I realized that I wasn't computing them correctly: I was underestimating the size of acyc(0): I was defining it as the set of nodes that are the targets of edges, but not the sources. This excludes all nodes that are neither targets not sources of edges. This should only effect the initial acyc(0), and not change the increases by later iterations. Anyways, with this correction, on my new sample data we get the following numbers: acyc(0) includes 53% of nodes acyc(1) includes 56% of nodes acyc(2) includes 70% of nodes acyc(3) includes 78% of nodes acyc(4) includes 80% of nodes The combination of correcting the acyc(0) case and eliminating a lot of marked JS nodes increased acyc(0) by quite a bit, but seems to have decreased the payoff for looking ahead. The delta for going from 0 to 4 is 27% now, where it was 47% before. And I guess as a % improvement over the 0 case, it is even less. I'm going to try quickly hacking up something for the acyc(0) case and see how things look from there.
Lets file a bug for the patch for comment 4 and make it blocking this bug for tracking purposes.
Depends on: 641910
I write a bit of code for the C++ CC implementation that detects when a node has no outgoing edges. There were some discrepancies with the set of acyc(0) nodes generated by my Python script, but I figured out that this was a combination of the self-loop removal in the Python code and (more importantly) my edge counting code didn't have the same pruning logic that the "real" child visiting code has. Adding that pruning to my counter made the Python and C++ numbers match up. I've decided that I shouldn't really prune self-edges in my Python graph. They can't really be removed lightly, so I should leave them in for now. Removing self-loops doesn't affect the acyc(k) numbers very much. I think these loops are just a special case of tiny loops (as present in the death stars I've noticed earlier). I've looked at some more CC data, and not too surprisingly, if the CC actually is collecting a lot of things (eg there are a lot of cycles), the acyc(k) numbers get a lot lower. In this one case, the CC collects 480 nodes and the graph has acyc(k) numbers of 43/48/69/75/77%. In the next CC, 50007 nodes are collected, and the acyc(k) numbers drop to 14/16/20/21/22%. The latter collection also visits far more nodes, so in absolute terms the acyc(k) value is similar (around 7-10k nodes).
I figured out a way to store green (eg acyclic) nodes in the existing hash map, so I've been able to implement the acyc(0) case. It seems to work, in that the browser doesn't crash and can load some pages (in contrast to my earlier attempt). The code does an extra traverse for every object. I'm going to push ahead and implement a more complete lookahead code (first in Python) now that I have figured out how to save acyc data. I think I can do it without an extra traverse.
This is pseudo code for an algorithm to build the model graph using a DFS. While building the graph, it eliminates acyclic nodes. Contrary to what I said earlier, I read today that DFS produces better locality than BFS. I'll have to get more details. It also does what I am calling "dechaining" here: if you have a ref counted node n with a RC of 1, you can delete it from the graph, if you make n's children into n's parent's children. Basically, if you have nodes a -> b -> c -> d, you can represent that as a -> d in the model graph. Any cycles that b is a member of must also contain a. If a does an unlink, then b will also be unlinked. I implemented this in Python, and it seems to work reasonable well. It eliminates a reasonable number more nodes that acyclic pruning alone. I'm not 100% sure the algorithm is sound, so I'll have to think about it some more. I think it can mostly be implemented in the existing data structures: the mapping of nodes to values can probably be mashed into the existing hash table, and with an additional back link per block the edge allocator can probably be used as a stack. The main problem is that ScanRoots has to iterate over all of the purple nodes, and with a DFS they won't all be lined up nicely in a row. I can think of a few solutions to that, but they all have some drawbacks.
(In reply to comment #9) > If a does an unlink, then b will also be > unlinked. Doesn't that rely on perfect unlinking (which we don't have)?
Oh, right, I forgot about that. I guess I'll have to stick to pruning acyclic nodes.
Well, we could try to make unlinking perfect but it'll take some time.
I assume imperfect linking is on a per-class basis? Where basically a class includes some field in Note*Child but doesn't Unlink it? A class could report in Note*Child whether the child has imperfect linking, if that is rare, and knowing that a child is perfect would allow a lot of optimization. Though that would introduce more opportunities for errors in the interface, which is bad.
This is a quick hack to stop adding DOM objects that we know we won't traverse. andrew, any chance you could check if this significantly reduces the number of objects with no children that end up in the graph? For certain DOM objects (nodes) we implemented a quick check to see if they were connected to a known "live" object (a document being displayed or kept alive in history). We won't traverse those objects if we know that they are "live", but they are still added to the graph. With the patch we shouldn't even be adding them to the graph. I haven't tested the patch very well, but the browser does start up. The change is a bit risky, it could cause weird crashes that I missed.
I haven't gotten anything set up to try to reliably reproduce output enough to compare the results between different runs, though it should be good enough for a rough comparison for just starting up and shutting down. I could look at the classes of objects without any children and see how many of them are from the classes that you modified (nsDOMAttribute etc.). I don't know how accurate that will be. Separately, if we can't rely on Note*Child children being Unlinked, is it space-safe to not include acyclic data in the graph? If you have acyclic data hanging off of a garbage cycle, it sounds like you can't guarantee that the acyclic data will be killed when the cycle is.
Looking at a few random CCs I had data sitting around for, the classes the patches changed from NS_INTERFACE_MAP_ENTRIES_CYCLE_COLLECTION to NS_NODE_INTERFACE_MAP_ENTRIES_CYCLE_COLLECTION account for 78% to 88% of XPCOM nodes with no children. Almost all of this is nsGenericElement, but nsGenericDOMDataNode also accounts for a decent number of them as well. I don't know how many instances of these classes would actually be removed by the change, but it does look promising.
Inspired by peterv's patch, I made a little script to look at the % of the instances of each class that is acyclic at a CC. In addition to the classes he modified already, nsNodeInfo, nsXBLInsertionPointEntry and nsXULPrototypeNode show up as being frequently 98% or more acyclic in a couple of samples I looked at. It might be worth looking at classes like that (or ones like nsXPCWrappedJS which were very rarely acyclic) that show up at the extremes in case there's anything interesting there.
I applied peterv's patch from comment 14, and looked at how nodes there are of the affected classes (nsDocument, nsGenericDOMDataNode, nsGenericElement, nsDOMAttribute) and what percentage are acyc(0) (have no children). The variance is really high from CC to CC, so it is hard to make any real conclusions, but after the patch there are still quite a few acyclic nodes of those classes. For instance, from one CC run post-patch: 67% nsDocument (out of 73) 55% nsGenericElement (out of 3780) 74% nsGenericDOMDataNode (out of 812) 0% nsDOMAttribute (out of 50) The numbers look like they may be better overall, based on a tiny sample, but the number of nodes can vary from 3000-6000 within a single run, so it is hard to tell. Thus far, I've been able to tell how many things are pruned by explicitly logging it when some are found. Perhaps I should go back to looking at getting repeatability.
We could count how many nodes we avoid adding to the graph, but we'd need some way to remember whether we've counted a node before to avoid multiple-counting just because we try to add it multiple times. I wonder why these nodes with no children end up in the graph, that would be good to find out. Maybe they were already unlinked but that didn't destroy them because of incomplete unlinking? It might also be interesting to know how many objects get added to the graph that have already been unlinked. I'm sure we do have some, but no idea how many.
If the log of avoided-nodes includes information about when the CCs are happening, then the log scraping script can just credit that node once per CC. Alternatively, the fact that we could have skipped the node could be logged, but then don't actually skip it, which will also limit it to once per CC. These lone nodes don't have any interesting follow-on effects. I'll try to investigate where the lone nodes come from. It would be nice if we could avoid them even before we get to the CC.
This patches peterv's patch so that instead of skipping a node when it detects that it won't be part of a cycle, it instead prints out a log message, then behaves as normal. I think that this will cause the node to get purpled, so it can't happen again until the node gets its ref count incremented. Still, a node can get purpled and unpurpled many times in between CCs, so this doesn't translate exactly into the number of nodes that are saved. I can probably log which node is being skipped to figure out exactly how many nodes it will save per CC. I'm guessing aParticipant is the value that is going to be suspected? Anyways, when I ran this, I got the nsIDocument message about 19k times in the log, and the nsINode one 96k times. This is out of about 1.3 million suspected nodes, and 115k visited nodes. I poked around in the code to bit to try to think of how you might figure out if imperfect unlinking is causing childless nodes to be suspected, but I couldn't figure out how to relate the parent to the child.
(In reply to comment #21) > I'm guessing aParticipant is the value that is going to be suspected? No, that's the helper object that we use to traverse, I guess it's not named very well. aNode/aDocument is the object that we'll suspect. > Anyways, when I ran this, I got the nsIDocument message about 19k times in the > log, and the nsINode one 96k times. This is out of about 1.3 million suspected > nodes, and 115k visited nodes. 1.3 million unique objects, or 1.3 million calls to Suspect? > I poked around in the code to bit to try to think of how you might figure out > if imperfect unlinking is causing childless nodes to be suspected, but I > couldn't figure out how to relate the parent to the child. You could log the unlinking (in nsCycleCollector::CollectWhite?) and then correlate with subsequent suspecting of the same object?
(In reply to comment #22) > > Anyways, when I ran this, I got the nsIDocument message about 19k times in the > > log, and the nsINode one 96k times. This is out of about 1.3 million suspected > > nodes, and 115k visited nodes. > > 1.3 million unique objects, or 1.3 million calls to Suspect? That's 1.3 million calls to Suspect. So a single object can get Suspected multiple times in between CCs. But I think this number is probably most comparable to the number of times the log message I added is produced, because the main limit on an object being logged multiple times is purpleness. > You could log the unlinking (in nsCycleCollector::CollectWhite?) and then > correlate with subsequent suspecting of the same object? Okay, that sounds like a good idea, I'll try it out. I could probably even detect imperfect unlinking in CollectWhite using a Traverse.
I rigged up some code to detect incomplete unlinking. I do a Traverse after the Unlink in CollectWhite, and I assume that any children left over must be something that wasn't perfectly unlinked (regular unlinking sets the field to NULL after unlinking). This required hacking up nsXPConnect::Traverse and NS_CYCLE_COLLECTION_CLASSNAME(XPCWrappedNative)::Traverse, because those crash if you try to Traverse them after FinishTraverse, not too surprisingly. Here's the number of instances of each class that was an orphaned child, in each CC (none in the 2nd or 3rd CC): 1) nsXPCWrappedJS:1, nsNodeInfoManager:2, nsXULPrototypeNode:35, nsNodeInfo:18 4) nsXPCWrappedJS:1, nsNodeInfoManager:3, nsXULPrototypeNode:10, nsNodeInfo:50, nsTextEditorState:1, nsEventListenerManager:1 5) nsNodeInfoManager:21, nsNodeInfo:50 Not a huge number of them, if my methodology is correct. There are a large number of them in the 6th CC (around 20k), but I think that is one of the cleanup CCs, and those aren't tracked by the graph dumper. I'm going to try to figure out - what are the classes of the imperfect suspecters? - what do these left behind objects look like? Are they childless?
A total of 100 of these objects (out of 193 total objects) have no parents in the graph besides ones that imperfectly unlinked them. 41 of these are not collected by the CC... but it looks like those almost all have a ton of external references anyways, so they wouldn't be up for deletion anyways.
Here are the classes involved with imperfect unlinking from one CC: 39 * nsGenericElement --(mPrototype)--> nsXULPrototypeNode 3 * nsNodeInfo --(mOwnerManager)--> nsNodeInfoManager 2 * nsGenericDOMDataNode --(mNodeInfo)--> nsNodeInfo 43 * nsGenericElement --(mNodeInfo)--> nsNodeInfo The number is the number of occurrences of the combo, the first class is the parent, next is the name of the edge, then finally is the child class. I tried looking at the Traverse and Unlink code, and I could confirm that nsGenericDOMDataNode does Traverse to the mNodeInfo field, but doesn't Unlink it. There's no explanation for that omission in the code. (Though in nsDocument the Unlink function says that most of the fields are not explicitly Unlinked because the destructor is complicated.) There are some more combinations in the following CCs if that is useful at all.
One minor nitpick: NS_IMPL_CYCLE_COLLECTION_TRAVERSE_BEGIN_INHERITED mechanism for extending the Traversal of a parent class (which I don't entirely understand yet) doesn't update the invocation of DescribeNode with the new subclass name. I think this also means that incorrect data about the size of the object could be fed to DescribeNode if the child is larger. However, the size data isn't actually used anywhere (it is saved into mCurrPi->mBytes, but it looks like nothing looks at it), so I guess it doesn't matter either.
Depends on: 653019
The graph is much tinier nowadays.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: