Closed Bug 701878 Opened 8 years ago Closed 8 years ago

investigate "soft incremental" cycle collection via cap on graph size

Categories

(Core :: XPCOM, defect)

defect
Not set

Tracking

()

RESOLVED WONTFIX

People

(Reporter: mccr8, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [Snappy:P3])

Attachments

(1 file)

Currently, the cycle collector adds all the roots, then traces out all nodes reachable from those nodes.  It should be possible to change this to: add one root, trace out all nodes reachable from that root, then move onto the next root.  I think that given this setup, it is okay to stop building the graph at the point where we are moving onto the next root, and finish out the rest of the cycle collection.

Thus, after we finish tracing out from a particular node, we can see if we've hit some limit, either in terms of how long we've spent or as a cruder measure, how many nodes we've added to the graph.  For simplicity, assume we use graph size.  This would bound the maximum size of the graph to roughly be the limit on the size of the CC graph plus the size of the largest number of nodes you can reach from a single node in the graph.  This would happen if you are almost at the limit, and then the next node you touch pulls in a huge amount of stuff that hasn't been visited yet.  In a simple browsing session with two tabs, on shutdown, this max blob size was about 11000. [1]

I'm not entirely sure if this is sound, but it seems plausible to me.

This is a tradeoff of throughput for responsiveness, as doing more nodes at once reduces the chance you'll examine a node twice.

Instead of doing UnmarkPurple only in AddPurpleRoot, we'd have to check if a node is purple any time we add a graph to the node, and UnmarkPurple if it is.

One tricky part of this would be incrementalize traversal of XPConnect roots.  We have to make sure that every root is eventually visited, or there will be leaks.  The XPConnect roots are stored in hashtables and things, that can have things added or removed from them in between CCs.  Incrementalizing the purple buffer is probably more straightforward: you can just trawl linearly through the purple buffer.  Dead objects won't be moved, so you'll eventually get to them.

[1] I think independent work may be possible to break up some of these large blobs.  The ones I've looked at have a lot of XUL in them, which looks suspicious.  I'll file a separate bug on that at some point, as eliminating these nodes from the graph would help even with our current situation.
Depends on: 702032
Whiteboard: [Snappy]
Depends on: 653019
This could be interesting combined with an interruptible CC: instead of pushing the entire graph back into the purple buffer, just push back unexplored nodes, then finish out the CC with what you have so far.  That way we might be able to get some forward progress despite being interrupted.

One caveat here is that if we traverse any JS object, we have to traverse all JS holders.

Patrick has done some interesting experiments here by seeing how much you have to pull in when doing DFS that has some surprisingly promosing results.
This is an interesting idea, but it is probably a better use of time to focus on other approaches first.
Whiteboard: [Snappy] → [Snappy:P3]
No longer depends on: 653019
Here's a first crack at changing the traversal order.  You can browse without crashing, but it crashes on shutdown during a cycle collection for some reason.

The basic idea is that every time you add a new root, you traverse everything that is reachable from that root before moving on to the next root.  In theory, once you have that, then after you add all of the XPConnect roots, after every purple root you add, you can see how many things are in the graph.  If there are too many, then you just stop pulling things out of the purple buffer and run the rest of the CC.

(We could also be trickier and not add XPConnect roots until we see another JS object while traversing purple roots, or if we have decided to trace through JS.)
I hacked up that code a bit to make it we stop pulling things out of the purple buffer once we have added at least 2000 nodes to the graph beyond what are in there from the XPConnect roots, then tried out usage pattern suggested by pcwalton: I opened up about a dozen CNN.com tabs, then opened a new blank tab, and selected "close all other tabs" (which I didn't know existed before today).

You then get the following sequence of CCs (ignoring the many GCs that happened in between).  While this particular tuning seem kind of bad (we probably spend most of our time in the CC traversing XPConnect roots, and because we're GCing, that's probably hurting Olli's CC optimizations), you can kind of see the general idea here.  We slowly drain the purple buffer, collecting about a thousand nodes at a time, until for some reason near the end we start freeing up huge chunks.  The fact that the pages I opened are all from the same domain probably does not help.  I should try MemBuster.

CC(T+40.5) collected: 1667 (1667 waiting for GC), suspected: 11834, duration: 48 ms.
CC(T+49.6) collected: 8086 (8086 waiting for GC), suspected: 9089, duration: 34 ms.
CC(T+60.2) collected: 1557 (1557 waiting for GC), suspected: 8558, duration: 101 ms.
CC(T+70.6) collected: 649 (649 waiting for GC), suspected: 8210, duration: 111 ms.
CC(T+81.2) collected: 836 (836 waiting for GC), suspected: 7900, duration: 109 ms.
CC(T+91.4) collected: 1387 (1387 waiting for GC), suspected: 7461, duration: 136 ms.
CC(T+101.6) collected: 818 (818 waiting for GC), suspected: 6854, duration: 142 ms.
CC(T+111.7) collected: 1177 (1177 waiting for GC), suspected: 6306, duration: 136 ms.
CC(T+121.9) collected: 1445 (1445 waiting for GC), suspected: 6287, duration: 164 ms.
CC(T+132.1) collected: 752 (752 waiting for GC), suspected: 5685, duration: 136 ms.
CC(T+142.2) collected: 715 (715 waiting for GC), suspected: 5125, duration: 133 ms.
CC(T+152.4) collected: 982 (982 waiting for GC), suspected: 4741, duration: 137 ms.
CC(T+162.5) collected: 1180 (1180 waiting for GC), suspected: 4217, duration: 150 ms.
CC(T+172.7) collected: 907 (907 waiting for GC), suspected: 3695, duration: 139 ms.
CC(T+182.9) collected: 3304 (3304 waiting for GC), suspected: 3237, duration: 173 ms.
CC(T+193.0) collected: 5281 (5281 waiting for GC), suspected: 2663, duration: 141 ms.
CC(T+203.1) collected: 1460 (1460 waiting for GC), suspected: 2179, duration: 138 ms.
CC(T+213.4) collected: 37900 (37900 waiting for GC), suspected: 1629, duration: 204 ms.
CC(T+223.5) collected: 39444 (39444 waiting for GC), suspected: 950, duration: 173 ms.
MemBuster is actually much worse, as it turns out.  You get a ton of CCs like this:

CC(T+47654.1) collected: 0 (206 waiting for GC), suspected: 38443, duration: 636 ms.
ForgetSkippable 1 times before CC, min: 15 ms, max: 15 ms, avg: 15 ms, total: 15 ms, removed: 34

Sometimes it collects < 100 nodes.  I'm not sure why there's so much junk in the purple buffer.  Perhaps combined with my patches for removing childless nodes it would be less terrible.
(In reply to Andrew McCreight [:mccr8] from comment #5)
> MemBuster is actually much worse, as it turns out.  You get a ton of CCs
> like this:
How does that compare to the situation right now? Are the CCs the same length (or slightly longer) but far fewer in number?
What would happen now is you'll get a single long CC.  I'm not sure how bad it would be for MemBuster.  With a ton of nodes, a CC can take anywhere from 1 second to 15 seconds.

The basic problem with the current setup in this patch is that we're not making enough forward progress at each CC.  One thing I realized is that we should try to unpurple nodes we encounter as children.  That could help.
I tried just now with 10 CNN tabs, which is similar to the sequence above, though maybe I had a few more tabs there.  This is, more or less, our "normal" behavior:

CC(T+87.8) duration: 461ms, suspected: 3356, visited: 56377 RCed and 55513 GCed, collected: 54467 RCed and 53892 GCed (108359 waiting for GC)
ForgetSkippable 15 times before CC, min: 0 ms, max: 11 ms, avg: 2 ms, total: 34 ms, removed: 29008

This cleaned up everything in a single chunk, at the cost of a half-second pause.  Compared to my "incremental" CC, where the longest pause time was 200ms.  Of course, it ended up spending about 4 times as long total, and the pauses were still kind of crappy.  Hopefully with some tweaking we can reduce the throughput penalty, and bring down the max pause time a bit.  If we could keep the max pause time around 50ms, and only pay a throughput penalty of say 2x, that would be cool.
Smaug, you might find this interesting.  It is a very early prototype...
I tried out using the "root at a time" approach to graph building, then measured the size of the graph after all XPConnect roots had been added, then again after everything has been added.  With a case where we close 7 or so tabs at a time, it looks like this:

cc: init count is 120864, final count is 122604

Almost everything is reachable from an XPConnect root, which makes sense. But it means this approach won't help without some further work.

Here are the full stats from that GC:

CC(T+107.3) duration: 449ms, suspected: 525, visited: 29622 RCed and 92982 GCed, collected: 27605 RCed and 90680 GCed (90680 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 8 ms, avg: 4 ms, total: 9 ms, removed: 480

The graph is about 75% JS, so even in the best case we can't shrink the graph much, unless we get very clever about adding only specific compartments to the CC graph somehow.
perhaps we should try this with more-often-used compartment gc.
No longer blocks: 698919
Blocks: 698919
This doesn't seem like it is worth doing, as the CC graph is very connected, via JS.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.