Bug 931491 (Open), opened 11 years ago, updated 1 year ago

When the cycle collector graph exceeds 64M vertices, the CC asserts in debug builds and leaks everything in release builds, due to pldhash's 64M-entry limit

Categories

(Core :: Cycle Collector, defect)


People

(Reporter: bjacob, Unassigned)

Details

(Whiteboard: [MemShrink:P3])

Attachments

(1 file)

The attached testcase creates over 64M HTML <div> elements. As the cycle collector tries to build its graph, it will run into pldhash's 64M-entry limit. Currently this gives an assertion failure in debug builds,

Assertion failure: false (Ran out of memory while building cycle collector graph), at /hack/mozilla-central/xpcom/base/nsCycleCollector.cpp:2136

and in release builds, this bails out of the cycle collector, leaking everything.

To run this, make sure you use an optimized build (otherwise it's too slow), and you'll probably also need a 64-bit build on a machine with 32 GB of RAM (on my machine with 16 GB of RAM, I OOM during CC).

Note: this is a follow-up to bug 902922 where a similar situation was hit in the real world before the pldhash limit was upped from 8M to 64M.
STR:
 1. use a 64-bit optimized build on a machine with at least 32 GB of RAM
 2. load the testcase
 3. quit Firefox, as a sure way of triggering a full CC (in debug builds)
 4. actual result: assertion failure as in comment 0 (make sure your build has cset 338760b1955c)

Alternatively, in an optimized build, you could verify in about:memory that the memory is not freed.
The STR in comment 1 requires a _debug_ build (since the 'actual results' part is an assertion failure). Hitting this assertion is equivalent to reproducing the leak, but easier to verify than going to about:memory and looking at measurements (which one can do in non-debug builds).
Whiteboard: [MemShrink]
Handling 60M entries is always going to be pretty slow, so I wonder if a better approach is to try to make this hash table smaller.

In particular, one node should be sufficient to represent any cycle, because if one element of the cycle is rooted then they all are, and if not, the whole cycle is garbage.  The tricky part is that, since there isn't a bijection between graph nodes and objects, we'd need a way to map an object to a graph node.  Maybe one way to do that would be to only merge objects that can be associated with a document and are in the same document.  We could then have a per-document hash table mapping elements to graph nodes (sketched below; it still has the could-get-huge issue, but maybe less so?).

Note: I don't understand the current CC merging stuff, but given that this bug exists with this test case, I'm assuming we optimize the graph after creating the whole thing instead of optimizing it as we create it.
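A minimal sketch of the per-document mapping idea above, using hypothetical standalone types (Document, Element, GraphNode, PerDocumentNodeMap) rather than the real Gecko classes; the real CC keeps a single pointer-to-node hash table in nsCycleCollector.cpp. Total memory stays roughly the same, but no single table has to hold every element; the caveat is that one massive document (like this bug's testcase) still produces one massive table.

#include <unordered_map>

// Hypothetical placeholder types, not the real Gecko classes.
struct Document;
struct GraphNode;
struct Element { Document* ownerDoc = nullptr; };

// One element->node table per document instead of one global table.
class PerDocumentNodeMap {
 public:
  // Returns a reference to the graph-node slot for aElement, creating it
  // (as null) on first use, in the table belonging to its owner document.
  GraphNode*& EntryFor(Element* aElement) {
    return mMaps[aElement->ownerDoc][aElement];
  }
 private:
  std::unordered_map<Document*,
                     std::unordered_map<Element*, GraphNode*>> mMaps;
};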
There are two separate issues here:
1. How much memory do we use to store the graph objects?
2. How much memory do we use to map real objects to objects in the graph?

I have tried a few things before to make the graph smaller (meta bug 741417), but with an eye towards making CC faster, not reducing memory usage:

a. Bug 754495 - Represent an entire JS compartment as a single node in the graph.  This reduces both kinds of memory (1 and 2 above), and is much faster on JS-heavy pages.  For the case in bug 902922, it reduces the graph size by 99.9%.  The drawback is that, unlike the other methods I mention next, it is an approximation (all nodes in a compartment aren't necessarily strongly connected), so we can't always run it, or we could leak.  It also wouldn't help much with bjacob's case in this bug, which involves many DOM nodes.  I did land this.

b. Bug 653191 - Dynamically discover strongly connected components and represent them as a single node in the graph. This uses the Cheriyan–Mehlhorn/Gabow algorithm.  This reduces memory to store the objects (1), but you need the same amount of memory to map real objects to graph objects (2).  You could probably still create a giant singly-linked tree to defeat this (I don't remember how well I dealt with acyclic nodes).  I had a working prototype, but I never landed it because it was really complex and didn't speed the CC up at all.
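For reference, a self-contained sketch of the path-based (Cheriyan-Mehlhorn/Gabow) SCC computation on a plain adjacency-list graph, with hypothetical types; the prototype in bug 653191 interleaved this with graph building and was considerably more complex. Each resulting component id could stand for a single merged node. Recursion is used here for brevity; real code would need an explicit stack to survive deep graphs.

#include <cstdint>
#include <stack>
#include <vector>

// Adjacency-list graph standing in for the CC graph; hypothetical, not the
// real graph structures in nsCycleCollector.cpp.
struct Graph {
  std::vector<std::vector<uint32_t>> adj;  // adj[v] = outgoing edges of v
};

constexpr uint32_t kUnvisited = UINT32_MAX;

class SccFinder {
 public:
  explicit SccFinder(const Graph& aGraph)
      : mGraph(aGraph),
        mPreorder(aGraph.adj.size(), kUnvisited),
        mComponent(aGraph.adj.size(), kUnvisited) {}

  // Returns, for each vertex, the id of the strongly connected component it
  // belongs to.  Each component could then be collapsed into one graph node.
  std::vector<uint32_t> Run() {
    for (uint32_t v = 0; v < mGraph.adj.size(); ++v) {
      if (mPreorder[v] == kUnvisited) {
        Visit(v);
      }
    }
    return mComponent;
  }

 private:
  void Visit(uint32_t v) {  // Recursive for brevity only.
    mPreorder[v] = mCounter++;
    mOpen.push(v);
    mPath.push(v);
    for (uint32_t w : mGraph.adj[v]) {
      if (mPreorder[w] == kUnvisited) {
        Visit(w);
      } else if (mComponent[w] == kUnvisited) {
        // w is still "open": everything on the path above w is in one SCC,
        // so contract the path stack down to w.
        while (mPreorder[mPath.top()] > mPreorder[w]) {
          mPath.pop();
        }
      }
    }
    if (mPath.top() == v) {
      // v is the root of an SCC: everything pushed on mOpen since v belongs
      // to the same component.
      mPath.pop();
      const uint32_t id = mNextComponent++;
      uint32_t w;
      do {
        w = mOpen.top();
        mOpen.pop();
        mComponent[w] = id;
      } while (w != v);
    }
  }

  const Graph& mGraph;
  std::vector<uint32_t> mPreorder;
  std::vector<uint32_t> mComponent;
  std::stack<uint32_t> mOpen;
  std::stack<uint32_t> mPath;
  uint32_t mCounter = 0;
  uint32_t mNextComponent = 0;
};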

c. Bug 756675 - Represent an entire DOM tree as a single node in the CC graph.  This is a special case of the previous approach, and has the potential to make things faster, because we could scan the DOM in one go without all the CC indirection getting in the way.  This would reduce 1 and 2.  It would reduce the size of the hash table because you make the QI to CycleCollectingISupports for a node in a DOM tree return the top of the tree, so the CC is never even aware of most of the nodes in a tree.  That would make bjacob's test case here into a single graph node, but wouldn't help if you had millions of tiny DOM trees, or the test case in bug 902922, which was all JS.  I abandoned this approach because it got bogged down in problems with anonymous content, where a node is not the child of its parent.
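A minimal sketch of the canonicalization trick in (c), with a hypothetical Node type rather than the real DOM classes: whatever the CC is handed for a node in a tree resolves to the tree's root, so an entire tree costs one hash-table entry and one graph vertex.

// Hypothetical node type; this ignores the anonymous-content case (where a
// node is not the child of its parent) that sank the approach.
struct Node {
  Node* mParent = nullptr;
};

// The object the CC would see for aNode: walking to the root means the CC
// is never even aware of the interior nodes of the tree.
Node* CanonicalNodeForCC(Node* aNode) {
  Node* top = aNode;
  while (top->mParent) {
    top = top->mParent;
  }
  return top;
}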
Trevor's comment does bring to mind that potentially we could store the mapping from real objects to graph objects in the objects themselves, by reusing the ref count field in each object (sketched below).  This would make tearing down the graph trickier, as you'd have to move ref count information back from the CC graph into the objects.  You also would be unable to have multiple CCs going on at the same time, which isn't something we do right now, but is something I'm trying out in debug builds for ICC verification.

A big limitation is that JS objects don't have a refcount, so you'd have to come up with some other spare space in JS objects. That would be needed to fix the case in bug 902922, but not the one in this bug.
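A hedged sketch, with hypothetical types, of what parking the graph index in the refcount word could look like; the real refcount machinery is more involved, and as noted above this does nothing for JS objects.

#include <cstdint>
#include <vector>

// Hypothetical refcounted object, not the real Gecko refcount machinery.
struct Object {
  uint32_t mRefCntWord;  // normally holds the reference count
};

struct GraphEntry {
  Object* mObj;
  uint32_t mSavedRefCnt;  // the real refcount, parked here during the CC
};

// While the graph is alive, each object's refcount word holds its index in
// the graph, so no separate object->node hash table is needed.
uint32_t AddToGraph(std::vector<GraphEntry>& aGraph, Object* aObj) {
  const uint32_t index = static_cast<uint32_t>(aGraph.size());
  aGraph.push_back({aObj, aObj->mRefCntWord});
  aObj->mRefCntWord = index;
  return index;
}

// Tearing down the graph has to move the refcounts back, which is the extra
// trickiness mentioned above; it also rules out two concurrent CCs touching
// the same objects' refcount words.
void TearDownGraph(std::vector<GraphEntry>& aGraph) {
  for (GraphEntry& e : aGraph) {
    e.mObj->mRefCntWord = e.mSavedRefCnt;
  }
  aGraph.clear();
}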
It still sounds silly that pldhash has such an odd limitation.
I would expect it to be able to handle UINT32_MAX entries.


Note: we try to break disconnected DOM subtrees without running the CC, during CanSkip, but that only works
if the subtree has the normal parent element <-> child node edges and nothing more.
So it doesn't help with this testcase, which creates a massive DOM tree.
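A rough sketch of that kind of check, with hypothetical types; the real CanSkip logic is considerably more subtle:

#include <vector>

// Hypothetical node type, not the real DOM node classes.
struct Node {
  Node* mParent = nullptr;
  std::vector<Node*> mChildren;
  bool mHasExtraReferences = false;  // stand-in for listeners, wrappers, etc.
};

// True if every edge in the subtree is a plain parent <-> child link; in
// that case nothing outside the subtree can be keeping it alive, so it can
// be freed without ever entering the CC graph.
bool SubtreeHasOnlyTreeEdges(const Node* aNode) {
  if (aNode->mHasExtraReferences) {
    return false;
  }
  for (const Node* child : aNode->mChildren) {
    if (child->mParent != aNode || !SubtreeHasOnlyTreeEdges(child)) {
      return false;
    }
  }
  return true;
}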
> It still sounds silly that pldhash has such an odd limitation.
> I would expect it to be able to handle UINT32_MAX entries.

An element count of UINT32_MAX is going to require an allocation size of at least 8*UINT32_MAX bytes.  Good luck with that on a 32-bit machine.  Basically, if the table gets big enough, insertion will fail somehow, and the caller needs to be able to handle that.
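Back-of-the-envelope version of that point; the 8 bytes per entry here is an illustrative assumption, not pldhash's actual entry size:

#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t entries = UINT32_MAX;  // table sized for 2^32 - 1 entries
  const uint64_t bytesPerEntry = 8;     // illustrative assumption
  const uint64_t bytes = entries * bytesPerEntry;
  // Prints roughly 32 GiB: hopeless within a 32-bit address space.
  printf("%.1f GiB\n", bytes / (1024.0 * 1024.0 * 1024.0));
  return 0;
}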
(In reply to Andrew McCreight [:mccr8] from comment #5)
> Trevor's comment does bring to mind that potentially we could store the
> mapping from real objects to graph objects in the objects themselves, by
> reusing the ref count field in each object.  This would make tearing down
> the graph trickier, as you'd have to move ref count information back from
> the CC graph into the objects.  You also would be unable to have multiple

It's certainly work, but I'd think that when you go to unlink something you'd already have all the data you need to do that, so it would be fairly easy.

> CCs going on at the same time, which isn't something we do right now, but is
> something I'm trying out in debug builds for ICC verification.

Hm, I thought that was just one CC followed immediately by another?  Also, why exactly does this make it harder to do?

> A big limitation is that JS objects don't have a refcount, so you'd have to
> come up with some other spare space in JS objects. That would be needed to
> fix the case in bug 902922, but not the one in this bug.

Yeah, or you could do my per-document hash thing with s/document/compartment/.  Not a memory win, but each hash table is smaller and maybe you get better locality?


(In reply to Nicholas Nethercote [:njn] from comment #7)
> > It still sounds silly that pldhash has such an odd limitation.
> > I would expect it to be able to handle UINT32_MAX entries.
> 
> An element count of UINT32_MAX is going to require an allocation size of at
> least 8*UINT32_MAX bytes.  Good luck with that on a 32-bit machine. 

Well, that's not going to happen here, because you can't fit 2^32 objects in memory either, so the graph can't be that big; but there is certainly still a size at which objects + graph > available space.

> Basically, if the table gets big enough, insertion will fail somehow, and
> the caller needs to be able to handle that.

It's not clear that's possible here.  Sure, you can try to collect cycles with the graph you have and hope that frees enough room to fit the full graph, but there are certainly cases where that doesn't work.  So AFAIK we need to work on ways to keep the hash table small.
Whiteboard: [MemShrink] → [MemShrink:P3]
Severity: normal → S3
Component: XPCOM → Cycle Collector