368530 - Should cycle collector walk XPCWrappedNativeTearOffs?

Reporter

Description

•

19 years ago

I was looking into the leaks we have on trunk, and in particular into why we leak nsGlobalWindows. We seemed to be leaking them through the XPConnect machinery somehow. So I switched to code inspection, and it seems to me that the strong references held to the various interfaces of the C++ object by the XPCWrappedNativeTearOffs of an XPCWrappedNative are not tracked by cycle collection... That said, in Firefox I can't reproduce the window leak with my patch for bug 368523, no matter what I do with this code. And in Seamonkey I leak even if I add traversal of the tearoffs (though I sorta hacked it in from NS_CYCLE_COLLECTION_CLASSNAME(XPCWrappedNative)::Traverse, so maybe it's just bogus to do it from there).

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

19 years ago

Blocks: cycle-collector

Peter Van der Beken [:peterv]

Assignee

Comment 1

•

19 years ago

I assumed that the tearoffs' class had JSCLASS_PRIVATE_IS_NSISUPPORTS set, but they don't, so you're right.

Assignee: nobody → peterv

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

19 years ago

Yeah, the problem is that the private of the tearoff JSObject is the XPCWrappedNativeTearOff itself, which is not an nsISupports. I guess nsXPConnect could just compare the return value of OBJ_GET_CLASS to &XPC_WN_Tearoff_JSClass or something?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 3

•

19 years ago

Attached patch I tried this, but it doesn't obviously help... — Details — Splinter Review

Peter Van der Beken [:peterv]

Assignee

Comment 4

•

19 years ago

Yeah, I did something similar. I have a bunch of other changes too though, to follow more closely what all the Mark callbacks for the different XPConnect classes do.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 5

•

19 years ago

I'd be happy to try patches if you send them my way, either in this bug, or e-mail.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 6

•

19 years ago

Oh, and one thing that bothers me is that I added __proto__ and __parent__ but the JS engine may have other things it marks in interesting ways. For example, does a Function object mark the canonical object for the JSFunction? That sort of thing. As I understand, the cycle collector needs to exactly duplicate what marking does, right? One other thing. I see how we can start suspecting a cycle when a Release() happens on an nsISupports, via Suspect(). But what about the case when a cycle is kept alive by a JSObject A pointing to a JSObject B that's an XPCWrappedNative for something participating in the cycle? Then when A drops the reference to B... what happens?

Peter Van der Beken [:peterv]

Assignee

Comment 7

•

19 years ago

Attached image GlobalWindow graph — Details

Here's one graph I got today when closing a window (in a tree with cycle collection added to most of content/html,xbl,xul). Maroon is objects that we don't have all the references for. Note the tooltip, and I'm far from having found all the JS references.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 8

•

19 years ago

Is that graph with nsXULDocument marking its mTooltipNode?

Brendan Eich [:brendan]

Comment 9

•

19 years ago

(In reply to comment #6) > Oh, and one thing that bothers me is that I added __proto__ and __parent__ but > the JS engine may have other things it marks in interesting ways. For example, > does a Function object mark the canonical object for the JSFunction? That sort > of thing. > > As I understand, the cycle collector needs to exactly duplicate what marking > does, right? (I want to write:) <keanu>Dude</keanu>, the cycle collector already does track mark operations. See http://lxr.mozilla.org/mozilla/search?string=gcThingCallback > One other thing. I see how we can start suspecting a cycle when a Release() > happens on an nsISupports, via Suspect(). But what about the case when a cycle > is kept alive by a JSObject A pointing to a JSObject B that's an > XPCWrappedNative for something participating in the cycle? Then when A drops > the reference to B... what happens? Cycles may be formed by any de-reference lowering the referent's count to non-zero value. For GC'ed objects, the cycle collector relies on JS GC calling the gc-thing callback for every reference, so it can compute counts. Thus the bug where the GC locks hashtable was hiding extra counts. /be

Boris Zbarsky [:bzbarsky]

Reporter

Comment 10

•

19 years ago

> See http://lxr.mozilla.org/mozilla/search?string=gcThingCallback Hmm. I guess the cycle collector doesn't actually keep track of _who_ marked the object, but just how many times it was marked. OK. I had thought it was keeping track of who was pointing to what. ;) So if this is the case, why is nsXPConnect::Traverse marking slots? Doesn't GC handle that already? It sounds like everything except marking things held via the private ptr should get handled by GC, basically.

Peter Van der Beken [:peterv]

Assignee

Comment 11

•

19 years ago

(In reply to comment #9) > (I want to write:) <keanu>Dude</keanu>, the cycle collector already does track > mark operations. See > http://lxr.mozilla.org/mozilla/search?string=gcThingCallback But that just gives it the refcount of one object. The cycle collector also needs to know the arcs that cause these refcounts, which object holds on to which other object. And currently the JS GC doesn't give it that information, so it tries to figure it out by itself. nsXPConnect::Traverse doesn't actually mark any slots, but it registers them as marked by the object that is currently being marked. I'm sure that it's missing out on some of these though, anything marked by the object itself with JS_MarkGCThing for example. The end result seems to be garbage collecting refcounted objects and refcounting garbage collected objects ;-).

Peter Van der Beken [:peterv]

Assignee

Comment 12

•

19 years ago

Btw, I've gotten things so far that I see a browser.xul document being collected (about 1900 objects involved). I'm trying to get my patches into a reasonable state to attach them. More bugs and patches tomorrow.

Brendan Eich [:brendan]

Comment 13

•

19 years ago

(In reply to comment #11) > But that just gives it the refcount of one object. The cycle collector also > needs to know the arcs that cause these refcounts, which object holds on to > which other object. And currently the JS GC doesn't give it that information, > so it tries to figure it out by itself. nsXPConnect::Traverse doesn't actually > mark any slots, but it registers them as marked by the object that is currently > being marked. I'm sure that it's missing out on some of these though, anything > marked by the object itself with JS_MarkGCThing for example. I should have seen this early on when talking to Graydon; sorry about that. So we need a different gcThingCallback, one that takes the "parent" object as well as the "child" gc-thing and flags. Right? /be

Boris Zbarsky [:bzbarsky]

Reporter

Comment 14

•

19 years ago

I think that would be the way to go. Then the cycle collector would know about everything the JS engine knows about automatically, and whoever has private pointers that reference stuff would be responsible for them somehow. Possibly from Mark() callbacks or something? As things stand, only XPConnect really gets to do stuff when the cycle collector is working with a JSObject, but maybe that's enough for us.

Graydon Hoare :graydon

Comment 15

•

19 years ago

I don't think it'll work to drive the edges all the way back through to the cycle collector, but you might reorganize the xpconnect cycle collection helper to use a table of edges that the js gc mark callback builds, rather than explicitly walking the object slots as it currently does. Seems expensive to me -- the table would temporarily eat 2 words per edge in the js graph, not just 2 words per object as it currently is -- but maybe that's an unavoidable hit. Shall I cook up a patch? (You can't easily reorganize the cycle collector to accept the js gc callback driving edges into it directly because the cycle collector is performing incremental color-flooding through the graph. If you dumped a bunch of edges on it, there's no reason to believe it would be ready to receive them: its cycle-collection frontier might not have arrived at the js graph frontier yet)

Graydon Hoare :graydon

Comment 16

•

19 years ago

(In reply to comment #6) > One other thing. I see how we can start suspecting a cycle when a Release() > happens on an nsISupports, via Suspect(). But what about the case when a cycle > is kept alive by a JSObject A pointing to a JSObject B that's an > XPCWrappedNative for something participating in the cycle? Then when A drops > the reference to B... what happens? Nobody calls the cycle collector in such a case. At least one node of the cycle has to be out in XPCOM space and the cycle must have turned to garbage between that node being suspected and the cycle being scanned. It's a conservative algorithm.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 17

•

19 years ago

OTOH, if the cycle is all JS objects won't the normal GC catch it?

Graydon Hoare :graydon

Comment 18

•

19 years ago

Yes. If a cycle is *all* JS, it's caught by JS GC. The problem scenario is a stable near-garbage-cycle XPCOM_A -> JS_B -> XPCOM_A being kept alive by a JS edge JS_C -> JS_B. Assume that at some point, the edge JS_C -> JS_B is going to be dropped, say by script. The question is: will we ever see the now-garbage subgraph XPCOM_A -> JS_B -> XPCOM_A? The answer is "sometimes". There are three events to consider: SUSPECT_A, SCAN, and DROP_CB_EDGE. SUSPECT_A always happens before SCAN. If DROP_CB_EDGE happens between SUSPECT_A and SCAN, we find the garbage cycle. If DROP_CB_EDGE happens after SCAN, we don't find the cycle because it hasn't *quite* turned into garbage by the time we scan, and we forget any suspect we scanned and couldn't prove to be garbage. This is a risk inherent in interfacing the Bacon-Rajan algorithm (which is driven by refcounting events) with a subgraph that has no refcounting events. The JS graph is mark/sweep. We don't know when a node's refcount goes from N+1 -> N for nonzero N. We're only ever driven by refcounting events in the XPCOM subgraph, and they can essentially "race" against the activity within the JS graph. To eliminate the racing entirely, you would probably need to do something really awful like forcibly suspect all JS roots at the beginning of every cycle collection. This would cost, but it's not clear to me how much more: we may already be near the situation of scanning most of the JS heap each cycle collection, due to all the parent links.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 19

•

19 years ago

So as things stand we can't rely on the cycle collector to collect cycles, right? That is, we can't start removing stuff like clearing the scope on windows and manual cycle-breaking yet. That was the part I really cared about.

Brendan Eich [:brendan]

Comment 20

•

19 years ago

We should not remove any JS_ClearScope calls in any event. Graydon's right about parent links -- they are everywhere. Suggest trying out the suspect-all-roots patch and see how it performs. /be

Graydon Hoare :graydon

Comment 21

•

19 years ago

Building this patch presently. It's a bit less than suspecting all roots; just all native wrappers. I'll try a couple variants. It might even be sufficient to suspect all the identity nsISupports' on each native wrapper, not the wrappers themselves.

Graydon Hoare :graydon

Comment 22

•

19 years ago

Patch filed in new bug 368869. Doesn't seem to fix significant leakage.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 23

•

19 years ago

I wouldn't expect it to on our leak tests -- none of those pages have any script on them, e.g. ;)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Updated

•

18 years ago

Blocks: 348156

Peter Van der Beken [:peterv]

Assignee

Comment 24

•

18 years ago

This was fixed as part of the fix for bug 367779.

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

I tried this, but it doesn't obviously help... 19 years ago Boris Zbarsky [:bzbarsky] 2.43 KB, patch		Details \| Diff \| Splinter Review
GlobalWindow graph 19 years ago Peter Van der Beken [:peterv] 165.39 KB, image/png		Details