Should cycle collector walk XPCWrappedNativeTearOffs?

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
13 years ago
13 years ago

People

(Reporter: bzbarsky, Assigned: peterv)

Tracking

Trunk
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

I was looking into the leaks we have on trunk, and in particular into why we leak nsGlobalWindows.  We seemed to be leaking them through the XPConnect machinery somehow.

So I switched to code inspection, and it seems to me that the strong references held to the various interfaces of the C++ object by the XPCWrappedNativeTearOffs of an XPCWrappedNative are not tracked by cycle collection...

That said, in Firefox I can't reproduce the window leak with my patch for bug  368523, no matter what I do with this code.  And in Seamonkey I leak even if I add traversal of the tearoffs (though I sorta hacked it in from NS_CYCLE_COLLECTION_CLASSNAME(XPCWrappedNative)::Traverse, so maybe it's just bogus to do it from there).
Assignee

Comment 1

13 years ago
I assumed that the tearoffs' class had JSCLASS_PRIVATE_IS_NSISUPPORTS set, but they don't, so you're right.
Assignee: nobody → peterv
Yeah, the problem is that the private of the tearoff JSObject is the XPCWrappedNativeTearOff itself, which is not an nsISupports.

I guess nsXPConnect could just compare the return value of OBJ_GET_CLASS to &XPC_WN_Tearoff_JSClass or something?
Assignee

Comment 4

13 years ago
Yeah, I did something similar. I have a bunch of other changes too though, to follow more closely what all the Mark callbacks for the different XPConnect classes do.
I'd be happy to try patches if you send them my way, either in this bug, or e-mail.
Oh, and one thing that bothers me is that I added __proto__ and __parent__ but the JS engine may have other things it marks in interesting ways.  For example, does a Function object mark the canonical object for the JSFunction?  That sort of thing.

As I understand, the cycle collector needs to exactly duplicate what marking does, right?

One other thing.  I see how we can start suspecting a cycle when a Release() happens on an nsISupports, via Suspect().  But what about the case when a cycle is kept alive by a JSObject A pointing to a JSObject B that's an XPCWrappedNative for something participating in the cycle?  Then when A drops the reference to B... what happens?
Assignee

Comment 7

13 years ago
Posted image GlobalWindow graph
Here's one graph I got today when closing a window (in a tree with cycle collection added to most of content/html,xbl,xul). Maroon is objects that we don't have all the references for. Note the tooltip, and I'm far from having found all the JS references.
Is that graph with nsXULDocument marking its mTooltipNode?
(In reply to comment #6)
> Oh, and one thing that bothers me is that I added __proto__ and __parent__ but
> the JS engine may have other things it marks in interesting ways.  For example,
> does a Function object mark the canonical object for the JSFunction?  That sort
> of thing.
> 
> As I understand, the cycle collector needs to exactly duplicate what marking
> does, right?

(I want to write:) <keanu>Dude</keanu>, the cycle collector already does track mark operations.  See http://lxr.mozilla.org/mozilla/search?string=gcThingCallback

> One other thing.  I see how we can start suspecting a cycle when a Release()
> happens on an nsISupports, via Suspect().  But what about the case when a cycle
> is kept alive by a JSObject A pointing to a JSObject B that's an
> XPCWrappedNative for something participating in the cycle?  Then when A drops
> the reference to B... what happens?

Cycles may be formed by any de-reference lowering the referent's count to non-zero value. For GC'ed objects, the cycle collector relies on JS GC calling the gc-thing callback for every reference, so it can compute counts. Thus the bug where the GC locks hashtable was hiding extra counts.

/be
> See http://lxr.mozilla.org/mozilla/search?string=gcThingCallback

Hmm.  I guess the cycle collector doesn't actually keep track of _who_ marked the object, but just how many times it was marked.  OK.  I had thought it was keeping track of who was pointing to what.  ;)

So if this is the case, why is nsXPConnect::Traverse marking slots?  Doesn't GC handle that already?  It sounds like everything except marking things held via the private ptr should get handled by GC, basically.
Assignee

Comment 11

13 years ago
(In reply to comment #9)
> (I want to write:) <keanu>Dude</keanu>, the cycle collector already does track
> mark operations.  See
> http://lxr.mozilla.org/mozilla/search?string=gcThingCallback

But that just gives it the refcount of one object. The cycle collector also needs to know the arcs that cause these refcounts, which object holds on to which other object. And currently the JS GC doesn't give it that information, so it tries to figure it out by itself. nsXPConnect::Traverse doesn't actually mark any slots, but it registers them as marked by the object that is currently being marked. I'm sure that it's missing out on some of these though, anything marked by the object itself with JS_MarkGCThing for example.

The end result seems to be garbage collecting refcounted objects and refcounting garbage collected objects ;-).
Assignee

Comment 12

13 years ago
Btw, I've gotten things so far that I see a browser.xul document being collected (about 1900 objects involved). I'm trying to get my patches into a reasonable state to attach them. More bugs and patches tomorrow.
(In reply to comment #11)
> But that just gives it the refcount of one object. The cycle collector also
> needs to know the arcs that cause these refcounts, which object holds on to
> which other object. And currently the JS GC doesn't give it that information,
> so it tries to figure it out by itself. nsXPConnect::Traverse doesn't actually
> mark any slots, but it registers them as marked by the object that is currently
> being marked. I'm sure that it's missing out on some of these though, anything
> marked by the object itself with JS_MarkGCThing for example.

I should have seen this early on when talking to Graydon; sorry about that.  So we need a different gcThingCallback, one that takes the "parent" object as well as the "child" gc-thing and flags.  Right?

/be
I think that would be the way to go.  Then the cycle collector would know about everything the JS engine knows about automatically, and whoever has private pointers that reference stuff would be responsible for them somehow.  Possibly from Mark() callbacks or something?  As things stand, only XPConnect really gets to do stuff when the cycle collector is working with a JSObject, but maybe that's enough for us.
I don't think it'll work to drive the edges all the way back through to the cycle collector, but you might reorganize the xpconnect cycle collection helper to use a table of edges that the js gc mark callback builds, rather than explicitly walking the object slots as it currently does. Seems expensive to me -- the table would temporarily eat 2 words per edge in the js graph, not just 2 words per object as it currently is -- but maybe that's an unavoidable hit. Shall I cook up a patch?

(You can't easily reorganize the cycle collector to accept the js gc callback driving edges into it directly because the cycle collector is performing incremental color-flooding through the graph. If you dumped a bunch of edges on it, there's no reason to believe it would be ready to receive them: its cycle-collection frontier might not have arrived at the js graph frontier yet)
(In reply to comment #6)

> One other thing.  I see how we can start suspecting a cycle when a Release()
> happens on an nsISupports, via Suspect().  But what about the case when a cycle
> is kept alive by a JSObject A pointing to a JSObject B that's an
> XPCWrappedNative for something participating in the cycle?  Then when A drops
> the reference to B... what happens?

Nobody calls the cycle collector in such a case. At least one node of the cycle has to be out in XPCOM space and the cycle must have turned to garbage between that node being suspected and the cycle being scanned. It's a conservative algorithm.

OTOH, if the cycle is all JS objects won't the normal GC catch it?
Yes. If a cycle is *all* JS, it's caught by JS GC.

The problem scenario is a stable near-garbage-cycle XPCOM_A -> JS_B -> XPCOM_A being kept alive by a JS edge JS_C -> JS_B. Assume that at some point, the edge JS_C -> JS_B is going to be dropped, say by script. The question is: will we ever see the now-garbage subgraph XPCOM_A -> JS_B -> XPCOM_A?

The answer is "sometimes". There are three events to consider: SUSPECT_A, SCAN, and DROP_CB_EDGE. SUSPECT_A always happens before SCAN. If DROP_CB_EDGE happens between SUSPECT_A and SCAN, we find the garbage cycle. If DROP_CB_EDGE happens after SCAN, we don't find the cycle because it hasn't *quite* turned into garbage by the time we scan, and we forget any suspect we scanned and couldn't prove to be garbage.

This is a risk inherent in interfacing the Bacon-Rajan algorithm (which is driven by refcounting events) with a subgraph that has no refcounting events. The JS graph is mark/sweep. We don't know when a node's refcount goes from N+1 -> N for nonzero N. We're only ever driven by refcounting events in the XPCOM subgraph, and they can essentially "race" against the activity within the JS graph.

To eliminate the racing entirely, you would probably need to do something really awful like forcibly suspect all JS roots at the beginning of every cycle collection. This would cost, but it's not clear to me how much more: we may already be near the situation of scanning most of the JS heap each cycle collection, due to all the parent links.
So as things stand we can't rely on the cycle collector to collect cycles, right?  That is, we can't start removing stuff like clearing the scope on windows and manual cycle-breaking yet.

That was the part I really cared about.
We should not remove any JS_ClearScope calls in any event.

Graydon's right about parent links -- they are everywhere. Suggest trying out the suspect-all-roots patch and see how it performs.

/be
Building this patch presently. It's a bit less than suspecting all roots; just all native wrappers. I'll try a couple variants. It might even be sufficient to suspect all the identity nsISupports' on each native wrapper, not the wrappers themselves.
Patch filed in new bug 368869. Doesn't seem to fix significant leakage.
I wouldn't expect it to on our leak tests -- none of those pages have any script on them, e.g.   ;)

Assignee

Comment 24

13 years ago
This was fixed as part of the fix for bug 367779.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.