Closed Bug 367779 Opened 18 years ago Closed 17 years ago

Make cycle collection through JS objects more reliable

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: peterv, Assigned: peterv)

References

Details

Attachments

(4 files, 8 obsolete files)

I'm going to attach a patch that I made while trying to fix bug 324871.

* Currently the refcounting of JS objects in the cycle collector is sometimes
  wrong because the JS GC restarts itself, and when that happens the mark
  notification for a given object is called a "multiple of the real refcount"
  times (as many times as the GC was restarted). I solved this by adding a GC
  callback that records JSGC_MARK_END and make the mark notification reset the
  refcounts after that.
  There's still a problem when GC_MARK_DEBUG is defined, we still get too many
  mark notifications. A bit annoying because GC_MARK_DEBUG is useful to figure
  out the chains that keep JS objects alive, but I haven't been able to find
  out what exactly makes us get multiple notifications.
* XPCNativeWrapper keeps its XPCWrappedNative alive, so I added code to
  nsXPConnect::Traverse to record that.
* I made JSFunction objects traverse their parent object. This fixed a cycle
  that I saw from a closure (XPCOM object holding on to a JS function that kept
  the XPCOM object alive through the closure).
* Added traversal code for XPCVariant.
* Made nsXPCWrappedJS record the strong ref that it keeps to itself.
* Made XPCWrappedNative traverse its XPCNativeWrapper.

After this lands all that needs to happen for bug 324871 to be fixed is add traversal from the nodes to their UserData.
Attached patch v1 (obsolete) — Splinter Review
Attachment #252369 - Flags: review?(graydon)
Forgot to mention:

* Made the labels in the cycle collection graph contain a bit more information
  (JS class names, etc).
Status: NEW → ASSIGNED
I just ran into "traversed refs exceed refcount" for a JS object, so something's going wrong somewhere.
Comment 3 happens when two XPCVariant's hold onto the same jsval. Both XPCVariant's lock the jsval with JS_LockGCThing, but we only get one mark notification. We should probably try to get the count from the JSGCLockHashEntry into the cycle collector somehow or we need to make XPConnect cache XPCVariants.
The third solution, and the easiest, is to use JS_AddNamedRoot instead of JS_LockGCThing. XPCVariant has a comment about using JS_LockGCThing because it's faster, but I'm not sure that we should care.
There's a bunch of other places that use JS_LockGCThing, so we should either move them all to JS_AddNamedRoot (can we?) or we need to solve the problem of getting the JS_LockGCThing refcount into the cycle collector.
> XPCVariant has a comment about using JS_LockGCThing because it's faster, but
> I'm not sure that we should care.

Please see the blame on that comment?  We sorta care if we want canvas performant, unfortunately.  Or at least it helps _some_.  :(
I'm not sufficiently familiar with this code to know whether what you propose really is easier. If the information is in the JSGCLockHashEntry, what makes it difficult to propagate to the refcount table we're building in XPConnect during cycle collection? Currently we use a JS GC marking callback, but we could probably modify or augment that to convey more / different information.
The lock hash is internal to the JS engine and nothing exposes the count, we'd need to add a public API to get the nesting level of the locks.

Brendan: the cycle collector needs to know the nesting level of locked GC things, otherwise its refcount for the locked things is mostly wrong (too low). We can detect in the GC thing callback that a thing is locked since it passes us the flags. Should we add a JS_GetLockNestingLevel API that we then use on the locked thing?
For each lhe, call rt->gcThingCallback lhe->count - 1 times, in gc_lock_marker?

/be
Heh, I just came up with that while shopping. I'll make a new patch.
Attached patch v1.1 (obsolete) — Splinter Review
* Tearoffs hold a strong reference to their native
* Protos keep their scope alive
* Mark JSObject parent and proto
* XPCWrappedNative keep their proto and scope
* Scopes keep their principal alive (had to add a hash for this, I don't think
  there is an easy way of knowing whether an object is a global object)
Attachment #252369 - Attachment is obsolete: true
Attachment #252369 - Flags: review?(graydon)
Not really sure if we care about traversing the principals actually.
It goes beyond what I understand about xpconnect, but it reads well. Does it help with the number of leaked artifacts under leak-gauge.pl?

I'm currently building a slightly different patch that changes the object-slot traversal to read edges from an edge (not refcount) table built during JS GC, as suggested in bug 368530. Let's try to coordinate these changes.
Attached patch v1.2 (obsolete) — Splinter Review
* Undo double marking of XPCWrappedNative's GetProto()->GetJSProtoObject() in
  xpc_MarkForValidWrapper (the MarkBeforeJSFinalize call in that function already
  does that marking)
* Mark nsXPCWrappedJS chains and the aggregated native

This one applies on top of the patch for bug 368549.
Attachment #253378 - Attachment is obsolete: true
Attached patch v1.3 (obsolete) — Splinter Review
Attachment #254041 - Attachment is obsolete: true
Attachment #254163 - Flags: review?(jst)
Comment on attachment 254163 [details] [diff] [review]
v1.3

diff -x CVS -x .DS_Store -ru ...

Next time, add -p to the diff arguments :)

- In nsXPConnect.cpp:

+#define IS_WRAPPER_CLASS(clazz)                                               \
+          ((clazz) == &XPC_WN_NoHelper_JSClass.base ||                        \
+           (clazz)->getObjectOps == XPC_WN_GetObjectOpsNoCall ||              \
+           (clazz)->getObjectOps == XPC_WN_GetObjectOpsWithCall)
>+

This looks unused.

Besides that and some of these changes not following the xpconnect bracing style etc (which isn't 100% consistent any more), this looks right to me. r=jst
Attachment #254163 - Flags: review?(jst) → review+
(In reply to comment #14)
> * Undo double marking of XPCWrappedNative's GetProto()->GetJSProtoObject() in
>   xpc_MarkForValidWrapper (the MarkBeforeJSFinalize call in that function
>   already does that marking)

Is there an easy way to do a debug-only assertion that the object is already marked, or no?  That seems like it would be a good thing, if possible.
(In reply to comment #11)
> * Scopes keep their principal alive (had to add a hash for this, I don't think
>   there is an easy way of knowing whether an object is a global object)

JSCLASS_IS_GLOBAL set in clasp->flags for JSClass *clasp = JS_GET_CLASS(cx, obj)?

/be
(In reply to comment #17)
> (In reply to comment #14)
> > * Undo double marking of XPCWrappedNative's GetProto()->GetJSProtoObject() in
> >   xpc_MarkForValidWrapper (the MarkBeforeJSFinalize call in that function
> >   already does that marking)
> 
> Is there an easy way to do a debug-only assertion that the object is already
> marked, or no?

JS_ASSERT(!JS_IsAboutToBeFinalized(cx, obj));

/be
(In reply to comment #15)
> Created an attachment (id=254163) [details]
> v1.3

+    // XXX With GC_MARK_DEBUG defined we end up too many times in
+    //     XPCMarkNotification, even while taking JSGC_MARK_END into account.

Cc'ing igor in case this is easy to fix.

+            snprintf(name, kStringLength, "XPCNativeWrapper (%s)",

Nit: sizeof name rather than coupling to kStringLength is better.

/be

jst's plea for xpconnect prevailing brace and if( style is worth attention, I think. We may reindent _en masse_, but until then, "when in Rome".

/be
Attached patch v1.4Splinter Review
(In reply to comment #18)
> (In reply to comment #11)
> > * Scopes keep their principal alive (had to add a hash for this, I don't think
> >   there is an easy way of knowing whether an object is a global object)
> 
> JSCLASS_IS_GLOBAL set in clasp->flags for JSClass *clasp = JS_GET_CLASS(cx,
> obj)?

Ok, I added a check for that flag but kept the hash to avoid having to walk the native scopes linked list every time. The warning about not having the JSCLASS_IS_GLOBAL flag fires for the Sandbox class in xpccomponents, maybe we should add it here: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/xpconnect/src/xpccomponents.cpp&rev=1.103&mark=3113#3100

(In reply to comment #20)
> (In reply to comment #15)
> +    // XXX With GC_MARK_DEBUG defined we end up too many times in
> +    //     XPCMarkNotification, even while taking JSGC_MARK_END into account.
> 
> Cc'ing igor in case this is easy to fix.

I suspect the GC_MARK call at http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2118#2100 That will end up calling gcThingCallback again.

> Nit: sizeof name rather than coupling to kStringLength is better.

Done.

(In reply to comment #21)
> jst's plea for xpconnect prevailing brace and if( style is worth attention, I
> think. We may reindent _en masse_, but until then, "when in Rome".

Looking at surrounding code it felt more like being in the Vatican, in Rome but still somewhere else ;-). New patch tries to use XPConnect style in new/changed code.
Attachment #254163 - Attachment is obsolete: true
Attachment #254418 - Flags: superreview?(brendan)
Attachment #254418 - Flags: review+
(In reply to comment #0)
>   There's still a problem when GC_MARK_DEBUG is defined, we still get too many
>   mark notifications. A bit annoying because GC_MARK_DEBUG is useful to figure
>   out the chains that keep JS objects alive, but I haven't been able to find
>   out what exactly makes us get multiple notifications.

You mean multiple notifications per single object?
Multiple notifications per object are normal, but we expect only one per time an object gets marked by another object (or through a lock or a named root) and I see more than that.
(In reply to comment #24)
> Multiple notifications per object are normal, but we expect only one per time
> an object gets marked by another object (or through a lock or a named root) and
> I see more than that.

Is it possible to get a printout of MARK_DEBUG traceing together with notification reports? I.e. combining GC_MARK_DEBUG output with printouts about notifications?  
(In reply to comment #22)
> (In reply to comment #18)
> > (In reply to comment #11)
> > > * Scopes keep their principal alive (had to add a hash for this, I don't think
> > >   there is an easy way of knowing whether an object is a global object)
> > 
> > JSCLASS_IS_GLOBAL set in clasp->flags for JSClass *clasp = JS_GET_CLASS(cx,
> > obj)?
> 
> Ok, I added a check for that flag but kept the hash to avoid having to walk the
> native scopes linked list every time. The warning about not having the
> JSCLASS_IS_GLOBAL flag fires for the Sandbox class in xpccomponents, maybe we
> should add it here:
> http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/xpconnect/src/xpccomponents.cpp&rev=1.103&mark=3113#3100

Cc'ing mrbkap for his thoughts.

> (In reply to comment #20)
> > (In reply to comment #15)
> > +    // XXX With GC_MARK_DEBUG defined we end up too many times in
> > +    //     XPCMarkNotification, even while taking JSGC_MARK_END into account.
> > 
> > Cc'ing igor in case this is easy to fix.
> 
> I suspect the GC_MARK call at
> http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2118#2100
> That will end up calling gcThingCallback again.

Indeed. Igor, why was that added? I forget, if ever I knew.

/be
Comment on attachment 254418 [details] [diff] [review]
v1.4

Looks good to me. I forget, why must all nsIPrincipal (or CycleCollector) code be excluded if XPCONNECT_STANDALONE? Standalone always depended on XPCOM. We want to fix any bogus caps dependencies, but cycle-collector deps are right and proper for standalone xpconnect.

I'm having intense deja vu now, sorry if I asked this in another bug within the last year and forgot the answer.

/be
Comment on attachment 254418 [details] [diff] [review]
v1.4

sr=me, but maybe we can get that GC_MARK_DEBUG redundant GC_MARK fixed this go-round?

/be
Attachment #254418 - Flags: superreview?(brendan) → superreview+
(In reply to comment #28)
> (From update of attachment 254418 [details] [diff] [review])
> sr=me, but maybe we can get that GC_MARK_DEBUG redundant GC_MARK fixed this
> go-round?

There is a related question: why MarkGCThingChildren never calls cx->runtime->gcThingCallback when marking things in JSObject.slots array? I remember Gordon was fixing that in one of his patches. This "missing" calls in fact presents under GC_MARK_DEBUG since the code skips direct marking optimizations in MarkGCThingChildren.
(In reply to comment #29)
> There is a related question: why MarkGCThingChildren never calls
> cx->runtime->gcThingCallback when marking things in JSObject.slots array? I
> remember Gordon was fixing that in one of his patches. This "missing" calls in
> fact presents under GC_MARK_DEBUG since the code skips direct marking
> optimizations in MarkGCThingChildren.

More info: during XPCOM collector development Gordon mentioned once a problem of JS objects with 0 refcounts. That was caused by lack of calls to cx->runtime->gcThingCallback. Now the code in CVS tip does not have all those calls as gcThingCallback is notified only for the substring case. Is it a bug? 
(In reply to comment #24)
> Multiple notifications per object are normal, but we expect only one per time
> an object gets marked by another object (or through a lock or a named root) and
> I see more than that.

But what if object1 refers to object2 multiple times? 
(In reply to comment #29)
> There is a related question: why MarkGCThingChildren never calls
> cx->runtime->gcThingCallback when marking things in JSObject.slots array?

Not really sure what you mean: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2108#2100 shows the callback.

Here's what I think is happening: js_MarkGCThing calls the callback (http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2481#2470) and then calls MarkGCThingChildren. MarkGCThingChildren loops over the slots and calls the callback for the objects in the slots. But then, in the case of GC_MARK_DEBUG it calls GC_MARK on thing, which is one of the objects in the slot. GC_MARK calls js_MarkNamedGCThing which calls js_MarkGCThing which calls the callback again, but MarkGCThingChildren called it already for that object in that slot.
See http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2108&mark=2118&mark=2135#2100

> (In reply to comment #31)
> But what if object1 refers to object2 multiple times? 

Then the callback is called multiple times. That's why the callback is called early, before those two if statements: http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2108,2111,2113#2100

(In reply to comment #27)
> I forget, why must all nsIPrincipal (or CycleCollector) code
> be excluded if XPCONNECT_STANDALONE? Standalone always depended on XPCOM. We
> want to fix any bogus caps dependencies, but cycle-collector deps are right and
> proper for standalone xpconnect.

All nsIPrincipal use in xpconnect is ifdef'ed XPCONNECT_STANDALONE, probably because of the caps dependency. I'm not ifdef'ing all cycle collector code, but only cycle collector code that deals with nsIPrincipal in xpconnect. When the caps dependency is solved the ifdef's can be removed with all the others in xpconnect.
(In reply to comment #32)
> Not really sure what you mean:
> http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2108#2100
> shows the callback.
> 
> Here's what I think is happening: js_MarkGCThing calls the callback

Sorry for a blunder, I was looking to 1.8.1 version of jsgc.c. So the code is correct as your description of the problem with GC_MARK_DEBUG. Now a simple fix would be in the slot loop in MarkGCThingChildren to call gcThingCallback only when GC_MARK_DEBUG is not defined.

> (http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2481#2470)
> and then calls MarkGCThingChildren. MarkGCThingChildren loops over the slots
> and calls the callback for the objects in the slots. But then, in the case of
> GC_MARK_DEBUG it calls GC_MARK on thing, which is one of the objects in the
> slot. GC_MARK calls js_MarkNamedGCThing which calls js_MarkGCThing which calls
> the callback again, but MarkGCThingChildren called it already for that object
> in that slot.
> See
> http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2108&mark=2118&mark=2135#2100
> 
> > (In reply to comment #31)
> > But what if object1 refers to object2 multiple times? 
> 
> Then the callback is called multiple times. That's why the callback is called
> early, before those two if statements:
> http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/js/src/jsgc.c&rev=&cvsroot=/cvsroot&mark=2108,2111,2113#2100
> 

Attached patch Fix for GC_MARK_DEBUG (obsolete) — Splinter Review
Here is a patch for GC_MARK_DEBUG problem. The idea is to treat non-GC_MARK_DEBUG case as an optimization with all that inlining / tail call elimination.
Attachment #254454 - Flags: review?(peterv)
Attachment #254454 - Flags: review?(brendan)
Comment on attachment 254454 [details] [diff] [review]
Fix for GC_MARK_DEBUG

>+#ifdef GC_MARK_DEBUG
>+        strcpy(name, "base");
>+        GC_MARK(cx, thing, name);
>+#else

So we don't need name here, could just pass "base" to GC_MARK -- localize the name auto array to the OBJECT case? r=me with that.

/be
Attachment #254454 - Flags: review?(brendan) → review+
(In reply to comment #37)
> So we don't need name here, could just pass "base" to GC_MARK -- localize the
> name auto array to the OBJECT case? r=me with that.

You mean declaring name inside an extra block like in:

            /*
             * Do not eliminate C recursion when debugging to allow
             * js_MarkNamedGCThing to build a full dump of live GC
             * things.
             */
            {
                char name[32];
                GetObjSlotName(obj, i, name, sizeof name);
                GC_MARK(cx, JSVAL_TO_GCTHING(v), name);
            }

?
Attached patch Fix for GC_MARK_DEBUG v1.b (obsolete) — Splinter Review
Patch with nits addressed.
Attachment #254454 - Attachment is obsolete: true
Attachment #254528 - Flags: review+
Attachment #254454 - Flags: review?(peterv)
Depends on: 368549
Comment on attachment 254528 [details] [diff] [review]
Fix for GC_MARK_DEBUG v1.b

Peter: can you check that the patch eliminates duplicates?
Comment on attachment 254528 [details] [diff] [review]
Fix for GC_MARK_DEBUG v1.b

With this patch and GC_MARK_DEBUG defined I get a crash.

>Index: jsgc.c
>===================================================================

>       case GCX_MUTABLE_STRING:
>         str = (JSString *)thing;
>         if (!JSSTRING_IS_DEPENDENT(str))
>             break;
>         thing = JSSTRDEP_BASE(str);
>+#ifdef GC_MARK_DEBUG
>+        GC_MARK(cx, thing, "base");

Do you need a break here?

>+#else

>         goto start;
> #endif
Fixing the embarrassing bug pointed by Peter.
Attachment #254528 - Attachment is obsolete: true
Attachment #254540 - Flags: review?(peterv)
Comment on attachment 254540 [details] [diff] [review]
Fix for GC_MARK_DEBUG v2

Sorry, I should have caught that. FWIW, r=me.

/be
Attachment #254540 - Flags: review+
Comment on attachment 254540 [details] [diff] [review]
Fix for GC_MARK_DEBUG v2

Yes, that seems to work. Thanks!
Attachment #254540 - Flags: review?(peterv) → review+
I committed GC_MARK_DEBUG fix:

Checking in jsgc.c;
/cvsroot/mozilla/js/src/jsgc.c,v  <--  jsgc.c
new revision: 3.203; previous revision: 3.202
done
Comment on attachment 255588 [details] [diff] [review]
v1.4 (mozilla/js part)

Checked in:

js/src/jsgc.c 3.203

/be
Attachment #255588 - Flags: review+
peterv: Your check-in to nsXPConnect.cpp added a few snprintf function calls, my VC8 compiler does not like this. I'm not sure why the tinderboxen do not go red, but my local setup here breaks (and I think it's ok that it breaks here).
Comment on attachment 254418 [details] [diff] [review]
v1.4

Sorry, I should have caught those snprintfs and said use JS_snprintf (or the NSPR counterpart, PR_snprintf -- looks like xpconnect has two calls to JS_snprintf and one to PR_vsnprintf for some, or perhaps no good, reason).

/be
Attached patch Fix v1.0 (obsolete) — Splinter Review
Fix v1.0

* Replace the |snprintf| calls with |JS_snprintf| calls.
Attachment #255615 - Flags: superreview?(brendan)
Attachment #255615 - Flags: review?(brendan)
(In reply to comment #49)
> (From update of attachment 254418 [details] [diff] [review])
> Sorry, I should have caught those snprintfs and said use JS_snprintf (or the
> NSPR counterpart, PR_snprintf -- looks like xpconnect has two calls to
> JS_snprintf and one to PR_vsnprintf for some, or perhaps no good, reason).
> 
> /be
> 

Can easily change the calls from |JS_snprintf| too |PR_snprintf| if you think those are better in this case.
Comment on attachment 255615 [details] [diff] [review]
Fix v1.0

Whoops... missed some.
Attachment #255615 - Flags: superreview?(brendan)
Attachment #255615 - Flags: review?(brendan)
Attached patch Fix v2.0 (obsolete) — Splinter Review
Fix v2.0

* Like so?
Attachment #255615 - Attachment is obsolete: true
Attachment #255617 - Flags: superreview?(brendan)
Attachment #255617 - Flags: review?(brendan)
Comment on attachment 255617 [details] [diff] [review]
Fix v2.0

I'd go with JS_ for now, in case JS_C_STRINGS_ARE_UTF8 comes back on and a format string requires one of the code paths it enables.

/be
And JS_ outnumbers PR_ two to one ;-).

Looking for a followup fix to make XPConnect uniformly prefer jsprf.h functions in case a JS-specific format specifier or rule is needed.

/be
Same fix as above but using |JS_snprintf| :)
Attachment #255617 - Attachment is obsolete: true
Attachment #255623 - Flags: superreview?(brendan)
Attachment #255623 - Flags: review?(brendan)
Attachment #255617 - Flags: superreview?(brendan)
Attachment #255617 - Flags: review?(brendan)
Comment on attachment 255623 [details] [diff] [review]
Fix v2.0 (using JS_snprintf)

Thanks,

/be
Attachment #255623 - Flags: superreview?(brendan)
Attachment #255623 - Flags: superreview+
Attachment #255623 - Flags: review?(brendan)
Attachment #255623 - Flags: review+
Comment on attachment 255623 [details] [diff] [review]
Fix v2.0 (using JS_snprintf)

checked in

Checking in src/nsXPConnect.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/nsXPConnect.cpp,v  <--  nsXPConnect.cpp
new revision: 1.94; previous revision: 1.93
done
Checking in src/xpcwrappedjs.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/xpcwrappedjs.cpp,v  <--  xpcwrappedjs.cpp
new revision: 1.51; previous revision: 1.50
done
Checking in src/xpcwrappednative.cpp;
/cvsroot/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp,v  <--  xpcwrappednative.cpp
new revision: 1.130; previous revision: 1.129
done
Thanks for the followup fix, tinderboxes didn't catch it because it's in debug-only code.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: