Closed Bug 386912 Opened 13 years ago Closed 12 years ago

cycle collector faults after tracing "JS object but unknown to the JS GC"

Categories

(Core :: XPConnect, defect, critical)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.9beta1

People

(Reporter: dbaron, Assigned: peterv)

References

Details

(Keywords: memory-leak)

Attachments

(4 files)

I've seen this happen once, and Robert O'Callahan has seen it happen multiple times.  (See bug 386664.)  We see the cycle collector fault because of:

###!!! ASSERTION: JS object but unknown to the JS GC?: 'refcount > 0', file
/Users/roc/mozilla-checkin/mozilla/js/src/xpconnect/src/nsXPConnect.cpp, line
670
###!!! ASSERTION: Fault in cycle collector: zero refcount (ptr: 2024100)
: 'Not Reached', file
/Users/roc/mozilla-checkin/mozilla/xpcom/base/nsCycleCollector.cpp, line 920

Since faults cause the cycle collector to stop collecting, this should probably be blocking1.9.

We need to figure out what is or could be causing this...
Flags: blocking1.9?
Severity: normal → critical
Keywords: mlk, qawanted
Bug 386947 is one way to trigger these assertions.
roc, could you see if the patch in bug 386947 (already checked in) makes these go away for you?
Peterv, any chance you could refresh your memory on the cycle collector and look into this one?
Assignee: nobody → peterv
Flags: blocking1.9? → blocking1.9+
I'm still seeing this; this patch should make the first assertion fire with a useful stack.  Hopefully I'll catch it again sometime...
Attachment #272266 - Flags: superreview?(peterv)
Attachment #272266 - Flags: review?(peterv)
Attached file stacks of assertions
I just hit this while starting up (restoring a large session).  Here are the stacks of the assertions I hit up to the first of the old assertions.  Anything after the first of the old assertions is suspicious because it could have been traced from a suspicious object.
Comment on attachment 272266 [details] [diff] [review]
add an assertion that will fire with a useful stack

Do we still want to keep the assertion in nsXPConnect::Traverse?
Attachment #272266 - Flags: superreview?(peterv)
Attachment #272266 - Flags: superreview+
Attachment #272266 - Flags: review?(peterv)
Attachment #272266 - Flags: review+
I don't see the harm in keeping it, and it might be useful in showing the chain of events to somebody.
Anyway, assertion checked in, so hopefully other people will get some useful stacks too.
Attached file stacks of assertions
This time the initial number of problematic pointers before traversal of any problematic pointers was smaller (2).  Both of the stacks implicate nsJSScriptTimeoutHandler::cycleCollection::Traverse (as did some, but not the first, in the last set).
So, for what it's worth, I can't find anything that traverses *to* a nsJSScriptTimeoutHandler, so I'm not sure why it's a cycle collection participant.  Or are they wrapped by wrapped natives?  The only cases in which I'm seeing them traversed are cases where they were roots.
Depends on: 392144
I filed bug 392144 with some stacks on Mac, where the first assertion happens with JS_CallTracer on the stack.
I captured one of these faults in Chronicle. The fault occurs when GCGraphBuilder::Traverse's aPtrInfo was set up by nsJSScriptTimeoutHandler::cycleCollection::Traverse, calling NoteScriptChild on its tmp->mFunObj. That mFunObj field was initialized in nsJSScriptTimeoutHandler::Init just after we (successfully) called JS_AddNamedRoot on it. Attached is the stack for that Init call.

I can easily extract more data if needed.
The key in this stack is that we're calling JS_AddNamedRoot *after* we've constructed the mObjRefcounts that we're going to use for this cycle collection, but before we do the traversal from the script timeout handler.
Blocks: 394680
Status: NEW → ASSIGNED
Depends on: 379718
Target Milestone: --- → mozilla1.9 M9
Depends on: 395562
Target Milestone: mozilla1.9 M9 → ---
Targeting M9 per discussion with Damon.
Target Milestone: --- → mozilla1.9 M9
Roc, dbaron, should this indeed block beta?
Removing qawanted keyword based on comment 12 and comment 13.  It sounds like the cause of the fault in this case is understood, so having a testcase or steps to reproduce isn't critical.
Keywords: qawanted
PeterV are you working on this?  Just want to make sure we get this in for b1 :-)
I'm working on bug 379718 which also fixes this bug. It's a bit of a risky change though, not sure it'll be ready for b1.
I don't think there's an easy way to fix this bug without fixing bug 379718. The problem here is that we end up creating JS objects from the JSGC_END callback, after we've finished reference counting of the JS objects but before running the cycle collection. So we end up with JS objects for which we don't have a reference count. Bug 379718 does away with reference counting for JS objects and fixes this bug as a side-effect of that.
Depends on: 401687
Fixed by the patch for bug 401687.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Duplicate of this bug: 392144
Duplicate of this bug: 395562
You need to log in before you can comment on or make changes to this bug.