Closed
Bug 71141
Opened 24 years ago
Closed 23 years ago
crash with dangling GC root from nsXULElement::mScriptObject
Categories
(Core :: XUL, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.8.1
People
(Reporter: waterson, Assigned: waterson)
References
()
Details
(Keywords: crash)
Attachments
(7 files)
3.93 KB,
patch
|
Details | Diff | Splinter Review | |
6.12 KB,
patch
|
Details | Diff | Splinter Review | |
8.14 KB,
patch
|
Details | Diff | Splinter Review | |
792 bytes,
patch
|
Details | Diff | Splinter Review | |
9.91 KB,
patch
|
Details | Diff | Splinter Review | |
10.32 KB,
patch
|
Details | Diff | Splinter Review | |
7.13 KB,
patch
|
Details | Diff | Splinter Review |
I've been able to (pretty reliably) reproduce a crash that appears to be caused by a dangling GC root from nsXULElement::mScriptObject. Here's how: 1. Go to www.webmonkey.com in Mozilla. 2. Choose File/Edit to open the page in Composer. 3. Repeat (2) as necessary, occasionally closing some of the windows. Adding belt-and-braces removal of the GC root in nsXULElement's dtor seems to fix the problem. Patch to follow...
Assignee | ||
Comment 1•24 years ago
|
||
Assignee | ||
Comment 2•24 years ago
|
||
N.B. that I whacked XPC's JS runtime service to support weak references so that I could hack this a bit more cleanly.
Assignee | ||
Comment 3•24 years ago
|
||
jpatel: I'm thinking this might be related to the JS Mark GC thing topcrash. ftp://ftp.mozilla.org/pub/data/crash-data/ns6analysis.html#js_MarkGCThing
Comment 4•23 years ago
|
||
Looks good to me, r/sr=brendan@mozilla.org. /be
Comment 5•23 years ago
|
||
r=jband From your comment, I gather that we are going to sometimes be calling JS_RemoveRootRT twice for the same root. Is that going to be happening a lot? I guess it is only a minor perf hit. I was also a little concerned about the danger of bringing the JSRuntimeService back from the dead - with a different JSRuntime! - but it looks like the shutdown code in the service manager will always protect us from that.
Assignee | ||
Comment 6•23 years ago
|
||
Assignee | ||
Comment 7•23 years ago
|
||
I thought about this a bit more, and think that maybe it's worth trying to solve the problem instead of band-aid around it. I added the above debug code, and then found this rather puzzling stack trace: nsDebug::Assertion(const char * 0x0129d000, const char * 0x0129cfe8, const char * 0x0129cfa8, int 549) line 254 + 13 bytes nsXULElement::~nsXULElement() line 549 + 35 bytes nsXULElement::`scalar deleting destructor'(unsigned int 1) + 15 bytes nsXULElement::Release(nsXULElement * const 0x05bbc7a0) line 687 + 157 bytes nsJSUtils::nsGenericFinalize(JSContext * 0x05ab6660, JSObject * 0x00f69408) line 437 + 12 bytes FinalizeXULElement(JSContext * 0x05ab6660, JSObject * 0x00f69408) line 291 + 13 bytes js_FinalizeObject(JSContext * 0x05ab6660, JSObject * 0x00f69408) line 1662 + 274 bytes js_GC(JSContext * 0x05ab6660, unsigned int 0) line 1218 + 11 bytes js_ForceGC(JSContext * 0x05ab6660) line 945 + 11 bytes js_DestroyContext(JSContext * 0x05ab6660, int 2) line 224 + 9 bytes JS_DestroyContext(JSContext * 0x05ab6660) line 882 + 11 bytes nsJSContext::~nsJSContext() line 392 + 13 bytes [...] If I understand this correctly, there's a XUL element that's being finalized even though it's still rooted. Do we finalize all JS objects associated with a context when the context is destroyed or something? Is this a GC bug? Or is my patch just bogus and I've dropped something on the floor here?
Comment 8•23 years ago
|
||
There's no way that JS is finalizing objects that are rooted - whether JSContexts are going away or not. We'd have noticed that! Are you *sure* there is not other code that is unrooting the object (and not setting your debug flag)?
Assignee | ||
Comment 9•23 years ago
|
||
brendan and I debugged this for a while. It turns out that there are *two* JS objects created for single DOM element. This happens because the generated DOM glue assumes (e.g., in NS_NewScriptXULElement) that asking for the script object for the parent *won't* recursively result in us asking for the child's script object again. XBL's ``onattach'' event handler can break this assumption, and that's exactly what happens in this case. We discovered this by adding a latch in nsXULElement's GetScriptObject() method (``mCreatingScriptObject''), and then assert-botching if we re-enter while building the script object.
Assignee | ||
Comment 10•23 years ago
|
||
Assignee | ||
Comment 11•23 years ago
|
||
You two should be aware of this.
Comment 12•23 years ago
|
||
I suggested to waterson that we change the glue code generated by idlc so NS_NewScriptXULElement and the rest all detect the recursive set. One way to do this: set *aReturn = nsnull; at the top of the function, before QI'ing aParent. Then test whether *aReturn became non-null during that call. The rest of the calls out from NS_NewScriptXULElement look ok to me -- NS_InitXULElementClass can't do any GetScriptObject on aSupports, and the QI from aSupports to nsIXULElement shouldn't. But hyatt should comment -- who knows where XBL might lurk? Cc'ing jst too, as we're talking about a deathbed hack to idlc. /be
Assignee | ||
Comment 13•23 years ago
|
||
Assignee | ||
Comment 14•23 years ago
|
||
brendan: I think that we actually need to do this in nsXULElement::GetScriptObject() (et. al), otherwise we'll run the onbindingattached handler twice for the child element.
Assignee | ||
Comment 15•23 years ago
|
||
(Also, relying on aReturn pointing to the same place seems sort of scary.)
Comment 16•23 years ago
|
||
waterson: right, we need to do this in several places, and we do need some well-known storage (however veneered by interface methods) that latches things so we don't double-create. Reording code, given all the layers and generality, won't guarantee that this problem is vanquished. I think we need some memory-based mechanism. Anyone have a better idea? /be
Summary: crash with danling GC root from nsXULElement::mScriptObject → crash with dangling GC root from nsXULElement::mScriptObject
Assignee | ||
Comment 17•23 years ago
|
||
Assignee | ||
Comment 18•23 years ago
|
||
I think this should do it: 1. Instead of creating script object directly into mScriptObject (or slots->mScriptObject), use stack variable. After creating script object, check to see if mScriptObject (or slots->mScriptObject) was set, indicating that we re-entered. If so, disconnect JSObject from native object, and ``discard'' redundant JSObject. This is a bit unsightly (adding JS_SetPrivate() goop to unlink native from JSObject), but I consider this a somewhat temporary hack given that the XPConnect'd DOM will make this all obsolete. N.B. that I fixed nsGenericElement, too, just in case somebody starts using XBL on HTML elements. 2. Add check for null script object to XUL element: if null, don't JS GC root the null pointer. 3. Add debug-only assert-botch code to notice if a XUL element is still rooted in dtor. 4. Unrelated gratuitous cleanup of XUL script proto manhandling of roots. Make nsJSRuntimeServiceImpl support weak references. r=/sr=, anyone?
Assignee | ||
Comment 19•23 years ago
|
||
*** Bug 70959 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 20•23 years ago
|
||
jst: brendan suggested that we could avoid this problem by enqueuing XBL's attach handler stuff and processing later, after script objects have been created. Dunno if you're going to need to wrestle with this problem while XPConnectifying the DOM...I'd be interested in what you thought...
Comment 21•23 years ago
|
||
r/sr=brendan@mozilla.org, looks good to me. /be
Comment 22•23 years ago
|
||
Doesn't this patch introduce a leak (which is better than a crash, but still...) in the case where we do re-enter GetScriptObject()? We create the script object, check if we re-ented and if we did we do a ::JS_SetPrivate(cx, newobject, nsnull), this will make the finalizer for the new script object *not* do the release on the native pointer, as it usually does, don't we need a (ew, this is uggly, but...) this->Release(); call (or maybe ::JS_GetPrivate(cx, newobject)->Release()) after ::JS_SetPrivate() to match the AddRef() that's done by factory->NewScriptElement()? Or did I overlook something here? As to what this will mean in the new XPConnectified DOM world I have no idea yet, we need to discuss this with jband once XBL actually works with an XPConnected DOM.
Comment 23•23 years ago
|
||
jst's smart, and stuff! /be
Assignee | ||
Comment 24•23 years ago
|
||
Assignee | ||
Comment 25•23 years ago
|
||
Assignee | ||
Comment 26•23 years ago
|
||
Ok, I opted for leaving *out* the changes to xpconnect (changing nsJSRuntimeServiceImpl to support weak references), and left the XUL proto element root management stuff alone. I started seeing bizarre shutdown crashes (from PREF_Shutdown(), through JS_GC). I suspect jband fears may have been realized, and a different runtime got created at shutdown.
Comment 27•23 years ago
|
||
Even in the new world, XBL will need a hook for when a DOM element is first being exposed to JS. This is needed for display: none elements, since I can't afford to waste time resolving style on display: none elts to determine what binding to attach. I have to do this lazily and it has to be done before the object is exposed to script.
Comment 28•23 years ago
|
||
waterson: I'm neutral on weakref support in nsJSRuntimeServiceImpl. I think it is safe in and of itself. Your prefs shutdown crash is probably bug 71237. I think that one is due to prefs trying to kill its JSContext late w/o regard to whether ot not the nsJSRuntimeServiceImpl has already shutdown and killed the one and only JSRuntime. It may be that changing your code to not hold a strong reference to the nsJSRuntimeServiceImpl exacerbated the problem by increasing the likelihood that the nsJSRuntimeServiceImpl would die earler. But, I don't think we are seeing the problem of a zombie nsJSRuntimeServiceImpl.
Assignee | ||
Comment 29•23 years ago
|
||
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: jrgmorrison → xptoolkit.widgets
You need to log in
before you can comment on or make changes to this bug.
Description
•