Closed
Bug 71141
Opened 25 years ago
Closed 25 years ago
crash with dangling GC root from nsXULElement::mScriptObject
Categories
(Core :: XUL, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.8.1
People
(Reporter: waterson, Assigned: waterson)
References
()
Details
(Keywords: crash)
Attachments
(7 files)
|
3.93 KB,
patch
|
Details | Diff | Splinter Review | |
|
6.12 KB,
patch
|
Details | Diff | Splinter Review | |
|
8.14 KB,
patch
|
Details | Diff | Splinter Review | |
|
792 bytes,
patch
|
Details | Diff | Splinter Review | |
|
9.91 KB,
patch
|
Details | Diff | Splinter Review | |
|
10.32 KB,
patch
|
Details | Diff | Splinter Review | |
|
7.13 KB,
patch
|
Details | Diff | Splinter Review |
I've been able to (pretty reliably) reproduce a crash that appears to be caused
by a dangling GC root from nsXULElement::mScriptObject. Here's how:
1. Go to www.webmonkey.com in Mozilla.
2. Choose File/Edit to open the page in Composer.
3. Repeat (2) as necessary, occasionally closing some of the windows.
Adding belt-and-braces removal of the GC root in nsXULElement's dtor seems to
fix the problem. Patch to follow...
| Assignee | ||
Comment 1•25 years ago
|
||
| Assignee | ||
Comment 2•25 years ago
|
||
N.B. that I whacked XPC's JS runtime service to support weak references so that
I could hack this a bit more cleanly.
| Assignee | ||
Comment 3•25 years ago
|
||
jpatel: I'm thinking this might be related to the JS Mark GC thing topcrash.
ftp://ftp.mozilla.org/pub/data/crash-data/ns6analysis.html#js_MarkGCThing
Comment 4•25 years ago
|
||
Looks good to me, r/sr=brendan@mozilla.org.
/be
Comment 5•25 years ago
|
||
r=jband
From your comment, I gather that we are going to sometimes be calling
JS_RemoveRootRT twice for the same root. Is that going to be happening a lot? I
guess it is only a minor perf hit. I was also a little concerned about the
danger of bringing the JSRuntimeService back from the dead - with a different
JSRuntime! - but it looks like the shutdown code in the service manager will
always protect us from that.
| Assignee | ||
Comment 6•25 years ago
|
||
| Assignee | ||
Comment 7•25 years ago
|
||
I thought about this a bit more, and think that maybe it's worth trying to solve
the problem instead of band-aid around it. I added the above debug code, and
then found this rather puzzling stack trace:
nsDebug::Assertion(const char * 0x0129d000, const char * 0x0129cfe8,
const char * 0x0129cfa8, int 549)
line 254 + 13 bytes
nsXULElement::~nsXULElement()
line 549 + 35 bytes
nsXULElement::`scalar deleting destructor'(unsigned int 1) + 15 bytes
nsXULElement::Release(nsXULElement * const 0x05bbc7a0)
line 687 + 157 bytes
nsJSUtils::nsGenericFinalize(JSContext * 0x05ab6660, JSObject * 0x00f69408)
line 437 + 12 bytes
FinalizeXULElement(JSContext * 0x05ab6660, JSObject * 0x00f69408)
line 291 + 13 bytes
js_FinalizeObject(JSContext * 0x05ab6660, JSObject * 0x00f69408)
line 1662 + 274 bytes
js_GC(JSContext * 0x05ab6660, unsigned int 0)
line 1218 + 11 bytes
js_ForceGC(JSContext * 0x05ab6660)
line 945 + 11 bytes
js_DestroyContext(JSContext * 0x05ab6660, int 2)
line 224 + 9 bytes
JS_DestroyContext(JSContext * 0x05ab6660)
line 882 + 11 bytes
nsJSContext::~nsJSContext()
line 392 + 13 bytes
[...]
If I understand this correctly, there's a XUL element that's being finalized
even though it's still rooted. Do we finalize all JS objects associated with a
context when the context is destroyed or something? Is this a GC bug? Or is my
patch just bogus and I've dropped something on the floor here?
Comment 8•25 years ago
|
||
There's no way that JS is finalizing objects that are rooted - whether
JSContexts are going away or not. We'd have noticed that! Are you *sure* there
is not other code that is unrooting the object (and not setting your debug
flag)?
| Assignee | ||
Comment 9•25 years ago
|
||
brendan and I debugged this for a while. It turns out that there are *two* JS
objects created for single DOM element. This happens because the generated DOM
glue assumes (e.g., in NS_NewScriptXULElement) that asking for the script object
for the parent *won't* recursively result in us asking for the child's script
object again. XBL's ``onattach'' event handler can break this assumption, and
that's exactly what happens in this case.
We discovered this by adding a latch in nsXULElement's GetScriptObject() method
(``mCreatingScriptObject''), and then assert-botching if we re-enter while
building the script object.
| Assignee | ||
Comment 10•25 years ago
|
||
| Assignee | ||
Comment 11•25 years ago
|
||
You two should be aware of this.
Comment 12•25 years ago
|
||
I suggested to waterson that we change the glue code generated by idlc so
NS_NewScriptXULElement and the rest all detect the recursive set. One way to do
this: set *aReturn = nsnull; at the top of the function, before QI'ing aParent.
Then test whether *aReturn became non-null during that call. The rest of the
calls out from NS_NewScriptXULElement look ok to me -- NS_InitXULElementClass
can't do any GetScriptObject on aSupports, and the QI from aSupports to
nsIXULElement shouldn't. But hyatt should comment -- who knows where XBL might
lurk? Cc'ing jst too, as we're talking about a deathbed hack to idlc.
/be
| Assignee | ||
Comment 13•25 years ago
|
||
| Assignee | ||
Comment 14•25 years ago
|
||
brendan: I think that we actually need to do this in
nsXULElement::GetScriptObject() (et. al), otherwise we'll run the
onbindingattached handler twice for the child element.
| Assignee | ||
Comment 15•25 years ago
|
||
(Also, relying on aReturn pointing to the same place seems sort of scary.)
Comment 16•25 years ago
|
||
waterson: right, we need to do this in several places, and we do need some
well-known storage (however veneered by interface methods) that latches things
so we don't double-create.
Reording code, given all the layers and generality, won't guarantee that this
problem is vanquished. I think we need some memory-based mechanism. Anyone
have a better idea?
/be
Summary: crash with danling GC root from nsXULElement::mScriptObject → crash with dangling GC root from nsXULElement::mScriptObject
| Assignee | ||
Comment 17•25 years ago
|
||
| Assignee | ||
Comment 18•25 years ago
|
||
I think this should do it:
1. Instead of creating script object directly into mScriptObject (or
slots->mScriptObject), use stack variable. After creating script object,
check to see if mScriptObject (or slots->mScriptObject) was set, indicating
that we re-entered. If so, disconnect JSObject from native object, and
``discard'' redundant JSObject.
This is a bit unsightly (adding JS_SetPrivate() goop to unlink native from
JSObject), but I consider this a somewhat temporary hack given that the
XPConnect'd DOM will make this all obsolete.
N.B. that I fixed nsGenericElement, too, just in case somebody starts
using XBL on HTML elements.
2. Add check for null script object to XUL element: if null, don't JS GC root
the null pointer.
3. Add debug-only assert-botch code to notice if a XUL element is still
rooted in dtor.
4. Unrelated gratuitous cleanup of XUL script proto manhandling of roots. Make
nsJSRuntimeServiceImpl support weak references.
r=/sr=, anyone?
| Assignee | ||
Comment 19•25 years ago
|
||
*** Bug 70959 has been marked as a duplicate of this bug. ***
| Assignee | ||
Comment 20•25 years ago
|
||
jst: brendan suggested that we could avoid this problem by enqueuing XBL's
attach handler stuff and processing later, after script objects have been
created. Dunno if you're going to need to wrestle with this problem while
XPConnectifying the DOM...I'd be interested in what you thought...
Comment 21•25 years ago
|
||
r/sr=brendan@mozilla.org, looks good to me.
/be
Comment 22•25 years ago
|
||
Doesn't this patch introduce a leak (which is better than a crash, but still...)
in the case where we do re-enter GetScriptObject()? We create the script object,
check if we re-ented and if we did we do a ::JS_SetPrivate(cx, newobject,
nsnull), this will make the finalizer for the new script object *not* do the
release on the native pointer, as it usually does, don't we need a (ew, this is
uggly, but...) this->Release(); call (or maybe ::JS_GetPrivate(cx,
newobject)->Release()) after ::JS_SetPrivate() to match the AddRef() that's done
by factory->NewScriptElement()? Or did I overlook something here?
As to what this will mean in the new XPConnectified DOM world I have no idea
yet, we need to discuss this with jband once XBL actually works with an
XPConnected DOM.
Comment 23•25 years ago
|
||
jst's smart, and stuff!
/be
| Assignee | ||
Comment 24•25 years ago
|
||
| Assignee | ||
Comment 25•25 years ago
|
||
| Assignee | ||
Comment 26•25 years ago
|
||
Ok, I opted for leaving *out* the changes to xpconnect (changing
nsJSRuntimeServiceImpl to support weak references), and left the XUL proto
element root management stuff alone. I started seeing bizarre shutdown crashes
(from PREF_Shutdown(), through JS_GC). I suspect jband fears may have been
realized, and a different runtime got created at shutdown.
Comment 27•25 years ago
|
||
Even in the new world, XBL will need a hook for when a DOM element is first
being exposed to JS. This is needed for display: none elements, since I can't
afford to waste time resolving style on display: none elts to determine what
binding to attach. I have to do this lazily and it has to be done before the
object is exposed to script.
Comment 28•25 years ago
|
||
waterson: I'm neutral on weakref support in nsJSRuntimeServiceImpl. I think it
is safe in and of itself. Your prefs shutdown crash is probably bug 71237. I
think that one is due to prefs trying to kill its JSContext late w/o regard to
whether ot not the nsJSRuntimeServiceImpl has already shutdown and killed the
one and only JSRuntime. It may be that changing your code to not hold a strong
reference to the nsJSRuntimeServiceImpl exacerbated the problem by increasing
the likelihood that the nsJSRuntimeServiceImpl would die earler. But, I don't
think we are seeing the problem of a zombie nsJSRuntimeServiceImpl.
| Assignee | ||
Comment 29•25 years ago
|
||
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: jrgmorrison → xptoolkit.widgets
You need to log in
before you can comment on or make changes to this bug.
Description
•