Closed Bug 71141 Opened 24 years ago Closed 23 years ago

crash with dangling GC root from nsXULElement::mScriptObject

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.8.1

People

(Reporter: waterson, Assigned: waterson)

References

(
URL
)

Details

(Keywords: crash)

Attachments

(7 files)

call RemoveJSGCRoot() from nsXULElement's dtor; misc support 24 years ago Chris Waterson 3.93 KB, patch		Details \| Diff \| Splinter Review
add debug code to assert when this happens 23 years ago Chris Waterson 6.12 KB, patch		Details \| Diff \| Splinter Review
more debug code: latch during script object creation 23 years ago Chris Waterson 8.14 KB, patch		Details \| Diff \| Splinter Review
brendan's idea, implimented bracily 23 years ago Chris Waterson 792 bytes, patch		Details \| Diff \| Splinter Review
fix the bug, leave debugging code in. 23 years ago Chris Waterson 9.91 KB, patch		Details \| Diff \| Splinter Review
Release() ref from transient JSObj, per jst 23 years ago Chris Waterson 10.32 KB, patch		Details \| Diff \| Splinter Review
final diffs; leave XPConnect and the XUL proto stuff alone 23 years ago Chris Waterson 7.13 KB, patch		Details \| Diff \| Splinter Review

Chris Waterson

Assignee

Description

•

24 years ago

I've been able to (pretty reliably) reproduce a crash that appears to be caused 
by a dangling GC root from nsXULElement::mScriptObject. Here's how:

1. Go to www.webmonkey.com in Mozilla.
2. Choose File/Edit to open the page in Composer.
3. Repeat (2) as necessary, occasionally closing some of the windows.

Adding belt-and-braces removal of the GC root in nsXULElement's dtor seems to 
fix the problem. Patch to follow...

Chris Waterson

Assignee

Comment 1

•

24 years ago

Attached patch call RemoveJSGCRoot() from nsXULElement's dtor; misc support — Details — Splinter Review

Chris Waterson

Assignee

Comment 2

•

24 years ago

N.B. that I whacked XPC's JS runtime service to support weak references so that 
I could hack this a bit more cleanly.

Status: NEW → ASSIGNED

Keywords: crash

Target Milestone: --- → mozilla0.8.1

Chris Waterson

Assignee

Comment 3

•

24 years ago

jpatel: I'm thinking this might be related to the JS Mark GC thing topcrash.

  ftp://ftp.mozilla.org/pub/data/crash-data/ns6analysis.html#js_MarkGCThing

Brendan Eich [:brendan]

Comment 4

•

23 years ago

Looks good to me, r/sr=brendan@mozilla.org.

/be

John Bandhauer

Comment 5

•

23 years ago

r=jband

From your comment, I gather that we are going to sometimes be calling 
JS_RemoveRootRT twice for the same root. Is that going to be happening a lot? I 
guess it is only a minor perf hit. I was also a little concerned about the 
danger of bringing the JSRuntimeService back from the dead - with a different 
JSRuntime! - but it looks like the shutdown code in the service manager will 
always protect us from that.

Chris Waterson

Assignee

Comment 6

•

23 years ago

Attached patch add debug code to assert when this happens — Details — Splinter Review

Chris Waterson

Assignee

Comment 7

•

23 years ago

I thought about this a bit more, and think that maybe it's worth trying to solve 
the problem instead of band-aid around it. I added the above debug code, and 
then found this rather puzzling stack trace:

nsDebug::Assertion(const char * 0x0129d000, const char * 0x0129cfe8,
                   const char * 0x0129cfa8, int 549)
  line 254 + 13 bytes
nsXULElement::~nsXULElement()
  line 549 + 35 bytes
nsXULElement::`scalar deleting destructor'(unsigned int 1) + 15 bytes
nsXULElement::Release(nsXULElement * const 0x05bbc7a0)
  line 687 + 157 bytes
nsJSUtils::nsGenericFinalize(JSContext * 0x05ab6660, JSObject * 0x00f69408)
  line 437 + 12 bytes
FinalizeXULElement(JSContext * 0x05ab6660, JSObject * 0x00f69408)
  line 291 + 13 bytes
js_FinalizeObject(JSContext * 0x05ab6660, JSObject * 0x00f69408)
  line 1662 + 274 bytes
js_GC(JSContext * 0x05ab6660, unsigned int 0)
  line 1218 + 11 bytes
js_ForceGC(JSContext * 0x05ab6660)
  line 945 + 11 bytes
js_DestroyContext(JSContext * 0x05ab6660, int 2)
  line 224 + 9 bytes
JS_DestroyContext(JSContext * 0x05ab6660)
  line 882 + 11 bytes
nsJSContext::~nsJSContext()
  line 392 + 13 bytes
[...]

If I understand this correctly, there's a XUL element that's being finalized 
even though it's still rooted. Do we finalize all JS objects associated with a 
context when the context is destroyed or something? Is this a GC bug? Or is my 
patch just bogus and I've dropped something on the floor here?

John Bandhauer

Comment 8

•

23 years ago

There's no way that JS is finalizing objects that are rooted - whether 
JSContexts are going away or not. We'd have noticed that! Are you *sure* there 
is not other code that is unrooting the object (and not setting your debug 
flag)?

Chris Waterson

Assignee

Comment 9

•

23 years ago

brendan and I debugged this for a while. It turns out that there are *two* JS 
objects created for single DOM element. This happens because the generated DOM 
glue assumes (e.g., in NS_NewScriptXULElement) that asking for the script object 
for the parent *won't* recursively result in us asking for the child's script 
object again. XBL's ``onattach'' event handler can break this assumption, and 
that's exactly what happens in this case.

We discovered this by adding a latch in nsXULElement's GetScriptObject() method 
(``mCreatingScriptObject''), and then assert-botching if we re-enter while 
building the script object.

Chris Waterson

Assignee

Comment 10

•

23 years ago

Attached patch more debug code: latch during script object creation — Details — Splinter Review

Chris Waterson

Assignee

Comment 11

•

23 years ago

You two should be aware of this.

Brendan Eich [:brendan]

Comment 12

•

23 years ago

I suggested to waterson that we change the glue code generated by idlc so
NS_NewScriptXULElement and the rest all detect the recursive set.  One way to do
this: set *aReturn = nsnull; at the top of the function, before QI'ing aParent.

Then test whether *aReturn became non-null during that call.  The rest of the
calls out from NS_NewScriptXULElement look ok to me -- NS_InitXULElementClass
can't do any GetScriptObject on aSupports, and the QI from aSupports to
nsIXULElement shouldn't.  But hyatt should comment -- who knows where XBL might
lurk?  Cc'ing jst too, as we're talking about a deathbed hack to idlc.

/be

Chris Waterson

Assignee

Comment 13

•

23 years ago

Attached patch brendan's idea, implimented bracily — Details — Splinter Review

Chris Waterson

Assignee

Comment 14

•

23 years ago

brendan: I think that we actually need to do this in 
nsXULElement::GetScriptObject() (et. al), otherwise we'll run the 
onbindingattached handler twice for the child element.

Chris Waterson

Assignee

Comment 15

•

23 years ago

(Also, relying on aReturn pointing to the same place seems sort of scary.)

Brendan Eich [:brendan]

Comment 16

•

23 years ago

waterson: right, we need to do this in several places, and we do need some
well-known storage (however veneered by interface methods) that latches things
so we don't double-create.

Reording code, given all the layers and generality, won't guarantee that this
problem is vanquished.  I think we need some memory-based mechanism.  Anyone
have a better idea?

/be

Summary: crash with danling GC root from nsXULElement::mScriptObject → crash with dangling GC root from nsXULElement::mScriptObject

Chris Waterson

Assignee

Comment 17

•

23 years ago

Attached patch fix the bug, leave debugging code in. — Details — Splinter Review

Chris Waterson

Assignee

Comment 18

•

23 years ago

I think this should do it:

1. Instead of creating script object directly into mScriptObject (or
   slots->mScriptObject), use stack variable. After creating script object,
   check to see if mScriptObject (or slots->mScriptObject) was set, indicating
   that we re-entered. If so, disconnect JSObject from native object, and
   ``discard'' redundant JSObject.

   This is a bit unsightly (adding JS_SetPrivate() goop to unlink native from
   JSObject), but I consider this a somewhat temporary hack given that the
   XPConnect'd DOM will make this all obsolete.

   N.B. that I fixed nsGenericElement, too, just in case somebody starts
   using XBL on HTML elements.

2. Add check for null script object to XUL element: if null, don't JS GC root
   the null pointer.

3. Add debug-only assert-botch code to notice if a XUL element is still
   rooted in dtor.

4. Unrelated gratuitous cleanup of XUL script proto manhandling of roots. Make
   nsJSRuntimeServiceImpl support weak references.

r=/sr=, anyone?

Chris Waterson

Assignee

Comment 19

•

23 years ago

*** Bug 70959 has been marked as a duplicate of this bug. ***

Chris Waterson

Assignee

Comment 20

•

23 years ago

jst: brendan suggested that we could avoid this problem by enqueuing XBL's 
attach handler stuff and processing later, after script objects have been 
created. Dunno if you're going to need to wrestle with this problem while 
XPConnectifying the DOM...I'd be interested in what you thought...

Brendan Eich [:brendan]

Comment 21

•

23 years ago

r/sr=brendan@mozilla.org, looks good to me.

/be

Johnny Stenback (:jst)

Comment 22

•

23 years ago

Doesn't this patch introduce a leak (which is better than a crash, but still...)
in the case where we do re-enter GetScriptObject()? We create the script object,
check if we re-ented and if we did we do a ::JS_SetPrivate(cx, newobject,
nsnull), this will make the finalizer for the new script object *not* do the
release on the native pointer, as it usually does, don't we need a (ew, this is
uggly, but...) this->Release(); call (or maybe ::JS_GetPrivate(cx,
newobject)->Release()) after ::JS_SetPrivate() to match the AddRef() that's done
by factory->NewScriptElement()? Or did I overlook something here?

As to what this will mean in the new XPConnectified DOM world I have no idea
yet, we need to discuss this with jband once XBL actually works with an
XPConnected DOM.

Brendan Eich [:brendan]

Comment 23

•

23 years ago

jst's smart, and stuff!

/be

Chris Waterson

Assignee

Comment 24

•

23 years ago

Attached patch Release() ref from transient JSObj, per jst — Details — Splinter Review

Chris Waterson

Assignee

Comment 25

•

23 years ago

Attached patch final diffs; leave XPConnect and the XUL proto stuff alone — Details — Splinter Review

Chris Waterson

Assignee

Comment 26

•

23 years ago

Ok, I opted for leaving *out* the changes to xpconnect (changing 
nsJSRuntimeServiceImpl to support weak references), and left the XUL proto 
element root management stuff alone. I started seeing bizarre shutdown crashes 
(from PREF_Shutdown(), through JS_GC). I suspect jband fears may have been 
realized, and a different runtime got created at shutdown.

David Hyatt

Comment 27

•

23 years ago

Even in the new world, XBL will need a hook for when a DOM element is first
being exposed to JS.  This is needed for display: none elements, since I can't
afford to waste time resolving style on display: none elts to determine what
binding to attach.  I have to do this lazily and it has to be done before the
object is exposed to script.

John Bandhauer

Comment 28

•

23 years ago

waterson: I'm neutral on weakref support in nsJSRuntimeServiceImpl. I think it 
is safe in and of itself. Your prefs shutdown crash is probably bug 71237. I 
think that one is due to prefs trying to kill its JSContext late w/o regard to 
whether ot not the nsJSRuntimeServiceImpl has already shutdown and killed the 
one and only JSRuntime. It may be that changing your code to not hold a strong 
reference to the nsJSRuntimeServiceImpl exacerbated the problem by increasing 
the likelihood that the nsJSRuntimeServiceImpl would die earler. But, I don't 
think we are seeing the problem of a zombie nsJSRuntimeServiceImpl.

Chris Waterson

Assignee

Comment 29

•

23 years ago

Fix checked in.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

timeless

Updated

•

16 years ago

Component: XP Toolkit/Widgets: XUL → XUL

QA Contact: jrgmorrison → xptoolkit.widgets

You need to log in before you can comment on or make changes to this bug.