Closed Bug 415192 Opened 17 years ago Closed 16 years ago

Crash (within nsCycleCollector::Shutdown and nsXBLBinding::RemoveInsertionParent) with <xul:wizard>, cloneNode

Categories

(Core :: XBL, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9

People

(Reporter: jruderman, Assigned: peterv)

References

Details

(Keywords: assertion, crash, testcase, Whiteboard: [sg:critical])

Attachments

(2 files, 2 obsolete files)

Loading the testcase *and then quitting* triggers:

###!!! ASSERTION: Shallow unbind won't clear document and binding parent on kids!: 'aDeep || (!GetCurrentDoc() && !GetBindingParent())', file /Users/jruderman/trunk/mozilla/content/base/src/nsGenericElement.cpp, line 2151

###!!! ASSERTION: op == PL_DHASH_LOOKUP || RECURSION_LEVEL(table) == 0: 'op == PL_DHASH_LOOKUP || RECURSION_LEVEL(table) == 0', file pldhash.c, line 585

Crash within nsXBLBinding::RemoveInsertionParent, often dereferencing or calling a random-looking memory address.

I couldn't figure out how to make a testcase that used its own XBL binding rather than using <xul:wizard> :(
Whiteboard: [sg:critical]
That first assertion is probably bad on its own...

The stack to the second assert is:

#1  0xb7a56fcb in NS_DebugBreak_P (aSeverity=1, 
    aStr=0xb7a81da8 "op == PL_DHASH_LOOKUP || RECURSION_LEVEL(table) == 0", 
    aExpr=0xb7a81da8 "op == PL_DHASH_LOOKUP || RECURSION_LEVEL(table) == 0", 
    aFile=0xb7a81c7e "pldhash.c", aLine=585)
    at ../../../mozilla/xpcom/base/nsDebugImpl.cpp:358
#2  0xb79c233f in PL_DHashTableOperate (table=0x8df2694, key=0x819eec8, 
    op=PL_DHASH_REMOVE) at pldhash.c:584
#3  0xb514e044 in nsTHashtable<nsBaseHashtableET<nsISupportsHashKey, nsRefPtr<nsXBLBinding> > >::RemoveEntry (this=0x8df2694, aKey=0x819eec8)
    at ../../../dist/include/xpcom/nsTHashtable.h:193
#4  0xb514d41b in nsBaseHashtable<nsISupportsHashKey, nsRefPtr<nsXBLBinding>, nsXBLBinding*>::Remove (this=0x8df2694, aKey=0x819eec8)
    at ../../../dist/include/xpcom/nsBaseHashtable.h:159
#5  0xb514a26d in nsBindingManager::SetBinding (this=0x8df2688, aContent=0x819eec8, 
    aBinding=0x0) at ../../../../mozilla/content/xbl/src/nsBindingManager.cpp:529
#6  0xb514a5d9 in nsBindingManager::ChangeDocumentFor (this=0x8df2688, 
    aContent=0x819eec8, aOldDocument=0x8df3148, aNewDocument=0x0)
    at ../../../../mozilla/content/xbl/src/nsBindingManager.cpp:616
#7  0xb4f45a42 in nsGenericElement::UnbindFromTree (this=0x819eec8, aDeep=0, 
    aNullParent=1) at ../../../../mozilla/content/base/src/nsGenericElement.cpp:2158
#8  0xb52adab5 in nsXULElement::UnbindFromTree (this=0x819eec8, aDeep=0, aNullParent=1)
    at ../../../../../mozilla/content/xul/content/src/nsXULElement.cpp:832
#9  0xb4ec0f78 in nsAttrAndChildArray::Clear (this=0x819eeb8)
    at ../../../../mozilla/content/base/src/nsAttrAndChildArray.cpp:625
#10 0xb4ebf7b2 in nsAttrAndChildArray::~nsAttrAndChildArray ()
    at ../../../dist/include/content/nsContentUtils.h:962
#11 0xb4f420b3 in ~nsGenericElement (this=0x819eea0)
    at ../../../../mozilla/content/base/src/nsGenericElement.cpp:1174
#12 0xb508f5aa in nsXMLElement::~nsXMLElement$delete ()
    at ../../../mozilla/layout/printing/nsPrintObject.h:66
#13 0xb4f60564 in nsNodeUtils::LastRelease (aNode=0x819eea0)
    at ../../../../mozilla/content/base/src/nsNodeUtils.cpp:250
#14 0xb4f4a018 in nsGenericElement::Release (this=0x819eea0)
    at ../../../../mozilla/content/base/src/nsGenericElement.cpp:3493
#15 0xb508e75e in nsXMLElement::Release (this=0x819eea0)
    at ../../../../../mozilla/content/xml/content/src/nsXMLElement.cpp:99
#16 0xb4c250f8 in ~nsCOMPtr (this=0x81a85f8)
    at ../../../../dist/include/xpcom/nsCOMPtr.h:583
#17 0xb5124521 in ~nsXBLBinding (this=0x81a85f0)
    at ../../../../mozilla/content/xbl/src/nsXBLBinding.cpp:288
#18 0xb4c4bbce in nsXBLBinding::Release (this=0x81a85f0)
    at ../../../../../mozilla/content/xbl/src/nsXBLBinding.h:93
#19 0xb4c4600a in nsRefPtr<nsXBLBinding>::~nsRefPtr ()
    at ../../../dist/include/xpcom/nsCOMArray.h:244
#20 0xb514eaa2 in ~nsBaseHashtableET (this=0x8e84674)
    at ../../../dist/include/xpcom/nsBaseHashtable.h:312
#21 0xb514e736 in nsTHashtable<nsBaseHashtableET<nsISupportsHashKey, nsRefPtr<nsXBLBinding> > >::s_ClearEntry (table=0x8df2694, entry=0x8e84674)
    at ../../../dist/include/xpcom/nsTHashtable.h:391
#22 0xb79c2645 in PL_DHashTableRawRemove (table=0x8df2694, entry=0x8e84674)
    at pldhash.c:693
#23 0xb79c2743 in PL_DHashTableEnumerate (table=0x8df2694, 
    etor=0xb79cc00a <PL_DHashStubEnumRemove(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*)>, arg=0x0) at pldhash.c:727
#24 0xb514daa3 in nsTHashtable<nsBaseHashtableET<nsISupportsHashKey, nsRefPtr<nsXBLBinding> > >::Clear (this=0x8df2694) at ../../../dist/include/xpcom/nsTHashtable.h:245
#25 0xb514cf74 in nsBaseHashtable<nsISupportsHashKey, nsRefPtr<nsXBLBinding>, nsXBLBinding*>::Clear (this=0x8df2694) at ../../../dist/include/xpcom/nsBaseHashtable.h:227
#26 0xb5149283 in nsBindingManager::cycleCollection::Unlink (this=0xb56352dc, 
    p=0x8df2688) at ../../../../mozilla/content/xbl/src/nsBindingManager.cpp:334
#27 0xb7a5eaff in nsCycleCollector::CollectWhite (this=0x80a5b58)
    at ../../../mozilla/xpcom/base/nsCycleCollector.cpp:1572

What I find odd is that we're calling element destructors during Unlink().  Shouldn't we have traversed those and hence be holding strong refs to them while we're unlinking?
Flags: blocking1.9?
Assignee: nobody → peterv
Flags: blocking1.9? → blocking1.9+
Priority: -- → P3
(In reply to comment #1)
> What I find odd is that we're calling element destructors during Unlink(). 
> Shouldn't we have traversed those and hence be holding strong refs to them
> while we're unlinking?

Not necessarily. What's happening is that something holds the cloned element alive, and that element has a binding. The binding manager is in a collectable cycle and so we try to collect it, which destroys the binding and things hanging of of it. The cloned element itself isn't being collected, so we don't keep strong references to things hanging of of it. I'm trying to find out why that element is being kept alive, but we'll also need a more general solution for this out-of-order unlinking. Elements don't keep a strong ref to their binding manager.
So the issue is the 

385 // The hashes keyed on nsIContent are traversed from the nsIContent itself.

comment in the binding manager traverse, right?  If it traversed the binding table, all those things would have refs to them...

Perhaps what we should do is have both the binding manager and the element traverse the binding?  The binding manager could hold two refs to it to make the refcount match the traversal.  Or would that fail for some reason?

Another option is to have nodes hold a strong ref to their ownerDocument.  That would prevent this whole issue, since the binding manager only dies when the document dies, and a document wouldn't die until all the nodes were dead.
Another solution would be to not have one binding manager per document, but rather just a global one. The only practical use we currently have for mDocument is to block and unblock onload.

That would definitely be a bigger change though.
That could have some performance implications, since we currently optimize DOM mutations for the cases when there are no bindings with insertion points associated with the document.
(In reply to comment #3)
> Perhaps what we should do is have both the binding manager and the element
> traverse the binding?  The binding manager could hold two refs to it to make
> the refcount match the traversal.  Or would that fail for some reason?

It wouldn't solve the problem, the element won't be unlinked (there's still an unknown ref to it) but the binding manager will still go away when the document dies.

> Another option is to have nodes hold a strong ref to their ownerDocument.  That
> would prevent this whole issue, since the binding manager only dies when the
> document dies, and a document wouldn't die until all the nodes were dead.

I just tried this, it's going to be hard to do this without leaking.

(In reply to comment #4)
> Another solution would be to not have one binding manager per document, but
> rather just a global one. The only practical use we currently have for
> mDocument is to block and unblock onload.
> 
> That would definitely be a bigger change though.

Yes, a quick look shows a lot of places expecting objects' lifetime to be bound to the binding manager (and thus to the document).
> It wouldn't solve the problem

Why not?  The binding manager will die, yes.  But as it dies it won't kill all those nodes, since during traverse the cycle collector will have taken refs to them.  So we won't get the reentry issue, as far as I can tell...
(In reply to comment #7)
> Why not?  The binding manager will die, yes.  But as it dies it won't kill all
> those nodes, since during traverse the cycle collector will have taken refs to
> them. 

It won't have taken those refs. The cycle collector only takes a strong reference to objects that it tries to unlink. In this case it will unlink the binding manager, but not the binding or anything hanging of of it, because something unknown holds a strong reference to the bound element and the bound element holds a strong reference to the binding (from the cycle collectors perspective).
Oh, hmm.  So it really does sound like we need to properly reflect the ownership model in the cycle collection:  The binding manager owns the binding.  But then we wouldn't collect cycles through the binding until the document goes away, right?

The other option is to change the hashtable enumeration to accumulate the things to destroy and to destroy them once we're not in the hashtable code anymore...
Blocks: 348156
Could we change it so that the nodeinfomanager owns the bindingmanager instead of the document? That way the bindingmanager won't go away until the last element does so.
Flags: tracking1.9+ → blocking1.9+
Priority: P3 → P2
Attached patch v1 (obsolete) — Splinter Review
This fixes the crash, but it still leaks a bunch of stuff on the testcase.
Will this always create a cycle between the nimgr and the document? That seems somewhat scary :(
Attached patch v1.1 (obsolete) — Splinter Review
This one doesn't leak. Turns out we weren't traversing the non-element nodes that are in the mInsertionParentTable hash. That might also fix some of the other leaks.
Attachment #309820 - Attachment is obsolete: true
Attachment #311367 - Flags: superreview?(jonas)
Attachment #311367 - Flags: review?(jonas)
I'm still very worried about the strong reference from the nodeinfomanager to the document. It means that all nodes have strong references to their owner document, and that all nodes in a document create a cycle through the document.

Seems like we are much more likely to leak unless we're really good at adding everything that holds references to nodes to the cycle collector.

What is the purpose of doing this?
Fixing the bug, see last point in comment 3 :-/. I'm personally not too worried about that strong ref.
Doesn't doing comment 10 work?

The issue is that you don't want to tear down the binding manager while there are still bindings alive right?
Attached patch v1.2Splinter Review
Attachment #311367 - Attachment is obsolete: true
Attachment #311883 - Flags: superreview?(jonas)
Attachment #311883 - Flags: review?(jonas)
Attachment #311367 - Flags: superreview?(jonas)
Attachment #311367 - Flags: review?(jonas)
Comment on attachment 311883 [details] [diff] [review]
v1.2

Awesome!
Attachment #311883 - Flags: superreview?(jonas)
Attachment #311883 - Flags: superreview+
Attachment #311883 - Flags: review?(jonas)
Attachment #311883 - Flags: review+
Status: NEW → RESOLVED
Closed: 16 years ago
Flags: in-testsuite+
OS: Mac OS X → All
Hardware: PC → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9
Depends on: 427313
Verified fixed with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008041217 Minefield/3.0pre ID:2008041217 and  Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008041304 Minefield/3.0pre

No Crash on Testcase -> Verified fixed

Note: When i try the testcase from this bug i get a leak report , not sure if this is something is should file a bug for.
Status: RESOLVED → VERIFIED
Tomcat, please do file a new bug and mark it blocking this one.
Depends on: 429003
(In reply to comment #21)
> Tomcat, please do file a new bug and mark it blocking this one.
> 

done (thx peter and jonas), the Leak Bug is now Bug 429003
The cycle collector doesn't exist in 1.8, there's not another aspect of this that applies to the older branch is there? If we don't need the fix on the 1.8 branch we can unhide this bug now.
Flags: wanted1.8.1.x-
(In reply to comment #23)
> The cycle collector doesn't exist in 1.8, there's not another aspect of this
> that applies to the older branch is there?

Nope, this shouldn't affect older branches in any way.
In that case we can un-hide this bug now.
Group: security
We may need to back most of this out if we use the nodeinfomanager-owns-document approach for strong-parent-pointers.
All the code/APIs assumes that nsIDocument::BindingManager() never returns
null. So it will be probably easier to keep binding manager in the document.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: