crash [@ nsContentList::PopulateSelf(unsigned int)]




7 years ago
6 years ago


(Reporter: wsmwk, Assigned: bz)


(Blocks: 1 bug, {crash, topcrash})

Windows Vista
crash, topcrash
Dependency tree / graph
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox9 fixed)


(Whiteboard: [tbird crash][qa-], crash signature)


(1 attachment, 1 obsolete attachment)

crash [@ nsContentList::PopulateSelf(unsigned int)]
only  wndows OS matches this stack.
Mac crash bp-e2dff7a3-62e4-4276-ac1b-442502101218 doesn't match up

0	xul.dll	nsContentList::PopulateSelf	content/base/src/nsContentList.cpp:868
1	xul.dll	nsContentList::BringSelfUpToDate	content/base/src/nsContentList.cpp:933
2	xul.dll	nsContentList::IndexOf	content/base/src/nsContentList.cpp:526
3	xul.dll	nsContentUtils::GenerateStateKey	content/base/src/nsContentUtils.cpp:2137
4	xul.dll	nsDocument::GetLayoutHistoryState	content/base/src/nsDocument.cpp:7109
5	xul.dll	nsGenericHTMLElement::GetPrimaryPresState	content/html/content/src/nsGenericHTMLElement.cpp:1368
6		@0x8	
7		@0x3e333330	
8	mozcrt19.dll	arena_dalloc	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:4284
9	mozcrt19.dll	free	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:6121
10	xul.dll	ReleaseData	xpcom/string/src/nsSubstring.cpp:122
11	xul.dll	nsHTMLInputElement::SaveState	content/html/content/src/nsHTMLInputElement.cpp:3205
12	xul.dll	nsGenericElement::UnbindFromTree	content/base/src/nsGenericElement.cpp:3045
13	xul.dll	nsGenericHTMLFormElement::UnbindFromTree	content/html/content/src/nsGenericHTMLElement.cpp:2483
14	xul.dll	nsHTMLInputElement::UnbindFromTree	content/html/content/src/nsHTMLInputElement.cpp:2543
15	xul.dll	nsGenericElement::UnbindFromTree	content/base/src/nsGenericElement.cpp:3041
That stack is totally broken.  free() should not be calling GetPrimaryPresState; no way, nohow.

Are there steps to reproduce here?
(In reply to comment #1)
> That stack is totally broken.  free() should not be calling
> GetPrimaryPresState; no way, nohow.
> Are there steps to reproduce here?

bp-4d84a8ed-c910-445e-b1ee-357b12101128 (kyle)
From what I can recall, I had the two websites 'Facebook' ( and 'FacePunch Studios Forums' ( open in two tabs minimised in the windows taskbar. While I was checking on the status of a download via Valve Software's Steam application,  I utilised the 'tab preview' to open the Facepunch website by hovering over Firefox in the taskbar and selecting Facepunch studios, Firefox crashed upon the enlargement of

I've asked Kyle if he can reproduce.
most recent Thunderbird crash found is 

topcrash for firefox 4.0b9 (#168)

firefox bp-f0399cbc-467c-412c-8860-8f2d32110126
0	xul.dll	nsContentList::PopulateSelf	content/base/src/nsContentList.cpp:872
1	xul.dll	nsContentList::BringSelfUpToDate	content/base/src/nsContentList.cpp:933
2	xul.dll	nsContentList::GetLength	content/base/src/nsContentList.cpp:553
3	xul.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
4	xul.dll	XPC_WN_GetterSetter	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1643
5	mozjs.dll	js::Invoke	js/src/jsinterp.cpp:700
6	mozjs.dll	js::ExternalInvoke	js/src/jsinterp.cpp:858
7	mozjs.dll	js::JSProxyHandler::call	js/src/jsproxy.cpp:248
8	mozjs.dll	JSCrossCompartmentWrapper::call	js/src/jswrapper.cpp:601
9	mozjs.dll	js::JSProxy::call	js/src/jsproxy.cpp:810
10	mozjs.dll	js::proxy_Call	js/src/jsproxy.cpp:1062
11	mozjs.dll	js::Invoke	js/src/jsinterp.cpp:693
12	mozjs.dll	js::ExternalInvoke	js/src/jsinterp.cpp:858
Keywords: topcrash
Whiteboard: [tbird crash]
error, error!

Thunderbird bp-dad39a2a-fb6d-4479-a430-7ea072101228 should not have been cited in comment 3

Thunderbird bp-ef190ff7-f9a0-42ff-8e39-f70292110103 is better
1	thunderbird.exe	nsContentList::PopulateSelf	content/base/src/nsContentList.cpp:870
2	thunderbird.exe	nsContentList::Item	content/base/src/nsContentList.cpp:406
3	thunderbird.exe	nsContentList::GetNodeAt	content/base/src/nsContentList.cpp:522
4	thunderbird.exe	nsIDOMNodeList_Item	objdir-tb/mozilla/js/src/xpconnect/src/dom_quickstubs.cpp:4507
5	js3250.dll	js_Interpret	js/src/jsops.cpp:2208
6	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1368
7	js3250.dll	js_InternalInvoke	js/src/jsinterp.cpp:1423
8	js3250.dll	js_InternalGetOrSet	js/src/jsinterp.cpp:1486
9	js3250.dll	js_GetSprop	js/src/jsscope.h:602
10	js3250.dll	js_NativeGet	js/src/jsobj.cpp:4117
11	js3250.dll	js_GetPropertyHelper	js/src/jsobj.cpp:4275
Crash Signature: [@ nsContentList::PopulateSelf(unsigned int)]
barely any crashes for thunderbird
still a firefox topcrash - #110 for v6. no correlations listed.
Fennec is also seeing this:
Fennec 7.0a2 Crash Report [@ nsContentList::PopulateSelf ]

Frame 	Module 	Signature [Expand] 	Source
0 	nsContentList::PopulateSelf 	nsINode.h:1211
1 	nsContentList::BringSelfUpToDate 	content/base/src/nsContentList.cpp:972
2 	nsContentList::Length 	nsVoidArray.h:63
3 	nsContentList::GetLength 	content/base/src/nsContentList.cpp:595
4 	nsIDOMNodeList_GetLength 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:7839
5 	js_GetProperty 	js/src/jscntxtinlines.h:339
6 	js_GetLengthProperty 	js/src/jsarray.cpp:199
7 	array_slice 	js/src/jsarray.cpp:2549
9 	array_slice 	js/src/jsarray.cpp:2537
10 	js::mjit::JaegerShot 	js/src/jscntxt.h:2040
11 	js::Interpret 	js/src/jsinterp.cpp:4125
12 	js::Invoke 	js/src/jsinterp.cpp:613
13 	js::mjit::stubs::SlowCall 	js/src/methodjit/InvokeHelpers.cpp:186
14 	SlowCallFromIC 	js/src/methodjit/MonoIC.cpp:544
16 	SlowCallFromIC 	js/src/methodjit/MonoIC.h:71

Only Stack Crash :
Both the Fennec line numbers correspond to end-of-function... is that expected?

Comment 8

6 years ago
This has been rising on Firefox in the last days, see for more reports

Comment 9

6 years ago
2. Null deref Crash winxp/win7 Beta/7

+		this	0x00000000 {mNodeInfo={...} mParent=??? mFlags=??? ...}	const nsINode * const

>	xul.dll!nsINode::GetNextSibling()  Line 1090 + 0xa bytes	C++
 	xul.dll!nsINode::GetNextNode(const nsINode * aRoot=0x068fd638)  Line 1121 + 0x8 bytes	C++
 	xul.dll!nsContentList::PopulateSelf(unsigned int aNeededLength=4294967295)  Line 900 + 0xf bytes	C++
 	xul.dll!nsContentList::BringSelfUpToDate(int aDoFlush=1)  Line 969	C++
 	xul.dll!nsContentList::IndexOf(nsIContent * aContent=0x07b69eb0, int aDoFlush=1)  Line 570	C++
 	xul.dll!nsContentUtils::GenerateStateKey(nsIContent * aContent=0x07b69eb0, const nsIDocument * aDocument=0x068fd638, nsIStatefulFrame::SpecialStateID aID=eNoID, nsACString_internal & aKey={...})  Line 1928 + 0x13 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetLayoutHistoryAndKey(nsGenericHTMLElement * aContent=0x07b69eb0, int aRead=0, nsILayoutHistoryState * * aHistory=0x001288b8, nsACString_internal & aKey={...})  Line 1437 + 0x18 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetPrimaryPresState(nsGenericHTMLElement * aContent=0x07b69eb0, nsPresState * * aPresState=0x001289f0)  Line 1386 + 0x27 bytes	C++
 	xul.dll!nsHTMLSelectElement::SaveState()  Line 1625 + 0x10 bytes	C++
 	xul.dll!nsGenericHTMLFormElement::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 2560	C++
 	xul.dll!nsHTMLSelectElement::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 1343	C++
 	xul.dll!nsGenericElement::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 2986	C++
 	xul.dll!nsStyledElementNotElementCSSInlineStyle::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 249	C++


low volume

high volume
Blocks: 532972

Comment 10

6 years ago
Correlations for Firefox 6.0.2 Windows NT (excerpt):

58% (104/178) vs.   4% (5176/118254) avgcfgx.dll
58% (104/178) vs.   4% (5234/118254) avglngx.dll
58% (104/178) vs.   5% (5610/118254) avgxpl.dll
58% (104/178) vs.   5% (5638/118254) avglogx.dll
57% (102/178) vs.   4% (5314/118254) avgssff6.dll
27% (48/178) vs.   3% (3651/118254) avgcslx.dll
22% (39/178) vs.   2% (2380/118254) avgtbapi.dll
22% (39/178) vs.   2% (2499/118254) IGeared_tavgp_xputils6.dll
22% (39/178) vs.   2% (2500/118254) xpavgtbapi6.dll

54% (97/178) vs.   6% (6546/118254) {1E73965B-8B48-48be-9C8D-68B920ABC1C4} (AVG Safe Search)
22% (39/178) vs.   3% (3499/118254) avg@igeared (AVG Security Toolbar)

This suggests that this is way more common for people who are using AVG security tools.

Comment 11

6 years ago
This is now #11 on Firefox 8 topcrashes, while it is much lower on 7. While it doesn't have significant correlations on 7, it has some on 8.

Correlations for Firefox 8.0 Windows NT (excerpt):
55% (129/235) vs.   0% (129/27052) ZDNdrv50.dll
55% (129/235) vs.   0% (129/27052) ZDNui50.dll
65% (153/235) vs.  15% (4085/27052) msctfime.ime
29% (68/235) vs.   0% (68/27052) zsdepl.dll
29% (68/235) vs.   0% (68/27052) zsdeplui.dll
29% (68/235) vs.   0% (68/27052) htui.dll
29% (68/235) vs.   1% (176/27052) hccutils.dll

49% (114/235) vs.   2% (475/27052) {81BF1D23-5F17-408D-AC6B-BD6DF7CAF670} (iMacros for Firefox,
14% (34/235) vs.   1% (390/27052) (Xmarks (formerly Foxmarks),
11% (26/235) vs.   0% (26/27052) {94f46f7f-fad4-4f32-a406-eac23df4d9a2}

This crash looks very much to me as if we have a bug in our code there that gets triggered and elevated by some kinds of add-ons.
Stacks are just so broken :(

Comment 13

6 years ago
Here there's a list with all recent crashes with this signature:
A search on crash-stats tells me we had 1 crash with that signature in the last week in 10.0a1 Nightly and 6 crashes in 9.0a2 Aurora.
###!!! ASSERTION: aRoot not an ancestor of |this|?: 'cur', file c:\dev\mozilla-b
eta\obj-i686-pc-mingw32\dist\include\nsINode.h, line 1133

>	xul.dll!RealBreak()  Line 422	C++
 	xul.dll!Break(const char * aMsg)  Line 521	C++
 	xul.dll!NS_DebugBreak_P(unsigned int aSeverity, const char * aStr, const char * aExpr, const char * aFile, int aLine)  Line 380 + 0xc bytes	C++
 	xul.dll!nsINode::GetNextNodeImpl(const nsINode * aRoot, const int aSkipChildren)  Line 1133 + 0x23 bytes	C++
 	xul.dll!nsINode::GetNextNode(const nsINode * aRoot)  Line 1107	C++
 	xul.dll!nsContentList::PopulateSelf(unsigned int aNeededLength)  Line 900 + 0xf bytes	C++
 	xul.dll!nsContentList::BringSelfUpToDate(int aDoFlush)  Line 969	C++
 	xul.dll!nsContentList::IndexOf(nsIContent * aContent, int aDoFlush)  Line 570	C++
 	xul.dll!nsContentUtils::GenerateStateKey(nsIContent * aContent, const nsIDocument * aDocument, nsIStatefulFrame::SpecialStateID aID, nsACString_internal & aKey)  Line 1920 + 0x13 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetLayoutHistoryAndKey(nsGenericHTMLElement * aContent, int aRead, nsILayoutHistoryState * * aHistory, nsACString_internal & aKey)  Line 1527 + 0x18 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetPrimaryPresState(nsGenericHTMLElement * aContent, nsPresState * * aPresState)  Line 1476 + 0x27 bytes	C++
 	xul.dll!nsHTMLSelectElement::SaveState()  Line 1649 + 0x10 bytes	C++
 	xul.dll!nsGenericHTMLFormElement::UnbindFromTree(int aDeep, int aNullParent)  Line 2661	C++

... more unbinding stuff
> Here there's a list with all recent crashes with this signature

Thanks for that.  So I see two distinct kinds of crashes in that list:

1)  More common: Crash at address 0x18 == 24 (all of these on Windows)
2)  Everything else

Looking at an instance of the 0x18, it has, as usual, a pretty bogus stack near the top, and is claimed to be on this line in nsContentList.cpp:

   900      cur = cur->GetNextNode(mRootNode);

Importantly, the bottom of the stack is CC unlinking.

24 is the offset of mNextSibling in nsINode in a 32-bit build.

So presumably we're trying to get mNextSibling on a null pointer.

Looking at GetNextNode, it basically works like this:

  1)  Check for first child; if there return it.
  2)  Check for next sibling; if there return it.
  3)  Walk up the parent chain until we hit aRoot, checking next siblings.

There is no test for the parent chain hitting null before we hit aRoot, because that's not supposed to ever happen.  But I would bet that it's happening in this case.

It's hard to tell exactly what the exact order of things is that gets us to that situation because of the broken stack, but if I take at face value the parts of the stack that make sense it looks like nsGenericElement::cycleCollection::Unlink calls UnbindFromTree on a <body>, which recursively calls UnbindFromTree on its descendants and at some point nsGenericHTMLFormElement::UnbindFromTree calls SaveState() which calls GetPrimaryPresState(), which ends up generating a state key, which needs to walk the DOM.  And presumably Unlink does not dispatch ContentRemoved notifications before unbinding, so it may be possible for us to end up in a state where the nodelist has references to nodes in the tree being unlinked....

Let me see if I can write a testcase.
Yeah, comment 14 is exactly what would result from the scenario comment 15 describes.  Kyle, how did you get that to happen?
I loaded the URL from comment 9 in a debug beta build.
So another note: since nodes hold strong refs to parents and nodelists hold strong refs to their nodes, this can only get triggered if the nodelist in question (probably document.forms in this case) is part of the set of things being unlinked.

If we had some way to know in the nodelist code that this is the case, we could just bail out.  If we had some way to know in nsGenericHTMLFormElement::UnbindFromTree that we don't need to save state in this case (is that even true?) we could bail out of there.

Alternately, we could add a basic sanity check to PopulateSelf that the starting node is a descendant of mRoot, I guess.  Or add an extra null-check to GetNextNode, but I'd sort of like to avoid that.
(In reply to Boris Zbarsky (:bz) from comment #18)
> So another note: since nodes hold strong refs to parents ...

That's not true on 8 though is it?
> I loaded the URL from comment 9 in a debug beta build.

Hmm.  I'd tried nightly and didn't get that.  But I do see it in beta.

So in that case, what happens is that |count| in PopulateSelf is 0.  But mRootNode is the document, while |cur| is nsHTMLTableElement.  And cur->mParent is in fact null.

So we managed to walk into a subtree that's already been unbound.  This is different from the scenario I was thinking of in comment 18 (where count != 0 and we start with mElements[count - 1] which happens to be in a subtree which is partially unbound)!

And the reason _that_ can happen is that the nsGenericElement unlink method unbinds all the child nodes _before_ removing them from mAttrsAndChildren.  This is just wrong.  All other places unbind after removing from the child list, and code relies on that.  Looking now to see why this code is written this way.
Oh, and the situation I was worried about can also arise, I believe: we don't disconnect the content list until NodeWillBeDestroyed, which is called from nsNodeUtils::LastRelease.  I wonder whether we should call NodeWillBeDestroyed from unlink as well?
LastRelease/NodeWillBeDestroyed will be called when CC releases the node. Or are you thinking something else?
Blame dates back to CVS.  And lots of the changes are duplicated between hg and CVS, sigh...

In any case, the unbind-before-remove was added in:

3.603 <> 2007-11-29 00:41
Bug 348156: Fix leaks by relying on cycle collector rather than calling UnbindFromTree on all nodes. r/sr=jst

I'm going to try reordering this to make sure that tree state is self-consistent by the time UnbindFromTree happens and see how that goes.

For the case I'd been worrying about, the concern is that a content list has X as mRootNode and both X and the content list are getting unlinked.  X gets unlinked first, so it removes all its kids but does NOT send notifications.  One of the kids getting unbound tries to SaveState lands in content list code, the content list starts walking from the last kid it has (which can be non-null because the content list has not been unlinked yet), and is suddenly operating on nodes that are not descendants of mRootNode...  Jonas, Peter, Olli, can you think of any reason that _can't_ happen?
Blocks: 348156
OK, reordering the unbind and remove calls fixes at least the comment 9 testcase for me.
Created attachment 570928 [details] [diff] [review]
Make sure, when unlinking, to remove the kids from our child list before unbinding them.

Olli, how do you feel about this for branches?  The patch applies as-is to m-c and beta, so probably also to aurora... but I doubt we want it for beta at this point.  Taking it on Aurora might be a good thing, though.
Attachment #570928 - Flags: review?(Olli.Pettay)
Taking this.  I'll file a followup on the end of comment 23 once this lands, unless I can convince myself that situation can't arise.
Assignee: nobody → bzbarsky
Priority: -- → P1
Comment on attachment 570928 [details] [diff] [review]
Make sure, when unlinking, to remove the kids from our child list before unbinding them.

># HG changeset patch
># User Boris Zbarsky <>
># Date 1320122481 14400
># Node ID 3dbd25f695f125f7d4aec8de19ee17944a4354a5
># Parent  c8d7bd7d9c0afa4a5c308a9386e465a738510361
>Bug 620122.  Make sure, when unlinking, to remove the kids from our child list before unbinding them.  r=smaug
>diff --git a/content/base/src/nsGenericElement.cpp b/content/base/src/nsGenericElement.cpp
>--- a/content/base/src/nsGenericElement.cpp
>+++ b/content/base/src/nsGenericElement.cpp
>@@ -4216,20 +4216,29 @@ NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(ns
>     PRUint32 childCount = tmp->mAttrsAndChildren.ChildCount();
>     if (childCount) {
>       // Don't allow script to run while we're unbinding everything.
>       nsAutoScriptBlocker scriptBlocker;
>       while (childCount-- > 0) {
>         // Once we have XPCOMGC we shouldn't need to call UnbindFromTree.
>         // We could probably do a non-deep unbind here when IsInDoc is false
>         // for better performance.
>-        tmp->mAttrsAndChildren.ChildAt(childCount)->UnbindFromTree();
>+        // Hold a strong ref to the node when we remove it, because we may be
>+        // the last reference to it.  We need to call RemoveChildAt() and
>+        // update mFirstChild before calling UnbindFromTree, since this last
>+        // can notify various observers and they should really see consistent
>+        // tree state.
>+        nsCOMPtr<nsIContent> child = tmp->mAttrsAndChildren.ChildAt(childCount);
>         tmp->mAttrsAndChildren.RemoveChildAt(childCount);
>+        if (childCount == 0) {
>+          tmp->mFirstChild = nsnull;
>+        }
>+        child->UnbindFromTree();
>       }
>-      tmp->mFirstChild = nsnull;
>     }
I really wish you could add
already_AddRefed<nsIContent> nsAttrAndChildArray::TakeChildAt
so that we don't need to do extra addref/release.
Attachment #570928 - Flags: review?(bugs) → review-
I can do that, sure.
Created attachment 571132 [details] [diff] [review]
With TakeChildAt
Attachment #571132 - Flags: review?
Attachment #570928 - Attachment is obsolete: true
Whiteboard: [tbird crash] → [need review][tbird crash]
Attachment #571132 - Flags: review? → review?(bugs)


6 years ago
Attachment #571132 - Flags: review?(bugs) → review+
This signature showed up on the Firefox 8 explosive report today: Farmville was mentioned in one of the comments.

This should significantly reduce the frequency of this crash, at the very least.

I've filed bug 699321 on the last part of comment 23.
Flags: in-testsuite?
Whiteboard: [need review][tbird crash] → [tbird crash]
Target Milestone: --- → mozilla10
Comment on attachment 571132 [details] [diff] [review]
With TakeChildAt

Requesting aurora and beta approval.  This should be a pretty safe patch that just makes unlinking do the child node removal steps in the same order as every single other callsite that removes child nodes.  And it fixes what's apparently a common crash.
Attachment #571132 - Flags: approval-mozilla-beta?
Attachment #571132 - Flags: approval-mozilla-aurora?
Last Resolved: 6 years ago
Resolution: --- → FIXED

Comment 34

6 years ago
Comment on attachment 571132 [details] [diff] [review]
With TakeChildAt

[triage comment]
Too late for Firefox 8 beta, but we'll approve it for Firefox 9 aurora. Please land asap.
Attachment #571132 - Flags: approval-mozilla-beta?
Attachment #571132 - Flags: approval-mozilla-beta-
Attachment #571132 - Flags: approval-mozilla-aurora?
Attachment #571132 - Flags: approval-mozilla-aurora+
status-firefox9: --- → fixed
Does this crash affect Firefox, or is it Thunderbird only?

Comment 37

6 years ago
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #36)
> Does this crash affect Firefox, or is it Thunderbird only?

As I said in comment #8 and comment #11, it happens in Firefox as well - and confirms that.
Thanks Robert. Is there a testcase QA can use to verify this fix in Firefox?
Whiteboard: [tbird crash] → [tbird crash][qa?]
Not that I know of.  But you can look at crash frequencies in crash-stats, right?
in fact in thunderbird it is quite rare (comment 5).  only 28 crashes in 12 weeks for tb6,7,8. So it may be at least a month after dec 20 release to confirm the problem is gone from version 9
I'm seeing 47 reports with this signature on Firefox 9.0b4...
Bug 699321 may also cause crashes with this signature.
(In reply to Boris Zbarsky (:bz) from comment #42)
> Bug 699321 may also cause crashes with this signature.

Is there a way I can verify these crashes are being produced by that bug and not this one?
Not that I know of.

What you can do is examine crash frequencies and see whether there's any change, I guess. 

The thing we should really do is fix that bug and then see whether the crash is gone altogether.
I'm going to call this qa- and we can handle the broad verification on bug 699321.
Whiteboard: [tbird crash][qa?] → [tbird crash][qa-]
You need to log in before you can comment on or make changes to this bug.