Last Comment Bug 620122 - crash [@ nsContentList::PopulateSelf(unsigned int)]
: crash [@ nsContentList::PopulateSelf(unsigned int)]
Status: RESOLVED FIXED
[tbird crash][qa-]
: crash, topcrash
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: x86 Windows Vista
: P1 critical (vote)
: mozilla10
Assigned To: Boris Zbarsky [:bz] (still a bit busy)
:
: Andrew Overholt [:overholt]
Mentors:
Depends on:
Blocks: 532972 348156
  Show dependency treegraph
 
Reported: 2010-12-18 08:49 PST by Wayne Mery (:wsmwk, NI for questions)
Modified: 2011-12-07 14:33 PST (History)
14 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
Make sure, when unlinking, to remove the kids from our child list before unbinding them. (1.80 KB, patch)
2011-10-31 21:42 PDT, Boris Zbarsky [:bz] (still a bit busy)
bugs: review-
Details | Diff | Splinter Review
With TakeChildAt (4.26 KB, patch)
2011-11-01 13:43 PDT, Boris Zbarsky [:bz] (still a bit busy)
bugs: review+
christian: approval‑mozilla‑aurora+
christian: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description Wayne Mery (:wsmwk, NI for questions) 2010-12-18 08:49:02 PST
crash [@ nsContentList::PopulateSelf(unsigned int)]
only  wndows OS matches this stack.
Mac crash bp-e2dff7a3-62e4-4276-ac1b-442502101218 doesn't match up

bp-e88ee015-44cc-4953-a96d-2ce092101211
EXCEPTION_ACCESS_VIOLATION_READ
0x14
0	xul.dll	nsContentList::PopulateSelf	content/base/src/nsContentList.cpp:868
1	xul.dll	nsContentList::BringSelfUpToDate	content/base/src/nsContentList.cpp:933
2	xul.dll	nsContentList::IndexOf	content/base/src/nsContentList.cpp:526
3	xul.dll	nsContentUtils::GenerateStateKey	content/base/src/nsContentUtils.cpp:2137
4	xul.dll	nsDocument::GetLayoutHistoryState	content/base/src/nsDocument.cpp:7109
5	xul.dll	nsGenericHTMLElement::GetPrimaryPresState	content/html/content/src/nsGenericHTMLElement.cpp:1368
6		@0x8	
7		@0x3e333330	
8	mozcrt19.dll	arena_dalloc	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:4284
9	mozcrt19.dll	free	obj-firefox/memory/jemalloc/crtsrc/jemalloc.c:6121
10	xul.dll	ReleaseData	xpcom/string/src/nsSubstring.cpp:122
11	xul.dll	nsHTMLInputElement::SaveState	content/html/content/src/nsHTMLInputElement.cpp:3205
12	xul.dll	nsGenericElement::UnbindFromTree	content/base/src/nsGenericElement.cpp:3045
13	xul.dll	nsGenericHTMLFormElement::UnbindFromTree	content/html/content/src/nsGenericHTMLElement.cpp:2483
14	xul.dll	nsHTMLInputElement::UnbindFromTree	content/html/content/src/nsHTMLInputElement.cpp:2543
15	xul.dll	nsGenericElement::UnbindFromTree	content/base/src/nsGenericElement.cpp:3041
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2010-12-22 13:08:44 PST
That stack is totally broken.  free() should not be calling GetPrimaryPresState; no way, nohow.

Are there steps to reproduce here?
Comment 2 Wayne Mery (:wsmwk, NI for questions) 2010-12-29 13:27:51 PST
(In reply to comment #1)
> That stack is totally broken.  free() should not be calling
> GetPrimaryPresState; no way, nohow.
> 
> Are there steps to reproduce here?

bp-4d84a8ed-c910-445e-b1ee-357b12101128 (kyle)
From what I can recall, I had the two websites 'Facebook' (www.facebook.com) and 'FacePunch Studios Forums' (www.facepunch.com) open in two tabs minimised in the windows taskbar. While I was checking on the status of a download via Valve Software's Steam application,  I utilised the 'tab preview' to open the Facepunch website by hovering over Firefox in the taskbar and selecting Facepunch studios, Firefox crashed upon the enlargement of www.facepunch.com.

I've asked Kyle if he can reproduce.
Comment 3 Wayne Mery (:wsmwk, NI for questions) 2011-01-26 12:30:11 PST
most recent Thunderbird crash found is 
bp-dad39a2a-fb6d-4479-a430-7ea072101228
EXCEPTION_ACCESS_VIOLATION_READ
0x0


topcrash for firefox 4.0b9 (#168)

firefox bp-f0399cbc-467c-412c-8860-8f2d32110126
EXCEPTION_ACCESS_VIOLATION_READ
0x11
0	xul.dll	nsContentList::PopulateSelf	content/base/src/nsContentList.cpp:872
1	xul.dll	nsContentList::BringSelfUpToDate	content/base/src/nsContentList.cpp:933
2	xul.dll	nsContentList::GetLength	content/base/src/nsContentList.cpp:553
3	xul.dll	NS_InvokeByIndex_P	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
4	xul.dll	XPC_WN_GetterSetter	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1643
5	mozjs.dll	js::Invoke	js/src/jsinterp.cpp:700
6	mozjs.dll	js::ExternalInvoke	js/src/jsinterp.cpp:858
7	mozjs.dll	js::JSProxyHandler::call	js/src/jsproxy.cpp:248
8	mozjs.dll	JSCrossCompartmentWrapper::call	js/src/jswrapper.cpp:601
9	mozjs.dll	js::JSProxy::call	js/src/jsproxy.cpp:810
10	mozjs.dll	js::proxy_Call	js/src/jsproxy.cpp:1062
11	mozjs.dll	js::Invoke	js/src/jsinterp.cpp:693
12	mozjs.dll	js::ExternalInvoke	js/src/jsinterp.cpp:858
Comment 4 Wayne Mery (:wsmwk, NI for questions) 2011-01-26 12:36:07 PST
error, error!

Thunderbird bp-dad39a2a-fb6d-4479-a430-7ea072101228 should not have been cited in comment 3

Thunderbird bp-ef190ff7-f9a0-42ff-8e39-f70292110103 is better
EXCEPTION_ACCESS_VIOLATION_EXEC
0x4c25e
1	thunderbird.exe	nsContentList::PopulateSelf	content/base/src/nsContentList.cpp:870
2	thunderbird.exe	nsContentList::Item	content/base/src/nsContentList.cpp:406
3	thunderbird.exe	nsContentList::GetNodeAt	content/base/src/nsContentList.cpp:522
4	thunderbird.exe	nsIDOMNodeList_Item	objdir-tb/mozilla/js/src/xpconnect/src/dom_quickstubs.cpp:4507
5	js3250.dll	js_Interpret	js/src/jsops.cpp:2208
6	js3250.dll	js_Invoke	js/src/jsinterp.cpp:1368
7	js3250.dll	js_InternalInvoke	js/src/jsinterp.cpp:1423
8	js3250.dll	js_InternalGetOrSet	js/src/jsinterp.cpp:1486
9	js3250.dll	js_GetSprop	js/src/jsscope.h:602
10	js3250.dll	js_NativeGet	js/src/jsobj.cpp:4117
11	js3250.dll	js_GetPropertyHelper	js/src/jsobj.cpp:4275
Comment 5 Wayne Mery (:wsmwk, NI for questions) 2011-07-26 08:30:16 PDT
barely any crashes for thunderbird
still a firefox topcrash - #110 for v6. no correlations listed.
Comment 6 Naoki Hirata :nhirata (please use needinfo instead of cc) 2011-08-08 17:28:57 PDT
Fennec is also seeing this:
Fennec 7.0a2 Crash Report [@ nsContentList::PopulateSelf ] 
https://crash-stats.mozilla.com/report/index/35d1def5-1890-4502-9734-f68f12110801

Frame 	Module 	Signature [Expand] 	Source
0 	libxul.so 	nsContentList::PopulateSelf 	nsINode.h:1211
1 	libxul.so 	nsContentList::BringSelfUpToDate 	content/base/src/nsContentList.cpp:972
2 	libxul.so 	nsContentList::Length 	nsVoidArray.h:63
3 	libxul.so 	nsContentList::GetLength 	content/base/src/nsContentList.cpp:595
4 	libxul.so 	nsIDOMNodeList_GetLength 	obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:7839
5 	libxul.so 	js_GetProperty 	js/src/jscntxtinlines.h:339
6 	libxul.so 	js_GetLengthProperty 	js/src/jsarray.cpp:199
7 	libxul.so 	array_slice 	js/src/jsarray.cpp:2549
8 	libxul.so 	libxul.so@0xb27c52 	
9 	libxul.so 	array_slice 	js/src/jsarray.cpp:2537
10 	libxul.so 	js::mjit::JaegerShot 	js/src/jscntxt.h:2040
11 	libxul.so 	js::Interpret 	js/src/jsinterp.cpp:4125
12 	libxul.so 	js::Invoke 	js/src/jsinterp.cpp:613
13 	libxul.so 	js::mjit::stubs::SlowCall 	js/src/methodjit/InvokeHelpers.cpp:186
14 	libxul.so 	SlowCallFromIC 	js/src/methodjit/MonoIC.cpp:544
15 	libxul.so 	libxul.so@0xb27c52 	
16 	libxul.so 	SlowCallFromIC 	js/src/methodjit/MonoIC.h:71

Only Stack Crash :
https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2011-08-08%2001%3A00%3A00&signature=nsContentList%3A%3APopulateSelf&version=Fennec%3A7.0a2
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2011-08-08 19:49:57 PDT
Both the Fennec line numbers correspond to end-of-function... is that expected?
Comment 8 Robert Kaiser 2011-09-19 09:52:37 PDT
This has been rising on Firefox in the last days, see https://crash-stats.mozilla.com/report/list?signature=nsContentList%3A%3APopulateSelf%28unsigned%20int%29 for more reports
Comment 9 Bob Clary [:bc:] 2011-09-19 18:46:02 PDT
1. http://users.skynet.be/crazysweeper/partitions/stratovarius.xml
2. Null deref Crash winxp/win7 Beta/7

+		this	0x00000000 {mNodeInfo={...} mParent=??? mFlags=??? ...}	const nsINode * const

>	xul.dll!nsINode::GetNextSibling()  Line 1090 + 0xa bytes	C++
 	xul.dll!nsINode::GetNextNode(const nsINode * aRoot=0x068fd638)  Line 1121 + 0x8 bytes	C++
 	xul.dll!nsContentList::PopulateSelf(unsigned int aNeededLength=4294967295)  Line 900 + 0xf bytes	C++
 	xul.dll!nsContentList::BringSelfUpToDate(int aDoFlush=1)  Line 969	C++
 	xul.dll!nsContentList::IndexOf(nsIContent * aContent=0x07b69eb0, int aDoFlush=1)  Line 570	C++
 	xul.dll!nsContentUtils::GenerateStateKey(nsIContent * aContent=0x07b69eb0, const nsIDocument * aDocument=0x068fd638, nsIStatefulFrame::SpecialStateID aID=eNoID, nsACString_internal & aKey={...})  Line 1928 + 0x13 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetLayoutHistoryAndKey(nsGenericHTMLElement * aContent=0x07b69eb0, int aRead=0, nsILayoutHistoryState * * aHistory=0x001288b8, nsACString_internal & aKey={...})  Line 1437 + 0x18 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetPrimaryPresState(nsGenericHTMLElement * aContent=0x07b69eb0, nsPresState * * aPresState=0x001289f0)  Line 1386 + 0x27 bytes	C++
 	xul.dll!nsHTMLSelectElement::SaveState()  Line 1625 + 0x10 bytes	C++
 	xul.dll!nsGenericHTMLFormElement::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 2560	C++
 	xul.dll!nsHTMLSelectElement::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 1343	C++
 	xul.dll!nsGenericElement::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 2986	C++
 	xul.dll!nsStyledElementNotElementCSSInlineStyle::UnbindFromTree(int aDeep=1, int aNullParent=0)  Line 249	C++

bp-1b5cd8bb-aa45-4d0f-8612-b83022110919

low volume
https://crash-stats.mozilla.com/query/query?do_query=1&query_type=contains&query=nsINode%3A%3AGetNextSibling

high volume
https://crash-stats.mozilla.com/query/query?do_query=1&query_type=contains&query=nsContentList%3A%3APopulateSelf
Comment 10 Robert Kaiser 2011-09-20 17:50:08 PDT
Correlations for Firefox 6.0.2 Windows NT (excerpt):

58% (104/178) vs.   4% (5176/118254) avgcfgx.dll
58% (104/178) vs.   4% (5234/118254) avglngx.dll
58% (104/178) vs.   5% (5610/118254) avgxpl.dll
58% (104/178) vs.   5% (5638/118254) avglogx.dll
57% (102/178) vs.   4% (5314/118254) avgssff6.dll
27% (48/178) vs.   3% (3651/118254) avgcslx.dll
22% (39/178) vs.   2% (2380/118254) avgtbapi.dll
22% (39/178) vs.   2% (2499/118254) IGeared_tavgp_xputils6.dll
22% (39/178) vs.   2% (2500/118254) xpavgtbapi6.dll

54% (97/178) vs.   6% (6546/118254) {1E73965B-8B48-48be-9C8D-68B920ABC1C4} (AVG Safe Search)
22% (39/178) vs.   3% (3499/118254) avg@igeared (AVG Security Toolbar)

This suggests that this is way more common for people who are using AVG security tools.
Comment 11 Robert Kaiser 2011-10-31 07:18:08 PDT
This is now #11 on Firefox 8 topcrashes, while it is much lower on 7. While it doesn't have significant correlations on 7, it has some on 8.

Correlations for Firefox 8.0 Windows NT (excerpt):
Modules:
55% (129/235) vs.   0% (129/27052) ZDNdrv50.dll
55% (129/235) vs.   0% (129/27052) ZDNui50.dll
65% (153/235) vs.  15% (4085/27052) msctfime.ime
29% (68/235) vs.   0% (68/27052) zsdepl.dll
29% (68/235) vs.   0% (68/27052) zsdeplui.dll
29% (68/235) vs.   0% (68/27052) htui.dll
29% (68/235) vs.   1% (176/27052) hccutils.dll

Add-ons:
49% (114/235) vs.   2% (475/27052) {81BF1D23-5F17-408D-AC6B-BD6DF7CAF670} (iMacros for Firefox, https://addons.mozilla.org/addon/3863)
14% (34/235) vs.   1% (390/27052) foxmarks@kei.com (Xmarks (formerly Foxmarks), https://addons.mozilla.org/addon/2410)
11% (26/235) vs.   0% (26/27052) {94f46f7f-fad4-4f32-a406-eac23df4d9a2}


This crash looks very much to me as if we have a bug in our code there that gets triggered and elevated by some kinds of add-ons.
Comment 12 Olli Pettay [:smaug] 2011-10-31 07:30:59 PDT
Stacks are just so broken :(
Comment 13 Robert Kaiser 2011-10-31 07:43:52 PDT
Here there's a list with all recent crashes with this signature: https://crash-stats.mozilla.com/report/list?signature=nsContentList%3A%3APopulateSelf%28unsigned%20int%29
A search on crash-stats tells me we had 1 crash with that signature in the last week in 10.0a1 Nightly and 6 crashes in 9.0a2 Aurora.
Comment 14 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-10-31 07:53:19 PDT
###!!! ASSERTION: aRoot not an ancestor of |this|?: 'cur', file c:\dev\mozilla-b
eta\obj-i686-pc-mingw32\dist\include\nsINode.h, line 1133

>	xul.dll!RealBreak()  Line 422	C++
 	xul.dll!Break(const char * aMsg)  Line 521	C++
 	xul.dll!NS_DebugBreak_P(unsigned int aSeverity, const char * aStr, const char * aExpr, const char * aFile, int aLine)  Line 380 + 0xc bytes	C++
 	xul.dll!nsINode::GetNextNodeImpl(const nsINode * aRoot, const int aSkipChildren)  Line 1133 + 0x23 bytes	C++
 	xul.dll!nsINode::GetNextNode(const nsINode * aRoot)  Line 1107	C++
 	xul.dll!nsContentList::PopulateSelf(unsigned int aNeededLength)  Line 900 + 0xf bytes	C++
 	xul.dll!nsContentList::BringSelfUpToDate(int aDoFlush)  Line 969	C++
 	xul.dll!nsContentList::IndexOf(nsIContent * aContent, int aDoFlush)  Line 570	C++
 	xul.dll!nsContentUtils::GenerateStateKey(nsIContent * aContent, const nsIDocument * aDocument, nsIStatefulFrame::SpecialStateID aID, nsACString_internal & aKey)  Line 1920 + 0x13 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetLayoutHistoryAndKey(nsGenericHTMLElement * aContent, int aRead, nsILayoutHistoryState * * aHistory, nsACString_internal & aKey)  Line 1527 + 0x18 bytes	C++
 	xul.dll!nsGenericHTMLElement::GetPrimaryPresState(nsGenericHTMLElement * aContent, nsPresState * * aPresState)  Line 1476 + 0x27 bytes	C++
 	xul.dll!nsHTMLSelectElement::SaveState()  Line 1649 + 0x10 bytes	C++
 	xul.dll!nsGenericHTMLFormElement::UnbindFromTree(int aDeep, int aNullParent)  Line 2661	C++

... more unbinding stuff
Comment 15 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 08:10:08 PDT
> Here there's a list with all recent crashes with this signature

Thanks for that.  So I see two distinct kinds of crashes in that list:

1)  More common: Crash at address 0x18 == 24 (all of these on Windows)
2)  Everything else

Looking at an instance of the 0x18, it has, as usual, a pretty bogus stack near the top, and is claimed to be on this line in nsContentList.cpp:

   900      cur = cur->GetNextNode(mRootNode);

Importantly, the bottom of the stack is CC unlinking.

24 is the offset of mNextSibling in nsINode in a 32-bit build.

So presumably we're trying to get mNextSibling on a null pointer.

Looking at GetNextNode, it basically works like this:

  1)  Check for first child; if there return it.
  2)  Check for next sibling; if there return it.
  3)  Walk up the parent chain until we hit aRoot, checking next siblings.

There is no test for the parent chain hitting null before we hit aRoot, because that's not supposed to ever happen.  But I would bet that it's happening in this case.

It's hard to tell exactly what the exact order of things is that gets us to that situation because of the broken stack, but if I take at face value the parts of the stack that make sense it looks like nsGenericElement::cycleCollection::Unlink calls UnbindFromTree on a <body>, which recursively calls UnbindFromTree on its descendants and at some point nsGenericHTMLFormElement::UnbindFromTree calls SaveState() which calls GetPrimaryPresState(), which ends up generating a state key, which needs to walk the DOM.  And presumably Unlink does not dispatch ContentRemoved notifications before unbinding, so it may be possible for us to end up in a state where the nodelist has references to nodes in the tree being unlinked....

Let me see if I can write a testcase.
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 08:10:56 PDT
Yeah, comment 14 is exactly what would result from the scenario comment 15 describes.  Kyle, how did you get that to happen?
Comment 17 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-10-31 08:12:40 PDT
I loaded the URL from comment 9 in a debug beta build.
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 08:14:51 PDT
So another note: since nodes hold strong refs to parents and nodelists hold strong refs to their nodes, this can only get triggered if the nodelist in question (probably document.forms in this case) is part of the set of things being unlinked.

If we had some way to know in the nodelist code that this is the case, we could just bail out.  If we had some way to know in nsGenericHTMLFormElement::UnbindFromTree that we don't need to save state in this case (is that even true?) we could bail out of there.

Alternately, we could add a basic sanity check to PopulateSelf that the starting node is a descendant of mRoot, I guess.  Or add an extra null-check to GetNextNode, but I'd sort of like to avoid that.
Comment 19 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-10-31 08:18:06 PDT
(In reply to Boris Zbarsky (:bz) from comment #18)
> So another note: since nodes hold strong refs to parents ...

That's not true on 8 though is it?
Comment 20 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 08:23:59 PDT
> I loaded the URL from comment 9 in a debug beta build.

Hmm.  I'd tried nightly and didn't get that.  But I do see it in beta.

So in that case, what happens is that |count| in PopulateSelf is 0.  But mRootNode is the document, while |cur| is nsHTMLTableElement.  And cur->mParent is in fact null.

So we managed to walk into a subtree that's already been unbound.  This is different from the scenario I was thinking of in comment 18 (where count != 0 and we start with mElements[count - 1] which happens to be in a subtree which is partially unbound)!

And the reason _that_ can happen is that the nsGenericElement unlink method unbinds all the child nodes _before_ removing them from mAttrsAndChildren.  This is just wrong.  All other places unbind after removing from the child list, and code relies on that.  Looking now to see why this code is written this way.
Comment 21 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 08:27:20 PDT
Oh, and the situation I was worried about can also arise, I believe: we don't disconnect the content list until NodeWillBeDestroyed, which is called from nsNodeUtils::LastRelease.  I wonder whether we should call NodeWillBeDestroyed from unlink as well?
Comment 22 Olli Pettay [:smaug] 2011-10-31 08:32:45 PDT
LastRelease/NodeWillBeDestroyed will be called when CC releases the node. Or are you thinking something else?
Comment 23 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 08:44:31 PDT
Blame dates back to CVS.  And lots of the changes are duplicated between hg and CVS, sigh...

In any case, the unbind-before-remove was added in:

3.603 <jonas@sicking.cc> 2007-11-29 00:41
Bug 348156: Fix leaks by relying on cycle collector rather than calling UnbindFromTree on all nodes. r/sr=jst

I'm going to try reordering this to make sure that tree state is self-consistent by the time UnbindFromTree happens and see how that goes.

For the case I'd been worrying about, the concern is that a content list has X as mRootNode and both X and the content list are getting unlinked.  X gets unlinked first, so it removes all its kids but does NOT send notifications.  One of the kids getting unbound tries to SaveState lands in content list code, the content list starts walking from the last kid it has (which can be non-null because the content list has not been unlinked yet), and is suddenly operating on nodes that are not descendants of mRootNode...  Jonas, Peter, Olli, can you think of any reason that _can't_ happen?
Comment 24 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 09:17:58 PDT
OK, reordering the unbind and remove calls fixes at least the comment 9 testcase for me.
Comment 25 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 21:42:33 PDT
Created attachment 570928 [details] [diff] [review]
Make sure, when unlinking, to remove the kids from our child list before unbinding them.

Olli, how do you feel about this for branches?  The patch applies as-is to m-c and beta, so probably also to aurora... but I doubt we want it for beta at this point.  Taking it on Aurora might be a good thing, though.
Comment 26 Boris Zbarsky [:bz] (still a bit busy) 2011-10-31 21:43:35 PDT
Taking this.  I'll file a followup on the end of comment 23 once this lands, unless I can convince myself that situation can't arise.
Comment 27 Olli Pettay [:smaug] 2011-11-01 12:44:50 PDT
Comment on attachment 570928 [details] [diff] [review]
Make sure, when unlinking, to remove the kids from our child list before unbinding them.

># HG changeset patch
># User Boris Zbarsky <bzbarsky@mit.edu>
># Date 1320122481 14400
># Node ID 3dbd25f695f125f7d4aec8de19ee17944a4354a5
># Parent  c8d7bd7d9c0afa4a5c308a9386e465a738510361
>Bug 620122.  Make sure, when unlinking, to remove the kids from our child list before unbinding them.  r=smaug
>
>diff --git a/content/base/src/nsGenericElement.cpp b/content/base/src/nsGenericElement.cpp
>--- a/content/base/src/nsGenericElement.cpp
>+++ b/content/base/src/nsGenericElement.cpp
>@@ -4216,20 +4216,29 @@ NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(ns
>     PRUint32 childCount = tmp->mAttrsAndChildren.ChildCount();
>     if (childCount) {
>       // Don't allow script to run while we're unbinding everything.
>       nsAutoScriptBlocker scriptBlocker;
>       while (childCount-- > 0) {
>         // Once we have XPCOMGC we shouldn't need to call UnbindFromTree.
>         // We could probably do a non-deep unbind here when IsInDoc is false
>         // for better performance.
>-        tmp->mAttrsAndChildren.ChildAt(childCount)->UnbindFromTree();
>+
>+        // Hold a strong ref to the node when we remove it, because we may be
>+        // the last reference to it.  We need to call RemoveChildAt() and
>+        // update mFirstChild before calling UnbindFromTree, since this last
>+        // can notify various observers and they should really see consistent
>+        // tree state.
>+        nsCOMPtr<nsIContent> child = tmp->mAttrsAndChildren.ChildAt(childCount);
>         tmp->mAttrsAndChildren.RemoveChildAt(childCount);
>+        if (childCount == 0) {
>+          tmp->mFirstChild = nsnull;
>+        }
>+        child->UnbindFromTree();
>       }
>-      tmp->mFirstChild = nsnull;
>     }
I really wish you could add
already_AddRefed<nsIContent> nsAttrAndChildArray::TakeChildAt
so that we don't need to do extra addref/release.
Comment 28 Boris Zbarsky [:bz] (still a bit busy) 2011-11-01 13:30:26 PDT
I can do that, sure.
Comment 29 Boris Zbarsky [:bz] (still a bit busy) 2011-11-01 13:43:03 PDT
Created attachment 571132 [details] [diff] [review]
With TakeChildAt
Comment 30 Marcia Knous [:marcia - use ni] 2011-11-02 16:16:57 PDT
This signature showed up on the Firefox 8 explosive report today: https://crash-analysis.mozilla.com/rkaiser/2011-11-01/2011-11-01.firefox.8.explosiveness.html. Farmville was mentioned in one of the comments.
Comment 31 Boris Zbarsky [:bz] (still a bit busy) 2011-11-02 20:40:40 PDT
http://hg.mozilla.org/integration/mozilla-inbound/rev/2ab1c36d6e27

This should significantly reduce the frequency of this crash, at the very least.

I've filed bug 699321 on the last part of comment 23.
Comment 32 Boris Zbarsky [:bz] (still a bit busy) 2011-11-02 20:42:28 PDT
Comment on attachment 571132 [details] [diff] [review]
With TakeChildAt

Requesting aurora and beta approval.  This should be a pretty safe patch that just makes unlinking do the child node removal steps in the same order as every single other callsite that removes child nodes.  And it fixes what's apparently a common crash.
Comment 33 Marco Bonardo [::mak] 2011-11-03 08:52:36 PDT
https://hg.mozilla.org/mozilla-central/rev/2ab1c36d6e27
Comment 34 christian 2011-11-07 10:41:59 PST
Comment on attachment 571132 [details] [diff] [review]
With TakeChildAt

[triage comment]
Too late for Firefox 8 beta, but we'll approve it for Firefox 9 aurora. Please land asap.
Comment 35 Boris Zbarsky [:bz] (still a bit busy) 2011-11-07 11:39:15 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/c845d62ee43f
Comment 36 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 13:01:05 PST
Does this crash affect Firefox, or is it Thunderbird only?
Comment 37 Robert Kaiser 2011-12-07 13:13:08 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #36)
> Does this crash affect Firefox, or is it Thunderbird only?

As I said in comment #8 and comment #11, it happens in Firefox as well - and https://crash-stats.mozilla.com/report/list?signature=nsContentList%3A%3APopulateSelf%28unsigned%20int%29 confirms that.
Comment 38 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 13:15:46 PST
Thanks Robert. Is there a testcase QA can use to verify this fix in Firefox?
Comment 39 Boris Zbarsky [:bz] (still a bit busy) 2011-12-07 13:24:30 PST
Not that I know of.  But you can look at crash frequencies in crash-stats, right?
Comment 40 Wayne Mery (:wsmwk, NI for questions) 2011-12-07 13:26:13 PST
in fact in thunderbird it is quite rare (comment 5).  only 28 crashes in 12 weeks for tb6,7,8. So it may be at least a month after dec 20 release to confirm the problem is gone from version 9
Comment 41 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 13:30:35 PST
I'm seeing 47 reports with this signature on Firefox 9.0b4...

http://bit.ly/uExcY1
Comment 42 Boris Zbarsky [:bz] (still a bit busy) 2011-12-07 13:36:51 PST
Bug 699321 may also cause crashes with this signature.
Comment 43 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 13:38:05 PST
(In reply to Boris Zbarsky (:bz) from comment #42)
> Bug 699321 may also cause crashes with this signature.

Is there a way I can verify these crashes are being produced by that bug and not this one?
Comment 44 Boris Zbarsky [:bz] (still a bit busy) 2011-12-07 13:42:34 PST
Not that I know of.

What you can do is examine crash frequencies and see whether there's any change, I guess. 

The thing we should really do is fix that bug and then see whether the crash is gone altogether.
Comment 45 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-07 14:33:45 PST
I'm going to call this qa- and we can handle the broad verification on bug 699321.

Note You need to log in before you can comment on or make changes to this bug.