Bug 480300 - mozilla-central Tinderbox random crashes in layout/generic/crashtests/* [@ nsHTMLAnchorElement::UnbindFromTree]
Keywords: crash, regression
Product: Core
Classification: Components
Component: DOM
Version: Trunk
Hardware: All All
Importance: -- critical
Target Milestone: ---
Assigned To: Peter Van der Beken [:peterv]
Depends on:
Blocks: 480322
Reported: 2009-02-26 04:42 PST by Mats Palmgren (:mats)
Modified: 2011-11-10 11:14 PST (History)

stack from Linux debug build (8.50 KB, text/plain)
2009-03-02 08:12 PST, Mats Palmgren (:mats)
Wallpaper I just pushed (1.05 KB, patch)
2009-03-04 11:01 PST, Boris Zbarsky [:bz]

Description Mats Palmgren (:mats) 2009-02-26 04:42:04 PST
Linux mozilla-central Tinderbox random crashes in layout/generic/crashtests/*

There have been frequent random crashes (Segmentation fault) in
layout/generic/crashtests/* recently:

Linux mozilla-central unit test on 2009/02/26 02:58:26:

Linux mozilla-central unit test on 2009/02/25 04:12:07:

Linux mozilla-central unit test on 2009/02/25 01:08:38:

Linux mozilla-central unit test on 2009/02/24 17:56:08:

Linux mozilla-central unit test on 2009/02/24 17:20:40:
Comment 1 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-26 06:31:07 PST
I ran the layout/generic/crashtests tests under valgrind a few days ago to investigate, and hit only bug 479502 (which I'd hit doing something else a few days before).  Given the top one of the logs listed, that probably isn't the problem.
Comment 2 Daniel Holbert [:dholbert] 2009-02-26 08:51:27 PST
Here's one more failure of this type:

Linux mozilla-central unit test on 2009/02/26 04:58:26
(Segfault after loading layout/generic/crashtests/354458-1.html)
Comment 3 Ben Hearsum (:bhearsum) 2009-02-26 09:59:08 PST

I've managed to reproduce this crash some of the time by running crashtest manually on one of our unit test boxes. I'm running with XPCOM_DEBUG_BREAK=stack-and-abort to try to get a stack right now. Is that the correct way to do so? Is there any other data I can get for you?
Comment 4 Ben Hearsum (:bhearsum) 2009-02-26 10:29:25 PST
I don't seem to be able to get a stack with XPCOM_DEBUG_BREAK=stack-and-abort. It crashes with:
REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-linux-unittest/build/layout/generic/crashtests/370866-1.xhtml | (LOAD ONLY)
../../objdir/dist/bin/ line 131:  9183 Segmentation fault      "$prog" ${1+"$@"}

and puts me back at the command line. I don't see any files containing a crash stack nor is one printed out on the command line.

I'm going to try XPCOM_DEBUG_BREAK=break and run through gdb, but guidance here would be much appreciated.
Comment 5 Ben Hearsum (:bhearsum) 2009-02-26 10:44:49 PST
We SIGSEGV'ed after: REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-linux-unittest/build/layout/generic/crashtests/373868-1.xhtml | (LOAD ONLY)

Here's the stack that gdb gave me:
#0  0x01134d1d in nsHTMLAnchorElement::UnbindFromTree () from ../../objdir/dist/bin/
#1  0x0138e43c in nsElementDeletionObserver::NodeWillBeDestroyed () from ../../objdir/dist/bin/
#2  0x010db6b0 in nsNodeUtils::LastRelease () from ../../objdir/dist/bin/
#3  0x010cab1b in nsGenericElement::Release () from ../../objdir/dist/bin/
#4  0x01138f7b in nsHTMLBodyElement::Release () from ../../objdir/dist/bin/
#5  0x0163914d in nsXPCOMCycleCollectionParticipant::Unroot () from ../../objdir/dist/bin/
#6  0x0167676b in nsCycleCollector::CollectWhite () from ../../objdir/dist/bin/
#7  0x016767c3 in nsCycleCollector::FinishCollection () from ../../objdir/dist/bin/
#8  0x0167681b in nsCycleCollector_finishCollection () from ../../objdir/dist/bin/
#9  0x00dcf46e in XPCCycleCollectGCCallback () from ../../objdir/dist/bin/
#10 0x006ae58a in js_GC () from ../../objdir/dist/bin/
#11 0x0068af4b in JS_GC () from ../../objdir/dist/bin/
#12 0x00dccb11 in nsXPConnect::Collect () from ../../objdir/dist/bin/
#13 0x01677302 in nsCycleCollector::Collect () from ../../objdir/dist/bin/
#14 0x016773c0 in nsCycleCollector_collect () from ../../objdir/dist/bin/
#15 0x012083cc in nsJSContext::CC () from ../../objdir/dist/bin/
#16 0x0120a0a4 in nsJSContext::CCIfUserInactive () from ../../objdir/dist/bin/
#17 0x0120a228 in GCTimerFired () from ../../objdir/dist/bin/
#18 0x0166dbd3 in nsTimerImpl::Fire () from ../../objdir/dist/bin/
#19 0x0166e319 in nsTimerEvent::Run () from ../../objdir/dist/bin/
#20 0x0166b305 in nsThread::ProcessNextEvent () from ../../objdir/dist/bin/
#21 0x01638ee7 in NS_ProcessNextEvent_P () from ../../objdir/dist/bin/
#22 0x0159242a in nsBaseAppShell::Run () from ../../objdir/dist/bin/
#23 0x0145968e in nsAppStartup::Run () from ../../objdir/dist/bin/
#24 0x00db9c7f in XRE_main () from ../../objdir/dist/bin/
#25 0x080495b1 in main ()
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-26 11:08:06 PST
Additional information from gdb that would be useful would be the output of the following (first of the three should give no output):

frame 0
disass
inf reg

which ought to give at least a bit of information about why it's crashing.

(It seems like the major possibilities would be:
 * UnbindFromTree not being able to handle a node that's been unlinked
 * a bug in unlink where we fail to clear out something that we release
 * a reference counting bug that leads us to access deleted memory.)
Comment 7 Ben Hearsum (:bhearsum) 2009-02-26 11:10:55 PST
(gdb) frame 0
#0  0x01134d1d in nsHTMLAnchorElement::UnbindFromTree () from ../../objdir/dist/bin/
(gdb) disass
Dump of assembler code for function _ZN19nsHTMLAnchorElement14UnbindFromTreeEii:
0x01134cd8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+0>:	push   %ebp
0x01134cd9 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+1>:	mov    %esp,%ebp
0x01134cdb <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+3>:	push   %esi
0x01134cdc <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+4>:	sub    $0x4,%esp
0x01134cdf <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+7>:	mov    0x8(%ebp),%esi
0x01134ce2 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+10>:	testb  $0x1,0xc(%esi)
0x01134ce6 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+14>:	jne    0x1134cfd <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+37>
0x01134ce8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+16>:	push   %eax
0x01134ce9 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+17>:	pushl  0x10(%ebp)
0x01134cec <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+20>:	pushl  0xc(%ebp)
0x01134cef <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+23>:	push   %esi
0x01134cf0 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+24>:	call   0x112dbc0 <_ZN20nsGenericHTMLElement14UnbindFromTreeEii>
0x01134cf5 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+29>:	add    $0x10,%esp
0x01134cf8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+32>:	mov    0xfffffffc(%ebp),%esi
0x01134cfb <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+35>:	leave  
0x01134cfc <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+36>:	ret    
0x01134cfd <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+37>:	push   %eax
0x01134cfe <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+38>:	push   %eax
0x01134cff <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+39>:	push   $0x0
0x01134d01 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+41>:	push   %esi
0x01134d02 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+42>:	call   0x112f6c0 <_ZN20nsGenericHTMLElement17RegUnRegAccessKeyEi>
0x01134d07 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+47>:	add    $0x10,%esp
0x01134d0a <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+50>:	xor    %edx,%edx
0x01134d0c <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+52>:	testb  $0x1,0xc(%esi)
0x01134d10 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+56>:	je     0x1134d1b <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+67>
0x01134d12 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+58>:	mov    0x8(%esi),%eax
0x01134d15 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+61>:	mov    0x14(%eax),%eax
0x01134d18 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+64>:	mov    0x8(%eax),%edx
0x01134d1b <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+67>:	push   %eax
0x01134d1c <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+68>:	push   %eax
0x01134d1d <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+69>:	mov    (%edx),%eax
0x01134d1f <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+71>:	push   %esi
0x01134d20 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+72>:	push   %edx
0x01134d21 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+73>:	call   *0x1c8(%eax)
0x01134d27 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+79>:	add    $0x10,%esp
0x01134d2a <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+82>:	movl   $0x0,0x28(%esi)
0x01134d31 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+89>:	jmp    0x1134ce8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+16>
0x01134d33 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+91>:	nop    
---Type <return> to continue, or q <return> to quit---
End of assembler dump.
(gdb) inf reg
eax            0xafec79a0	-1343456864
ecx            0xb7c9479c	-1211545700
edx            0x0	0
ebx            0x19bf510	26998032
esp            0xbfa9c618	0xbfa9c618
ebp            0xbfa9c628	0xbfa9c628
esi            0xafd88130	-1344765648
edi            0xaff49070	-1342926736
eip            0x1134d1d	0x1134d1d <nsHTMLAnchorElement::UnbindFromTree(int, int)+69>
eflags         0x10202	[ IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
Comment 8 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2009-02-26 11:24:36 PST
I think that means GetCurrentDoc() is returning null, although I read the disassembly pretty quickly and I might be off.

Could Unlink be putting us in a state where IsInDoc() is true but GetCurrentDoc() is null (because GetOwnerDoc() is null)?
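
The state asked about here can be sketched in a few lines of standalone C++. This is only an illustrative model, not Gecko code: the struct names and fields are hypothetical simplifications, and reading the three chained loads before the faulting `mov (%edx),%eax` as node info, then node info manager, then document is an assumption.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the real classes.
struct Document {};
struct NodeInfoManager { Document* document = nullptr; };
struct NodeInfo { NodeInfoManager* manager = nullptr; };

struct Node {
  bool inDocFlag = false;        // models the flag bit tested by IsInDoc()
  NodeInfo* nodeInfo = nullptr;  // assumed to stay valid for the node's lifetime

  bool IsInDoc() const { return inDocFlag; }

  // Owner document is reached through nodeinfo -> manager -> document.
  Document* GetOwnerDoc() const { return nodeInfo->manager->document; }

  // Only nodes flagged as in-document report a current document.
  Document* GetCurrentDoc() const {
    return IsInDoc() ? GetOwnerDoc() : nullptr;
  }
};

// True when the node claims to be in a document but has no current document,
// which is the state that would make a virtual call on the result crash.
bool ViolatesInDocInvariant(const Node& node) {
  return node.IsInDoc() && node.GetCurrentDoc() == nullptr;
}
```

In this model, clearing `NodeInfoManager::document` while a node still has `inDocFlag` set makes `ViolatesInDocInvariant` return true, which would match a null `edx` at the crash address.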
Comment 9 Daniel Holbert [:dholbert] 2009-02-26 11:33:59 PST
This crash just happened on

REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-macosx-unittest/build/layout/generic/crashtests/373859-1.html | (LOAD ONLY)
2009-02-26 10:54:21.494 firefox-bin[50525:c103] Invalid memory access of location 00000000 eip=0191d2d1
../../objdir/dist/ line 131: 50525 Bus error               "$prog" ${1+"$@"}
program finished with exit code 138
TinderboxPrint: crashtest<br/><em class="testfail">FAIL</em>

Platform --> All/All
Comment 10 Daniel Holbert [:dholbert] 2009-02-26 11:35:04 PST
(In reply to comment #9)
> This crash just happened on 
... I meant to say: happened on the "OS X 10.5.2 mozilla-central unit test" box.
Comment 11 Mats Palmgren (:mats) 2009-02-26 11:56:20 PST
Thanks for the stack Ben, now I see that it's exactly the same crash
that I got yesterday in my debug build (x86_64 Linux) during crashtests.
I could only reproduce it once so I disregarded it at the time.
I remember looking at the result of GetCurrentDoc() and it was null.

Maybe we should null-check that instead of IsInDoc()?

nsHTMLAnchorElement::UnbindFromTree(PRBool aDeep, PRBool aNullParent)
{
  if (IsInDoc()) {
    // If this link is ever reinserted into a document, it might
    // be under a different xml:base, so forget the cached state now
    mLinkState = eLinkState_Unknown;
  }
  nsGenericHTMLElement::UnbindFromTree(aDeep, aNullParent);
}
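
A minimal sketch of the suggested guard, written as standalone C++ rather than against the real tree. Everything here is a hypothetical simplification: `Document::OnLinkUnbound` stands in for whatever the document is asked to do with the link, `linkState` stands in for mLinkState, and the actual change may well look different.

```cpp
#include <cassert>

// Hypothetical stand-in for the document; counts how often it was notified.
struct Document {
  int unboundLinks = 0;
  void OnLinkUnbound() { ++unboundLinks; }  // hypothetical name
};

struct Anchor {
  bool inDocFlag = false;
  Document* ownerDoc = nullptr;  // may be null even while inDocFlag is set
  int linkState = 1;             // 0 models "unknown"

  bool IsInDoc() const { return inDocFlag; }
  Document* GetCurrentDoc() const { return inDocFlag ? ownerDoc : nullptr; }

  // Returns true if the document was notified.
  bool UnbindFromTree() {
    bool notified = false;
    // Guard on the pointer that is about to be dereferenced,
    // not on the flag that is merely supposed to imply it.
    if (Document* doc = GetCurrentDoc()) {
      doc->OnLinkUnbound();
      linkState = 0;
      notified = true;
    }
    inDocFlag = false;  // models the base-class unbind
    return notified;
  }
};
```

With this shape, an anchor whose flag is set but whose document is already gone is unbound without touching the null pointer. It does nothing about how the node got into that state.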
Comment 12 Boris Zbarsky [:bz] 2009-02-26 12:05:43 PST
Hmm.  GetOwnerDoc() might be null if the document has already been deleted. That's the only way I can see for that pointer to become null, since nsGenericElement's Unlink doesn't drop the nodeinfo, the nodeinfo doesn't drop the nodeinfo manager, and the nodeinfo manager doesn't drop the document until ~nsDocument.

But ~nsDocument also unbinds all the kids, so there shouldn't be any kids left after that with IsInDoc() testing true.

Point is, IsInDoc() should guarantee that GetCurrentDoc() returns non-null.  All sorts of code relies on that, so just changing this one place won't help much.
Comment 13 Ben Hearsum (:bhearsum) 2009-02-26 12:13:29 PST
Is it still useful for me to keep the gdb session open or can I return this machine to the pool?
Comment 14 Mats Palmgren (:mats) 2009-02-26 12:25:13 PST
Not for me, but Boris or David might want to look?
Comment 16 Mats Palmgren (:mats) 2009-03-02 08:12:55 PST
Created attachment 364912
stack from Linux debug build

I was able to reproduce this again while running the crashtest suite.
It's a native anonymous node created by the editor.
I printed a few things from gdb but I don't know what to look for...
Let me know if you want more, I'll keep the debugger session open for
a few hours...  I can also add a few printfs to this tree if you want...
Comment 17 Peter Van der Beken [:peterv] 2009-03-02 08:48:22 PST
Hmm, that code relies on the NodeWillBeDestroyed notification for the parent to unbind the native anonymous child. That notification is probably too late though, maybe we should switch to ParentChainChanged?
Comment 18 Boris Zbarsky [:bz] 2009-03-02 09:26:58 PST
Ah, native anon content wouldn't be unbound by the document's destruction, indeed.

Can we just back that editor mess out altogether?  :(
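
The ordering problem described in the last two comments can be modeled roughly as below. This is a toy simulation under stated assumptions, not the editor or document code: `Manager`, `Node`, and `AnonChildIsStaleAfterTeardown` are hypothetical names, and the "early unbind" switch only stands in for moving the unbind to some earlier notification.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for whatever lets a node reach its document.
struct Manager { bool documentAlive = true; };

struct Node {
  bool inDocFlag = true;
  const Manager* manager = nullptr;

  bool IsInDoc() const { return inDocFlag; }
  bool HasCurrentDoc() const { return inDocFlag && manager->documentAlive; }
  void Unbind() { inDocFlag = false; }
};

// Simulates document teardown followed by the late notification that unbinds
// a native anonymous child. Returns true if, at that point, the child still
// claims to be in a document that no longer exists.
bool AnonChildIsStaleAfterTeardown(bool unbindAnonEarly) {
  Manager mgr;
  Node regularChild;
  regularChild.manager = &mgr;
  Node anonChild;
  anonChild.manager = &mgr;

  // Only regular children are reachable from the document's child list.
  std::vector<Node*> docChildren{&regularChild};

  if (unbindAnonEarly) {
    anonChild.Unbind();  // models unbinding from an earlier notification
  }

  // Document teardown: unbind the kids it knows about, then go away.
  for (Node* kid : docChildren) {
    kid->Unbind();
  }
  mgr.documentAlive = false;

  // Later, the parent is destroyed and the observer unbinds the anon child.
  bool stale = anonChild.IsInDoc() && !anonChild.HasCurrentDoc();
  anonChild.Unbind();
  return stale;
}
```

In this model the late path (`unbindAnonEarly == false`) leaves the anonymous child flagged as in-document after the document is gone, while the early path does not.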
Comment 19 Mike Beltzner [:beltzner, not reading bugmail] 2009-03-03 22:24:01 PST
Happened again:
Comment 20 Mike Beltzner [:beltzner, not reading bugmail] 2009-03-04 10:48:23 PST
Twice again on trunk OSX:

REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-macosx-unittest/build/layout/generic/crashtests/387215-1.xhtml | (LOAD ONLY)
2009-03-04 09:57:29.532 firefox-bin[46339:c103] Invalid memory access of location 00000000 eip=018fb121
../../objdir/dist/ line 131: 46339 Bus error               "$prog" ${1+"$@"}
program finished with exit code 138

REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-macosx-unittest/build/layout/generic/crashtests/323493-1.html | (LOAD ONLY)
2009-03-04 08:33:59.093 firefox-bin[88272:bf03] Invalid memory access of location 00000000 eip=018fb121
../../objdir/dist/ line 131: 88272 Bus error               "$prog" ${1+"$@"}
program finished with exit code 138
Comment 21 Boris Zbarsky [:bz] 2009-03-04 11:01:25 PST
Created attachment 365482
Wallpaper I just pushed

I just checked this in.  This is Mats' suggestion, basically.  It's wallpaper, but with any luck it'll at least stop the test oranges...
Comment 22 Ben Hearsum (:bhearsum) 2009-03-06 08:51:08 PST
The last crashtest crash on m-c OS X unit test was at: 2009-03-04 09:44:40.

That lines up pretty well with your patch, bz.
Comment 23 Ben Hearsum (:bhearsum) 2009-03-06 08:54:27 PST
Same thing on Linux. The last crash was: Tue Mar 3 11:17:32 2009
Comment 24 Barry Edwin GIlmour 2009-04-27 18:26:06 PDT
For what it's worth, I just hit this crash when first opening my m.d.a.seamonkey newsgroup account folder in SeaMonkey's Mail/News window.

0|0||nsHTMLAnchorElement::UnbindFromTree(int, int)||228|0x2)

which I think is the same as #5 in this bug's attachment 364912
(0x00007fdd26f56870 in nsHTMLAnchorElement::UnbindFromTree (this=0x1763080, aDeep=1, aNullParent=1) at /usr/moz/hg5/content/html/content/src/nsHTMLAnchorElement.cpp:228)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b5pre) Gecko/20090427 Lightning/1.0pre SeaMonkey/2.0b1pre ID:20090427000534

Bug-report at:-
"SeaMonkey 2.0b1pre Crash Report [@ nsHTMLAnchorElement::UnbindFromTree(int, int) ] ID: 75978574-c3c2-438b-aa40-b592e2090427 Signature: nsHTMLAnchorElement::UnbindFromTree(int, int)"
Comment 25 Sheila Mooney 2011-11-10 11:14:14 PST
I don't see this anymore on anything post 4.0. Resolving as WORKSFORME.
