mozilla-central Tinderbox random crashes in layout/generic/crashtests/* [@ nsHTMLAnchorElement::UnbindFromTree]




9 years ago
6 years ago


(Reporter: Mats Palmgren (vacation - back in August), Assigned: peterv)


({crash, regression})


Firefox Tracking Flags

(Not tracked)




(2 attachments)

Linux mozilla-central Tinderbox random crashes in layout/generic/crashtests/*

There have been frequent random crashes (Segmentation fault) in
layout/generic/crashtests/* recently:

Linux mozilla-central unit test on 2009/02/26 02:58:26:

Linux mozilla-central unit test on 2009/02/25 04:12:07:

Linux mozilla-central unit test on 2009/02/25 01:08:38:

Linux mozilla-central unit test on 2009/02/24 17:56:08:

Linux mozilla-central unit test on 2009/02/24 17:20:40:
I ran the layout/generic/crashtests tests under valgrind a few days ago to investigate, and hit only bug 479502 (which I'd hit doing something else a few days before).  Given the top one of the logs listed, that probably isn't the problem.
Here's one more failure of this type:

Linux mozilla-central unit test on 2009/02/26 04:58:26
(Segfault after loading layout/generic/crashtests/354458-1.html)

I've managed to reproduce this crash some of the time running crashtest manually on one of our unit test boxes. I'm running with XPCOM_DEBUG_BREAK=stack-and-abort to try and get a stack right now - is that the correct way to do so? Is there any other data I can get for you?
I don't seem to be able to get a stack with XPCOM_DEBUG_BREAK=stack-and-abort. It crashes with:
REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-linux-unittest/build/layout/generic/crashtests/370866-1.xhtml | (LOAD ONLY)
../../objdir/dist/bin/ line 131:  9183 Segmentation fault      "$prog" ${1+"$@"}

and puts me back at the command line. I don't see any files containing a crash stack nor is one printed out on the command line.

I'm going to try XPCOM_DEBUG_BREAK=break and run through gdb, but guidance here would be much appreciated.
We SIGSEGV'ed after: REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-linux-unittest/build/layout/generic/crashtests/373868-1.xhtml | (LOAD ONLY)

Here's the stack that gdb gave me:
#0  0x01134d1d in nsHTMLAnchorElement::UnbindFromTree () from ../../objdir/dist/bin/
#1  0x0138e43c in nsElementDeletionObserver::NodeWillBeDestroyed () from ../../objdir/dist/bin/
#2  0x010db6b0 in nsNodeUtils::LastRelease () from ../../objdir/dist/bin/
#3  0x010cab1b in nsGenericElement::Release () from ../../objdir/dist/bin/
#4  0x01138f7b in nsHTMLBodyElement::Release () from ../../objdir/dist/bin/
#5  0x0163914d in nsXPCOMCycleCollectionParticipant::Unroot () from ../../objdir/dist/bin/
#6  0x0167676b in nsCycleCollector::CollectWhite () from ../../objdir/dist/bin/
#7  0x016767c3 in nsCycleCollector::FinishCollection () from ../../objdir/dist/bin/
#8  0x0167681b in nsCycleCollector_finishCollection () from ../../objdir/dist/bin/
#9  0x00dcf46e in XPCCycleCollectGCCallback () from ../../objdir/dist/bin/
#10 0x006ae58a in js_GC () from ../../objdir/dist/bin/
#11 0x0068af4b in JS_GC () from ../../objdir/dist/bin/
#12 0x00dccb11 in nsXPConnect::Collect () from ../../objdir/dist/bin/
#13 0x01677302 in nsCycleCollector::Collect () from ../../objdir/dist/bin/
#14 0x016773c0 in nsCycleCollector_collect () from ../../objdir/dist/bin/
#15 0x012083cc in nsJSContext::CC () from ../../objdir/dist/bin/
#16 0x0120a0a4 in nsJSContext::CCIfUserInactive () from ../../objdir/dist/bin/
#17 0x0120a228 in GCTimerFired () from ../../objdir/dist/bin/
#18 0x0166dbd3 in nsTimerImpl::Fire () from ../../objdir/dist/bin/
#19 0x0166e319 in nsTimerEvent::Run () from ../../objdir/dist/bin/
#20 0x0166b305 in nsThread::ProcessNextEvent () from ../../objdir/dist/bin/
#21 0x01638ee7 in NS_ProcessNextEvent_P () from ../../objdir/dist/bin/
#22 0x0159242a in nsBaseAppShell::Run () from ../../objdir/dist/bin/
#23 0x0145968e in nsAppStartup::Run () from ../../objdir/dist/bin/
#24 0x00db9c7f in XRE_main () from ../../objdir/dist/bin/
#25 0x080495b1 in main ()
Additional information from gdb that would be useful would be the output of the following (first of the three should give no output):

frame 0
inf reg

which ought to give at least a bit of information about why it's crashing.

(It seems like the major possibilities would be:
 * UnbindFromTree not being able to handle a node that's been unlinked
 * a bug in unlink where we fail to clear out something that we release
 * a reference counting bug that leads us to access deleted memory.)
(gdb) frame 0
#0  0x01134d1d in nsHTMLAnchorElement::UnbindFromTree () from ../../objdir/dist/bin/
(gdb) disass
Dump of assembler code for function _ZN19nsHTMLAnchorElement14UnbindFromTreeEii:
0x01134cd8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+0>:	push   %ebp
0x01134cd9 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+1>:	mov    %esp,%ebp
0x01134cdb <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+3>:	push   %esi
0x01134cdc <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+4>:	sub    $0x4,%esp
0x01134cdf <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+7>:	mov    0x8(%ebp),%esi
0x01134ce2 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+10>:	testb  $0x1,0xc(%esi)
0x01134ce6 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+14>:	jne    0x1134cfd <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+37>
0x01134ce8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+16>:	push   %eax
0x01134ce9 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+17>:	pushl  0x10(%ebp)
0x01134cec <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+20>:	pushl  0xc(%ebp)
0x01134cef <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+23>:	push   %esi
0x01134cf0 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+24>:	call   0x112dbc0 <_ZN20nsGenericHTMLElement14UnbindFromTreeEii>
0x01134cf5 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+29>:	add    $0x10,%esp
0x01134cf8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+32>:	mov    0xfffffffc(%ebp),%esi
0x01134cfb <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+35>:	leave  
0x01134cfc <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+36>:	ret    
0x01134cfd <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+37>:	push   %eax
0x01134cfe <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+38>:	push   %eax
0x01134cff <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+39>:	push   $0x0
0x01134d01 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+41>:	push   %esi
0x01134d02 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+42>:	call   0x112f6c0 <_ZN20nsGenericHTMLElement17RegUnRegAccessKeyEi>
0x01134d07 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+47>:	add    $0x10,%esp
0x01134d0a <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+50>:	xor    %edx,%edx
0x01134d0c <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+52>:	testb  $0x1,0xc(%esi)
0x01134d10 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+56>:	je     0x1134d1b <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+67>
0x01134d12 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+58>:	mov    0x8(%esi),%eax
0x01134d15 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+61>:	mov    0x14(%eax),%eax
0x01134d18 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+64>:	mov    0x8(%eax),%edx
0x01134d1b <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+67>:	push   %eax
0x01134d1c <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+68>:	push   %eax
0x01134d1d <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+69>:	mov    (%edx),%eax
0x01134d1f <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+71>:	push   %esi
0x01134d20 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+72>:	push   %edx
0x01134d21 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+73>:	call   *0x1c8(%eax)
0x01134d27 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+79>:	add    $0x10,%esp
0x01134d2a <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+82>:	movl   $0x0,0x28(%esi)
0x01134d31 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+89>:	jmp    0x1134ce8 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+16>
0x01134d33 <_ZN19nsHTMLAnchorElement14UnbindFromTreeEii+91>:	nop    
End of assembler dump.
(gdb) inf reg
eax            0xafec79a0	-1343456864
ecx            0xb7c9479c	-1211545700
edx            0x0	0
ebx            0x19bf510	26998032
esp            0xbfa9c618	0xbfa9c618
ebp            0xbfa9c628	0xbfa9c628
esi            0xafd88130	-1344765648
edi            0xaff49070	-1342926736
eip            0x1134d1d	0x1134d1d <nsHTMLAnchorElement::UnbindFromTree(int, int)+69>
eflags         0x10202	[ IF RF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51
I think that means GetCurrentDoc() is returning null, although I read the disassembly pretty quickly and I might be off.

Could Unlink be putting us in a state where IsInDoc() is true but GetCurrentDoc() is null (because GetOwnerDoc() is null)?
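
The register dump above fits that reading: edx is 0 at the faulting instruction, mov (%edx),%eax, which looks like the load of a vtable pointer from a null object. A minimal standalone sketch of the suspected state follows. Node and Document are hypothetical stand-ins, not the real Gecko classes, and IsInconsistent is a made-up helper for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Gecko classes; only the shape matters.
struct Document {};

struct Node {
  bool inDoc = false;            // models the flag tested by IsInDoc()
  Document* ownerDoc = nullptr;  // models what GetOwnerDoc() hands back

  bool IsInDoc() const { return inDoc; }
  Document* GetOwnerDoc() const { return ownerDoc; }

  // Follows the shape of the disassembly: start from null, and only when
  // the in-document flag is set load the owner document pointer.
  Document* GetCurrentDoc() const {
    return IsInDoc() ? GetOwnerDoc() : nullptr;
  }
};

// True when a node is in the state suspected here: the flag says
// "in a document" but there is no document to return.
bool IsInconsistent(const Node& n) {
  return n.IsInDoc() && n.GetCurrentDoc() == nullptr;
}
```

A node with the flag set and a null owner document is the only combination for which IsInconsistent returns true, and it is the combination in which an unchecked GetCurrentDoc()->... call would fault.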
This crash just happened on

REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-macosx-unittest/build/layout/generic/crashtests/373859-1.html | (LOAD ONLY)
2009-02-26 10:54:21.494 firefox-bin[50525:c103] Invalid memory access of location 00000000 eip=0191d2d1
../../objdir/dist/ line 131: 50525 Bus error               "$prog" ${1+"$@"}
program finished with exit code 138
TinderboxPrint: crashtest<br/><em class="testfail">FAIL</em>

Platform --> All/All
Ever confirmed: true
OS: Linux → All
Hardware: x86 → All
Summary: Linux mozilla-central Tinderbox random crashes in layout/generic/crashtests/* → mozilla-central Tinderbox random crashes in layout/generic/crashtests/*
(In reply to comment #9)
> This crash just happened on 
... I meant to say: happened on the "OS X 10.5.2 mozilla-central unit test" box.
Thanks for the stack Ben, now I see that it's exactly the same crash
that I got yesterday in my debug build (x86_64 Linux) during crashtests.
I could only reproduce it once so I disregarded it at the time.
I remember looking at the result of GetCurrentDoc() and it was null.

Maybe we should null-check that instead of IsInDoc() ?

nsHTMLAnchorElement::UnbindFromTree(PRBool aDeep, PRBool aNullParent)
{
  if (IsInDoc()) {
    // If this link is ever reinserted into a document, it might
    // be under a different xml:base, so forget the cached state now
    mLinkState = eLinkState_Unknown;
  }

  nsGenericHTMLElement::UnbindFromTree(aDeep, aNullParent);
}
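
A minimal sketch of that suggestion, using hypothetical stand-in types rather than the real classes. Document::NoteLinkRemoved is a placeholder for whatever UnbindFromTree calls on the document (the excerpt above does not show that call), and eLinkState_Visited is only an illustrative non-default value.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the real Gecko classes.
struct Document {
  int forgotten = 0;
  void NoteLinkRemoved() { ++forgotten; }  // placeholder for per-document cleanup
};

enum LinkState { eLinkState_Unknown, eLinkState_Visited };

struct Anchor {
  bool inDoc = false;
  Document* ownerDoc = nullptr;
  LinkState linkState = eLinkState_Visited;

  Document* GetCurrentDoc() const { return inDoc ? ownerDoc : nullptr; }

  void UnbindFromTree() {
    // Test the pointer that is about to be used instead of the IsInDoc() flag.
    if (Document* doc = GetCurrentDoc()) {
      doc->NoteLinkRemoved();
      linkState = eLinkState_Unknown;
    }
    inDoc = false;  // stands in for the base-class unbind
  }
};
```

With this shape, an anchor whose flag is set but whose document is already gone skips the document call instead of dereferencing null. It does not restore the invariant that IsInDoc() implies a non-null GetCurrentDoc(); it only avoids the crash at this one call site.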

Comment 12

9 years ago
Hmm.  GetOwnerDoc() might be null if the document has already been deleted. That's the only way I can see for that pointer to become null, since nsGenericElement's Unlink doesn't drop the nodeinfo, the nodeinfo doesn't drop the nodeinfo manager, and the nodeinfo manager doesn't drop the document until ~nsDocument.

But ~nsDocument also unbinds all the kids, so there shouldn't be any kids left after that with IsInDoc() testing true.

Point is, IsInDoc() should guarantee that GetCurrentDoc() returns non-null.  All sorts of code relies on that, so just changing this one place won't help much.
Is it still useful for me to keep the gdb session open or can I return this machine to the pool?
Not for me, but Boris or David might want to look?
Component: Layout → Content
QA Contact: layout → content
Summary: mozilla-central Tinderbox random crashes in layout/generic/crashtests/* → mozilla-central Tinderbox random crashes in layout/generic/crashtests/* [@ nsHTMLAnchorElement::UnbindFromTree]
Blocks: 480322
Still happening big time on Linux and Mac

Added a note to the tree about this
Created attachment 364912 [details]
stack from Linux debug build

I was able to reproduce this again while running the crashtest suite.
It's a native anonymous node created by the editor.
I printed a few things from gdb but I don't know what to look for...
Let me know if you want more, I'll keep the debugger session open for
a few hours...  I can also add a few printfs to this tree if you want...

Comment 17

9 years ago
Hmm, that code relies on the NodeWillBeDestroyed notification for the parent to unbind the native anonymous child. That notification is probably too late, though; maybe we should switch to ParentChainChanged?

Comment 18

9 years ago
Ah, native anon content wouldn't be unbound by the document's destruction, indeed.

Can we just back that editor mess out altogether?  :(
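
The sequence described in comments 17 and 18 can be sketched roughly as follows. This is a toy model under stated assumptions, not the real teardown code: Node, Document, Adopt, and TearDown are hypothetical names, and the model assumes that document destruction unbinds ordinary kids, skips native anonymous ones, and leaves every node without a document pointer.

```cpp
#include <cassert>
#include <vector>

struct Document;

// Hypothetical stand-in for a content node.
struct Node {
  bool inDoc = false;
  bool nativeAnonymous = false;
  Document* ownerDoc = nullptr;
  Document* GetCurrentDoc() const { return inDoc ? ownerDoc : nullptr; }
};

// Hypothetical stand-in for the document.
struct Document {
  std::vector<Node*> nodes;  // every node that points back at this document

  void Adopt(Node& n) {
    n.ownerDoc = this;
    n.inDoc = true;
    nodes.push_back(&n);
  }

  // Models teardown as described in the thread: ordinary kids are unbound,
  // native anonymous ones are skipped, and then the document goes away.
  void TearDown() {
    for (Node* n : nodes) {
      if (!n->nativeAnonymous) {
        n->inDoc = false;
      }
      n->ownerDoc = nullptr;
    }
    nodes.clear();
  }
};
```

After TearDown, an ordinary kid has its flag cleared, while a native anonymous kid still reports being in a document and gets null from GetCurrentDoc(). In this model, a late unbind triggered by the parent's NodeWillBeDestroyed notification would find the node in exactly that state.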


9 years ago
Assignee: nobody → peterv
Happened again:
Twice again on trunk OSX:

REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-macosx-unittest/build/layout/generic/crashtests/387215-1.xhtml | (LOAD ONLY)
2009-03-04 09:57:29.532 firefox-bin[46339:c103] Invalid memory access of location 00000000 eip=018fb121
../../objdir/dist/ line 131: 46339 Bus error               "$prog" ${1+"$@"}
program finished with exit code 138

REFTEST TEST-PASS | file:///builds/moz2_slave/mozilla-central-macosx-unittest/build/layout/generic/crashtests/323493-1.html | (LOAD ONLY)
2009-03-04 08:33:59.093 firefox-bin[88272:bf03] Invalid memory access of location 00000000 eip=018fb121
../../objdir/dist/ line 131: 88272 Bus error               "$prog" ${1+"$@"}
program finished with exit code 138

Comment 21

9 years ago
Created attachment 365482 [details] [diff] [review]
Wallpaper I just pushed

I just checked this in.  This is Mats' suggestion, basically.  It's wallpaper, but with any luck it'll at least stop the test oranges...
The last crashtest crash on m-c OS X unit test was at: 2009-03-04 09:44:40.

That lines up pretty well with your patch, bz.
Same thing on Linux. The last crash was: Tue Mar 3 11:17:32 2009

Comment 24

8 years ago
For what it's worth, I just hit this crash while opening my m.d.a.seamonkey newsgroup account folder in SeaMonkey's Mail/News window.

0|0||nsHTMLAnchorElement::UnbindFromTree(int, int)||228|0x2)

which I think is the same as #5 in this bug's attachment 364912 [details]
(0x00007fdd26f56870 in nsHTMLAnchorElement::UnbindFromTree (this=0x1763080, aDeep=1, aNullParent=1) at /usr/moz/hg5/content/html/content/src/nsHTMLAnchorElement.cpp:228)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b5pre) Gecko/20090427 Lightning/1.0pre SeaMonkey/2.0b1pre ID:20090427000534

Crash report:
"SeaMonkey 2.0b1pre Crash Report [@ nsHTMLAnchorElement::UnbindFromTree(int, int) ] ID: 75978574-c3c2-438b-aa40-b592e2090427 Signature: nsHTMLAnchorElement::UnbindFromTree(int, int)"


8 years ago
Component: Content → DOM
QA Contact: content → general
Crash Signature: [@ nsHTMLAnchorElement::UnbindFromTree]

Comment 25

6 years ago
I don't see this anymore on anything post 4.0. Resolving works for me.
Last Resolved: 6 years ago
Resolution: --- → FIXED