Bugzilla

Christian :Biesinger (don't email me, ping me on IRC)

Comment 2

18 years ago

i bet we could override the pure virtual handler and cause a crash, but yeah, the result is currently this dialog, and unless you have a build w/ symbols and a debugger, you can't do anything useful with it.

i suspect the reason for the behavior is that calling a null function pointer (which is the default internal theoretical implementation of a pure virtual) results in a call stack that looks like this:

0x0

-that's all-

your average debugger isn't very happy with that, and developers can't do anything with it.

compare:

_abort
_purevirt
_lame_function_that_called_pure_virt

Jesse Ruderman

Comment 3

18 years ago

Why would the call stack have only 0x0 on it?  That's not consistent with what I see in Talkback, etc.

Comment 4

18 years ago

Stroustrup says in "Design & Evolution of C++" 13.2.3 "My implementation places a pointer to a function called __pure_virtual_called in the vtbl; this function can then be defined to give a reasonable run-time error".

Don't you agree that a "pure virtual function called" message is better than a "program crashed" message?
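For readers who haven't seen vtables up close, here is a hand-rolled C++ sketch of the layout described above. `Vtbl`, `Object`, `pure_virtual_called` and the other names are hypothetical, and real compilers lay out vtables according to their own ABI. The abstract table's slot points at a named stub rather than at nullptr, so a mistaken call lands in a function that can identify itself instead of jumping to 0x0. An object whose vptr still points at the abstract table is roughly what an object looks like while a base-class destructor runs.

```cpp
#include <cassert>
#include <string>

// One slot per virtual function; `self` plays the role of `this`.
struct Vtbl {
    int (*get_index)(const void* self);
};

struct Object {
    const Vtbl* vptr;  // what a compiler-generated vptr would hold
    int index;
};

// Named stub in the spirit of __pure_virtual_called. A real one would
// report and terminate; this one records the message so the sketch runs.
std::string g_runtime_error;
int pure_virtual_called(const void*) {
    g_runtime_error = "pure virtual function called";
    return -1;
}

int concrete_get_index(const void* self) {
    return static_cast<const Object*>(self)->index;
}

// The abstract table points at the stub rather than at nullptr, so a bad
// call lands in a known function instead of jumping to address 0x0.
const Vtbl kAbstractVtbl = {&pure_virtual_called};
const Vtbl kConcreteVtbl = {&concrete_get_index};

// Virtual dispatch: load the slot from the object's table and call it.
int call_get_index(const Object& obj) { return obj.vptr->get_index(&obj); }
```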

Jesse Ruderman

Comment 5

18 years ago

Not when it interferes with getting a stack trace.

Comment 6

18 years ago

if you jump to 0x0 in certain cases, then that's really the entire stack. there are a couple of crashes floating around where that's all you'll see.

the point is that the implementations decided that instead of doing that, they'd provide this pure virtual handler function and have that be the called method.

as i said, i'm pretty sure we could cause our builds to override the default impl with something talkback would catch....
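A sketch of such an override for current toolchains, assuming the MSVC CRT's `_set_purecall_handler` (declared in <stdlib.h>) on Windows and the Itanium C++ ABI's `__cxa_pure_virtual` hook on GCC/Clang. `crash_on_purecall` and `install_purecall_handler` are hypothetical names, and this is not the code referred to in cvs.mozilla.org. The idea is to replace the modal R6025 dialog with a hard fault, which a crash reporter that hooks unhandled exceptions or fatal signals should then be able to record.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#if defined(_MSC_VER)
#include <intrin.h>  // __debugbreak
#include <stdlib.h>  // _set_purecall_handler
#endif

// Turn a pure virtual call into a hard fault instead of a modal dialog.
[[noreturn]] static void crash_on_purecall() {
    std::fputs("pure virtual function call\n", stderr);
#if defined(_MSC_VER)
    __debugbreak();    // breakpoint exception; a crash if no debugger is attached
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_trap();  // emits an illegal instruction
#endif
    std::abort();      // fallback if execution ever continues past the trap
}

#if defined(_MSC_VER)
bool install_purecall_handler() {
    _set_purecall_handler(&crash_on_purecall);
    return true;
}
#elif defined(__GNUC__) || defined(__clang__)
// Itanium C++ ABI toolchains put __cxa_pure_virtual in pure vtable slots;
// defining it in the program replaces the runtime's default.
extern "C" void __cxa_pure_virtual() { crash_on_purecall(); }
bool install_purecall_handler() { return true; }  // replaced at link time
#else
bool install_purecall_handler() { return false; }  // no known hook
#endif
```

Call `install_purecall_handler()` early in startup; on GCC/Clang the replacement already happened at link time, so the call only reports that the hook is in place.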

Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com]

Comment 7

18 years ago

Confirming as I get this quite often on Firefox shutdown.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Updated

18 years ago

Component: General → DOM: Events

Product: Firefox → Core

Version: unspecified → Trunk

Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com]

Comment 8

18 years ago

jesse: ctho kindly fell into this bug, so my analysis will appear here shortly.

Summary: Microsoft C++ Library Runtime Error (R6025 - pure virtual function call) → Microsoft C++ Library Runtime Error (R6025 - pure virtual function call) [@ purecall - ClassifyWrapper]

Comment 9

18 years ago

Attached file stack (ignore frames after the pure virtual)

IRC transcript:
[stuff about pure virtual calls dying]
(2:27:37 PM) <timeless> unfortunately in gecko
(2:27:44 PM) <timeless> we kinda use the windows event loop
(2:28:05 PM) <timeless> so instead of the app spinning waiting for you to bring the developer to the dialog ...
(2:28:12 PM) <timeless> gecko uses the event loop to kill itself
(2:28:38 PM) <timeless> 03 js3250!JS_DHashTableOperate+0x51
(2:28:59 PM) <timeless> 4c js3250!JS_DHashTableEnumerate+0xce
(2:29:00 PM) <timeless> sorry
(2:29:11 PM) <timeless> 0x03 is asserting that it shouldn't be under 0x4c
(2:29:14 PM) <timeless> and it's right
(2:29:25 PM) <timeless> but, the bug is the purecall in 0x41
(2:29:43 PM) <timeless> anything past that is a waste of time that you can't easily recognize w/o symbols for the runtime+pos
(2:30:02 PM) <timeless> as for debugging the actual core bug
(2:30:05 PM) <timeless> i suppose we could do that
(2:30:09 PM) <timeless> .frame 42
(2:32:44 PM) <CTho>   WrapperSCCEntry *SCCEntry = NS_STATIC_CAST(WrapperSCCEntry*,
(2:32:44 PM) <CTho>     PL_DHashTableOperate(&sWrapperSCCTable, entry->participant->GetSCCIndex(),
(2:32:44 PM) <CTho>                          PL_DHASH_ADD));
(2:32:52 PM) <timeless> ctho: ok
(2:32:58 PM) <timeless> entry->participant
(2:33:27 PM) <CTho> 0x15b31300 class nsIDOMGCParticipant *
(2:36:03 PM) <timeless> ctho: can you turn on *all* of the buttons in calls?
(2:36:51 PM) <timeless> .frame 60
(2:37:06 PM) <timeless> possibly .frame 61
(2:37:11 PM) <timeless> kinda depends
(2:37:16 PM) <timeless> i think .frame 61
(2:37:31 PM) <timeless> ctho: what branch are you using?
(2:37:56 PM) <timeless> 182     nsContentUtils::RemoveListenerManager(this);
(2:37:58 PM) <timeless> from .frame 61
(2:38:02 PM) <timeless> ok, so the problem is that
(2:38:15 PM) <timeless> this nsHTMLAnchorElement creature
(2:38:30 PM) <timeless> in .frame 6a/.frame 6b
(2:38:34 PM) <timeless> was dying
(2:38:36 PM) <timeless> unfortunately for us
(2:38:43 PM) <timeless> in its process of dying
(2:39:06 PM) <timeless> it kinda let go of something else, looks like maybe an event listener it had
(2:39:28 PM) <timeless> the event listener let go of a js context (probably for the js function that was the event listener)
(2:40:14 PM) <timeless> the release of the js context resulted in a gc
(2:40:24 PM) <timeless> the gc then walked around and found a reference to the anchor
(2:40:30 PM) <timeless> it then asked the anchor to mark itself
(2:40:38 PM) <timeless> unfortunate, the anchor was in the process of being very dead
(2:40:46 PM) <timeless> clear as mud?
(2:40:54 PM) <timeless> s/,/ly,/
(2:40:57 PM) <CTho> yup
(2:41:19 PM) <timeless> stack trace+my commentary into any existing purecall bug that seems to have classifywrapper skid marks
(2:41:20 PM) <timeless> or make your own
(2:41:23 PM) ***timeless doesn't care
(2:42:12 PM) <timeless> cc jst+smaug+bz+dbaron
(2:42:24 PM) <timeless> and everyone involved in nsDOMGCParticipantSH
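A heavily simplified C++ model of the sequence timeless describes. `Node`, `Element`, `Listener`, `RunGC` and the two global vectors are hypothetical stand-ins (roughly nsINode, the anchor element, the event listener plus its JS context, and the GC walking the wrapper table); none of this is Gecko code. Because a real pure virtual call would abort, the base `GetSCCIndex` here returns -1 instead of being pure, so the sketch can record which override the GC actually reached.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node;
std::vector<Node*> g_participants;  // "wrapper table" the GC walks
std::vector<int> g_classified;      // what each GC pass saw
void RunGC();

// Releasing the listener drops the last JS context, which triggers a GC.
struct Listener {
    ~Listener() { RunGC(); }
};

struct Node {
    Listener* listener = new Listener;
    Node() { g_participants.push_back(this); }
    // Pure virtual in the real code; -1 marks "reached the base slot".
    virtual int GetSCCIndex() const { return -1; }
    virtual ~Node() {
        // ~Element has already finished, so the dynamic type is now Node.
        delete listener;  // triggers RunGC() while still registered
        g_participants.erase(
            std::remove(g_participants.begin(), g_participants.end(), this),
            g_participants.end());
    }
};

void RunGC() {
    for (const Node* n : g_participants)
        g_classified.push_back(n->GetSCCIndex());
}

struct Element : Node {
    int scc_index;
    explicit Element(int index) : scc_index(index) {}
    int GetSCCIndex() const override { return scc_index; }
};
```

In the real build the base slot is pure, so the -1 recorded during `delete` corresponds to landing in the purecall stub (R6025) rather than getting a value back.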

Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com]

Comment 10

18 years ago

Attached file timeless' explanation of the pure virtual issue

timeless asked me to attach this

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

18 years ago

Looks like it may be a regression from bug 334075.  Moving code from a destructor of a more-derived class to a destructor of a base class can be dangerous if that code relies on the more-derived class still being around, which this code does, since it needs an implementation of GetSCCIndex (and, for certain debugging code, QueryInterface).

Blocks: 334075

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

18 years ago

That said, something else seems pretty broken if ReleaseWrapper hasn't been called yet for a node that's being destroyed.

Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com]

Comment 13

18 years ago

the URL for this->nsIDocument at frame 111 was "http://www.telegraph.co.uk/money/main.jhtml?xml=/money/2006/05/28/ccfox28.xml"

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 14

18 years ago

Er, ignore comment 12 -- we need separate ReleaseWrapper calls for each participant, and this one (presuming entry->key != entry->participant in frame 42) won't happen until the event listener manager is destroyed, which is what's happening on the stack.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 15

18 years ago

Except no, that still doesn't make sense -- since the nsJSContext is being destroyed, this must be the last preserved wrapper, and ~nsMarkedJSFunctionHolder_base has the correct ordering:  ReleaseWrapper before NS_IF_RELEASE(obj).  So I don't see how the wrapper is still preserved.
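A hypothetical sketch of the ordering invariant in ~nsMarkedJSFunctionHolder_base; the names and types are stand-ins, not the Gecko code. The holder removes its entry from the table the GC consults before dropping its reference, so the table never points at a dead object. If the two lines in `~Holder` were swapped, the object would be destroyed while still in `g_preserved`, and `preserved_at_death` would be set to true.

```cpp
#include <cassert>
#include <set>

std::set<const void*> g_preserved;  // stand-in for the preserved-wrapper table

// Minimal refcounted object; records whether the table still pointed at it
// when it died (a dangling entry the GC could later visit).
struct Obj {
    int refcnt = 1;
    bool* preserved_at_death;
    explicit Obj(bool* out) : preserved_at_death(out) {}
    void Release() {
        if (--refcnt == 0) {
            *preserved_at_death = g_preserved.count(this) != 0;
            delete this;
        }
    }
};

// Analogue of the holder destructor: unpreserve first, release second.
struct Holder {
    Obj* obj;
    explicit Holder(Obj* o) : obj(o) { g_preserved.insert(o); }
    ~Holder() {
        g_preserved.erase(obj);  // "ReleaseWrapper"
        obj->Release();          // "NS_IF_RELEASE(obj)"
    }
};
```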

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 16

18 years ago

Attached file further IRC debugging in #developers

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 17

18 years ago

I don't understand how this could have worked before. Even if the GetSCCIndex call would have worked, the QI to nsIDOMNode would not. Though maybe we were simply using the |else| code in this case.

But I do think it looks strange that JS still has pointers to these nodes as late as this.

Phil Ringnalda (:philor)

Updated

18 years ago

Assignee: nobody → events

QA Contact: general → ian

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 18

18 years ago

So could it be that the node being destroyed is still on the parent chain of some node that JS references?  ClassifyWrapper does walk the parent chain.

I'd _think_ that by the time we're in ~nsINode we won't be the mParent of anything anymore, though...

Comment 19

18 years ago

Where does it walk the parent chain?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

18 years ago

It walks it in nsGenericElement::GetSCCIndex if the element is not in a document...

That said, I'm having issues with the posted stacks not actually matching nsDOMClassInfo.cpp from the day they were posted. What version of nsDOMClassInfo.cpp are those line numbers for, Ctho?
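A rough sketch of the lookup described above, under stated assumptions: `Node`, `Document` and the exact return value are hypothetical, and the real nsGenericElement::GetSCCIndex is not reproduced here. It shows why the parent chain matters: a detached node's answer depends on every ancestor, so an ancestor that is mid-destruction but still some node's parent would be touched by the walk.

```cpp
#include <cassert>

struct Document {};

struct Node {
    Node* parent = nullptr;
    Document* doc = nullptr;  // non-null while the node is in a document

    // Key grouping nodes into one component: the owning document if any,
    // otherwise the root of the detached subtree.
    const void* GetSCCIndex() const {
        if (doc) return doc;
        const Node* n = this;
        while (n->parent) n = n->parent;  // touches every ancestor
        return n;
    }
};
```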

Comment 21

18 years ago

Unless there's tail-call optimization happening (which I don't see an opportunity for), it didn't get into nsGenericElement::GetSCCIndex -- and couldn't have, since it was already in ~nsINode for that element.

Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com]

Comment 22

18 years ago

So that brings me back to my question in comment 20 -- on what exact line of code is the pure virtual call happening?

Comment 23

18 years ago

I:\smtrunk1>cvs status mozilla/dom/src/base/nsDOMClassInfo.cpp
===================================================================
File: nsDOMClassInfo.cpp        Status: Needs Patch

   Working revision:    1.382
   Repository revision: 1.385   /cvsroot/mozilla/dom/src/base/nsDOMClassInfo.cpp,v

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 24

18 years ago

In that revision,

5169 dbaron        1.334     PL_DHashTableOperate(&sWrapperSCCTable, entry->participant->GetSCCIndex(),

5170 dbaron        1.269                          PL_DHASH_ADD));

I assume that you have no local changes, right?

The stack has:

42 0012d95c 100053be 01eee544 125eba80 00000a9a gklayout!ClassifyWrapper(struct PLDHashTable * table = 0x01eee544, struct PLDHashEntryHdr * hdr = 0x125eba80, unsigned int number = 0xa9a, void * arg = 0x0012d9b8)+0x21 (FPO: [Non-Fpo]) (CONV: cdecl) [i:\smtrunk1\mozilla\dom\src\base\nsdomclassinfo.cpp @ 5170]

So entry->participant seems to be dead...

Can you catch this in a debugger again?  What's entry->key?  Is it the JS event listener or marked function holder or something?  That's the only way I can see this happening, I think.  And if that's the case, then we probably need to nsContentUtils::RemoveListenerManager from whatever destructors would still have everything needed for the GC stuff working...

And if so, then we just need to

Comment 25

18 years ago

We went through that already.  See comment 16.

Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com]

Comment 26

18 years ago

> I assume that you have no local changes, right?

configure mailnews/import/Makefile.in toolkit/content/widgets/tabbrowser.xml xpfe/browser/resources/content/navigatorOverlay.xul xpfe/browser/resources/content/nsBrowserStatusHandler.js 
xpfe/communicator/resources/content/contentAreaClick.js 
xpfe/components/bookmarks/resources/bookmarksMenu.js 
xpfe/global/resources/content/bindings/tabbrowser.xml

> Can you catch this in a debugger again?

If it crashes me again, sure.  Otherwise, I did dump some 107MB thing for timeless from WinDbg - maybe it's got what you need in it?

skierpage

Comment 27

18 years ago

FYI, I got one of these alerts a few days ago using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060601 Minefield/3.0a1

and since today's nightly automatic update to Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060606 Minefield/3.0a1 I've had about 6 of these crashes.  Not all occurred when closing windows; I think one was while doing a Ctrl-F search.  These crashes have never triggered the Talkback agent.

Comment 28

18 years ago

that's right, and until someone like me writes some evil code, they never will trigger talkback. note that we do have code in cvs.mozilla.org that could enable us to replace these functions, but it's a wee bit of a pain. and I'm currently officially a linux hacker...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 29

18 years ago

*** Bug 340770 has been marked as a duplicate of this bug. ***

Chris Lawson (gone)

Comment 30

18 years ago

Attached file two Mac OS X crash logs

Here are a couple of crash logs from Mac OS X that appear to be very similar, if not identical, to the WinXP stack.

cl

Chris Lawson (gone)

Comment 31

18 years ago

OS/Platform = All.

OS: Windows XP → All

Hardware: PC → All

Chris Lawson (gone)

Comment 32

18 years ago

*** Bug 340770 has been marked as a duplicate of this bug. ***

Comment 33

18 years ago

I just hit this again; when I had it in the debugger I forgot that it wasn't well understood, and figured it was already known.

Flags: blocking1.9a1+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 34

18 years ago

So one possible solution here is to make us kill the listenermanager in the nsGenericElement dtor. I would be fine with doing that, but it would be good to know if that is a good fix, or just wallpapering another problem.
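A sketch of this proposal in simplified terms, again with hypothetical names and a -1 stand-in for the pure slot. The listener teardown moves into the derived destructor (here `Element`, playing nsGenericElement, which per comment 20 implements GetSCCIndex), so any GC it triggers still dispatches to the real override. Whether that is a fix or wallpaper is exactly the open question; the sketch only shows the dispatch difference.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node;
std::vector<Node*> g_participants;  // "wrapper table" the GC walks
std::vector<int> g_classified;      // what each GC pass saw
void RunGC();

// Releasing the listener drops the last JS context, which triggers a GC.
struct Listener {
    ~Listener() { RunGC(); }
};

struct Node {
    Listener* listener = new Listener;
    Node() { g_participants.push_back(this); }
    virtual int GetSCCIndex() const { return -1; }  // stand-in for a pure slot
    void KillListener() { delete listener; listener = nullptr; }
    virtual ~Node() {
        delete listener;  // no-op if the derived dtor already killed it
        g_participants.erase(
            std::remove(g_participants.begin(), g_participants.end(), this),
            g_participants.end());
    }
};

void RunGC() {
    for (const Node* n : g_participants)
        g_classified.push_back(n->GetSCCIndex());
}

struct Element : Node {
    int scc_index;
    explicit Element(int index) : scc_index(index) {}
    int GetSCCIndex() const override { return scc_index; }
    // Proposed fix: tear down here, while GetSCCIndex still resolves to
    // Element's override.
    ~Element() override { KillListener(); }
};
```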

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 35

18 years ago

Attached file some more debugging

Gérard Talbot

Comment 36

18 years ago

*** Bug 334027 has been marked as a duplicate of this bug. ***

Robert Claypool

Comment 37

18 years ago

Whenever I get a "Pure virtual function" crash, Talkback never comes up.

Robert Claypool

Comment 38

18 years ago

Just noticed that someone else posted the same thing as my previous post some time ago, but...

When I launch Process Explorer, right-click on Firefox, choose Debug, and tell it to attach the debugger, Firefox crashes without triggering Talkback as well.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

18 years ago

Flags: blocking1.9+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 39

18 years ago

It's possible that this was fixed by some patches that went in recently (bug 345660 and bug 330689).  I'd be interested to hear if people still see this in trunk builds from 2006-08-04 or later.

Updated

18 years ago

Depends on: 286619

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 40

18 years ago

(In reply to comment #39)
> It's possible that this was fixed by some patches that went in recently (bug
> 345660 and bug 330689).  I'd be interested to hear if people still see this in
> trunk builds from 2006-08-04 or later.
> 

Sorry, still seeing this with current trunk builds. :-(

Updated

18 years ago

Depends on: 349069

Blake Kaplan (:mrbkap) (inactive)

Comment 41

18 years ago

*** Bug 349427 has been marked as a duplicate of this bug. ***

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 42

18 years ago

Bug 349069 was fixed yesterday. It may have helped here.
So if anyone could confirm whether or not there are still
'pure virtual function call' crashes.

philippe (part-time)

Comment 43

18 years ago

(In reply to comment #42)
> Bug 349069 was fixed yesterday. It may have helped here.
> So if anyone could confirm whether or not there are still
> 'pure virtual function call' crashes.


I think this helped, yes. I haven't been able to reproduce this problem since that bug was fixed.

(Camino builds).

Orrin Edenfield

Comment 44

18 years ago

I just experienced what seems like this bug in the Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060830 Minefield/3.0a1 build.  Right clicked on a link on tigerdirect.com and got the error.  Can't reproduce it though.

Windows XP SP2
Extensions:
  Adblock Plus 0.7.1.2
  PDF Download 0.7.4
  Tab Mix Plus 0.3.0.60819

William Bumgarner [:zsinj]

Comment 45

18 years ago

(In reply to comment #44)
> I just experienced what seems like this bug in the Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060830 Minefield/3.0a1 build.  Right
> clicked on a link on tigerdirect.com and got the error.  Can't reproduce it
> though.
> 
> Windows XP SP2
> Extensions:
>   Adblock Plus 0.7.1.2
>   PDF Download 0.7.4
>   Tab Mix Plus 0.3.0.60819
> 

That is because the checkin for bug 349069 was backed out.  A new patch has been developed and is waiting for review.  I have been running for a while with the new patch and have not encountered this problem since.

Comment 46

18 years ago

*** Bug 350705 has been marked as a duplicate of this bug. ***