Bug 338595
Opened 18 years ago
Closed 18 years ago
Microsoft C++ Library Runtime Error (R6025 - pure virtual function call) [@ purecall - ClassifyWrapper]
Categories
(Core :: DOM: Events, defect)
RESOLVED
FIXED
People
(Reporter: stefan.vallaster, Unassigned)
Details
(Keywords: crash)
Attachments
(5 files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060517 Minefield/3.0a1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060517 Minefield/3.0a1
Closed Minefield and a warning message from Microsoft C++ Library shows up:
Runtime Error
Programm: C:\...\firefox.exe
R6025 - pure virtual function call
[OK]
After pressing OK, Minefield crashes. Before pressing OK it works well and you can surf (strange).
Reproducible: Didn't try
Actual Results: after OK, Firefox hangs with the C++ runtime error
Comment 1•18 years ago
Can you get a stack trace or provide steps to reproduce? Otherwise this bug isn't much more useful than "Firefox crashed once". (Why do multiple compilers treat pure virtual function calls so differently from, say, segmentation faults? For example, gcc on Mac and Linux makes Firefox "abort" in bug 338312, breaking the OS crash dialog on Mac. According to timeless in bug 291250, MSVC's behavior breaks Talkback on Windows.)
Severity: major → critical
Keywords: crash
i bet we could override the pure virtual handler and cause a crash, but yeah, the result is currently this dialog, and unless you have a build w/ symbols and a debugger, you can't do anything useful with it. i suspect the reason for the behavior is that standard null function pointers (which is the default internal theoretical implementation) when called result in call stacks that look like this:
0x0
-that's all-
your average debugger isn't very happy with that, and developers can't do anything with it. compare:
_abort
_purevirt
_lame_function_that_called_pure_virt
Comment 3•18 years ago
Why would the call stack have only 0x0 on it? That's not consistent with what I see in Talkback, etc.
Comment 4•18 years ago
Stroustrup says in "Design & Evolution of C++" 13.2.3 "My implementation places a pointer to a function called __pure_virtual_called in the vtbl; this function can then be defined to give a reasonable run-time error". Don't you agree that a "pure virtual function called" message is better than a "program crashed" message?
Comment 5•18 years ago
Not when it interferes with getting a stack trace.
if you jump to 0x0 in certain cases, then that's really the entire stack. there are a couple of crashes floating around where that's all you'll see. the point is that the implementations decided that instead of doing that, they'd provide this purevirtualfunction and have that be the called method. as i said, i'm pretty sure we could cause our builds to override the default impl with something talkback would catch....
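A minimal sketch of the override idea discussed above, assuming an MSVC build. `_set_purecall_handler` is the MSVC CRT call for replacing the default R6025 handler; every other name here (`OnPureCall`, `gCrashHook`, `gPureCallCount`, `InstallPureCallHandler`) is hypothetical, and `std::abort` is only a placeholder for whatever kind of crash a reporter such as Talkback would actually catch.

```cpp
#include <cstdlib>

// Hypothetical hook so the crash action can be swapped out (e.g. in tests).
using CrashHook = void (*)();

void DefaultCrash() { std::abort(); }  // placeholder crash action (assumption)

CrashHook gCrashHook = DefaultCrash;
int gPureCallCount = 0;

// Runs in place of the runtime's R6025 dialog once installed.
void OnPureCall() {
  ++gPureCallCount;
  gCrashHook();  // crash here, so the stack still shows the offending caller
}

// Installs the handler where the CRT supports it; returns true if installed.
bool InstallPureCallHandler() {
#ifdef _MSC_VER
  _set_purecall_handler(OnPureCall);
  return true;
#else
  return false;  // other runtimes use a different mechanism; not covered here
#endif
}
```

Crashing from inside the handler, rather than showing a dialog and continuing to pump events, is what would keep the frame that made the pure virtual call on the stack.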
Comment 7•18 years ago
Confirming as I get this quite often on Firefox shutdown.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: General → DOM: Events
Product: Firefox → Core
Version: unspecified → Trunk
jesse: ctho kindly fell into this bug, so my analysis will appear here shortly.
Summary: Microsoft C++ Library Runtime Error (R6025 - pure virtual function call) → Microsoft C++ Library Runtime Error (R6025 - pure virtual function call) [@ purecall - ClassifyWrapper]
IRC transcript:
[stuff about pure virtual calls dying]
(2:27:37 PM) <timeless> unfortunately in gecko
(2:27:44 PM) <timeless> we kinda use the windows event loop
(2:28:05 PM) <timeless> so instead of the app spinning waiting for you to bring the developer to the dialog ...
(2:28:12 PM) <timeless> gecko uses the event loop to kill itself
(2:28:38 PM) <timeless> 03 js3250!JS_DHashTableOperate+0x51
(2:28:59 PM) <timeless> 4c js3250!JS_DHashTableEnumerate+0xce
(2:29:00 PM) <timeless> sorry
(2:29:11 PM) <timeless> 0x03 is asserting that it shouldn't be under 0x4c
(2:29:14 PM) <timeless> and it's right
(2:29:25 PM) <timeless> but, the bug is the purecall in 0x41
(2:29:43 PM) <timeless> anything past that is a waste of time that you can't easily recognize w/o symbols for the runtime+pos
(2:30:02 PM) <timeless> as for debugging the actual core bug
(2:30:05 PM) <timeless> i suppose we could do that
(2:30:09 PM) <timeless> .frame 42
(2:32:44 PM) <CTho> WrapperSCCEntry *SCCEntry = NS_STATIC_CAST(WrapperSCCEntry*,
(2:32:44 PM) <CTho> PL_DHashTableOperate(&sWrapperSCCTable, entry->participant->GetSCCIndex(),
(2:32:44 PM) <CTho> PL_DHASH_ADD));
(2:32:52 PM) <timeless> ctho: ok
(2:32:58 PM) <timeless> entry->participant
(2:33:27 PM) <CTho> 0x15b31300 class nsIDOMGCParticipant *
(2:36:03 PM) <timeless> ctho: can you turn on *all* of the buttons in calls?
(2:36:51 PM) <timeless> .frame 60
(2:37:06 PM) <timeless> possibly .frame 61
(2:37:11 PM) <timeless> kinda depends
(2:37:16 PM) <timeless> i think .frame 61
(2:37:31 PM) <timeless> ctho: what branch are you using?
(2:37:56 PM) <timeless> 182 nsContentUtils::RemoveListenerManager(this);
(2:37:58 PM) <timeless> from .frame 61
(2:38:02 PM) <timeless> ok, so the problem is that
(2:38:15 PM) <timeless> this nsHTMLAnchorElement creature
(2:38:30 PM) <timeless> in .frame 6a/.frame 6b
(2:38:34 PM) <timeless> was dying
(2:38:36 PM) <timeless> unfortunately for us
(2:38:43 PM) <timeless> in its process of dying
(2:39:06 PM) <timeless> it kinda let go of something else, looks like maybe an event listener it had
(2:39:28 PM) <timeless> the event listener let go of a js context (probably for the js function that was the event listener)
(2:40:14 PM) <timeless> the release of the js context resulted in a gc
(2:40:24 PM) <timeless> the gc then walked around and found a reference to the anchor
(2:40:30 PM) <timeless> it then asked the anchor to mark itself
(2:40:38 PM) <timeless> unfortunate, the anchor was in the process of being very dead
(2:40:46 PM) <timeless> clear as mud?
(2:40:54 PM) <timeless> s/,/ly,/
(2:40:57 PM) <CTho> yup
(2:41:19 PM) <timeless> stack trace+my commentary into any existing purecall bug that seems to have classifywrapper skid marks
(2:41:20 PM) <timeless> or make your own
(2:41:23 PM) ***timeless doesn't care
(2:42:12 PM) <timeless> cc jst+smaug+bz+dbaron
(2:42:24 PM) <timeless> and everyone involved in nsDOMGCParticipantSH
timeless asked me to attach this
Looks like it may be a regression from bug 334075. Moving code from a destructor of a more-derived class to a destructor of a base class can be dangerous if that code relies on the more-derived class still being around, which this code does, since it needs an implementation of GetSCCIndex (and, for certain debugging code, QueryInterface).
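The destructor-ordering hazard described above can be sketched in isolation. This is an illustration only, not the Gecko code: `Participant`, `Anchor`, `Name` and `gLog` are hypothetical stand-ins, and the base method is deliberately non-pure so the example stays well-defined. With a pure virtual in the base, the same call from the base destructor is what ends in the "pure virtual function call" error.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the real Gecko classes.
std::vector<std::string> gLog;

struct Participant {
  virtual ~Participant() {
    // The derived part is already destroyed here, so virtual dispatch
    // resolves to Participant::Name, not the override.
    gLog.push_back("base dtor sees " + Name());
  }
  virtual std::string Name() const { return "Participant"; }
};

struct Anchor : Participant {
  ~Anchor() override {
    // Still a complete Anchor here, so the override is used.
    gLog.push_back("derived dtor sees " + Name());
  }
  std::string Name() const override { return "Anchor"; }
};

// Creates and destroys one Anchor, recording what each destructor saw.
void RunDemo() {
  gLog.clear();
  { Anchor a; }
}
```

Code that needs the derived implementation therefore has to run in the derived destructor (or earlier), which is the property that moving the cleanup into a base destructor loses.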
Blocks: 334075
That said, something else seems pretty broken if ReleaseWrapper hasn't been called yet for a node that's being destroyed.
the URL for this->nsIDocument at frame 111 was "http://www.telegraph.co.uk/money/main.jhtml?xml=/money/2006/05/28/ccfox28.xml"
Er, ignore comment 12 -- we need separate ReleaseWrapper calls for each participant, and this one (presuming entry->key != entry->participant in frame 42) won't happen until the event listener manager is destroyed, which is what's happening on the stack.
Except no, that still doesn't make sense -- since the nsJSContext is being destroyed, this must be the last preserved wrapper, and ~nsMarkedJSFunctionHolder_base has the correct ordering: ReleaseWrapper before NS_IF_RELEASE(obj). So I don't see how the wrapper is still preserved.
I don't understand how this could have worked before. Even if the GetSCCIndex call would have worked, the QI to nsIDOMNode would not. Though maybe we were simply using the |else| code in this case. But I do think it looks strange that JS still has pointers to these nodes as late as this.
Updated•18 years ago
Assignee: nobody → events
QA Contact: general → ian
Comment 18•18 years ago
So could it be that the node being destroyed is still on the parent chain of some node that JS references? ClassifyWrapper does walk the parent chain. I'd _think_ that by the time we're in ~nsINode we won't be the mParent of anything anymore, though...
Where does it walk the parent chain?
Comment 20•18 years ago
It walks it in nsGenericElement::GetSCCIndex if the element is not in a document... That said, I'm having issues with the posted stacks not actually matching nsDOMClassInfo.cpp from the day they were posted. What version of nsDOMClassInfo.cpp are those line numbers for, Ctho?
Unless there's tail-call optimization happening (which I don't see an opportunity for), it didn't get into nsGenericElement::GetSCCIndex -- and couldn't have, since it was already in ~nsINode for that element.
Comment 22•18 years ago
So that brings me back to my question in comment 20 -- on what exact line of code is the pure virtual call happening?
I:\smtrunk1>cvs status mozilla/dom/src/base/nsDOMClassInfo.cpp
===================================================================
File: nsDOMClassInfo.cpp Status: Needs Patch
Working revision: 1.382
Repository revision: 1.385 /cvsroot/mozilla/dom/src/base/nsDOMClassInfo.cpp,v
Comment 24•18 years ago
In that revision,
5169 dbaron 1.334 PL_DHashTableOperate(&sWrapperSCCTable, entry->participant->GetSCCIndex(),
5170 dbaron 1.269 PL_DHASH_ADD));
I assume that you have no local changes, right? The stack has:
42 0012d95c 100053be 01eee544 125eba80 00000a9a gklayout!ClassifyWrapper(struct PLDHashTable * table = 0x01eee544, struct PLDHashEntryHdr * hdr = 0x125eba80, unsigned int number = 0xa9a, void * arg = 0x0012d9b8)+0x21 (FPO: [Non-Fpo]) (CONV: cdecl) [i:\smtrunk1\mozilla\dom\src\base\nsdomclassinfo.cpp @ 5170]
So entry->participant seems to be dead... Can you catch this in a debugger again? What's entry->key? Is it the JS event listener or marked function holder or something? That's the only way I can see this happening, I think. And if that's the case, then we probably need to nsContentUtils::RemoveListenerManager from whatever destructors would still have everything needed for the GC stuff working... And if so, then we just need to
We went through that already. See comment 16.
> I assume that you have no local changes, right?
configure
mailnews/import/Makefile.in
toolkit/content/widgets/tabbrowser.xml
xpfe/browser/resources/content/navigatorOverlay.xul
xpfe/browser/resources/content/nsBrowserStatusHandler.js
xpfe/communicator/resources/content/contentAreaClick.js
xpfe/components/bookmarks/resources/bookmarksMenu.js
xpfe/global/resources/content/bindings/tabbrowser.xml
> Can you catch this in a debugger again?
If it crashes me again, sure. Otherwise, I did dump some 107MB thing for timeless from WinDbg - maybe it's got what you need in it?
Comment 27•18 years ago
FYI, I got one of these alerts a few days ago using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060601 Minefield/3.0a1 and since today's nightly automatic update to Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060606 Minefield/3.0a1 I've had about 6 of these crashes. Not all occurred when closing windows, I think one was while doing a Ctrl-F search. These crashes have never triggered the Talkback agent.
Comment 28•18 years ago
that's right, and until someone like me writes some evil code, they never will trigger talkback. note that we do have code in cvs.mozilla.org that could enable us to replace these functions, but it's a wee bit of a pain. and I'm currently officially a linux hacker...
Comment 29•18 years ago
*** Bug 340770 has been marked as a duplicate of this bug. ***
Comment 30•18 years ago
Here are a couple crash logs from Mac OS X that appear to be very similar, if not identical, to the WinXP stack. cl
Comment 32•18 years ago
*** Bug 340770 has been marked as a duplicate of this bug. ***
I just hit this again; when I had it in the debugger I forgot that it wasn't well understood, and figured it was already known.
Flags: blocking1.9a1+
So one possible solution here is to make us kill the listenermanager in the nsGenericElement dtor. I would be fine with doing that, but it would be good to know if that is a good fix, or just wallpapering another problem.
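A rough sketch of that ordering, under stated assumptions: `gTable` stands in for the wrapper table that the GC walks, `Collect` for a GC pass triggered while the listener is being released, and `Element`, `Participant` and `Describe` are hypothetical names rather than the Gecko classes. The point is only that the object unregisters itself while it is still a complete most-derived object, before anything that might re-enter the table walk.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Participant {
  virtual ~Participant() = default;
  virtual std::string Describe() const = 0;  // needs the derived class alive
};

std::set<Participant*> gTable;      // stand-in for the table the GC walks
std::vector<std::string> gVisited;  // records what each "GC" pass saw

// Stand-in for a GC pass: calls a virtual method on every table entry.
void Collect() {
  for (Participant* p : gTable) gVisited.push_back(p->Describe());
}

struct Element : Participant {
  explicit Element(std::string n) : name(std::move(n)) { gTable.insert(this); }
  ~Element() override {
    gTable.erase(this);  // unregister first, while still fully an Element
    Collect();           // releasing a listener may trigger a GC right here
  }
  std::string Describe() const override { return name; }
  std::string name;
};
```

Whether doing this in the element's own destructor fixes the root cause or only papers over a wrapper that should already have been released is exactly the open question raised above; the sketch does not settle it.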
Comment 36•18 years ago
*** Bug 334027 has been marked as a duplicate of this bug. ***
Comment 37•18 years ago
Whenever I get a "Pure virtual function" crash, Talkback never comes up.
Comment 38•18 years ago
I just noticed that someone else reported the same thing as my previous post some time ago, but... When I launch Process Explorer, right-click on Firefox, choose Debug and tell it to attach the debugger, Firefox also crashes without triggering Talkback.
Flags: blocking1.9+
It's possible that this was fixed by some patches that went in recently (bug 345660 and bug 330689). I'd be interested to hear if people still see this in trunk builds from 2006-08-04 or later.
Comment 40•18 years ago
(In reply to comment #39)
> It's possible that this was fixed by some patches that went in recently (bug
> 345660 and bug 330689). I'd be interested to hear if people still see this in
> trunk builds from 2006-08-04 or later.
>
Sorry, still seeing this with current trunk builds. :-(
Comment 41•18 years ago
*** Bug 349427 has been marked as a duplicate of this bug. ***
Comment 42•18 years ago
Bug 349069 was fixed yesterday. It may have helped here. Could anyone confirm whether or not there are still 'pure virtual function call' crashes?
Comment 43•18 years ago
(In reply to comment #42)
> Bug 349069 was fixed yesterday. It may have helped here.
> So if anyone could confirm whether or not there are still
> 'pure virtual function call' crashes.
I think this helped, yes. I haven't been able to reproduce this problem since that bug was fixed. (Camino builds).
Comment 44•18 years ago
I just experienced what seems like this bug in the Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060830 Minefield/3.0a1 build. Right clicked on a link on tigerdirect.com and got the error. Can't reproduce it though. Windows XP SP2 Extensions: Adblock Plus 0.7.1.2 PDF Download 0.7.4 Tab Mix Plus 0.3.0.60819
Comment 45•18 years ago
(In reply to comment #44)
> I just experienced what seems like this bug in the Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060830 Minefield/3.0a1 build. Right
> clicked on a link on tigerdirect.com and got the error. Can't reproduce it
> though.
>
> Windows XP SP2
> Extensions:
> Adblock Plus 0.7.1.2
> PDF Download 0.7.4
> Tab Mix Plus 0.3.0.60819
>
That is because the checkin for bug 349069 was backed out. A new patch has been developed and is waiting for review. I have been running for a while with the new patch and have not encountered this problem since.
Comment 46•18 years ago
*** Bug 350705 has been marked as a duplicate of this bug. ***
Comment 47•18 years ago
This seems to have been fixed by the check-in for bug 349069. If anyone is still seeing this with a build from 2006-09-03 or later, please add a comment.
Comment 48•18 years ago
Bug 349069 should have fixed this. Marking FIXED. Please re-open if you still see crashes.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
I still wonder if we should try to add assertions that would catch the problem in comment 15.
Comment 50•18 years ago
*** Bug 361746 has been marked as a duplicate of this bug. ***
Updated•13 years ago
Crash Signature: [@ purecall - ClassifyWrapper]