Closed Bug 65800 Opened 25 years ago Closed 24 years ago

trunk #1 crashes: handling |CantRenderReplacedElementEvent|s

Categories

(Core :: Layout, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: dbaron, Assigned: dbaron)

References

Details

(Keywords: crash, topcrash)

Attachments

(5 files)

The #1 set of topcrashes on the trunk is things related to FrameManager::HandlePLEvent. These show up in the talkback reports under the signatures: 0x00000000 (and other low numbers) FrameManager::HandlePLEvent These *may* be related to / the same as bug 63347 but to avoid confusion I'd rather file this as a separate bug, and then if we fix both we can fix both. Information at ftp://ftp.mozilla.org/pub/data/crash-data/
FWIW, I can't figure out exactly what code causes the crash from the talkback reports. The line number isn't all that helpful, since I don't see what would crash. (Note also that the file changed recently, so look at the version of the file corresponding to the date of the talkback report. That can be done easily by appending "&rev=n.nn" to the CVS Blame URL. If someone could look at the information available only internally on the talkback server (cyclone?), it might help to figure out exactly where the crash is.
So the code that looks to be causing the crash is in this neighborhood... FrameManager* frameManager = (FrameManager*)aEvent->owner; //adding a ptr check since talkback is complaining about a crash here. //I suspect that if the event->owner is really null, bad things will happen //elsewhere. --> if(frameManager && frameManager->CanProcessEvents()) { // Remove the posted event from the linked list CantRenderReplacedElementEvent** events = &frameManager->mPostedEvents; while (*events) { ...if we back up to r1.64 of nsFrameManager.cpp. If we end up with the PC at zero, then that would seem to indiciate that we've tried to dispatch a method through a bad vtable. Could it be that the frame manager has been destroyed? CanProcessEvents() is a virtual method.
Yeah, I suspect that the CantRenderReplacedElementEvent should hold a strong reference to the frame manager, and release it when the event gets processed.
Right now the frame manager should be dequeueing and deleting all the events that are still posted when it is destroyed. However, something required to do that could be failing. There seem to have been a lot of efforts to prevent crashes here (for example, CanProcessEvents), but none seems to have succeeded permanently. I'm wondering if perhaps there's something else strange going on, like a corrupt vtable pointer or something. It really seems like the events *should* be deleted, although something could be failing to prevent that...
This patch looks reasonable to me, r=waterson. I say we try it and see if we make any headway by looking at the talkback data. Whatdya think, buster?
Assigning to myself since I have a possible fix.
Assignee: clayton → dbaron
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.8
Blocks: 66164
Without this patch, I seem to be able to fairly consistently get a crash following the procedure in bug 66164. With the patch, I seem to hit bug 51384 instead. It's possible that this patch just pushes the problem off to another place, or that the problem I'm seeing is triggering both of these bugs, and this one just crashes it first. A stacktrace is attached to bug 66164, at: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23128 .
Based on the comments in bug 66164, I have a revised patch that also nulls out the mPresShell pointer in Shutdown() and tests mPresShell in HandlePLEvent. I want to test it a little, and then I'll attach it...
Actually, that won't work. The frameManager relies on the presShell being around when it is destroyed, but holds a weak pointer since the presShell owns it. A good solution might be to move some of the stuff in ~PresShell to PresShell::Shutdown.
s/PresShell/FrameManager/ on the last two lines of my previous comment
Based on dbaron's comments on 2001-01-18 06:03, I think this bug might be related to bug 65243. In fact, there's a good chance it's a duplicate, and that the fix David is proposing is a band-aid we don't need once bug 65243 is fixed. David, what do you think?
Depends on: 65243
Yes, it does. It might not be a bad idea to bulletproof things with this patch, too, or at least to remove the existing mCanProcessEvents (which doesn't help much because it's only false after the frame manager is deleted).
Here's another crash from bug 66164: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=23247 That's with the original patch applied, but without the revised. I will try David's revised patch, and see if that fixes things for me.
Using a CVS build from 1/22/2001 (which incorporates the patch from bug 65243), along with the patches from bug 61386 and bug 61756, I can no longer get Mozilla to crash by following the procedure in bug 66164.
buster, waterson: Any thoughts on this? Should we go with something like my previous patch as extra bulletproofing, or should we just remove the mCanProcessEvents code (which didn't do any good anyway)?
In principle, I'm all for taking out code that doesn't do anything. Has this dropped off the topcrasher radar now that danm's fix is in? If so, why don't we try backing out the mCanProcessEvents stuff to see if ``things stay fixed.''
agreed. try removing the (hopefully) unneeded code. and lets avoid putting in the bullet-proofing unless a need arises. Leaving in comments could be helpful, and if we can think of any conditions for asserts that would make us think somethings gone wrong, that would be very helpful.
There haven't been any published talkback reports since danm's fix went in. I think chofmann mentioned a problem with the talkback server.
we should have talkback enabled on the trunk builds soon (hopefully tommorrow), so please try to crash with those builds before verifying this bug.
jpatel: is this still showing up on the talkback radar?
I just got the trunk topcrash reports running again, but due to the talkback server problems, the stack traces aren't very helpful (and neither are a lot of the stack signatures). Here is the top 40 list for today (crashes from the last 9 days), maybe someone can tell if one of these stack signatures is related to FrameManager::HandlePLEvent: Top crashes Count Area Bug Status Result 78 gkhtml.dll 72 GKHTML.DLL 41 nsXBLService::LoadBindings 39 PromptUserCallback 33 psmglue.dll 29 PSMGLUE.DLL 19 HTMLContentSink::CreateContentObject 18 msgnews.dll 17 libgklayout.so 16 necko.dll 16 libpsmglue.so 16 MSVCRT.DLL 50243 RESO WORK 12 XPCOM.DLL 11 NECKO.DLL 11 0x00000000 58256 VERI FIXE 10 rdf.dll 9 nsCacheManager::NoteDormant 9 libmsgnews.so 9 img3250.dll 9 JSDOM.DLL 9 IMG3250.DLL 8 .__ptr_glue 7 msgbsutl.dll 7 MSVCRT.dll 50243 RESO WORK 6 nsTableOuterFrame::GetCaptionSide 6 nsNNTPProtocol::SendFirstNNTPCommand 6 msgdb.dll 5 xpcom.dll 5 nsCSSFrameConstructor::ConstructXULFrame 5 libgkgfx.so 4 libxpcom.so 4 libnecko.so 4 libmozjs.so 4 libgkplugin.so 4 libc.so.6 4 MOZBRWSR.DLL 4 EDITOR.DLL 3 ntdll.dll 3 nsEventStateManager::UpdateCursor 3 mork.dll Sorry if this isn't too helpful, but it's all we have right now. I'll keep you posted as we start getting more useful data.
->0.9, since there's probably no crash here now, but we still should do some cleanup.
Target Milestone: mozilla0.8 → mozilla0.9
Couple weak nits. Take 'em or leave 'em. r=waterson - In Destroy(), do we want to early-exit if we find mPresShell == 0? (The old code didn't, but there's an assertion there...) - Ibid, FrameManager::HandlePLEvent() and frameManager == 0. - Might NS_ENSURE_STATE(mPresShell != 0, ...) be a bit more lucid than NS_ENSURE_TRUE(mPresShell, ...)? (In general, I'm not a big fan of NS_ENSURE anything, but I may be beginning to warm up to them the way they've been used here...)
> - In Destroy(), do we want to early-exit if we find mPresShell == 0? (The old > code didn't, but there's an assertion there...) Well, I'm never sure whether it's a good idea to add a null-check that I'm sure we won't hit in the current code. I added the assertion more to document the assumption that the code is making. It ought to be pretty hard to call Destroy twice... > - Ibid, FrameManager::HandlePLEvent() and frameManager == 0. Well, it's initalized with |this|, so it should never be null, and the assertion is there to document that fact. > - Might NS_ENSURE_STATE(mPresShell != 0, ...) be a bit more lucid than > NS_ENSURE_TRUE(mPresShell, ...)? (In general, I'm not a big fan of > NS_ENSURE anything, but I may be beginning to warm up to them the way > they've been used here...) It doesn't make much differenc to me, although I don't think the former is much clearer and I know there are some people who strongly dislike comparisons to null.
Keywords: mozilla0.8.1
Keywords: mozilla0.8
I don't like the assertion messages 'will crash soon' and would prefer something more descriptive instead, but that is minor. sr=attinasi
Checked in the cleanup 2001-03-05 17:46, and the crash is long-gone, so marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Marking verified based on reporter's comments.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: