Closed Bug 54868 Opened 25 years ago Closed 25 years ago

can't send Web Mail using N6 9/28 build; crash 100% of time on clicking "Send"

Categories

(Core :: DOM: Core & HTML, defect, P1)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: ekrock, Assigned: nisheeth_mozilla)

References

Details

(Keywords: crash, regression, Whiteboard: [rtm++])

Attachments

(4 files)

Using Commercial 2000092808 on WinNT 4.0 SP4. To repro: 1) go to home.netscape.com 2) click web mail 3) log in to your Web Mail account 4) click Write Mail 5) enter your email address in To, "test" in Subject, and "test" in Body 6) Click Send. Browser crashes. Reproduced 3 times and filed TalkBack reports each time 10/1. Assigning to DOM0 as Web Mail only uses HTML 3.2 and JS1.1/DOM0 so far as I know. Marking crash, rtm, 4xp, and P1 as this is extremely high profile (our own portal's web mail).
Keywords: 4xp, crash, rtm
Priority: P3 → P1
ekrock: do you have a franklin (new style) webmail account or a usa.net (old style) webmail account?
I think it must be an old-style Web Mail account since it looks the same as it always has.
This is a P1 ship stopper and regression. Marking rtm+ need info so that investigation starts on this bug. We'll mark it rtm+ once the fix is attached, reviewed, and super-reviewed.
Keywords: regression
Whiteboard: [rtm+ need info]
Hmm, tested on both WinNT and Linux and I see no crash, Nisheeth also tested on WinNT and no crash. Eric, please re-test. Marking WORKSFORME.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → WORKSFORME
I just got this crash. Re-opening. Using commerical build id 2000100408 on WinNT
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
just got it again, this time with a debug build on WinNT using bits from the branch. stack: PresShell::~PresShell() line 1267 + 15 bytes PresShell::`scalar deleting destructor'() + 15 bytes PresShell::Release() line 1188 + 158 bytes nsCOMPtr<nsIPresShell>::~nsCOMPtr<nsIPresShell>() line 490 ReflowEvent::HandleEvent() line 4364 + 8 bytes HandlePLEvent() line 4372 PL_HandleEvent() line 575 + 10 bytes PL_ProcessPendingEvents() line 508 + 9 bytes _md_EventReceiverProc() line 1044 + 9 bytes USER32! 77e71820() 00ac4100() the line of code is: mViewManager->DisableRefresh(); mViewManager is not null, but it has already been destroyed (the vtable is 0xdddddddd)
cc-ing jud and adam. I set breakpoints in the destructors of the Document Viewer, Pres Shell, and View Manager. What I see is after I click "send" is: ~DocumentViewer ~PresShell ~ViewManager ~DocumentViewer ~PresShell ~ViewManager ~DocumentViewer ~PresShell ~DocumentViewer ~ViewManager ~ViewManager ~PresShell So, somehow the destructor for the last DocumentViewer is getting triggered before the destructor for the previous DV's ViewManager, and worse the View Manager associated via weak reference with the last Pres Shell is getting destroyed before the PresShell.
I should have written that last comment to show that the final ~DV is nested within the previous one. Here's the call stack at the point where we destroy the nested DV: DocumentViewerImpl::~DocumentViewerImpl() line 408 DocumentViewerImpl::`scalar deleting destructor'() + 15 bytes DocumentViewerImpl::Release() line 355 + 154 bytes nsCOMPtr<nsIContentViewer>::assign_assuming_AddRef() line 472 nsCOMPtr<nsIContentViewer>::assign_with_AddRef() line 849 nsCOMPtr<nsIContentViewer>::operator=() line 584 nsDocShell::Destroy() line 1595 nsWebShell::Destroy() line 1393 nsHTMLFrameInnerFrame::~nsHTMLFrameInnerFrame() line 463 nsHTMLFrameInnerFrame::`scalar deleting destructor'() + 15 bytes nsFrame::Destroy() line 425 + 34 bytes nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 nsBoxFrame::Destroy() line 1002 + 13 bytes nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 nsBoxFrame::Destroy() line 1002 + 13 bytes nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 nsBoxFrame::Destroy() line 1002 + 13 bytes nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 nsBoxFrame::Destroy() line 1002 + 13 bytes nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 nsBoxFrame::Destroy() line 1002 + 13 bytes nsFrameList::DestroyFrames() line 36 nsContainerFrame::Destroy() line 98 ViewportFrame::Destroy() line 144 FrameManager::~FrameManager() line 405 FrameManager::`scalar deleting destructor'() + 15 bytes FrameManager::Release() line 384 + 157 bytes PresShell::~PresShell() line 1272 + 27 bytes PresShell::`scalar deleting destructor'() + 15 bytes PresShell::Release() line 1188 + 158 bytes nsCOMPtr<nsIPresShell>::~nsCOMPtr<nsIPresShell>() line 490 DocumentViewerImpl::~DocumentViewerImpl() line 447 + 97 bytes DocumentViewerImpl::`scalar deleting destructor'() + 15 bytes DocumentViewerImpl::Release() line 355 + 154 bytes nsCOMPtr<nsIContentViewer>::assign_assuming_AddRef() line 472 nsCOMPtr<nsIContentViewer>::assign_with_AddRef() line 849 nsCOMPtr<nsIContentViewer>::operator=() line 584 nsDocShell::Destroy() line 1595 nsWebShell::Destroy() line 1393 nsXULWindow::Destroy() line 324 nsWebShellWindow::Destroy() line 1779 nsContentTreeOwner::Destroy() line 408 GlobalWindowImpl::Close() line 2055 GlobalWindowImpl::CloseWindow() line 3541 nsJSContext::ScriptEvaluated() line 1295 + 18 bytes nsJSContext::CallEventHandler() line 915 nsJSEventListener::HandleEvent() line 154 + 64 bytes nsEventListenerManager::HandleEventSubType() line 788 + 19 bytes nsEventListenerManager::HandleEvent() line 1365 + 39 bytes GlobalWindowImpl::HandleDOMEvent() line 501 DocumentViewerImpl::LoadComplete() line 669 + 47 bytes nsWebShell::OnEndDocumentLoad() line 954 nsDocLoaderImpl::FireOnEndDocumentLoad() line 805 nsDocLoaderImpl::DocLoaderIsEmpty() line 612 nsDocLoaderImpl::OnStopRequest() line 555 nsLoadGroup::RemoveChannel() line 566 + 39 bytes PresShell::RemoveDummyLayoutRequest() line 4704 + 44 bytes PresShell::ReflowCommandRemoved() line 4657 PresShell::ProcessReflowCommands() line 4477 ReflowEvent::HandleEvent() line 4362 HandlePLEvent() line 4372 PL_HandleEvent() line 575 + 10 bytes PL_ProcessPendingEvents() line 508 + 9 bytes _md_EventReceiverProc() line 1044 + 9 bytes USER32! 77e71820()
I think all the Pres Shells get destroyed within the context of the owning DV destructors, except this last one which is getting destroyed during the processing of an event, after the owning DV destructor has already terminated.
This could be related to the relatively recent changes to the pres shell to enable async reflow during doc load. Taking from jst.
Assignee: jst → nisheeth
Status: REOPENED → NEW
buster, I tried reproducing this on a webmail account (my own) and a franklin account (moztester) and was unable to do so using today's windows optimized and debug branch builds. Since you can reproduce this, can you temporarily reset your webmail password and make it public? Once we've finished debugging this problem, you can go back to your old password.
rtm need info (not gotten review/super review yet).
Whiteboard: [rtm+ need info] → [rtm need info]
*** Bug 55420 has been marked as a duplicate of this bug. ***
nisheeth: bug 55420 is 100% reproducable, and is the same crash.
So this bug is also affecting plug-in download from cbsnews.com. We really want this fixed for RTM.
The analysis of this bug here looks like the situation I have in bug 54233, Alt-f x (or Alt-f c in last window) crashes on NT. If focus is in urlbar there is no crash. In that bug we are in nsMenuFrame::Execute(). It fetches a ref to nsIPresShell, calls its method to handle event, and when the presshell pointer is released we have a crash in ~nsIPresShell because mViewMananeger has been deleted (mDocumentViewer was deleted, which owns view manager, pointer value 0xdddddddd). I think the destructor order is the same as buster reported, although I did not check it completely.
You know what, my other rtm need info bug 53763 also looks like it is suffering from this same thing. It is about a crash when closing a dialog caused by an event when the content where the dialog was created has been destroyed by the time the dialog is dismissed. Again when nsIPresShell pointer is released we have mViewManager == 0xdddddddd.
I'm attaching a patch that fixes the web mail crash and the crash on www.cbsnews.com. Unfortunately, this patch does not fix bug 53763 or bug 54233. I will look into those more tomorrow.
Attached patch Proposed fixSplinter Review
Marking fix in hand.
Whiteboard: [rtm need info] → [rtm need info][fix in hand]
The document viewer contains the pres shell and the view manager. When the document viewer is destroyed, the pres shell is supposed to be destroyed before the view manager. This wasn't happening because the ReflowEvent struct's HandleEvent method at the bottom of the stack was maintaining a strong reference to the pres shell while the document viewer, and consequently, the view manger got destroyed further up the stack. When the stack unwound down to HandleEvent(), the pres shell finally got destroyed, but because the associated view manager had already been destroyed, a crash occurred trying to dereference the view manager. The fix is to hold on to the document viewer containing the pres shell while processing reflow commands on the pres shell. I am not happy with introducing a dependency on the content viewer from within the pres shell, but I couldn't figure out a better way to fix this. Actually, while writing this, it occurred to me that a better fix would be to hold on to the view manager rather than the document viewer. Let me try that out.
Holding on to the view manager also works. Attaching the patch as a better proposed fix.
I don't know if the order of nsCOMPtr destructor is guaranteed across platforms/compilers. It seems like a fragile solution. A slight modification would be to use normal pointers, and explicitly NS_RELEASE them in the order you want. That's much better from a code documentation point of view as well. So in general I approve of the approach, but I'd like to see a safer and more clear implementation (unless you're 100% sure that destructor order is guaranteed in this case, and that no compiler we care about violates this guarantee.)
Oh, and be sure to put back the "kung fu death grip" comment from the first patch. It's damn funny.
Yeah, I'd agree with Buster - use explicit NS_RELEASE calls if you want to guarantee the order of destruction.
We rely on the order of destructors in other places as well. A'la scc, so I kind of believe compliers would do the right thing. It is in the C++ standard as well if I am not mistaken. http://lxr.mozilla.org/seamonkey/source/layout/base/src/nsDocumentViewer.cpp#270
It doesn't matter if it's in the standard if not all compilers follow the standard... and Just gcc/VC++/CW are not good enough - remember there are pleanty of other platforms that people will be trying to compile/run mozilla on. Why rely on a very edge-case of a standard, that is potentially different from compiler to compiler, when you can do explicit releases? Remember, you can still use nsCOMPtr's and still explicity release stuff, you just assign nsnull to the nsCOMPtr, and that will Release() the old object.
*** Bug 56699 has been marked as a duplicate of this bug. ***
Better safe than sorry. I'm attaching a patch that explicitly releases the com ptrs by assigning nsnull to them. Please review and approve it. Thanks!
Status: NEW → ASSIGNED
sr=buster. thanks for making the change, it was the Right Thing.
Marking rtm+
Whiteboard: [rtm need info][fix in hand] → [rtm+][fix in hand]
PDT says rtm need info: needs r=, and please don't include whitespace changes in patch.
Whiteboard: [rtm+][fix in hand] → [rtm need info][fix in hand]
I've already given the new patch sr=buster, can somebody please give an r=??? Nisheeth, you should send a note out to reviewers@mozilla.org about this, if you haven't already.
Renominating rtm+, now that Heikki has reviewed the bug.
Whiteboard: [rtm need info][fix in hand] → [rtm+][fix in hand]
rtm++
Whiteboard: [rtm+][fix in hand] → [rtm++][fix in hand]
The fix is checked into the rtm branch. I'm about to attach the final, cleaned up patch that got checked in case it is needed for future reference.
Whiteboard: [rtm++][fix in hand] → [rtm++][fix checked into branch]
I just checked in the fix to the trunk. Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago25 years ago
Resolution: --- → FIXED
Whiteboard: [rtm++][fix checked into branch] → [rtm++]
verified ability to send webmail using both franklin account and old usa.net account. verified on branch: win32 2000-11-04-09-mn6 verified ability to send webmail using old usa.net account. verified on branch: mac 2000-11-04-08-mn6 and linux 2000-11-04-09-mn6. Webmail is down at this exact moment so I can't try the Franklin accounts using Mac and Linux. leave this unverified pending verification on Mac and Linux for the franklin webmail accounts.
esther - can someone in mail help to verify this quickly since we have the franklin mail accounts already? Need to verify on Mac and Linux only. Thanks.
verified WinNT 20001105 rtm candidate build. adding kevin to the cc-list because he was looking at a similar problem at one time.
really marking verified this time
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: