Closed Bug 134437 Opened 23 years ago Closed 23 years ago

[CRASH] setting focus on inner frame a switching stylesheets causes crash [@ nsBlockFrame::ReflowBlockFrame]

Categories

(Core :: Layout, defect, P1)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: jrspm, Assigned: alexsavulov)

References

()

Details

(Keywords: crash, Whiteboard: [adt2 RTM])

Crash Data

Attachments

(4 files, 1 obsolete file)

using build 2002032916 on win2k (sp2sr1) To reproduce: 1. Go to URL: http://www.hixie.ch/tests/evil/state/001.html 2. set the focus in the IFrame (just click the mouse once in the IFrame) 3. go to the view/Use Style and change the stylesheet to "Alternate" Expected: Should just switch the style and show the same number as was there before Actual: crash Talkback ID: #TB4644175M I first reported this crash in bug 52334. jst thought it might be related to his changes...here is what he said: "...That crash, however, is not due to these changes directly. The crash that happens with the above testcase (note, you don't need to select anything in the iframe, just giving it focus is enough) is due to a re-entrancy problem in the layout frame destruction code that is much easier to trigger with these changes than w/o them. We end up destroying a frame, and while doing so we end up destroying a window which passes focus along to one of the frames we're destroying, and that ends up reflowing that frame and since that frame is being destroyed we end up getting references to deleted frames n' other nice things that lead to the crash...." After investigating this for a bit, jst said this: "Ok, I just grabbed a build that didn't have my changes in it and did some more testing with it. Turns out that the crash reported above is just as reproduceable w/o my changes, I crash in the exact same place due to the exact same reasons with a trunk build w/o my changes. Because of this fact, I won't be spending any more time on that crash as part of this bug, we need a new bug for that crash." This should definitely be fixed *before* 1.0 is released! jake
cc'ing jst and targeting to Mozilla 1.0. Sorry if this isn't kosher, but this crash needs to be fixed before a 1.0 release for sure! Jake
Target Milestone: --- → mozilla1.0
worksforme with linux build 20020329. part of the frame looks a bit scrambled for a second, but fixes itself. no crash.
Could someone test if this same crash is reproduceable on a real site that uses iframes?
--> attinasi_layout to test...
Assignee: attinasi → attinasi_layout
Could be related to bug 133219 and/or bug 134520
Strangely, NS_ABORT_IF_FALSE dies *NOT* abort if false - it asserts but does not abort :( The problem is that we are trying to reflow an empty line, because the empty line is still marked dirty. I need to spend some time digging into why the line is emptied without having the dirty bit cleared, but the quick fix is to check for a null firstChild in ReflowBlockFrame, before dereferencing it.
Status: NEW → ASSIGNED
Oh, I see - NS_ABORT_IF_FALSE was supposed to abort the program, not the routine. So, it is just an assertion by another name. Amazing what reading the header files can do for a guy...
I have a fix for this: we are marking empty lines dirty in a couple of places, and that is causing assertions and ultimately the crash because if a line is marked dirty, we assume that it has at least one child. I'll attach a patch, but I'm first trying to understand if it is ever legal to have a dirty empty line.
Summary: [CRASH] setting focus on inner frame a switching stylesheets causes crash → [CRASH] setting focus on inner frame a switching stylesheets causes crash [@ nsBlockFrame::ReflowBlockFrame]
I'm attaching Jake's incident info and I have added the stack signature into the summary simply so that Talkback staff can see this one. There are enough incidents with this signature that I don't want to accidentally start logging dupes.
Setting Priority to P1
Priority: -- → P1
Assignee: attinasi_layout → alexsavulov
Status: ASSIGNED → NEW
taking from Marc
here is what happens: look at the stack below: - the frame constructor removes content (contentremoved) - the frame manager instructs the parent frame to remove a frame - the block frame destroys its child (block frame) before it removes the line that contained the block - the removed frame destroys itself but is also destroying a view that will cause the app event queue to be processed - there is a reflow command in the queue - the parent block tries to reflow an empty line that ends with a crash solution: in BlockFrame::DoRemoveFrame remove the line first then call aDeletedFrame->Destroy: - aDeletedFrame->Destroy(aPresContext); - aDeletedFrame = nextInFlow; // If line is empty, remove it now if (0 == lineChildCount) { ... ++line; } } + aDeletedFrame->Destroy(aPresContext); + aDeletedFrame = nextInFlow; I will prepare a patch on this (stack excerpt:) nsIFrame::GetStyleData(nsStyleStructID eStyleStruct_Display, const nsStyleStruct * & 0x00126ed8) line 564 + 3 bytes nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineList_iterator {...}, int * 0x001271b8) line 3102 nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineList_iterator {...}, int * 0x001271b8, int 1) line 2461 + 27 bytes nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 2238 + 31 bytes nsBlockFrame::Reflow(nsBlockFrame * const 0x036f6498, nsIPresContext * 0x035a8048, nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 0) line 844 + 15 bytes ....... PresShell::ProcessReflowCommands(int 0) line 6338 PresShell::FlushPendingNotifications(PresShell * const 0x0351a898, int 0) line 5056 nsDocument::FlushPendingNotifications(nsDocument * const 0x036c0978, int 1, int 0) line 3419 ...... nsWindow::ProcessMessage(unsigned int 7, unsigned int 2753886, long 0, long * 0x0012c1d8) line 3733 + 23 bytes nsWindow::WindowProc(HWND__ * 0x004a0584, unsigned int 7, unsigned int 2753886, long 0) line 1130 + 27 bytes USER32! 77e12e98() USER32! 77e139a3() USER32! 77e1395f() NTDLL! 77fa032f() nsView::~nsView() line 163 nsView::`scalar deleting destructor'(unsigned int 1) + 15 bytes ..... nsBlockFrame::DoRemoveFrame(nsIPresContext * 0x035a8048, nsIFrame * 0x036f98f0) line 4946 nsBlockFrame::RemoveFrame(nsBlockFrame * const 0x036f6498, nsIPresContext * 0x035a8048, nsIPresShell & {...}, nsIAtom * 0x00000000, nsIFrame * 0x036f98f0) line 4808 + 16 bytes FrameManager::RemoveFrame(FrameManager * const 0x034f6528, nsIPresContext * 0x035a8048, nsIPresShell & {...}, nsIFrame * 0x036f6498, nsIAtom * 0x00000000, nsIFrame * 0x036f98f0) line 1015 nsCSSFrameConstructor::ContentRemoved(nsCSSFrameConstructor * const 0x0341a0b0, nsIPresContext * 0x035a8048, nsIContent * 0x03694310, nsIContent * 0x034feaa0, int 2, int 0) line 9575 + 58 bytes nsCSSFrameConstructor::RecreateFramesForContent(nsIPresContext * 0x035a8048, nsIContent * 0x034feaa0, int 0, nsIStyleRule * 0x00000000, nsIStyleContext * 0x00000000) line 11868 + 35 bytes nsCSSFrameConstructor::ProcessRestyledFrames(nsCSSFrameConstructor * const 0x0341a0b0, nsStyleChangeList & {...}, nsIPresContext * 0x035a8048) line 10020 PresShell::ReconstructStyleData(PresShell * const 0x0351a898, int 0) line 5382
Attached patch patch preview (obsolete) — Splinter Review
the promised patch, see explanation in previous comment i also propose a similar change in the nsLineBox to prevent surprizes
Why in the world do we return to the event queue when deleting a view? That's windows-only, isn't it? Hawe we always done that? It makes it very hard to maintain lots of invariants. Under the circumstances, though, the patch seems reasonable.
well, i have not debugged yet on Linux, (i'm not that good in debugging the XFree86 stuff but I'll ask JKeiser to help), however on windows when destroying the view, we also destroy its window using the native call to the win32 API ::DestroyWindow(HWND) and that brings us back into the windows message loop so we process all the pending events. I will take a look at the Linux implementation and see what is going on there.
Comment on attachment 78805 [details] [diff] [review] patch preview Good catch! It is surprising that we can be reentered during the deletion of the frame, but I think your solution to the problem is a good one. In general it seems like a good idea to modify the line before modifying the frame that is in the line... Currently, there is no invariant that says we cannot be reflowed again during a reflow (or any other operation on a frame for that matter). Maybe if we get a real reflow queue instead of using the OS message queue we can assert that kind of invariant, but right now we have no control over it.
the same patch with explanatory comments added
Attachment #78805 - Attachment is obsolete: true
this is an Win32 issue only (I didn't tested on the 2 Macs yet, but I think that is really Win32 only) I tried to find a reason why shall the empty lines exist beyond the existence of the frames that are going to be deleted). Couldn't find anything to advocate for that since the destroying of a nsLineBox does not causes more than release the allocated arena space. (we don't have any other objects inheriting from linelink) (i also noticed the mem leak in the nsLineBox::DeleteLineList, but i heard dbaron fixed it already)
Comment on attachment 78966 [details] [diff] [review] revised patch with comments r=dbaron, although you should fix the incorrect indentation (tabbing?) of the comments. It also looks like they might run over 80 characters.
Attachment #78966 - Flags: review+
> Currently, there is no invariant that says we cannot be reflowed again during a > reflow (or any other operation on a frame for that matter). No. Layout (i.e., reflow) may not be re-entered, period. Can't we just not process reflows if the window is being destroyed? Or detect that we are _already_ in reflow, and not try to re-process the queue?
chris: i would suggest to fix the crash with the patch and open another bug to see if we can correct the event queue problem on windows. i think that in this case we don't have a re-entrancy problem (we're not in the reflow yet - see stack in comment #12), but that sync message sent from win32's ::DestroyWindow is causing us to flush the queue (i don't know the internals of ::DestroyWindow but i strongly suspect that is making use of ::SendMessage that is a sync message processing). I think that marc's ideea to have reflow 'gates' (bug 137195) to block reflow when the tree is not in a consistent state, is a good one. I'm not sure yet if this is implementable without causing other hard to detect problems like the one we have here in this bug (someone might be tempted to 'close the gate' when is not recomendable), but is worth to give it a thought.
While a generic mechanism for forbidding re-entrancy would be nice, it seems like we ought to be able to fix this specific case without it. I'm generally opposed to adding complexity to the frame code (which is already impenetrable) to work around the real problem that lies elsewhere. I'd appreciate it if you could think about this some more.
complete crash stack with comments. ** crash nsIFrame::GetStyleData(nsStyleStructID eStyleStruct_Display, const nsBlockFrame::ReflowBlockFrame(nsBlockReflowState & {...}, nsLineLi nsBlockFrame::ReflowLine(nsBlockReflowState & {...}, nsLineList_ite ** here we reflow a line that's empty nsBlockFrame::ReflowDirtyLines(nsBlockReflowState & {...}) line 223 nsBlockFrame::Reflow(nsBlockFrame * const 0x03b14d58, nsIPresContex nsContainerFrame::ReflowChild(nsIFrame * 0x03b14d58, nsIPresContext CanvasFrame::Reflow(CanvasFrame * const 0x03b01190, nsIPresContext nsBoxToBlockAdaptor::Reflow(nsBoxLayoutState & {...}, nsIPresContex nsBoxToBlockAdaptor::DoLayout(nsBoxToBlockAdaptor * const 0x03b14cb nsBox::Layout(nsBox * const 0x03b14cbc, nsBoxLayoutState & {...}) l nsScrollBoxFrame::DoLayout(nsScrollBoxFrame * const 0x03b0168c, nsB nsBox::Layout(nsBox * const 0x03b0168c, nsBoxLayoutState & {...}) l nsContainerBox::LayoutChildAt(nsBoxLayoutState & {...}, nsIBox * 0x nsGfxScrollFrameInner::LayoutBox(nsBoxLayoutState & {...}, nsIBox * nsGfxScrollFrameInner::Layout(nsBoxLayoutState & {...}) line 1217 nsGfxScrollFrame::DoLayout(nsGfxScrollFrame * const 0x03b01494, nsB nsBox::Layout(nsBox * const 0x03b01494, nsBoxLayoutState & {...}) l nsBoxFrame::Reflow(nsBoxFrame * const 0x03b0145c, nsIPresContext * nsGfxScrollFrame::Reflow(nsGfxScrollFrame * const 0x03b0145c, nsIPr nsContainerFrame::ReflowChild(nsIFrame * 0x03b0145c, nsIPresContext ViewportFrame::Reflow(ViewportFrame * const 0x03b01154, nsIPresCont nsHTMLReflowCommand::Dispatch(nsIPresContext * 0x039b5028, nsHTMLRe PresShell::ProcessReflowCommand(nsVoidArray & {...}, int 0, nsHTMLR PresShell::ProcessReflowCommands(int 0) line 6338 ** the press shell flushes th reflow commands PresShell::FlushPendingNotifications(PresShell * const 0x037cd3a8, nsDocument::FlushPendingNotifications(nsDocument * const 0x03ad7660 nsHTMLDocument::FlushPendingNotifications(nsHTMLDocument * const 0x ** the window list flushes the pending notifications to retrive a consistent length (is my guess) nsDOMWindowList::GetLength(nsDOMWindowList * const 0x039a2e80, unsi GlobalWindowImpl::GetLength(GlobalWindowImpl * const 0x0399b49c, un XPTC_InvokeByIndex(nsISupports * 0x0399b49c, unsigned int 59, unsig XPCWrappedNative::CallMethod(XPCCallContext & {...}, XPCWrappedNati XPCWrappedNative::GetAttribute(XPCCallContext & {...}) line 1823 + ** i'm not really an expert here but we end up calling a method with the index 59 on the GlobalWindowImpl: GetLength XPC_WN_GetterSetter(JSContext * 0x02b69008, JSObject * 0x02d8f108, js_Invoke(JSContext * 0x02b69008, unsigned int 0, unsigned int 2) l js_InternalInvoke(JSContext * 0x02b69008, JSObject * 0x02d8f108, lo js_GetProperty(JSContext * 0x02b69008, JSObject * 0x02d8f108, long js_Interpret(JSContext * 0x02b69008, long * 0x0012942c) line 2576 + js_Invoke(JSContext * 0x02b69008, unsigned int 1, unsigned int 2) l nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJSClass * const 0x02deb nsXPCWrappedJS::CallMethod(nsXPCWrappedJS * const 0x03aa05a8, unsig PrepareAndDispatch(nsXPTCStubBase * 0x03aa05a8, unsigned int 3, uns SharedStub() line 139 nsEventListenerManager::HandleEventSubType(nsListenerStruct * 0x03a nsEventListenerManager::HandleEvent(nsEventListenerManager * const nsXULElement::HandleDOMEvent(nsXULElement * const 0x02e0dcf8, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x02e0df78, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x02e11ff8, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x038fab48, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x03897040, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x0392e258, nsIPr nsXULElement::HandleChromeEvent(nsXULElement * const 0x0392e268, ns GlobalWindowImpl::HandleDOMEvent(GlobalWindowImpl * const 0x0399b49 nsDocument::HandleDOMEvent(nsDocument * const 0x03ad7660, nsIPresCo ** NS_GOTFOCUS gets translated to NS_FOCUS_CONTENT nsEventStateManager::PreHandleEvent(nsEventStateManager * const 0x0 PresShell::HandleEventInternal(nsEvent * 0x0012bdfc, nsIView * 0x03 PresShell::HandleEvent(PresShell * const 0x037cd3ac, nsIView * 0x03 nsViewManager::HandleEvent(nsView * 0x03b17a20, nsGUIEvent * 0x0012 nsView::HandleEvent(nsViewManager * 0x037e7cc8, nsGUIEvent * 0x0012 nsViewManager::DispatchEvent(nsViewManager * const 0x037e7cc8, nsGU HandleEvent(nsGUIEvent * 0x0012bdfc) line 83 nsWindow::DispatchEvent(nsWindow * const 0x03b17aec, nsGUIEvent * 0 nsWindow::DispatchWindowEvent(nsGUIEvent * 0x0012bdfc) line 886 nsWindow::DispatchFocus(unsigned int 105, int 1) line 4907 + 15 byt ** WM_SETFOCUS gets dispatched, DispatchFocus receives NS_GOTFOCUS(105) nsWindow::ProcessMessage(unsigned int 7, unsigned int 395000, long nsWindow::WindowProc(HWND__ * 0x001d0708, unsigned int 7, unsigned USER32! 77e12e98() USER32! 77e139a3() USER32! 77e1395f() NTDLL! 77fa032f() ** the view's window get's destroyed with ::DestroyWindow() nsView::~nsView() line 163 nsView::`scalar deleting destructor'(unsigned int 1) + 15 bytes nsView::Destroy(nsView * const 0x03ab1f80) line 260 + 34 bytes ** the frame destroys the view nsFrame::Destroy(nsFrame * const 0x03aae504, nsIPresContext * 0x039 nsFrameList::DestroyFrames(nsIPresContext * 0x039b5028) line 131 nsContainerFrame::Destroy(nsContainerFrame * const 0x03aae474, nsIP nsLineBox::DeleteLineList(nsIPresContext * 0x039b5028, nsLineList & nsBlockFrame::Destroy(nsBlockFrame * const 0x03b181b0, nsIPresConte ** we destroy the frame before removing the line nsBlockFrame::DoRemoveFrame(nsIPresContext * 0x039b5028, nsIFrame * nsBlockFrame::RemoveFrame(nsBlockFrame * const 0x03b14d58, nsIPresC FrameManager::RemoveFrame(FrameManager * const 0x03888b28, nsIPresC ** the old content is removed nsCSSFrameConstructor::ContentRemoved(nsCSSFrameConstructor * const nsCSSFrameConstructor::RecreateFramesForContent(nsIPresContext * 0x nsCSSFrameConstructor::ProcessRestyledFrames(nsCSSFrameConstructor PresShell::ReconstructStyleData(PresShell * const 0x037cd3a8, int 0 PresShell::StyleSheetDisabledStateChanged(PresShell * const 0x037cd nsDocument::SetStyleSheetDisabledState(nsIStyleSheet * 0x03959068, ** the style sheet is disabled CSSStyleSheetImpl::SetDisabled(CSSStyleSheetImpl * const 0x0395906c XPTC_InvokeByIndex(nsISupports * 0x0395906c, unsigned int 5, unsign XPCWrappedNative::CallMethod(XPCCallContext & {...}, XPCWrappedNati XPCWrappedNative::SetAttribute(XPCCallContext & {...}) line 1826 + XPC_WN_GetterSetter(JSContext * 0x02b69008, JSObject * 0x02d8fe18, js_Invoke(JSContext * 0x02b69008, unsigned int 1, unsigned int 2) l js_InternalInvoke(JSContext * 0x02b69008, JSObject * 0x02d8fe18, lo js_SetProperty(JSContext * 0x02b69008, JSObject * 0x02d8fe18, long js_Interpret(JSContext * 0x02b69008, long * 0x0012d8f0) line 2587 + js_Invoke(JSContext * 0x02b69008, unsigned int 1, unsigned int 2) l js_InternalInvoke(JSContext * 0x02b69008, JSObject * 0x02d5d548, lo JS_CallFunctionValue(JSContext * 0x02b69008, JSObject * 0x02d5d548, nsJSContext::CallEventHandler(nsJSContext * const 0x02c1e280, void nsJSEventListener::HandleEvent(nsJSEventListener * const 0x02e18410 nsEventListenerManager::HandleEventSubType(nsListenerStruct * 0x02e nsEventListenerManager::HandleEvent(nsEventListenerManager * const nsXULElement::HandleDOMEvent(nsXULElement * const 0x02e190d8, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x03938880, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x03b88d10, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x03cbce70, nsIPr nsXULElement::HandleDOMEvent(nsXULElement * const 0x038fe008, nsIPr PresShell::HandleDOMEventWithTarget(PresShell * const 0x019751c8, n nsMenuFrame::Execute() line 1684 nsMenuFrame::HandleEvent(nsMenuFrame * const 0x03cbab38, nsIPresCon PresShell::HandleEventInternal(nsEvent * 0x0012f8d0, nsIView * 0x03 PresShell::HandleEvent(PresShell * const 0x019751cc, nsIView * 0x03 nsViewManager::HandleEvent(nsView * 0x03c41930, nsGUIEvent * 0x0012 nsView::HandleEvent(nsViewManager * 0x02c56958, nsGUIEvent * 0x0012 nsViewManager::DispatchEvent(nsViewManager * const 0x02c56958, nsGU HandleEvent(nsGUIEvent * 0x0012f8d0) line 83 ** the UI event gets handled nsWindow::DispatchEvent(nsWindow * const 0x03c419fc, nsGUIEvent * 0 nsWindow::DispatchWindowEvent(nsGUIEvent * 0x0012f8d0) line 886 nsWindow::DispatchMouseEvent(unsigned int 301, unsigned int 0, nsPo ChildWindow::DispatchMouseEvent(unsigned int 301, unsigned int 0, n nsWindow::ProcessMessage(unsigned int 514, unsigned int 0, long 308 nsWindow::WindowProc(HWND__ * 0x000606ea, unsigned int 514, unsigne USER32! 77e12e98() USER32! 77e130e0() USER32! 77e15824() nsAppShellService::Run(nsAppShellService * const 0x0183fcf8) line 3 main1(int 2, char * * 0x00277780, nsISupports * 0x00000000) line 14 main(int 2, char * * 0x00277780) line 1763 + 37 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 77e97d08()
i'm not in the mechanics of NS_FOCUS_CONTENT yet. could someone explain me why does the method GlobalWindowImpl::GetLength get's called there and why that method needs to flush all the pending notifications? (maybe we don't even need to call that at that point)
GlobalWindowImpl::GetLength() must flush to ensure that all [i]frame's have been created since the value returned by GetLength() relies on that. And as for why that method is called in the first place, it's called from JS, i.e. someone has what looks like a onfocus handler that touches window.length in JS, why, I don't know. I don't think any of that can be changed, so we need to fix this elsewhere.
(Thanks for continuing to look into this, Alex.) jst: will this change once IFRAMEs are initialized at frame creation time? IOW, will GetLength() still need to flush? alex: perhaps it would be possible to modify nsPresShell such that we set `mIsReflowing' in PresShell::ReconstructStyleData? (If so, we'd probably want to rename mIsReflowing to mIsSafeToReflow and invert its value. We also may want to examine other similar entry points to the frame ctor...)
Well, initially the iframe loading changes will only affect <iframe> elements, not <frame> elements, so for now GetLength() will need to flush. At some point we'll change how <frame> elements are loade too, and then we wouldn't really need to flush in GetLength(). However, GetLength() is only one of many places where we need to flush in the GlobalWindowImpl code, so even if we'd make GetLength() not flush, we'd still have this exact same problem if someone causes some of the other code that flushes to run in a situation similar to this one.
I will look at the mIsSubmitting issue proposed by Chris. I was thinking about that too, still I didn't wanted yet to go that way. Somehow the new mIsSafeToReflow seems to me to do then the same that Marc proposes in his bug 137195. Maybe this way we shoot both bugs from the tree :-) Thanks Chris and JST for helping. I'll dig into it and come back with a new patch.
Depends on: 137195
Marc and I talked at length about this on the phone. Just want to make sure we get that discussion down on paper. (Or whatever.) The solution I proposed above (and the solution that alex had originally proposed) would probably cause errors of omission, because we'll simply refuse to do the operation that's being requested. (E.g., what happens if we refuse to flush? Will this cause us to fail to properly handle the focus event? Will the OS's notion of focus be out-of-sync with the event system's?) So is there something else we can do? The _real_ problem here (no, really! :-)) is that widget destruction is forcing us to re-enter the event queue at an inopportune time. So why not defer widget destruction until it's safe for us to do so? Specifically, rather than immediately destroying the widget in nsView::Destroy, we ought to enqueue the widget for later destruction when we've re-established our layout invariants. Marc and I discussed two ways to do this. One would be to simply post a PLEvent back to the main event queue that would be responsible for destroying the widget. Another would be to maintain a special queue inside the view manager (or pres shell, or whatever) that we'd process at `safe' times, but before returning out to the main event queue. The advantage of the former solution (PLEvent) is that the implementation is simple and self-contained within the view/widget system. The down-side of this solution is that it allows a window of time where other, previously scheduled events (possibly targeted at the doomed widget) could be processed. This could lead to disaster (e.g., widget looking for a destroyed view, etc.), although we could probably mitigate that with some sanity checks in the widget. The advantage of the latter (special queue) is that we get rid of the `zombie' widgets more quickly; i.e., before returning out to the main event queue to process any events intended for the doomed widgets. (Presumably, the OS can manage cleanup of its event queue properly.) The down-side of this approach is that we need to make sure that we process the special queue after performing any widget-destroying operations: it may be difficult to pinpoint and maintain all of these. cc'ing roc & kmcclusk for input.
Marc: noted what we'd discussed above.
I vote for Plan B. We can queue destroyed widgets in the view manager and then destroy them all later. (My guess is that the presshell should poll the view manager after it gets its frame/view tree into a consistent state having restyled the frames.) We can unset the widget's view so that no events from the widget can get through. One reason I prefer this is that post-1.0 we're in a position to eliminate the use of native widgets almost completely (provided someone, me?, can figure out how to handle plugins). Thus I think the burden of keeping track of where widgets are destroyed will eventually dissipate on its own.
Why not suppress the dispatch of the NS_GOTFOCUS event within the widget code? If a widget is in the process of being destroyed we could set a flag which prevents us from dispatching any focus events. If we need to dispatch the NS_GOTFOCUS we can schedule it with a user event so that it will be processed when we return to the message pump. I think this would be safer than deferring the destruction of widgets. I think the real suprise with this bug is that the NS_GOTFOCUS is dispatched synchronously with the destruction of the widget. We also need to answer the question of whether the NS_GOTFOCUS event really needs to be dispatched when the focused widget is destroyed. We should test Linux and Mac to determine if they synchronously dispatch the NS_GOTFOCUS event to another widget when the widget with focus is destroyed. Do they even dispatch the NS_GOTFOCUS event at all when a widget is destroyed? My guess is that they do something different than WIN32 since they do not crash when loading the URL in this bug. If Linux and Mac do not dispatch the NS_GOTFOCUS we should probably suppress the NS_GOTFOCUS on WIN32. Whenever possible we should strive to make all three platforms work exactly the same at the widget and gfx level to prevent future platform specific problems.
Keywords: nsbeta1nsbeta1+
Whiteboard: [adt2]
LINUX does not dispatch syncronously the FOCUS event. Actually there is no GOT_FOCUS dispatching anymore after switching. Here comes the surprize: I reloaded the page, clicked in the <iframe> and SEGFAULT. I asked Marc to help testing on Mac.
Keywords: nsbeta1+nsbeta1
Whiteboard: [adt2]
The Linux crasher from the previous comment gets fixed if I apply the patch (attachment 78966 [details] [diff] [review]) proposed in this bug. This makes me think that there are other issues related to destroying frames before we remove empty lines. I'm not sure right now but I kindof doubt now that deffering the FOCUS message or view removal will solve all our problems (since I discovered the sequel crasher on linux and the fact that is getting fixed by the patch proposed here). Need to find the crash reason on linux though before i can make a definitive statement.
sorry, i made a wrong statement. i can get the crash even after applying the first patch on linux. (i must have made a testing mistake the first time). debugging....
nsbeta1+
Keywords: nsbeta1nsbeta1+
Alex: what you're describing in comment 33 sounds like a different issue altogether. What is the stack trace you're seeing there? If it's different than attachment 77884 [details], let's file a different bug. kmcclusk: destroying a widget will typically force us to process several events synchronously if I'm remembering my Java AWT hacking days correctly -- kin? I think we really want to wipe out the entire class of problems in one swell foop by queuing destruction.
Comment on attachment 78966 [details] [diff] [review] revised patch with comments (I don't think this is the right approach.)
Attachment #78966 - Flags: needs-work+
Whiteboard: [adt2]
In case anyone is curious, the JS code that's triggering the flush is in isDocumentFrame() in contentAreaUtils.js: http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/contentAreaUtils.js#43 It's being called from contentAreaFrameFocus() in navigator.js: http://lxr.mozilla.org/seamonkey/source/xpfe/browser/resources/content/navigator.js#167 From what I recall, Chris is right, if the native widget that's being destroyed has focus, the focus is passed up to it's parent during the destruction of the child, which may trigger immediate focus callbacks on the parent, on some window systems.
We generate cross platform events for only a small subset of the toolkit specific events. We also suppress a large number of the cross platform events because they only need to be dispatched within the cross platform code for toplevel windows not managed by views. These events including: NS_CREATE, NS_DESTROY, NS_SETZLEVEL, NS_MOVE. I put this suppression code for these in several months ago because these events are *not* needed and they were causing performance issues when child windows were destroyed on large pages. If there are other cross platform events being dispatched synchronously and they cause problems we should analyze which cross platform events are being sent and see if they are really necessary. In regards to Alex's comments: Both the WIN32 and LINUX problems point to the same issue. We have not defined or enforced what should happen with the widget module in regards to dispatching NS_GOTFOCUS when the focus'ed widget has been destroyed. Alex has identified that we behave differently at the widget module level when destroying a focus'ed widget. WIN dispatches a synchronous NS_GOTFOCUS message. Linux does not dispatch a NS_GOTFOCUS at all. For windows the synchronous dispatch causes problems because we are processing the NS_GOTFOCUS synchronously and we reflow unexpectedly. For Linux its NS_GOTFOCUS behavior causes problems because the event state manager is never told that the focus'ed object it is holding a reference has been destroyed. We should define what the correct behavior for dispatching the NS_GOTFOCUS event is. All NS_EVENTS should be dispatched in a consistent manner regardless of the platform. We should not dispatch NS_GOTFOCUS events differently between Linux and WIN32. There are two proposed solutions: solution #1: Make all platforms synchronously dispatch NS_GOTFOCUS ----------------------------------------------------- WIN32: No change LINUX: Add code to synchronously dispatch NS_GOTFOCUS when the widget is destroyed. XP CODE: Defer the widget destruction to a safe time using a PL_event solution #2: Make all platforms defer the dispatch of the NS_GOTFOCUS. ------------------------------------------------------ WIN32: Don't dispatch a NS_GOTFOCUS if the focus'ed widget is being destroyed, schedule it to be sent later. LINUX: Send a deferred NS_GOT_FOCUS XPCODE: No change. Both solutions may run into a problem with the event state manager. From the Linux crash it looks like the event state manager is referencing something that was destroyed when it processes the NS_GOTFOCUS. If we defer the NS_GOTFOCUS either by deferring the widget destruction or explicitly scheduling a user event we may crash in the event state manager later because it is trying to remove focus from something that has been destroyed. We should see what the event state manager is referencing when it crashes. Is it a widget, frame or something else? I would guess its a frame. If it's a frame then we will probably have to notify the event state manager that the frame is about to be destroyed.
mozilla/LINUX references a registered zombie listener that was listening for the focus (after my latest observations). This migth be only a 'normal' bug resulted from a mistake in the code. maybe we just lack the removal of that listener. investigating (i still have not opened a boog on that one - shame). working on it
ok, here is a little hack that fixes the crashes on both platforms: in PresShell::ReconstructStyleData(PRBool aRebuildRuleTree) { .... if (frameChange == NS_STYLE_HINT_RECONSTRUCT_ALL) set->ReconstructDocElementHierarchy(mPresContext); else { + + nsIView* view; + rootFrame->GetView(mPresContext, &view); + if(view) { + nsIWidget* widget; + view->GetWidget(widget); + if(widget) + widget->SetFocus(); + } + cssFrameConstructor->ProcessRestyledFrames(changeList, mPresContext); if (aRebuildRuleTree) { GetRootFrame(&rootFrame); WalkFramesThroughPlaceholders(mPresContext, rootFrame, .... sure, the focus is moved from the widget that's going to be destroyed and so there is no try to move the focus anymore. now i'm going to try to find a way to fix the problem in the way GOT_FOCUS is fired/handled on both platforms. (on MacOSaks I and Marc couldn't repro any of the crashes)
So this super-sloppy patch is meant as a sketch of how we might do deferred widget destruction using `Method A' (PLEvent) from comment 29. It's a windows-only fix, but it seems to work. Thoughts?
Chris, don't we want then to hide the window or push it in the background or something if is doomed to be destroyed anyway (before we VERIFY(::PostMessage...)? hiding it might cause the same sync focus event so maybe reducing the size to 0x0 would be a better approach. hmmm. I'm thinking that if you defer the destruction for every window, there some fancy 'special effects' are possible, particulary if we have some absolutly pos windows and stuff (afaik, absolutely pos windows have their own window). but this are just thoughts, I didn't tested anything yet. what about XUL and UI are they going to be affected (i'm not a specialist with the UI though)?
> don't we want then to hide the window or push it in the background or > something if is doomed to be destroyed anyway (before we > VERIFY(::PostMessage...)? That doesn't make any sense to me. It's not like this window is going to be around for any user-noticable period of time. > hmmm. I'm thinking that if you defer the destruction for every window, there > some fancy 'special effects' are possible, particulary if we have some > absolutly pos windows and stuff What special effects? > what about XUL and UI are they going to be affected Absolutely; for example, menus have widgets. If your concern is that this will somehow make the app `behave differently in a visible way', I think that is ill-founded: we're talking about changing event ordering, that's all. Try the patch, I'm pretty sure you won't be able to notice any difference in behavior (modulo bugs, of course).
the patch (attachment 81064 [details] [diff] [review]) by chris waterson works. i tested it with nested iframes, with absolute/fixed position and couldn't see problems so far. in a discution with kevin mcckluskey he suggested that we'd better check if the window has the focus and only then deffer the destruction.
there is a global flag that's set right before we call ::DestroyWindow and reset after that so every window that receives the focus will schedule the event for a later processing until the flag is getting reseted. this patch works also. chris mentioned in a chat session that we might try to do both: deferring the focus event and the window destruction. thoughts?
Does this by any chance fix bug 110112, too? This patch seems reasonable to me. Note that when I was cobbling together attachment 81064 [details] [diff] [review], I noticed that there were cases where destroying one native widget would (natch) destroy its children. You might want to sanity check this change to make sure that it doesn't somehow re-enter (and prematurely clear gIsDestroyingAny): NS_ASSERTION(!gIsDestroyingAny, "oops, already set gIsDestroyingAny"); gIsDestroyingAny = PR_TRUE; VERIFY(::DestroyWindow(mWnd)); NS_ASSERTION(gIsDestroyingAny, "oops, already cleared gIsDestroyingAny"); gIsDestroyingAny = PR_FALSE; (You don't need to check that in; maybe just run with the assertion set for a while to verify that you don't get into a re-entrancy situation.)
good ideea. now i have the problem that some checkin caused the crash reported here to be repaired so i need to see if the 1.0.0 branch is afected too. investigating. damn...
the crash is gone from both trunk and 1.0.0 branch? i'm wondering what caused that?
hehe, i think it was karnaze's fix for bug 136746 that fixed this crash :-) investigating
chris: i found out that karnaze's patch for bug 136746 fixed the crash here so i removed his patch in a local build and was able to crash again. however, i was not able to reproduce the crash of bug 110112 using the steps provided in comment http://bugzilla.mozilla.org/show_bug.cgi?id=110112#c23 on that bug, i tried other combinations and also failed to reproduce so i can't tell if the fix proposed in attachment 81791 [details] [diff] [review] fixes the crash on bug 110112 too. now the feeling i, kevin and karnaze have is to apply the patch proposed here since karnaze's patch could be removed. what do you think about?
No longer depends on: 137195
Comment on attachment 81791 [details] [diff] [review] the focus event deferring approach sr=waterson.
Attachment #81791 - Flags: superreview+
Comment on attachment 81791 [details] [diff] [review] the focus event deferring approach Looks good! r=kmcclusk@netscape.com
Comment on attachment 81791 [details] [diff] [review] the focus event deferring approach Looks good! r=kmcclusk@netscape.com
Attachment #81791 - Flags: review+
FIXED ON TRUNK (won't fix on branch only if requested by ADT since the crash was fixed by the fix for bug 136746)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
This may also fix bug 110112.
Blocks: 110112
Marking this verified fixed. I haven't seen this crash in MozillaTrunk Talkback data for a while now.
Status: RESOLVED → VERIFIED
Keywords: approval
Whiteboard: [adt2] → [adt2 RTM]
Marking adt1.0.1+ on behalf of the adt for checkin to the 1.0 branch.
Keywords: adt1.0.1adt1.0.1+
Comment on attachment 81791 [details] [diff] [review] the focus event deferring approach Approved for 1.0 branch. Please remove mozilla1.0.1+ when it's fixed and add fixed1.0.1
Attachment #81791 - Flags: approval+
fixed on the branch 1.0.1
Verified on branch (2002-07-16-05 0S X ) and (2002-07-16-08 Windows ME).
Keywords: verified1.0.1
Crash Signature: [@ nsBlockFrame::ReflowBlockFrame]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: