Closed Bug 395609 Opened 17 years ago Closed 16 years ago

Crash [@ nsHTMLReflowState::GetNearestContainingBlock] with absolute positioning and overflow event doing stuff

Categories

(Core :: Layout, defect, P1)

x86
Windows XP
defect

Tracking

()

VERIFIED FIXED
mozilla1.9beta5

People

(Reporter: martijn.martijn, Assigned: smaug)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression, testcase, Whiteboard: [dbaron-1.9:Rs][depends on 402982] post 1.8-branch)

Crash Data

Attachments

(5 files, 6 obsolete files)

Attached file testcase
See testcase which crashes current trunk build on load.
This seems to have regressed between 2007-04-29 and 2007-04-30:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-04-29+03&maxdate=2007-04-30+07&cvsroot=%2Fcvsroot
Not sure which patch might have caused it, it might be coincidence anyway.

Right before the crash I see these assertions in a debug build:
###!!! ASSERTION: reflowing in the middle of frame construction: 'mPresContext->
mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsP
resContext.h, line 921
###!!! ASSERTION: no placeholder frame: 'nsnull != placeholderFrame', file c:/mo
zilla/mozilla/layout/generic/nsHTMLReflowState.cpp, line 1094

http://crash-stats.mozilla.com/report/index/b5aad48a-5f2b-11dc-8c22-001a4bd43e5c
0  	nsHTMLReflowState::GetNearestContainingBlock(nsIFrame*, int&, int&)  	 mozilla/layout/generic/nsHTMLReflowState.cpp:700
1 	nsHTMLReflowState::InitAbsoluteConstraints(nsPresContext*, nsHTMLReflowState const*, int, int) 	mozilla/layout/generic/nsHTMLReflowState.cpp:1100
2 	nsHTMLReflowState::InitConstraints(nsPresContext*, int, int, nsMargin const*, nsMargin const*) 	mozilla/layout/generic/nsHTMLReflowState.cpp:1803
3 	nsHTMLReflowState::Init(nsPresContext*, int, int, nsMargin const*, nsMargin const*) 	mozilla/layout/generic/nsHTMLReflowState.cpp:315
4 	nsHTMLReflowState::nsHTMLReflowState(nsPresContext*, nsHTMLReflowState const&, nsIFrame*, nsSize const&, int, int, int) 	mozilla/layout/generic/nsHTMLReflowState.cpp:180
5 	nsAbsoluteContainingBlock::ReflowAbsoluteFrame(nsIFrame*, nsPresContext*, nsHTMLReflowState const&, int, int, nsIFrame*, unsigned int&, nsRect*) 	mozilla/layout/generic/nsAbsoluteContainingBlock.cpp:369
6 	nsAbsoluteContainingBlock::Reflow(nsIFrame*, nsPresContext*, nsHTMLReflowState const&, int, int, int, int, nsRect*) 	mozilla/layout/generic/nsAbsoluteContainingBlock.cpp:159
7 	nsBlockFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) 	mozilla/layout/generic/nsBlockFrame.cpp:1138
8 	nsFrame::BoxReflow(nsBoxLayoutState&, nsPresContext*, nsHTMLReflowMetrics&, nsIRenderingContext*, int, int, int, int, int) 	mozilla/layout/generic/nsFrame.cpp:6112
9 	nsFrame::RefreshSizeCache(nsBoxLayoutState&) 	mozilla/layout/generic/nsFrame.cpp:5719
10 	nsFrame::GetBoxAscent(nsBoxLayoutState&) 	mozilla/layout/generic/nsFrame.cpp:5882
etc...
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9+
Still crashing for me, using a 2007-11-01 trunk build.
Calling frameloader->Destroy() causes all sorts of do-not-do-when-destroying-
frames, in this case DocumentViewerImpl::LoadComplete/PresShell::FlushPendingNotifications

The possible problem I see with the patch is that framedocshell stays in the docshell tree a bit longer time.
The patch still keeps the behavior that if iframe is detached and the re-attached to DOM, unload fires before load. If asynchronous call is done only for
base_win->Destroy (in frameloader), unload comes after load, and that is wrong.
Group: security
Comment on attachment 287387 [details] [diff] [review]
Destroy frameloader asyncronously

I think a bit more is needed to fix other (possible) similar problems.
Attachment #287387 - Attachment is obsolete: true
Hmmm.  We have a whole bunch of bugs that could get fixed by this sort of change.

Of course the real fix is to move the frameloader out of the frame (and into the binding)!
Really to binding? But yes, anyway out from frame code.
For now I was thinking to do something similar what was done for <script>s (don't do anything in Bind/unbind, but after batch)+ what is done in the patch.
That way unload/load should fire quite closely the same way as now.
The binding is what implements all the "frame behavior" (contentDocument/contentWindow, etc), so it would make sense for it to just own the frameloader too.
Ah, you're talking about <xul:editor> (which the testcase uses), I'm talking also
about how frame loading happens for example with <html:iframe>.
But need to think this a bit.
... nsIFrame objects should not own frameloader, frameloaders (or docshells) 
should not be destroyed() when binding/unbinding, when fixing this all load/unload event handling must not change in normal cases...

Perhaps in case of XUL, nsContainerBoxObject could own frameloader.
I know some don't like box objects too much, but for this it might make sense
to keep the functionality there. That way iframe/editor/browser would get handled
in a similar way. Or something ...
html:iframe doesn't have the frame owning the frameloader.  The content node owns it.

> frameloaders (or docshells) should not be destroyed() when binding/unbinding,

We have a bug on that.  Though that's a bug about not destroying them period, not just destroying them a little later.

> Perhaps in case of XUL, nsContainerBoxObject could own frameloader.

Do we create those for bindings that extend xul:iframe?  Or are we going to rely on the removal of extends="tagname"?
How are we going to handle the removal of extends="tagname"?
Currently xul:iframe/editor/browser relies on that their boxobject implements
nsIContainerBoxObject, and the boxobject is created based on what
mBindingManager->ResolveTag() returns.
> the boxobject is created based on what mBindingManager->ResolveTag() returns

Ah, ok.  All good then.
Assignee: nobody → Olli.Pettay
Blocks: 407152
Attached patch just some ideas (obsolete) — Splinter Review
The basic idea is to split Destroy to 'pre-destroy' and Finalize, 
but not yet sure about using nsRunnable and that extra mozAutoDocUpdate.
Something like this should fix also possible problems with html frames by
making nsFrameLoader::Destroy 'safe'.
Attached patch a bit better patch (obsolete) — Splinter Review
Jonas, Boris, what do you think about the approach here?

mFinalizableFrameLoaders isn't perhaps the best name and 
nsAsyncFrameLoaderFinalize is there really just to prevent possible problems
in strange cases when ::Destroy gets called by ~nsFrameLoader. Don't have a 
testcase for that.
With the patch unload events should get fired at some reasonable time,
I think; in all normal cases not asynchronously but right after update or 
immediately. I have some simple tests for that.
Attachment #293274 - Attachment is obsolete: true
It seems like it might work... it's scary, with the stuff hanging out in this new state for a bit.  :(

I'd use an nsTArray<nsCOMPtr<nsFrameLoader> > and swap() instead of copying.  Not that we ever expect this array to be big, but still.
Attached patch possible patch (obsolete) — Splinter Review
Attachment #293408 - Attachment is obsolete: true
Attachment #293497 - Flags: review?(bzbarsky)
The update in the presshell code is wrong (will prevent correct layout in some cases), but I'd really like to know why you think it's needed....
The testcase for this bug needs that; the frame for xul:editor gets deleted 
during flush and at that time nsFrameLoader::Finalize must not be called.
But if mozAutoDocUpdate can't be used in that case, perhaps presshell
could notify document that frameloaders must not be finalized and document
could postpone finalization until the end of flush...same notification system
could be used by Begin/EndUpdate.
That sounds like a better idea.  Having the update in place would prevent counter/quote updates happening, for example...

That said, if your concern is that frame destruction could cause issues, then there are plenty of other things that you should be worrying about: restyle processing, reflow, etc.
Attached patch v2 (obsolete) — Splinter Review
Bz, comments on this one?
Attachment #293497 - Attachment is obsolete: true
Attachment #295282 - Flags: review?(bzbarsky)
Attachment #293497 - Flags: review?(bzbarsky)
I'm not sure when I'll have time to review this in the depth it needs, to be honest....  It's not likely to be any time in the next few weeks.
Comment on attachment 295282 [details] [diff] [review]
v2

Jonas?
Attachment #295282 - Flags: review?(bzbarsky) → review?(jonas)
I'm not really sure I'm the right person to review this :(
That is what I was a bit afraid of ...
Who could be the right one? dbaron? roc?
I'm completely ignorant about this stuff
dbaron then?  at least for the layout/ part?
This is something we want to have for beta3.
bz told that he may not have time to review this anytime soon, so hunting
someone else.
Attachment #295282 - Flags: review?(jonas) → review?(bzbarsky)
Sorry bz, but finding anyone else to review seems quite difficult.

This is something which should be fixed b3, IMO. This is a security bug, and the
patch does change unload/load event handling in some (rare, at least I hope so)
cases. Changes to unload/load have traditionally been pretty regression risky.
Priority: P3 → P1
I'll see what I can do, but if I recall the schedule it's very unlikely that I can possibly review this in time for beta3.
Blocks: 402982
Might it actually be less risky to change the ownership so the frame never owns the frameloader?
It would certainly be a lot simpler.
The problem there is who owns it.  The obvious candidate is the XBL binding, and we've been considering doing that... Come to think of it, since bindings are in fact properly attached and detached now maybe we can do it.  We'd need to handle the case of the binding being attached while the node is not in a document and detecting when the node _is_ added to the document, I guess.

Alternately, the boxObject could own the frameloader?  That might be simpler, in some ways.
Could we just have nsXULElement implement nsIFrameLoaderOwner as a tear-off? It could store the frame loader as a node property, delete the frame loader in a property destructor.
Hmm.  That might work too, yes...  The problem is at what point it should create and destroy the loader then.  The problem here is that whether a XUL element should be a frame loader owner depends on the binding... and the binding can change.  I suppose the boxObject approach has the same issue, so we could just make it work whenever it would get that sort of boxobject.
Why don't we just destroy the frameloader, if there is one, in UnbindFromTree or the element destructor, the way nsGenericHTMLFrameElement does? Is there a problem with allowing the frame loader to exist even when a different (or no) XBL binding is applied? 
Actually, I think this might be easiest fixed once I've landed the patch for bug 401155. It'll add a service that executes nsIRunnables when it's safe to run script.
(In reply to comment #35)
> Why don't we just destroy the frameloader, if there is one, in UnbindFromTree
> or the element destructor, the way nsGenericHTMLFrameElement does?
I don't think  nsGenericHTMLFrameElement does the right thing.
If destroying can cause scripts to be run, it shouldn't happen in unbind.

(In reply to comment #36)
> It'll add a service that executes nsIRunnables when it's safe to run script.
Sounds promising, but if that is used here, had to test
http://mozilla.pettay.fi/moztests/frame/iframe.html that firing unload
event doesn't regress (doesn't move to happen too late).

OK, so does *any* of our code for destroying frameloaders do it at a good time?
Not really, IIRC :(
The idea with the patch is to make calling ::Destroy safe.

Is there something in the patch that looks scary or something?
To me especially the content/ part is quite clear.
In the layout/ part the question is mainly that does the patch capture all the
cases.
> Why don't we just destroy the frameloader, if there is one, in UnbindFromTree
> or the element destructor, the way nsGenericHTMLFrameElement does?

If we can do this without slowing down things too much in XUL, that sounds great.

> Is there a problem with allowing the frame loader to exist even when a
> different (or no) XBL binding is applied? 

I'm not sure.  It might be OK except for memory use, but that would be a rare enough case that it should be OK.

Frankly, I think if we can make XUL behave like HTML for now that would be wonderful.  Then we can worry about fixing the remaining issue for both of them together in the same way.
Smaug, I think your patch adds complexity because it creates a new state for the frameloader. Also, having frames own things is just always bad design, as I think you know well, and if we fix that then the code in your patch is just unnecessary.

I'll try to code up "make XUL behave like HTML". Not sure when I'll have time, the next few days are nuts for me, but I'll see what I can do.
(In reply to comment #41)
> I'll try to code up "make XUL behave like HTML".

But as far as I see, that is not enough. HTML frames destroy frameloader when
unbinding - not good.

So do we have testcases similar to the tests in this bug and related bugs that show problems using HTML frames?

I agree with Boris:

> Frankly, I think if we can make XUL behave like HTML for now that would be
> wonderful.  Then we can worry about fixing the remaining issue for both of them
> together in the same way.
Bug 402982 covers the HTML issue.
OK, let's figure out what the best solution is for HTML over there, and once we've figured that out, we make XUL work the same way.
Attachment #295282 - Flags: review?(bzbarsky)
Roc said he'll take this
https://bugzilla.mozilla.org/show_bug.cgi?id=402982#c13
Assignee: Olli.Pettay → roc
Whiteboard: [dbaron-1.9:Rs] → [dbaron-1.9:Rs][depends on 402982]
Roc, do you have time for this? If not, assign back to me.
I was waiting for 402982 to land first, but if you want this, please take it.
Assignee: roc → Olli.Pettay
Attached patch WIP (obsolete) — Splinter Review
This brings XUL frame loading close to HTML frame loading. Tried to copy
behavior pretty exactly.
Passes mochitest and actually fixes this bug. Have to still test load/unload 
event handling etc. Comments?
It may cause some problems now that frame loading start earlier (bindtotree) than
before (layout).
Attachment #295282 - Attachment is obsolete: true
It might cause problems but since it's XUL we can and should change the users.

Just one nit, why not make GetFrameLoader deCOM and return an nsIFrameLoader* in nsXULElement? The forwarder in the tearoff still has to be COM of course.
(In reply to comment #50)
> It might cause problems but since it's XUL we can and should change the users.
But it is late in 1.9 and we don't want to change too many things. We should get
this landed ASAP so we get as much testing as possible.
 
> Just one nit, why not make GetFrameLoader deCOM and return an nsIFrameLoader*
> in nsXULElement? The forwarder in the tearoff still has to be COM of course.
Because with COM method I can use NS_FORWARD_NSIFRAMELOADEROWNER.
Not a big thingie anyway.
Attached patch proposed patchSplinter Review
Strange, I thought I uploaded this already yesterday.
This is something for which 2 separate reviews might be useful.
I copied some comments from HTML frame handling and cleaned up CC macros.
Attachment #302163 - Attachment is obsolete: true
Attachment #302352 - Flags: review?(roc)
I've done some testing also with thunderbird.
So far everything looks ok.
+            // XXXbz we really want to only partially destroy the frame
+            // loader... we don't want to tear down the docshell.  Food for
+            // later bug.

You'll actually fix that in your other bug, right? so you don't really need this comment.

In nsSubdocumentFrame, I think we should remove mFrameLoader completely, and just get it from the content when we need it. Then there's no worries about mFrameLoader getting out of sync and we get simpler code and less data size too.
(In reply to comment #54)
> You'll actually fix that in your other bug, right? so you don't really need
> this comment.
See https://bugzilla.mozilla.org/show_bug.cgi?id=402982#c26
I think we should keep mFrameLoader in nsSubDocumentFrame. Otherwise if
some script first removes element from document and then immediately
(perhaps using mutation events) puts it back,
nsSubDocumentFrame might use wrong frameloader when doing clean-up in
Destroy().
Keeping mFrameLoader should be less regression-risky, since
it has been used also for html:iframe etc. and the patch makes XUL to work like
HTML.
Comment on attachment 302352 [details] [diff] [review]
proposed patch

OK then
Attachment #302352 - Flags: review?(roc) → review+
Comment on attachment 302352 [details] [diff] [review]
proposed patch

Some content peer should sr this. Jonas?
Attachment #302352 - Flags: superreview?(jonas)
->P1
Priority: P3 → P1
Attached patch updated patchSplinter Review
This is the same as attachment 302352 [details] [diff] [review], with two changes:
1) updated to trunk to fix conflict with bug 406916's patch
2) change to content/xul/content/src/Makefile.in which I needed to get this to compile.
Attachment #302352 - Flags: superreview?(jonas) → superreview+
Checked in
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Backed out as the likely cause of the Txul regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Confirmed as cause of Txul regression on bl-bldlnx03_fx-linux-tbox-head based on Txul going back down after the backout.
Either this bug or bug 419452 (both backed out in the same cycle) was the cause of the orange on fxdebug-win32-tbox due to an assertion during the bloat test:

  WARNING: NS_ENSURE_TRUE(sgo || !hasHadScriptObject) failed: file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/content/base/src/nsContentUtils.cpp, line 2583
  WARNING: NS_ENSURE_TRUE(sgo || !hasHadScriptObject) failed: file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/content/base/src/nsContentUtils.cpp, line 2583
  ###!!! ASSERTION: invalid active window: 'Error', file e:/builds/tinderbox/Fx-Trunk-Memtest/WINNT_5.2_Depend/mozilla/embedding/components/windowwatcher/src/nsWindowWatcher.cpp, line 1086
  gklayout!nsFocusController::UpdateWWActiveWindow+0x000000000000013B (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\dom\src\base\nsfocuscontroller.cpp, line 592)
  gklayout!nsFocusController::SetActive+0x000000000000002E (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\dom\src\base\nsfocuscontroller.cpp, line 559)
  appshell!nsWebShellWindow::HandleEvent+0x0000000000000546 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\xpfe\appshell\src\nswebshellwindow.cpp, line 473)
  gkwidget!nsWindow::DispatchEvent+0x00000000000000C6 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\windows\nswindow.cpp, line 972)
  gkwidget!nsWindow::DispatchWindowEvent+0x0000000000000026 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\windows\nswindow.cpp, line 993)
  gkwidget!nsWindow::DispatchFocus+0x00000000000000D2 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\windows\nswindow.cpp, line 5976)
  gkwidget!nsWindow::ProcessMessage+0x000000000000101A (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\windows\nswindow.cpp, line 4530)
  gkwidget!nsWindow::WindowProc+0x0000000000000178 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\windows\nswindow.cpp, line 1187)
  USER32!LoadCursorW+0x0000000000004CF5
  USER32!LoadCursorW+0x0000000000004E86
  USER32!GetMessageW+0x000000000000009F
  USER32!GetClientRect+0x000000000000004A
  ntdll!KiUserCallbackDispatcher+0x000000000000002E
  gklayout!nsView::~nsView+0x00000000000001AA (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\view\src\nsview.cpp, line 274)
  gklayout!nsView::`scalar deleting destructor'+0x000000000000000F
  gklayout!nsIView::Destroy+0x000000000000002A (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\view\src\nsview.cpp, line 314)
  gklayout!nsFrame::Destroy+0x00000000000000D5 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\generic\nsframe.cpp, line 510)
  gklayout!nsSplittableFrame::Destroy+0x000000000000002D (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\generic\nssplittableframe.cpp, line 74)
  gklayout!nsContainerFrame::Destroy+0x000000000000017E (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\generic\nscontainerframe.cpp, line 300)
  gklayout!ViewportFrame::Destroy+0x000000000000001E (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\generic\nsviewportframe.cpp, line 69)
  gklayout!nsFrameManager::Destroy+0x0000000000000079 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\base\nsframemanager.cpp, line 284)
  gklayout!PresShell::Destroy+0x0000000000000401 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\base\nspresshell.cpp, line 1676)
  gklayout!DocumentViewerImpl::Destroy+0x000000000000059C (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\layout\base\nsdocumentviewer.cpp, line 1522)
  docshell!nsDocShell::Destroy+0x0000000000000303 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\docshell\base\nsdocshell.cpp, line 3654)
  appshell!nsXULWindow::Destroy+0x000000000000035A (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\xpfe\appshell\src\nsxulwindow.cpp, line 503)
  appshell!nsWebShellWindow::Destroy+0x000000000000014F (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\xpfe\appshell\src\nswebshellwindow.cpp, line 840)
  appshell!nsContentTreeOwner::Destroy+0x0000000000000047 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\xpfe\appshell\src\nscontenttreeowner.cpp, line 563)
  gklayout!nsGlobalWindow::ReallyCloseWindow+0x000000000000021B (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\dom\src\base\nsglobalwindow.cpp, line 5412)
  gklayout!nsCloseEvent::Run+0x0000000000000024 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\dom\src\base\nsglobalwindow.cpp, line 5220)
  xpcom_core!nsThread::ProcessNextEvent+0x00000000000001FA (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\xpcom\threads\nsthread.cpp, line 511)
  xpcom_cnsStringStats
   => mAllocCount:          33007
   => mReallocCount:         4184
   => mFreeCount:           27642  --  LEAKED 5365 !!!
   => mShareCount:          29820
   => mAdoptCount:           4329
   => mAdoptFreeCount:       4294  --  LEAKED 35 !!!
  ore!NS_ProcessNextEvent_P+0x0000000000000053 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\obj-fx-trunk\xpcom\build\nsthreadutils.cpp, line 227)
  gkwidget!nsBaseAppShell::Run+0x000000000000005D (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\xpwidgets\nsbaseappshell.cpp, line 151)
  tkitcmps!nsAppStartup::Run+0x000000000000006B (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\toolkit\components\startup\src\nsappstartup.cpp, line 181)
  xul!XRE_main+0x0000000000003350 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\toolkit\xre\nsapprunner.cpp, line 3149)
  firefox!NS_internal_main+0x00000000000002B2 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\browser\app\nsbrowserapp.cpp, line 158)
  firefox!wmain+0x0000000000000119 (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\toolkit\xre\nswindowswmain.cpp, line 87)
  firefox!__tmainCRTStartup+0x00000000000001A6 (f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c, line 594)
  firefox!wmainCRTStartup+0x000000000000000D (f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c, line 414)
  kernel32!ProcessIdToSessionId+0x0000000000000209


From the log, it also looks like more DOMWINDOWs were created than before in those broken cycles; something Jesse also saw in one of his builds, along with a bunch of other brokenness (but that went away when he updated).
(In reply to comment #65)
> From the log, it also looks like more DOMWINDOWs were created than before in
> those broken cycles;
It is possible that more DOMWindows are created with the patch, because 
frameloader is created when element is attached to DOM. Without the patch
frameloader is created only if there is a layout object created for the
element. What you mean with "those broken cycles"?

The assertion happened only on windows tbox? Can't reproduce on linux, not even
while running mochitest, browsertest or chrometest.
(In reply to comment #65)
>   gkwidget!nsWindow::WindowProc+0x0000000000000178
> (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\widget\src\windows\nswindow.cpp,
> line 1187)
>   USER32!LoadCursorW+0x0000000000004CF5
>   USER32!LoadCursorW+0x0000000000004E86
>   USER32!GetMessageW+0x000000000000009F
>   USER32!GetClientRect+0x000000000000004A
>   ntdll!KiUserCallbackDispatcher+0x000000000000002E
>   gklayout!nsView::~nsView+0x00000000000001AA
> (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\view\src\nsview.cpp,
> line 274)
>   gklayout!nsView::`scalar deleting destructor'+0x000000000000000F
>   gklayout!nsIView::Destroy+0x000000000000002A
> (e:\builds\tinderbox\fx-trunk-memtest\winnt_5.2_depend\mozilla\view\src\nsview.cpp,
> line 314)
I don't understand how ~nsView causes those ntdll! and USER32!.
nsView.cpp:274 releases mWindow, which is an nsIWidget.
Does mWindow point to a deleted object?
Btw, the testcase in this bug doesn't crash now because of the patch for bug 402982, but bug 402034 shows still similar unsafe crash.
Since Neil relanded and didn't cause the orange, this patch was probably responsible for the orange in comment 65.
Doesn't destroying widgets on Windows spin the event loop?
Ah, I'm not familiar with windows widgets. Sounds bad actually.
(In reply to comment #70)
> Doesn't destroying widgets on Windows spin the event loop?
But if that is really true, shouldn't it get fixed?
Could we delete mWnd asynchronously?
Depends on: 419798
I'd appreciate any good suggestions what to do here.
Apparently the patch breaks some extensions, for example Adblock plus (bug 419798). We could delay frame content loading until layout object is created
for the xul element, but the assertion is about deletion.
If event loop runs when nsIFrame tree is destroyed, not good.
I don't have Windows dev system to test that.
> Apparently the patch breaks some extensions, for example Adblock plus

Do we know why?  Is Adblock making unwarranted assumptions, or is something else going on?
(In reply to comment #66)
> It is possible that more DOMWindows are created with the patch, because 
> frameloader is created when element is attached to DOM. Without the patch
> frameloader is created only if there is a layout object created for the
> element.

Maybe we just need to change some chrome code to defer inserting the element into the DOM in a place where right now it's relying on display:none to defer loading?

(In reply to comment #70)
> Doesn't destroying widgets on Windows spin the event loop?

Not exactly, at least, not any more. Windows dispatches focus events during HWND destruction and Mac has a similar problem. In bug 399852 we very recently landed a fix to defer focus changes triggered during nsCSSFrameConstructor::BeginUpdate/EndUpdate, at the view manager level. So it's no longer possible to reach the Gecko event loop and do arbitrary things during widget destruction.

Unfortunately in this case we didn't reach the view manager. We exploded in nsWebShellWindow::HandleEvent where it handles GOTFOCUS. If we can fix that code to be safe while we're in widget/frame/view destruction, we should be OK. However it might make more sense to rework the event callback mechanism to call it from the view manager, if that's possible, so that focus events are deferred for it.
I wonder what would be the safe place to unsuppress. Would something break
if that happens asynchronously (when suppression has happened because of 

> However it might make more sense to rework the event callback mechanism to call
> it from the view manager, if that's possible, so that focus events are deferred
> for it.
I don't quite follow this.

 

(In reply to comment #74)
> Do we know why?  Is Adblock making unwarranted assumptions, or is something
> else going on?
Just doing some more tests.

>  Would something break if that happens asynchronously

Probably.

>> However it might make more sense to rework the event callback mechanism to call
>> it from the view manager, if that's possible, so that focus events are deferred
>> for it.
>I don't quite follow this.

Right now we're calling mEventCallback with GOTFOCUS when the OS event comes in, which is killing us. That needs to be deferred using the view manager's focus-change-deferral mechanism. The easiest way to do that would be to move it to the view manager.
I'm still not sure what breaks ABP. Asked trev.moz@adblockplus.org about that.
The patch does have the problem that it doesn't check whether element is in
overlay document, but adding the checks doesn't fix ABP problems.
I did also a lot simpler patch which doesn't change ownership of frameloader, 
but that doesn't fix bug 402034.
...still trying to find some reasonable fix. And haven't yet done anything to
fix those non-Linux problems... ugh.
My brains need vacation :)
The problem with mEventCallback and GOTFOCUS is something we must fix even apart from the rest of this bug.
Flags: blocking1.9+
(In reply to comment #75)
> Unfortunately in this case we didn't reach the view manager. We exploded in
> nsWebShellWindow::HandleEvent where it handles GOTFOCUS. If we can fix that
> code to be safe while we're in widget/frame/view destruction, we should be OK.
> However it might make more sense to rework the event callback mechanism to call
> it from the view manager, if that's possible, so that focus events are deferred
> for it.

Scrub that. What actually happens is that nsWebShellWindow's top-level widget has no associated view, its events go straight to nsWebShellWindow::HandleEvent. And that has no facility for deferring focus.

I think we need a mechanism like nsIViewManager::Suppress/UnsupressFocusEvents, but for the nsWebShellWindow, called from nsCSSFrameConstructor where we call the viewmanager APIs now.
testcase does not crash a FF 2.0.0.13pre build -- assuming this isn't needed on the 1.8 branch. Please correct me before I unhide the bug if I'm wrong.
Flags: wanted1.8.1.x-
Whiteboard: [dbaron-1.9:Rs][depends on 402982] → [dbaron-1.9:Rs][depends on 402982] post 1.8-branch
I do think we want some fix for 1.8. I'm still trying to find the
less-regression-risky-patch for trunk - or hoping Wladimir gives some ideas after
testing tryserver builds (https://bugzilla.mozilla.org/show_bug.cgi?id=419798#c13).

Some help to fix Windows-event-loop-handling would be great.
Chris, since you fixed Bug 399852, could you think any simple way to
fix the windows-event-loop-problem here?
(In reply to comment #83)
> Chris, since you fixed Bug 399852, could you think any simple way to
> fix the windows-event-loop-problem here?

Nothing springs to mind sorry. It sounds like there's no simpler way other than doing what Roc suggests in comment 80, something similar to the patch for bug 399852. Have nsWebShellWindow remember what window should be focused, and defer focus events while constructing/destructing frames/views/widgets. When deferring is disabled, if there was a focus change during deferment, focus what should have been focused, if it's not been destroyed in the meantime.

You might try calling nsViewManager::IsFocusSuppressed() at the point of crash, and if its on then you'd only need to suppress webshell focus events at the same places that we suppress ViewManager focus events.
I've been meaning to write the nsWebShellWindow fix but I haven't gotten around to it yet. Maybe in a day or two.

I think only other big thing we need to do is track down where those extra documents are being created and see if it's just some chrome that needs tweaking or whether we need to actually change the timing of window instantiation.
This also makes GetFrameLoader to work like with HTML - just return possible
existing frameloader, do not create a new one.
Blocks: 420415
Flags: tracking1.9+
Comment on attachment 307449 [details] [diff] [review]
with check for overlays

Jonas, want to do a quick review for this.
The main change is in
::LoadSrc, adding the check for overlays.

This might prevent some extra documents to be created, if xul:browser was used in a overlay.
Overall this change may cause more documents, if xul:browser isn't ever set visible. But still, this change is IMO the right thing to do.
Not sure what can be done if this causes small txul regression :/

Should the window-event-loop problem go to some other bug (which blocks this one)?
Attachment #307449 - Flags: superreview?(jonas)
Attachment #307449 - Flags: review?(jonas)
So one cause for the extra created document is sidebar. By default the
xul:browser doesn't seem to have src attribute, so about:blank document is created.
And there is also xul:iframe for toolbar customization.
That is something to fix.
The toolbar customization iframe is used only on mac.
I tested this on linux by enabling 
TOOLBAR_CUSTOMIZATION_SHEET, but some mac testing is needed too, with the
main patch in this bug and with the patch for Bug 420415 :/

(Do I finally need to buy a mac on Monday)

But since this is for mac only, I'm not sure why linux txul went up a bit when
I originally checked in the patch.
Attachment #308076 - Flags: review?(gavin.sharp)
Attachment #308076 - Flags: review?(gavin.sharp) → review?(mano)
Attachment #307449 - Flags: superreview?(jonas)
Attachment #307449 - Flags: superreview+
Attachment #307449 - Flags: review?(jonas)
Attachment #307449 - Flags: review+
Depends on: 423269
Comment on attachment 308076 [details] [diff] [review]
possible fix for osx toolbar customization

r=mano
Attachment #308076 - Flags: review?(mano) → review+
Target Milestone: --- → mozilla1.9beta5
Attachment #307449 - Flags: approval1.9b5?
Attachment #308076 - Flags: approval1.9b5?
Comment on attachment 307449 [details] [diff] [review]
with check for overlays

a=beltzner
Attachment #307449 - Flags: approval1.9b5? → approval1.9b5+
Comment on attachment 308076 [details] [diff] [review]
possible fix for osx toolbar customization

a=beltzner
Attachment #308076 - Flags: approval1.9b5? → approval1.9b5+
Checked in.
So far talos looks ok and no assertion.
Bug 419798 is open to get ABP extension fixed.
Closing this one.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5pre) Gecko/2008032204 Minefield/3.0b5pre
Status: RESOLVED → VERIFIED
Depends on: 425015
For the record, this caused bug 425814.
Blocks: 425814
Depends on: 428533
Depends on: 431082
(In reply to comment #75)
> (In reply to comment #66)
> > It is possible that more DOMWindows are created with the patch, because 
> > frameloader is created when element is attached to DOM. Without the patch
> > frameloader is created only if there is a layout object created for the
> > element.
> 
> Maybe we just need to change some chrome code to defer inserting the element
> into the DOM in a place where right now it's relying on display:none to defer
> loading?
> 
> (In reply to comment #70)
> > Doesn't destroying widgets on Windows spin the event loop?
> 
> Not exactly, at least, not any more. Windows dispatches focus events during
> HWND destruction and Mac has a similar problem. In bug 399852 we very recently
> landed a fix to defer focus changes triggered during
> nsCSSFrameConstructor::BeginUpdate/EndUpdate, at the view manager level. So
> it's no longer possible to reach the Gecko event loop and do arbitrary things
> during widget destruction.
> 
> Unfortunately in this case we didn't reach the view manager. We exploded in
> nsWebShellWindow::HandleEvent where it handles GOTFOCUS. If we can fix that
> code to be safe while we're in widget/frame/view destruction, we should be OK.
> However it might make more sense to rework the event callback mechanism to call
> it from the view manager, if that's possible, so that focus events are deferred
> for it.
Group: core-security
Crash Signature: [@ nsHTMLReflowState::GetNearestContainingBlock]
Depends on: 844549
Backed out the test since it caused intermittent failures due to referring
to external resources, i.e. <editor src="http://google.com/"/>
https://hg.mozilla.org/integration/mozilla-inbound/rev/689fff301d90
Flags: in-testsuite+ → in-testsuite-
You need to log in before you can comment on or make changes to this bug.