Closed Bug 553363 Opened 13 years ago Closed 13 years ago

docshell/base/crashtests/432114-1.html fails to fire onload with lazyfc


(Core :: DOM: Navigation, defect)

Not set





(Reporter: tnikkel, Assigned: tnikkel)




(2 files)

Attached patch patchSplinter Review
docshell/base/crashtests/432114-1.html contains an iframe which adds an event listener for DOMNodeInserted that removes itself (the iframe) from the parent document when called. The iframe also contains a contenteditable frameset element. After the last onload blocker is gone we call nsDocLoader::DocLoaderIsEmpty, and it finds that it is not busy so it flushes layout. This creates the iframe's frame in the parent document and does the InitialReflow for the iframe document. This triggers the editor's bogus node insert for the content editable frameset, and this triggers the DOMNodeInserted event listener (ugh). This removes the iframe from the document and puts its frameloader in the finalize list. When the outermost update we are in finishes we finalize the frameloader which calls Destroy on the docshell, which calls Stop on the docloader, removes the docshell (and hence docloader) as a child of its parent, and then calls Destroy on the docloader. The docloader is still busy (it is in the middle of flushing layout), and removing it as a child from its parent does not call DocLoaderIsEmpty on the parent at all, so the parent never gets to onload, even though it has nothing left to do.

This kind of reentrancy into the docloader to destroy itself while it is flushing is not a good situation. The only place I could see to try to entirely avoid this situation is by making some of the frameloader finalize/destroy stuff async by posting a runnable to the main thread to do it. But this seemed like a very delicate thing to do without causing serious regressions.

So I was left with a bunch of solutions none of which I was really happy with, but I think least bad one is to call DocLoaderIsEmpty in the parent when a child is removed from it that is about to be destroyed.
Component: Layout → Document Navigation
QA Contact: layout → docshell
Attachment #433377 - Flags: review?(bzbarsky)
The situation described in comment 0 should be triggering asserts, right? Specifically this one in nsDocLoader::Stop:

346   NS_ASSERTION(!IsBusy(), "Shouldn't be busy here");
Yes, and a few more. The patch of course doesn't get rid of the assertions, it just allows onload to fire.
OK.  So what if nsDocLoader::Stop set another boolean flag (mIsFlushingLayoutAffectsIsBusy?) which defaults true to false?  Then IsBusy would check that flag before checking mIsFlushingLayout.  And the assert in DocLoaderIsEmpty could be changed to:

  !mIsFlushingLayout || (!mIsFlushingLayoutAffectsIsBusy && !aFlushLayout)

or some such.  Or the layout flush could be conditioned on !mIsFlushingLayout.

Or heck, simply set mIsFlushingLayout to false before calling DocLoaderIsEmpty....  The point being to make IsBusy() no longer return true once we're being stopped.

The first approach, which prevents us calling into flush while inside a flush even if someone calls DocLoaderIsEmpty with PR_TRUE for aFlushLayout after Stop() has been called but still under the layout flush is probably the safest one.

I _think_ either of these solutions is somewhat clearer than the attached patch...
I initially considered that solution, but I didn't like it because it could affect more situations than just the flush-stop-destroy one that causes the problem. (flush-stop by itself would be okay because when the flush finished it would still be able to call DocLoaderIsEmpty on its parent.) But I can go with it.
Attached patch alternate patchSplinter Review
Attachment #437653 - Flags: review?(bzbarsky)
Comment on attachment 437653 [details] [diff] [review]
alternate patch

I don't think you need the CID change.  r=me with that taken out.
Attachment #437653 - Flags: review?(bzbarsky) → review+
Attachment #433377 - Flags: review?(bzbarsky)
Made that changed and pushed
Closed: 13 years ago
Resolution: --- → FIXED
Backed out because something in the push caused tsvg_opacity regression.
Resolution: FIXED → ---
Repushed this only and the talos numbers look okay.
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.