Closed Bug 463350 Opened 16 years ago Closed 15 years ago

Crash [@ GetLastSpecialSibling] with -moz-column, fieldset, select

Categories

(Core :: Layout, defect, P3)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: roc)

References

Details

(5 keywords, Whiteboard: [sg:critical])

Crash Data

Attachments

(5 files)

This testcase causes GetLastSpecialSibling to call 0 or 0xdddddddd.
Before the crash, I see 7x

###!!! ASSERTION: frame was not removed from primary frame map before destruction or was readded to map after being removed: 'Not Reached', file /Users/jruderman/central/layout/base/nsFrameManager.cpp, line 728
Whiteboard: [sg:critical?]
The testcase crashes my linux mozilla-central build in GetLastSpecialSibling, too, with 7 copies of the assertion from comment 1. =>  Platform --> All/All.
OS: Mac OS X → All
Hardware: x86 → All
The original testcase has a very large height (72496331mm) on the body.  However, it seems this isn't required -- I get the same crash (and assertions) with height=0, as in this modified testcase.
The first line of 'boom()' is sufficient to trigger the assertions from comment 1 (without crashing), as demonstrated in this testcase.
I'm attaching the frametree of testcase 3, taken at the beginning of the reflow triggered by 'boom()' (the reflow that causes the assertion failures).

The 7 assertions failures are for these 'aFrame' pointers (in this order):
 nsScrollbarButtonFrame    0xb1bf18b0
 nsButtonBoxFrame          0xb1bf1abc
 nsSliderFrame             0xb1bf19d4
 nsScrollbarButtonFrame    0xb1bf1bdc
 nsScrollbarFrame          0xb1bf17e4
 nsGfxButtonControlFrame   0xb1b9b2e4
 nsComboboxControlFrame    0xb1bf12fc

Basically, those frames are the whole "ScrollbarFrame" subtree in the "selectPopupList" list, plus the combobox and its button.
Whiteboard: [sg:critical?] → [sg:critical]
This smells like a missing call to DeletingFrameSubtree or CleanupFrameReferences (both in nsCSSFrameConstructor.cpp).
So I think the basic problem here is that any callers to DeleteNextInFlowChild at the point where they're entering the frame destruction process (i.e., not from nsBlockFrame::RemoveFrame) need to call nsCSSFrameConstructor::RemoveMappingsForFrameSubtree, or something like it, possibly as corrected by the patch to bug 428113, although I'm really not sure.

Could somebody who understands DeleteNextInFlowChild better have a look at this?

I think we might also need a better system to decide who's responsible for calling DeletingFrameSubtree (which is what's called by RemoveMappingsForFrameSubtree) or CleanupFrameReferences; I think right now it's sort of whoever enters the process of frame destruction, since it's important that they're run from the top of whatever subtree is being destroyed.
It also seems like this could be responsible for a substantial chunk of -moz-column crashiness.

I'd note that in this case the caller of DeleteNextInFlowChild is nsBlockFrame::ReflowBlock, as seen on the stack to those 7 assertions.
To be clear -- the case where I'm really unsure what we'd want to do is if we're calling DeleteNextInFlowChild on a placeholder frame.  What do we want to do with the out-of-flows?
Then again, if we call it on the ancestor of a placeholder frame, it will delete the out-of-flows, so we should probably be consistent and use the RemoveMappingsForFrameSubtree as modified in bug 428113.  The question then is whether other code needs to be modified to deal with that.  I suspect it does.
Two more thoughts:

  * It also seems broken to me that we're using DeleteNextInFlowChild to delete things that contain primary frames rather than just continuations.  Maybe that's the underlying problem?

  * We could perhaps use CleanupFrameReferences rather than DeletingFrameSubtree... it might be that the main difference between the two is whether they delete out-of-flows, although I'd have to check that.
(In reply to comment #11)
>   * It also seems broken to me that we're using DeleteNextInFlowChild to delete
> things that contain primary frames rather than just continuations.  Maybe
> that's the underlying problem?

I don't think so. It's quite possible for a continuation to contain first-in-flows/primary frames, and DeleteNextInFlowChild needs to delete that subtree.

>   * We could perhaps use CleanupFrameReferences rather than
> DeletingFrameSubtree... it might be that the main difference between the two is
> whether they delete out-of-flows, although I'd have to check that.

Frame deletion has grown all topsy-turvy. We need to take a step back, work out what the invariants should be, and refactor it all.
(In reply to comment #12)
> Frame deletion has grown all topsy-turvy. We need to take a step back, work out
> what the invariants should be, and refactor it all.

Yeah.  I have some ideas about it.  But I actually don't think this bug is all that hard, except perhaps for figuring out what block frame's expectations regarding floats and maybe other out-of-flows are, which I think is something we'd have to do anyway.
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P3
Roc, can you suggest an owner here?
David wants me to take it. But I think if we end up cutting bugs from the blocker list, this will be one of the ones that gets cut.
Assignee: nobody → roc
Flags: wanted1.9.1+
Flags: blocking1.9.1-
Flags: blocking1.9.1+
Regression range (testing with this bug's original testcase)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/2007072504 Minefield/3.0a7pre
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/2007072604 Minefield/3.0a7pre

 --> it looks like this has the same root cause as bug 472919, as dbaron suggested in bug 472919 comment 11.
Keywords: regression
Flags: wanted1.9.0.x+
Attached patch fixSplinter Review
nsFieldSetFrame::Reflow skips reflowing the content or legend children if their frame subtrees are not dirty. But we hit a case where the content child frame subtree is not dirty but it has a next-in-flow. We simply skip reflowing the content child frame, which means the reflow status for the nsFieldSetFrame ends up as NS_FRAME_COMPLETE, so the block containing the fieldset deletes all the fieldset's continuations, including a bunch of frames which include primary frames for elements. It is an error to return "COMPLETE" for the fieldset when its next-in-flow still contains non-empty-continuation content.

Basically, though, vertical breaking of fieldsets is broken. We pass the height constraint into Reflow for the legend, but nothing ever creates continuations for the legend so any legend content that doesn't fit vertically just disappears. We also don't take the fieldset borders and padding into account when adjusting the available height for the child frames, so if the child content just fits, we let it fit and then add our borders and padding at the bottom, overflowing the height constraint.

Instead of trying to fix that up, let's just disable vertical breaking of fieldsets. I have no reason to believe it matters at all. This patch does that. Note that this means a tall fieldset will be moved to the top of the next page or column if it doesn't fit where it is.

This fixes bug 472919 and bug 473410 as well as this bug.
Attachment #375443 - Flags: superreview?(dbaron)
Attachment #375443 - Flags: review?(dbaron)
Whiteboard: [sg:critical] → [sg:critical][needs review]
BTW the reason this causes a crash is that as dbaron mentioned in comment #11, block code is calling DeleteNextInFlowChild on a frame that contains more than just empty continuation frames. That causes a crash because we haven't cleaned up the primary frame map entries that map content to frames in the deleted subtree. Why can't we just remove the primary frame map entry when we destroy the frame? It seems no less efficient to me, DoDeletingFrameSubtree calls aFrameManager->RemoveAsPrimaryFrame on every removed frame already. In nsFrameManager::Destroy we can move ClearPrimaryFrameMap before rootFrame->Destroy() so that every RemoveAsPrimaryFrame is super-quick since mPrimaryFrameMap.ops will be null. It would be more robust because we'd be sure that the primary frame map entry goes away whenever the frame dies, the assert there would no longer be necessary.
Attachment #375443 - Flags: superreview?(dbaron)
Attachment #375443 - Flags: superreview+
Attachment #375443 - Flags: review?(dbaron)
Attachment #375443 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/f053a233cc7b
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [sg:critical][needs review] → [sg:critical][needs 191 approval]
Testcase still needs to be checked in.

dbaron, any comments on comment #18?
Flags: in-testsuite?
Comment on attachment 375443 [details] [diff] [review]
fix

we should probably take this on 1.9.0. Safe and fixes a few security bugs.
Attachment #375443 - Flags: approval1.9.1?
Attachment #375443 - Flags: approval1.9.0.12?
Comment on attachment 375443 [details] [diff] [review]
fix

a191=beltzner
Attachment #375443 - Flags: approval1.9.1? → approval1.9.1+
Whiteboard: [sg:critical][needs 191 approval] → [sg:critical][needs 191 landing]
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/341b87b1ce8f
Keywords: fixed1.9.1
Whiteboard: [sg:critical][needs 191 landing] → [sg:critical]
Flags: blocking1.9.0.12+
Btw, this bug's patch applies cleanly on the 1.9.0.x branch -- no backporting needed.
Attachment #375443 - Flags: approval1.9.0.12? → approval1.9.0.12+
Comment on attachment 375443 [details] [diff] [review]
fix

Approved for 1.9.0.12, a=dveditz for release-drivers
Checking in layout/forms/nsFieldSetFrame.cpp;
/cvsroot/mozilla/layout/forms/nsFieldSetFrame.cpp,v  <--  nsFieldSetFrame.cpp
new revision: 3.170; previous revision: 3.169
Keywords: fixed1.9.0.12
Verified testcase 3 no longer asserts on 1.9.0.12 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.12pre) Gecko/2009063012 GranParadiso/3.0.12pre.

Testcases 1 and 2 do not crash with that build or Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.12pre) Gecko/2009063005 GranParadiso/3.0.12pre (.NET CLR 3.5.30729).
Group: core-security
Crashtest: http://hg.mozilla.org/mozilla-central/rev/282da8ca07f3
Flags: in-testsuite? → in-testsuite+
verified FIXED on opt builds for the crashes:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3pre) Gecko/20090810 Shiretoko/3.5.3pre (.NET CLR 3.5.30729) ID:20090810045609

and

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a2pre) Gecko/20090810 Minefield/3.6a2pre (.NET CLR 3.5.30729) ID:20090810050215

and verified FIXED on debug build for the assertion:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3pre) Gecko/20090811 Shiretoko/3.5.3pre ID:20090811074904
Status: RESOLVED → VERIFIED
Depends on: 471015
This bug's patch seems to have introduced a layout regression in 1.9.0.x branch (along with 1.9.1 / mozilla-central). The regression is Bug 471015 -- tall fieldsets get clamped to one page when printed now.

(I guess this outcome isn't really a surprise, though, based on comment 17 and this bug's checkin comment.)
Crash Signature: [@ GetLastSpecialSibling]
You need to log in before you can comment on or make changes to this bug.