Last Comment Bug 463350 - Crash [@ GetLastSpecialSibling] with -moz-column, fieldset, select
: Crash [@ GetLastSpecialSibling] with -moz-column, fieldset, select
Status: VERIFIED FIXED
[sg:critical]
: crash, regression, testcase, verified1.9.0.12, verified1.9.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: P3 critical (vote)
: ---
Assigned To: Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
:
Mentors:
Depends on: 471015
Blocks: randomclasses 343943 472919 473410 474075 486428
  Show dependency treegraph
 
Reported: 2008-11-05 20:39 PST by Jesse Ruderman
Modified: 2011-06-13 10:01 PDT (History)
12 users (show)
roc: blocking1.9.1-
roc: wanted1.9.1+
dveditz: blocking1.9.0.12+
dveditz: wanted1.9.0.x+
jruderman: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (crashes Firefox when loaded) (482 bytes, text/html)
2008-11-05 20:39 PST, Jesse Ruderman
no flags Details
testcase 2: height=0 on body (crashes Firefox when loaded) (473 bytes, text/html)
2008-12-29 13:57 PST, Daniel Holbert [:dholbert]
no flags Details
testcase 3: just triggers assertions (375 bytes, text/html)
2008-12-29 14:03 PST, Daniel Holbert [:dholbert]
no flags Details
frametree of testcase 3, before first assertion (8.74 KB, text/plain)
2008-12-29 14:44 PST, Daniel Holbert [:dholbert]
no flags Details
fix (2.03 KB, patch)
2009-05-01 22:15 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
dbaron: review+
dbaron: superreview+
mbeltzner: approval1.9.1+
dveditz: approval1.9.0.12+
Details | Diff | Splinter Review

Description Jesse Ruderman 2008-11-05 20:39:17 PST
Created attachment 346609 [details]
testcase (crashes Firefox when loaded)

This testcase causes GetLastSpecialSibling to call 0 or 0xdddddddd.
Comment 1 Jesse Ruderman 2008-11-05 20:39:49 PST
Before the crash, I see 7x

###!!! ASSERTION: frame was not removed from primary frame map before destruction or was readded to map after being removed: 'Not Reached', file /Users/jruderman/central/layout/base/nsFrameManager.cpp, line 728
Comment 2 Daniel Holbert [:dholbert] 2008-12-29 13:32:54 PST
The testcase crashes my linux mozilla-central build in GetLastSpecialSibling, too, with 7 copies of the assertion from comment 1. =>  Platform --> All/All.
Comment 3 Daniel Holbert [:dholbert] 2008-12-29 13:57:16 PST
Created attachment 354737 [details]
testcase 2: height=0 on body  (crashes Firefox when loaded)

The original testcase has a very large height (72496331mm) on the body.  However, it seems this isn't required -- I get the same crash (and assertions) with height=0, as in this modified testcase.
Comment 4 Daniel Holbert [:dholbert] 2008-12-29 14:03:25 PST
Created attachment 354742 [details]
testcase 3: just triggers assertions

The first line of 'boom()' is sufficient to trigger the assertions from comment 1 (without crashing), as demonstrated in this testcase.
Comment 5 Daniel Holbert [:dholbert] 2008-12-29 14:44:07 PST
Created attachment 354744 [details]
frametree of testcase 3, before first assertion

I'm attaching the frametree of testcase 3, taken at the beginning of the reflow triggered by 'boom()' (the reflow that causes the assertion failures).

The 7 assertions failures are for these 'aFrame' pointers (in this order):
 nsScrollbarButtonFrame    0xb1bf18b0
 nsButtonBoxFrame          0xb1bf1abc
 nsSliderFrame             0xb1bf19d4
 nsScrollbarButtonFrame    0xb1bf1bdc
 nsScrollbarFrame          0xb1bf17e4
 nsGfxButtonControlFrame   0xb1b9b2e4
 nsComboboxControlFrame    0xb1bf12fc

Basically, those frames are the whole "ScrollbarFrame" subtree in the "selectPopupList" list, plus the combobox and its button.
Comment 6 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-24 21:47:37 PST
This smells like a missing call to DeletingFrameSubtree or CleanupFrameReferences (both in nsCSSFrameConstructor.cpp).
Comment 7 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-24 22:02:39 PST
So I think the basic problem here is that any callers to DeleteNextInFlowChild at the point where they're entering the frame destruction process (i.e., not from nsBlockFrame::RemoveFrame) need to call nsCSSFrameConstructor::RemoveMappingsForFrameSubtree, or something like it, possibly as corrected by the patch to bug 428113, although I'm really not sure.

Could somebody who understands DeleteNextInFlowChild better have a look at this?

I think we might also need a better system to decide who's responsible for calling DeletingFrameSubtree (which is what's called by RemoveMappingsForFrameSubtree) or CleanupFrameReferences; I think right now it's sort of whoever enters the process of frame destruction, since it's important that they're run from the top of whatever subtree is being destroyed.
Comment 8 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-24 22:04:53 PST
It also seems like this could be responsible for a substantial chunk of -moz-column crashiness.

I'd note that in this case the caller of DeleteNextInFlowChild is nsBlockFrame::ReflowBlock, as seen on the stack to those 7 assertions.
Comment 9 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-24 22:21:21 PST
To be clear -- the case where I'm really unsure what we'd want to do is if we're calling DeleteNextInFlowChild on a placeholder frame.  What do we want to do with the out-of-flows?
Comment 10 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-24 22:22:19 PST
Then again, if we call it on the ancestor of a placeholder frame, it will delete the out-of-flows, so we should probably be consistent and use the RemoveMappingsForFrameSubtree as modified in bug 428113.  The question then is whether other code needs to be modified to deal with that.  I suspect it does.
Comment 11 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-25 11:43:25 PST
Two more thoughts:

  * It also seems broken to me that we're using DeleteNextInFlowChild to delete things that contain primary frames rather than just continuations.  Maybe that's the underlying problem?

  * We could perhaps use CleanupFrameReferences rather than DeletingFrameSubtree... it might be that the main difference between the two is whether they delete out-of-flows, although I'd have to check that.
Comment 12 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-01-25 13:26:11 PST
(In reply to comment #11)
>   * It also seems broken to me that we're using DeleteNextInFlowChild to delete
> things that contain primary frames rather than just continuations.  Maybe
> that's the underlying problem?

I don't think so. It's quite possible for a continuation to contain first-in-flows/primary frames, and DeleteNextInFlowChild needs to delete that subtree.

>   * We could perhaps use CleanupFrameReferences rather than
> DeletingFrameSubtree... it might be that the main difference between the two is
> whether they delete out-of-flows, although I'd have to check that.

Frame deletion has grown all topsy-turvy. We need to take a step back, work out what the invariants should be, and refactor it all.
Comment 13 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-01-25 14:13:11 PST
(In reply to comment #12)
> Frame deletion has grown all topsy-turvy. We need to take a step back, work out
> what the invariants should be, and refactor it all.

Yeah.  I have some ideas about it.  But I actually don't think this bug is all that hard, except perhaps for figuring out what block frame's expectations regarding floats and maybe other out-of-flows are, which I think is something we'd have to do anyway.
Comment 14 Damon Sicore (:damons) 2009-01-28 11:50:00 PST
Roc, can you suggest an owner here?
Comment 15 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-01-28 14:50:10 PST
David wants me to take it. But I think if we end up cutting bugs from the blocker list, this will be one of the ones that gets cut.
Comment 16 Daniel Holbert [:dholbert] 2009-02-18 10:54:14 PST
Regression range (testing with this bug's original testcase)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/2007072504 Minefield/3.0a7pre
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/2007072604 Minefield/3.0a7pre

 --> it looks like this has the same root cause as bug 472919, as dbaron suggested in bug 472919 comment 11.
Comment 17 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-05-01 22:15:47 PDT
Created attachment 375443 [details] [diff] [review]
fix

nsFieldSetFrame::Reflow skips reflowing the content or legend children if their frame subtrees are not dirty. But we hit a case where the content child frame subtree is not dirty but it has a next-in-flow. We simply skip reflowing the content child frame, which means the reflow status for the nsFieldSetFrame ends up as NS_FRAME_COMPLETE, so the block containing the fieldset deletes all the fieldset's continuations, including a bunch of frames which include primary frames for elements. It is an error to return "COMPLETE" for the fieldset when its next-in-flow still contains non-empty-continuation content.

Basically, though, vertical breaking of fieldsets is broken. We pass the height constraint into Reflow for the legend, but nothing ever creates continuations for the legend so any legend content that doesn't fit vertically just disappears. We also don't take the fieldset borders and padding into account when adjusting the available height for the child frames, so if the child content just fits, we let it fit and then add our borders and padding at the bottom, overflowing the height constraint.

Instead of trying to fix that up, let's just disable vertical breaking of fieldsets. I have no reason to believe it matters at all. This patch does that. Note that this means a tall fieldset will be moved to the top of the next page or column if it doesn't fit where it is.

This fixes bug 472919 and bug 473410 as well as this bug.
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-05-01 22:23:09 PDT
BTW the reason this causes a crash is that as dbaron mentioned in comment #11, block code is calling DeleteNextInFlowChild on a frame that contains more than just empty continuation frames. That causes a crash because we haven't cleaned up the primary frame map entries that map content to frames in the deleted subtree. Why can't we just remove the primary frame map entry when we destroy the frame? It seems no less efficient to me, DoDeletingFrameSubtree calls aFrameManager->RemoveAsPrimaryFrame on every removed frame already. In nsFrameManager::Destroy we can move ClearPrimaryFrameMap before rootFrame->Destroy() so that every RemoveAsPrimaryFrame is super-quick since mPrimaryFrameMap.ops will be null. It would be more robust because we'd be sure that the primary frame map entry goes away whenever the frame dies, the assert there would no longer be necessary.
Comment 19 David Baron :dbaron: ⌚️UTC+2 (review requests must explain patch) 2009-05-05 15:35:12 PDT
Comment on attachment 375443 [details] [diff] [review]
fix

r+sr=dbaron
Comment 20 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-05-08 13:31:47 PDT
http://hg.mozilla.org/mozilla-central/rev/f053a233cc7b
Comment 21 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-05-08 13:33:09 PDT
Testcase still needs to be checked in.

dbaron, any comments on comment #18?
Comment 22 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2009-05-08 13:36:31 PDT
Comment on attachment 375443 [details] [diff] [review]
fix

we should probably take this on 1.9.0. Safe and fixes a few security bugs.
Comment 23 Mike Beltzner [:beltzner, not reading bugmail] 2009-05-14 12:50:07 PDT
Comment on attachment 375443 [details] [diff] [review]
fix

a191=beltzner
Comment 24 Daniel Holbert [:dholbert] 2009-05-22 10:58:07 PDT
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/341b87b1ce8f
Comment 25 Daniel Holbert [:dholbert] 2009-05-22 11:59:53 PDT
Btw, this bug's patch applies cleanly on the 1.9.0.x branch -- no backporting needed.
Comment 26 Daniel Veditz [:dveditz] 2009-05-28 11:17:18 PDT
Comment on attachment 375443 [details] [diff] [review]
fix

Approved for 1.9.0.12, a=dveditz for release-drivers
Comment 27 Daniel Veditz [:dveditz] 2009-06-24 23:04:36 PDT
Checking in layout/forms/nsFieldSetFrame.cpp;
/cvsroot/mozilla/layout/forms/nsFieldSetFrame.cpp,v  <--  nsFieldSetFrame.cpp
new revision: 3.170; previous revision: 3.169
Comment 28 Al Billings [:abillings] 2009-06-30 17:35:54 PDT
Verified testcase 3 no longer asserts on 1.9.0.12 with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.12pre) Gecko/2009063012 GranParadiso/3.0.12pre.

Testcases 1 and 2 do not crash with that build or Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.12pre) Gecko/2009063005 GranParadiso/3.0.12pre (.NET CLR 3.5.30729).
Comment 29 Jesse Ruderman 2009-08-08 16:03:15 PDT
Crashtest: http://hg.mozilla.org/mozilla-central/rev/282da8ca07f3
Comment 30 Aakash Desai [:aakashd] 2009-08-11 08:46:25 PDT
verified FIXED on opt builds for the crashes:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3pre) Gecko/20090810 Shiretoko/3.5.3pre (.NET CLR 3.5.30729) ID:20090810045609

and

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a2pre) Gecko/20090810 Minefield/3.6a2pre (.NET CLR 3.5.30729) ID:20090810050215

and verified FIXED on debug build for the assertion:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.3pre) Gecko/20090811 Shiretoko/3.5.3pre ID:20090811074904
Comment 31 Daniel Holbert [:dholbert] 2009-10-27 10:51:11 PDT
This bug's patch seems to have introduced a layout regression in 1.9.0.x branch (along with 1.9.1 / mozilla-central). The regression is Bug 471015 -- tall fieldsets get clamped to one page when printed now.

(I guess this outcome isn't really a surprise, though, based on comment 17 and this bug's checkin comment.)

Note You need to log in before you can comment on or make changes to this bug.