Last Comment Bug 467493 - Crash [@ nsHTMLReflowState::GetHypotheticalBoxContainer] with -moz-column, overflow, clear, margin, fixed pos
: Crash [@ nsHTMLReflowState::GetHypotheticalBoxContainer] with -moz-column, ov...
Status: RESOLVED WORKSFORME
[sg:critical?]
: assertion, crash, testcase, verified1.9.0.14, verified1.9.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: x86 All
: P2 critical (vote)
: ---
Assigned To: Robert O'Callahan (:roc) (email my personal email if necessary)
:
: Jet Villegas (:jet)
Mentors:
Depends on: 191448
Blocks: randomstyles 502017
  Show dependency treegraph
 
Reported: 2008-12-01 21:56 PST by Jesse Ruderman
Modified: 2011-06-13 10:01 PDT (History)
12 users (show)
roc: blocking1.9.1+
dveditz: blocking1.9.0.9-
dveditz: blocking1.9.0.14+
dveditz: wanted1.9.0.x+
jruderman: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected


Attachments
testcase (crashes Firefox when loaded) (240 bytes, text/html)
2008-12-01 21:56 PST, Jesse Ruderman
no flags Details
testcase 2 (412 bytes, text/html)
2009-02-11 17:30 PST, Daniel Holbert [:dholbert]
no flags Details
semi-reference case 2 (doesn't crash) (412 bytes, text/html)
2009-02-11 17:35 PST, Daniel Holbert [:dholbert]
no flags Details
fix for 1.9.0.x? (1.03 KB, patch)
2009-06-11 14:17 PDT, Daniel Holbert [:dholbert]
no flags Details | Diff | Splinter Review
testcase 3 [moved to bug 502017] (472 bytes, text/html)
2009-07-02 10:41 PDT, Daniel Holbert [:dholbert]
no flags Details

Description Jesse Ruderman 2008-12-01 21:56:10 PST
Created attachment 350921 [details]
testcase (crashes Firefox when loaded)

###!!! ASSERTION: bad height: 'Not Reached', file /Users/jruderman/central/layout/generic/nsLineLayout.cpp, line 188

###!!! ASSERTION: Placeholder relationship should have been torn down; see comments in nsPlaceholderFrame.h: '!shell->FrameManager()->GetPlaceholderFrameFor(mOutOfFlowFrame)', file /Users/jruderman/central/layout/generic/nsPlaceholderFrame.cpp, line 132

###!!! ASSERTION: Dead placeholder in placeholder map: 'entry->placeholderFrame->GetOutOfFlowFrame() != (void*)0xdddddddd', file /Users/jruderman/central/layout/base/nsFrameManager.cpp, line 137

###!!! ASSERTION: no placeholder frame: 'nsnull != placeholderFrame', file /Users/jruderman/central/layout/generic/nsHTMLReflowState.cpp, line 1140

Null deref [@ nsHTMLReflowState::GetHypotheticalBoxContainer], but the third assertion earns this bug a free [sg:critical?].

I suspect this is a recent regression, because I think I wasn't hitting this crash before last week.
Comment 1 David Baron :dbaron: ⌚️UTC-10 2008-12-16 15:36:28 PST
After poking at this for a few minutes in the debugger, my guess would be that it's a regression from bug 465928.  However, I haven't verified that, and I'm not really sure.

I started by focusing on the second assertion.  What's happening is that we're in nsPlaceholderFrame::Destroy called by nsBlockFrame::DoRemoveFrame via nsBlockReflowContext::ReflowBlock's call to DeleteNextInFlowChild.  What I don't know is through what codepath we'd normally expect nsCSSFrameConstructor::RemoveMappingsForFrameSubtree to be called in such a case.  Or are we assuming that this frame is empty, and the problem is that it actually isn't?  (That's sort of my logic for implicating bug 465928.)
Comment 2 Robert O'Callahan (:roc) (email my personal email if necessary) 2009-01-14 17:26:48 PST
I don't crash on Mac trunk. Jesse, is this WFY?
Comment 3 Daniel Holbert [:dholbert] 2009-01-14 17:34:00 PST
No crash for me either, on Linux trunk (using nightly & debug build).

FWIW, my debug build spits out a bunch of copies of these messages:
WARNING: bad value: file nsFloatManager.cpp, line 149
###!!! ASSERTION: Computed overflow area must contain frame bounds: 'aNewSize.width == 0 || aNewSize.height == 0 || aOverflowArea->Contains(nsRect(nsPoint(0, 0), aNewSize))', file nsFrame.cpp, line 5566

(Maybe the crash was fixed by the spacemanager --> floatmanager change?)
Comment 4 Jesse Ruderman 2009-01-14 20:39:22 PST
Yes, WFM.
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2009-01-14 20:40:38 PST
Hmm, but is it fixed on the 1.9.1 branch?
Comment 6 Daniel Holbert [:dholbert] 2009-02-02 17:14:47 PST
No, this is *not* fixed on 1.9.1 branch. I just tried latest nightly, and it still crashes on this bug's testcase.  (No big surprise, given that the suspected fix, bug 191448, hasn't landed there.)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b3pre) Gecko/20090202 Shiretoko/3.1b3pre
Comment 7 Daniel Holbert [:dholbert] 2009-02-02 19:35:27 PST
This crash became fixed between these mozilla-central nightly builds:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090104 Minefield/3.2a1pre
Built from http://hg.mozilla.org/mozilla-central/rev/9f497b1505d2

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090105 Minefield/3.2a1pre
Built from http://hg.mozilla.org/mozilla-central/rev/c04e27e056ba

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9f497b1505d2&tochange=c04e27e056ba

That window includes a number of checkins from Bug 191448, so I'm making the dependency on bug 191448 official.
Comment 8 David Baron :dbaron: ⌚️UTC-10 2009-02-02 20:08:24 PST
I don't see why bug 191448 should have changed any behavior of things outside the space manager code itself, except perhaps for dealing with very large sizes.  Such changes were unintentional, and could well indicate a bug in that change.

That said, bug 191448 hasn't had all that many regressions.  But I wouldn't be comfortable shipping it in a release without having it in a beta.

If this was fixed by bug 191448, it would be good to know why.
Comment 9 Daniel Holbert [:dholbert] 2009-02-03 10:59:24 PST
I'll fire off some builds from right around when bug 191448 landed, to check whether it's really responsible for fixing this (and if so, which of its several patches did it).
Comment 10 Daniel Holbert [:dholbert] 2009-02-03 13:30:37 PST
I've confirmed that this was fixed by bug 191448.

Here are three consecutive changesets (newest first):
changeset 496e0cb5c943 (main 191448 patch):          No crash
changeset b19f0a7a3c4c (intermediate 191448 patch):  Doesn't compile*
changeset b087a09f82aa (unrelated):                  Crash

So, this was fixed by something in these changesets:
  http://hg.mozilla.org/mozilla-central/rev/496e0cb5c943
  http://hg.mozilla.org/mozilla-central/rev/b19f0a7a3c4c 

* changeset b19f0a7a3c4c's compile error was from something complaining about nsSpaceManager not being declared (probably since it was in the process of being removed).  That issue was apparently fixed in the following checkin.
Comment 11 Daniel Holbert [:dholbert] 2009-02-11 17:30:36 PST
Created attachment 361901 [details]
testcase 2

Here's a tweaked version of testcase, with the style separated out into a <style> block, with s/margin/margin-top, and with the margin-top value reduced to be *just* at the threshold where we crash.
Comment 12 Daniel Holbert [:dholbert] 2009-02-11 17:35:01 PST
Created attachment 361905 [details]
semi-reference case 2 (doesn't crash)

Here's a semi-reference case, for comparison with testcase 2.  Its just got a margin-top value that's .000000000001 less than the testcase, which is enough to save us from crashing in 1.9.1 builds.
Comment 13 Daniel Holbert [:dholbert] 2009-02-11 17:47:22 PST
If I dump the frametrees of testcase 2 & semi-reference case #2 in a patched mozilla-central build (which doesn't crash), it looks like the margin-top value is exactly at the point where we wrap around the signed integer boundary.

Here's a cleaned-up snippet of testcase 2's frame tree
  ColumnSet(div)(0)@0xb269528c [view=0xb269b100] {0,0,35640,0}<
    Block(div)(0)@0xb26951ec {0,0,17340,0} <
      line 0xb2695c18: count=1 {0,2147483520,17340,473} <
        Block(div)(0)@0xb2695454 {0,2147483520,17340,473} <

...and here's the matching snippet from testcase 2:
  ColumnSet(div)(0)@0xb266128c [view=0xb622b6a0] {0,0,35640,0} <
    Block(div)(0)@0xb26611ec {0,0,17340,0} <
      line 0xb2661c18: count=1 {0,-2147483648,17340,473} <
        Block(div)(0)@0xb2661454 {0,-2147483648,17340,473} <
 
Notice the change from 2147483520 (2^31 - 128) in the reference case to -2147483648 (-2^31 exactly) in the test case -- that's the integer overflow, and the old code's crash is dependent on that, for some reason.
Comment 14 Daniel Holbert [:dholbert] 2009-02-11 17:49:16 PST
(In reply to comment #13)
> Here's a cleaned-up snippet of testcase 2's frame tree

D'oh, sorry, that was a typo -- the first frametree snippet above is from "semi-reference case 2".
Comment 15 Daniel Holbert [:dholbert] 2009-02-11 22:25:03 PST
Ok, so I've looked into why our behavior changed on this testcase, and I think I've got a decent summary.

IN SHORT: 
The new nsFloatManager::ClearFloats method includes an initial "HaveAnyFloats" check (with an immediate return, if we don't have floats). However, the old version in nsSpaceManager had no such check.  In situations with integer overflow in nsSpaceManager.mY, that made nsSpaceManager::ClearFloats return the wrong value, which made us set "clearanceFrame" when we shouldn't, which messed up our handling of placeholders.

IN MORE DETAIL:
(note: below I use "mXXXManager" to refer to mSpaceManager / mFloatManager)

We start off with an integer overflow in the margin-top value, at -2^31 = -214748364 app-units.  This traces all the way back to the CSS engine, with a call to nsStyleSides::GetTop (as "styleMargin->mMargin.GetTop()").

Then, during reflow, this happens:
 - We hit a call to nsBlockFrame::ReflowBlockFrame on the innermost block that I listed in the frametree snippets above.  During that, we call aState.ClearFloats, passing in 0.
 - This method calls mXXXManager::ClearFloats, again passing in 0. (Note that at this point, mXXXManager.mY is -2^31)
 - IN NEW CODE: We start out by checking "HaveAnyFloats()". We don't have any, so we just return aY, which is 0.
 - IN OLD CODE: There's no initial "HaveAnyFloats"-type check, so we plod ahead through the function.  We start out by setting "bottom" = mY == -2^31. Then, through a PR_MAX call, we end up _increasing_ bottom to be mMaximalLeftYMost, which is currently nscoord_MIN == -2^30.  Then, finally, we subtract mY = -2^31 off of bottom, which leaves us with (positive) nscoord_MAX.  We return that.

This returned value is passed back up to nsBlockFrame::ReflowBlockFrame and is stored in "clearY".  We compare that to curY, at nsBlockFrame.cpp:2915
 - IN NEW CODE: clearY == curY == 0.  (they're equal)
 - IN OLD CODE: clearY is nscoord_MAX, curY is 0. (they're **not equal**)

So in old code, that makes us evaluate the next clause, which assigns a value to aState.mReflowState.mDiscoveredClearance. (This value never gets set in new code).  This "mDiscoveredClearance" is just an alias for a local variable "clearanceFrame" up a few levels in another call to nsBlockFrame::ReflowBlockFrame.  When we set this value, it affects our execution path in that function, apparently messing up our handling of placeholder frames.

Then, the crash happens later on, because we assume the fixed-position block has a placeholder frame, when in fact it doesn't (as indicated in the final assertion in comment 0).

Note that if I simply tweak the faulty "nsSpaceManager::ClearFloats" call to make it return 0 instead of nscoord_MAX, then everything goes fine -- no crash, no assertions.
Comment 16 Daniel Holbert [:dholbert] 2009-02-11 22:26:08 PST
if it's not clear, comment 15 was in reply to this bit of comment 8:
> If this was fixed by bug 191448, it would be good to know why.
Comment 17 Daniel Holbert [:dholbert] 2009-02-11 22:40:05 PST
Conclusion from comment 15:
---------------------------
To fix this particular bug on 1.9.1, it'd probably be sufficient to just add a check with nsSpaceManager's equivalent of "HaveAnyFloats" at the beginning of the ClearFloats method, to match nsFloatManager's behavior there.  However, it might be better to just take the whole nsFloatManager patch, since it's been successfully baked on mozilla-central for a little while now. (and it also fixes the sg:crit bug 437565)

We should make a decision about point this ASAP, to make sure that whatever patch that we land for this will get beta coverage.

Side note:
----------
The root cause of this bug is that large positive margin values suddenly overflow into negative territory (INT_MIN) somewhere in the CSS engine.  It'd be much nicer (and would give much more predictable behavior) if we could cap them at INT_MAX (if not nscoord_MAX), rather than letting them take that extra step to INT_MIN.  That would probably help a lot with preventing these sorts of bugs.
Comment 18 cmtalbert 2009-02-12 09:17:05 PST
(In reply to comment #17)
> Conclusion from comment 15:
> ---------------------------
> To fix this particular bug on 1.9.1, it'd probably be sufficient to just add a
> check with nsSpaceManager's equivalent of "HaveAnyFloats" at the beginning of
> the ClearFloats method, to match nsFloatManager's behavior there.  However, it
> might be better to just take the whole nsFloatManager patch, since it's been
> successfully baked on mozilla-central for a little while now. (and it also
> fixes the sg:crit bug 437565)
> 
> We should make a decision about point this ASAP, to make sure that whatever
> patch that we land for this will get beta coverage.
> 
I checked the crash-reporter database and we don't have any user-reported crashes in there of this crash on 1.9.1.  Now, that said, if there is a known fix for this, if that fix has baked on trunk without issue for a while and if that fix also fixes a sg:crit bug, I think we should take it on the branch. This is already marked as "blocking 1.9.1" so, it shouldn't need approval, I think the patch that fixes it ought to just land on 1.9.1.
> Side note:
> ----------
> The root cause of this bug is that large positive margin values suddenly
> overflow into negative territory (INT_MIN) somewhere in the CSS engine.  It'd
> be much nicer (and would give much more predictable behavior) if we could cap
> them at INT_MAX (if not nscoord_MAX), rather than letting them take that extra
> step to INT_MIN.  That would probably help a lot with preventing these sorts of
> bugs.
That's undoubtedly a problem too, but should probably be a follow-up bug.
Comment 19 Daniel Holbert [:dholbert] 2009-02-12 10:35:27 PST
(In reply to comment #18)
> This is already marked as "blocking 1.9.1" so, it shouldn't need approval, I
> think the patch that fixes it ought to just land on 1.9.1.

I'd wanted to get dbaron's opinion first, since he wrote the patch that fixes this (in bug 191448) & he briefly commented above on its appropriateness for 1.9.1, in comment 8.

dbaron, (a) does the explanation in comment 15 make sense to you, and (b) would you be comfortable with bug 191448's patch landing on 1.9.1?
Comment 20 Daniel Holbert [:dholbert] 2009-02-13 12:07:09 PST
Excellent -- it looks like dbaron fixed this on 1.9.1 in bug 191448 comment 32.  Adding fixed1.9.1 keyword.
Comment 21 Marcia Knous [:marcia - use ni] 2009-04-22 12:35:02 PDT
Verified fixed on the 1.9.1 branch using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090422 Shiretoko/3.5b4pre. Both test cases no longer crash.
Comment 22 Daniel Holbert [:dholbert] 2009-06-11 14:17:44 PDT
Created attachment 382825 [details] [diff] [review]
fix for 1.9.0.x?

(In reply to comment #15)
> The new nsFloatManager::ClearFloats method includes an initial "HaveAnyFloats"
> check (with an immediate return, if we don't have floats). However, the old
> version in nsSpaceManager had no such check.

Here's a patch that simply adds this one check.  (Note: I mistyped in the quoted bit above -- the correct function name is "HasAnyFloats", not "HaveAnyFloats")

This fixes the crash on both expected-to-crash testcases, in 1.9.0.x.

dbaron, do you think this check is safe to add on its own?
Comment 23 David Baron :dbaron: ⌚️UTC-10 2009-06-11 14:22:16 PDT
Does it fix the crash if you add an unrelated float (inside the correct block) to those expected-to-crash testcases?
Comment 24 Daniel Holbert [:dholbert] 2009-07-02 10:41:11 PDT
Created attachment 386544 [details]
testcase 3 [moved to bug 502017]

(In reply to comment #23)
> Does it fix the crash if you add an unrelated float (inside the correct block)
> to those expected-to-crash testcases?

Good question -- sadly attachment 382825 [details] [diff] [review] does *not* fix the crash on
1.9.0.x if we add a float inside the first div. (as demonstrated by this new testcase)
Comment 25 Daniel Holbert [:dholbert] 2009-07-02 10:53:25 PDT
testcase 3 is WFM on mozilla-central, but it actually crashes on 1.9.1, in both debug builds and in the stock Firefox 3.5 release! (and on 1.9.0.x, as stated in comment 24)

Testcases 1 and 2 still don't crash on 1.9.1, though.

1.9.1 crash report for testcase 3:
http://crash-stats.mozilla.com/report/pending/f7955dc3-a1ff-4e8a-a542-bdc8f2090702

Perhaps I'll file a new bug on testcase 3, since this bug here was already marked verified1.9.1 (for the first two testcases)

CRASHY 1.9.1 build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5

WFM mozilla-central build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090701 Minefield/3.6a1pre
Comment 26 Daniel Holbert [:dholbert] 2009-07-02 11:14:41 PDT
Comment on attachment 386544 [details]
testcase 3 [moved to bug 502017]

I've filed bug 502017 on the "testcase 3" crash in 1.9.1.
Comment 27 Daniel Holbert [:dholbert] 2009-07-10 11:24:55 PDT
The patch posted in bug 502017 fixes the crashes in all of the testcases posted here, on 1.9.0.x branch, as noted in bug 502017 comment 15.
Comment 28 Daniel Holbert [:dholbert] 2009-07-10 12:55:36 PDT
That patch just landed, in per bug 502017 comment 17. --> This is fixed1.9.0.13
Comment 29 Al Billings [:abillings] 2009-07-26 14:14:08 PDT
Verified fixed for 1.9.0.13 with my own debug build (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13pre) Gecko/2009072417 GranParadiso/3.0.13pr). No crashes with attached testcases.
Comment 30 Jesse Ruderman 2009-10-15 13:28:29 PDT
Crashtest: http://hg.mozilla.org/mozilla-central/rev/4da43cad0331

Note You need to log in before you can comment on or make changes to this bug.