Closed Bug 467493 Opened 16 years ago Closed 16 years ago

Crash [@ nsHTMLReflowState::GetHypotheticalBoxContainer] with -moz-column, overflow, clear, margin, fixed pos

Categories

(Core :: Layout, defect, P2)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
status1.9.1 --- unaffected

People

(Reporter: jruderman, Assigned: roc)

References

Details

(5 keywords, Whiteboard: [sg:critical?])

Crash Data

Attachments

(4 files, 1 obsolete file)

###!!! ASSERTION: bad height: 'Not Reached', file /Users/jruderman/central/layout/generic/nsLineLayout.cpp, line 188

###!!! ASSERTION: Placeholder relationship should have been torn down; see comments in nsPlaceholderFrame.h: '!shell->FrameManager()->GetPlaceholderFrameFor(mOutOfFlowFrame)', file /Users/jruderman/central/layout/generic/nsPlaceholderFrame.cpp, line 132

###!!! ASSERTION: Dead placeholder in placeholder map: 'entry->placeholderFrame->GetOutOfFlowFrame() != (void*)0xdddddddd', file /Users/jruderman/central/layout/base/nsFrameManager.cpp, line 137

###!!! ASSERTION: no placeholder frame: 'nsnull != placeholderFrame', file /Users/jruderman/central/layout/generic/nsHTMLReflowState.cpp, line 1140

Null deref [@ nsHTMLReflowState::GetHypotheticalBoxContainer], but the third assertion earns this bug a free [sg:critical?].

I suspect this is a recent regression, because I think I wasn't hitting this crash before last week.
Whiteboard: [sg:critical?]
Flags: blocking1.9.1?
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
After poking at this for a few minutes in the debugger, my guess would be that it's a regression from bug 465928.  However, I haven't verified that, and I'm not really sure.

I started by focusing on the second assertion.  What's happening is that we're in nsPlaceholderFrame::Destroy called by nsBlockFrame::DoRemoveFrame via nsBlockReflowContext::ReflowBlock's call to DeleteNextInFlowChild.  What I don't know is through what codepath we'd normally expect nsCSSFrameConstructor::RemoveMappingsForFrameSubtree to be called in such a case.  Or are we assuming that this frame is empty, and the problem is that it actually isn't?  (That's sort of my logic for implicating bug 465928.)
Assignee: nobody → roc
I don't crash on Mac trunk. Jesse, is this WFY?
No crash for me either, on Linux trunk (using nightly & debug build).

FWIW, my debug build spits out a bunch of copies of these messages:
WARNING: bad value: file nsFloatManager.cpp, line 149
###!!! ASSERTION: Computed overflow area must contain frame bounds: 'aNewSize.width == 0 || aNewSize.height == 0 || aOverflowArea->Contains(nsRect(nsPoint(0, 0), aNewSize))', file nsFrame.cpp, line 5566

(Maybe the crash was fixed by the spacemanager --> floatmanager change?)
Yes, WFM.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Hmm, but is it fixed on the 1.9.1 branch?
No, this is *not* fixed on 1.9.1 branch. I just tried latest nightly, and it still crashes on this bug's testcase.  (No big surprise, given that the suspected fix, bug 191448, hasn't landed there.)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b3pre) Gecko/20090202 Shiretoko/3.1b3pre
OS: Mac OS X → All
Whiteboard: [sg:critical?] → [sg:critical?][maybe fixed by 191448 on 1.9.2]
This crash became fixed between these mozilla-central nightly builds:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090104 Minefield/3.2a1pre
Built from http://hg.mozilla.org/mozilla-central/rev/9f497b1505d2

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090105 Minefield/3.2a1pre
Built from http://hg.mozilla.org/mozilla-central/rev/c04e27e056ba

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9f497b1505d2&tochange=c04e27e056ba

That window includes a number of checkins from Bug 191448, so I'm making the dependency on bug 191448 official.
Depends on: 191448
I don't see why bug 191448 should have changed any behavior of things outside the space manager code itself, except perhaps for dealing with very large sizes.  Such changes were unintentional, and could well indicate a bug in that change.

That said, bug 191448 hasn't had all that many regressions.  But I wouldn't be comfortable shipping it in a release without having it in a beta.

If this was fixed by bug 191448, it would be good to know why.
I'll fire off some builds from right around when bug 191448 landed, to check whether it's really responsible for fixing this (and if so, which of its several patches did it).
I've confirmed that this was fixed by bug 191448.

Here are three consecutive changesets (newest first):
changeset 496e0cb5c943 (main 191448 patch):          No crash
changeset b19f0a7a3c4c (intermediate 191448 patch):  Doesn't compile*
changeset b087a09f82aa (unrelated):                  Crash

So, this was fixed by something in these changesets:
  http://hg.mozilla.org/mozilla-central/rev/496e0cb5c943
  http://hg.mozilla.org/mozilla-central/rev/b19f0a7a3c4c 

* changeset b19f0a7a3c4c's compile error was from something complaining about nsSpaceManager not being declared (probably since it was in the process of being removed).  That issue was apparently fixed in the following checkin.
Whiteboard: [sg:critical?][maybe fixed by 191448 on 1.9.2] → [sg:critical?][fixed by 191448 on 1.9.2]
Attached file testcase 2
Here's a tweaked version of testcase, with the style separated out into a <style> block, with s/margin/margin-top, and with the margin-top value reduced to be *just* at the threshold where we crash.
Here's a semi-reference case, for comparison with testcase 2.  Its just got a margin-top value that's .000000000001 less than the testcase, which is enough to save us from crashing in 1.9.1 builds.
If I dump the frametrees of testcase 2 & semi-reference case #2 in a patched mozilla-central build (which doesn't crash), it looks like the margin-top value is exactly at the point where we wrap around the signed integer boundary.

Here's a cleaned-up snippet of testcase 2's frame tree
  ColumnSet(div)(0)@0xb269528c [view=0xb269b100] {0,0,35640,0}<
    Block(div)(0)@0xb26951ec {0,0,17340,0} <
      line 0xb2695c18: count=1 {0,2147483520,17340,473} <
        Block(div)(0)@0xb2695454 {0,2147483520,17340,473} <

...and here's the matching snippet from testcase 2:
  ColumnSet(div)(0)@0xb266128c [view=0xb622b6a0] {0,0,35640,0} <
    Block(div)(0)@0xb26611ec {0,0,17340,0} <
      line 0xb2661c18: count=1 {0,-2147483648,17340,473} <
        Block(div)(0)@0xb2661454 {0,-2147483648,17340,473} <
 
Notice the change from 2147483520 (2^31 - 128) in the reference case to -2147483648 (-2^31 exactly) in the test case -- that's the integer overflow, and the old code's crash is dependent on that, for some reason.
(In reply to comment #13)
> Here's a cleaned-up snippet of testcase 2's frame tree

D'oh, sorry, that was a typo -- the first frametree snippet above is from "semi-reference case 2".
Ok, so I've looked into why our behavior changed on this testcase, and I think I've got a decent summary.

IN SHORT: 
The new nsFloatManager::ClearFloats method includes an initial "HaveAnyFloats" check (with an immediate return, if we don't have floats). However, the old version in nsSpaceManager had no such check.  In situations with integer overflow in nsSpaceManager.mY, that made nsSpaceManager::ClearFloats return the wrong value, which made us set "clearanceFrame" when we shouldn't, which messed up our handling of placeholders.

IN MORE DETAIL:
(note: below I use "mXXXManager" to refer to mSpaceManager / mFloatManager)

We start off with an integer overflow in the margin-top value, at -2^31 = -214748364 app-units.  This traces all the way back to the CSS engine, with a call to nsStyleSides::GetTop (as "styleMargin->mMargin.GetTop()").

Then, during reflow, this happens:
 - We hit a call to nsBlockFrame::ReflowBlockFrame on the innermost block that I listed in the frametree snippets above.  During that, we call aState.ClearFloats, passing in 0.
 - This method calls mXXXManager::ClearFloats, again passing in 0. (Note that at this point, mXXXManager.mY is -2^31)
 - IN NEW CODE: We start out by checking "HaveAnyFloats()". We don't have any, so we just return aY, which is 0.
 - IN OLD CODE: There's no initial "HaveAnyFloats"-type check, so we plod ahead through the function.  We start out by setting "bottom" = mY == -2^31. Then, through a PR_MAX call, we end up _increasing_ bottom to be mMaximalLeftYMost, which is currently nscoord_MIN == -2^30.  Then, finally, we subtract mY = -2^31 off of bottom, which leaves us with (positive) nscoord_MAX.  We return that.

This returned value is passed back up to nsBlockFrame::ReflowBlockFrame and is stored in "clearY".  We compare that to curY, at nsBlockFrame.cpp:2915
 - IN NEW CODE: clearY == curY == 0.  (they're equal)
 - IN OLD CODE: clearY is nscoord_MAX, curY is 0. (they're **not equal**)

So in old code, that makes us evaluate the next clause, which assigns a value to aState.mReflowState.mDiscoveredClearance. (This value never gets set in new code).  This "mDiscoveredClearance" is just an alias for a local variable "clearanceFrame" up a few levels in another call to nsBlockFrame::ReflowBlockFrame.  When we set this value, it affects our execution path in that function, apparently messing up our handling of placeholder frames.

Then, the crash happens later on, because we assume the fixed-position block has a placeholder frame, when in fact it doesn't (as indicated in the final assertion in comment 0).

Note that if I simply tweak the faulty "nsSpaceManager::ClearFloats" call to make it return 0 instead of nscoord_MAX, then everything goes fine -- no crash, no assertions.
if it's not clear, comment 15 was in reply to this bit of comment 8:
> If this was fixed by bug 191448, it would be good to know why.
Conclusion from comment 15:
---------------------------
To fix this particular bug on 1.9.1, it'd probably be sufficient to just add a check with nsSpaceManager's equivalent of "HaveAnyFloats" at the beginning of the ClearFloats method, to match nsFloatManager's behavior there.  However, it might be better to just take the whole nsFloatManager patch, since it's been successfully baked on mozilla-central for a little while now. (and it also fixes the sg:crit bug 437565)

We should make a decision about point this ASAP, to make sure that whatever patch that we land for this will get beta coverage.

Side note:
----------
The root cause of this bug is that large positive margin values suddenly overflow into negative territory (INT_MIN) somewhere in the CSS engine.  It'd be much nicer (and would give much more predictable behavior) if we could cap them at INT_MAX (if not nscoord_MAX), rather than letting them take that extra step to INT_MIN.  That would probably help a lot with preventing these sorts of bugs.
(In reply to comment #17)
> Conclusion from comment 15:
> ---------------------------
> To fix this particular bug on 1.9.1, it'd probably be sufficient to just add a
> check with nsSpaceManager's equivalent of "HaveAnyFloats" at the beginning of
> the ClearFloats method, to match nsFloatManager's behavior there.  However, it
> might be better to just take the whole nsFloatManager patch, since it's been
> successfully baked on mozilla-central for a little while now. (and it also
> fixes the sg:crit bug 437565)
> 
> We should make a decision about point this ASAP, to make sure that whatever
> patch that we land for this will get beta coverage.
> 
I checked the crash-reporter database and we don't have any user-reported crashes in there of this crash on 1.9.1.  Now, that said, if there is a known fix for this, if that fix has baked on trunk without issue for a while and if that fix also fixes a sg:crit bug, I think we should take it on the branch. This is already marked as "blocking 1.9.1" so, it shouldn't need approval, I think the patch that fixes it ought to just land on 1.9.1.
> Side note:
> ----------
> The root cause of this bug is that large positive margin values suddenly
> overflow into negative territory (INT_MIN) somewhere in the CSS engine.  It'd
> be much nicer (and would give much more predictable behavior) if we could cap
> them at INT_MAX (if not nscoord_MAX), rather than letting them take that extra
> step to INT_MIN.  That would probably help a lot with preventing these sorts of
> bugs.
That's undoubtedly a problem too, but should probably be a follow-up bug.
(In reply to comment #18)
> This is already marked as "blocking 1.9.1" so, it shouldn't need approval, I
> think the patch that fixes it ought to just land on 1.9.1.

I'd wanted to get dbaron's opinion first, since he wrote the patch that fixes this (in bug 191448) & he briefly commented above on its appropriateness for 1.9.1, in comment 8.

dbaron, (a) does the explanation in comment 15 make sense to you, and (b) would you be comfortable with bug 191448's patch landing on 1.9.1?
Excellent -- it looks like dbaron fixed this on 1.9.1 in bug 191448 comment 32.  Adding fixed1.9.1 keyword.
Keywords: fixed1.9.1
Whiteboard: [sg:critical?][fixed by 191448 on 1.9.2] → [sg:critical?]
Flags: in-testsuite?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.8-
Verified fixed on the 1.9.1 branch using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090422 Shiretoko/3.5b4pre. Both test cases no longer crash.
Attached patch fix for 1.9.0.x? (obsolete) — Splinter Review
(In reply to comment #15)
> The new nsFloatManager::ClearFloats method includes an initial "HaveAnyFloats"
> check (with an immediate return, if we don't have floats). However, the old
> version in nsSpaceManager had no such check.

Here's a patch that simply adds this one check.  (Note: I mistyped in the quoted bit above -- the correct function name is "HasAnyFloats", not "HaveAnyFloats")

This fixes the crash on both expected-to-crash testcases, in 1.9.0.x.

dbaron, do you think this check is safe to add on its own?
Attachment #382825 - Flags: review?(dbaron)
Does it fix the crash if you add an unrelated float (inside the correct block) to those expected-to-crash testcases?
Flags: blocking1.9.0.13?
(In reply to comment #23)
> Does it fix the crash if you add an unrelated float (inside the correct block)
> to those expected-to-crash testcases?

Good question -- sadly attachment 382825 [details] [diff] [review] does *not* fix the crash on
1.9.0.x if we add a float inside the first div. (as demonstrated by this new testcase)
Attachment #382825 - Attachment is obsolete: true
Attachment #382825 - Flags: review?(dbaron)
testcase 3 is WFM on mozilla-central, but it actually crashes on 1.9.1, in both debug builds and in the stock Firefox 3.5 release! (and on 1.9.0.x, as stated in comment 24)

Testcases 1 and 2 still don't crash on 1.9.1, though.

1.9.1 crash report for testcase 3:
http://crash-stats.mozilla.com/report/pending/f7955dc3-a1ff-4e8a-a542-bdc8f2090702

Perhaps I'll file a new bug on testcase 3, since this bug here was already marked verified1.9.1 (for the first two testcases)

CRASHY 1.9.1 build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5

WFM mozilla-central build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090701 Minefield/3.6a1pre
Blocks: 502017
Comment on attachment 386544 [details]
testcase 3 [moved to bug 502017]

I've filed bug 502017 on the "testcase 3" crash in 1.9.1.
Attachment #386544 - Attachment description: testcase 3 → testcase 3 [moved to bug 502017]
Flags: blocking1.9.0.13?
Flags: blocking1.9.0.13+
The patch posted in bug 502017 fixes the crashes in all of the testcases posted here, on 1.9.0.x branch, as noted in bug 502017 comment 15.
That patch just landed, in per bug 502017 comment 17. --> This is fixed1.9.0.13
Keywords: fixed1.9.0.13
Verified fixed for 1.9.0.13 with my own debug build (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13pre) Gecko/2009072417 GranParadiso/3.0.13pr). No crashes with attached testcases.
Group: core-security
Crashtest: http://hg.mozilla.org/mozilla-central/rev/4da43cad0331
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ nsHTMLReflowState::GetHypotheticalBoxContainer]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: