Last Comment Bug 437565 - Crash [@ nsSpaceManager::GetPrevBand] with floats after nscoord_MAX of height
: Crash [@ nsSpaceManager::GetPrevBand] with floats after nscoord_MAX of height
Status: VERIFIED FIXED
[sg:critical?][fixed by 191448 (float...
: assertion, crash, testcase, verified1.9.0.14, verified1.9.1
Product: Core
Classification: Components
Component: Layout: Floats (show other bugs)
: 1.9.1 Branch
: All All
: -- critical (vote)
: ---
Assigned To: Daniel Holbert [:dholbert]
:
Mentors:
: 437566 (view as bug list)
Depends on: 191448
Blocks: randomstyles 344905
  Show dependency treegraph
 
Reported: 2008-06-05 22:21 PDT by Jesse Ruderman
Modified: 2011-06-13 10:01 PDT (History)
13 users (show)
roc: wanted1.9.1+
dveditz: blocking1.9.0.9-
dveditz: blocking1.9.0.14+
dveditz: wanted1.9.0.x+
jruderman: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected


Attachments
testcase (crashes Firefox when loaded) (554 bytes, application/xhtml+xml)
2008-06-05 22:21 PDT, Jesse Ruderman
no flags Details
backtrace of Linux crash (5.75 KB, text/plain)
2008-08-14 16:52 PDT, Daniel Holbert [:dholbert]
no flags Details
testcase 2 (crashes firefox when loaded) (813 bytes, application/xhtml+xml)
2008-10-03 13:09 PDT, Daniel Holbert [:dholbert]
no flags Details
testcase 3 (crashes 1.9.0 & 1.9.1 debug builds with skinny viewports) (742 bytes, application/xhtml+xml)
2009-02-11 11:41 PST, Daniel Holbert [:dholbert]
no flags Details
band-aid v1 (1.11 KB, patch)
2009-08-04 16:51 PDT, Daniel Holbert [:dholbert]
dbaron: review+
Details | Diff | Splinter Review
band-aid v1b (2.03 KB, patch)
2009-08-10 15:52 PDT, Daniel Holbert [:dholbert]
dholbert: review+
samuel.sidler+old: approval1.9.0.14+
Details | Diff | Splinter Review

Description Jesse Ruderman 2008-06-05 22:21:43 PDT
Created attachment 323993 [details]
testcase (crashes Firefox when loaded)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1a1pre) Gecko/2008053113 Minefield/3.1a1pre

Steps to reproduce:
1. Load the testcase in a debug build of Firefox.

Result:

###!!! ASSERTION: aBandRect should be first rect within its band: '!aBandRect || aBandRect == mBandList.Head() || aBandRect->Prev()->mBottom != aBandRect->mBottom', file /Users/jruderman/central/layout/generic/nsSpaceManager.h, line 475

Crash dereferencing 0xc0000017 (note the first digit -- this is not a null deref!)

The assertion also occurs in bug 410232, but that bug has a different testcase, no crash, and is public.
Comment 1 Jesse Ruderman 2008-06-05 22:22:24 PDT
*** Bug 437566 has been marked as a duplicate of this bug. ***
Comment 2 Daniel Holbert [:dholbert] 2008-08-14 16:32:11 PDT
Crashes my Linux debug build, too. (updated from mozilla-central yesterday)
Comment 3 Daniel Holbert [:dholbert] 2008-08-14 16:52:46 PDT
Created attachment 333851 [details]
backtrace of Linux crash

I'm attaching the backtrace of my Linux crash.  It's crashing on this line:
http://mxr.mozilla.org/seamonkey/source/layout/generic/nsSpaceManager.cpp#435
  434  NS_ASSERTION(aBandRect->Prev() == &mBandList ||
  435               aBandRect->Prev()->mBottom <= aBandRect->mTop,
  436               "aBandRect should be first rect within its band");
The crash happens because we look up  aBandRect->Prev()->mBottom, and  aBandRect->Prev() is either 0x1 or 0x5a5a5a5a.  

I'm not actually sure if it's 0x5a5a5a5a or 0x1 -- if I break at the crash in GDB and I do "p aBandRect->prev", then it displays 0x5a5a5a5a.  But then if I do "p aBandRect->Prev()", I get 0x1.  And that call to Prev() seems to have a lasting effect on 'prev', because from that point on, "p aBandRect->prev" displays 0x1 instead of 0x5a5a5a5a. Weird.
Comment 4 Daniel Holbert [:dholbert] 2008-08-14 16:54:29 PDT
(In reply to comment #0)
> ###!!! ASSERTION: aBandRect should be first rect within its band [...]

FWIW, I get two copies of this assertion before the crash.  Just before those assertions, I get these messages:

nsLineLayout: SVGOuterSVG(svg)(0)@0xb0d0797c metrics=53280,1073741824!
nsBlockReflowContext: Block(span)(0)@0xb0d07b0c metrics=53280,1073742064!
nsBlockReflowContext: Area(div)(7)@0xb0d07720 metrics=53280,1073744344!
Comment 5 Daniel Holbert [:dholbert] 2008-10-03 13:09:22 PDT
Created attachment 341661 [details]
testcase 2 (crashes firefox when loaded)

Here's a somewhat simplified testcase.

FWIW, the only function of the <svg style="display: table-row;"> element in the original testcase was to insert a huge height. (see bug 410233 -- that particular style on <svg> makes it about nscoord_MAX tall)

So, for simplicity, this testcase has replaced that <svg> with a <div> whose height is close to nscoord_MAX app-units.
Comment 6 Jesse Ruderman 2009-01-06 15:03:47 PST
No crash now that nsSpaceManager has been replaced by nsFloatManager.

The first testcase triggers the "bad value" assertion in bug 472218, which is about to be turned into a warning.  But the second testcase also triggers the assertion in bug 439204, "We placed a float where there was no room".
Comment 7 Boris Zbarsky [:bz] 2009-01-06 15:07:09 PST
Need a crashtest.
Comment 8 Daniel Veditz [:dveditz] 2009-01-20 22:53:59 PST
Still crashes 1.9.0.6pre, we need to figure out what fixed this.
Comment 9 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2009-01-20 23:03:00 PST
In comment 6 Jesse might have been implying it was bug 191448.
Comment 10 Daniel Veditz [:dveditz] 2009-01-20 23:43:46 PST
bug 191448 hasn't landed on 1.9.1 and doesn't appear likely to based on lack of flags and the size of the patch, but this is a blocking1.9.1+ security bug (and Shiretoko still dies horribly).

Is there an alternate small band-aid fix for the 1.9.0/1.9.1 branches? I'm reopening so we don't lose track of this one.
Comment 11 Daniel Veditz [:dveditz] 2009-01-21 15:17:27 PST
clearing 1.9.0.7 flag for now, we'll wait for a 1.9.1 patch first.
Comment 12 chris hofmann 2009-01-23 16:40:03 PST
How hard might this be to get this to the 1.9.1 branch?
Comment 13 Jesse Ruderman 2009-01-23 16:41:29 PST
(e.g. by figuring out what in the float manager refactoring fixed this, or even porting that refactoring to 1.9.1, given that it caused so few regressions)
Comment 14 Daniel Holbert [:dholbert] 2009-02-02 19:39:19 PST
FWIW: In addition to fixing this bug here, bug 191448 seems to have fixed bug 467493, which is [sg:crititcal?] and blocking1.9.1+.

dbaron, is there any chance that bug 191448's patch would be appropriate for backporting to 1.9.1? (and to 1.9.0.x?)
Comment 15 Daniel Holbert [:dholbert] 2009-02-03 13:36:42 PST
Like Bug 467493, this bug's testcase crashes in changeset b087a09f82aa, but is fixed in changeset 496e0cb5c943.  (with the intermediate changeset b19f0a7a3c4c not compiling, as mentioned in Bug 467493 comment 10).

So, this was definitely fixed by bug 191448.
Comment 16 Daniel Holbert [:dholbert] 2009-02-11 11:41:43 PST
Created attachment 361806 [details]
testcase 3 (crashes 1.9.0 & 1.9.1 debug builds with skinny viewports)

Here's a modified version of testcase 2.  It's sized such that it loads fine in the layout debugger, but reducing the width by one 'click' (via alt-F8, rightarrow, leftarrow) is enough to trigger the crash.

To try to answer dbaron's question in related bug 467493 comment 8 ("If this was fixed by bug 191448, it would be good to know why"), I've spent some time looking into specifically which of the nsFloatManager changes made us start working on this this testcase in mozilla-central -- details to follow.
Comment 17 Daniel Holbert [:dholbert] 2009-02-11 12:25:01 PST
(In reply to comment #16)
> To try to answer dbaron's question in related bug 467493 comment 8 ("If this
> was fixed by bug 191448, it would be good to know why"), I've spent some time
> looking into specifically which of the nsFloatManager changes made us start
> working on this this testcase in mozilla-central -- details to follow.

(NOTE: When I compare "old code" vs "new code" below, I'm talking about code before bug 191448 landed vs. code after it landed.)

Ok -- in old code, we crash somewhere within this call, in nsBlockReflowState::FlowAndPlaceFloat:
  947    mSpaceManager->AddRectRegion(floatFrame, region);
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockReflowState.cpp#l947

At that point, region.y is nscoord_MAX (because floatY is nscoord_MAX, because mY is nscoord_MAX).

If I trace execution back a bit to where mY became nscoord_MAX, I arrive here:
   821  while (!CanPlaceFloat(floatSize, floatDisplay->mFloats, aForceFit)) {
   [snip]
   832      mY += mAvailSpaceRect.height;
   833      GetAvailableSpace(mY, aForceFit);
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockReflowState.cpp#l821

Here, mY starts out as 0, but mAvailSpaceRect.height is nscoord_MAX.  So we end up with an mY = nscoord_MAX.

This is where new code's behavior branches off from the old code's behavior, though. The above-quoted call to GetAvailableSpace() will end up resetting mBand.mLeftFloats (which is replaced with "mBandHasFloats" in new code).  In old code, we set mLeftFloats to 0, ***but in new code, we set mBandHasFloats to PR_TRUE***.  This difference affects our behavior in the next "CanPlaceFloat" check in the "while" condition -- that function returns true in old code (terminating the loop), but returns false in new code (making us continue looping).  Also -- if I use GDB to force this particular CanPlaceFloat call to change its behavior & return false in old code, we don't crash.

So -- why do we set mBand.mLeftFloats = 0 in old code?  In nsBlockBandData::ComputeAvailSpaceRect, it looks like we increment a temporary variable "leftFloats" for each entry in a frame list.  However, in this particular case, that frame list (trapezoid->mFrames == mTrapezoids[0]->mFrames) is a null pointer, so we never get a chance to increment "leftFloats".  So mLeftFloats end up as 0.
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockBandData.cpp#l186

In new code, on the other hand, mBandHasFloats is set in nsFloatManager::GetBand -- it's passed in by reference as aHasFloats.  In that function, we do bounds-overlapping-comparisons instead of iterating across a frame list. We end up turning on the temp variable "haveFloats" here:
   202      // This float is in our band.
   203      haveFloats = PR_TRUE;
http://hg.mozilla.org/mozilla-central/annotate/496e0cb5c943/layout/generic/nsFloatManager.cpp#l203

So, now I'm wondering why mTrapezoids[0]->mFrames was null in old code.  I'll look into that after lunch.
Comment 18 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2009-02-11 12:30:53 PST
(In reply to comment #16)
> To try to answer dbaron's question in related bug 467493 comment 8 ("If this
> was fixed by bug 191448, it would be good to know why"), I've spent some time
> looking into specifically which of the nsFloatManager changes made us start
> working on this this testcase in mozilla-central -- details to follow.

I'm not particularly interested in that question for *this* bug, only for *that* bug.  In this bug, I was intentionally careful about integer overflows when writing nsFloatManager, and this is a bug on a crash *inside* the space manager.  The other bug is a crash in code outside of it.
Comment 19 Daniel Holbert [:dholbert] 2009-02-11 13:14:12 PST
ah, ok. :)  I won't look into this here much more, then.

FWIW, one last thing here -- the new code actually ends up with an integer overflow on testcase 3, in a place where the old code didn't.  Since CanPlaceFloat returns false now (in the situation from comment 18), we hit that "mY += mAvailSpaceRect.height;" line a second time, and so now we end up with mY = nscoord_MAX + nscoord_MAX = -2147483648 = -2^31.  This ends up triggering the NS_WARNING("bad value") at nsFloatManager.cpp:149, though it doesn't seem to cause anything else bad.  Perhaps that "mY += mAvailSpaceRect.height" line should use NSCoordSaturatingAdd, to be safe(r)...?
Comment 20 Daniel Holbert [:dholbert] 2009-02-13 12:07:52 PST
Excellent -- it looks like dbaron fixed this on 1.9.1 in bug 191448 comment 32.  Adding fixed1.9.1 keyword.
Comment 21 Daniel Holbert [:dholbert] 2009-02-13 12:09:43 PST
(In reply to comment #20)
>  Adding fixed1.9.1 keyword.
... and resolving as fixed by bug 191448.
Comment 22 Tony Chung [:tchung] 2009-02-26 12:46:23 PST
Verified fix on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090226 Shiretoko/3.1b3pre Ubiquity/0.1.5 
and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090226 Minefield/3.2a1pre
Comment 23 Daniel Veditz [:dveditz] 2009-03-09 11:36:03 PDT
Although we'd like a 1.9.0 fix, bug 191448 seems out of scope. Is a smaller branch band-aide possible? Is bug 191448 less risky than it looks?
Comment 24 Daniel Veditz [:dveditz] 2009-07-10 11:16:02 PDT
we're not going to take bug 191448 on 1.9.0.

Is a smaller branch band-aide possible?
Comment 25 Daniel Holbert [:dholbert] 2009-08-04 16:51:05 PDT
Created attachment 392622 [details] [diff] [review]
band-aid v1

Here's one band-aid for this.  It jumps in at the spot where we're about to delete the BandRect that we later end up crashing on, and it clears the reference that we shouldn't be retaining to it.

This patch fixes the crash (and triggers the added warning) on all three testcases here.
Comment 26 Daniel Holbert [:dholbert] 2009-08-04 16:56:57 PDT
Comment on attachment 392622 [details] [diff] [review]
band-aid v1

dbaron -- what do you think of this as a band-aid fix?

Pros:
 - Fairly targeted change.
 - Only changes behavior when we're already in a clearly-bad state, AFAICT (i.e. it won't affect behavior of normal web sites or expose us to new badness)
 - The change to behavior is minimal -- just clears a cached pointer. (moreover, it's a pointer to stuff that's about to be deleted -- so, we do *not* want to be using it anymore anyway)
Comment 27 Daniel Holbert [:dholbert] 2009-08-04 17:36:20 PDT
(In reply to comment #17)
> Ok -- in old code, we crash somewhere within this call, in
> nsBlockReflowState::FlowAndPlaceFloat:
>   947    mSpaceManager->AddRectRegion(floatFrame, region);

BTW -- I'm pretty sure that in comment 17, I misspoke -- I think I meant to say "we first *assert* somewhere within this call" (not that we *crash* within that call).

I say this because, when testing a pre-FloatManager build (from changeset b087a09f82aa) as well as a fresh CVS build, I'm noticing that I do hit the assertion within that "FlowAndPlaceFloat" function call, but the crash doesn't happen until later.

Anyway -- the lowest chunk of the backtrace, when we actually crash, is:
> #0  0xb5b7487c in nsSpaceManager::GetPrevBand at /mozilla/layout/generic/nsSpaceManager.cpp:434
> #1  0xb5b749b1 in nsSpaceManager::GuessBandWithTopAbove at /mozilla/layout/generic/nsSpaceManager.cpp:1335
> #2  0xb5b7522b in nsSpaceManager::GetBandData (this=0xb2fd79c0, aYOffset=0, aMaxSize=@0xbfde7e2c, aBandData=@0xbfde8700) at /mozilla/layout/generic/nsSpaceManager.cpp:372
> #3  0xb5acc671 in nsBlockBandData::GetBandData (this=0xbfde8700, aY=0, aRelaxHeightConstraint=0) at /mozilla/layout/generic/nsBlockBandData.cpp:117
> #4  0xb5acc89f in nsBlockBandData::GetAvailableSpace (this=0xbfde8700, aY=0, aRelaxHeightConstraint=0, aResult=@0xbfde86c8) at /mozilla/layout/generic/nsBlockBandData.cpp:87
> #5  0xb5ae4885 in nsBlockReflowState::GetAvailableSpace (this=0xbfde866c, aY=0, aRelaxHeightConstraint=0) at /mozilla/layout/generic/nsBlockReflowState.cpp:388
> #6  0xb5ae2af5 in nsBlockReflowState::GetAvailableSpace (this=0xbfde866c) at /mozilla/layout/generic/nsBlockReflowState.h:87
> ...

The crashing line is the first line of GetPrevBand (434 in debug builds, and presumably line 438 in optimized builds):
431 nsSpaceManager::BandRect*
432 nsSpaceManager::GetPrevBand(const BandRect* aBandRect) const
433 {
434   NS_ASSERTION(aBandRect->Prev() == &mBandList ||
...
438   BandRect* prev = aBandRect->Prev();

We crash because |aBandRect| is a pointer to a deleted object.

Going up one level to "GuessBandWithTopAbove", we see that |aBandRect| taken from |mCachedBandPosition|.

The band-aid patch prevents the crash by clearing mCachedBandPosition when the structure it points to is deleted.
Comment 28 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2009-08-09 08:04:21 PDT
Comment on attachment 392622 [details] [diff] [review]
band-aid v1

>+            } else if (mCachedBandPosition == rect) {
>+              // Clear cached reference to this rect before we delete it.
>+              NS_WARNING("Uh oh -- we're about to delete a BandRect "
>+                         "while retaining a cached pointer to it. "
>+                         "Clearing cached pointer to prevent badness...");
>+              SetCachedBandPosition(nsnull);
>             }

I don't think you need the NS_WARNING; the comment and the code should be enough.

I also think you can get away with setting the cache to something useful; in particular, we know that |band| is still valid, so you should be able to call SetCachedBandPosition(band);



However, I think you actually need to make the same change in a second place:  where we delete |prevRect| just below; we can have a pretty-much-identical else to that if (prevRect == band).


With those changes, this looks good to me.
Comment 29 David Baron :dbaron: ⌚️UTC+8 (review requests must explain patch) 2009-08-09 08:08:03 PDT
Given the fix, it looks like the underlying bug is a regression from bug 270392.  (If we remember, we can mark the dependency after the patch lands.)

This means that 1.9.0.* is the only branch on which it is currently present; it hadn't been introduced by 1.8.0* or 1.8.1.*, and the code was removed on 1.9.1.* and later.
Comment 30 Daniel Holbert [:dholbert] 2009-08-10 15:52:58 PDT
Created attachment 393645 [details] [diff] [review]
band-aid v1b

This version addresses review comments from comment 28, in a slightly roundabout way.

dbaron suggested calling SetCachedBandPosition(band).  There's actually a call like that in the code already, at the end of the clause immediately before my previous patch's added "else if".

So instead of changing the contents of my "else if" block, this updated patch just moves the already-existing |SetCachedBandPosition(band)| line outside of the block that previously encompassed it. (This has the same effect as applying dbaron's suggestion to my previous patch.)

I'm doing the same thing below for "prevRect", per dbaron's suggestion.

Cleared the above with dbaron over IRC -- carrying forward r+, and requesting approval.
Comment 31 Daniel Holbert [:dholbert] 2009-08-10 15:59:41 PDT
I've verified that the updated band-aid (v1b) still fixes the crash on all posted testcases.  (FWIW, neither of the band-aids fix the assertion mentioned in comment 0, though -- they only fix the crash.)
Comment 32 Daniel Holbert [:dholbert] 2009-08-10 16:10:47 PDT
(In reply to comment #29)
> Given the fix, it looks like the underlying bug is a regression from bug
> 270392.  (If we remember, we can mark the dependency after the patch lands.)

Nope -- the crash here doesn't seem to be a regression from that bug's patch.  I just tested the nightly from the day after it landed...
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a2pre) Gecko/20070124 Minefield/3.0a2pre
...and it doesn't crash on any of this bug's testcases.

> This means that 1.9.0.* is the only branch on which it is currently present

This part should still be true, though.  Even if this crash wasn't strictly a regression from bug 270392's patch, it definitely depends on a cached pointer introduced in that patch.
Comment 33 Samuel Sidler (old account; do not CC) 2009-08-10 16:44:52 PDT
Comment on attachment 393645 [details] [diff] [review]
band-aid v1b

Approved for 1.9.0.14. a=ss for release-drivers
Comment 34 Daniel Holbert [:dholbert] 2009-08-10 17:05:57 PDT
Band-aid v1b committed to CVS:
Checking in layout/generic/nsSpaceManager.cpp;
/cvsroot/mozilla/layout/generic/nsSpaceManager.cpp,v  <--  nsSpaceManager.cpp
new revision: 3.92; previous revision: 3.91
done

  --> fixed1.9.0.14
Comment 35 Al Billings [:abillings] 2009-08-18 15:46:40 PDT
Verified fixed in my debug build of 1.9.0.14 synched from 8/12. The testcases crash in my 1.9.0.13 debug build.
Comment 36 Jesse Ruderman 2009-10-15 13:28:23 PDT
Crashtest: http://hg.mozilla.org/mozilla-central/rev/4da43cad0331

Note You need to log in before you can comment on or make changes to this bug.