Closed Bug 437565 Opened 16 years ago Closed 15 years ago

Crash [@ nsSpaceManager::GetPrevBand] with floats after nscoord_MAX of height

Categories

(Core :: Layout: Floats, defect)

1.9.1 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Tracking Status
status1.9.1 --- unaffected

People

(Reporter: jruderman, Assigned: dholbert)

References

Details

(5 keywords, Whiteboard: [sg:critical?][fixed by 191448 (floatmgr re-write)])

Crash Data

Attachments

(5 files, 1 obsolete file)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1a1pre) Gecko/2008053113 Minefield/3.1a1pre

Steps to reproduce:
1. Load the testcase in a debug build of Firefox.

Result:

###!!! ASSERTION: aBandRect should be first rect within its band: '!aBandRect || aBandRect == mBandList.Head() || aBandRect->Prev()->mBottom != aBandRect->mBottom', file /Users/jruderman/central/layout/generic/nsSpaceManager.h, line 475

Crash dereferencing 0xc0000017 (note the first digit -- this is not a null deref!)

The assertion also occurs in bug 410232, but that bug has a different testcase, no crash, and is public.
Whiteboard: [sg:critical?]
Crashes my Linux debug build, too. (updated from mozilla-central yesterday)
OS: Mac OS X → All
Hardware: PC → All
I'm attaching the backtrace of my Linux crash.  It's crashing on this line:
http://mxr.mozilla.org/seamonkey/source/layout/generic/nsSpaceManager.cpp#435
  434  NS_ASSERTION(aBandRect->Prev() == &mBandList ||
  435               aBandRect->Prev()->mBottom <= aBandRect->mTop,
  436               "aBandRect should be first rect within its band");
The crash happens because we look up  aBandRect->Prev()->mBottom, and  aBandRect->Prev() is either 0x1 or 0x5a5a5a5a.  

I'm not actually sure if it's 0x5a5a5a5a or 0x1 -- if I break at the crash in GDB and I do "p aBandRect->prev", then it displays 0x5a5a5a5a.  But then if I do "p aBandRect->Prev()", I get 0x1.  And that call to Prev() seems to have a lasting effect on 'prev', because from that point on, "p aBandRect->prev" displays 0x1 instead of 0x5a5a5a5a. Weird.
(In reply to comment #0)
> ###!!! ASSERTION: aBandRect should be first rect within its band [...]

FWIW, I get two copies of this assertion before the crash.  Just before those assertions, I get these messages:

nsLineLayout: SVGOuterSVG(svg)(0)@0xb0d0797c metrics=53280,1073741824!
nsBlockReflowContext: Block(span)(0)@0xb0d07b0c metrics=53280,1073742064!
nsBlockReflowContext: Area(div)(7)@0xb0d07720 metrics=53280,1073744344!
Flags: blocking1.9.1?
Flags: blocking1.9.1? → wanted1.9.1+
Here's a somewhat simplified testcase.

FWIW, the only function of the <svg style="display: table-row;"> element in the original testcase was to insert a huge height. (see bug 410233 -- that particular style on <svg> makes it about nscoord_MAX tall)

So, for simplicity, this testcase has replaced that <svg> with a <div> whose height is close to nscoord_MAX app-units.
Summary: Crash [@ nsSpaceManager::GetPrevBand] with floats, svg → Crash [@ nsSpaceManager::GetPrevBand] with floats after nscoord_MAX of height
No crash now that nsSpaceManager has been replaced by nsFloatManager.

The first testcase triggers the "bad value" assertion in bug 472218, which is about to be turned into a warning.  But the second testcase also triggers the assertion in bug 439204, "We placed a float where there was no room".
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Need a crashtest.
Flags: in-testsuite?
Still crashes 1.9.0.6pre, we need to figure out what fixed this.
Flags: wanted1.9.0.x+
Whiteboard: [sg:critical?] → [sg:critical?] need-fix-window
In comment 6 Jesse might have been implying it was bug 191448.
bug 191448 hasn't landed on 1.9.1 and doesn't appear likely to based on lack of flags and the size of the patch, but this is a blocking1.9.1+ security bug (and Shiretoko still dies horribly).

Is there an alternate small band-aid fix for the 1.9.0/1.9.1 branches? I'm reopening so we don't lose track of this one.
Status: RESOLVED → REOPENED
Depends on: 191448
Flags: blocking1.9.0.7?
Resolution: WORKSFORME → ---
Whiteboard: [sg:critical?] need-fix-window → [sg:critical?] maybe fixed by 191448 on 1.9.2
Version: Trunk → 1.9.1 Branch
clearing 1.9.0.7 flag for now, we'll wait for a 1.9.1 patch first.
Status: REOPENED → NEW
Flags: blocking1.9.0.7?
How hard might this be to get this to the 1.9.1 branch?
(e.g. by figuring out what in the float manager refactoring fixed this, or even porting that refactoring to 1.9.1, given that it caused so few regressions)
Whiteboard: [sg:critical?] maybe fixed by 191448 on 1.9.2 → [sg:critical?][maybe fixed by 191448 on 1.9.2]
FWIW: In addition to fixing this bug here, bug 191448 seems to have fixed bug 467493, which is [sg:crititcal?] and blocking1.9.1+.

dbaron, is there any chance that bug 191448's patch would be appropriate for backporting to 1.9.1? (and to 1.9.0.x?)
Like Bug 467493, this bug's testcase crashes in changeset b087a09f82aa, but is fixed in changeset 496e0cb5c943.  (with the intermediate changeset b19f0a7a3c4c not compiling, as mentioned in Bug 467493 comment 10).

So, this was definitely fixed by bug 191448.
Whiteboard: [sg:critical?][maybe fixed by 191448 on 1.9.2] → [sg:critical?][fixed by 191448 on 1.9.2]
Here's a modified version of testcase 2.  It's sized such that it loads fine in the layout debugger, but reducing the width by one 'click' (via alt-F8, rightarrow, leftarrow) is enough to trigger the crash.

To try to answer dbaron's question in related bug 467493 comment 8 ("If this was fixed by bug 191448, it would be good to know why"), I've spent some time looking into specifically which of the nsFloatManager changes made us start working on this this testcase in mozilla-central -- details to follow.
Attachment #361806 - Attachment description: testcase 3 (crashes 1.9.0 & 1.9.1 builds when loaded in smallish window) → testcase 3 (crashes 1.9.0 & 1.9.1 builds when viewed in skinny viewport)
Attachment #361806 - Attachment description: testcase 3 (crashes 1.9.0 & 1.9.1 builds when viewed in skinny viewport) → testcase 3 (crashes 1.9.0 & 1.9.1 debug builds with skinny viewports)
(In reply to comment #16)
> To try to answer dbaron's question in related bug 467493 comment 8 ("If this
> was fixed by bug 191448, it would be good to know why"), I've spent some time
> looking into specifically which of the nsFloatManager changes made us start
> working on this this testcase in mozilla-central -- details to follow.

(NOTE: When I compare "old code" vs "new code" below, I'm talking about code before bug 191448 landed vs. code after it landed.)

Ok -- in old code, we crash somewhere within this call, in nsBlockReflowState::FlowAndPlaceFloat:
  947    mSpaceManager->AddRectRegion(floatFrame, region);
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockReflowState.cpp#l947

At that point, region.y is nscoord_MAX (because floatY is nscoord_MAX, because mY is nscoord_MAX).

If I trace execution back a bit to where mY became nscoord_MAX, I arrive here:
   821  while (!CanPlaceFloat(floatSize, floatDisplay->mFloats, aForceFit)) {
   [snip]
   832      mY += mAvailSpaceRect.height;
   833      GetAvailableSpace(mY, aForceFit);
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockReflowState.cpp#l821

Here, mY starts out as 0, but mAvailSpaceRect.height is nscoord_MAX.  So we end up with an mY = nscoord_MAX.

This is where new code's behavior branches off from the old code's behavior, though. The above-quoted call to GetAvailableSpace() will end up resetting mBand.mLeftFloats (which is replaced with "mBandHasFloats" in new code).  In old code, we set mLeftFloats to 0, ***but in new code, we set mBandHasFloats to PR_TRUE***.  This difference affects our behavior in the next "CanPlaceFloat" check in the "while" condition -- that function returns true in old code (terminating the loop), but returns false in new code (making us continue looping).  Also -- if I use GDB to force this particular CanPlaceFloat call to change its behavior & return false in old code, we don't crash.

So -- why do we set mBand.mLeftFloats = 0 in old code?  In nsBlockBandData::ComputeAvailSpaceRect, it looks like we increment a temporary variable "leftFloats" for each entry in a frame list.  However, in this particular case, that frame list (trapezoid->mFrames == mTrapezoids[0]->mFrames) is a null pointer, so we never get a chance to increment "leftFloats".  So mLeftFloats end up as 0.
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockBandData.cpp#l186

In new code, on the other hand, mBandHasFloats is set in nsFloatManager::GetBand -- it's passed in by reference as aHasFloats.  In that function, we do bounds-overlapping-comparisons instead of iterating across a frame list. We end up turning on the temp variable "haveFloats" here:
   202      // This float is in our band.
   203      haveFloats = PR_TRUE;
http://hg.mozilla.org/mozilla-central/annotate/496e0cb5c943/layout/generic/nsFloatManager.cpp#l203

So, now I'm wondering why mTrapezoids[0]->mFrames was null in old code.  I'll look into that after lunch.
(In reply to comment #16)
> To try to answer dbaron's question in related bug 467493 comment 8 ("If this
> was fixed by bug 191448, it would be good to know why"), I've spent some time
> looking into specifically which of the nsFloatManager changes made us start
> working on this this testcase in mozilla-central -- details to follow.

I'm not particularly interested in that question for *this* bug, only for *that* bug.  In this bug, I was intentionally careful about integer overflows when writing nsFloatManager, and this is a bug on a crash *inside* the space manager.  The other bug is a crash in code outside of it.
ah, ok. :)  I won't look into this here much more, then.

FWIW, one last thing here -- the new code actually ends up with an integer overflow on testcase 3, in a place where the old code didn't.  Since CanPlaceFloat returns false now (in the situation from comment 18), we hit that "mY += mAvailSpaceRect.height;" line a second time, and so now we end up with mY = nscoord_MAX + nscoord_MAX = -2147483648 = -2^31.  This ends up triggering the NS_WARNING("bad value") at nsFloatManager.cpp:149, though it doesn't seem to cause anything else bad.  Perhaps that "mY += mAvailSpaceRect.height" line should use NSCoordSaturatingAdd, to be safe(r)...?
Excellent -- it looks like dbaron fixed this on 1.9.1 in bug 191448 comment 32.  Adding fixed1.9.1 keyword.
Keywords: fixed1.9.1
Whiteboard: [sg:critical?][fixed by 191448 on 1.9.2] → [sg:critical?]
(In reply to comment #20)
>  Adding fixed1.9.1 keyword.
... and resolving as fixed by bug 191448.
Status: NEW → RESOLVED
Closed: 16 years ago15 years ago
Resolution: --- → FIXED
Verified fix on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090226 Shiretoko/3.1b3pre Ubiquity/0.1.5 
and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090226 Minefield/3.2a1pre
Status: RESOLVED → VERIFIED
Flags: blocking1.9.0.8?
Although we'd like a 1.9.0 fix, bug 191448 seems out of scope. Is a smaller branch band-aide possible? Is bug 191448 less risky than it looks?
Flags: blocking1.9.0.8? → blocking1.9.0.8-
Whiteboard: [sg:critical?] → [sg:critical?] fixed by 191448 (floatmgr re-write)
Flags: blocking1.9.0.13?
Whiteboard: [sg:critical?] fixed by 191448 (floatmgr re-write) → [needs answer to comment 23 from layout devs][sg:critical?] fixed by 191448 (floatmgr re-write)
we're not going to take bug 191448 on 1.9.0.

Is a smaller branch band-aide possible?
Whiteboard: [needs answer to comment 23 from layout devs][sg:critical?] fixed by 191448 (floatmgr re-write) → [needs answer to comment 24 from dholbert][sg:critical?] fixed by 191448 (floatmgr re-write)
Attached patch band-aid v1 (obsolete) — Splinter Review
Here's one band-aid for this.  It jumps in at the spot where we're about to delete the BandRect that we later end up crashing on, and it clears the reference that we shouldn't be retaining to it.

This patch fixes the crash (and triggers the added warning) on all three testcases here.
Comment on attachment 392622 [details] [diff] [review]
band-aid v1

dbaron -- what do you think of this as a band-aid fix?

Pros:
 - Fairly targeted change.
 - Only changes behavior when we're already in a clearly-bad state, AFAICT (i.e. it won't affect behavior of normal web sites or expose us to new badness)
 - The change to behavior is minimal -- just clears a cached pointer. (moreover, it's a pointer to stuff that's about to be deleted -- so, we do *not* want to be using it anymore anyway)
Attachment #392622 - Flags: review?(dbaron)
(In reply to comment #17)
> Ok -- in old code, we crash somewhere within this call, in
> nsBlockReflowState::FlowAndPlaceFloat:
>   947    mSpaceManager->AddRectRegion(floatFrame, region);

BTW -- I'm pretty sure that in comment 17, I misspoke -- I think I meant to say "we first *assert* somewhere within this call" (not that we *crash* within that call).

I say this because, when testing a pre-FloatManager build (from changeset b087a09f82aa) as well as a fresh CVS build, I'm noticing that I do hit the assertion within that "FlowAndPlaceFloat" function call, but the crash doesn't happen until later.

Anyway -- the lowest chunk of the backtrace, when we actually crash, is:
> #0  0xb5b7487c in nsSpaceManager::GetPrevBand at /mozilla/layout/generic/nsSpaceManager.cpp:434
> #1  0xb5b749b1 in nsSpaceManager::GuessBandWithTopAbove at /mozilla/layout/generic/nsSpaceManager.cpp:1335
> #2  0xb5b7522b in nsSpaceManager::GetBandData (this=0xb2fd79c0, aYOffset=0, aMaxSize=@0xbfde7e2c, aBandData=@0xbfde8700) at /mozilla/layout/generic/nsSpaceManager.cpp:372
> #3  0xb5acc671 in nsBlockBandData::GetBandData (this=0xbfde8700, aY=0, aRelaxHeightConstraint=0) at /mozilla/layout/generic/nsBlockBandData.cpp:117
> #4  0xb5acc89f in nsBlockBandData::GetAvailableSpace (this=0xbfde8700, aY=0, aRelaxHeightConstraint=0, aResult=@0xbfde86c8) at /mozilla/layout/generic/nsBlockBandData.cpp:87
> #5  0xb5ae4885 in nsBlockReflowState::GetAvailableSpace (this=0xbfde866c, aY=0, aRelaxHeightConstraint=0) at /mozilla/layout/generic/nsBlockReflowState.cpp:388
> #6  0xb5ae2af5 in nsBlockReflowState::GetAvailableSpace (this=0xbfde866c) at /mozilla/layout/generic/nsBlockReflowState.h:87
> ...

The crashing line is the first line of GetPrevBand (434 in debug builds, and presumably line 438 in optimized builds):
431 nsSpaceManager::BandRect*
432 nsSpaceManager::GetPrevBand(const BandRect* aBandRect) const
433 {
434   NS_ASSERTION(aBandRect->Prev() == &mBandList ||
...
438   BandRect* prev = aBandRect->Prev();

We crash because |aBandRect| is a pointer to a deleted object.

Going up one level to "GuessBandWithTopAbove", we see that |aBandRect| taken from |mCachedBandPosition|.

The band-aid patch prevents the crash by clearing mCachedBandPosition when the structure it points to is deleted.
Whiteboard: [needs answer to comment 24 from dholbert][sg:critical?] fixed by 191448 (floatmgr re-write) → [sg:critical?] fixed by 191448 (floatmgr re-write)
Flags: blocking1.9.0.14? → blocking1.9.0.14+
Assignee: nobody → dholbert
Whiteboard: [sg:critical?] fixed by 191448 (floatmgr re-write) → [sg:critical?][needs r=dbaron] fixed by 191448 (floatmgr re-write)
Comment on attachment 392622 [details] [diff] [review]
band-aid v1

>+            } else if (mCachedBandPosition == rect) {
>+              // Clear cached reference to this rect before we delete it.
>+              NS_WARNING("Uh oh -- we're about to delete a BandRect "
>+                         "while retaining a cached pointer to it. "
>+                         "Clearing cached pointer to prevent badness...");
>+              SetCachedBandPosition(nsnull);
>             }

I don't think you need the NS_WARNING; the comment and the code should be enough.

I also think you can get away with setting the cache to something useful; in particular, we know that |band| is still valid, so you should be able to call SetCachedBandPosition(band);



However, I think you actually need to make the same change in a second place:  where we delete |prevRect| just below; we can have a pretty-much-identical else to that if (prevRect == band).


With those changes, this looks good to me.
Given the fix, it looks like the underlying bug is a regression from bug 270392.  (If we remember, we can mark the dependency after the patch lands.)

This means that 1.9.0.* is the only branch on which it is currently present; it hadn't been introduced by 1.8.0* or 1.8.1.*, and the code was removed on 1.9.1.* and later.
Whiteboard: [sg:critical?][needs r=dbaron] fixed by 191448 (floatmgr re-write) → [sg:critical?][needs addressing review comments in comment 28] fixed by 191448 (floatmgr re-write)
Attached patch band-aid v1bSplinter Review
This version addresses review comments from comment 28, in a slightly roundabout way.

dbaron suggested calling SetCachedBandPosition(band).  There's actually a call like that in the code already, at the end of the clause immediately before my previous patch's added "else if".

So instead of changing the contents of my "else if" block, this updated patch just moves the already-existing |SetCachedBandPosition(band)| line outside of the block that previously encompassed it. (This has the same effect as applying dbaron's suggestion to my previous patch.)

I'm doing the same thing below for "prevRect", per dbaron's suggestion.

Cleared the above with dbaron over IRC -- carrying forward r+, and requesting approval.
Attachment #392622 - Attachment is obsolete: true
Attachment #393645 - Flags: review+
Attachment #393645 - Flags: approval1.9.0.14?
I've verified that the updated band-aid (v1b) still fixes the crash on all posted testcases.  (FWIW, neither of the band-aids fix the assertion mentioned in comment 0, though -- they only fix the crash.)
Whiteboard: [sg:critical?][needs addressing review comments in comment 28] fixed by 191448 (floatmgr re-write) → [sg:critical?][needs approval][probably regression from 270392][ fixed by 191448 (floatmgr re-write)
(In reply to comment #29)
> Given the fix, it looks like the underlying bug is a regression from bug
> 270392.  (If we remember, we can mark the dependency after the patch lands.)

Nope -- the crash here doesn't seem to be a regression from that bug's patch.  I just tested the nightly from the day after it landed...
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a2pre) Gecko/20070124 Minefield/3.0a2pre
...and it doesn't crash on any of this bug's testcases.

> This means that 1.9.0.* is the only branch on which it is currently present

This part should still be true, though.  Even if this crash wasn't strictly a regression from bug 270392's patch, it definitely depends on a cached pointer introduced in that patch.
Whiteboard: [sg:critical?][needs approval][probably regression from 270392][ fixed by 191448 (floatmgr re-write) → [sg:critical?][needs approval][ fixed by 191448 (floatmgr re-write)
Comment on attachment 393645 [details] [diff] [review]
band-aid v1b

Approved for 1.9.0.14. a=ss for release-drivers
Attachment #393645 - Flags: approval1.9.0.14? → approval1.9.0.14+
Whiteboard: [sg:critical?][needs approval][ fixed by 191448 (floatmgr re-write) → [sg:critical?][needs approval][fixed by 191448 (floatmgr re-write)]
Band-aid v1b committed to CVS:
Checking in layout/generic/nsSpaceManager.cpp;
/cvsroot/mozilla/layout/generic/nsSpaceManager.cpp,v  <--  nsSpaceManager.cpp
new revision: 3.92; previous revision: 3.91
done

  --> fixed1.9.0.14
Keywords: fixed1.9.0.14
Whiteboard: [sg:critical?][needs approval][fixed by 191448 (floatmgr re-write)] → [sg:critical?][fixed by 191448 (floatmgr re-write)]
Verified fixed in my debug build of 1.9.0.14 synched from 8/12. The testcases crash in my 1.9.0.13 debug build.
Group: core-security
Crashtest: http://hg.mozilla.org/mozilla-central/rev/4da43cad0331
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ nsSpaceManager::GetPrevBand]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: