Crash [@ nsSpaceManager::GetPrevBand] with floats after nscoord_MAX of height

VERIFIED FIXED

Status

()

Core
Layout: Floats
--
critical
VERIFIED FIXED
9 years ago
6 years ago

People

(Reporter: Jesse Ruderman, Assigned: dholbert)

Tracking

(Blocks: 2 bugs, 5 keywords)

1.9.1 Branch
assertion, crash, testcase, verified1.9.0.14, verified1.9.1
Points:
---
Dependency tree / graph
Bug Flags:
wanted1.9.1 +
blocking1.9.0.9 -
blocking1.9.0.14 +
wanted1.9.0.x +
in-testsuite +

Firefox Tracking Flags

(status1.9.1 unaffected)

Details

(Whiteboard: [sg:critical?][fixed by 191448 (floatmgr re-write)], crash signature)

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

9 years ago
Created attachment 323993 [details]
testcase (crashes Firefox when loaded)

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1a1pre) Gecko/2008053113 Minefield/3.1a1pre

Steps to reproduce:
1. Load the testcase in a debug build of Firefox.

Result:

###!!! ASSERTION: aBandRect should be first rect within its band: '!aBandRect || aBandRect == mBandList.Head() || aBandRect->Prev()->mBottom != aBandRect->mBottom', file /Users/jruderman/central/layout/generic/nsSpaceManager.h, line 475

Crash dereferencing 0xc0000017 (note the first digit -- this is not a null deref!)

The assertion also occurs in bug 410232, but that bug has a different testcase, no crash, and is public.
(Reporter)

Updated

9 years ago
Duplicate of this bug: 437566
(Reporter)

Updated

9 years ago
Whiteboard: [sg:critical?]
(Assignee)

Comment 2

9 years ago
Crashes my Linux debug build, too. (updated from mozilla-central yesterday)
OS: Mac OS X → All
Hardware: PC → All
(Assignee)

Comment 3

9 years ago
Created attachment 333851 [details]
backtrace of Linux crash

I'm attaching the backtrace of my Linux crash.  It's crashing on this line:
http://mxr.mozilla.org/seamonkey/source/layout/generic/nsSpaceManager.cpp#435
  434  NS_ASSERTION(aBandRect->Prev() == &mBandList ||
  435               aBandRect->Prev()->mBottom <= aBandRect->mTop,
  436               "aBandRect should be first rect within its band");
The crash happens because we look up  aBandRect->Prev()->mBottom, and  aBandRect->Prev() is either 0x1 or 0x5a5a5a5a.  

I'm not actually sure if it's 0x5a5a5a5a or 0x1 -- if I break at the crash in GDB and I do "p aBandRect->prev", then it displays 0x5a5a5a5a.  But then if I do "p aBandRect->Prev()", I get 0x1.  And that call to Prev() seems to have a lasting effect on 'prev', because from that point on, "p aBandRect->prev" displays 0x1 instead of 0x5a5a5a5a. Weird.
(Assignee)

Comment 4

9 years ago
(In reply to comment #0)
> ###!!! ASSERTION: aBandRect should be first rect within its band [...]

FWIW, I get two copies of this assertion before the crash.  Just before those assertions, I get these messages:

nsLineLayout: SVGOuterSVG(svg)(0)@0xb0d0797c metrics=53280,1073741824!
nsBlockReflowContext: Block(span)(0)@0xb0d07b0c metrics=53280,1073742064!
nsBlockReflowContext: Area(div)(7)@0xb0d07720 metrics=53280,1073744344!
(Reporter)

Updated

9 years ago
Flags: blocking1.9.1?
Flags: blocking1.9.1? → wanted1.9.1+
(Assignee)

Comment 5

9 years ago
Created attachment 341661 [details]
testcase 2 (crashes firefox when loaded)

Here's a somewhat simplified testcase.

FWIW, the only function of the <svg style="display: table-row;"> element in the original testcase was to insert a huge height. (see bug 410233 -- that particular style on <svg> makes it about nscoord_MAX tall)

So, for simplicity, this testcase has replaced that <svg> with a <div> whose height is close to nscoord_MAX app-units.
Summary: Crash [@ nsSpaceManager::GetPrevBand] with floats, svg → Crash [@ nsSpaceManager::GetPrevBand] with floats after nscoord_MAX of height
(Reporter)

Comment 6

9 years ago
No crash now that nsSpaceManager has been replaced by nsFloatManager.

The first testcase triggers the "bad value" assertion in bug 472218, which is about to be turned into a warning.  But the second testcase also triggers the assertion in bug 439204, "We placed a float where there was no room".
(Reporter)

Updated

9 years ago
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WORKSFORME
Need a crashtest.
Flags: in-testsuite?
Still crashes 1.9.0.6pre, we need to figure out what fixed this.
Flags: wanted1.9.0.x+
Whiteboard: [sg:critical?] → [sg:critical?] need-fix-window
In comment 6 Jesse might have been implying it was bug 191448.
bug 191448 hasn't landed on 1.9.1 and doesn't appear likely to based on lack of flags and the size of the patch, but this is a blocking1.9.1+ security bug (and Shiretoko still dies horribly).

Is there an alternate small band-aid fix for the 1.9.0/1.9.1 branches? I'm reopening so we don't lose track of this one.
Status: RESOLVED → REOPENED
Depends on: 191448
Flags: blocking1.9.0.7?
Resolution: WORKSFORME → ---
Whiteboard: [sg:critical?] need-fix-window → [sg:critical?] maybe fixed by 191448 on 1.9.2
Version: Trunk → 1.9.1 Branch
clearing 1.9.0.7 flag for now, we'll wait for a 1.9.1 patch first.
Status: REOPENED → NEW
Flags: blocking1.9.0.7?

Comment 12

9 years ago
How hard might this be to get this to the 1.9.1 branch?
(Reporter)

Comment 13

9 years ago
(e.g. by figuring out what in the float manager refactoring fixed this, or even porting that refactoring to 1.9.1, given that it caused so few regressions)
(Assignee)

Updated

9 years ago
Whiteboard: [sg:critical?] maybe fixed by 191448 on 1.9.2 → [sg:critical?][maybe fixed by 191448 on 1.9.2]
(Assignee)

Comment 14

9 years ago
FWIW: In addition to fixing this bug here, bug 191448 seems to have fixed bug 467493, which is [sg:crititcal?] and blocking1.9.1+.

dbaron, is there any chance that bug 191448's patch would be appropriate for backporting to 1.9.1? (and to 1.9.0.x?)
(Assignee)

Comment 15

9 years ago
Like Bug 467493, this bug's testcase crashes in changeset b087a09f82aa, but is fixed in changeset 496e0cb5c943.  (with the intermediate changeset b19f0a7a3c4c not compiling, as mentioned in Bug 467493 comment 10).

So, this was definitely fixed by bug 191448.
(Assignee)

Updated

9 years ago
Whiteboard: [sg:critical?][maybe fixed by 191448 on 1.9.2] → [sg:critical?][fixed by 191448 on 1.9.2]
(Assignee)

Comment 16

9 years ago
Created attachment 361806 [details]
testcase 3 (crashes 1.9.0 & 1.9.1 debug builds with skinny viewports)

Here's a modified version of testcase 2.  It's sized such that it loads fine in the layout debugger, but reducing the width by one 'click' (via alt-F8, rightarrow, leftarrow) is enough to trigger the crash.

To try to answer dbaron's question in related bug 467493 comment 8 ("If this was fixed by bug 191448, it would be good to know why"), I've spent some time looking into specifically which of the nsFloatManager changes made us start working on this this testcase in mozilla-central -- details to follow.
(Assignee)

Updated

9 years ago
Attachment #361806 - Attachment description: testcase 3 (crashes 1.9.0 & 1.9.1 builds when loaded in smallish window) → testcase 3 (crashes 1.9.0 & 1.9.1 builds when viewed in skinny viewport)
(Assignee)

Updated

9 years ago
Attachment #361806 - Attachment description: testcase 3 (crashes 1.9.0 & 1.9.1 builds when viewed in skinny viewport) → testcase 3 (crashes 1.9.0 & 1.9.1 debug builds with skinny viewports)
(Assignee)

Comment 17

9 years ago
(In reply to comment #16)
> To try to answer dbaron's question in related bug 467493 comment 8 ("If this
> was fixed by bug 191448, it would be good to know why"), I've spent some time
> looking into specifically which of the nsFloatManager changes made us start
> working on this this testcase in mozilla-central -- details to follow.

(NOTE: When I compare "old code" vs "new code" below, I'm talking about code before bug 191448 landed vs. code after it landed.)

Ok -- in old code, we crash somewhere within this call, in nsBlockReflowState::FlowAndPlaceFloat:
  947    mSpaceManager->AddRectRegion(floatFrame, region);
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockReflowState.cpp#l947

At that point, region.y is nscoord_MAX (because floatY is nscoord_MAX, because mY is nscoord_MAX).

If I trace execution back a bit to where mY became nscoord_MAX, I arrive here:
   821  while (!CanPlaceFloat(floatSize, floatDisplay->mFloats, aForceFit)) {
   [snip]
   832      mY += mAvailSpaceRect.height;
   833      GetAvailableSpace(mY, aForceFit);
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockReflowState.cpp#l821

Here, mY starts out as 0, but mAvailSpaceRect.height is nscoord_MAX.  So we end up with an mY = nscoord_MAX.

This is where new code's behavior branches off from the old code's behavior, though. The above-quoted call to GetAvailableSpace() will end up resetting mBand.mLeftFloats (which is replaced with "mBandHasFloats" in new code).  In old code, we set mLeftFloats to 0, ***but in new code, we set mBandHasFloats to PR_TRUE***.  This difference affects our behavior in the next "CanPlaceFloat" check in the "while" condition -- that function returns true in old code (terminating the loop), but returns false in new code (making us continue looping).  Also -- if I use GDB to force this particular CanPlaceFloat call to change its behavior & return false in old code, we don't crash.

So -- why do we set mBand.mLeftFloats = 0 in old code?  In nsBlockBandData::ComputeAvailSpaceRect, it looks like we increment a temporary variable "leftFloats" for each entry in a frame list.  However, in this particular case, that frame list (trapezoid->mFrames == mTrapezoids[0]->mFrames) is a null pointer, so we never get a chance to increment "leftFloats".  So mLeftFloats end up as 0.
http://hg.mozilla.org/mozilla-central/annotate/b087a09f82aa/layout/generic/nsBlockBandData.cpp#l186

In new code, on the other hand, mBandHasFloats is set in nsFloatManager::GetBand -- it's passed in by reference as aHasFloats.  In that function, we do bounds-overlapping-comparisons instead of iterating across a frame list. We end up turning on the temp variable "haveFloats" here:
   202      // This float is in our band.
   203      haveFloats = PR_TRUE;
http://hg.mozilla.org/mozilla-central/annotate/496e0cb5c943/layout/generic/nsFloatManager.cpp#l203

So, now I'm wondering why mTrapezoids[0]->mFrames was null in old code.  I'll look into that after lunch.
(In reply to comment #16)
> To try to answer dbaron's question in related bug 467493 comment 8 ("If this
> was fixed by bug 191448, it would be good to know why"), I've spent some time
> looking into specifically which of the nsFloatManager changes made us start
> working on this this testcase in mozilla-central -- details to follow.

I'm not particularly interested in that question for *this* bug, only for *that* bug.  In this bug, I was intentionally careful about integer overflows when writing nsFloatManager, and this is a bug on a crash *inside* the space manager.  The other bug is a crash in code outside of it.
(Assignee)

Comment 19

9 years ago
ah, ok. :)  I won't look into this here much more, then.

FWIW, one last thing here -- the new code actually ends up with an integer overflow on testcase 3, in a place where the old code didn't.  Since CanPlaceFloat returns false now (in the situation from comment 18), we hit that "mY += mAvailSpaceRect.height;" line a second time, and so now we end up with mY = nscoord_MAX + nscoord_MAX = -2147483648 = -2^31.  This ends up triggering the NS_WARNING("bad value") at nsFloatManager.cpp:149, though it doesn't seem to cause anything else bad.  Perhaps that "mY += mAvailSpaceRect.height" line should use NSCoordSaturatingAdd, to be safe(r)...?
(Assignee)

Comment 20

9 years ago
Excellent -- it looks like dbaron fixed this on 1.9.1 in bug 191448 comment 32.  Adding fixed1.9.1 keyword.
Keywords: fixed1.9.1
Whiteboard: [sg:critical?][fixed by 191448 on 1.9.2] → [sg:critical?]
(Assignee)

Comment 21

9 years ago
(In reply to comment #20)
>  Adding fixed1.9.1 keyword.
... and resolving as fixed by bug 191448.
Status: NEW → RESOLVED
Last Resolved: 9 years ago9 years ago
Resolution: --- → FIXED
Verified fix on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090226 Shiretoko/3.1b3pre Ubiquity/0.1.5 
and Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090226 Minefield/3.2a1pre
Status: RESOLVED → VERIFIED
Keywords: fixed1.9.1 → verified1.9.1
Flags: blocking1.9.0.8?
Although we'd like a 1.9.0 fix, bug 191448 seems out of scope. Is a smaller branch band-aide possible? Is bug 191448 less risky than it looks?
Flags: blocking1.9.0.8? → blocking1.9.0.8-
Whiteboard: [sg:critical?] → [sg:critical?] fixed by 191448 (floatmgr re-write)
Flags: blocking1.9.0.13?
Whiteboard: [sg:critical?] fixed by 191448 (floatmgr re-write) → [needs answer to comment 23 from layout devs][sg:critical?] fixed by 191448 (floatmgr re-write)
we're not going to take bug 191448 on 1.9.0.

Is a smaller branch band-aide possible?
Whiteboard: [needs answer to comment 23 from layout devs][sg:critical?] fixed by 191448 (floatmgr re-write) → [needs answer to comment 24 from dholbert][sg:critical?] fixed by 191448 (floatmgr re-write)
(Assignee)

Comment 25

8 years ago
Created attachment 392622 [details] [diff] [review]
band-aid v1

Here's one band-aid for this.  It jumps in at the spot where we're about to delete the BandRect that we later end up crashing on, and it clears the reference that we shouldn't be retaining to it.

This patch fixes the crash (and triggers the added warning) on all three testcases here.
(Assignee)

Comment 26

8 years ago
Comment on attachment 392622 [details] [diff] [review]
band-aid v1

dbaron -- what do you think of this as a band-aid fix?

Pros:
 - Fairly targeted change.
 - Only changes behavior when we're already in a clearly-bad state, AFAICT (i.e. it won't affect behavior of normal web sites or expose us to new badness)
 - The change to behavior is minimal -- just clears a cached pointer. (moreover, it's a pointer to stuff that's about to be deleted -- so, we do *not* want to be using it anymore anyway)
Attachment #392622 - Flags: review?(dbaron)
(Assignee)

Comment 27

8 years ago
(In reply to comment #17)
> Ok -- in old code, we crash somewhere within this call, in
> nsBlockReflowState::FlowAndPlaceFloat:
>   947    mSpaceManager->AddRectRegion(floatFrame, region);

BTW -- I'm pretty sure that in comment 17, I misspoke -- I think I meant to say "we first *assert* somewhere within this call" (not that we *crash* within that call).

I say this because, when testing a pre-FloatManager build (from changeset b087a09f82aa) as well as a fresh CVS build, I'm noticing that I do hit the assertion within that "FlowAndPlaceFloat" function call, but the crash doesn't happen until later.

Anyway -- the lowest chunk of the backtrace, when we actually crash, is:
> #0  0xb5b7487c in nsSpaceManager::GetPrevBand at /mozilla/layout/generic/nsSpaceManager.cpp:434
> #1  0xb5b749b1 in nsSpaceManager::GuessBandWithTopAbove at /mozilla/layout/generic/nsSpaceManager.cpp:1335
> #2  0xb5b7522b in nsSpaceManager::GetBandData (this=0xb2fd79c0, aYOffset=0, aMaxSize=@0xbfde7e2c, aBandData=@0xbfde8700) at /mozilla/layout/generic/nsSpaceManager.cpp:372
> #3  0xb5acc671 in nsBlockBandData::GetBandData (this=0xbfde8700, aY=0, aRelaxHeightConstraint=0) at /mozilla/layout/generic/nsBlockBandData.cpp:117
> #4  0xb5acc89f in nsBlockBandData::GetAvailableSpace (this=0xbfde8700, aY=0, aRelaxHeightConstraint=0, aResult=@0xbfde86c8) at /mozilla/layout/generic/nsBlockBandData.cpp:87
> #5  0xb5ae4885 in nsBlockReflowState::GetAvailableSpace (this=0xbfde866c, aY=0, aRelaxHeightConstraint=0) at /mozilla/layout/generic/nsBlockReflowState.cpp:388
> #6  0xb5ae2af5 in nsBlockReflowState::GetAvailableSpace (this=0xbfde866c) at /mozilla/layout/generic/nsBlockReflowState.h:87
> ...

The crashing line is the first line of GetPrevBand (434 in debug builds, and presumably line 438 in optimized builds):
431 nsSpaceManager::BandRect*
432 nsSpaceManager::GetPrevBand(const BandRect* aBandRect) const
433 {
434   NS_ASSERTION(aBandRect->Prev() == &mBandList ||
...
438   BandRect* prev = aBandRect->Prev();

We crash because |aBandRect| is a pointer to a deleted object.

Going up one level to "GuessBandWithTopAbove", we see that |aBandRect| taken from |mCachedBandPosition|.

The band-aid patch prevents the crash by clearing mCachedBandPosition when the structure it points to is deleted.
(Assignee)

Updated

8 years ago
Whiteboard: [needs answer to comment 24 from dholbert][sg:critical?] fixed by 191448 (floatmgr re-write) → [sg:critical?] fixed by 191448 (floatmgr re-write)
Flags: blocking1.9.0.14? → blocking1.9.0.14+
Assignee: nobody → dholbert
Whiteboard: [sg:critical?] fixed by 191448 (floatmgr re-write) → [sg:critical?][needs r=dbaron] fixed by 191448 (floatmgr re-write)
Attachment #392622 - Flags: review?(dbaron) → review+
Comment on attachment 392622 [details] [diff] [review]
band-aid v1

>+            } else if (mCachedBandPosition == rect) {
>+              // Clear cached reference to this rect before we delete it.
>+              NS_WARNING("Uh oh -- we're about to delete a BandRect "
>+                         "while retaining a cached pointer to it. "
>+                         "Clearing cached pointer to prevent badness...");
>+              SetCachedBandPosition(nsnull);
>             }

I don't think you need the NS_WARNING; the comment and the code should be enough.

I also think you can get away with setting the cache to something useful; in particular, we know that |band| is still valid, so you should be able to call SetCachedBandPosition(band);



However, I think you actually need to make the same change in a second place:  where we delete |prevRect| just below; we can have a pretty-much-identical else to that if (prevRect == band).


With those changes, this looks good to me.
Given the fix, it looks like the underlying bug is a regression from bug 270392.  (If we remember, we can mark the dependency after the patch lands.)

This means that 1.9.0.* is the only branch on which it is currently present; it hadn't been introduced by 1.8.0* or 1.8.1.*, and the code was removed on 1.9.1.* and later.
Whiteboard: [sg:critical?][needs r=dbaron] fixed by 191448 (floatmgr re-write) → [sg:critical?][needs addressing review comments in comment 28] fixed by 191448 (floatmgr re-write)
(Assignee)

Comment 30

8 years ago
Created attachment 393645 [details] [diff] [review]
band-aid v1b

This version addresses review comments from comment 28, in a slightly roundabout way.

dbaron suggested calling SetCachedBandPosition(band).  There's actually a call like that in the code already, at the end of the clause immediately before my previous patch's added "else if".

So instead of changing the contents of my "else if" block, this updated patch just moves the already-existing |SetCachedBandPosition(band)| line outside of the block that previously encompassed it. (This has the same effect as applying dbaron's suggestion to my previous patch.)

I'm doing the same thing below for "prevRect", per dbaron's suggestion.

Cleared the above with dbaron over IRC -- carrying forward r+, and requesting approval.
Attachment #392622 - Attachment is obsolete: true
Attachment #393645 - Flags: review+
Attachment #393645 - Flags: approval1.9.0.14?
(Assignee)

Comment 31

8 years ago
I've verified that the updated band-aid (v1b) still fixes the crash on all posted testcases.  (FWIW, neither of the band-aids fix the assertion mentioned in comment 0, though -- they only fix the crash.)
(Assignee)

Updated

8 years ago
Whiteboard: [sg:critical?][needs addressing review comments in comment 28] fixed by 191448 (floatmgr re-write) → [sg:critical?][needs approval][probably regression from 270392][ fixed by 191448 (floatmgr re-write)
(Assignee)

Comment 32

8 years ago
(In reply to comment #29)
> Given the fix, it looks like the underlying bug is a regression from bug
> 270392.  (If we remember, we can mark the dependency after the patch lands.)

Nope -- the crash here doesn't seem to be a regression from that bug's patch.  I just tested the nightly from the day after it landed...
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a2pre) Gecko/20070124 Minefield/3.0a2pre
...and it doesn't crash on any of this bug's testcases.

> This means that 1.9.0.* is the only branch on which it is currently present

This part should still be true, though.  Even if this crash wasn't strictly a regression from bug 270392's patch, it definitely depends on a cached pointer introduced in that patch.
Whiteboard: [sg:critical?][needs approval][probably regression from 270392][ fixed by 191448 (floatmgr re-write) → [sg:critical?][needs approval][ fixed by 191448 (floatmgr re-write)
Comment on attachment 393645 [details] [diff] [review]
band-aid v1b

Approved for 1.9.0.14. a=ss for release-drivers
Attachment #393645 - Flags: approval1.9.0.14? → approval1.9.0.14+
(Assignee)

Updated

8 years ago
Whiteboard: [sg:critical?][needs approval][ fixed by 191448 (floatmgr re-write) → [sg:critical?][needs approval][fixed by 191448 (floatmgr re-write)]
(Assignee)

Comment 34

8 years ago
Band-aid v1b committed to CVS:
Checking in layout/generic/nsSpaceManager.cpp;
/cvsroot/mozilla/layout/generic/nsSpaceManager.cpp,v  <--  nsSpaceManager.cpp
new revision: 3.92; previous revision: 3.91
done

  --> fixed1.9.0.14
Keywords: fixed1.9.0.14
Whiteboard: [sg:critical?][needs approval][fixed by 191448 (floatmgr re-write)] → [sg:critical?][fixed by 191448 (floatmgr re-write)]
Verified fixed in my debug build of 1.9.0.14 synched from 8/12. The testcases crash in my 1.9.0.13 debug build.
Keywords: fixed1.9.0.14 → verified1.9.0.14
status1.9.1: --- → unaffected
Group: core-security
(Reporter)

Comment 36

8 years ago
Crashtest: http://hg.mozilla.org/mozilla-central/rev/4da43cad0331
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ nsSpaceManager::GetPrevBand]
You need to log in before you can comment on or make changes to this bug.