Last Comment Bug 443089 - <mtd> with huge rowspan causes crash with sad nsCellMap
: <mtd> with huge rowspan causes crash with sad nsCellMap
Status: RESOLVED FIXED
[sg:critical?]
: assertion, crash, testcase, verified1.8.1.17, verified1.9.0.2
Product: Core
Classification: Components
Component: MathML (show other bugs)
: Trunk
: All All
: -- critical (vote)
: ---
Assigned To: Bernd
:
:
Mentors:
Depends on:
Blocks: 347580 348954
  Show dependency treegraph
 
Reported: 2008-07-01 20:05 PDT by Jesse Ruderman
Modified: 2008-09-25 15:01 PDT (History)
13 users (show)
samuel.sidler+old: blocking1.9.0.2+
samuel.sidler+old: wanted1.9.0.x+
dveditz: blocking1.8.1.17+
dveditz: wanted1.8.1.x+
asac: blocking1.8.0.next+
jruderman: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (162 bytes, application/xhtml+xml)
2008-07-01 20:05 PDT, Jesse Ruderman
no flags Details
backtrace of crash (on linux, in EnsureCapacity) (5.70 KB, text/plain)
2008-07-02 11:34 PDT, Daniel Holbert [:dholbert]
no flags Details
partial patch v1 (599 bytes, patch)
2008-07-02 13:49 PDT, Daniel Holbert [:dholbert]
no flags Details | Diff | Splinter Review
patch v2: abort! (2.06 KB, patch)
2008-07-02 14:57 PDT, Daniel Holbert [:dholbert]
no flags Details | Diff | Splinter Review
patch (2.66 KB, patch)
2008-07-03 21:30 PDT, Bernd
bzbarsky: review+
bzbarsky: superreview+
Details | Diff | Splinter Review
patch rev1 (3.72 KB, patch)
2008-07-12 10:08 PDT, Bernd
no flags Details | Diff | Splinter Review
patch rev1 (4.26 KB, text/plain)
2008-07-12 10:10 PDT, Bernd
no flags Details
complete patch for checkin (3.25 KB, patch)
2008-07-12 22:56 PDT, Bernd
samuel.sidler+old: approval1.9.0.2+
Details | Diff | Splinter Review
1.8 branch patch (5.92 KB, patch)
2008-08-04 07:00 PDT, Bernd
dveditz: approval1.8.1.17+
Details | Diff | Splinter Review
for 1.8.0 (1.8 branch without xpcom/glue/nsTArray.cpp bits) (4.41 KB, patch)
2008-08-31 06:19 PDT, Alexander Sack
asac: approval1.8.0.next+
Details | Diff | Splinter Review

Description Jesse Ruderman 2008-07-01 20:05:47 PDT
Created attachment 327731 [details]
testcase

Steps to reproduce:
1. Load the testcase in a debug build (Firefox trunk on Tiger).

Result:

Sometimes:
firefox-bin(17051,0xa000d000) malloc: *** error for object 0x339c001b: pointer being reallocated was not allocated
firefox-bin(17051,0xa000d000) malloc: *** set a breakpoint in szone_error to debug
###!!! ASSERTION: invalid array index: 'i < Length()', file ../../dist/include/xpcom/nsTArray.h, line 317

Always, a crash at one of the following:
nsCellMap::AppendCell - nsTArray_base::EnsureCapacity
nsCellMap::SetDataAt - nsTArrayElementTraits<CellData*>::Construct<CellData*>
nsCellMap::AppendCell - CellData::IsOrig

All of the crashes involve non-null, bogus pointer dereferences.  For the first crash signature, this is easiest to see with:

export MallocScribble=1
export MallocPreScribble=1
Comment 1 Daniel Holbert [:dholbert] 2008-07-01 20:23:11 PDT
Crashes my linux nightly optimized build, too.  (OS/Hardware --> All)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1pre) Gecko/2008063004 GranParadiso/3.0.1pre

Crash report:
http://crash-stats.mozilla.com/report/index/ea776a35-47e5-11dd-af52-001cc45a2ce4
Comment 2 Daniel Holbert [:dholbert] 2008-07-02 11:34:41 PDT
Created attachment 327835 [details]
backtrace of crash (on linux, in EnsureCapacity)

Here's the backtrace from the crash.

At this point, the array looks like this:

(gdb) p this
$2 = (nsTArray_base * const) 0xae9ffb1c
(gdb) p *this
$3 = {static sEmptyHdr = {mLength = 0, mCapacity = 0, mIsAutoArray = 0}, mHdr = 0xa5a5a5a5}

We're crashing because we try to dereference mHdr, to see mHdr->mCapacity.
Comment 3 Daniel Holbert [:dholbert] 2008-07-02 11:58:40 PDT
We're wrapping around the unsigned int bound here:

http://mxr.mozilla.org/seamonkey/source/xpcom/glue/nsTArray.cpp

101     size_type size = sizeof(Header) + capacity * elemSize;
102     header = static_cast<Header*>(NS_Realloc(mHdr, size));

at this point, we have
  capacity = 1073741825
  elemSize = 4
  sizeof(Heeader) = 8  (I think)

so 'size' ends up being
   (8 + 1073741825 * 4) mod 2^32 = (4294967308) mod 2^32 = 12

which is much smaller than we're expecting it to be.
Comment 4 Daniel Holbert [:dholbert] 2008-07-02 12:02:13 PDT
Sorry, I should've included a larger code snippet:

100     // NS_Realloc existing data
101     size_type size = sizeof(Header) + capacity * elemSize;  // <--- size=12
102     header = static_cast<Header*>(NS_Realloc(mHdr, size));  
103     if (!header)
104       return PR_FALSE;
105   }
106 
107   header->mCapacity = capacity;           // <---- capacity = 1073741825
108   mHdr = header;

The disconnect between the wrapped-around 'size' and the presumed 'capacity' is what's killing us, I think.
Comment 5 Daniel Holbert [:dholbert] 2008-07-02 13:49:21 PDT
Created attachment 327858 [details] [diff] [review]
partial patch v1

This patch fixes the bounds check to avoid the issue described in comment 4.  Basically, EnsureCapacity already has a bounds check, but in this case we're wrapping around in that bounds-check as well, which renders it useless. :)  This just fixes that issue by casting to a PRUint64.

After this patch, we print these assertions:

###!!! ASSERTION: Attempting to allocate excessively large array: 'Error', file nsTArray.cpp, line 69
###!!! ASSERTION: invalid array index: 'i < Length()', file ../../dist/include/xpcom/nsTArray.h, line 317

... and crash in nsTArray<CellData*>::SetCapacity.

It sounds like someone's not error-checking the return-value of EnsureCapacity (to bail out if needed).
Comment 6 Daniel Holbert [:dholbert] 2008-07-02 14:15:16 PDT
(In reply to comment #5)
> It sounds like someone's not error-checking the return-value of EnsureCapacity
> (to bail out if needed).

Yup, that's happening here:
http://mxr.mozilla.org/seamonkey/source/layout/tables/nsCellMap.cpp#1507

1507     // XXXbz handle allocation failures?
1508     Grow(aMap, 1 + endRowIndex - origNumMapRows);

Grow() returns PR_FALSE if it fails, and we're not checking for that right now.
Comment 7 Daniel Holbert [:dholbert] 2008-07-02 14:22:51 PDT
(In reply to comment #6)
> Grow() returns PR_FALSE if it fails, and we're not checking for that right now.

... and then we hit the "invalid array index" assertion and crash down here, because we think we now have endRowIndex rows to work with (when in fact we don't because Grow() failed)

1596   for (PRInt32 rowX = aRowIndex; rowX <= endRowIndex; rowX++) {
1597     // The row at rowX will need to have at least endColIndex columns
1598     mRows[rowX].SetCapacity(endColIndex);
 

Comment 8 Daniel Holbert [:dholbert] 2008-07-02 14:57:08 PDT
Created attachment 327868 [details] [diff] [review]
patch v2: abort!

This patch just aborts with NS_ABORT_IF_FALSE if allocation fails as mentioned in comment 6.

I'll defer to Boris (or anyone else who knows the CellMap code) for the "correct" fix here -- i.e. how we should gracefully handle the allocation failure.

I just wanted to get an initial patch up that would make us at least die safely rather than unsafely, in case there's no easy / immediate more graceful solution for this.
Comment 9 Boris Zbarsky [:bz] (still a bit busy) 2008-07-02 21:54:02 PDT
The reason I added the XXX was because I had no idea what to do in that case...  Bernd might know.

We're basically doing an allocation of a size that's under content page control here, right?
Comment 10 Bernd 2008-07-03 09:17:53 PDT
why isn't the math rowspan blocked at content level? we did that long time ago for html content
Comment 11 Bernd 2008-07-03 09:24:48 PDT
nsMathMLmtdFrame::GetRowSpan()
and 
nsMathMLmtdFrame::GetColSpan()

should do the same clamping as

we do at http://mxr.mozilla.org/seamonkey/source/content/html/content/src/nsHTMLTableCellElement.cpp#264




Comment 12 Boris Zbarsky [:bz] (still a bit busy) 2008-07-03 13:38:48 PDT
Will that get us to the point where this can be a never-fails allocation?  Or should we still handle OOM here?
Comment 13 Jesse Ruderman 2008-07-03 16:15:18 PDT
NS_ABORT_IF_FALSE is debug-only.  It does not make opt builds safe.
Comment 14 Daniel Holbert [:dholbert] 2008-07-03 16:35:15 PDT
Comment on attachment 327868 [details] [diff] [review]
patch v2: abort!

Ah, good to know. Marking obsolete.

Bernd mentioned in IRC that he's working on a real fix along the lines of comment 11, so I won't post an updated bail-out patch.
Comment 15 Bernd 2008-07-03 21:30:10 PDT
Created attachment 328086 [details] [diff] [review]
patch
Comment 16 Bernd 2008-07-03 21:44:03 PDT
to my knowledge there is no never-fails allocation, but the patch closes a pretty nasty thing when rowspan/colspan values exceed our internal structure limits which might be before the OOM condition.
Comment 17 Boris Zbarsky [:bz] (still a bit busy) 2008-07-03 21:54:34 PDT
> to my knowledge there is no never-fails allocation

The plan for Mozilla2 is to have two types of allocations:

1)  Ones that can never fail (if the memory actually can't be allocated, even after caches are flushed and the memory reserve is used, the program aborts instead of having the allocation fail).

2)  Ones that can fail and need to be null-checked by the caller.

Any allocation that allocates a biggish chunk of data under content control really needs to be the latter.

Sounds like we need a followup bug on actually handling the XXX comment, right?
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2008-07-03 21:59:27 PDT
Comment on attachment 328086 [details] [diff] [review]
patch

Looks OK, except I'd love to see those #defines in a single place instead of in both files.... If there is no good place, maybe that needs fixing.
Comment 19 Bernd 2008-07-03 22:08:23 PDT
I will write a second patch which will handle boris comments inside nsCellMap.cpp. However I believe if the cellmap fails with OOM we are pretty much doomed. What happens then is that: if we bail out, our lookup table (cellmap) gets out of sync with the internal frame structure, which from my experience is certain way to get sooner rather than later the beast down.
Comment 20 Bernd 2008-07-03 22:13:52 PDT
I did not want to introduce a header dependency, the only .h file that both share is StyleConst.h and adding to that file seems so wrong.
Comment 21 Boris Zbarsky [:bz] (still a bit busy) 2008-07-03 22:25:49 PDT
It really sounds like we could use a header both include.  Heck, how about celldata.h?  It should be includable by both.

As for the rest... we really need a way to flag a frame tree as needing immediate destruction in situations like this.  Or something.  Or perhaps we should just stick to abort behavior for now and file a bug to get this sorted out...
Comment 22 Bernd 2008-07-12 10:08:35 PDT
Created attachment 329228 [details] [diff] [review]
patch rev1

changes required by bz
Comment 23 Bernd 2008-07-12 10:10:49 PDT
Created attachment 329231 [details]
patch rev1
Comment 24 Bernd 2008-07-12 22:56:43 PDT
Created attachment 329264 [details] [diff] [review]
complete patch for checkin
Comment 25 Bernd 2008-07-13 10:06:08 PDT
I am thick of the following:

	<firebot>	Firefox: 'MacOSX Darwin 9.2.2 mozilla-central qm-moz2mini01 dep unit test' has changed state from Success to Test Failed.
	<firebot>	Firefox: 'Linux mozilla-central qm-centos5-moz2-02-hw dep unit test' has changed state from Success to Test Failed.
	<firebot>	Firefox: 'WINNT 5.2 mozilla-central qm-win2k3-03 dep unit test' has changed state from Success to Test Failed.

I will be away for vacation, so if somebody finds that tiny moment where all boxes are green please check it in. I will only check it in if the boxes are green and this will not happen before august.

The perma-orange together with the hg disaster (no guilty column on tinderbox + no mark as in bonsai, patch juggling when you are not the tip when somebody did check in between commit and push on a completely different edge of the source base) remembers me the old 'Lethal Weapon' slogan: I'm to old for this shit.

Its time for the reedbot to check this in.

 
Comment 26 Bernd 2008-07-16 08:04:28 PDT
I just pushed b87fdbfecc16
Comment 27 Bernd 2008-07-16 13:20:41 PDT
Comment on attachment 329264 [details] [diff] [review]
complete patch for checkin

This should probably go after some baking onto branches
Comment 28 Bernd 2008-07-16 13:21:52 PDT
the test case should be pushed once the branches are released.
Comment 29 Samuel Sidler (old account; do not CC) 2008-07-29 16:55:26 PDT
Comment on attachment 329264 [details] [diff] [review]
complete patch for checkin

Approved for 1.9.0.2. Please land in CVS. a=ss
Comment 30 Samuel Sidler (old account; do not CC) 2008-08-03 14:55:41 PDT
Bernd, does this apply to the 1.8 branch or do we need a separate patch? (Or, alternatively, is this bug even applicable on the 1.8 branch?)
Comment 31 Bernd 2008-08-04 07:00:37 PDT
Created attachment 332206 [details] [diff] [review]
1.8 branch patch
Comment 32 Daniel Veditz [:dveditz] 2008-08-04 11:36:31 PDT
Comment on attachment 332206 [details] [diff] [review]
1.8 branch patch

Approved for 1.8.1.17, a=dveditz for release-drivers.
Comment 33 Bernd 2008-08-05 10:18:31 PDT
fixed on branches that I care for.
Comment 34 Marcia Knous [:marcia - use ni] 2008-08-21 15:45:42 PDT
verified fixed on the 1.9.0 branch using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.2pre) Gecko/2008082104 GranParadiso/3.0.2pre (equivalent build on Tiger) and Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.2pre) Gecko/2008082105 GranParadiso/3.0.2pre.
Comment 35 Carsten Book [:Tomcat] 2008-08-29 21:36:47 PDT
verified fixed for 1.8.1.17 using Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.18pre) Gecko/2008082803 Firefox/2.0.0.18pre (Debug Build). No Crash on Testcase
Comment 36 Alexander Sack 2008-08-31 06:19:23 PDT
Created attachment 336244 [details] [diff] [review]
for 1.8.0 (1.8 branch without xpcom/glue/nsTArray.cpp bits)

a=asac for 1.8.0.15
Comment 37 Jesse Ruderman 2008-09-25 15:01:17 PDT
I checked the testcase in as a crashtest.

Note You need to log in before you can comment on or make changes to this bug.