Last Comment Bug 351328 - Crash [@ nsCSSRendering::PaintBackgroundWithSC] with colspan=0
: Crash [@ nsCSSRendering::PaintBackgroundWithSC] with colspan=0
Status: RESOLVED FIXED
[sg:critical] uses freed memory
: crash, fixed1.8.1, regression, testcase, verified1.8.0.8
Product: Core
Classification: Components
Component: Layout: Tables (show other bugs)
: Trunk
: PowerPC Mac OS X
: -- critical (vote)
: ---
Assigned To: Bernd
:
:
Mentors:
Depends on:
Blocks: stirtable
  Show dependency treegraph
 
Reported: 2006-09-04 13:07 PDT by Jesse Ruderman
Modified: 2011-06-13 10:01 PDT (History)
6 users (show)
dveditz: blocking1.8.0.8+
jruderman: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (353 bytes, application/xhtml+xml)
2006-09-04 13:09 PDT, Jesse Ruderman
no flags Details
don't assert but fix it (1.20 KB, patch)
2006-09-05 13:22 PDT, Bernd
bzbarsky: review+
bzbarsky: superreview+
Details | Diff | Splinter Review
Like so? (1.64 KB, patch)
2006-09-06 09:22 PDT, Mats Palmgren (:mats)
no flags Details | Diff | Splinter Review
patch with asserts (1.42 KB, patch)
2006-09-09 07:08 PDT, Bernd
no flags Details | Diff | Splinter Review
revised patch (1.27 KB, patch)
2006-09-09 08:40 PDT, Bernd
bzbarsky: review+
bzbarsky: superreview+
dveditz: approval1.8.0.8+
mtschrep: approval1.8.1+
Details | Diff | Splinter Review
Patch for Mozilla 1.7 (628 bytes, patch)
2007-10-26 05:11 PDT, Muktha
dbaron: review-
Details | Diff | Splinter Review

Description Jesse Ruderman 2006-09-04 13:07:32 PDT
This testcase causes a scary crash.  In a debug build, the memory address accessed looks random (NOT something like 0xdddddddd or 0xdadadada).  Examples: 0xff9c8020, 0x653b3261.

Stack trace:

0    nsCachedStyleData::GetStyleData(nsStyleStructID const&) + 92 (nsRuleNode.h:215)
1    nsStyleContext::GetStyleData(nsStyleStructID) + 52 (nsStyleContext.cpp:221)
2    nsIFrame::GetStyleData(nsStyleStructID) const + 108 (nsIFrame.h:591)
3    nsIFrame::GetStyleDisplay() const + 36 (nsStyleStructList.h:95)
4    nsCSSRendering::PaintBackgroundWithSC(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBorder const&, nsStylePadding const&, int, nsRect*) + 176 (nsCSSRendering.cpp:2902)
5    TableBackgroundPainter::PaintCell(nsTableCellFrame*, int) + 572 (nsTablePainter.cpp:599)
...
Comment 1 Jesse Ruderman 2006-09-04 13:09:19 PDT
Created attachment 236733 [details]
testcase
Comment 2 Jesse Ruderman 2006-09-04 13:11:10 PDT
Just before the crash I see

1 * ###!!! ASSERTION: bad arg in nsCellMap::InsertCellAt: 'PR_FALSE', nsCellMap.cpp, line 1576
2 * ###!!! ASSERTION: nsCellMap::GetNumCellsOriginatingInCol - bad col index: 'PR_FALSE', nsCellMap.cpp, line 792

Comment 3 Bernd 2006-09-04 21:20:40 PDT
this does not crash Gecko/20060111 arghhh
Comment 4 Bernd 2006-09-05 13:22:44 PDT
Created attachment 236847 [details] [diff] [review]
don't assert but fix it
Comment 5 Mats Palmgren (:mats) 2006-09-06 09:21:14 PDT
(In reply to comment #4)
> don't assert but fix it

Hmm, isn't this just wallpaper? Shouldn't we track down why the index is
out of bounds and fix the root cause?

IMHO, the table code needs more assertion checks, not less.
Comment 6 Mats Palmgren (:mats) 2006-09-06 09:22:14 PDT
Created attachment 236967 [details] [diff] [review]
Like so?
Comment 7 Bernd 2006-09-06 10:42:52 PDT
Its not wallpaper what I proposed, but doing an append as your patch does. However my patch does cover all callers of InsertCells, yours does not. If we would like to avoid the low level fix then the logic that you propose should be incorporated into nsTableFrame::InsertCells. Regardless what option we choose: asserting and throwing the cellframe references is not option.
Comment 8 Bernd 2006-09-06 11:19:39 PDT
Or with a closer look I don't see where my patch is more or less wallpaper than yours. The issue is the 0 span handling. I will rewrite this as I am sick of it. But this is not branchable. So I need a patch that cuts all possible entry pathes towards the possibilities of throwing cell frames.
Comment 9 Mats Palmgren (:mats) 2006-09-06 13:59:31 PDT
(In reply to comment #7)
> However my patch does cover all callers of InsertCells, yours does not.

Yes, that is intentional. I don't think we should silently "repair"
bad indicies like you did. I don't mind making the code robust against
errors, but please keep the assertion in that case so we can track down
the bad callers and fix them.

> the logic that you propose should be incorporated into
> nsTableFrame::InsertCells. 

Ok. If you want to extend InsertCells to mean Append when
the index is one beyond the last cell that's fine with me.
Please assert for other invalid indicies though, so we can
catch bugs in callers.

(In reply to comment #8)
> Or with a closer look I don't see where my patch is more or less
> wallpaper than yours.

Well, your code allows me to write buggy code without being notified
of my error.

My patch on the other hand fixes the root cause of this particular
bug which is that after AttributeChangedFor() have called RemoveCell()
the col index will be too large if this is the last cell in a row.
We should Append the cell in this case since InsertCells() doesn't
handle that case.

I agree that making the low level code more robust for future errors
is a good thing, but please keep the assertion in that case
(and extend it to assert when |aColIndexBefore| is negative too).
Comment 10 Mats Palmgren (:mats) 2006-09-06 13:59:58 PDT
Bernd, don't get me wrong, I have the greatest respect and appreciation
for your hard work on the table code.

I just don't think "repairing" bad arguments (silently) in low level
functions is a good strategy to make quality code (in general).
And, as far as I know, we don't do that in other layout code.
Comment 11 Bernd 2006-09-06 22:51:49 PDT
Mats I will happily leave the assert where it is but I don't see any point to wallpaper one caller and to wait till the next caller will trip the condition at the low level routine. The goal to me seems to be write safe subfunctions that could not fail rather than to put the load on the callers. Especially given the numerous branches that we maintain.

>My patch on the other hand fixes the root cause of this particular
>bug which is that after AttributeChangedFor()

I would be so happy if this would be the root, but it is not.
The root is the wrong zero span handling. What happens here is:



  document.getElementById('A').setAttribute('colspan', 0);
  document.getElementById('B').setAttribute('rowspan', 3);

<table>
  <tr>
    <td id="A">TD</td>
    <td id="B" colspan="3">TD</td>
  </tr>
</table>

The table has initially 4 columns
then we call the first time AttributeChanged
it removes the first cell the number of cols goes down to 3
then we insert the cell this will we need to write it before the first cell, so we empty the row and then append the first cell into the row. Everything is OK up to here. Now we append the second cell it looks for the first free space (cellmap hole or a dead cell) but append has a secondary function it expands zero spans and this happens here. So the first cell is expanded till it covers all columns. That is equivalent to a colspan 3. Then the first place to place the second cell is at col 3 (starting with 0).
In the cellmap dump it looks like:

C0 C C C1 C C              <===== this is wrong

while it should ideally look like 

C0 C1 C C

or at least as (due to the fact that we insert a zero span as minimum colspan=2)

C0 C C1 C C

Cx stands for real cells, C for colspanned

So on the second round of calls we remove the second cell it causes a cellmap rebuild and we end with

C0 C 

Now when we insert the cell again the cellmap underneath has changed the position 2 is now after all cells. What we shall do now is an Append (your proposal) or as i would call it an insert after pos 1. But both of them are wallpapers.

The root is cellmap like this C0 C C C1 C C where C0 is a cell with colspan 0.
So the question now is where do we wallpaper at the level where the error occures or at the caller level. My proposal is the first one, yours the second. The disadvantage in the second approach is that we put the burden on the caller that it detects misbehaviour of the callee. "Nobody expects the spanish inquisition" (http://en.wikipedia.org/wiki/The_Spanish_Inquisition_(Monty_Python)) is the rule that describes such an approach.

Why do I would like to wallpaper at all:
I would like to have at least some fix on 1.8.1. The correct fix IMHO is to remove at
http://lxr.mozilla.org/seamonkey/source/layout/tables/nsCellMap.cpp#2433
 aUpdateZeroSpan argument and implement a new mechanism to expand zerospans.
This will not be done in 10 lines. So I am afraid that I will not get it into 1.8.1.
Comment 12 Bernd 2006-09-06 22:55:45 PDT
s/The root is cellmap like this C0 C C C1 C C where C0 is a cell with colspan 0./
The root problem is when we have a cellmap like this: C0 C C C1 C C , where C0 is a cell with colspan 0./
Comment 13 Boris Zbarsky [:bz] (still a bit busy) 2006-09-07 00:23:17 PDT
Comment on attachment 236847 [details] [diff] [review]
don't assert but fix it

I'm OK with this for branch if we keep the assert.  On trunk, let's work on doing the "real fix" in either this bug or a followup?
Comment 14 Mats Palmgren (:mats) 2006-09-07 09:34:38 PDT
(In reply to comment #11)
> I would be so happy if this would be the root, but it is not.

Ok, you have convinced me that my proposed fix is also just wallpaper
and that yours is better since it covers more cases.

> The goal to me seems to be write safe subfunctions that could not fail
> rather than to put the load on the callers.

Yes and no.
A function should protest when given faulty input values and the root cause
should be tracked down and fixed. That doesn't conflict with the goal that
functions can also be robust.

I would love to see more assertion checks in the table code, not only does
it allow us to find bugs, but it also serves as documentation of
the assumptions the code was written under.

> I would like to have at least some fix on 1.8.1.

I agree that a minimal patch (like yours) is the way to go for branches.
You could also assert when "aColIndexBefore < -1".
Comment 15 Bernd 2006-09-09 07:08:28 PDT
Created attachment 237485 [details] [diff] [review]
patch with asserts
Comment 16 Bernd 2006-09-09 08:40:48 PDT
Created attachment 237496 [details] [diff] [review]
revised patch

The function might be called with -1 as a colindex. This is legal as the argument shows the cell after which the new cells shall be inserted. After -1 goes 0 so we insert at the beginning of a row. Requesting a new review.
Comment 17 Bernd 2006-09-10 08:46:49 PDT
The real fix is bug 351942
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2006-09-10 10:22:02 PDT
Comment on attachment 237496 [details] [diff] [review]
revised patch

You could still assert that aColIndexBefore >= -1....
Comment 19 Bernd 2006-09-12 22:34:28 PDT
fixed on trunk
Comment 20 Bernd 2006-09-16 00:55:12 PDT
Does anybody see this on branch if so ask for 1.8.1.1 approval
Comment 21 Martijn Wargers [:mwargers] (not working for Mozilla) 2006-09-16 01:39:23 PDT
Yes, it also crashes 1.8.1 builds.
Comment 22 Mike Schroepfer 2006-09-16 17:29:00 PDT
Comment on attachment 237496 [details] [diff] [review]
revised patch

a=schrep - approved to land on 1.8.1 branch before rc1.
Comment 23 Bernd 2006-09-18 09:52:18 PDT
fixed on branch
Comment 24 Daniel Veditz [:dveditz] 2006-09-20 14:30:59 PDT
This is appropriate for 1.8.0.x right?
Comment 25 Bernd 2006-09-20 21:33:09 PDT
exactly, the real patch is a a 60kb one - bug 351942. I made this patch as I anticipated that bug 351942 is not branchable.
Comment 26 Bernd 2006-09-21 01:38:35 PDT
Comment on attachment 237496 [details] [diff] [review]
revised patch

asking approval
Comment 27 Daniel Veditz [:dveditz] 2006-09-26 14:32:41 PDT
Comment on attachment 237496 [details] [diff] [review]
revised patch

approved for 1.8.0 branch, a=dveditz for drivers
Comment 28 Daniel Veditz [:dveditz] 2006-10-06 13:51:56 PDT
Fix checked into MOZILLA_1_8_0_BRANCH
Comment 29 Jay Patel [:jay] 2006-10-20 14:02:30 PDT
v.fixed on 1.8.0 branch with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.8pre) Gecko/20061020 Firefox/1.5.0.8pre, no crash with testcase.
Comment 30 Muktha 2007-10-26 05:11:09 PDT
Created attachment 286279 [details] [diff] [review]
Patch for Mozilla 1.7

After applying the patch provided in the bug report, noticed that Mozilla 1.7 still crashed with the testcase.

Following is the snippet of the stack trace

  [1] __lwp_kill(0x1, 0xb), at 0xd27811a5
  [2] pthread_kill(0x1, 0xb), at 0xd277e072
  [3] raise(0xb), at 0xd272cf43
  [4] nsProfileLock::FatalSignalHandler(0xb, 0x0, 0x8045bac), at 0xcdff998e
  [5] __sighndlr(0xb, 0x0, 0x8045bac, 0xcdff9860), at 0xd277fc9f
  [6] call_user_handler(0xb, 0x0, 0x8045bac), at 0xd277623d
  [7] sigacthandler(), at 0xd27763bd
  ---- called from signal handler with signal 11 (SIGSEGV) ------
=>[8] TableBackgroundPainter::TableBackgroundData::IsVisible(0x2), at 0xce66bc8c
  [9] TableBackgroundPainter::PaintCell(0x80460f8, 0x8b4a8f8, 0x0), at 0xce66b5a1
  [10] TableBackgroundPainter::PaintRow(0x80460f8, 0x8b4a5a0, 0x0), at 0xce66b44a
  [11] TableBackgroundPainter::PaintRowGroup(0x80460f8, 0x8b4a564, 0x0), at 0xce66b176
  [12] TableBackgroundPainter::PaintTable(0x80460f8, 0x8b4a294), at 0xce66ae66
  [13] nsTableFrame::Paint(0x8b4a294, 0x8b84b98, 0x88099a8, 0x804623c, 0x0, 0x0), at 0xce64045c
  [14] nsContainerFrame::PaintChild(0x8b4a140, 0x8b84b98, 0x88099a8, 0x80462f8, 0x8b4a294, 0x0, 0x0), at 0xce4b1ccf
  [15] nsTableOuterFrame::Paint(0x8b4a140, 0x8b84b98, 0x88099a8, 0x80462f8, 0x0, 0x0), at 0xce656848
  [16] nsContainerFrame::PaintChild(0x8b49fcc, 0x8b84b98, 0x88099a8, 0x8046520, 0x8b4a140, 0x0, 0x0), at 0xce4b1ccf
  [17] nsBlockFrame::PaintChild(0x8b49fcc, 0x8b84b98, 0x88099a8, 0x8046520, 0x8b4a140, 0x0, 0x0), at 0xce4a3de2
  [18] PaintLine(0x80463bc, 0x8046520, 0x80463cc, 0x0, 0x8046404, 0x8b84b98, 0x88099a8, 0x0, 0x8b49fcc), at 0xce49fee7
  [19] nsBlockFrame::PaintChildren(0x8b49fcc, 0x8b84b98, 0x88099a8, 0x8046520, 0x0, 0x0), at 0xce49c1b2
  [20] nsHTMLContainerFrame::PaintDecorationsAndChildren(0x8b49fcc, 0x8b84b98, 0x88099a8, 0x8046520, 0x0, 0x1, 0x0), at 0xce4dc8f3/

From the stack trace it appears that,

o TableBackgroundPainter::PaintCell (nsTablePainter.cpp) the   Cols[colIndex].mColGroup is corrupted.

o It is also seen that when the crash happens, the colIndex value is equal to the value of mNumCols.

Have attached a patch to fix this problem. Kindly review.
Comment 31 Bernd 2007-10-28 01:51:52 PDT
The patch for 1.7 is wrong/wall paper. The index should not be out of bound at this place. The place to fix this is inside the cellmap.
Comment 32 Jesse Ruderman 2007-12-16 20:26:47 PST
Crashtest checked in.
Comment 33 David Baron :dbaron: ⌚️UTC-10 2008-02-13 15:29:54 PST
Comment on attachment 286279 [details] [diff] [review]
Patch for Mozilla 1.7

marking review- per Bernd's comment 31

Note You need to log in before you can comment on or make changes to this bug.