Closed Bug 373400 Opened 17 years ago Closed 16 years ago

Crash when print previewing tinderbox page [@ nsTableRowGroupFrame::SplitSpanningCells]

Categories

(Core :: Layout: Tables, defect, P2)

x86
Windows XP
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: polidobj, Assigned: bernd_mozilla)

References

()

Details

(Keywords: crash, regression, testcase, Whiteboard: [dbaron-1.9:Rw])

Crash Data

Attachments

(10 files, 3 obsolete files)

Doing a print preview on a tinderbox page crashes.  A few times it wouldn't crash in the initial print preview but would crash when changing between portrait and landscape.

The regression range I found is:
2006120804 crashes
2006120704 ok

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2006-12-07+04%3A00%3A00&maxdate=2006-12-08+04%3A00%3A00&cvsroot=%2Fcvsroot

I'm suspecting bug 362708.  Because that's a closed bug I'm setting the security flag on this bug.  ( Just in case :) )


Incident ID: 30085199
Stack Signature	nsTableRowGroupFrame::SplitSpanningCells 6be0833e
Product ID	FirefoxTrunk
Build ID	2007030904
Trigger Time	2007-03-09 13:38:14.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	firefox.exe + (0031005d)
URL visited	
User Comments	
Since Last Crash	23862 sec
Total Uptime	23862 sec
Trigger Reason	Access violation
Source File, Line No.	e:\builds\tinderbox\fx-trunk\winnt_5.2_depend\mozilla\layout\tables\nstablerowgroupframe.cpp, line 943
Stack Trace 	
nsTableRowGroupFrame::SplitSpanningCells  [mozilla/layout/tables/nstablerowgroupframe.cpp, line 943]
nsTableRowGroupFrame::SplitRowGroup  [mozilla/layout/tables/nstablerowgroupframe.cpp, line 1203]
Confirm crash in winxp debug build. 

###!!! ASSERTION: invalid continued row: 'PR_FALSE', file f:/work/mozilla/builds/1.9.0/mozilla/layout/tables/nsTableRowGroupFrame.cpp, line 1007

  for (nsTableRowFrame* row = &aFirstRow; !wasLast; row = row->GetNextRow()) {
    wasLast = (row == &aLastRow);
=>    PRInt32 rowIndex = row->GetRowIndex();
    nsPoint rowPos = row->GetPosition();

null dereference on row

On Linux I see lots of memory|cpu thrashing and a temporary hang of a minute or so...
This crashes for me when switching to landscape on print preview.

This testcase and the url don't crash for me on the latest branch build on print preview.
I see this on linux as well (same crash that Bob saw, null pointer reference). Doesn't seem like an exploitable crash.
This is obviously a regression from the reflow branch landing and NOT bug 362708.  I see no reason for the security flag to be set on it.  All that does is make it hard for people to find it.
(In reply to comment #5)
> This is obviously a regression from the reflow branch landing and NOT bug
> 362708.  I see no reason for the security flag to be set on it.  All that does
> is make it hard for people to find it.
> 
Hmm I take that back.  I suppose it is possible that 362708 could be the culprit.  I will try to narrow this down.
Flags: blocking1.9?
OK> I did a lot of testing on this.

Firstly, I can not make the attached testcase crash with current builds at all.

Secondly, the crash you are seeing with that regression range is a print preview crash that was already fixed in bug 362210 and not related to this bug at all.

Thirdly, that crash was caused by bug 300010 (the reflow landing) and not by any security related fix.

Therefore, this bug should be made public and I will try to narrow down a real regression window on it.
No longer blocks: reflow-refactor
Perhaps there is something odd with the Gran Paradiso 3.0a2 release build.  It is not clear that is the same crash, and the 2007020704 nightly does NOT crash.  Still looking.
OK.  I found the real check-in that caused the regression.  Oddly it turns out it is another security fix.  Go figure.  Anyway it was bug 370360.  I verified it as the cause by backout.

Assignee: printing → nobody
Component: Print Preview → Layout: Tables
QA Contact: layout.tables
>Oddly it turns out it is another security fix.
>Go figure.

The patch in bug 370360 consists of three pieces:

1. fix the possible  array boundary violation. This got checked in into branches
2. switch to TArray
3. make cell frames to report properly the truncation status, which will cause more attempts to break cells. We crash just in this function.

1. is security related
2. is to have a sane code base
3. while I was at it I got negative heights which might be bad (integer overflow...) and I fixed it.

940   // Iterate the rows between aFirstRow and aLastRow
941   for (nsTableRowFrame* row = &aFirstRow; !wasLast; row = row->GetNextRow()) {

942     wasLast = (row == &aLastRow);
// we crash here, probably due to 0 deref.
943     PRInt32 rowIndex = row->GetRowIndex(); 
944     nsPoint rowPos = row->GetPosition();





I was not trying to imply there was something specific incorrect in the code that was changed in that patch.

We definitely do crash because of a 0 deref at that point.  THe first thing i tried (which did appear to fix this was to add.

  if (!row)
    break;

at that point.  This avoids the crash, but I was trying to find a proper fix.

I was thinking perhaps this is another case where a test for rows already pushed into the nextInFlow.

I should also point out hat for at least 2 weeks before the patch for bug 370360, there was another flaky crash related to doing a print preview on this testcase.  The crashes were internment and at not at the same point.  Backing out the patch seems to revert you to the sometimes more flaky crashes.  With the patch from 370360 it seems to always crash immediately uponn doing the print preview.  Without the patch, it will sometimes crash upon just doing the print preview, sometimes crash when scrolling the preview page, sometimes crash upon closing the print preview, and sometimes upon closing the browser.  Most of the time, however, it does not crash at all.
OK.  Before the patch for bug 370360 landed, print preview on this page would crash intermittently, but not in SplitSpanningCells.  After the patch landed, a repeatable crash in SplitSpanningCells began occurring.  By backing out the patch for bug 370360, I was able to get back to the original condition.  However, leaving the patch for bug 370360 intact, and adding the wallpaper patch from comment #13, I am unable to make it crash at all.

It would appear that for some reason aLastRow is incorrect in the call to SplitSpanningCells.
Odder still, the call to SplitSpanningCells that is crashing is the attempt to go back to the first split, passing the same values for aFirstRow and aLastRow.  As the first call did not crash it is hard to see why this one should.  I guess I will have to leave trying to fix this to someone who understands this area of the code better.
Bill don't waste your time on it, this is a table crash. ==> mine
Assignee: nobody → bernd_mozilla
Flags: blocking1.9? → blocking1.9+
There is a mismatch between http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/tables/nsTableRowGroupFrame.cpp&rev=3.383&mark=953,1180#950
what splitspanning cell expects and what is done in splitrowgroup. This changed due to bug 231823 see https://bugzilla.mozilla.org/show_bug.cgi?id=231823#c15 specifically.
Attached patch sketchSplinter Review
(In reply to comment #18)
> Created an attachment (id=262663) [details]
> sketch
> 

Verifying that this patch resolves the crash issue for me.
but it regresses bug 231823 :-(
Attached patch How about this? (obsolete) — Splinter Review
With this patch I don't get a crash and don't get the regression of bug 231823.  However, it does not do as good a job of conserving paper on printing the tinderbox webpage.
Attached patch better (i think) patch (obsolete) — Splinter Review
This makes more sense to me, but does not appear to help in the print of tinderbox pages the first 2 pages still are mostly blank.
Attachment #262882 - Attachment is obsolete: true
Attached patch one-liner (obsolete) — Splinter Review
Actually, this one line change seems to be sufficient to avoid the crash.  It removes a line of code that makes little sense to me in trying to follow the logic of the original fix for bug 231823.

This does not cause any regression in the bug 231823 testcase.
Arrgghh!  This is odd.  It could be that none of those patches I posted in comments #21 #22 and #23 actually avoid the crash.  This crash is not quite a solid as I thought.  For some unexplained reason current trunk builds no longer crash on any of these testcases, nor do they seem to crash on doing a print preview of any of the current tinderbox pages.  However, I don't see nay code that has been checked in recently that appears could possibly have fixed anything in this area.
Bill, the test case asserts more than enough in a debug build. The problem with bug 231823 is that the previous code used the availheight as page break hint
see http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/tables/nsTableRowGroupFrame.cpp&rev=3.383&mark=952-957#945
The page page break hint is top down measured while https://bugzilla.mozilla.org/show_bug.cgi?id=231823#c15 the availableHeight is the remainder that is measured relative to the bottom.

The original idea was to break not inside the table cells but at the row boundaries. David made the variable contain what it names in bug 231823 and I r+ed it. This efficiently turned off SplitSpanningCells.

The task here is not to avoid the crash but to find way that gives right coordinates to ReflowCellFrame inside SplitSpanningCells. Negative coordinates for the availHeight are not acceptable. Neither is it acceptable to print dozens of pages for the test case in bug 231823.





Eli, do you have an idea what I should do here?
(In reply to comment #26)
> Eli, do you have an idea what I should do here?
> 

Oh, wow; this code is pretty confusing.  The fix to bug 231823 was wrong.  The issue is that availHeight is used to mean two different incompatible things.  One is that it's used as the bottom coordinate of the row for SplitSpanningCells. The other is that it is used to mean the bottom coordinate of the rowgroup.  The patch in bug 231823 screws up both of these meanings.

I'll see if I can come up with a patch; that will be clearer than trying to explain it.
Since it now appears that the root cause of this bug appears to be bug 231823 and not any of the previously suspected currently non-public security bugs, should this bug be made public now?
Attachment #263051 - Attachment is obsolete: true
Attachment #263073 - Attachment is obsolete: true
Okay, so I tried basically backing out the patch for bug 231823.  Somehow both rows are getting continued onto each page.  This clearly shouldn't be happening, but I'm not familiar enough with the table code to understand why this is happening.
A bit more: SplitSpanningCells creates a continuation for the last row on the page.  However, when a row gets pushed off to the next page, we end up creating a continuation for that last row, which makes the row which got pushed off not the first row on the page.  Therefore, we end up creating a ton of continuations until we hit a continuation of the spanned cell that can't be split anymore.
Blocks: 231823
This is basically backing out bug 231823; I fixed a minor bug in one edge case (the "We were better off with the 1st call to SplitSpanningCells, do it again" case was using the wrong value).  The rest of this is simply variable renaming to make it clear what's going on.

In case I didn't make it clear before, the real cause of bug 231823 is at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/tables/nsTableRowGroupFrame.cpp&rev=3.383&mark=972#972
In the case where the last row on the page gets pushed off to the next page, this code still creates a continuation for the last row on the page, which shouldn't get one.

I'm not entirely sure how to go about fixing the issue, though.
>In the case where the last row on the page gets pushed off to the next page,
>this code still creates a continuation for the last row on the page,
Isn't that the job of UndoContinuedRow?

The whole continuation row business seems questionable to me, why don't we push the continued cell into the next row? (Should have asked this Chris in 2002).


And if we undo we could just loop over the row and look for cells that have a previnflow.


(In reply to comment #32)
> The whole continuation row business seems questionable to me, why don't we push
> the continued cell into the next row? (Should have asked this Chris in 2002).

The point of this code is to deal with the issue that if a row has a child that continues onto the next page, but the row itself doesn't continue onto the next page, pushing the cells into the overflow doesn't work.

I actually have an idea for one way to fix this; I'll throw together a patch soon.
(In reply to comment #33)
> I actually have an idea for one way to fix this; I'll throw together a patch
> soon.

Never mind; the whole thing is way too complicated for me to trust it.
Fantasai is doing a lot of work in bug 379349 about printing and page breaks.  So far he has not looked at tables over there but seem to be getting ready to.
(In reply to comment #36)
> Created an attachment (id=266188) [details]
> patch that does not regress 231823
> 

Works great in avoiding the crash and wasting less space on tinderbox pages.

Not only does it not regress bug 231823, but it fixes the issue remaining from bug 231823 that both testcases were previously printing 2 pages, the second one being blank.  That no longer occurs with this patch.
Attached file test case with border
Attachment #269569 - Flags: review?(sharparrow1)
Comment on attachment 269569 [details] [diff] [review]
accumulated patch

Did you mean to include the nsTableFrame.cpp change?  It's fine either way, but I don't think it affects this testcase.

>-              // We're not on top of the page, so put the row on the next page to give it more height 
>-              rowIsOnPage = PR_FALSE;
>+              if (rowMetrics.height > availSize.height) { 
>+                // We're not on top of the page, so put the row on the next page to give it more height 
>+                rowIsOnPage = PR_FALSE;
>+              }

Is there some reason why the containing if statement shouldn't be "if (rowMetrics.height > availSize.height)" rather than "if (rowMetrics.height >= availSize.height)"?

Other than that, it looks good.
Blocks: 379468
I don't crash on Windows, with an up to date 1.8 branch build. Is there a reason for this bug to stay security sensitive? If I understand things correctly it's a trunk-only null deref regression?
Comment on attachment 269569 [details] [diff] [review]
accumulated patch

Marking r- until I get an answer to my question.
Attachment #269569 - Flags: review?(sharparrow1) → review-
>Did you mean to include the nsTableFrame.cpp change?
Yes, the code is wrong otherwise. I found it while I was auditing the code for this bug.

Is there some reason why the containing if statement shouldn't be "if
(rowMetrics.height > availSize.height)" rather than "if (rowMetrics.height >=
availSize.height)"?

Yes, if rowMetrics.height == availSize.height it still fits (perfectly) so why should it be pushed?


Comment on attachment 269569 [details] [diff] [review]
accumulated patch

asking again for review
Attachment #269569 - Flags: review- → review?(sharparrow1)
Attachment #269569 - Flags: review?(sharparrow1) → review+
Attachment #276272 - Flags: superreview?(dbaron)
Comment on attachment 276272 [details] [diff] [review]
patch adapted to eli's comment

>Index: nsTableFrame.cpp
>       // Special handling for incomplete children
>-      if (NS_FRAME_IS_NOT_COMPLETE(aStatus)) {         
>+      if (NS_FRAME_IS_NOT_COMPLETE(aStatus) && isPaginated) {         

And what if it's not paginated?

>Index: nsTableRowFrame.cpp
>   aCellFrame->SetSize(
>     nsSize(cellSize.width, fullyComplete ? aAvailableHeight : desiredSize.height));
> 
>   // XXX What happens if this cell has 'vertical-align: baseline' ?
>   // XXX Why is it assumed that the cell's ascent hasn't changed ?
>   if (fullyComplete) {
>+    desiredSize.height = aAvailableHeight;
>     aCellFrame->VerticallyAlignChild(mMaxCellAscent);
>   }

Why not move this addition earlier (into a different if() checking the same condition) so you can avoid the ?: inside the parameter to the SetSize call?
Comment on attachment 276272 [details] [diff] [review]
patch adapted to eli's comment

Finally got to reading the rest of this; I didn't review closely, but it seems ok.

But marking sr- pending answer to the first question in comment 48.
Attachment #276272 - Flags: superreview?(dbaron) → superreview-
Whiteboard: [dbaron-1.9:Rw]
I removed the first chunk. I added it initially, because I was concerned that we will try to split the table in non paginated mode. I am not sure thats a really good idea so its probably better to wait if this is really an issue.
Attachment #290038 - Flags: superreview?(dbaron)
Comment on attachment 290038 [details] [diff] [review]
patch to incorporate Davids comments

sr=dbaron
Attachment #290038 - Flags: superreview?(dbaron) → superreview+
fix checked in 
Status: NEW → RESOLVED
Closed: 16 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Verified fixed, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b2pre) Gecko/2007120405 Minefield/3.0b2pre

No crash anymore with any of the testcases or the tinderbox page on print preview.
Status: RESOLVED → VERIFIED
No longer blocks: 379468
Depends on: 379468
Depends on: 422249
No longer depends on: 379468
Depends on: 409084
Crash Signature: [@ nsTableRowGroupFrame::SplitSpanningCells]
Landed the three crashtests:
https://hg.mozilla.org/integration/mozilla-inbound/rev/328b65f2efa0
Group: core-security
Flags: in-testsuite? → in-testsuite+
You need to log in before you can comment on or make changes to this bug.