Closed Bug 313295 Opened 19 years ago Closed 15 years ago

hang due to many invalid col elements

Categories

(Core :: Layout: Tables, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: masayuki, Unassigned)

References

(Depends on 1 open bug, )

Details

(Keywords: hang, testcase)

Attachments

(7 files, 2 obsolete files)

If a table has many invalid col element, hang up.

<table>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
...
Attached file reduced testcase
Summary: Hang up by many invalid col elements → many invalid col elements cause hang
Summary: many invalid col elements cause hang → hang due to many invalid col elements
Probably there is a wider problem wih tables, see this page:
http://animeoltre.altervista.org/listone.html

I get 100% of CPU load and a non-responsive system until it is fully loaded or I click on the "Stop" button.
Moreover the page takes a lot of time to load.

Tested with IE 6.0 and no problems with it.

Using Firefox 1.5 Beta 2 (English) on Windows 2000 SP4 (English)
Pentium 3 Mobile 1 GHz with 256 MB RAM
there is one reason, why I hate these performance bug, there is always somebody who attaches some url to the bugs (oh this page is also slow). The trick is break this down to a managable testcase
Sorry, I didn't know I couldn't post an URL as a test case!

Really, I didn't know, sorry again.
You can and should of course posts URL's, the question is more how usefull is the URL. This bug is very usefull (for me at least) because it demonstrates a single failure. So a good performance bug has a *reduced* testcase that points to a single code weakness. We have the "large tables are slow" garbage bin already for all not so specific cases. When in doubt attach it to that bug. If you are however certain that your testcase does point to a single weakness, then PLEASE file a bug with that testcase. You probably can't imagine how seldom and valuable are such "reduced" performance testcases. If your testcase can be reduced to 2-3k of code and is still slow, you know that you are there.

(this is the political correct version of comment 5)
Boris could you jprof the reduced testcase.
Total hit count: 339561
Count %Total  Function Name
320047   94.3     nsCellMap::GetDataAt(nsTableCellMap&, int, int, int)
17399   5.1     __i686.get_pc_thunk.bx

There were 337415 hits under nsCellMap::GetDataAt.  Of those, GetDataAt was called from nsCellMap::RowIsSpannedInto 223663 times, and from nsCellMap::GetCellInfoAt 113745 times.

The ultimate caller (under reflow(), of course) of RowIsSpannedInto is nsTableRowGroupFrame::CalculateRowHeights.

The ultimate caller (again, under reflow()) of GetCellInfoAt is BasicTableLayoutStrategy::AssignNonPctColumnWidths
Oh, and the caller of AssignNonPctColumnWidths is BasicTableLayoutStrategy::Initialize.
taking
Assignee: nobody → bernd_mozilla
The number of effective columns tells us allready where is the last column where we should expect a originating cell.
Attachment #201090 - Flags: review?(bzbarsky)
Attachment #201090 - Flags: superreview?(bzbarsky)
Attachment #201090 - Attachment is obsolete: true
Attachment #201092 - Flags: superreview?(bzbarsky)
Attachment #201092 - Flags: review?(bzbarsky)
Attachment #201090 - Flags: superreview?(bzbarsky)
Attachment #201090 - Flags: review?(bzbarsky)
bernd, what does GetEffectiveColCount() return in this case, in terms of the markup?
1 as we have only one cell, GetEffectiveColCount gives the number of  cols that we need to cover the cells back. see http://www.mozilla.org/newlayout/doc/table-layout.html Effective Columns
Comment on attachment 201092 [details] [diff] [review]
paranthese precedence

So the effect is to just put a |if (colX < numEffCols)| around that whole for loop, right?  Could you just do that?  No need to evaluate this condition for every single iteration for those cases that colX < numEffCols.

r+sr=bzbarsky with that change.
Attachment #201092 - Flags: superreview?(bzbarsky)
Attachment #201092 - Flags: superreview+
Attachment #201092 - Flags: review?(bzbarsky)
Attachment #201092 - Flags: review+
fix checked in, Boris could you please verify with another profile that there is no other hotspot.
That patch fixed the BasicTableLayoutStrategy::AssignNonPctColumnWidths callstack.  The profile now shows:

Total hit count: 128383
Count %Total  Function Name
107259   83.5     nsCellMap::GetDataAt(nsTableCellMap&, int, int, int)
12997   10.1     __libc_poll
5964   4.6     __i686.get_pc_thunk.bx

with all calls to nsCellMap::GetDataAt coming from nsCellMap::RowIsSpannedInto, called from nsTableRowGroupFrame::CalculateRowHeights.

Should that be spun off into a different bug?
Boris,I like this bug, so what I will do is: to provide the cellMap methods a optional argument numEffCols if we loop over the table. The question is only should I do that as two functions which doubles the code, or as pointer which defaults to nsNull? I am not sure whether http://lxr.mozilla.org/seamonkey/source/layout/tables/nsTableRowGroupFrame.h#273 and its optional argument is still political correct (aka reviewable).
You could also do it as PRInt32 that defaults to -1 for unbounded, no?  That is, is -1 ever a valid value for the effective col count?
Attached patch patch (obsolete) — Splinter Review
Thats basically the same idea as in the patch above only inside calcrowheights.
Attachment #201313 - Flags: superreview?(bzbarsky)
Attachment #201313 - Flags: review?(bzbarsky)
Comment on attachment 201313 [details] [diff] [review]
patch

>Index: nsCellMap.cpp
>+PRBool nsTableCellMap::RowIsSpannedInto(PRInt32 >+      if (aNumEffCols < 0)
>+        return cellMap->RowIsSpannedInto(*this, rowIndex);
>+      return cellMap->RowIsSpannedInto(*this, rowIndex, aNumEffCols);

Why bother with the if?  nsCellMap::RowIsSpannedInto does its own check for negative aNumEffCols, so you should just call 

  return cellMap->RowIsSpannedInto(*this, rowIndex, aNumEffCols);

here unconditionally, I think.

>+PRBool nsTableCellMap::RowHasSpanningCells(PRInt32 aRowIndex,

Similar here.

>Index: nsTableFrame.cpp
>+PRBool nsTableFrame::RowHasSpanningCells(PRInt32 aRowIndex, PRInt32 aNumEffCols)

Similar here.

>+PRBool nsTableFrame::RowIsSpannedInto(PRInt32 aRowIndex, PRInt32 aNumEffCols)

And here.

>Index: nsTableRowGroupFrame.cpp
>+  PRInt32 numEffCols = tableFrame->GetEffectiveColCount();
...
>+      if (!tableFrame->RowIsSpannedInto(startRowIndex, numEffCols))

Are there ever times when we would NOT want to limit the search to the effective columns?  That is, should the GetEffectiveColCount() call be in nsTableFrame::RowIsSpannedInto?
Attached patch revised patchSplinter Review
I would like to avoid to move the call inside both methods as we will double the search, usually both functions go in pairs or are at least related so that external calling will do the work only once.
Attachment #201313 - Attachment is obsolete: true
Attachment #201656 - Flags: superreview?(bzbarsky)
Attachment #201656 - Flags: review?(bzbarsky)
Attachment #201313 - Flags: superreview?(bzbarsky)
Attachment #201313 - Flags: review?(bzbarsky)
Attachment #201656 - Flags: superreview?(bzbarsky)
Attachment #201656 - Flags: superreview+
Attachment #201656 - Flags: review?(bzbarsky)
Attachment #201656 - Flags: review+
Blocks: 271789
fix checked in, Boris could you verify with a jprof that there is not hotspot left. If it is really gone, then I would like to proceed with bug 271789
There are no more hotspots on the "reduced testcase".

On the "original testcase" I see (for the pageload):

Total hit count: 90534
Count %Total  Function Name
18472   20.4     __libc_poll
11665   12.9     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
5863   6.5     nsTableCellMap::GetEffectiveColSpan(int, int)
5335   5.9     nsTableCellMap::GetEffectiveRowSpan(int, int)
2956   3.3     nsTableFrame::GetNumCellsOriginatingInCol(int) const
2346   2.6     nsTableFrame::GetCellMap() const
2033   2.2     nsTableCellMap::GetCellInfoAt(int, int, int*, int*)

In all, 19259 hits are under nsTableFrame::GetEffectiveColCount, most of them under nsTableFrame::GetNumCellsOriginatingInCol, most of those in nsTableCellMap::GetNumCellsOriginatingInCol(int).
Attachment #201873 - Attachment is patch: false
Attachment #201873 - Attachment mime type: text/plain → application/zip
Boris, did the patch in bug 271789 improve the jprof?
> Boris, did the patch in bug 271789 improve the jprof?

Without that patch:

Total hit count: 81610
Count %Total  Function Name
13292   16.3     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
7188   8.8     nsTableCellMap::GetEffectiveColSpan(int, int)
6339   7.8     nsTableCellMap::GetEffectiveRowSpan(int, int)
3581   4.4     nsTableFrame::GetNumCellsOriginatingInCol(int) const
2926   3.6     nsTableFrame::GetCellMap() const
2481   3.0     nsTableCellMap::GetCellInfoAt(int, int, int*, int*)
2375   2.9     nsSplittableFrame::GetFirstInFlow() const

With that patch:

Total hit count: 92749
Count %Total  Function Name
18059   19.5     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
6726   7.3     nsTableCellMap::GetEffectiveRowSpan(int, int)
6703   7.2     nsTableCellMap::GetEffectiveColSpan(int, int)
4428   4.8     nsTableFrame::GetNumCellsOriginatingInCol(int) const
3617   3.9     nsTableFrame::GetCellMap() const
2811   3.0     nsSplittableFrame::GetFirstInFlow() const
2771   3.0     nsTableCellMap::GetCellInfoAt(int, int, int*, int*)
1796   1.9     nsTableFrame::GetEffectiveColCount() const

So no, not really...
I cannot reproduce this bug. The patch was checked-in?
see comment 24
Thanks, but why this bug is still opening?
The bug is still open because the original testcase is still pretty slow (see comment 25).
Boris is this still an issue after the reflow branch landed?
Well.  I still see a hang on that testcase and that hang is a good bit longer than for Opera.  So I guess this is an issue, yes.  ;)

The current profile shows:

Total hit count: 1037974
Count %Total  Function Name
472436   45.5     nsCellMap::GetRowCount(int) const
456911   44.0     nsTableCellMap::GetCellInfoAt(int, int, int*, int*) const

The calls to GetRowCount() come about equally from nsTableCellMap::GetCellInfoAt and from BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths.  All the calls to nsTableCellMap::GetCellInfoAt come from BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths.  There's a total of 980369 hits under BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths, so that accounts for most of the time.

I guess at least part of the issue with the GetRowCount() calls is that we make them once for every column in BasicTableLayoutStrategy.cpp.  That's easy enough to fix -- I filed bug 366382 on it.  I also tried applying the patch for bug 357729, which makes nsCellMap::GetRowCount() very fast, and then all the time is spent in nsTableCellMap::GetCellInfoAt...  So maybe we don't need that patch for bug 366382 after all.

What we _really_ need here, I suspect, is a fix for bug 352461.
Depends on: 352461, 357729, 366382
So I did a little more poking around, and now I see why GetRowCount was coming up so much.  This testcase has 947 <col> elements.  That means the parser also synthesizes 947 <tbody> tags (is that correct, even?).  So all that linked-list stuff in nsTableCellMap ends up being Really Slow if we do it.

In any case, with my prototype patch for bug 352461 I see something like:

Total hit count: 185709
Count %Total  Function Name
21968   11.8     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
21928   11.8     nsTableCellMap::GetMapFor(nsTableRowGroupFrame&)

where GetMapFor is called by nsTableCellMap::Synchronize from nsTableFrame::InsertRowGroups.  This is slow probably because there are so many rowgroups and we start searching at the beginning.  I filed bug 366892 on this.

nsTableCellMap::GetNumCellsOriginatingInCol() is called by nsTableFrame::GetNumCellsOriginatingInCol(), which is called by nsTableFrame::GetEffectiveColCount(), which is called by nsTableRowGroupFrame::CalculateRowHeights().  This call is made _once_ in CalculateRowHeights.  So the problem is that we're calling CalculateRowHeights so many times.  :(  Well, that and in this case GetEffectiveColCount has to walk over almost all the cols -- GetColCount() returns 1456 while GetEffectiveColCount() returns 5.  Perhaps we could cache the return value in nsTableFrame and invalidate the cache when we change the cellmap?

I should note that this is still a vast improvement.  The 1037974 hits I was getting before were just me running the profiler for a bit then killing the app; this 185709 hits is for the full pageload.
Depends on: 366892
This testcase has 947 <col> elements.  That means the parser also
synthesizes 947 <tbody> tags (is that correct, even?). 

hell, NO. One tbody should be generated and thats it.
Well.  The <col> tags come interspersed in between <tr> tags...  What _should_ happen with those <col>s?

Attachment #200369 - Attachment mime type: text/html → text/html; charset=Shift_JIS
Attachment #200370 - Attachment mime type: text/html → text/html; charset=Shift_JIS
bernd, should we get a parser bug filed on the behavior here?  What behavior _do_ we want?
Depends on: 420557
I filed bug 420557 for the parser issue
There is no sense in owning this, once the parser side is fixed this needs be revisited.
Assignee: bernd_mozilla → nobody
Keywords: testcase
we render the testcase rather fast < 2 sec, this is wfm with current trunk?
yeah, it's fine. thank you.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: