Last Comment Bug 313295 - hang due to many invalid col elements
: hang due to many invalid col elements
Status: RESOLVED WORKSFORME
: hang, testcase
Product: Core
Classification: Components
Component: Layout: Tables (show other bugs)
: Trunk
: x86 Windows XP
: -- critical with 1 vote (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
:
Mentors:
http://www.standard-comm.co.jp/login/...
Depends on: 357729 352461 366382 366892 420557
Blocks: 271789
  Show dependency treegraph
 
Reported: 2005-10-21 11:28 PDT by Masayuki Nakano [:masayuki] (Mozilla Japan)
Modified: 2009-07-21 22:31 PDT (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase(hung up) (115.57 KB, text/html; charset=Shift_JIS)
2005-10-21 11:28 PDT, Masayuki Nakano [:masayuki] (Mozilla Japan)
no flags Details
testcase2(same above, but it doesn't have col element) (104.47 KB, text/html; charset=Shift_JIS)
2005-10-21 11:29 PDT, Masayuki Nakano [:masayuki] (Mozilla Japan)
no flags Details
reduced testcase (475 bytes, text/html)
2005-10-22 10:37 PDT, Bernd
no flags Details
Profile of "reduced testcase" (76.91 KB, text/html)
2005-10-27 11:36 PDT, Boris Zbarsky [:bz] (still a bit busy)
no flags Details
patch to close the AssignNonPctColumns hoole (2.96 KB, patch)
2005-10-28 00:01 PDT, Bernd
no flags Details | Diff | Splinter Review
paranthese precedence (2.96 KB, patch)
2005-10-28 00:04 PDT, Bernd
bzbarsky: review+
bzbarsky: superreview+
Details | Diff | Splinter Review
patch (13.14 KB, patch)
2005-10-30 00:53 PDT, Bernd
no flags Details | Diff | Splinter Review
revised patch (16.08 KB, patch)
2005-11-02 11:44 PST, Bernd
bzbarsky: review+
bzbarsky: superreview+
Details | Diff | Splinter Review
Zipped-up profile of original testcase (331.81 KB, application/zip)
2005-11-04 12:44 PST, Boris Zbarsky [:bz] (still a bit busy)
no flags Details

Description Masayuki Nakano [:masayuki] (Mozilla Japan) 2005-10-21 11:28:10 PDT
If a table has many invalid col element, hang up.

<table>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
<col span="7"><tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td>
...
Comment 1 Masayuki Nakano [:masayuki] (Mozilla Japan) 2005-10-21 11:28:59 PDT
Created attachment 200369 [details]
testcase(hung up)
Comment 2 Masayuki Nakano [:masayuki] (Mozilla Japan) 2005-10-21 11:29:45 PDT
Created attachment 200370 [details]
testcase2(same above, but it doesn't have col element)
Comment 3 Bernd 2005-10-22 10:37:31 PDT
Created attachment 200453 [details]
reduced testcase
Comment 4 agarberisoft 2005-10-25 10:00:55 PDT
Probably there is a wider problem wih tables, see this page:
http://animeoltre.altervista.org/listone.html

I get 100% of CPU load and a non-responsive system until it is fully loaded or I click on the "Stop" button.
Moreover the page takes a lot of time to load.

Tested with IE 6.0 and no problems with it.

Using Firefox 1.5 Beta 2 (English) on Windows 2000 SP4 (English)
Pentium 3 Mobile 1 GHz with 256 MB RAM
Comment 5 Bernd 2005-10-25 10:39:11 PDT
there is one reason, why I hate these performance bug, there is always somebody who attaches some url to the bugs (oh this page is also slow). The trick is break this down to a managable testcase
Comment 6 agarberisoft 2005-10-25 14:07:10 PDT
Sorry, I didn't know I couldn't post an URL as a test case!

Really, I didn't know, sorry again.
Comment 7 Bernd 2005-10-26 13:21:15 PDT
You can and should of course posts URL's, the question is more how usefull is the URL. This bug is very usefull (for me at least) because it demonstrates a single failure. So a good performance bug has a *reduced* testcase that points to a single code weakness. We have the "large tables are slow" garbage bin already for all not so specific cases. When in doubt attach it to that bug. If you are however certain that your testcase does point to a single weakness, then PLEASE file a bug with that testcase. You probably can't imagine how seldom and valuable are such "reduced" performance testcases. If your testcase can be reduced to 2-3k of code and is still slow, you know that you are there.

(this is the political correct version of comment 5)
Comment 8 Bernd 2005-10-27 11:29:30 PDT
Boris could you jprof the reduced testcase.
Comment 9 Boris Zbarsky [:bz] (still a bit busy) 2005-10-27 11:36:02 PDT
Created attachment 201014 [details]
Profile of "reduced testcase"

Total hit count: 339561
Count %Total  Function Name
320047   94.3     nsCellMap::GetDataAt(nsTableCellMap&, int, int, int)
17399   5.1     __i686.get_pc_thunk.bx

There were 337415 hits under nsCellMap::GetDataAt.  Of those, GetDataAt was called from nsCellMap::RowIsSpannedInto 223663 times, and from nsCellMap::GetCellInfoAt 113745 times.

The ultimate caller (under reflow(), of course) of RowIsSpannedInto is nsTableRowGroupFrame::CalculateRowHeights.

The ultimate caller (again, under reflow()) of GetCellInfoAt is BasicTableLayoutStrategy::AssignNonPctColumnWidths
Comment 10 Boris Zbarsky [:bz] (still a bit busy) 2005-10-27 11:37:44 PDT
Oh, and the caller of AssignNonPctColumnWidths is BasicTableLayoutStrategy::Initialize.
Comment 11 Bernd 2005-10-27 23:59:51 PDT
taking
Comment 12 Bernd 2005-10-28 00:01:23 PDT
Created attachment 201090 [details] [diff] [review]
patch to close the AssignNonPctColumns hoole

The number of effective columns tells us allready where is the last column where we should expect a originating cell.
Comment 13 Bernd 2005-10-28 00:04:41 PDT
Created attachment 201092 [details] [diff] [review]
paranthese precedence
Comment 14 Boris Zbarsky [:bz] (still a bit busy) 2005-10-28 05:28:28 PDT
bernd, what does GetEffectiveColCount() return in this case, in terms of the markup?
Comment 15 Bernd 2005-10-28 09:26:05 PDT
1 as we have only one cell, GetEffectiveColCount gives the number of  cols that we need to cover the cells back. see http://www.mozilla.org/newlayout/doc/table-layout.html Effective Columns
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2005-10-28 10:31:28 PDT
Comment on attachment 201092 [details] [diff] [review]
paranthese precedence

So the effect is to just put a |if (colX < numEffCols)| around that whole for loop, right?  Could you just do that?  No need to evaluate this condition for every single iteration for those cases that colX < numEffCols.

r+sr=bzbarsky with that change.
Comment 17 Bernd 2005-10-29 07:40:27 PDT
fix checked in, Boris could you please verify with another profile that there is no other hotspot.
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2005-10-29 08:00:42 PDT
That patch fixed the BasicTableLayoutStrategy::AssignNonPctColumnWidths callstack.  The profile now shows:

Total hit count: 128383
Count %Total  Function Name
107259   83.5     nsCellMap::GetDataAt(nsTableCellMap&, int, int, int)
12997   10.1     __libc_poll
5964   4.6     __i686.get_pc_thunk.bx

with all calls to nsCellMap::GetDataAt coming from nsCellMap::RowIsSpannedInto, called from nsTableRowGroupFrame::CalculateRowHeights.

Should that be spun off into a different bug?
Comment 19 Bernd 2005-10-29 08:58:49 PDT
Boris,I like this bug, so what I will do is: to provide the cellMap methods a optional argument numEffCols if we loop over the table. The question is only should I do that as two functions which doubles the code, or as pointer which defaults to nsNull? I am not sure whether http://lxr.mozilla.org/seamonkey/source/layout/tables/nsTableRowGroupFrame.h#273 and its optional argument is still political correct (aka reviewable).
Comment 20 Boris Zbarsky [:bz] (still a bit busy) 2005-10-29 09:11:20 PDT
You could also do it as PRInt32 that defaults to -1 for unbounded, no?  That is, is -1 ever a valid value for the effective col count?
Comment 21 Bernd 2005-10-30 00:53:45 PDT
Created attachment 201313 [details] [diff] [review]
patch

Thats basically the same idea as in the patch above only inside calcrowheights.
Comment 22 Boris Zbarsky [:bz] (still a bit busy) 2005-10-30 05:50:05 PST
Comment on attachment 201313 [details] [diff] [review]
patch

>Index: nsCellMap.cpp
>+PRBool nsTableCellMap::RowIsSpannedInto(PRInt32 >+      if (aNumEffCols < 0)
>+        return cellMap->RowIsSpannedInto(*this, rowIndex);
>+      return cellMap->RowIsSpannedInto(*this, rowIndex, aNumEffCols);

Why bother with the if?  nsCellMap::RowIsSpannedInto does its own check for negative aNumEffCols, so you should just call 

  return cellMap->RowIsSpannedInto(*this, rowIndex, aNumEffCols);

here unconditionally, I think.

>+PRBool nsTableCellMap::RowHasSpanningCells(PRInt32 aRowIndex,

Similar here.

>Index: nsTableFrame.cpp
>+PRBool nsTableFrame::RowHasSpanningCells(PRInt32 aRowIndex, PRInt32 aNumEffCols)

Similar here.

>+PRBool nsTableFrame::RowIsSpannedInto(PRInt32 aRowIndex, PRInt32 aNumEffCols)

And here.

>Index: nsTableRowGroupFrame.cpp
>+  PRInt32 numEffCols = tableFrame->GetEffectiveColCount();
...
>+      if (!tableFrame->RowIsSpannedInto(startRowIndex, numEffCols))

Are there ever times when we would NOT want to limit the search to the effective columns?  That is, should the GetEffectiveColCount() call be in nsTableFrame::RowIsSpannedInto?
Comment 23 Bernd 2005-11-02 11:44:21 PST
Created attachment 201656 [details] [diff] [review]
revised patch

I would like to avoid to move the call inside both methods as we will double the search, usually both functions go in pairs or are at least related so that external calling will do the work only once.
Comment 24 Bernd 2005-11-04 12:10:46 PST
fix checked in, Boris could you verify with a jprof that there is not hotspot left. If it is really gone, then I would like to proceed with bug 271789
Comment 25 Boris Zbarsky [:bz] (still a bit busy) 2005-11-04 12:43:46 PST
There are no more hotspots on the "reduced testcase".

On the "original testcase" I see (for the pageload):

Total hit count: 90534
Count %Total  Function Name
18472   20.4     __libc_poll
11665   12.9     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
5863   6.5     nsTableCellMap::GetEffectiveColSpan(int, int)
5335   5.9     nsTableCellMap::GetEffectiveRowSpan(int, int)
2956   3.3     nsTableFrame::GetNumCellsOriginatingInCol(int) const
2346   2.6     nsTableFrame::GetCellMap() const
2033   2.2     nsTableCellMap::GetCellInfoAt(int, int, int*, int*)

In all, 19259 hits are under nsTableFrame::GetEffectiveColCount, most of them under nsTableFrame::GetNumCellsOriginatingInCol, most of those in nsTableCellMap::GetNumCellsOriginatingInCol(int).
Comment 26 Boris Zbarsky [:bz] (still a bit busy) 2005-11-04 12:44:35 PST
Created attachment 201873 [details]
Zipped-up profile of original testcase
Comment 27 Bernd 2005-11-10 22:26:07 PST
Boris, did the patch in bug 271789 improve the jprof?
Comment 28 Boris Zbarsky [:bz] (still a bit busy) 2005-11-11 06:04:43 PST
> Boris, did the patch in bug 271789 improve the jprof?

Without that patch:

Total hit count: 81610
Count %Total  Function Name
13292   16.3     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
7188   8.8     nsTableCellMap::GetEffectiveColSpan(int, int)
6339   7.8     nsTableCellMap::GetEffectiveRowSpan(int, int)
3581   4.4     nsTableFrame::GetNumCellsOriginatingInCol(int) const
2926   3.6     nsTableFrame::GetCellMap() const
2481   3.0     nsTableCellMap::GetCellInfoAt(int, int, int*, int*)
2375   2.9     nsSplittableFrame::GetFirstInFlow() const

With that patch:

Total hit count: 92749
Count %Total  Function Name
18059   19.5     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
6726   7.3     nsTableCellMap::GetEffectiveRowSpan(int, int)
6703   7.2     nsTableCellMap::GetEffectiveColSpan(int, int)
4428   4.8     nsTableFrame::GetNumCellsOriginatingInCol(int) const
3617   3.9     nsTableFrame::GetCellMap() const
2811   3.0     nsSplittableFrame::GetFirstInFlow() const
2771   3.0     nsTableCellMap::GetCellInfoAt(int, int, int*, int*)
1796   1.9     nsTableFrame::GetEffectiveColCount() const

So no, not really...
Comment 29 Masayuki Nakano [:masayuki] (Mozilla Japan) 2005-12-01 05:54:27 PST
I cannot reproduce this bug. The patch was checked-in?
Comment 30 Bernd 2005-12-01 06:50:15 PST
see comment 24
Comment 31 Masayuki Nakano [:masayuki] (Mozilla Japan) 2005-12-01 07:10:25 PST
Thanks, but why this bug is still opening?
Comment 32 Boris Zbarsky [:bz] (still a bit busy) 2005-12-01 09:22:49 PST
The bug is still open because the original testcase is still pretty slow (see comment 25).
Comment 33 Bernd 2006-12-26 06:58:26 PST
Boris is this still an issue after the reflow branch landed?
Comment 34 Boris Zbarsky [:bz] (still a bit busy) 2007-01-08 19:13:55 PST
Well.  I still see a hang on that testcase and that hang is a good bit longer than for Opera.  So I guess this is an issue, yes.  ;)

The current profile shows:

Total hit count: 1037974
Count %Total  Function Name
472436   45.5     nsCellMap::GetRowCount(int) const
456911   44.0     nsTableCellMap::GetCellInfoAt(int, int, int*, int*) const

The calls to GetRowCount() come about equally from nsTableCellMap::GetCellInfoAt and from BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths.  All the calls to nsTableCellMap::GetCellInfoAt come from BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths.  There's a total of 980369 hits under BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths, so that accounts for most of the time.

I guess at least part of the issue with the GetRowCount() calls is that we make them once for every column in BasicTableLayoutStrategy.cpp.  That's easy enough to fix -- I filed bug 366382 on it.  I also tried applying the patch for bug 357729, which makes nsCellMap::GetRowCount() very fast, and then all the time is spent in nsTableCellMap::GetCellInfoAt...  So maybe we don't need that patch for bug 366382 after all.

What we _really_ need here, I suspect, is a fix for bug 352461.
Comment 35 Boris Zbarsky [:bz] (still a bit busy) 2007-01-12 22:31:24 PST
So I did a little more poking around, and now I see why GetRowCount was coming up so much.  This testcase has 947 <col> elements.  That means the parser also synthesizes 947 <tbody> tags (is that correct, even?).  So all that linked-list stuff in nsTableCellMap ends up being Really Slow if we do it.

In any case, with my prototype patch for bug 352461 I see something like:

Total hit count: 185709
Count %Total  Function Name
21968   11.8     nsTableCellMap::GetNumCellsOriginatingInCol(int) const
21928   11.8     nsTableCellMap::GetMapFor(nsTableRowGroupFrame&)

where GetMapFor is called by nsTableCellMap::Synchronize from nsTableFrame::InsertRowGroups.  This is slow probably because there are so many rowgroups and we start searching at the beginning.  I filed bug 366892 on this.

nsTableCellMap::GetNumCellsOriginatingInCol() is called by nsTableFrame::GetNumCellsOriginatingInCol(), which is called by nsTableFrame::GetEffectiveColCount(), which is called by nsTableRowGroupFrame::CalculateRowHeights().  This call is made _once_ in CalculateRowHeights.  So the problem is that we're calling CalculateRowHeights so many times.  :(  Well, that and in this case GetEffectiveColCount has to walk over almost all the cols -- GetColCount() returns 1456 while GetEffectiveColCount() returns 5.  Perhaps we could cache the return value in nsTableFrame and invalidate the cache when we change the cellmap?

I should note that this is still a vast improvement.  The 1037974 hits I was getting before were just me running the profiler for a bit then killing the app; this 185709 hits is for the full pageload.
Comment 36 Bernd 2007-01-12 22:44:27 PST
This testcase has 947 <col> elements.  That means the parser also
synthesizes 947 <tbody> tags (is that correct, even?). 

hell, NO. One tbody should be generated and thats it.
Comment 37 Boris Zbarsky [:bz] (still a bit busy) 2007-01-12 22:56:12 PST
Well.  The <col> tags come interspersed in between <tr> tags...  What _should_ happen with those <col>s?

Comment 38 Boris Zbarsky [:bz] (still a bit busy) 2007-01-21 20:36:52 PST
bernd, should we get a parser bug filed on the behavior here?  What behavior _do_ we want?
Comment 39 Bernd 2008-03-02 02:44:11 PST
I filed bug 420557 for the parser issue
Comment 40 Bernd 2008-05-04 11:22:39 PDT
There is no sense in owning this, once the parser side is fixed this needs be revisited.
Comment 41 Bernd 2009-07-21 12:09:16 PDT
we render the testcase rather fast < 2 sec, this is wfm with current trunk?
Comment 42 Masayuki Nakano [:masayuki] (Mozilla Japan) 2009-07-21 22:31:14 PDT
yeah, it's fine. thank you.

Note You need to log in before you can comment on or make changes to this bug.