Open
Bug 363698
Opened 18 years ago
Updated 2 years ago
firefox chews CPU on large table
Categories
(Core :: Layout: Tables, defect, P3)
Core
Layout: Tables
Tracking
()
NEW
People
(Reporter: eagle.lu, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: perf)
Attachments
(1 file)
6.99 KB,
patch
|
Details | Diff | Splinter Review |
Go to
http://cvs.gnome.org/viewcvs/gtk%2B/ChangeLog.pre-2-8?rev=1.7010&view=log
firefox comsumes over 90% of CPU for over 6 minutes.
This bug occurs both on Linux and OpenSolaris
Updated•18 years ago
|
Product: Firefox → Core
QA Contact: general → general
Comment 1•18 years ago
|
||
It's taking forever for me on Windows.
But how is this different than any of these bugs: bug 54542, bug 318474, bug 352367.
Tables that large are just not wise.
OS: Linux → All
Hardware: Sun → All
Updated•17 years ago
|
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Summary: firefox chews CPU → firefox chews CPU on large table
Comment 3•17 years ago
|
||
Please don't ever mark performance bugs as duplicates unless:
1) They have an identical small reduced testcase.
or
2) There is profile data indicating that they are in fact duplicates.
If you think they might be duplicates, mark a dependency.
In particular, bug 318474 is more or less useless, so marking something duplicate of it is a good way to make sure that nothing happens with it.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 4•17 years ago
|
||
So a profile shows an obvious difference from the testcase in bug 318474: there are hotspots in GetRowCount and GetEffectiveRowSpan:
Total hit count: 1522412
Count %Total Function Name
490927 32.2 nsCellMap::GetRowCount(int) const
282600 18.6 nsTableCellMap::GetEffectiveRowSpan(int, int) const
94102 6.2 nsTableCellMap::GetEffectiveColSpan(int, int) const
Looking at the testcase, it looks like in addition to there being lots of rows, there are actually 17950 tbodies, which I suspect is what's triggering the above hotspots.
Comment 5•17 years ago
|
||
I wonder whether we could store the nsCellMaps in an array instead of a linked list, store an up-to-date start index for each one, and do binary search... Right now, about 30% of the time on this testcase is spent under nsTableFrame::GetEffectiveRowSpan. Each call is O(N) in number of rowgroups, and we have more or less one rowgroup per 2-3 rows, so overall table layout is O(N^2) in number of rows...
Status: REOPENED → NEW
Component: General → Layout: Tables
Keywords: perf
QA Contact: general → layout.tables
Comment 6•17 years ago
|
||
Alternately, we'd want to keep more state in the various callers of GetEffectiveRowSpan, so we don't have to keep recomputing the nsCellMap a given cell lives in. Especially because the answer is the same for all cells in a given row!
Comment 7•17 years ago
|
||
So just to keep track of things, the main possible callstacks into nsCellMap::GetRowCount are (with hit counts along the way, and branches that are considered elsewhere marked with a 'R' after the count, for "repeat"):
490927 nsCellMap::GetRowCount
156251 nsTableCellMap::GetEffectiveRowSpan
126746 nsTableFrame::GetEffectiveRowSpan(nsTableCellFrame&, nsCellMap*)
121623 nsTableRowFrame::CalculateCellActualSize
60921 nsTableRowFrame::ReflowChildren
252972 nsTableRowFrame::Reflow
53397 nsTableRowFrame::CalcHeight
84194 R nsTableRowFrame::ReflowChildren
55093 nsTableRowFrame::InitHasCellWithStyleHeight
55093 R nsTableRowFrame::Reflow
54241 GetHeightOfRowsSpannedBelowFirst
54241 nsTableRowFrame::DidResize
54241 nsTableRowGroupFrame::DidResizeRows
54241 nsTableRowGroupFrame::CalculateRowHeights
170488 nsTableRowGroupFrame::ReflowChildren
32813 nsTableRowFrame::UpdateHeight
30797 R nsTableRowFrame::CalcHeight
2017 R nsTableRowFrame::ReflowChildren
29505 nsTableFrame::GetEffectiveRowSpan(int, nsTableCellFrame&)
65164 R nsTableRowGroupFrame::CalculateRowHeights
137025 R nsTableFrame::GetEffectiveRowSpan(nsTableCellFrame&, nsCellMap*)
57082 nsTableFrame::GetEffectiveColSpan(nsTableCellFrame&, nsCellMap*)
105840 R nsTableRowFrame::ReflowChildren
48758 nsTableCellMap::GetEffectiveColSpan
28578 R nsTableFrame::GetEffectiveColSpan(nsTableCellFrame&, nsCellMap*)
35659 R nsTableFrame::GetEffectiveRowSpan(int, nsTableCellFrame&)
490927 nsCellMap::GetRowCount
156251 nsTableCellMap::GetEffectiveRowSpan
126746 nsTableFrame::GetEffectiveRowSpan(nsTableCellFrame&, nsCellMap*)
121623 nsTableRowFrame::CalculateCellActualSize
60921 nsTableRowFrame::ReflowChildren
252972 nsTableRowFrame::Reflow
so this means for given row where we could once compute to which cellmaps it belongs, instead we search for the cellmap over and over again, by going the long way instead of supplying the cellmap at http://lxr.mozilla.org/seamonkey/source/layout/tables/nsTableRowFrame.cpp#631 but supplying null. Silly. More than this the cellmap could be computed at the rowgroup level above and be supplied to all its rows.
Comment 9•17 years ago
|
||
For my reference when testing whether performance has improved: I let this page load from local disk for about two hours, and it's still not done.
Comment 10•17 years ago
|
||
Bernd, that's exactly what I think we should be doing. Computing in nsTableFrame or nsTableRowGroupFrame which cellmap is the relevant one and then making use of it throughout.
Not sure what the cleanest way to do this is, yet. Especially given a table split over multiple pages.
Comment 11•17 years ago
|
||
Our in other words
http://lxr.mozilla.org/seamonkey/search?string=GetEffectiveRowSpan
nearly all hits are potential performance issues. Only border collapse uses the fast way :-(
Comment 12•17 years ago
|
||
The way border collapse uses is buggy right now. It gives the wrong answer if there is more than one rowgroup.
Comment 13•17 years ago
|
||
It looks like we have an nsTableRowGroupFrame::GetStartRowIndex... Is that guaranteed to return the same thing as adding up the nsCellMap::GetRowCount for the preceding rowgroups's cellmaps?
Comment 14•17 years ago
|
||
For that matter, would it make sense to just store the start index on nsCellMap ahd have a pointer from nsTableRowGroupFrame to the corresponding nsCellMap?
Comment 15•17 years ago
|
||
The reason I ask the question in comment 14 is that nsTableRowGroupFrame::GetStartRowIndex seems to be a lot more complicated than it really should be, somehow...
Comment 16•17 years ago
|
||
>nsTableRowGroupFrame::GetStartRowIndex seems to be a lot more complicated than
>it really should be, somehow...
Which part? The first one which is lame protection against wrong frame construction, where we should rather assert and die or the second which handles empty rowgroups?
Comment 17•17 years ago
|
||
Both. Why not just store the row index of the first row in the rowgroup directly?
Comment 18•17 years ago
|
||
I think thats a wrong design decision from the time (1998) where frames should have been as lean as possible, so the idea was probably to store the index only once in the row and then to look it up. Then lame protection was necessary at that time 3.55 buster%netscape.com 1998-09-15 13:36 nsTableRowGroupFrame no longer assumes all its children are rows, or that all row children are cells.
Storing means syncing it all the time, also when rows are removed from other rowgroups that are before. This should be done inside nsTableRowGroupFrame::AdjustRowIndices
Comment 19•17 years ago
|
||
Hmm. Rowgroups can split, right? But the cellmap lives entirely on the first-in-flow table, correct?
How is this stuff supposed to work, exactly? It looks to me like adding rows to a not-first-in-flow rowgroup currently just corrupts the cellmap. We'll try to
cellMap->InsertRows(aRowGroupFrame, aRowFrames, aRowIndex, aConsiderSpans,
damageArea);
Where aRowGroupFrame is the in-flow rowgroup, and then InsertRows() will fail to find the corresponding cellmap and do nothing at all.
Or am I missing something?
Comment 20•17 years ago
|
||
The reason this matters is that for the functions being called here we don't need a rowgroup-relative index. We need an nsCellMap-relative one.
I suppose we could just store the start of each nsCellMap directly in the nsCellMap (which would still be broken with splitting, but whatever) and then make the nsCellMap list an array so we can binary search to find the right cellmap? That would make this pageload O(N^2 log N) instead of the current O(N^3) in number of rows.... We could make it O(N^2) if we Just Knew in O(1) time which nsCellMap to use, and with what index, of course.
Another option is to not change anything about how nsTableRowGroupFrame works apart from having a pointer from the nsTableRowGroupFrame to the relevant nsCellMap, and store the start row index in the nsCellMap. That might be the simplest thing to do here...
Instead of changing the mess that is the current rowgroup code (and then dragging along all the information needed
Comment 21•17 years ago
|
||
So the issue is that maintaining the rowgroup-to-cellmap pointer is a little weird when we have continuation rowgroups. Ideally those would share the same cellmap pointer as the first-in-flow. But we do in fact give a (semi-bogus?) cellmap to the in-flows of a table, so I'm not quite sure, for example, whether in-flow rowgroups would end up going through nsTableFrame::InsertRowGroups as they get pulled.
Comment 22•17 years ago
|
||
I guess I'll assume they don't since that code uses the first-in-flow's cellmap?
Comment 23•17 years ago
|
||
Man. All the InsertRows, etc stuff is just so broken with continuations, since it derives indices based on the frame tree (the nsRowGroupFrame methods) and then uses them to index into the cellmap (which uses first-in-flow-table-relative indexing throughout).
I really hope we don't split tables over columns...
Comment 24•17 years ago
|
||
Stopping work on this for now, and praying that someone will tell me that I'm wrong and that this stuff somehow works.... and explaining how.
Comment 25•17 years ago
|
||
>I really hope we don't split tables over columns...
We don't do this, it crashes within seconds if Mars ehm Jesse attacks. I wrote a special barrier against this (bug 362275).
Comment 26•17 years ago
|
||
see https://bugzilla.mozilla.org/show_bug.cgi?id=362275#c8 for a description of the situation
Comment 27•17 years ago
|
||
> it crashes within seconds
Ok. That's consistent with my reading of this code, too. OK.
In that case I guess we can just not worry about any of the updating stuff working with in-flows and file followup bugs?
Comment 28•17 years ago
|
||
To be precise: the basic assumption of the pagination code inside tables is that it will happen only during printing and NEVER incremental. (See http://lxr.mozilla.org/seamonkey/source/layout/tables/nsTableRowGroupFrame.cpp#1032). If we decide to change this to be compatible with columns it requires a lot of changes. I think this out for the coming release. Having postponed it, one should seriously think where we are heading I would like to point to roc's comment https://bugzilla.mozilla.org/show_bug.cgi?id=397428#c14
"The continuation model itself is a flawed design."
On the other hand, I have seen the previous big rearchitecture attempt with Chris Karnaze and John Keiser in 2002/2003 and it virtually stopped all the work on this code and got never into trunk.
Updated•17 years ago
|
Flags: blocking1.9?
Flags: blocking1.9? → blocking1.9-
Comment 29•16 years ago
|
||
I would like to help with this test case:
http://rty24.webhelye.hu/editReportXml.jsp.html
Comment 30•16 years ago
|
||
This does seem to have somewhat regressed since Fx 2 but the testcase above is a really, really complex example with over 7,000 lines of nested tables.
Comment 31•16 years ago
|
||
(In reply to comment #30)
> This does seem to have somewhat regressed since Fx 2 but the testcase above is
> a really, really complex example with over 7,000 lines of nested tables.
>
"the testcase above is a really, really complex example with over 7,000 lines of nested tables" that can be rendered within seconds with Fx2 but completely locks up Fx3 for minutes. That looks to be much more than "somewhat" regressed to me.
Comment 32•16 years ago
|
||
If something is a regression from Firefox 2 it is NOT this bug. This bug is present in Firefox 2. Please file a new bug with the new testcase and cc me on it?
In general, always do this for performance issues. "slow" can happen for a large number of reasons, and different reasons need different bugs to address them. If it's really a duplicate, it'll get marked that way.
Comment 33•16 years ago
|
||
I filed bug 438509 on the testcase from comment 29. As expected, it has nothing to do with this bug, and also as expected it does show a serious performance regression that we need to fix.
Comment 34•13 years ago
|
||
Taking, I guess, though I'd love for some who wants to learn layout stuff to pick this up... ;)
Comment 35•13 years ago
|
||
Mats, any chance you want to take this?
Comment 36•12 years ago
|
||
Firefox becomes unusable for over 1 minute when rendering this 9 MB Apache directory listing (essentially a large table):
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-aurora-l10n/
I created a new profile and installed the profiler addon mentioned in [1], loaded the page above, and uploaded my profile here:
http://people.mozilla.com/~bgirard/cleopatra/?report=a730d1b97e935d774ec42130e9f988a326854bec
I'm running the latest Nightly (2012-09-18) on OS X. If you need me to regenerate a new profiler report with a newer build just let me know.
John
[1] https://developer.mozilla.org/en-US/docs/Performance/Reporting_a_Performance_Problem
Comment 37•12 years ago
|
||
John, could you please file a separate bug on that? Mark it as depending on bug 539356, because a huge part of the profile is invalidation...
Comment 38•12 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #37)
> John, could you please file a separate bug on that? Mark it as depending on
> bug 539356, because a huge part of the profile is invalidation...
Ok - I filed bug 792413.
Comment 39•6 years ago
|
||
Decreasing the priority as no update for the last 2 years on this bug.
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage
about the priority meaning.
Priority: P1 → P5
Priority: P5 → P3
Updated•5 years ago
|
Assignee: bzbarsky → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•