363698 - firefox chews CPU on large table

Reporter

Description

•

18 years ago

Go to http://cvs.gnome.org/viewcvs/gtk%2B/ChangeLog.pre-2-8?rev=1.7010&view=log firefox comsumes over 90% of CPU for over 6 minutes. This bug occurs both on Linux and OpenSolaris

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

18 years ago

Product: Firefox → Core

QA Contact: general → general

Brian Polidoro

Comment 1

•

18 years ago

It's taking forever for me on Windows. But how is this different than any of these bugs: bug 54542, bug 318474, bug 352367. Tables that large are just not wise.

OS: Linux → All

Hardware: Sun → All

Wayne Mery (:wsmwk)

Updated

•

17 years ago

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → DUPLICATE

Summary: firefox chews CPU → firefox chews CPU on large table

Boris Zbarsky [:bzbarsky]

Comment 3

•

17 years ago

Please don't ever mark performance bugs as duplicates unless: 1) They have an identical small reduced testcase. or 2) There is profile data indicating that they are in fact duplicates. If you think they might be duplicates, mark a dependency. In particular, bug 318474 is more or less useless, so marking something duplicate of it is a good way to make sure that nothing happens with it.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Boris Zbarsky [:bzbarsky]

Comment 4

•

17 years ago

So a profile shows an obvious difference from the testcase in bug 318474: there are hotspots in GetRowCount and GetEffectiveRowSpan: Total hit count: 1522412 Count %Total Function Name 490927 32.2 nsCellMap::GetRowCount(int) const 282600 18.6 nsTableCellMap::GetEffectiveRowSpan(int, int) const 94102 6.2 nsTableCellMap::GetEffectiveColSpan(int, int) const Looking at the testcase, it looks like in addition to there being lots of rows, there are actually 17950 tbodies, which I suspect is what's triggering the above hotspots.

Boris Zbarsky [:bzbarsky]

Comment 5

•

17 years ago

I wonder whether we could store the nsCellMaps in an array instead of a linked list, store an up-to-date start index for each one, and do binary search... Right now, about 30% of the time on this testcase is spent under nsTableFrame::GetEffectiveRowSpan. Each call is O(N) in number of rowgroups, and we have more or less one rowgroup per 2-3 rows, so overall table layout is O(N^2) in number of rows...

Status: REOPENED → NEW

Component: General → Layout: Tables

Keywords: perf

QA Contact: general → layout.tables

Boris Zbarsky [:bzbarsky]

Comment 6

•

17 years ago

Alternately, we'd want to keep more state in the various callers of GetEffectiveRowSpan, so we don't have to keep recomputing the nsCellMap a given cell lives in. Especially because the answer is the same for all cells in a given row!

Boris Zbarsky [:bzbarsky]

Comment 7

•

17 years ago

So just to keep track of things, the main possible callstacks into nsCellMap::GetRowCount are (with hit counts along the way, and branches that are considered elsewhere marked with a 'R' after the count, for "repeat"): 490927 nsCellMap::GetRowCount 156251 nsTableCellMap::GetEffectiveRowSpan 126746 nsTableFrame::GetEffectiveRowSpan(nsTableCellFrame&, nsCellMap*) 121623 nsTableRowFrame::CalculateCellActualSize 60921 nsTableRowFrame::ReflowChildren 252972 nsTableRowFrame::Reflow 53397 nsTableRowFrame::CalcHeight 84194 R nsTableRowFrame::ReflowChildren 55093 nsTableRowFrame::InitHasCellWithStyleHeight 55093 R nsTableRowFrame::Reflow 54241 GetHeightOfRowsSpannedBelowFirst 54241 nsTableRowFrame::DidResize 54241 nsTableRowGroupFrame::DidResizeRows 54241 nsTableRowGroupFrame::CalculateRowHeights 170488 nsTableRowGroupFrame::ReflowChildren 32813 nsTableRowFrame::UpdateHeight 30797 R nsTableRowFrame::CalcHeight 2017 R nsTableRowFrame::ReflowChildren 29505 nsTableFrame::GetEffectiveRowSpan(int, nsTableCellFrame&) 65164 R nsTableRowGroupFrame::CalculateRowHeights 137025 R nsTableFrame::GetEffectiveRowSpan(nsTableCellFrame&, nsCellMap*) 57082 nsTableFrame::GetEffectiveColSpan(nsTableCellFrame&, nsCellMap*) 105840 R nsTableRowFrame::ReflowChildren 48758 nsTableCellMap::GetEffectiveColSpan 28578 R nsTableFrame::GetEffectiveColSpan(nsTableCellFrame&, nsCellMap*) 35659 R nsTableFrame::GetEffectiveRowSpan(int, nsTableCellFrame&)

Bernd

Comment 8

•

17 years ago

490927 nsCellMap::GetRowCount 156251 nsTableCellMap::GetEffectiveRowSpan 126746 nsTableFrame::GetEffectiveRowSpan(nsTableCellFrame&, nsCellMap*) 121623 nsTableRowFrame::CalculateCellActualSize 60921 nsTableRowFrame::ReflowChildren 252972 nsTableRowFrame::Reflow so this means for given row where we could once compute to which cellmaps it belongs, instead we search for the cellmap over and over again, by going the long way instead of supplying the cellmap at http://lxr.mozilla.org/seamonkey/source/layout/tables/nsTableRowFrame.cpp#631 but supplying null. Silly. More than this the cellmap could be computed at the rowgroup level above and be supplied to all its rows.

Boris Zbarsky [:bzbarsky]

Comment 9

•

17 years ago

For my reference when testing whether performance has improved: I let this page load from local disk for about two hours, and it's still not done.

Boris Zbarsky [:bzbarsky]

Comment 10

•

17 years ago

Bernd, that's exactly what I think we should be doing. Computing in nsTableFrame or nsTableRowGroupFrame which cellmap is the relevant one and then making use of it throughout. Not sure what the cleanest way to do this is, yet. Especially given a table split over multiple pages.

Bernd

Comment 11

•

17 years ago

Our in other words http://lxr.mozilla.org/seamonkey/search?string=GetEffectiveRowSpan nearly all hits are potential performance issues. Only border collapse uses the fast way :-(

Boris Zbarsky [:bzbarsky]

Comment 12

•

17 years ago

The way border collapse uses is buggy right now. It gives the wrong answer if there is more than one rowgroup.

Boris Zbarsky [:bzbarsky]

Comment 13

•

17 years ago

It looks like we have an nsTableRowGroupFrame::GetStartRowIndex... Is that guaranteed to return the same thing as adding up the nsCellMap::GetRowCount for the preceding rowgroups's cellmaps?

Boris Zbarsky [:bzbarsky]

Comment 14

•

17 years ago

For that matter, would it make sense to just store the start index on nsCellMap ahd have a pointer from nsTableRowGroupFrame to the corresponding nsCellMap?

Boris Zbarsky [:bzbarsky]

Comment 15

•

17 years ago

The reason I ask the question in comment 14 is that nsTableRowGroupFrame::GetStartRowIndex seems to be a lot more complicated than it really should be, somehow...

Bernd

Comment 16

•

17 years ago

>nsTableRowGroupFrame::GetStartRowIndex seems to be a lot more complicated than >it really should be, somehow... Which part? The first one which is lame protection against wrong frame construction, where we should rather assert and die or the second which handles empty rowgroups?

Boris Zbarsky [:bzbarsky]

Comment 17

•

17 years ago

Both. Why not just store the row index of the first row in the rowgroup directly?

Bernd

Comment 18

•

17 years ago

I think thats a wrong design decision from the time (1998) where frames should have been as lean as possible, so the idea was probably to store the index only once in the row and then to look it up. Then lame protection was necessary at that time 3.55 buster%netscape.com 1998-09-15 13:36 nsTableRowGroupFrame no longer assumes all its children are rows, or that all row children are cells. Storing means syncing it all the time, also when rows are removed from other rowgroups that are before. This should be done inside nsTableRowGroupFrame::AdjustRowIndices

Boris Zbarsky [:bzbarsky]

Comment 19

•

17 years ago

Hmm. Rowgroups can split, right? But the cellmap lives entirely on the first-in-flow table, correct? How is this stuff supposed to work, exactly? It looks to me like adding rows to a not-first-in-flow rowgroup currently just corrupts the cellmap. We'll try to cellMap->InsertRows(aRowGroupFrame, aRowFrames, aRowIndex, aConsiderSpans, damageArea); Where aRowGroupFrame is the in-flow rowgroup, and then InsertRows() will fail to find the corresponding cellmap and do nothing at all. Or am I missing something?

Boris Zbarsky [:bzbarsky]

Comment 20

•

17 years ago

The reason this matters is that for the functions being called here we don't need a rowgroup-relative index. We need an nsCellMap-relative one. I suppose we could just store the start of each nsCellMap directly in the nsCellMap (which would still be broken with splitting, but whatever) and then make the nsCellMap list an array so we can binary search to find the right cellmap? That would make this pageload O(N^2 log N) instead of the current O(N^3) in number of rows.... We could make it O(N^2) if we Just Knew in O(1) time which nsCellMap to use, and with what index, of course. Another option is to not change anything about how nsTableRowGroupFrame works apart from having a pointer from the nsTableRowGroupFrame to the relevant nsCellMap, and store the start row index in the nsCellMap. That might be the simplest thing to do here... Instead of changing the mess that is the current rowgroup code (and then dragging along all the information needed

Boris Zbarsky [:bzbarsky]

Comment 21

•

17 years ago

So the issue is that maintaining the rowgroup-to-cellmap pointer is a little weird when we have continuation rowgroups. Ideally those would share the same cellmap pointer as the first-in-flow. But we do in fact give a (semi-bogus?) cellmap to the in-flows of a table, so I'm not quite sure, for example, whether in-flow rowgroups would end up going through nsTableFrame::InsertRowGroups as they get pulled.

Boris Zbarsky [:bzbarsky]

Comment 22

•

17 years ago

I guess I'll assume they don't since that code uses the first-in-flow's cellmap?

Boris Zbarsky [:bzbarsky]

Comment 23

•

17 years ago

Man. All the InsertRows, etc stuff is just so broken with continuations, since it derives indices based on the frame tree (the nsRowGroupFrame methods) and then uses them to index into the cellmap (which uses first-in-flow-table-relative indexing throughout). I really hope we don't split tables over columns...

Boris Zbarsky [:bzbarsky]

Comment 24

•

17 years ago

Attached patch What I have so far — Details — Splinter Review

Stopping work on this for now, and praying that someone will tell me that I'm wrong and that this stuff somehow works.... and explaining how.

Bernd

Comment 25

•

17 years ago

>I really hope we don't split tables over columns... We don't do this, it crashes within seconds if Mars ehm Jesse attacks. I wrote a special barrier against this (bug 362275).

Bernd

Comment 26

•

17 years ago

see https://bugzilla.mozilla.org/show_bug.cgi?id=362275#c8 for a description of the situation

Boris Zbarsky [:bzbarsky]

Comment 27

•

17 years ago

> it crashes within seconds Ok. That's consistent with my reading of this code, too. OK. In that case I guess we can just not worry about any of the updating stuff working with in-flows and file followup bugs?

Bernd

Comment 28

•

17 years ago

To be precise: the basic assumption of the pagination code inside tables is that it will happen only during printing and NEVER incremental. (See http://lxr.mozilla.org/seamonkey/source/layout/tables/nsTableRowGroupFrame.cpp#1032). If we decide to change this to be compatible with columns it requires a lot of changes. I think this out for the coming release. Having postponed it, one should seriously think where we are heading I would like to point to roc's comment https://bugzilla.mozilla.org/show_bug.cgi?id=397428#c14 "The continuation model itself is a flawed design." On the other hand, I have seen the previous big rearchitecture attempt with Chris Karnaze and John Keiser in 2002/2003 and it virtually stopped all the work on this code and got never into trunk.

Boris Zbarsky [:bzbarsky]

Updated

•

17 years ago

Flags: blocking1.9?

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Flags: blocking1.9? → blocking1.9-

Gyorgy Kapolnas

Comment 29

•

16 years ago

I would like to help with this test case: http://rty24.webhelye.hu/editReportXml.jsp.html

Ryan Jones-Ward [:sciguyryan]

Comment 30

•

16 years ago

This does seem to have somewhat regressed since Fx 2 but the testcase above is a really, really complex example with over 7,000 lines of nested tables.

wellofsouls

Comment 31

•

16 years ago

(In reply to comment #30) > This does seem to have somewhat regressed since Fx 2 but the testcase above is > a really, really complex example with over 7,000 lines of nested tables. > "the testcase above is a really, really complex example with over 7,000 lines of nested tables" that can be rendered within seconds with Fx2 but completely locks up Fx3 for minutes. That looks to be much more than "somewhat" regressed to me.

Boris Zbarsky [:bzbarsky]

Comment 32

•

16 years ago

If something is a regression from Firefox 2 it is NOT this bug. This bug is present in Firefox 2. Please file a new bug with the new testcase and cc me on it? In general, always do this for performance issues. "slow" can happen for a large number of reasons, and different reasons need different bugs to address them. If it's really a duplicate, it'll get marked that way.

Boris Zbarsky [:bzbarsky]

Comment 33

•

16 years ago

I filed bug 438509 on the testcase from comment 29. As expected, it has nothing to do with this bug, and also as expected it does show a serious performance regression that we need to fix.

Boris Zbarsky [:bzbarsky]

Comment 34

•

13 years ago

Taking, I guess, though I'd love for some who wants to learn layout stuff to pick this up... ;)

Assignee: nobody → bzbarsky

Blocks: 234240

Priority: -- → P1

Boris Zbarsky [:bzbarsky]

Updated

•

13 years ago

Blocks: 397829

Boris Zbarsky [:bzbarsky]

Comment 35

•

13 years ago

Mats, any chance you want to take this?

John Hopkins (:jhopkins)

Comment 36

•

12 years ago

Firefox becomes unusable for over 1 minute when rendering this 9 MB Apache directory listing (essentially a large table): http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-aurora-l10n/ I created a new profile and installed the profiler addon mentioned in [1], loaded the page above, and uploaded my profile here: http://people.mozilla.com/~bgirard/cleopatra/?report=a730d1b97e935d774ec42130e9f988a326854bec I'm running the latest Nightly (2012-09-18) on OS X. If you need me to regenerate a new profiler report with a newer build just let me know. John [1] https://developer.mozilla.org/en-US/docs/Performance/Reporting_a_Performance_Problem

Boris Zbarsky [:bzbarsky]

Comment 37

•

12 years ago

John, could you please file a separate bug on that? Mark it as depending on bug 539356, because a huge part of the profile is invalidation...

John Hopkins (:jhopkins)

Comment 38

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #37) > John, could you please file a separate bug on that? Mark it as depending on > bug 539356, because a huge part of the profile is invalidation... Ok - I filed bug 792413.

Sylvestre Ledru [:Sylvestre]

Comment 39

•

6 years ago

Decreasing the priority as no update for the last 2 years on this bug. See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage about the priority meaning.

Priority: P1 → P5

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

6 years ago

Priority: P5 → P3

Boris Zbarsky [:bzbarsky]

Updated

•

5 years ago

Assignee: bzbarsky → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3