Closed Bug 148338 Opened 22 years ago Closed 3 years ago

Hangs on very large HTML tables (50.000 rows / 26.mb data) (nsCellMap::GetDataAt)

Categories

(Core :: Layout: Tables, defect, P3)

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: henrik, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: hang, perf, testcase)

Attachments

(7 files)

After trying to load an HTML page with a *very* large table (that's 50.000 rows from around 26 MB of data), I see these symptoms:
- the 26 MB is fetched from the webserver (which is on a 100 Mbit/s LAN)
- memory rises to around 400 MB (Windows 2000 Task Manager's VM size)
- the browser hangs and can only be exited by killing it via the Task Manager
I cannot upload the table in question as it contains company data I cannot disclose. I will try to create a test case later.
Keywords: hang, perf
Henrik: Bugzilla can't accept attachments bigger than 1 MB, so you have to upload a packed testcase to some other location.
Severity: normal → critical
This needs a testcase that can be profiled....
Blocks: 56854
Just tried the attached testcase with 50000 table cells, two lines per cell. I waited and waited and waited and killed mozilla-bin after 15 minutes since it no longer reacted to any user input (this is not a real hang; the Zilla was simply far too busy). Further tests (cutting down the number of cells) showed that it takes ~12 minutes to reach line 14000, which means it would need ~45 minutes to load the whole document, assuming the load time grows linearly (which is not the case...). Ouch.
I wonder if it would change anything if the 50000 lines were broken into, say, 1000 tables with 50 lines each?
I downloaded the attachment and unzipped the HTML source. I loaded the local file in both IE and Moz 1.0 (Build 2002053104). Here are the results:
IE: loaded the page within six seconds (twice), refreshed in 4 seconds, CPU usage 100%.
Moz: still loading after ninety seconds, CPU usage 100%, VM size 26220K. The application has become unresponsive and I cannot see anything in the window. I can't close it normally either; I must use the Task Manager.
The symptoms are exactly as reported. OS: WinXP. Reproducible: always, even in a brand new session.
I have tried the original version (the one I reported the bug on) in:
* IE 6 on Windows 2000
* Netscape 7 on Windows 2000
* Opera 6.02 on Windows
They all hung after a few minutes and consumed plenty of memory.
I'll try to profile this on June 13 or so... of course people should feel free to do it before that.
Some timings from a PII-450, Linux, current trunk CVS, optimization -O3, 392 MB RAM:

thousand rows   seconds
      1             2
      2             5
      5            17
     10            65
     11            89
     12           115
     13           147
     15           213

This would be O(N^2+) scaling... For the case of 15000 rows, jprof says that 82% of the time was spent in nsCellMap::GetDataAt. I can attach the full output if it would be useful.
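To make the scaling claim concrete, here is a minimal sketch that computes the exponent k in t ~ N^k implied by any two of the timings above (the slope between two points on a log-log plot). The names `Sample`, `ReportedTimings`, `LocalExponent` and `PrintLocalExponents` are made up for this illustration; only the numbers come from the comment. From 1000 to 15000 rows the implied exponent is roughly 1.7, but from 10000 to 15000 rows it is roughly 2.9, so the curve steepens as the table grows.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// One timing sample: row count and load time in seconds.
struct Sample {
    double rows;
    double seconds;
};

// Timings copied from the comment (PII-450, Linux, -O3 build).
inline std::vector<Sample> ReportedTimings() {
    return {{1000.0, 2.0},   {2000.0, 5.0},    {5000.0, 17.0},
            {10000.0, 65.0}, {11000.0, 89.0},  {12000.0, 115.0},
            {13000.0, 147.0}, {15000.0, 213.0}};
}

// Exponent k in t ~ N^k implied by two samples (log-log slope).
inline double LocalExponent(const Sample& a, const Sample& b) {
    assert(a.rows > 0 && b.rows > a.rows && a.seconds > 0 && b.seconds > 0);
    return std::log(b.seconds / a.seconds) / std::log(b.rows / a.rows);
}

// Print the implied exponent for each pair of adjacent samples.
inline void PrintLocalExponents(const std::vector<Sample>& s) {
    for (std::size_t i = 1; i < s.size(); ++i) {
        std::printf("%6.0f -> %6.0f rows: k = %.2f\n", s[i - 1].rows,
                    s[i].rows, LocalExponent(s[i - 1], s[i]));
    }
}
```

Adjacent pairs are noisy because the timings are whole seconds, so comparing points that are further apart gives a steadier estimate.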
This is a test case based on the original file; the only difference is that the SWIFT codes and country names have been replaced by dummy text. There are approx. 56.000 lines in the 28 MB file. This file does not render: memory usage goes up to about 180 MB and the VM size to about 480 MB, and then Moz appears to hang.
Andrew, please do attach the full profile.
I am adding the mozilla1.0.1 keyword in order to nominate this bug for mozilla 1.0.1. The reasons are:
* It has severity critical, and it hangs (kills) Mozilla.
* The bug seems very general: it doesn't happen just to big tables with a JPEG in every third row that has a blue pixel at 233,450, but to all big tables.
* We have test cases and a rough estimate of the scaling ("this would be O(N^2+) scaling..."), meaning that we have some good info.
* Not knowing that code, I would say it looks like Andrew has nailed the problem by finding out that c. 82% of the CPU time was spent in nsCellMap::GetDataAt.
Keywords: mozilla1.0.1
Keywords: testcase
Priority: -- → P2
So it looks like we're calling the nsCellMap::GetDataAt on every incremental reflow. It also looks like that function is walking an array that is O(N) in the number of rows on every call. Would it help to store a flag that says "there is no useful data in the cellmap"?
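The flag idea can be sketched on a toy model. The class below is NOT the real nsCellMap API; `ToyCellMap`, `AppendRow`, `RowsWalked` and the stored "max rowspan per row" are all assumptions made for illustration. It only shows the shape of the optimisation: remember whether any spanning data exists at all, and skip the O(N) walk when it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a cell map (hypothetical, not Mozilla code).
class ToyCellMap {
public:
    // Append a row; maxRowSpan is the largest rowspan of any cell that
    // originates in it (1 means nothing spans out of the row).
    void AppendRow(int maxRowSpan) {
        assert(maxRowSpan >= 1);
        mMaxRowSpan.push_back(maxRowSpan);
        if (maxRowSpan > 1) {
            mHasRowSpans = true;  // the flag: spanning data now exists
        }
    }

    // Is `row` covered by a cell that originates in an earlier row?
    bool RowIsSpannedInto(std::size_t row) const {
        assert(row < mMaxRowSpan.size());
        if (!mHasRowSpans) {
            return false;  // fast path: no walk over earlier rows
        }
        for (std::size_t i = 0; i < row; ++i) {  // slow path: O(N) walk
            ++mRowsWalked;
            if (i + static_cast<std::size_t>(mMaxRowSpan[i]) > row) {
                return true;
            }
        }
        return false;
    }

    // Rows visited by the slow path so far (instrumentation only).
    std::size_t RowsWalked() const { return mRowsWalked; }

private:
    std::vector<int> mMaxRowSpan;
    bool mHasRowSpans = false;
    mutable std::size_t mRowsWalked = 0;
};
```

In this sketch the flag is only ever set, never cleared, so it stays conservative if rows were removed later; a table with even one rowspan still pays for the full walk.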
I tried the patch on attachment #1 reduced to about 11500 rows, and the time went from 120 seconds to 30 seconds. I think for a larger table the performance gain will be greater than 4x, since the patch avoids traversing all of the rows. I experimented with removing the 0 colspan/rowspan calculations in nsCellMap::GetDataAt and was able to get the time to 20 seconds, but reworking the way 0 colspans/rowspans are handled represents a lot of work that I may do later. It would be nice to get new jprof results with the patch.
Severity: critical → major
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0.1
Comment on attachment 86520 (patch to speed up nsTableRowGroupFrame::CalculateRowHeights): sr=waterson
Attachment #86520 - Flags: superreview+
Some timing data in seconds (jprof report to follow):

rows/1000   orig   patched
    5         17      11
   10         65      25
   15        213      48
   20          -      70
   30          -     132
   40          -     210
down to spending 32% in nsCellMap::GetDataAt
OS,Platform=>All
OS: Windows 2000 → All
Hardware: PC → All
Comment on attachment 86520 (patch to speed up nsTableRowGroupFrame::CalculateRowHeights): r=bernd
Attachment #86520 - Flags: review+
Could we please get a jprof report (both patched and unpatched) for my original testcase? Its table is not just a clean table with only XX in each cell; it also contains form controls on each row.
form controls will probably just get you a more drastic version of bug 148636. there's a perf hit there also, but mainly memory consumption.
That is exactly why I would like to see the results for my attachment: to see if it will render after the patch and to see how much it differs from the first attachment, which was only a simple test case. I hope it is not too much trouble to create such a jprof run.
I checked the patch into the trunk but am leaving the bug open.
Attachment #86019 - Attachment mime type: application/octet-stream → application/zip
This doesn't show any issues with the cellmap. The slowness due to swapping almost immediately completely overwhelmed whatever else was going on. I let it run for 5 minutes or so wall clock time, during which there were 7385 hits, with 1.5ms between hits in code time. In other words, almost all the time was spent outside Mozilla code.
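The arithmetic behind "almost all the time was spent outside Mozilla code" can be spelled out as a small sketch. It assumes that each profiler hit stands for one 1.5 ms interval of in-process time and takes "5 minutes or so" as 300 seconds; both are readings of the comment, not measured values, and the function names are made up for this example. Under those assumptions the 7385 hits cover about 11 seconds, or under 4% of the wall-clock time.

```cpp
#include <cassert>

// Sampled in-process time in seconds: hits times the interval per hit.
constexpr double SampledSeconds(long hits, double msPerHit) {
    return hits * msPerHit / 1000.0;
}

// Share of wall-clock time that the sampled time accounts for.
inline double CpuFraction(double cpuSeconds, double wallSeconds) {
    assert(wallSeconds > 0);
    return cpuSeconds / wallSeconds;
}
```

For example, `CpuFraction(SampledSeconds(7385, 1.5), 300.0)` gives the share for the run described above.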
The profile with this patch is really no different from the profile without the patch on the _original_ testcase. Again, the real speed problem there is the swapping.
Attachment #85774 - Attachment mime type: application/octet-stream → application/x-gzip
Here is the output from the eazel profiler. I cut the table to about 3600 rows and it takes around 15 minutes to render.
The patch that was checked into the trunk is the biggest gain to be made in tables. I'm not sure who should look at the general footprint/swapping problem, but I'm moving this to future to get it off of my radar.
Target Milestone: mozilla1.0.1 → Future
Summary: Hangs on very large HTML tables (50.000 rows / 26.mb data) → Hangs on very large HTML tables (50.000 rows / 26.mb data) (nsCellMap::GetDataAt)
Also related to bug 54542
mass reassign to default owner
Assignee: karnaze → table
Status: ASSIGNED → NEW
QA Contact: amar → madhur
Target Milestone: Future → ---
Target Milestone: --- → Future
Might a new profile bring more light into this?
Blocks: 54542
How about a profilable testcase first? The only one that's still usefully profilable as far as I can tell just shows bug 148636.
Will compile a new testcase by the end of next week.
*** Bug 226358 has been marked as a duplicate of this bug. ***
Blocks: 234240
It seems questionable to me that we call RowIsSpannedInto as soon as we need to seriously update the rowgroup height, and then loop over all cells and even try to repair cell map holes. The colinfo (http://lxr.mozilla.org/seamonkey/source/layout/html/table/src/nsCellMap.h#53) for every column group has two member variables:

    PRInt32 mNumCellsOrig; // number of cells originating in the col
    PRInt32 mNumCellsSpan; // number of cells spanning into the col via colspans (not rowspans)
                           // for simplicity, a colspan=0 cell is only counted as spanning the
                           // 1st col to the right of where it orginates

and we update them during manipulations of the cellmap (if we fail we crash). It might be worth the effort to do something similar for rows and have a

    struct nsRowInfo { int mNumCellsSpanIn; int mNumCellsSpanOut; int mNumCellsOrig; }

and update it when building the cellmap, so that we only look up these numbers when calling RowIsSpannedInto. I think I will be able to do that in the timeframe outlined in http://bugzilla.mozilla.org/show_bug.cgi?id=54542#c140
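A minimal sketch of that per-row bookkeeping follows. Only the struct name and its three fields come from the comment; `ToyRowTable`, `AddCell` and `RowHasSpanningCells` are hypothetical, the exact meaning given to each counter is an assumption, and rowspan=0 is not handled. The point is that the counters are maintained when a cell is added, so the span query becomes a constant-time lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-row counters, with field names as proposed in the comment.
struct nsRowInfo {
    int mNumCellsSpanIn = 0;   // cells entering this row from the row above
    int mNumCellsSpanOut = 0;  // cells continuing from this row into the next
    int mNumCellsOrig = 0;     // cells originating in this row
};

// Toy table that keeps the counters current as cells are added.
class ToyRowTable {
public:
    explicit ToyRowTable(std::size_t rowCount) : mRows(rowCount) {}

    // Register a cell originating in `row` and covering `rowSpan` rows.
    void AddCell(std::size_t row, std::size_t rowSpan) {
        assert(rowSpan >= 1 && row + rowSpan <= mRows.size());
        ++mRows[row].mNumCellsOrig;
        // Each boundary the cell crosses is one span-out and one span-in.
        for (std::size_t r = row; r + 1 < row + rowSpan; ++r) {
            ++mRows[r].mNumCellsSpanOut;
            ++mRows[r + 1].mNumCellsSpanIn;
        }
    }

    // O(1) lookups instead of looping over all cells.
    bool RowIsSpannedInto(std::size_t row) const {
        assert(row < mRows.size());
        return mRows[row].mNumCellsSpanIn > 0;
    }
    bool RowHasSpanningCells(std::size_t row) const {
        assert(row < mRows.size());
        return mRows[row].mNumCellsSpanOut > 0;
    }
    const nsRowInfo& Info(std::size_t row) const {
        assert(row < mRows.size());
        return mRows[row];
    }

private:
    std::vector<nsRowInfo> mRows;
};
```

The cost moves to cell insertion, which is proportional to the cell's rowspan; removing cells would need the matching decrements, which this sketch leaves out.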
*** Bug 239432 has been marked as a duplicate of this bug. ***
I think this has improved quite considerably. Mozilla 1.7 takes very long (> 5 minutes) and freezes up completely after a while. A current trunk build takes a few minutes (2 or so, I guess) and doesn't freeze up. Only at the end does the UI become a little slow.
Latest results for the simple test case (1st attachment) on a Core Duo 1.86 GHz Windows machine:
IE7 loads it in 5 sec.
Firefox 3 beta 3 loads it in 20 sec.
No freeze, fine scrolling performance after loading completes.
Assignee: layout.tables → nobody
QA Contact: madhur → layout.tables
A current e10s trunk build (FF36) takes about 2 seconds to show the table on my fast desktop Linux machine. The page is blank while it's loading, and then the contents all become visible at once. Scrolling is very smooth once it has loaded. Chromium also takes about 2 seconds, but it loads things progressively -- the start of the table is visible immediately, and the last part of the table takes about 2 seconds to show up. So that's a nicer behaviour.
Moving to p3 because no activity for at least 1 year(s). See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information
Priority: P2 → P3

Marking this as Resolved > Worksforme since the hang is not occurring anymore using Release 93.0, Beta 94.0b2 and latest Nightly 95.0a1 (2021-10-07) on Windows 10 and Ubuntu 20.04.
If anyone else is able to reproduce it please re-open the issue or file a new one.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME