Closed Bug 145425 Opened 22 years ago Closed 13 years ago

extreme cpu, memory usage and time to load big HTML file (in comparsion with other browsers)

Categories

(Core :: Layout, defect, P2)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: aha, Assigned: waterson)

References

(Depends on 1 open bug)

Details

(4 keywords, Whiteboard: [reflow-refactor])

Attachments

(1 file, 6 obsolete files)

Linkchecker Xenu has no problem to generate several megs big HTML file as its
report. Last week generate about 30 MB big HTML files and start Mozilla to
display this report. Mozilla hangs for several minutes and its memory used
growns to stars. 

So I prepared similar structured file (32 MBs, PREformated text with lot of
links) and compared several common used browsers, below are results. Mozilla
needs 13x more time to render page than NN 4.79 and eats near to 2x more memory
than other browsers. In NN4.79 you could scroll document while loading, in
Mozilla you can't anything.

Mozilla 2002051708/trunk:
Memory used before test:		16 MB
Time to 1st displayed char:		4 s
Time to load:				302 s
Memory used peak:			239 MB
Memory on Document Done:		202 MB

Mozilla 2002051006/RC2:
Memory used before test:		16 MB
Time to 1st displayed char:		10 s
Time to load:				292 s
Memory used peak:			256 MB
Memory on Document Done:		202 MB

NN 4.79
Memory used before test:		9 MB
Time to 1st displayed char:		0 s
Time to load:				23 s
Memory used peak:			132 MB
Memory on Document Done:		130 MB

Opera 6.02 (B#1101):
Memory used before test:		10 MB
Time to 1st displayed char:		0 s
Time to load:				90 s
Memory used peak:			151 MB
Memory on Document Done:		151 MB

MSIE 5.0:
Memory used before test:		9.5 MB
Time to 1st displayed char:		1 s
Time to load:				58 s
Memory used peak:			133 MB
Memory on Document Done:		132 MB
Big testcase file (ZIP, 1.9MB) http://www.hauner.cz/testcasehtml.zip

Tested on Win2K (SP2), AMD Duron 950 MHz, 512 MB RAM. While test was constant
CPU usage by other running application (just supporting test). All browsers was
started just before separate test with blank page.
Keywords: 4xp, perf, testcase
-> Layout
Assignee: Matti → attinasi
Component: Browser-General → Layout
QA Contact: imajes-qa → petersen
This is a duplicate.  Please search for the "slow large page" tracking bug
(those exact words).  Then dup to one of its dependencies.
Whiteboard: DUPEME
Boris. 0 bugs found...

I cannot reproduce this bug on Linux. testcase load very, very fast (<1 sec)

Mozilla 1505200207 Linux redHat 7.3 KDE 1 ghz, 128 ram, 8 mbps connection.
Marking WFM in the OS X May 19th build (2002-05-19-05)
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
Chris: Huh? You marked this bug reported for W2K as WFM, if don't occur on Linux
and OS X? I retried it again using 2002052108/trunk with same result - 5 minutes
in hang, memory usage peak on 270 MBs, after load 205 MBs. Futhermore, I tested
this bug now on Linux comp and opening and displaying file from local disk takes
5,5 minutes (2002051920/RH7.1) and similar amount of memory.
Bug is still present, so I'm REOPENing it.

Zbigniew: Did you try small file atteched with this bug or did you download ZIP
file with real (and big) testcase?
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
I would be willing to be that we spend a large amount of time looking up the
visited state of each of these links in global history. Reporter: could you try
this with a clean profile and see if there is any difference in performance?
Chris: Using 2002052508/trunk/W2K with new profile and empty history I'm getting
same results (253 MB peak and 205 MB after page load). Do you have any other
idea, where browser could spend time?

Boris: Zbigniew Braniecki tried to find bug to duplicate, but he wasn't lucky.

(I'm changing OS to All because it occurs on Linux too - my comment 7).
OS: Windows 2000 → All
ah, sorry.  "long" not "large".

This bug needs a profile; otherwise we're all just guessing.  I'll try to do 
that in a few weeks.
Blocks: 56854
Keywords: qawanted
Taking for now. What bzbarsky says: we need a profile.
Assignee: attinasi → waterson
Status: UNCONFIRMED → NEW
Ever confirmed: true
jprof was set up as 
  setenv JPROF_FLAGS "JP_DEFER JP_PERIOD=0.0015"

trunk pulled may 25

Wall clock time to load 210 seconds
The jprof info seems to point away from checking the links, although I'm not so
good at interpreting jprof output.

However, I did also test making |nsStyleUtil::IsHTMLLink| into an early return
(no-op) and it seemed to reduce wall clock time by about 10%.

(I'll also note in passing that because of 8bit chars in some of the mailto:
urls in that page, we load (on win32) msgcompo.dll and create a new nsIURI to 
parse those mail URLs just to pass them as char* to the global history to check
the link for visited state. Ugh. 4.x linux doesn't appear to bother with
:visited state for mailto: links).
(Actually, 8bit-in-mailto is too rare to worry about the extra cycles).
So.. the first obvious culprit is nsFrameList::LastChild.  Would it make any 
sense to add a mLastChild member to nsFrameList?  Because as it stands 
appending frames has to walk the entire existing frame list, so there are
O(n^2) calls to LastChild() on pageload.... (and this starts to quickly suck 
for cases with lots of child frames).  What would the footprint cost of adding 
this member be?
Every container frame would grow by a word, which isn't death but...

It looks like the only reason that we're trying to _use_ the LastChild in
nsCSSFrameConstructor::AppendFrames is to determine if there are :after pseudos
-- a fairly unlikely case which we could probably afford to handle in a less
efficient way.
Did you see my proposal in 144826?
Changing priority to P2.
Priority: -- → P2
Some thoughts about nsFrameList and adding |mLastChild|:

Do we always go through the nsFrameList appending functions rather than just
using SetNextSibling?  (Note that things done in batches within the frame
constructor already use the last child member of nsFrameItems.)

Would it be better to do something similar to what I did for the rule node child
storage:  use a tagged pointer, where the low bit being unset means that it's
just a first child, and if it's set, it points to a pair of pointers that has
first and last child?  We could then avoid switching to the latter mechanism
until we hit 8 or 16 frames or so.
dbaron, I'm not sure how you're proposal would make things different in this
case. Wouldn't we still need to get the last frame to look at it and see if it
was the generated frame?

I truncated the big file to 100,000 lines, and did some timing tests:

67s  Current code
59s  `#if 0' LastFrame check; always append content.
61s  Probe parent for :after pseudo, and only then check to see
     if LastFrame is the pseudo frame (this patch).
Chris, any chance of seeing how that patch affects the pageload tests, if at 
all?
On the giant test case, I get:

267s      #if 0
265s (!?) probe
308s      original

I'm just looking at the `Document: Done' number, so this isn't very scientific.
The important thing is that we do start to see an algorithmic win on really,
really, long pages.


As for page load:

Original.

Test id: 3CF6C24902
Avg. Median : 734 msec		Minimum     : 177 msec
Average     : 758 msec		Maximum     : 2345 msec
Test id: 3CF6C6D9C0
Avg. Median : 739 msec		Minimum     : 210 msec
Average     : 753 msec		Maximum     : 2353 msec


Probe.

Test id: 3CF6BCAF43
Avg. Median : 736 msec		Minimum     : 190 msec
Average     : 744 msec		Maximum     : 2292 msec
Test id: 3CF6C03F6F
Avg. Median : 734 msec		Minimum     : 179 msec
Average     : 745 msec		Maximum     : 2331 msec


So it looks like probing might be a slight win, but it's probably just in the noise.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1alpha
Comment on attachment 85661 [details] [diff] [review]
probe parent for pseudo before walking frame list, diff -uw

+  // See if the eparent has an :after pseudo-element

"the parent" right?

r=bzbarsky with that.  If it does not regress pageload I say land it and then
reprofile.
Attachment #85661 - Flags: review+
Chris, could you retarget this bug as 1.1a is out?
Keywords: footprint
dbaron says we ought to put a bit in the style context that indicates whether or
not a frame has a :before or :after pseudo; that way, we'll avoid doing the
pseudo-probe. Since this is only an issue for really, really, really big pages
(and it may have a negative impact on small pages), I'm going to FUTURE.
Target Milestone: mozilla1.1alpha → Future
We should look more at that jprof.  There are some interesting things other than
the Append to examine.  Almost 1/2 the time is spent in laying out the data.

Random example: around 150 hits in nsFont::Equals, similar numbers in converting
text strings to single-byte for getting the width for layout.  We spend much
more than than (>5%-ish if I remember) dealing with creating and recovering
block reflow states.
I re-test (with my primitive technique) giant testcase and I got these _bad_
numbers for actual nightbuild (used hardware is same):

Mozilla 200212108/trunk:
Memory used before test:		21 MB
Time to 1st displayed char:		1 s
Time to load:				689 s
Memory used peak:			313 MB
Memory on Document Done:		313 MB
Blocks: 23187
Whiteboard: DUPEME
Blocks: 174359
Blocks: 29805
Blocks: 112858
Attachment #85661 - Attachment is obsolete: true
Comment on attachment 121076 [details] [diff] [review]
waterson's patch, updated to trunk.

roc, would you do the honors?
Attachment #121076 - Flags: superreview?(roc+moz)
Attachment #121076 - Flags: review?(roc+moz)
Attached patch A bit more cleanup... (obsolete) — Splinter Review
Attachment #121076 - Attachment is obsolete: true
Attachment #121079 - Flags: superreview?(roc+moz)
Attachment #121079 - Flags: review?(roc+moz)
Attachment #121076 - Flags: superreview?(roc+moz)
Attachment #121076 - Flags: review?(roc+moz)
Comment on attachment 121079 [details] [diff] [review]
A bit more cleanup...

> +  NS_PRECONDITION(aContent, "Must have a content node");

This should actually be removed from nsLayoutUtils::IsGeneratedContentFor
Comment on attachment 121079 [details] [diff] [review]
A bit more cleanup...

OK, that looks fine. You could make HasPseudoStyle inline since it's really
just wrapping a single method call.
Attachment #121079 - Flags: superreview?(roc+moz)
Attachment #121079 - Flags: superreview+
Attachment #121079 - Flags: review?(roc+moz)
Attachment #121079 - Flags: review+
Comment on attachment 121079 [details] [diff] [review]
A bit more cleanup...

removed extra assert, inlined that function, and checked in.

Time to reprofile here, I'd say....
Attachment #121079 - Attachment is obsolete: true
Blocks: 203448
Are there other issues implied in this bug or can it be marked fix?
See comment 26 and comment 33 -- we need a new profile to identify any remaining
problems.
dbradley/bz - could you do a profile to further investigate what can be done 
here.
Attached file Updated jprof of testcase (obsolete) —
Looks like a good deal of time is spent in reflow or in GetPrimaryFrameFor (the
latter all on #text nodes, in response to content being appended to those
nodes...)
Attachment #85405 - Attachment is obsolete: true
Perhaps we need a way to avoid triggering reflow off these AppendData calls in
cases when the text node's frame has not yet been constructed (or at least the
text node's parent has not been flushed from the sink)?  Or do we have one?
Depends on: 115049
Depends on: 233463
Depends on: 74656
Depends on: 235255
Can someone do a fresh test/analysis of current builds?
quick load on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1)
Gecko/20050903 Firefox/1.6a1 with 2ghz amd 64 less than 2s. memory usage up 200k.
*** Bug 301862 has been marked as a duplicate of this bug. ***
*** Bug 61684 has been marked as a duplicate of this bug. ***
Blocks: 267525
(In reply to comment #39)
> Can someone do a fresh test/analysis of current builds?
> 

ditto (again)
Not worth it till the reflow branch lands.
Whiteboard: [reflow-refactor]
For what it's worth, I just tested on reflow branch and performance seems largely unaffected.  I'll reprofile after the branch lands.
Attachment #84154 - Attachment is obsolete: true
Attachment #126694 - Attachment is obsolete: true
Attachment #234142 - Attachment is patch: false
Attachment #234142 - Attachment mime type: text/plain → application/bzip2
QA Contact: chrispetersen → layout
Performance in the testcase is better in Firefox 8.0a2 than in IE9. However, adding to MemShrink.
Whiteboard: [reflow-refactor] → [reflow-refactor][MemShrink]
On my Linux box, we load this page faster than chrome (7s versus 19s) and use almost exactly the same amount of memory.

However, this testcase gives us high (~50%) heap-unclassified.  I imagine this is covered by one of our existing darkmatter bugs, but I'll open a separate bug with the testcase just in case.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago13 years ago
Resolution: --- → WORKSFORME
Whiteboard: [reflow-refactor][MemShrink] → [reflow-refactor]
I would expect that textruns might account for a lot of the heap here... in any case, please cc me and njn on the new bug?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: