Open Bug 216418 Opened 21 years ago Updated 11 months ago

Page with lots of <pre> text (or rather: extremely deeply-nested inline tags, with lots of linebreaks) uses lots of memory in layout arenas and is slow to load

Categories

(Core :: Layout, defect)

defect

Tracking

()

Performance Impact medium

People

(Reporter: paultolk, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug, )

Details

(Keywords: perf)

Attachments

(6 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5a) Gecko/20030718
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5a) Gecko/20030718

Browser starts eating memory and CPU time, become sluggish and eventually stops
responding on this text-only (!) page (see URL). IE works fine and eats 10 times
less memory there. 

Reproducible: Always

Steps to Reproduce:
1. Try to browse http://www.lambdacs.com/cpt/FAQ.html

Actual Results:  
see details

Expected Results:  
show page and do not go into non-responsive mode or waste resources
confirming with 2003081515 Win Server 2003
this page is 1100 KB basic html
I can see that after loading and as soon as I move the mouse over the page 
the CPU goes 100% for a few seconds and the browser becomes sluggish (Athlon XP
1700+)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
Page loaded OK on W98SE and Mozilla 1.7.1
Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.1) Gecko/20040707

Can we confirm on W2K and 1.7.1?
Confirming.. With Mozilla 1.7.1 on W2K "VM Size" in Task manager grew from ~30
Mb to 200 Mb after which I had to kill the Mozilla not waiting for the
completion of the page load as the machine became completely non-responsive (it
is a very moderate computer, 128 Mb and 400 Mhz). What's interesting, the "Mem
Usage" in task manager pulses (from ~30Mb to 90, then back to 68 and then it was
pulsing approximately between these 2 values ~60 Mb and ~90 Mb).
Vijaya , 09/20/2004

I wish to confirm the above bug report.
The webpage is using up too much CPU time and memory
and goes into an unresponsive mode in the end.
I'm using Mozilla Firefox 0.9.3 on a Win 2000 machine.
The link works fine with IE 5.0 and Netscape 6.2.
I think that it is important to fix this problem since it 
is only a text page (even though it is a large file)
and as it works with other browsers and does not use up resources 
and hang as it does with Mozilla. The browser may be acting in such
a weird way with other such webpages in other websites too; so I think its
important to fix this bug.



Product: Browser → Seamonkey
IE <3sec.
Seamonkey trunk ~10 sec of high cpu on a modest 512mb AMD
Comparable performance on faster machine with FF 1.0.4.
smaller machies will of course be slower.

as previously described, just a big 1.3mb text file - not even any tables
Assignee: general → nobody
Severity: major → normal
Product: Mozilla Application Suite → Core
QA Contact: general → general
Summary: Browser starts eating resources and then stops responding on this simple page → Slow text file compared to IE - high CPU and memory for simple page
Whiteboard: dupeme
Should profile once new textframe lands.
Depends on: 367177
Although this is is probably at least partly related to the unescaped angle brackets around e-mail addresses in the file that give a deep tag nesting stack... 

Also note that this file is most emphatically NOT plaintext.  I'm not sure why people keep talking about a "text file".
roc, for what it's worth new textframe makes this testcase about 12% slower over here -- goes from 80 seconds to 90 seconds or so.
Attachment #267117 - Attachment is patch: false
Attachment #267117 - Attachment mime type: text/plain → application/zip
It makes it 36% slower over here. Looking into it.
A big part of the problem I'm seeing on Mac is that a lot of the text frames overflow their inline containers: their y-offset is -1 appunit. This happens the font ascent on Mac is 588.something appunits. nsTextFrameThebes computes the text ascent as ceil(font-ascent) == 589, which I think is the safe thing to do ... but nsThebesFontMetrics rounds the ascent to 588, which is used to set the ascent of the inline container. Then baseline alignment of the text pushes it slightly up out of its container. This causes us to propagate overflow rects all over the place causing serious performance problems.

There is a related problem with height calculation. nsTextFrameThebes adds ceil(font-ascent) and ceil(font-descent) to get the height. nsThebesFontMetrics needs to do the same thing (currently it just rounds the font maxHeight).

This patch fixes these issues by patching nsThebesFontMetrics. We could change nsTextFrameThebes to round ascent and descent and/or compute height differently, but using ceil is safer, especially when we will soon be computing accurate glyph bounds. (At that time, on Mac, we'll want to round font ascent and descent up to the nearest *pixel*, which will make this patch less relevant...)
Attachment #267261 - Flags: review?(pavlov)
That knocks the new-textframe penalty down to about 11%, similar to what Boris reports. I had a close look with Shark, which is a good tool but it's still hard to see exactly what the problem is. The slowdown is clearly in reflow (of course), but a lot of it (nearly half?) just seems to be things being generically fractionally slower everywhere in the call graph.

What does stand out is 2-3% of the profile spent in BuildTextRunsScanner::ScanFrames, almost entirely at exactly the recursive call to ScanFrames. I think what's happening here is that we descend very deep into a stack of inlines doing ::Reflow, reach the first text frame, and then call EnsureTextRun on it, which initiates another recursive traversal starting at the block through all child frames to create textruns for the text frames, which grows the stack a lot more. I think this is significantly increasing our stack memory footprint which is probably hurting cache and slowing down other things as well as being slow itself. This code can easily be turned into an explicit-stack algorithm so I'll try doing that and see how much it helps. Also, we should probably trigger textrun construction from the block when we reflow an inline line for the first time, so that it doesn't add to the bottom of the ::Reflow stack.

Another 1% or so of the profile is spent setting up line breaks and storing them in the textrun. This could be avoided when the paragraph is entirely white-space:pre or white-space:nowrap. This is easy and I'll do this too.

It looks like textrun construction is about as expensive with old or new textframes, even though the textruns being constructed are rather different.

We are working a little bit harder with the new textframe, because preformatted tabs are being rendered as proper tab stops instead of just expanded to a certain number of spaces. Of course for a fixed-width font like this page, that doesn't make a difference, but optimizing that away would be complex and I think undesirable. That accounts for maybe 1% of the profile.

Other generic observations that are independent of new or old textframe:
-- In this testcase --- which does a *ton* of layout of spans because of the 70+-deep nested spans --- nsLineLayout::VerticalAlignFrames spends significant amounts of time getting font metrics, which actually aren't used most of the time. This could easily be optimized.
-- nsHTMLContainerFrame::ReparentFrameViewList is being hit a lot. We could speed it up by exiting early when the frame has no view or child view to reparent. I think we have noted this elsewhere but deferred it in the hope of removing views altogether, but we should probably do it for 1.9. 3-4% Tp win on this testcase.
-- We spend 2-3% of the profile in nsFont::BaseEquals doing case-insensitive comparison of font family names. Atoms for lowercased font-family names would really help here. I believe Rob Arnold may have a patch for this...
-- I see 3% of the profile right here in nsHTMLReflowState:

  nsIAtom* frameType = frame->GetType();			
  if (nsGkAtoms::tableRowGroupFrame == frameType ||
      nsGkAtoms::tableColGroupFrame == frameType ||			
      nsGkAtoms::tableRowFrame      == frameType ||			
      nsGkAtoms::tableColFrame      == frameType) {			

I'm not sure why exactly. The generated code is rather interesting:
	0x16e5a0dd	movl     +4657654(%ebx), %eax
	0x16e5a0e3	cmpl     (%eax), %edx
	0x16e5a0e5	jz       0x16e5a10d	
	0x16e5a0e7	movl     +4657622(%ebx), %eax
0.9%	0x16e5a0ed	cmpl     (%eax), %edx
	0x16e5a0ef	jz       0x16e5a10d 
0.2%	0x16e5a0f1	movl     +4656998(%ebx), %eax
87.2%	0x16e5a0f7	cmpl     (%eax), %edx
2.3%	0x16e5a0f9	jz       0x16e5a10d 
0.5%	0x16e5a0fb	movl     +4657594(%ebx), %eax
	0x16e5a101	cmpl     (%eax), %edx
	0x16e5a103	jz       0x16e5a10d 

Not sure why that one instruction hurts so much. Cache conflict maybe? It's unfortunate that each static access appears to go through some kind of indirect lookup. Maybe we'd gain by rewriting atom tables to use a single static object or array.
David, you may find the above comments interesting...
> -- I see 3% of the profile right here in nsHTMLReflowState:

Would it make sense to just have a virtual method on nsIFrame to handle this use case?  Or at least an IsFrameOfType bit (though that would not work for border-collapse tables)?

It's worth getting bugs filed on all the various points; they should block this bug.

I still wonder how IE manages to be several times faster, though. :(
> I still wonder how IE manages to be several times faster, though. :(

Are we sure it's treating the unescaped email addresses the same way we are, and creating 70+-deep nested spans? If we just rejected opening tags with @s in them, this problem would disappear instantly.

We consume time and space that's proportional to the depth of the spans times the number of lines. That's fundamental to our inline frame model. That could be avoided (e.g. a representation where we don't have frames for spans, just for inline leaves), but there would be other tradeoffs and it would require us to make inline frames, and inline reflow, rather special and different from other frames. That could well be a good thing, but it's a big project.
So the parser discussion is basically bug 112738, right?

It might be nice to actually figure out what IE does (which appareantly does NOT include treating <thread.h> as a tag?) and implementing it....
Depends on: 112738
IE treats unknown tags as empty elements.

How does Safari trunk compare to Gecko on this, performance-wise? They create a tree like we do, so are a more interesting comparison.
On my Powerbook G4, Safari trunk renders this page in a couple of seconds. Firefox trunk takes orders of magnitude longer. So it's not a matter of the parsing of the tags. (Bringing up their DOM inspector takes a long time, presumably because of the depth of the tree.)
I implemented the linebreaking optimization; it helps a tiny bit, but is simple, and would help more on other plaintext pages, so it's worth taking.

I implemented the explicit-stack version of ScanFrames and it's much much worse, so forget that.

We're doing a fair bit of re-reflowing of lines when text nodes get appended to. That is also causing a fair bit of textrun reconstruction.
Hoisting the scanning process up to start it from the block level didn't seem to make any difference either, and was hard to do.

The next thing I want to try is fixing the scanning process so we don't have to scan all the way from the start of the block to the end, every time. This would be generally very useful.
> Firefox trunk takes orders of magnitude longer.

It shouldn't be orders of magnitude. On my Macbook Pro, trunk takes about 6 seconds to fully load the page. Maybe one order of magnitude, tops...
Okay, I fixed the scanning process and that made another significant dent in the new-textframe regression*. Now it's down to 7%. I'll do some more profiling tomorrow.

* only after I ran my benchmark with the laptop plugged in. Grrr.
OK, reprofiling shows that the remaining differences are mainly due to
-- Tab processing being slower (1.0% out of the 7.9% of nsTextFrame::Reflow). This is basically because instead of expanding tabs to some number of spaces during text transformation, we treat it as spacing (taking slower paths) and perform text measurement to figure out how much spacing is required. I think this is OK as the cost of a better tabbing model and CSS 2.1 compliance...
-- About 1.2% overhead for building textruns, excluding the actual gfx text-to-textrun construction. This is basically the cost of scanning the frame tree backwards to find where to start construction every time we append to the block (0.5%) and the cost of scanning the frame tree forwards from that point to collect all the text that needs textrun construction. These costs are particularly high for this document because they require us to crawl through the 70-deep inline spans for every line.
-- The biggest difference is the cost of actual textrun construction. With the old textframe it's 0.2% of the profile, with the new textframe it's 4.7% of the (longer!) profile. I think it's pretty clear that the per-word cache the old textframe uses (indirectly) is working much better than the per-same-style-text-run cache that the new textframe is using, on this document. I think this means we really need a word cache in the new textframe world too.
Although, another fix that would help this particular document is to recognize a monospaced font and take a no-kerning, no-ligatures fast path that caches the glyphs and advances for 96 displayable ASCII characters.
Attachment #267261 - Flags: review?(pavlov)
the patch in this bug is the same as was in another bug. .should this be duped?
No. There's a dependency marked on bug 383547. Anyway, this is not a new-textframe bug really; ultimately the performance issues here have little to do with text frames.
BTW I got the word cache working. It seems to reduce the new-textframe penalty in this bug to something negligible...
Blocks: 367116
No longer blocks: 367116
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.4) Gecko/20100503 Firefox/3.6.4

Still reproducible on Windows XP SP3 with Firefox 3.6.4.
Still reproducible on Windows 7 64bit with Firefox 8.0a2. Adding to MemShrink.
Whiteboard: dupeme → dupeme, [MemShrink]
I'm seeing 320mb of layout arena memory usage here.

This looks like a duplicate of bug 319143 to me, but let's mark it as depending on that bug and check back when we fix that one to see if this one is better.
Depends on: 319143
Summary: Slow text file compared to IE - high CPU and memory for simple page → Page with lots of <pre> text uses lots of memory in layout arenas and is slow to load
MemShrink:P2, per the priority of bug 319143.
Whiteboard: dupeme, [MemShrink] → [MemShrink:P2]
No longer depends on: 112738
Jonathan: This bug still reproduces for me using the attached test.html file. Of the 325 MB used in the Layout arena, only 4.89 MB (01.22%) consists of text-runs. For a file of this type (all <pre> text) where does the rest of the RAM go?
Frame objects I think. This test has very very deep nested <span>s breaking across lines, and we create one frame per <span> per line it's on, so we create many many frames.
Removing the MemShrink tag because it blocks tracking bug 689769 which also has the MemShrink tag.
Removing the MemShrink tag because it blocks tracking bug 689769 which also has the MemShrink tag.
Whiteboard: [MemShrink:P2]
Some updated measurements with more detail:

> ├──330.50 MB (71.33%) -- active/window(file:///home/njn/FAQ.html)
> │  ├──327.62 MB (70.70%) -- layout
> │  │  ├──318.25 MB (68.68%) -- frames
> │  │  │  ├──315.09 MB (68.00%) ── nsInlineFrame
> │  │  │  └────3.16 MB (00.68%) ++ (3 tiny)
> │  │  ├────4.89 MB (01.06%) ── text-runs
> │  │  └────4.48 MB (00.97%) ++ (6 tiny)
> │  └────2.89 MB (00.62%) ++ (4 tiny)
Severity: normal → S3

This is still pretty slow, don't know if it's still relevant though.

Firefox Nightly: 5286
Chromium: 681

https://share.firefox.dev/3FS4x2h

Severity: S3 → --
Component: General → Layout
OS: Windows 2000 → All
Hardware: x86 → All
Performance Impact: --- → ?
Version: Trunk → unspecified

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Page load impact: Some
Websites affected: Rare
[x] Able to reproduce locally

Performance Impact: ? → medium
Severity: -- → S3

I'm not sure if it's a determining factor in causing the issue or not, but one interesting aspect of the testcase is that it's got extremely deeply nested inline-level elements, due to the testcase having a whole bunch of unescaped <stuff> tokens which end up getting interpreted as HTML tags (equivalent to <span> effectively). These weren't intended as HTML tags (they're meant to be human-readable text and markup in quoted newsgroup posts), so they're never closed, which means the nesting level gets deeper and deeper as you go further down the page.

(This is essentially what roc observed in comment 35 here.)

The first such inadvertent HTML tag is from a mention of <thread.h>, and then the rest are mostly email addresses from replies in the quoted discussions there (e.g. John Doe <johndoe@example.org> wrote: [some quoted email] -- the <> expression there gets interpreted as an HTML tag and adds a layer of nesting for the rest of the document that follows it).

This ends up meaning each line in the frame tree has about 240 levels of inline frames around the text on that line. When we hit a newline character, we end that line and fragment all of those 240 inline frames to generate a continuation for each on the next line.

So essentially, each linebreak here is quite "heavy" in terms of overhead, due to the inadvertent massive nesting level. And there are lots of linebreaks.

Attachment #9324995 - Attachment description: Boris' test case → testcase 1: Boris' test case
Attached file testcase 2

Here's a much simplified version of Boris's testcase, with the same alert, but being a bit more up-front about the nesting and newlines.

In Firefox, the alert shows 1675 i.e. it takes about 1.7 seconds to reflow.
In Chrome, the alert shows 3 (i.e. 3ms to reflow).
In Epiphany (WebKit), the alert shows 1.

Depends on: 1832678

Profile with testcase 2: https://share.firefox.dev/3O3NHCu

The reflow there is 2.1s long. If I focus on that and tick 'invert call stack', the top 3 items are responsible for about 41% of that time, and they appear to be things that we could optimize away entirely (or almost entirely):

(1) 328ms in GetNearestBlockContainer, called by nsIFrame::GetContainingBlock, called by mozilla::ReflowInput::InitCBReflowInput. GetNearestBlockContainer will be expensive when called on leaf frames here since it walks the ancestor tree until it finds a block-level container, which in this case is often a long walk, and will always find the same frame. This is nearly always unnecessary since really we're just calling this function to checking whether the return value is equal to our parent frame (and generally it is not, in this testcase, and we could find that out much quicker).

(2) 311ms in nsTArray_base::Length, inside of nsInlineFrame::ReflowFrames, calling into nsIFrame::SetParent (probably to reparent after fragmenting for a linebreak), calling into nsIFrame::SchedulePaint, calling into InvalidateRenderingObservers, calling into mozilla::SVGObserverUtils::InvalidateDirectRenderingObservers, which then does a frame property removal, which involves the nsTArray_Base::Length call (probably to check the length of the frame property array when we go to look for the frame-property that we've been asked to remove). I assume the frame-property-removal in question is ObjectBoundingBoxProperty(), here:
https://searchfox.org/mozilla-central/rev/ee6ab6eb2d222e0004604de62206baa67cac33af/layout/svg/SVGObserverUtils.cpp#1694
In this case that property would not be set in the first place, so it's silly that we're spending time trying to remove it. We should consider putting its usage/removal behind a bit-flag somewhere. (Also, maybe better: I wonder if we can avoid/coalesce many of these SchedulePaint calls entirely, since this is all happening before we've ever painted at all...)

(3) 237ms in nsContainerFrame::ReparentFrameViewList, called by nsInlineFrame::Reflow as part of linebreak-fragmentation reparenting. Similar to (1), this walks the ancestor chain of the old-parent and the new-parent, until it finds a view (it won't) or a common ancestor (which in this case is the containing block, which is ~300 inline-frame ancestors away in this testcase). Due to how many things are getting fragmented with each linebreak, this happens a lot and wastes a lot of time. I suspect this is something that we could coalesce or put behind some sort of state bit at some level.

Summary: Page with lots of <pre> text uses lots of memory in layout arenas and is slow to load → Page with lots of <pre> text (or rather: extremely deeply-nested inline tags, with lots of linebreaks) uses lots of memory in layout arenas and is slow to load
Depends on: 1832684
Depends on: 1832685

I filed bug 1832678 on part (1), bug 1832684 on part (2), and bug 1832685 on part (3) in comment 45.

To make things a bit more explicit, this version of the testcase explicitly displays the body and reads document.documentElement.offsetHeight to flush layout, as part of the onload handler.

This makes epiphany take several hundred milliseconds (in the range of 300-500), though Chrome still loads it near-instantly, reporting ~24ms

This testcase is identical to testcase 3 except that I added margin-top to the span elements, which has no effect.

This makes the testcase take much longer in Chrome, on the order of 900-1000ms (roughly as long as we would take, with the 3 issues in comment 45 addressed).

This behavior-difference suggests to me that Chrome is optimizing away the deeply-nested inline boxes entirely in the earlier testcases here, taking advantage of the fact that they have no actual layout impact.

(In fact, that's still true here in testcase 4; but you would have to do more analysis to prove that, and presumably their heuristics don't do that analysis.)

Attachment #9333234 - Attachment description: testcase 4 (same as testcase 3 but now with inert `margin-top:300px` style to span) → testcase 4 (same as testcase 3 but now with inert `margin-top:300px` style to span, which makes it slower in Chrome)

Now that the three things in comment 45 - comment 46 are fixed (thanks to tnikkel), testcases 2, 3, and 4 now consistently reports 600 - 640 milliseconds for me in Nightly.

Testcase 4 reports higher (worse) values in Chrome -- 700-900ms, and occasionally values in the ~1400ms neighborhood.
Testcases 2-3 are also still consistently between 2 to 50ms in Chrome, though, I assume because of the possible optimization that I alluded to in comment 48.

Testcase 1 has improved a little bit for us, too, but not entirely. It reports ~3000-3300ms in today's Nightly, vs. ~4200ms in a Nightly from last week.

Updated profiles (reloaded the page to get 3 distinct loads in each case, to have more data):

The profile of Testcase 1 shows that we're still spending ~35% of the reflow inside of SchedulePaint (which is mostly InvalidateDirectRenderingObservers) -- i.e. bug 1832684's fix isn't helping us there. That makes sense, since in that case we are still doing a substantial amount of reflow after we've unsuppressed painting. (We do an initial paint early on before we've reflowed the whole document.) That's probably the next low-hanging-fruit to fix here.

Depends on: 1833208
Depends on: 1833212

I spun off bug 1833208 for the hypothetical inline-coalescing optimization mentioned in comment 48.
I spun off bug 1833212 for the remaining SchedulePaint/InvalidateDirectRenderingObservers burden.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: