<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 7

•

18 years ago

Should profile once new textframe lands.

Depends on: 367177

Comment 8

•

18 years ago

Although this is is probably at least partly related to the unescaped angle brackets around e-mail addresses in the file that give a deep tag nesting stack... Also note that this file is most emphatically NOT plaintext. I'm not sure why people keep talking about a "text file".

Comment 9

•

18 years ago

roc, for what it's worth new textframe makes this testcase about 12% slower over here -- goes from 80 seconds to 90 seconds or so.

Comment 10

•

18 years ago

Attached file File as of today — Details

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

18 years ago

Attachment #267117 - Attachment is patch: false

Attachment #267117 - Attachment mime type: text/plain → application/zip

Comment 11

•

18 years ago

It makes it 36% slower over here. Looking into it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

18 years ago

Attached patch fix overflow problem due to inconsistent rounding — Details — Splinter Review

A big part of the problem I'm seeing on Mac is that a lot of the text frames overflow their inline containers: their y-offset is -1 appunit. This happens the font ascent on Mac is 588.something appunits. nsTextFrameThebes computes the text ascent as ceil(font-ascent) == 589, which I think is the safe thing to do ... but nsThebesFontMetrics rounds the ascent to 588, which is used to set the ascent of the inline container. Then baseline alignment of the text pushes it slightly up out of its container. This causes us to propagate overflow rects all over the place causing serious performance problems. There is a related problem with height calculation. nsTextFrameThebes adds ceil(font-ascent) and ceil(font-descent) to get the height. nsThebesFontMetrics needs to do the same thing (currently it just rounds the font maxHeight). This patch fixes these issues by patching nsThebesFontMetrics. We could change nsTextFrameThebes to round ascent and descent and/or compute height differently, but using ceil is safer, especially when we will soon be computing accurate glyph bounds. (At that time, on Mac, we'll want to round font ascent and descent up to the nearest *pixel*, which will make this patch less relevant...)

Attachment #267261 - Flags: review?(pavlov)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 13

•

18 years ago

That knocks the new-textframe penalty down to about 11%, similar to what Boris reports. I had a close look with Shark, which is a good tool but it's still hard to see exactly what the problem is. The slowdown is clearly in reflow (of course), but a lot of it (nearly half?) just seems to be things being generically fractionally slower everywhere in the call graph. What does stand out is 2-3% of the profile spent in BuildTextRunsScanner::ScanFrames, almost entirely at exactly the recursive call to ScanFrames. I think what's happening here is that we descend very deep into a stack of inlines doing ::Reflow, reach the first text frame, and then call EnsureTextRun on it, which initiates another recursive traversal starting at the block through all child frames to create textruns for the text frames, which grows the stack a lot more. I think this is significantly increasing our stack memory footprint which is probably hurting cache and slowing down other things as well as being slow itself. This code can easily be turned into an explicit-stack algorithm so I'll try doing that and see how much it helps. Also, we should probably trigger textrun construction from the block when we reflow an inline line for the first time, so that it doesn't add to the bottom of the ::Reflow stack. Another 1% or so of the profile is spent setting up line breaks and storing them in the textrun. This could be avoided when the paragraph is entirely white-space:pre or white-space:nowrap. This is easy and I'll do this too. It looks like textrun construction is about as expensive with old or new textframes, even though the textruns being constructed are rather different. We are working a little bit harder with the new textframe, because preformatted tabs are being rendered as proper tab stops instead of just expanded to a certain number of spaces. Of course for a fixed-width font like this page, that doesn't make a difference, but optimizing that away would be complex and I think undesirable. That accounts for maybe 1% of the profile. Other generic observations that are independent of new or old textframe: -- In this testcase --- which does a *ton* of layout of spans because of the 70+-deep nested spans --- nsLineLayout::VerticalAlignFrames spends significant amounts of time getting font metrics, which actually aren't used most of the time. This could easily be optimized. -- nsHTMLContainerFrame::ReparentFrameViewList is being hit a lot. We could speed it up by exiting early when the frame has no view or child view to reparent. I think we have noted this elsewhere but deferred it in the hope of removing views altogether, but we should probably do it for 1.9. 3-4% Tp win on this testcase. -- We spend 2-3% of the profile in nsFont::BaseEquals doing case-insensitive comparison of font family names. Atoms for lowercased font-family names would really help here. I believe Rob Arnold may have a patch for this... -- I see 3% of the profile right here in nsHTMLReflowState: nsIAtom* frameType = frame->GetType(); if (nsGkAtoms::tableRowGroupFrame == frameType || nsGkAtoms::tableColGroupFrame == frameType || nsGkAtoms::tableRowFrame == frameType || nsGkAtoms::tableColFrame == frameType) { I'm not sure why exactly. The generated code is rather interesting: 0x16e5a0dd movl +4657654(%ebx), %eax 0x16e5a0e3 cmpl (%eax), %edx 0x16e5a0e5 jz 0x16e5a10d 0x16e5a0e7 movl +4657622(%ebx), %eax 0.9% 0x16e5a0ed cmpl (%eax), %edx 0x16e5a0ef jz 0x16e5a10d 0.2% 0x16e5a0f1 movl +4656998(%ebx), %eax 87.2% 0x16e5a0f7 cmpl (%eax), %edx 2.3% 0x16e5a0f9 jz 0x16e5a10d 0.5% 0x16e5a0fb movl +4657594(%ebx), %eax 0x16e5a101 cmpl (%eax), %edx 0x16e5a103 jz 0x16e5a10d Not sure why that one instruction hurts so much. Cache conflict maybe? It's unfortunate that each static access appears to go through some kind of indirect lookup. Maybe we'd gain by rewriting atom tables to use a single static object or array.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 14

•

18 years ago

David, you may find the above comments interesting...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

18 years ago

> -- I see 3% of the profile right here in nsHTMLReflowState: Would it make sense to just have a virtual method on nsIFrame to handle this use case? Or at least an IsFrameOfType bit (though that would not work for border-collapse tables)? It's worth getting bugs filed on all the various points; they should block this bug. I still wonder how IE manages to be several times faster, though. :(

Comment 16

•

18 years ago

> I still wonder how IE manages to be several times faster, though. :( Are we sure it's treating the unescaped email addresses the same way we are, and creating 70+-deep nested spans? If we just rejected opening tags with @s in them, this problem would disappear instantly. We consume time and space that's proportional to the depth of the spans times the number of lines. That's fundamental to our inline frame model. That could be avoided (e.g. a representation where we don't have frames for spans, just for inline leaves), but there would be other tradeoffs and it would require us to make inline frames, and inline reflow, rather special and different from other frames. That could well be a good thing, but it's a big project.

Hixie (not reading bugmail)

Comment 17

•

18 years ago

So the parser discussion is basically bug 112738, right? It might be nice to actually figure out what IE does (which appareantly does NOT include treating <thread.h> as a tag?) and implementing it....

Depends on: 112738

Comment 18

•

18 years ago

IE treats unknown tags as empty elements. How does Safari trunk compare to Gecko on this, performance-wise? They create a tree like we do, so are a more interesting comparison.

Hixie (not reading bugmail)

Comment 19

•

18 years ago

On my Powerbook G4, Safari trunk renders this page in a couple of seconds. Firefox trunk takes orders of magnitude longer. So it's not a matter of the parsing of the tags. (Bringing up their DOM inspector takes a long time, presumably because of the depth of the tree.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

18 years ago

I implemented the linebreaking optimization; it helps a tiny bit, but is simple, and would help more on other plaintext pages, so it's worth taking. I implemented the explicit-stack version of ScanFrames and it's much much worse, so forget that. We're doing a fair bit of re-reflowing of lines when text nodes get appended to. That is also causing a fair bit of textrun reconstruction.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 21

•

18 years ago

Hoisting the scanning process up to start it from the block level didn't seem to make any difference either, and was hard to do. The next thing I want to try is fixing the scanning process so we don't have to scan all the way from the start of the block to the end, every time. This would be generally very useful.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

18 years ago

> Firefox trunk takes orders of magnitude longer. It shouldn't be orders of magnitude. On my Macbook Pro, trunk takes about 6 seconds to fully load the page. Maybe one order of magnitude, tops...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 23

•

18 years ago

Okay, I fixed the scanning process and that made another significant dent in the new-textframe regression*. Now it's down to 7%. I'll do some more profiling tomorrow. * only after I ran my benchmark with the laptop plugged in. Grrr.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 24

•

18 years ago

OK, reprofiling shows that the remaining differences are mainly due to -- Tab processing being slower (1.0% out of the 7.9% of nsTextFrame::Reflow). This is basically because instead of expanding tabs to some number of spaces during text transformation, we treat it as spacing (taking slower paths) and perform text measurement to figure out how much spacing is required. I think this is OK as the cost of a better tabbing model and CSS 2.1 compliance... -- About 1.2% overhead for building textruns, excluding the actual gfx text-to-textrun construction. This is basically the cost of scanning the frame tree backwards to find where to start construction every time we append to the block (0.5%) and the cost of scanning the frame tree forwards from that point to collect all the text that needs textrun construction. These costs are particularly high for this document because they require us to crawl through the 70-deep inline spans for every line. -- The biggest difference is the cost of actual textrun construction. With the old textframe it's 0.2% of the profile, with the new textframe it's 4.7% of the (longer!) profile. I think it's pretty clear that the per-word cache the old textframe uses (indirectly) is working much better than the per-same-style-text-run cache that the new textframe is using, on this document. I think this means we really need a word cache in the new textframe world too.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 25

•

18 years ago

Although, another fix that would help this particular document is to recognize a monospaced font and take a no-kerning, no-ligatures fast path that caches the glyphs and advances for 96 displayable ASCII characters.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

18 years ago

Depends on: 383547

Stuart Parmenter

Updated

•

18 years ago

Attachment #267261 - Flags: review?(pavlov)

Stuart Parmenter

Comment 26

•

18 years ago

the patch in this bug is the same as was in another bug. .should this be duped?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 27

•

18 years ago

No. There's a dependency marked on bug 383547. Anyway, this is not a new-textframe bug really; ultimately the performance issues here have little to do with text frames.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 28

•

18 years ago

BTW I got the word cache working. It seems to reduce the new-textframe penalty in this bug to something negligible...

Updated

•

18 years ago

Blocks: 367116

Marco Castelluccio [:marco]

Updated

•

18 years ago

No longer blocks: 367116

Carlos Alén Silva

Comment 30

•

15 years ago

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.4) Gecko/20100503 Firefox/3.6.4 Still reproducible on Windows XP SP3 with Firefox 3.6.4.

Comment 31

•

14 years ago

Still reproducible on Windows 7 64bit with Firefox 8.0a2. Adding to MemShrink.

Whiteboard: dupeme → dupeme, [MemShrink]

Justin Lebar (not reading bugmail)

Comment 32

•

14 years ago

I'm seeing 320mb of layout arena memory usage here. This looks like a duplicate of bug 319143 to me, but let's mark it as depending on that bug and check back when we fix that one to see if this one is better.

Depends on: 319143

Summary: Slow text file compared to IE - high CPU and memory for simple page → Page with lots of <pre> text uses lots of memory in layout arenas and is slow to load

Justin Lebar (not reading bugmail)

Comment 33

•

14 years ago

MemShrink:P2, per the priority of bug 319143.

Whiteboard: dupeme, [MemShrink] → [MemShrink:P2]

Justin Lebar (not reading bugmail)

Updated

•

14 years ago

No longer depends on: 112738

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

13 years ago

Blocks: text-suck

Jet Villegas (inactive)

Comment 34

•

13 years ago

Jonathan: This bug still reproduces for me using the attached test.html file. Of the 325 MB used in the Layout arena, only 4.89 MB (01.22%) consists of text-runs. For a file of this type (all <pre> text) where does the rest of the RAM go?

Comment 35

•

13 years ago

Frame objects I think. This test has very very deep nested <span>s breaking across lines, and we create one frame per <span> per line it's on, so we create many many frames.

Comment 36

•

13 years ago

Removing the MemShrink tag because it blocks tracking bug 689769 which also has the MemShrink tag.

Comment 37

•

13 years ago

Removing the MemShrink tag because it blocks tracking bug 689769 which also has the MemShrink tag.

Updated

•

13 years ago

Whiteboard: [MemShrink:P2]

https://share.firefox.dev/3FS4x2h

Comment 38

•

11 years ago

Some updated measurements with more detail: > ├──330.50 MB (71.33%) -- active/window(file:///home/njn/FAQ.html) > │ ├──327.62 MB (70.70%) -- layout > │ │ ├──318.25 MB (68.68%) -- frames > │ │ │ ├──315.09 MB (68.00%) ── nsInlineFrame > │ │ │ └────3.16 MB (00.68%) ++ (3 tiny) > │ │ ├────4.89 MB (01.06%) ── text-runs > │ │ └────4.48 MB (00.97%) ++ (6 tiny) > │ └────2.89 MB (00.62%) ++ (4 tiny)

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Gregory Pappas [:gregp]

Comment 39

•

2 years ago

Attached file testcase 1: Boris' test case — Details

Gregory Pappas [:gregp]

Comment 40

•

2 years ago

This is still pretty slow, don't know if it's still relevant though.

Firefox Nightly: 5286
Chromium: 681

URL: http://www.lambdacs.com/cpt/FAQ.html → https://web.archive.org/web/200308120...

Severity: S3 → --

Component: General → Layout

OS: Windows 2000 → All

Hardware: x86 → All

Gregory Pappas [:gregp]

Updated

•

2 years ago

Performance Impact: --- → ?

Version: Trunk → unspecified

Markus Stange [:mstange]

Comment 41

•

2 years ago

The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Page load impact: Some
Websites affected: Rare
[x] Able to reproduce locally

Performance Impact: ? → medium

Updated

•

2 years ago

Severity: -- → S3

Comment 42

•

2 years ago

•

Edited

I'm not sure if it's a determining factor in causing the issue or not, but one interesting aspect of the testcase is that it's got extremely deeply nested inline-level elements, due to the testcase having a whole bunch of unescaped <stuff> tokens which end up getting interpreted as HTML tags (equivalent to <span> effectively). These weren't intended as HTML tags (they're meant to be human-readable text and markup in quoted newsgroup posts), so they're never closed, which means the nesting level gets deeper and deeper as you go further down the page.

(This is essentially what roc observed in comment 35 here.)

The first such inadvertent HTML tag is from a mention of <thread.h>, and then the rest are mostly email addresses from replies in the quoted discussions there (e.g. John Doe <johndoe@example.org> wrote: [some quoted email] -- the <> expression there gets interpreted as an HTML tag and adds a layer of nesting for the rest of the document that follows it).

Comment 43

•

2 years ago

This ends up meaning each line in the frame tree has about 240 levels of inline frames around the text on that line. When we hit a newline character, we end that line and fragment all of those 240 inline frames to generate a continuation for each on the next line.

So essentially, each linebreak here is quite "heavy" in terms of overhead, due to the inadvertent massive nesting level. And there are lots of linebreaks.

Updated

•

2 years ago

Attachment #9324995 - Attachment description: Boris' test case → testcase 1: Boris' test case

Comment 44

•

2 years ago

Attached file testcase 2 — Details

Here's a much simplified version of Boris's testcase, with the same alert, but being a bit more up-front about the nesting and newlines.

In Firefox, the alert shows 1675 i.e. it takes about 1.7 seconds to reflow.
In Chrome, the alert shows 3 (i.e. 3ms to reflow).
In Epiphany (WebKit), the alert shows 1.

Updated

•

2 years ago

Depends on: 1832678

Comment 45

•

2 years ago

Profile with testcase 2: https://share.firefox.dev/3O3NHCu

The reflow there is 2.1s long. If I focus on that and tick 'invert call stack', the top 3 items are responsible for about 41% of that time, and they appear to be things that we could optimize away entirely (or almost entirely):

(1) 328ms in GetNearestBlockContainer, called by nsIFrame::GetContainingBlock, called by mozilla::ReflowInput::InitCBReflowInput. GetNearestBlockContainer will be expensive when called on leaf frames here since it walks the ancestor tree until it finds a block-level container, which in this case is often a long walk, and will always find the same frame. This is nearly always unnecessary since really we're just calling this function to checking whether the return value is equal to our parent frame (and generally it is not, in this testcase, and we could find that out much quicker).

(2) 311ms in nsTArray_base::Length, inside of nsInlineFrame::ReflowFrames, calling into nsIFrame::SetParent (probably to reparent after fragmenting for a linebreak), calling into nsIFrame::SchedulePaint, calling into InvalidateRenderingObservers, calling into mozilla::SVGObserverUtils::InvalidateDirectRenderingObservers, which then does a frame property removal, which involves the nsTArray_Base::Length call (probably to check the length of the frame property array when we go to look for the frame-property that we've been asked to remove). I assume the frame-property-removal in question is ObjectBoundingBoxProperty(), here:
https://searchfox.org/mozilla-central/rev/ee6ab6eb2d222e0004604de62206baa67cac33af/layout/svg/SVGObserverUtils.cpp#1694
In this case that property would not be set in the first place, so it's silly that we're spending time trying to remove it. We should consider putting its usage/removal behind a bit-flag somewhere. (Also, maybe better: I wonder if we can avoid/coalesce many of these SchedulePaint calls entirely, since this is all happening before we've ever painted at all...)

(3) 237ms in nsContainerFrame::ReparentFrameViewList, called by nsInlineFrame::Reflow as part of linebreak-fragmentation reparenting. Similar to (1), this walks the ancestor chain of the old-parent and the new-parent, until it finds a view (it won't) or a common ancestor (which in this case is the containing block, which is ~300 inline-frame ancestors away in this testcase). Due to how many things are getting fragmented with each linebreak, this happens a lot and wastes a lot of time. I suspect this is something that we could coalesce or put behind some sort of state bit at some level.

Updated

•

2 years ago

Summary: Page with lots of <pre> text uses lots of memory in layout arenas and is slow to load → Page with lots of <pre> text (or rather: extremely deeply-nested inline tags, with lots of linebreaks) uses lots of memory in layout arenas and is slow to load

Updated

•

2 years ago

Depends on: 1832684

Updated

•

2 years ago

Depends on: 1832685

Comment 46

•

2 years ago

I filed bug 1832678 on part (1), bug 1832684 on part (2), and bug 1832685 on part (3) in comment 45.

Comment 47

•

2 years ago

Attached file testcase 3 (explicitly flush layout as part of the benchmark) — Details

To make things a bit more explicit, this version of the testcase explicitly displays the body and reads document.documentElement.offsetHeight to flush layout, as part of the onload handler.

This makes epiphany take several hundred milliseconds (in the range of 300-500), though Chrome still loads it near-instantly, reporting ~24ms

Comment 48

•

2 years ago

Attached file testcase 4 (same as testcase 3 but now with inert `margin-top:300px` style to span, which makes it slower in Chrome) — Details

This testcase is identical to testcase 3 except that I added margin-top to the span elements, which has no effect.

This makes the testcase take much longer in Chrome, on the order of 900-1000ms (roughly as long as we would take, with the 3 issues in comment 45 addressed).

This behavior-difference suggests to me that Chrome is optimizing away the deeply-nested inline boxes entirely in the earlier testcases here, taking advantage of the fact that they have no actual layout impact.

(In fact, that's still true here in testcase 4; but you would have to do more analysis to prove that, and presumably their heuristics don't do that analysis.)

Updated

•

2 years ago

Attachment #9333234 - Attachment description: testcase 4 (same as testcase 3 but now with inert `margin-top:300px` style to span) → testcase 4 (same as testcase 3 but now with inert `margin-top:300px` style to span, which makes it slower in Chrome)

Comment 49

•

2 years ago

•

Edited

Now that the three things in comment 45 - comment 46 are fixed (thanks to tnikkel), testcases 2, 3, and 4 now consistently reports 600 - 640 milliseconds for me in Nightly.

Testcase 4 reports higher (worse) values in Chrome -- 700-900ms, and occasionally values in the ~1400ms neighborhood.
Testcases 2-3 are also still consistently between 2 to 50ms in Chrome, though, I assume because of the possible optimization that I alluded to in comment 48.

Testcase 1 has improved a little bit for us, too, but not entirely. It reports ~3000-3300ms in today's Nightly, vs. ~4200ms in a Nightly from last week.

Updated profiles (reloaded the page to get 3 distinct loads in each case, to have more data):

Testcase 1: https://share.firefox.dev/3pFEIgP
Testcase 4: https://share.firefox.dev/3I8gO3R
(note, the reflows in these profiles are a bit higher than described earlier in this comment, presumably due to profiling overhead)

The profile of Testcase 1 shows that we're still spending ~35% of the reflow inside of SchedulePaint (which is mostly InvalidateDirectRenderingObservers) -- i.e. bug 1832684's fix isn't helping us there. That makes sense, since in that case we are still doing a substantial amount of reflow after we've unsuppressed painting. (We do an initial paint early on before we've reflowed the whole document.) That's probably the next low-hanging-fruit to fix here.

Updated

•

2 years ago

Depends on: 1833208

Updated

•

2 years ago

Depends on: 1833212