Page with lots of <pre> text (or rather: extremely deeply-nested inline tags, with lots of linebreaks) uses lots of memory in layout arenas and is slow to load
Categories
(Core :: Layout, defect)
Tracking
()
| Performance Impact | medium |
People
(Reporter: paultolk, Unassigned)
References
(Depends on 2 open bugs, Blocks 1 open bug, )
Details
(Keywords: perf)
Attachments
(6 files)
Comment 1•22 years ago
|
||
Comment 2•21 years ago
|
||
Updated•21 years ago
|
Comment 5•19 years ago
|
||
Comment 8•18 years ago
|
||
Comment 9•18 years ago
|
||
Comment 10•18 years ago
|
||
Updated•18 years ago
|
Comment 15•18 years ago
|
||
Comment 17•18 years ago
|
||
Comment 18•18 years ago
|
||
Comment 19•18 years ago
|
||
Updated•18 years ago
|
Comment 26•18 years ago
|
||
Comment 30•15 years ago
|
||
Comment 31•14 years ago
|
||
Comment 32•14 years ago
|
||
Comment 33•14 years ago
|
||
Comment 34•13 years ago
|
||
Comment 36•13 years ago
|
||
Comment 37•13 years ago
|
||
Updated•13 years ago
|
Comment 38•11 years ago
|
||
Updated•3 years ago
|
Comment 39•2 years ago
|
||
Comment 40•2 years ago
|
||
This is still pretty slow, don't know if it's still relevant though.
Firefox Nightly: 5286
Chromium: 681
Updated•2 years ago
|
Comment 41•2 years ago
|
||
The Performance Impact Calculator has determined this bug's performance impact to be medium. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.
Platforms: [x] Windows [x] macOS [x] Linux [x] Android
Page load impact: Some
Websites affected: Rare
[x] Able to reproduce locally
Updated•2 years ago
|
Comment 42•2 years ago
•
|
||
I'm not sure if it's a determining factor in causing the issue or not, but one interesting aspect of the testcase is that it's got extremely deeply nested inline-level elements, due to the testcase having a whole bunch of unescaped <stuff> tokens which end up getting interpreted as HTML tags (equivalent to <span> effectively). These weren't intended as HTML tags (they're meant to be human-readable text and markup in quoted newsgroup posts), so they're never closed, which means the nesting level gets deeper and deeper as you go further down the page.
(This is essentially what roc observed in comment 35 here.)
The first such inadvertent HTML tag is from a mention of <thread.h>, and then the rest are mostly email addresses from replies in the quoted discussions there (e.g. John Doe <johndoe@example.org> wrote: [some quoted email] -- the <> expression there gets interpreted as an HTML tag and adds a layer of nesting for the rest of the document that follows it).
Comment 43•2 years ago
|
||
This ends up meaning each line in the frame tree has about 240 levels of inline frames around the text on that line. When we hit a newline character, we end that line and fragment all of those 240 inline frames to generate a continuation for each on the next line.
So essentially, each linebreak here is quite "heavy" in terms of overhead, due to the inadvertent massive nesting level. And there are lots of linebreaks.
Updated•2 years ago
|
Comment 44•2 years ago
|
||
Here's a much simplified version of Boris's testcase, with the same alert, but being a bit more up-front about the nesting and newlines.
In Firefox, the alert shows 1675 i.e. it takes about 1.7 seconds to reflow.
In Chrome, the alert shows 3 (i.e. 3ms to reflow).
In Epiphany (WebKit), the alert shows 1.
Comment 45•2 years ago
|
||
Profile with testcase 2: https://share.firefox.dev/3O3NHCu
The reflow there is 2.1s long. If I focus on that and tick 'invert call stack', the top 3 items are responsible for about 41% of that time, and they appear to be things that we could optimize away entirely (or almost entirely):
(1) 328ms in GetNearestBlockContainer, called by nsIFrame::GetContainingBlock, called by mozilla::ReflowInput::InitCBReflowInput. GetNearestBlockContainer will be expensive when called on leaf frames here since it walks the ancestor tree until it finds a block-level container, which in this case is often a long walk, and will always find the same frame. This is nearly always unnecessary since really we're just calling this function to checking whether the return value is equal to our parent frame (and generally it is not, in this testcase, and we could find that out much quicker).
(2) 311ms in nsTArray_base::Length, inside of nsInlineFrame::ReflowFrames, calling into nsIFrame::SetParent (probably to reparent after fragmenting for a linebreak), calling into nsIFrame::SchedulePaint, calling into InvalidateRenderingObservers, calling into mozilla::SVGObserverUtils::InvalidateDirectRenderingObservers, which then does a frame property removal, which involves the nsTArray_Base::Length call (probably to check the length of the frame property array when we go to look for the frame-property that we've been asked to remove). I assume the frame-property-removal in question is ObjectBoundingBoxProperty(), here:
https://searchfox.org/mozilla-central/rev/ee6ab6eb2d222e0004604de62206baa67cac33af/layout/svg/SVGObserverUtils.cpp#1694
In this case that property would not be set in the first place, so it's silly that we're spending time trying to remove it. We should consider putting its usage/removal behind a bit-flag somewhere. (Also, maybe better: I wonder if we can avoid/coalesce many of these SchedulePaint calls entirely, since this is all happening before we've ever painted at all...)
(3) 237ms in nsContainerFrame::ReparentFrameViewList, called by nsInlineFrame::Reflow as part of linebreak-fragmentation reparenting. Similar to (1), this walks the ancestor chain of the old-parent and the new-parent, until it finds a view (it won't) or a common ancestor (which in this case is the containing block, which is ~300 inline-frame ancestors away in this testcase). Due to how many things are getting fragmented with each linebreak, this happens a lot and wastes a lot of time. I suspect this is something that we could coalesce or put behind some sort of state bit at some level.
Updated•2 years ago
|
Comment 46•2 years ago
|
||
I filed bug 1832678 on part (1), bug 1832684 on part (2), and bug 1832685 on part (3) in comment 45.
Comment 47•2 years ago
|
||
To make things a bit more explicit, this version of the testcase explicitly displays the body and reads document.documentElement.offsetHeight to flush layout, as part of the onload handler.
This makes epiphany take several hundred milliseconds (in the range of 300-500), though Chrome still loads it near-instantly, reporting ~24ms
Comment 48•2 years ago
|
||
This testcase is identical to testcase 3 except that I added margin-top to the span elements, which has no effect.
This makes the testcase take much longer in Chrome, on the order of 900-1000ms (roughly as long as we would take, with the 3 issues in comment 45 addressed).
This behavior-difference suggests to me that Chrome is optimizing away the deeply-nested inline boxes entirely in the earlier testcases here, taking advantage of the fact that they have no actual layout impact.
(In fact, that's still true here in testcase 4; but you would have to do more analysis to prove that, and presumably their heuristics don't do that analysis.)
Updated•2 years ago
|
Comment 49•2 years ago
•
|
||
Now that the three things in comment 45 - comment 46 are fixed (thanks to tnikkel), testcases 2, 3, and 4 now consistently reports 600 - 640 milliseconds for me in Nightly.
Testcase 4 reports higher (worse) values in Chrome -- 700-900ms, and occasionally values in the ~1400ms neighborhood.
Testcases 2-3 are also still consistently between 2 to 50ms in Chrome, though, I assume because of the possible optimization that I alluded to in comment 48.
Testcase 1 has improved a little bit for us, too, but not entirely. It reports ~3000-3300ms in today's Nightly, vs. ~4200ms in a Nightly from last week.
Updated profiles (reloaded the page to get 3 distinct loads in each case, to have more data):
- Testcase 1: https://share.firefox.dev/3pFEIgP
- Testcase 4: https://share.firefox.dev/3I8gO3R
(note, the reflows in these profiles are a bit higher than described earlier in this comment, presumably due to profiling overhead)
The profile of Testcase 1 shows that we're still spending ~35% of the reflow inside of SchedulePaint (which is mostly InvalidateDirectRenderingObservers) -- i.e. bug 1832684's fix isn't helping us there. That makes sense, since in that case we are still doing a substantial amount of reflow after we've unsuppressed painting. (We do an initial paint early on before we've reflowed the whole document.) That's probably the next low-hanging-fruit to fix here.
Comment 50•2 years ago
|
||
I spun off bug 1833208 for the hypothetical inline-coalescing optimization mentioned in comment 48.
I spun off bug 1833212 for the remaining SchedulePaint/InvalidateDirectRenderingObservers burden.
Description
•