Closed Bug 94587 Opened 23 years ago Closed 6 years ago

Excessive Re-Layout Re-Reflow long documents

Categories

(Core :: Layout, defect, P2)

defect

Tracking

()

RESOLVED INCOMPLETE
mozilla1.4alpha

People

(Reporter: emrick, Unassigned)

References

()

Details

(Keywords: testcase, topembed-, topperf)

Attachments

(1 file)

I've discovered much of the layout and reflow activity in rendering of a large 
document is 'triggered' (the verb 'caused' is not really right here) by the 
method nsGfxScrollFrameInner::Layout, specificly a call to LayoutChildAt by way 
of LayoutBox at this line of code: 
http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsGfxScrollFrame.c
pp#1140 

This Layout method is invoked many times (10, 20, more, depending on length) 
for large document, the unconditional LayoutChildAt method invoked each time 
too. Several things unclear to me. It's not clear to me why it's invoked so 
many times, when it would seem to be just related to scroll bars. However, the 
method seems actually involved in layout/reflow/rendering/paint of more than 
scroll bars. If comment out the call to LayoutBox method, then more than just 
scrollbars are impacted, text area part is rendered incomplete, sometimes 
entirely incomplete, depending on document. However, if comment out the call to 
LayoutBox, large document load time improvement dramatic, I measure 35-50%! 

The problem/opportunity here is, it appears much of the the layout, reflow, 
activity triggered through this point is unneeded and/or redundant. There 
appears to be opportunity to significantly reduce response times by reducing 
the number of time that nsGfxScrollFrameInner::Layout is invoked, or by 
filtering inside the Layout method to do less invokes of LayoutBox / 
LayoutChildAt methods.

I've tried a couple hacks at code inside Layout to filter unneeded calls to 
LayoutChildAt, but don't understand the code enough, getting good speed-up, but 
some pages don't finish rendering, or for example, the progress bar doesn't 
clean-up after load complete, other goinks depending on doc content. My 
experiments make me wonder if this Layout method is unexpectedly doing much 
more than it was intented to do, being a inadvertant 'trigger' for work meant 
to be triggered elsewhere. 

Please advise.
Keywords: perf, topperf
Chris (K & W), Dave: any comments on what Sam is looking at?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Some additional data. So, moving one level up in the food chain, most 
of the calls to nsGfxScrollFrameInner::Layout are generated from within 
nsGfxScrollFrame::Reflow. The data below shows 'reflows' and timings for a 
particular document. The bulk of the time is actually spent in LayoutChildAt. 
The data below shows total elapsed time, service time for each layout/reflow. 
Notice 29 individual reflows occur mostly in three bursts of reflow activity. 
The 3 bursts have 9, 9, then 10 reflows. The time spent in layout/reflow is 30% 
of total time.   

time  service
0.00  0.00 nsGfxScrollFrameInner Constructor
0.01  0.00 Reflow 1

2.79  0.84 Reflow 2
3.63  0.01 Reflow 3
3.64  0.02 Reflow 4
3.66  0.01 Reflow 5
3.67  0.02 Reflow 6
3.69  0.01 Reflow 7
3.70  0.02 Reflow 8
3.72  0.01 Reflow 9
3.73  0.02 Reflow 10
3.75       (other activity)
....
4.86  0.37 Reflow 11
5.23  0.03 Reflow 12
5.26  0.01 Reflow 13
5.27  0.02 Reflow 14
5.29  0.02 Reflow 15
5.31  0.03 Reflow 16
5.34  0.03 Reflow 17
5.37  0.02 Reflow 18
5.39  0.02 Reflow 19
5.41       (other activity)
....
7.00  0.49 Reflow 20
7.49  0.04 Reflow 21
7.53  0.04 Reflow 22
7.57  0.04 Reflow 23
7.61  0.04 Reflow 24
7.65  0.04 Reflow 25
7.69  0.03 Reflow 26
7.72  0.04 Reflow 27
7.76  0.04 Reflow 28
7.80  0.04 Reflow 29
7.84       <- total elapsed
      2.35 <- nsGfxScrollFrame::Reflow service time
              most of this time actually in LayoutChildAt
I forgot to say, timing data from my 700 MHz Thinkpad T20. The response time of 
this testcase is very proportional to processor speed, i.e. 233 MHz response 
times three times longer than 700 MHz. The html file loaded from local disk. 
Document is simple self contained text, a large list of links, no imbeds, 
images, etc. It's a customer-provided testcase, so I can't public-domain it.  
The customer app is online documentation. The document basicly looks like a 
detailed, multi-level book index. Customer complaint is response time compared 
to netscape 4.x.    
Sam
Discovery ocntinues. The preference:
user_pref("layout.reflow.async.duringDocLoad", false);
changes the behavior and provides some benefit. 

The problem being, I would guess, this preference as a global and 
decision already made to make the default 'true'. Has thought been 
given to dynamic setting of the preference? Perhaps, maybe, say, based 
on ContentSize versus ClientSize? That is, when the doc outgrows the 
window by some amount, then disable reflow.async for duration of load?

With async off, I get this:

0.00  0.00 nsGfxScrollFrameInner Constructor
0.10  0.00 Reflow 1
....       (other stuff)
2.61  1.02 reflow 2
3.63  0.02 reflow 3
3.66  0.01 reflow 4
3.68  0.02 reflow 5
3.70  0.03 reflow 6
3.73  0.02 reflow 7
3.76  0.02 reflow 8
3.78  0.02 reflow 9
3.80  0.02 reflow 10
3.78  0.02 reflow 11
3.82  0.02 reflow 12
3.84  0.03 reflow 13
3.87  0.02 reflow 14
3.89
....  (other stuff)
6.40  0.72 reflow 15
7.12  0.04 reflow 16
7.16  0.04 reflow 17
7.20       <- elapsed
      2.07 <- reflow service time

There's still opportunity here, I think. The 'reflow 2' time (both 
async=off and async=on) is large because scroll bar visibility change 
happens amidst it, basicly doubles the work and the time. Maybe its 
possible to collapse this to one layout/reflow pass. 
Sam    
Blocks: 71668
The scroll bar change is probably bug 12234. Reassigning to evaughan, due to 
scroll frame issues.
Assignee: karnaze → evaughan
evaughan on sabbatical till 8/20...
sarri, can you get someone to help?
Reflow is recursive.  Anything you reflow in an HTML page is going to go 
through the GFXScrollFrame's LayoutChild methods.  This doesn't implicate the 
scroll frame in any way.
As I've said before, I've patched the code to always show a vertical scrollbar, 
and it doesn't make a bit of difference in the performance numbers.  
Dave is correct. When a document is loaded many incremental reflows occur to
size and new content that added incrementally to increase performance. How much
time is spent in LayoutChildAt excluding the calls under it? Keep in mind to
size an image or to add some text requires a top down reflow. Its efficient
because only the frames that have changed or have children that have changed
will recieve this call.
The base problem is that we fire a timeout to start reflow - but after that we
reflow constantly, which can slow down parsing from a fast source (local for
example).

See PresShell::InitialReflow()

Probably this should NOT be just a 1-time timer, but some initial delay with
smaller delays thereafter (say .1 or .2 seconds, instead of the .01 that the
postings here show).  Better yet (but harder) would be to delay reflows until
either we've had unflowed content for 0.x seconds, or until we've been idle for
0.y seconds (0.y < 0.x).  As total time increases (or document size), we may
want to delay longer between reflows (since it's less likely to be important to
show it immediately).

This will keep fairly smooth updates without spending all our time in reflow,
especially with local datasources or very fast connections.
rjesup: that's not quite right. We don't do initial reflow on a timeout; we do
the initial reflow as soon as we see the <body> element. Subsequent reflows
aren't enqueued until frames have been constructed, and _that_ is primarily
throttled in nsHTMLContentSink. (Which, by the way, has that timer monkey
business you mention, although I'm not fond of it.)

When a reflow is scheduled, yes, we put an event into the queue to process the
reflow commands. (I suppose this is what you mean by ``reflowing constantly''.)
But, if content is arriving rapidly, that event has to compete with all the
other events in the queue, so we'll tend to reflow in larger chunks. In other
words, loading a local file will cause necko ought to queue up several buffers
worth of content before we get around to servicing the reflow event. (We're all
sharing the same event queue.)

I'm a bit skeptical of making pres shell's reflow stuff any more byzantine than
it already is. I'd rather see us try to simplify instead of adding more knobs
and dials to wallpaper over algorithmic problems in content construction, frame
construction, and reflow. My $0.02.
>0.00  0.00 nsGfxScrollFrameInner Constructor
>0.10  0.00 Reflow 1
>....       (other stuff)
>2.61  1.02 reflow 2
>3.63  0.02 reflow 3

>There's still opportunity here, I think. The 'reflow 2' time (both 
>async=off and async=on) is large because scroll bar visibility change 
>happens amidst it, basicly doubles the work and the time. Maybe its 
>possible to collapse this to one layout/reflow pass. 

If this "reflow 2" takes so long (because we have a lot of data?) doesn't this
mean that it is being delayed far too long? Is it possible to efficiently detect
when the scrollbar visibility change must happen?



In reply to: waterson@netscape.com
Ah so, yes, adding some more trace points, this is becoming a bit 
clearer to me now. So in this long doc I am looking at, it would seem 
both flavors of reflow are happening. Both 'content driven' and 'timer 
driven' reflows. So a question is, couldn't / shouldn't a 'content 
driven' reflow then 'reset' the delay (or cancel the pending 
notify/interrupt) for the 'timer driven' reflow?

The mNotificationInterval = 120000; 
at: 
http://lxr.mozilla.org/mozilla/source/content/html/document/src/nsHTMLC
ontentSink.cpp#2523
seems to be a key variable here. I am experimenting with higher values 
(i.e. user_pref("content.notify.interval", xxxxxxx);
to determine tradeoffs. Maybe the problem is, content reflows can take 
so long that timer triggered reflow will usually happen after. One 
thing very clear here to me is the overall behavior of reflow-related  
activity highly dependent on document content. It's seems hard to 
generalize the 'overhead of reflow', even describe expected reflow 
behavior, in abstract of a specific document and its content. 
Sam 
Just to be clear, there is not ``timer driven reflow''. There is ``timer driven
frame construction''. Frame construction enqueues an incremental reflow. An
incremental reflow is targeted at an individual frame, and is coalesced with
subsequent reflows as more and more of the frame tree becomes dirty (see
nsIFrame::ReflowDirtyChild). So, no, we shouldn't cancel a pending reflow just
because new content has arrived, because that reflow may be targeted at a part
of the frame tree that the newly generated reflow command would fail to flow.

The real problem here (why reflows keep taking longer and longer) is that an
reflow has state (positions of floated elements, vertical margins) that must be
recovered from ``clean'' frames. Right now, that state recovery is O(n), which
is why you're seeing each reflow take longer and longer. You should look into
(and help test) dbaron's work on making state recovery be O(1). See bug 78911
and its dependents.
Let me be clear, however you clarify it, reflow-related processing, 
much of which is redundant or unneeded, is occuring, and it needs 
fixed. I'll be real surprised if there is any one point-source fix. I 
don't understand you try to keep short-circuit this discussion. Looks 
to me many places might need to be made better. Some comment made above 
about 'its efficient because only children needing reflow get called to 
do it', well in my testcase, that means 300,000 calls more or less to 
UpdateSpaceManager method in 8 seconds elapsed. Its true most of those 
do not result in a reflow, however, just transversing the tree is far 
from free. 
Excuse me for trying to explain how things work. I'll let you do that on your
own from now on, as I can see it's not appreciated. My point to rjesup and
others was that adding more timers may just mask underlying algorithmic
problems, some of which are known and we are trying to fix.
Chris: thanks for the update; I hadn't looked deep enough.  I just noticed the
figures showing reflows every ~0.02 seconds, which seemed a little bit frequent
to me.

As you say, the O(n) or worse cases are a bigger problem, though I still want to
look into limiting frequency of reflows.  I have some very specific reasons to
want to do this having to do with CPU and display device usage, even if the O(n)
problems are fixed.  (I can't assume I can spend spare CPU cycles reflowing even
if there's nothing else to do within this mozilla.)

Another thing (thinking off the top of my head) is how important is it to reflow
non-visible areas while still loading.  If they're not part of a visible
line/block, then all they can affect is scroll-bars, which is of only limited
importance while still loading.  If they are part of a visible table, but not
visible themselves, the table size _could_ change - but only to grow wider or
maybe to adjust the relative widths of columns.  Again, not critical to be
updated frequently during a load.  (It may be that there are optimizations like
this or far better already within the tree, I'm just starting to look at this.)

(by updated frequently, I mean updating a few times a second perhaps or faster.)

>Excuse me for trying to explain how things work. I'll let you do
>that on your own from now on, as I can see it's not appreciated.
Constructive discussion on the ideas being presented is appreciated. 
Explanations are constructive. Single-handedly dismissing an idea is 
not constructive.

>My point to rjesup and others was that adding more timers may 
>just mask underlying algorithmic problems, some of which are known
>and we are trying to fix.
A mask in the hand is worth two fixes in the bush. Beyond notions of 
beauty, difference between 'a mask' and 'a fix' is simply semantics. 
Whatever causes the problem to not occur is a fix, I think.

I didn't mean to drive you away. Just don't shoot down viable ideas.  
emrick: "whatever causes the problem to not occur is a fix"?  If you mean
"whatever causes the symptom not to occur", then I have to disagree, because
underlying problems can have many symptoms, only some of which we know about at
any given moment.  Treating symptoms may help in a desperate situation; it may
even help to palliate an affliction for a long while.  But it's always better to
fix the underlying problem.

This difference between treating symptoms and treating causes is emphatically
not a semantic or aesthetic issue.  The symptom-treatments bloat and complicate
the system.  The complications mean more bugs.  The underlying problem too often
still festers.  The medical analogy is apt.

Sorry to wax philosophical, but hey, you started it here!  I hope fixes to frame
construction or reflow scheduling simplify the system; I'm not saying you will
inevitably bloat and complicate it by what you propose to fix here, or that
rjesup will do so, either.  Sorry, I'm merely quibbling with your statement that
a mask and a fix are the same, if they make the symptom disappear.

Waterson has been around layout long enough to see people add more "knobs" to
tune code that itself takes too many cycles.  Some of those knobs can be set,
for certain kinds of content, to treat symptoms.  But more symptoms emerge, and
the system grows sicker in spite of the palliatives.  Waterson, attinasi,
dbaron, and others, want to fix the underlying too-many-cycles problem.  If the
knob-adders and -twiddlers were dominant, the symptom-treating would tend to
suppress those real fixes, palliating the current crop of symptoms but leaving
the system worse off for the next crop.  At each step, the palliatives tend to
crowd out the (more involved) underlying fixes.  This has happened already too
often, in my outsider's opinion, in the history of Gecko.

Anyway, I don't see how waterson's comments dated 2001-08-22 12:33 (which stated
facts about how reflow and frame construction work, then asserted facts about
the underlying problem and why it's important to fix) could be said to "shoot
down viable *ideas*" (my emphasis).  Bring on your good ideas, kindly reject
waterson's exhortation to help out on the underlying problem, and do your worst
(worse is better!).

/be
We posted this to the perf newsgroup, but got no response from there.
Would appreciate comment please.
Sam

To:	mozilla-performance@mozilla.org
cc:	 
Subject:	Setting content.notify.backoffcount to 1

We have improved greatly nearly all of our large text testcases by 
setting content.notify.backoffcount 
to the value of 1 (it is defaulted to -1).  One customer test ( 
consisting of 1000s of Links within an unordered list) improved from 
over 26 seconds to under 22 seconds by adding this parameter to 
Prefs.js.  We run our tests using OS/2 on a 333 MHz 128 mb system and 
are at the 0.9.2.1 level of Mozilla.  Most other large text testcases 
improve in the range of 10%.  I have not noticed any layout problems or 
response problems using 1 as the value for the 
content.notify.backoffcount.  

Please advise me,  what I am missing?   Why isn't this the correct 
setting? 

Thanks. 
Ivan Eisen 
IBM 
From what I undersand, setting the backoff to 1 turns off the content sink
flushing until the entire document is loaded (The 1 indicates that content sink
flushing should be suspended after the first reflow). This effectively prevents
incremental loading of the page. If your only goal is overall page load time
then setting it to 1 or n is fine, but most users want to be able to see content
as it comes in on a regular interval. The other effect of setting it to 1 is
that a single reflow will occur when the document has completed loading which
may prevent the user from interacting with Mozilla until the reflow has
completed. This is evident when large documents are loaded. 

cc'ing Vidur
Blocks: 114584
Taking this bug
Assignee: eric → kmcclusk
Target Milestone: --- → mozilla1.0.1
Keywords: mozilla1.0
Bulk moving Mozilla1.01 bugs to future-P1. I will pull from the future-P1 list
to schedule bugs for post Mozilla1.0 milestones
Priority: -- → P1
Target Milestone: mozilla1.0.1 → Future
Keywords: mozilla1.0+
Keywords: mozilla1.0
Keywords: mozilla1.0.1
per conversation w/ jesup, minusing *this* bug for 1.0.1, if specific
patches/bugs fall out of this, please nominate those for 1.0.1 for consideration.
That sounds right to me, use this as gate to specific 
fixes, if and as available.
Sam
Keywords: mozilla1.1mozilla1.2
batch: adding topembed per Gecko2 document
http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed
removing mozilla1.0+

Any actual patches that come out of this will be considered for 1.0 branch
Keywords: mozilla1.0+
Blocks: 91643
Keywords: nsbeta1
Keywords: topembedtopembed+
QA Contact: petersen → praveenqa
Blocks: grouper
Keywords: testcase
Think also bug 168638 is related to this.
Keywords: mozilla1.2mozilla1.3
Do we have an approach to this?
Keywords: perf
Blocks: 152385
Target Milestone: Future → mozilla1.4alpha
Blocks: 91351
Blocks: 61684
What is the real issue that this bug is trying to address?
Hum... the original report is pretty clear, I think. 
With particlar kind of documents, very large mostly textual documents, 
say, things that are like 'chapters of a manual' then mozilla remains 
much slower than IE to reder such documents, and some part of this time 
is (or at least was) being spent in visually unproductive activity that 
ia related to document reflow. 

At lot of the reflow-related processing has changed since the bug was 
originally opened. At this point in time, more analysis is needed. The 
root problem (long text documents comparatively slow) remains. Today, 
the laregst major contributor that may be elsewhere (attributes?) or 
not, I don't know. 
Sam
Clearing topembed(+) for re-consideration. 

At this point I think this bug is really a tracking bug for improving overall
reflow performance. 

I recently did measurements where I hacked gecko to disable incremental
construction of layout frames and instead I performed a single content sink
flush to construct all frames and reflow them after all of the document's
content was loaded. 

I tested performance by loading a large document (lxr page for
CSSFrameConstructor.cpp).  I did not see any difference in overall page load
performance.

The assertion in this bug is that constructing and reflowing layout frames as
chunks of content are loaded is significantly slower than delaying the layout
frame construction and reflow until all or most of the document's content has
been constructed does not seem to be case. At least for example I tested.  

There may be other types of content where this is not true however.
Keywords: topembed+topembed
Kevin, the problem with loading lxr pages is, it sorta confuses the issue. look 
at the page source, with such, performance is being dominated by href/history 
and attributes processing, rather than actual content delivery. i.e. the real 
nsCSSFrameConstructor.cpp is about 550K, but the lxr 
nsCSSFrameConstructor.cpp.htmlis 3.5MB. Also, loading from lxr, you're mostly 
spending time waiting on lxr. Suggest you save the file locally and then 
strip/mangle the href tags so they are not being processed, retry that 
experiment... or another local larger more-text-less-html file... one good 
example is the full java/docs/api/AllNames.html, for example.   
" Suggest you save the file locally "

I failed to mention that I *did* save it locally when preforming the test.
Kevin, argh!.... what build you at? I just tried this where I am, Gecko/20030331
and somthing in or around nsHTMLTokenizer::ScanDocStructure is so broken as to
overwhelm everything else. Its better on my 1.3b, but I've not symbols for that
anymore so I can't scope that out. Guess maybe I need to pull another nightly?
Discussed in edt.  It appears that dialog is ongoing between Kevin and Sam.  We
agreed to minus this bug to get it off our radar until that dialog is resolved.
 Please renominate if needed.
Keywords: topembedtopembed-
QA Contact: praveenqa → dsirnapalli
Flags: blocking1.4?
Keywords: mozilla1.3
Flags: blocking1.4? → blocking1.4-
adt: nsbeta1-
Keywords: nsbeta1nsbeta1-
Sam: did you have a chance to further investigate 
nsHTMLTokenizer::ScanDocStructure ?
Markus, no, have not revisited this. I've been working XSLT perf of 
late. I'll run a profile and then a function trace on a long textual 

document and see what it now shows. 

As for the disposition this bug... I dunno what's 'right thing to do' with it.

Lots of things pertinent to 'reflow related processing' have improved in past 
two years. 

Some things don't have so much to do with reflow itself, per-se, but, hummm... 
code and design to allow and accomodate reflow cycles. Maybe its as good now 
as it can be, given mozilla ryo cross-platform event/threading model. 

On the other hand, list of 'blocks' and 'dups' of ths bug is large. If we 
disposition this bug away, do those just come back alive again?    
 
Anyway, let me get a trace and see if that ScanDocStructure thing was real or 
just a transient.

 
Okay, traces and profiled recent build, local disk load of a large lxr source
code page. Weirdness with nsHTMLTokenizer::ScanDocStructure not seen. 

Profiled and traced 'middle part' of loading the page, after initial window-area
paint, but before document load completion. Did this twice, with and without: 
user_pref("content.notify.backoffcount,5);
which essentially disables timer based incremental reflow (after 5 reflows).

For the testcase, visual rendering was unchanged with and without pref. The
reflow activity happening without the pref is visually unproductive.

Unfortunatly, I must report than visually unproductive reflow remains very
significant to loadtime of such documents, accounting for at least 30% of total
load time. 

Some of the underlying code significant to causing this time is
nsTextFrame::ComputeTotalWordDimensions coming from nsTextFrame::MeasureText
coming from nsTextFrame::Reflow. Also a bunch of reflow-invoked style and font
activity, dispersed amoung a lot of function, but large in aggregate.

Hope this helps. 
Sam
I think this is no longer a problem in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061208 Minefield/3.0a1 ID:2006120804 [cairo]
Based on what?  This bug is filed on specific profiling data.  Do you have data that shows the problem is gone?  If so, please post it in the bug.
Assignee: kmcclusk → nobody
QA Contact: dsirnapalli → layout
Status: NEW → RESOLVED
Closed: 6 years ago
Priority: P1 → P2
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: