I've discovered much of the layout and reflow activity in rendering of a large document is 'triggered' (the verb 'caused' is not really right here) by the method nsGfxScrollFrameInner::Layout, specificly a call to LayoutChildAt by way of LayoutBox at this line of code: http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsGfxScrollFrame.c pp#1140 This Layout method is invoked many times (10, 20, more, depending on length) for large document, the unconditional LayoutChildAt method invoked each time too. Several things unclear to me. It's not clear to me why it's invoked so many times, when it would seem to be just related to scroll bars. However, the method seems actually involved in layout/reflow/rendering/paint of more than scroll bars. If comment out the call to LayoutBox method, then more than just scrollbars are impacted, text area part is rendered incomplete, sometimes entirely incomplete, depending on document. However, if comment out the call to LayoutBox, large document load time improvement dramatic, I measure 35-50%! The problem/opportunity here is, it appears much of the the layout, reflow, activity triggered through this point is unneeded and/or redundant. There appears to be opportunity to significantly reduce response times by reducing the number of time that nsGfxScrollFrameInner::Layout is invoked, or by filtering inside the Layout method to do less invokes of LayoutBox / LayoutChildAt methods. I've tried a couple hacks at code inside Layout to filter unneeded calls to LayoutChildAt, but don't understand the code enough, getting good speed-up, but some pages don't finish rendering, or for example, the progress bar doesn't clean-up after load complete, other goinks depending on doc content. My experiments make me wonder if this Layout method is unexpectedly doing much more than it was intented to do, being a inadvertant 'trigger' for work meant to be triggered elsewhere. Please advise.
Chris (K & W), Dave: any comments on what Sam is looking at?
Some additional data. So, moving one level up in the food chain, most of the calls to nsGfxScrollFrameInner::Layout are generated from within nsGfxScrollFrame::Reflow. The data below shows 'reflows' and timings for a particular document. The bulk of the time is actually spent in LayoutChildAt. The data below shows total elapsed time, service time for each layout/reflow. Notice 29 individual reflows occur mostly in three bursts of reflow activity. The 3 bursts have 9, 9, then 10 reflows. The time spent in layout/reflow is 30% of total time. time service 0.00 0.00 nsGfxScrollFrameInner Constructor 0.01 0.00 Reflow 1 2.79 0.84 Reflow 2 3.63 0.01 Reflow 3 3.64 0.02 Reflow 4 3.66 0.01 Reflow 5 3.67 0.02 Reflow 6 3.69 0.01 Reflow 7 3.70 0.02 Reflow 8 3.72 0.01 Reflow 9 3.73 0.02 Reflow 10 3.75 (other activity) .... 4.86 0.37 Reflow 11 5.23 0.03 Reflow 12 5.26 0.01 Reflow 13 5.27 0.02 Reflow 14 5.29 0.02 Reflow 15 5.31 0.03 Reflow 16 5.34 0.03 Reflow 17 5.37 0.02 Reflow 18 5.39 0.02 Reflow 19 5.41 (other activity) .... 7.00 0.49 Reflow 20 7.49 0.04 Reflow 21 7.53 0.04 Reflow 22 7.57 0.04 Reflow 23 7.61 0.04 Reflow 24 7.65 0.04 Reflow 25 7.69 0.03 Reflow 26 7.72 0.04 Reflow 27 7.76 0.04 Reflow 28 7.80 0.04 Reflow 29 7.84 <- total elapsed 2.35 <- nsGfxScrollFrame::Reflow service time most of this time actually in LayoutChildAt
I forgot to say, timing data from my 700 MHz Thinkpad T20. The response time of this testcase is very proportional to processor speed, i.e. 233 MHz response times three times longer than 700 MHz. The html file loaded from local disk. Document is simple self contained text, a large list of links, no imbeds, images, etc. It's a customer-provided testcase, so I can't public-domain it. The customer app is online documentation. The document basicly looks like a detailed, multi-level book index. Customer complaint is response time compared to netscape 4.x. Sam
Discovery ocntinues. The preference: user_pref("layout.reflow.async.duringDocLoad", false); changes the behavior and provides some benefit. The problem being, I would guess, this preference as a global and decision already made to make the default 'true'. Has thought been given to dynamic setting of the preference? Perhaps, maybe, say, based on ContentSize versus ClientSize? That is, when the doc outgrows the window by some amount, then disable reflow.async for duration of load? With async off, I get this: 0.00 0.00 nsGfxScrollFrameInner Constructor 0.10 0.00 Reflow 1 .... (other stuff) 2.61 1.02 reflow 2 3.63 0.02 reflow 3 3.66 0.01 reflow 4 3.68 0.02 reflow 5 3.70 0.03 reflow 6 3.73 0.02 reflow 7 3.76 0.02 reflow 8 3.78 0.02 reflow 9 3.80 0.02 reflow 10 3.78 0.02 reflow 11 3.82 0.02 reflow 12 3.84 0.03 reflow 13 3.87 0.02 reflow 14 3.89 .... (other stuff) 6.40 0.72 reflow 15 7.12 0.04 reflow 16 7.16 0.04 reflow 17 7.20 <- elapsed 2.07 <- reflow service time There's still opportunity here, I think. The 'reflow 2' time (both async=off and async=on) is large because scroll bar visibility change happens amidst it, basicly doubles the work and the time. Maybe its possible to collapse this to one layout/reflow pass. Sam
The scroll bar change is probably bug 12234. Reassigning to evaughan, due to scroll frame issues.
evaughan on sabbatical till 8/20... sarri, can you get someone to help?
Reflow is recursive. Anything you reflow in an HTML page is going to go through the GFXScrollFrame's LayoutChild methods. This doesn't implicate the scroll frame in any way.
As I've said before, I've patched the code to always show a vertical scrollbar, and it doesn't make a bit of difference in the performance numbers.
Dave is correct. When a document is loaded many incremental reflows occur to size and new content that added incrementally to increase performance. How much time is spent in LayoutChildAt excluding the calls under it? Keep in mind to size an image or to add some text requires a top down reflow. Its efficient because only the frames that have changed or have children that have changed will recieve this call.
The base problem is that we fire a timeout to start reflow - but after that we reflow constantly, which can slow down parsing from a fast source (local for example). See PresShell::InitialReflow() Probably this should NOT be just a 1-time timer, but some initial delay with smaller delays thereafter (say .1 or .2 seconds, instead of the .01 that the postings here show). Better yet (but harder) would be to delay reflows until either we've had unflowed content for 0.x seconds, or until we've been idle for 0.y seconds (0.y < 0.x). As total time increases (or document size), we may want to delay longer between reflows (since it's less likely to be important to show it immediately). This will keep fairly smooth updates without spending all our time in reflow, especially with local datasources or very fast connections.
rjesup: that's not quite right. We don't do initial reflow on a timeout; we do the initial reflow as soon as we see the <body> element. Subsequent reflows aren't enqueued until frames have been constructed, and _that_ is primarily throttled in nsHTMLContentSink. (Which, by the way, has that timer monkey business you mention, although I'm not fond of it.) When a reflow is scheduled, yes, we put an event into the queue to process the reflow commands. (I suppose this is what you mean by ``reflowing constantly''.) But, if content is arriving rapidly, that event has to compete with all the other events in the queue, so we'll tend to reflow in larger chunks. In other words, loading a local file will cause necko ought to queue up several buffers worth of content before we get around to servicing the reflow event. (We're all sharing the same event queue.) I'm a bit skeptical of making pres shell's reflow stuff any more byzantine than it already is. I'd rather see us try to simplify instead of adding more knobs and dials to wallpaper over algorithmic problems in content construction, frame construction, and reflow. My $0.02.
>0.00 0.00 nsGfxScrollFrameInner Constructor >0.10 0.00 Reflow 1 >.... (other stuff) >2.61 1.02 reflow 2 >3.63 0.02 reflow 3 >There's still opportunity here, I think. The 'reflow 2' time (both >async=off and async=on) is large because scroll bar visibility change >happens amidst it, basicly doubles the work and the time. Maybe its >possible to collapse this to one layout/reflow pass. If this "reflow 2" takes so long (because we have a lot of data?) doesn't this mean that it is being delayed far too long? Is it possible to efficiently detect when the scrollbar visibility change must happen?
In reply to: firstname.lastname@example.org Ah so, yes, adding some more trace points, this is becoming a bit clearer to me now. So in this long doc I am looking at, it would seem both flavors of reflow are happening. Both 'content driven' and 'timer driven' reflows. So a question is, couldn't / shouldn't a 'content driven' reflow then 'reset' the delay (or cancel the pending notify/interrupt) for the 'timer driven' reflow? The mNotificationInterval = 120000; at: http://lxr.mozilla.org/mozilla/source/content/html/document/src/nsHTMLC ontentSink.cpp#2523 seems to be a key variable here. I am experimenting with higher values (i.e. user_pref("content.notify.interval", xxxxxxx); to determine tradeoffs. Maybe the problem is, content reflows can take so long that timer triggered reflow will usually happen after. One thing very clear here to me is the overall behavior of reflow-related activity highly dependent on document content. It's seems hard to generalize the 'overhead of reflow', even describe expected reflow behavior, in abstract of a specific document and its content. Sam
Just to be clear, there is not ``timer driven reflow''. There is ``timer driven frame construction''. Frame construction enqueues an incremental reflow. An incremental reflow is targeted at an individual frame, and is coalesced with subsequent reflows as more and more of the frame tree becomes dirty (see nsIFrame::ReflowDirtyChild). So, no, we shouldn't cancel a pending reflow just because new content has arrived, because that reflow may be targeted at a part of the frame tree that the newly generated reflow command would fail to flow. The real problem here (why reflows keep taking longer and longer) is that an reflow has state (positions of floated elements, vertical margins) that must be recovered from ``clean'' frames. Right now, that state recovery is O(n), which is why you're seeing each reflow take longer and longer. You should look into (and help test) dbaron's work on making state recovery be O(1). See bug 78911 and its dependents.
Let me be clear, however you clarify it, reflow-related processing, much of which is redundant or unneeded, is occuring, and it needs fixed. I'll be real surprised if there is any one point-source fix. I don't understand you try to keep short-circuit this discussion. Looks to me many places might need to be made better. Some comment made above about 'its efficient because only children needing reflow get called to do it', well in my testcase, that means 300,000 calls more or less to UpdateSpaceManager method in 8 seconds elapsed. Its true most of those do not result in a reflow, however, just transversing the tree is far from free.
Excuse me for trying to explain how things work. I'll let you do that on your own from now on, as I can see it's not appreciated. My point to rjesup and others was that adding more timers may just mask underlying algorithmic problems, some of which are known and we are trying to fix.
Chris: thanks for the update; I hadn't looked deep enough. I just noticed the figures showing reflows every ~0.02 seconds, which seemed a little bit frequent to me. As you say, the O(n) or worse cases are a bigger problem, though I still want to look into limiting frequency of reflows. I have some very specific reasons to want to do this having to do with CPU and display device usage, even if the O(n) problems are fixed. (I can't assume I can spend spare CPU cycles reflowing even if there's nothing else to do within this mozilla.) Another thing (thinking off the top of my head) is how important is it to reflow non-visible areas while still loading. If they're not part of a visible line/block, then all they can affect is scroll-bars, which is of only limited importance while still loading. If they are part of a visible table, but not visible themselves, the table size _could_ change - but only to grow wider or maybe to adjust the relative widths of columns. Again, not critical to be updated frequently during a load. (It may be that there are optimizations like this or far better already within the tree, I'm just starting to look at this.) (by updated frequently, I mean updating a few times a second perhaps or faster.)
>Excuse me for trying to explain how things work. I'll let you do >that on your own from now on, as I can see it's not appreciated. Constructive discussion on the ideas being presented is appreciated. Explanations are constructive. Single-handedly dismissing an idea is not constructive. >My point to rjesup and others was that adding more timers may >just mask underlying algorithmic problems, some of which are known >and we are trying to fix. A mask in the hand is worth two fixes in the bush. Beyond notions of beauty, difference between 'a mask' and 'a fix' is simply semantics. Whatever causes the problem to not occur is a fix, I think. I didn't mean to drive you away. Just don't shoot down viable ideas.
emrick: "whatever causes the problem to not occur is a fix"? If you mean "whatever causes the symptom not to occur", then I have to disagree, because underlying problems can have many symptoms, only some of which we know about at any given moment. Treating symptoms may help in a desperate situation; it may even help to palliate an affliction for a long while. But it's always better to fix the underlying problem. This difference between treating symptoms and treating causes is emphatically not a semantic or aesthetic issue. The symptom-treatments bloat and complicate the system. The complications mean more bugs. The underlying problem too often still festers. The medical analogy is apt. Sorry to wax philosophical, but hey, you started it here! I hope fixes to frame construction or reflow scheduling simplify the system; I'm not saying you will inevitably bloat and complicate it by what you propose to fix here, or that rjesup will do so, either. Sorry, I'm merely quibbling with your statement that a mask and a fix are the same, if they make the symptom disappear. Waterson has been around layout long enough to see people add more "knobs" to tune code that itself takes too many cycles. Some of those knobs can be set, for certain kinds of content, to treat symptoms. But more symptoms emerge, and the system grows sicker in spite of the palliatives. Waterson, attinasi, dbaron, and others, want to fix the underlying too-many-cycles problem. If the knob-adders and -twiddlers were dominant, the symptom-treating would tend to suppress those real fixes, palliating the current crop of symptoms but leaving the system worse off for the next crop. At each step, the palliatives tend to crowd out the (more involved) underlying fixes. This has happened already too often, in my outsider's opinion, in the history of Gecko. Anyway, I don't see how waterson's comments dated 2001-08-22 12:33 (which stated facts about how reflow and frame construction work, then asserted facts about the underlying problem and why it's important to fix) could be said to "shoot down viable *ideas*" (my emphasis). Bring on your good ideas, kindly reject waterson's exhortation to help out on the underlying problem, and do your worst (worse is better!). /be
We posted this to the perf newsgroup, but got no response from there. Would appreciate comment please. Sam To: email@example.com cc: Subject: Setting content.notify.backoffcount to 1 We have improved greatly nearly all of our large text testcases by setting content.notify.backoffcount to the value of 1 (it is defaulted to -1). One customer test ( consisting of 1000s of Links within an unordered list) improved from over 26 seconds to under 22 seconds by adding this parameter to Prefs.js. We run our tests using OS/2 on a 333 MHz 128 mb system and are at the 0.9.2.1 level of Mozilla. Most other large text testcases improve in the range of 10%. I have not noticed any layout problems or response problems using 1 as the value for the content.notify.backoffcount. Please advise me, what I am missing? Why isn't this the correct setting? Thanks. Ivan Eisen IBM
From what I undersand, setting the backoff to 1 turns off the content sink flushing until the entire document is loaded (The 1 indicates that content sink flushing should be suspended after the first reflow). This effectively prevents incremental loading of the page. If your only goal is overall page load time then setting it to 1 or n is fine, but most users want to be able to see content as it comes in on a regular interval. The other effect of setting it to 1 is that a single reflow will occur when the document has completed loading which may prevent the user from interacting with Mozilla until the reflow has completed. This is evident when large documents are loaded. cc'ing Vidur
Taking this bug
Bulk moving Mozilla1.01 bugs to future-P1. I will pull from the future-P1 list to schedule bugs for post Mozilla1.0 milestones
per conversation w/ jesup, minusing *this* bug for 1.0.1, if specific patches/bugs fall out of this, please nominate those for 1.0.1 for consideration.
That sounds right to me, use this as gate to specific fixes, if and as available. Sam
batch: adding topembed per Gecko2 document http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
removing mozilla1.0+ Any actual patches that come out of this will be considered for 1.0 branch
Think also bug 168638 is related to this.
Do we have an approach to this?
What is the real issue that this bug is trying to address?
Hum... the original report is pretty clear, I think. With particlar kind of documents, very large mostly textual documents, say, things that are like 'chapters of a manual' then mozilla remains much slower than IE to reder such documents, and some part of this time is (or at least was) being spent in visually unproductive activity that ia related to document reflow. At lot of the reflow-related processing has changed since the bug was originally opened. At this point in time, more analysis is needed. The root problem (long text documents comparatively slow) remains. Today, the laregst major contributor that may be elsewhere (attributes?) or not, I don't know. Sam
Clearing topembed(+) for re-consideration. At this point I think this bug is really a tracking bug for improving overall reflow performance. I recently did measurements where I hacked gecko to disable incremental construction of layout frames and instead I performed a single content sink flush to construct all frames and reflow them after all of the document's content was loaded. I tested performance by loading a large document (lxr page for CSSFrameConstructor.cpp). I did not see any difference in overall page load performance. The assertion in this bug is that constructing and reflowing layout frames as chunks of content are loaded is significantly slower than delaying the layout frame construction and reflow until all or most of the document's content has been constructed does not seem to be case. At least for example I tested. There may be other types of content where this is not true however.
Kevin, the problem with loading lxr pages is, it sorta confuses the issue. look at the page source, with such, performance is being dominated by href/history and attributes processing, rather than actual content delivery. i.e. the real nsCSSFrameConstructor.cpp is about 550K, but the lxr nsCSSFrameConstructor.cpp.htmlis 3.5MB. Also, loading from lxr, you're mostly spending time waiting on lxr. Suggest you save the file locally and then strip/mangle the href tags so they are not being processed, retry that experiment... or another local larger more-text-less-html file... one good example is the full java/docs/api/AllNames.html, for example.
" Suggest you save the file locally " I failed to mention that I *did* save it locally when preforming the test.
Kevin, argh!.... what build you at? I just tried this where I am, Gecko/20030331 and somthing in or around nsHTMLTokenizer::ScanDocStructure is so broken as to overwhelm everything else. Its better on my 1.3b, but I've not symbols for that anymore so I can't scope that out. Guess maybe I need to pull another nightly?
Discussed in edt. It appears that dialog is ongoing between Kevin and Sam. We agreed to minus this bug to get it off our radar until that dialog is resolved. Please renominate if needed.
Sam: did you have a chance to further investigate nsHTMLTokenizer::ScanDocStructure ?
Markus, no, have not revisited this. I've been working XSLT perf of late. I'll run a profile and then a function trace on a long textual document and see what it now shows. As for the disposition this bug... I dunno what's 'right thing to do' with it. Lots of things pertinent to 'reflow related processing' have improved in past two years. Some things don't have so much to do with reflow itself, per-se, but, hummm... code and design to allow and accomodate reflow cycles. Maybe its as good now as it can be, given mozilla ryo cross-platform event/threading model. On the other hand, list of 'blocks' and 'dups' of ths bug is large. If we disposition this bug away, do those just come back alive again? Anyway, let me get a trace and see if that ScanDocStructure thing was real or just a transient.
Okay, traces and profiled recent build, local disk load of a large lxr source code page. Weirdness with nsHTMLTokenizer::ScanDocStructure not seen. Profiled and traced 'middle part' of loading the page, after initial window-area paint, but before document load completion. Did this twice, with and without: user_pref("content.notify.backoffcount,5); which essentially disables timer based incremental reflow (after 5 reflows). For the testcase, visual rendering was unchanged with and without pref. The reflow activity happening without the pref is visually unproductive. Unfortunatly, I must report than visually unproductive reflow remains very significant to loadtime of such documents, accounting for at least 30% of total load time. Some of the underlying code significant to causing this time is nsTextFrame::ComputeTotalWordDimensions coming from nsTextFrame::MeasureText coming from nsTextFrame::Reflow. Also a bunch of reflow-invoked style and font activity, dispersed amoung a lot of function, but large in aggregate. Hope this helps. Sam
I think this is no longer a problem in Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20061208 Minefield/3.0a1 ID:2006120804 [cairo]
Based on what? This bug is filed on specific profiling data. Do you have data that shows the problem is gone? If so, please post it in the bug.