Closed Bug 273293 Opened 20 years ago Closed 17 years ago

Performance degradation at betanews.com

Categories

(Core :: Layout: Floats, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: martijn.martijn, Assigned: roc)

References

()

Details

(Keywords: perf, regression, testcase, Whiteboard: [reflow-refactor])

Attachments

(5 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20041203 Firefox/1.0+
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20041203 Firefox/1.0+

The time for focusing a link and before the autoscroll image appears (using the
autoscroll feature in Firefox), has increased considerably at this site.
That page uses a lot of floating divs.

It was considerably faster in:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a5) Gecko/20041125 7:07am
But has become slow in:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20041125
Firefox/1.0+ 10:19am
The first build doesn't have the big patch from bug 209694, the second one has.

Reproducible: Always
Steps to Reproduce:
1.
2.
3.
Attached file Testcase
This is basically the raw structure (somewhat simplified, though) of that page.
Blocks: 209694
I think I see something similar at http://gemal.dk/blog/categories/mozilla/ ,
since bug 209694 is fixed.
When you hover over a link you get to see something called a 'nicetitle'.
Since the fix for bug 209694, this has become slower to appear and the cpu
creeps up.
I think this is kind of strange, because the nicetitle is an absolutely
positioned div, so doesn't really effect the layout of the rest of the page.
(see js: http://gemal.dk/js/titles.js and stylesheet:
http://gemal.dk/css/style.css , look for .nicetitle)
To Robert, per bug 209694 dependency...

Martijn, does that testcase show the performance problem?  If so, what are the
exact steps to reproduce it?
Assignee: nobody → roc
Keywords: perf, regression
Ok, here is a little focus test that focuses/unfocuses the first link at an
interval of 50ms and this is done 21 times. The time this takes is being
measured.

Results:
Mozilla 2004-11-25: 2424ms
Firefox 2004-11-25: 13109ms
(latest nightly build is even a bit slower: Firefox 2005-01-10: 15372ms)
This is basically the same test as testcase2, but now with creating and
destroying an absolutely positioned <div>. 

Results:
Mozilla 2004-11-25: 1061ms (not taking cpu)
Firefox 2004-11-25: 7761ms (cpu 100%)
(latest nightly build seems a bit faster: Firefox 2005-01-10: 7271ms)

By the way, I'm using a Duron 800MHz, 128MB memory and a nVida Riva TNT2 M64 32
MB video card.
OK, it's not obvious to me what in the bug 209694 patch makes this testcase
slower.... The profile just shows lots of time spent reflowing floats, which is
not surprising.
I'm sorry, I don't understand. I find it rather surprising that in testcase3,
where an absolutely positioned div is created, a lot of time is spent in
reflowing floats. The floats should not be affected by the creation of the
absolutely positioned div, afaict.

For the rest, I guess, it is not surprising. Focus outlines are causing reflows
now, and since bug 209694 got fixed, floating just got a bit slower (but less
buggy).
I was profiling testcase 2, not testcase 3.  I'll look into testcase 3; you're
right that that one should not have slowed down.
I'm really not sure what's going on here... We end up reflowing all those floats
for some reason, but I'm not sure why.  :(
Keywords: testcase
> I'm really not sure what's going on here... We end up reflowing all those
> floats for some reason, but I'm not sure why.  :(

We insert placeholders into the BODY. This causes incremental reflow of the
BODYs. The DIV that contains all the floats is a child of the BODY and it
contains a block with 'clear:both'. The fix in bug 209694 forces us to reflow
this DIV in case its block with 'clear' needs to move. This DIV is not on the
incremental reflow path so nsBlockReflowContext::ReflowBlock sets the reason to
a resize reflow. This forces the DIV to reflow all its child floats to find out
how big they want to be.

I think incremental reflow can reflow off-the-reflow-path blocks with a Dirty
reason, instead of a Resize reason. That reduces the testcase time to just 5s
instead of 30s, but that's still more than the 1s from before 209694 (or if you
remove the 'clear' from the "interview with BetaNews" DIV). I'm working to find
out why.
Another thing that makes this testcase fun is that it contains this:

<body>
<div style="float:left;"><a href="#">BetaNews Home</a></div> 
<div style="clear:both;">bar</div>

and this:

<div>
	<div style="float:left;"><a href="#">Nate Mook</a>, BetaNews</div>
	<div style="clear:both;"><a href="#">interview with BetaNews</a></div>

Which are both examples triggering the optimistic two-pass reflow logic used to
implement the really tough CSS 2.1 rules on margin-collapsing/clearance. The
idea is that in these cases we don't know whether the DIV with 'clear:both'
actually has clearance until we've reflowed the float, but we don't know where
to place the float until we know how much margin-collapsing is going on, which
depends on whether that DIV has clearance. So we start by assuming the DIV does
*not* have clearance, reflowing everything under the margin-root, and then we
find out that the DIV *does* have clearance, so we reflow again from the
margin-root.

The BODY's second-pass reflow is done as a dirty reflow on itself, and that gets
us back into the situation I described in the previous comment. Now the dirty
reflow on the BODY does a resize-reflow on its children which forces us to
reflow all the big list of floats. I should *think* that dirty reflow of the
parent should be able to do dirty reflows of the children.
Yeah, changing dirty reflows to always do dirty reflows on the children, in
addition to making off-incremental-reflow-path reflows also do dirty reflows,
totally nails this bug. It's an extremely risky change though. I'm not even sure
if it's correct or not because the reason system is, shall we say,
underspecified. Of course all this will go away in dbaron's rework...
The reason I think it should be OK is that resize reflows aren't guaranteed to
do more work than dirty reflows. In particular the "try to skip lines" logic in
nsBlockFrame::PrepareResizeReflow will skip reflowing inline lines satisfying
certain conditions. What we will probably find, though, is that the limitations
of PrepareResizeReflow's line-skipping mean we always reflow certain lines, and
other parts of the code rely on that when they really should be marking such
lines dirty.

There may also be issues with dirty reflows of other types of children.
Attached patch debug codeSplinter Review
Here is some debug code I wrote that helps track down the causes of lines being
marked dirty. I think we should check this in, It shouldn't change any
behaviour.
Attachment #173017 - Flags: superreview?(dbaron)
Attachment #173017 - Flags: review?(dbaron)
Attached patch actual 'fix'Splinter Review
Here's the actual patch. I'm not sure what to do with it. I wish we had
automated regression tests for incremental reflow.
So... My vote would be that if part of the patch is safe we should land it.  I'm
not sure we really want to mess with the unsafe part at this point in the 1.8
cycle, especially given the reflow branch.
(In reply to comment #10)
> We insert placeholders into the BODY. This causes incremental reflow of the
> BODYs. 
I'm sorry, I don't understand. Why should adding an absolutely/fixed positioned
element in the body cause an incremental reflow on the body?

(This is probably very naive of me) The "actual 'fix'" patch just works fine for
me. I'm not seeing any problems with it.
> Why should adding an absolutely/fixed positioned element in the body cause an
> incremental reflow on the body?

Because of the way reflow is currently dispatched in Mozilla.  I wouldn't worry
too much about it for now; as Robert said in comment 12, it's likely to change soon.
Unfortunately there is no "safe" part to the unsafe patch.

Martjin, can you give me some idea of how serious the impact of this slowdown
will be in the real world? If it's going to cause real problems for 1.8/FF1.1
users then I think we should check it in ASAP, but only after I spend some time
implementing and running automated tests for incremental reflow. If it's not
going to cause bad problems then I would rather just leave this alone until
David's branch lands.
Well, it's probably not a big deal.
I've only seen this bug at the two sites I mentioned earlier and on:
http://politiken.dk/VisArtikel.iasp?PageID=1
And I have kind of a slow computer (800MHz), so probably most people won't even
notice this bug.
Okay. Let's keep our eyes open and not fix this for now.

(I'd still like to land the debug code though.)
Comment on attachment 173017 [details] [diff] [review]
debug code

Put a big comment above MarkDirty saying it has to be inline so that compilers
optimize the debugging strings away.  And please back out if the codesize test
when you check in shows that some of them actually don't.

(This will make things more painful for me to merge, but oh well...)
Attachment #173017 - Flags: superreview?(dbaron)
Attachment #173017 - Flags: superreview+
Attachment #173017 - Flags: review?(dbaron)
Attachment #173017 - Flags: review+
Hey David, how about you merge the attachment to your branch (if indeed it even
makes sense for your branch) and I refrain from checking it in to the trunk.
Blocks: 114584
Blocks: 320378, 326085
Whiteboard: [reflow-refactor]
Blocks: 203448
(In reply to comment #21)
> Okay. Let's keep our eyes open and not fix this for now.
> 
> (I'd still like to land the debug code though.)

? to block for this since the code exists.
Flags: blocking1.9a1?
Flags: blocking1.9a1? → blocking1.9-
Whiteboard: [reflow-refactor] → [reflow-refactor] [wanted-1.9]
Blocks: 331292
Flags: blocking1.9- → blocking1.9?
Blocks: 321322
Blocks: 364067
Is this bug still present on the trunk?
I ran the focus test #2 on my Macbook Pro.

My trunk opt build: about 2350ms
FF2 release build: about 3550ms

So it's improved quite a lot on trunk but I wouldn't say it's completely fixed.
So I just tried the testcases here.  Times are approximate in ms; I didn't bother with more than two sig figs.

                              Focus     CreateElement
Before bug 209694 landed      1000         1000
After bug 209694 landed      20000        10000
Current trunk                10000         1000

So the focus test still has a problem, but the other is ok.
So I did a profile of the focus test in the trunk build.  The results are basically:

Total hit count: 68555
  7019 PresShell::ProcessReflowCommands(int)
  51274 PresShell::Paint(nsIView*, nsIRenderingContext*, nsRegion const&)

(those are "under" times).  The rest seems to be spent in event dispatch and so forth.

Under PresShell::Paint, we have:
   4073 nsIFrame::BuildDisplayListForStackingContext
    984 nsDisplayList::OptimizeVisibility
  44004 nsDisplayText::Paint

Not sure why we're spending so much time painting text... There's still no paint flashing under cairo, so I can't tell whether we're invalidating way too much or just being _really_ slow to paint it.

We should probably also remeasure once the new textframe lands.
Depends on: 334411, 367177
Blocks: 375462
This was already satisfactory fixed for me even before the fix for bug 320378 went in.
Can this bug be marked worksforme, Robert, or do you still think something could be optimized?
This bug has probably outlived its usefulness. We should have new bugs with more specific testcases if we need them.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Whiteboard: [reflow-refactor] [wanted-1.9] → [reflow-refactor]
Flags: wanted1.9+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: