CalculateHypotheticalBox can be slow when lines have lots of placeholders (only placeholders?) on them

REOPENED
Unassigned

Status

()

Core
Layout: Block and Inline
P3
normal
REOPENED
7 years ago
2 years ago

People

(Reporter: bz, Unassigned)

Tracking

(Depends on: 2 bugs, Blocks: 1 bug, {perf})

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: ietestdrive, URL)

Attachments

(5 attachments, 1 obsolete attachment)

On my mac, 50% of the total time is reflow (the rest is painting).

Almost a third of the reflow time is CalculateHypotheticalBox. Almost half of that is self time; the rest is the nsLineBox::IndexOf call the line iterator constructor makes.

Would it make any sense to search backwards within lines when searching back from the line cursor?  Are we even hitting that here?

The self time is the IsEmpty() calls in AreAllEarlierInFlowFramesEmpty, looks like.  Not sure what we can do about those.
Created attachment 519059 [details] [diff] [review]
wip (wdiff)

Using "lineBox->CachedIsEmpty()" is valid at this point, right?
If so, then if it's true it follows that all frames on that line
before the place-holder must be empty, so we can skip the whole
AreAllEarlierInFlowFramesEmpty() loop.

This makes the default 30x30 test about 9% faster in a local
opt build for me (Linux x86-64).
(In reply to comment #1)
> Using "lineBox->CachedIsEmpty()" is valid at this point, right?

Yes.
Yeah, 9% about matches my profiler data (1/2 * 1/3 * 0.5).

Can we do something about the line iterator too?  I'm not sure what; searching backwards in the line box won't actually do the right thing...
I think nsAbsoluteContainingBlock::Reflow could keep a line cursor containing the current line of kidFrame's placeholder. The line cursor should only need to advance forward.
The thing is, in this case there aren't that many lines (30 at most).  It's just that each line has lots of stuff on it... so for your average new abs pos kid, we have to look at 900/2 placeholders (not counting the walls!) to find the "right" line.
Created attachment 519247 [details] [diff] [review]
wip8

Fwiw, I tried something like that, this gave an additional 3% improvement.
Created attachment 519594 [details] [diff] [review]
wip28

Caching the last child frame on the LineCursor property gives a significant
improvement.  Also optimized the dual direction line iteration a bit, it
won't give much for this test but I think it might be good when there are
many lines and the cursor is near either end.  When there is no line cursor
(we start from the first line) is such a case.
Created attachment 519595 [details] [diff] [review]
previous patches rolled into one

With all patches I get a 33% improvement in a local Linux x86-64 opt build.

Comment 9

7 years ago
opera pass this test amazingly
Mats, are you actively working on this?  If not, should someone else pick it up?
I'm not actively working on this.  It's probably worth taking some of
these patches, at least the first one.  Feel free to pick it up.

Comment 12

7 years ago
Is this bug similar to the slowness when visiting http://mazery.sjackson.net/ ?
(In reply to mdew from comment #12)
> Is this bug similar to the slowness when visiting
> http://mazery.sjackson.net/ ?

No, that seems to use canvas, not tables (and Firefox is a lot faster than Chrome on that page for me).
See jprof profile (Fedora x64 Xeon) in https://bugzilla.mozilla.org/attachment.cgi?id=563041

See also bug 626927, though that has a different cause.
Keywords: perf
OS: Mac OS X → All
Hardware: x86 → All
FYI, in my case 80% is ::Paint(), ~17% is Reflow.

Of the 80%, 38% is PaintForFrame(), 24% nsIFrame::BuildDisplayListForStackingContext(), and 18.4% nsDisplayList::ComputeVisibilityForRoot()
Here's a thought.  What about stashing the pointer to the line box on the abs pos frame, or on its placeholder, in a property?   We could do that during reflow....
That's tricky because it has to be updated as the placeholder (or its ancestors) get pushed and pulled around.
Ah, pulling and pushing ancestors, yeah.  :(
Alternatively we could give nsInlineFrames and nsPlaceholderFrames a direct pointer to the line-box if and only if they're immediate children of the line box/block. Then you could find the line-box for any placeholder by walking up the ancestors looking for the pointer.
That could end up with a fair amount of cost if there are lots of inlines.

Would it make sense to cache the line box on the placeholder when reflowing the placeholder?  By the time we need to calculate the hypothetical box, the placeholder has been reflowed, presumably, to get the horizontal position....

Updated

7 years ago
Depends on: 691414
OK, I'm going to try that.  I think it should give us simpler code than Mats' last patch, with about the same speedup.  It does rely on the invariant that we reflow the placeholder before the abs pos frame.  If we think this is not reliable across incremental reflows, we can clear the cached line box somewhere...
Assignee: nobody → bzbarsky
Priority: -- → P1
See Also: → bug 691645
Whiteboard: ietestdrive → [need review]ietestdrive

Comment 24

7 years ago
Try run for 90b41fd48aea is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=90b41fd48aea
Results (out of 71 total builds):
    success: 66
    warnings: 4
    failure: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-90b41fd48aea
(In reply to Boris Zbarsky (:bz) from comment #22)
> OK, I'm going to try that.  I think it should give us simpler code than
> Mats' last patch, with about the same speedup.  It does rely on the
> invariant that we reflow the placeholder before the abs pos frame.

I don't believe this invariant holds. If something inside an abs-pos frame is dirty, nsBlockFrame::ChildIsDirty does nothing to dirty the placeholder or its line, nor does PresShell::FrameNeedsReflow.

Another approach could be to cache state in nsAbsoluteContainingBlock::Reflow that keeps track of which line we're on and the last placeholder in the line that we reflowed, and pass that state as a parameter to ReflowAbsoluteFrame and poke it into nsHTMLReflowState. Then CalculateHypotheticalBox can start its search for the placeholder from the last placeholder.
> If something inside an abs-pos frame is dirty, nsBlockFrame::ChildIsDirty does nothing to
> dirty the placeholder or its line, nor does PresShell::FrameNeedsReflow.

But in that case the placeholder's line is valid, no?  We only need to worry about the cache storing an incorrect line, and in that case the invariant I mentioned should hold and keep that from lasting until we get to this code.
Comment on attachment 564435 [details] [diff] [review]
Perhaps like so

Review of attachment 564435 [details] [diff] [review]:
-----------------------------------------------------------------

I think you should probably just make the cached line box a regular field of nsPlaceholderFrame. Almost all abs-pos placeholders will end up having that property. If we're worried about the word of space in placeholders for floats, we could make abs-pos placeholders a tiny subclass of nsPlaceholderFrame.

::: layout/generic/nsHTMLReflowState.cpp
@@ +984,5 @@
>          // all the frames before it are empty.  In that case, it would
>          // have been just before this line.
>          // XXXbz the line box is not fully reflowed yet if our
>          // containing block is relatively positioned...
> +        bool found = false;

Why did you move "found" out?
Comment on attachment 564435 [details] [diff] [review]
Perhaps like so

> I think you should probably just make the cached line box a regular field of
> nsPlaceholderFrame.

Hmm.  Yeah, fair.  That probably actually uses less memory than the property in most cases.

> Why did you move "found" out?

By mistake.  ;)
Attachment #564435 - Attachment is obsolete: true
Attachment #564435 - Flags: review?(roc)
Created attachment 564745 [details] [diff] [review]
Speed up CalculateHypotheticalBox for the case of a block containing a bunch of lines but absolutely no in-flows anywhere.
Attachment #564745 - Flags: review?(roc)
Oh, can you add a comment to GetCachedLineBox indicating that it's only OK to call this if the caller ensures that the placeholder has been reflowed since any changes to the frame tree that could affect the line-box?
Yes, absolutely.
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cd9ed297f73
Flags: in-testsuite-
Whiteboard: [need review]ietestdrive → ietestdrive
Target Milestone: --- → mozilla10
https://hg.mozilla.org/mozilla-central/rev/0cd9ed297f73
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
A rough test on my very fast Xeon went from 85 seconds (see bug 641340) to 82 seconds with this patch.  So this may have helped a little, but it doesn't look like it helped a lot. (Were any before/after profiles taken?  I could re-profile (I took one a few weeks ago) and compare.)  From my previous test, on the same system Chrome takes 4.5 seconds (close to 20x faster).  On my Win7 laptop (Lenovo W520 - also SandyBridge core i7 (2820QM in this case), but on the order of 1/2 as fast as the desktop generally) in IE9 it took 34 seconds, and on Aurora 9.0a2 (without this patch) it takes (drumroll) 339 seconds.  beat 10x as long by 1 second!  Given the lack of improvement on my desktop machine, I suspect this patch won't help much (not that it's bad).
> A rough test on my very fast Xeon went from 85 seconds (see bug 641340) to 82 seconds

How many measurements did you take?  This test is _very_ noisy and timing-dependent.

> Were any before/after profiles taken? 

Yes, of course.  Before the patch painting was about 50% of the total test time for the slice I profiled; after the patch it's closer to 60%.  If we assume painting speed is unchanged and number of paints is unchanged, that means that "new time" = "old time"*50/60.

Given that the maximum expected win was about 16% (see comment 0, which tells you exactly how much time the hotspot this bug is about was taking), that seems about right.

Comment 37

7 years ago
My test system (i7 2600k at stock) with FF Nightly (70e4de45a0d0)with new profile showed just 75 seconds and with this bug fix in build (38a487da2def}, I got only 54 seconds.
Test is with 30*30 maze, performed only once, both with new profile.

Comment 38

7 years ago
Is this bug really "resolved" I'm not seeing firefox nightly even coming close chrome, and even IE.

On a lowly AMD 4800+:
IE9: 25 seconds (30x30)
Chrome: 12 seconds (30x30)
Firefox: 154 seconnds

I've re-run these tests a few times over, getting pretty much the same results (give or take a few seconds).
Your nightly was built before this was landed on mozilla-central (this morning, Comment 34).  Hopefully you'll see improved results in tomorrow's nightly.
> Is this bug really "resolved"

Yes.  This bug wasn't "make it as fast as other browsers".  It was "1/6 of the time we're spending is in this function; we can get rid of it".  That's pretty obvious if you actually read comment 0 or comment 36.

There are other bugs that cover other things that we're slow on in this testcase.

Comment 41

7 years ago
For me this seems still not fixed :

Safari 5.1.1 = 5,2 seconds
Chrome 16.0.886.0 (Developer Build 101724 Windows) = 6.8s
IE9 = 19s

FF Nightly build October 25 (32 and 64 bit) = i did not hat time enough to wait for the end
As mentioned in comment 40 by bz, this bug was to reduce the overhead of one part of the problem, which accounted only for about 1/6th of the runtime, so even if it had been reduced to 0 by this bug, the test would only be ~1/6th faster.  See the other bugs in this group.

Updated

7 years ago
Depends on: 700112
> It does rely on the invariant that we reflow the placeholder before the abs pos frame. 

Turns out this is not true for cases when the containing block of the abs pos frame is a rel-pos inline or a rel pos block splitting across columns, because the placeholder might get pushed to a later continuation while the abs pos frame is reflowed after the first continuation is done.  Of course that means that in those case the layout ends up wrong with the old code, but that's probably better than crashing as in bug 696175.

I just backed this out on inbound to fix bug 696175 for now.  Once we finish up bug 524925 we can do abs pos reflow later and fix this again.
Status: RESOLVED → REOPENED
Priority: P1 → P3
Resolution: FIXED → ---

Updated

6 years ago
Depends on: 730769

Updated

6 years ago
Depends on: 730771

Comment 44

6 years ago
this is the only benchmark that mnakes /firefox look bad in the latest Toms Hardware :Web browser Grand Prix.
http://www.tomshardware.com/reviews/windows-7-chrome-20-firefox-13-opera-12,3228-8.html
Yes, we do know that...

Comment 46

6 years ago
Though accelerated Direct2D Performance seems to have improved significantly already also in the maze test still have to test on the IGP but it looks very nice in Multimedia tasks already much much better then some days back :)

Comment 47

6 years ago
I was restricting myself for a long time now to DX9 (layers) because of the bad Direct2D Performance seems slowly that isn't needed anymore also the frame drops seem to have lowered nicely and also the extensions are more stable in that regards :)
I guess all the new Profiling tools introduced really pay of now :)

Comment 48

6 years ago
Also https://bugzilla.mozilla.org/show_bug.cgi?id=698297 shows very significant render improvements :)

Comment 49

6 years ago
This has really improved within a few days . :)  Very good performance. Not still comparable to other browsers, but much better than previous firefox performance. Congratulations!

Which patch/set of patched made this happen?
Bug 524925.

Boris, should we wontfix this bug now? I think post-bug-524925 this isn't needed.
Chrome 23 (Canary) takes ~6 seconds for a 30x30 maze.  Nightly 2012-08-02
takes ~16 seconds.  There's no reflow methods showing up in a shark
build.  The results are:

Running (Self)          Symbol Name
1886.0ms   7.2%         isalloc_validate
    1040.0ms    3.9%        ozone_size
    664.0ms     2.5%        ozone_free_definite_size
    172.0ms     0.6%        ozone_free
    5.0ms       0.0%        free
    5.0ms       0.0%        ozone_realloc
973.0ms    3.7%         aa_render_shape
705.0ms    2.7%         _ZL11SearchTableP12PLDHashTablePKvj15PLDHashOperator
524.0ms    2.0%         memmove$VARIANT$sse42
455.0ms    1.7%         _spin_lock$VARIANT$mp
443.0ms    1.6%         resample_byte_h_4cpp_vector
436.0ms    1.6%         arena_dalloc
368.0ms    1.4%         nsRegion::SetToElements(unsigned int)
367.0ms    1.4%         arena_malloc
305.0ms    1.1%         mozilla::(anonymous namespace)::ContainerState::ProcessDisplayItems(nsDisplayList const&, mozilla::FrameLayerBuilder::Clip&, unsigned int)
... the rest is < 1% ...

So I would say this bug is worksforme.  I think we should still consider
taking some of the suggested improvements though even though it doesn't
impact this particular test anymore (at least the first patch seems
worth it without adding much risk).

BTW, is it reasonable that isalloc_validate takes this much time?
> Chrome 23 (Canary) takes ~6 seconds for a 30x30 maze.

Note that this is after a Shift+Reload.  If you close the dialog and
start a new run without doing Shift+Reload Chrome will take a different
(shorter) path through the maze, which takes 1.8 seconds.
(this could of course influence our comparative score quite badly
depending on how these sites perform their tests)

Comment 53

6 years ago
with the latest nightly, a strange behaviour occurs. as soon as the maze starts solving,nightly is blazing fast.Faster than chrome But very quickly, it slows down. Thst is, after going maybe 25-30 steps, the solving slows down a lot.

if you refresh the tab and start solving the maze again, it again start very fast but tapers very quickly.

Comment 54

6 years ago
(In reply to mayankleoboy1 from comment #53)
> with the latest nightly, a strange behaviour occurs. as soon as the maze
> starts solving,nightly is blazing fast.Faster than chrome But very quickly,
> it slows down. Thst is, after going maybe 25-30 steps, the solving slows
> down a lot.
> 
> if you refresh the tab and start solving the maze again, it again start very
> fast but tapers very quickly.

I can confirm this strange behaviour but only on Windows.. Linux runs fast all the way through.

I'm sure you already know, but test speeds vary alot depending on maze. Would be nice with a configurable seed, so you could test the same maze on different browsers/builds.

Comment 55

6 years ago
The first maze generated is still same.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #50)
> Bug 524925.

Actually I meant bug 691651.

We need to make it faster still. Bug 776190 tracks further improvements.
Could people PLEASE stop spamming the bug, as well as every other bug related to maze solver?  I really didn't need literally dozens of mails about this.  :(

We know it got faster, per comment 56.  We know it's still quadratic as comment 53 describes.  That's bug 691645, which is clearly hanging off bug 776190.

Mats, I agree that it's still worth making some of the changes here.  I'm not going to have time to drive them in.  Will you?

Resummarizing to the remaining issue, but it might make sense to file a new bug for it without all the baggage of this one...
Assignee: bzbarsky → nobody
Summary: IE Maze Solver testcase takes too long in CalculateHypotheticalBox → CalculateHypotheticalBox can be slow when lines have lots of placeholders (only placeholders?) on them
Target Milestone: mozilla10 → ---
And to be clear, I don't think we should overrotate on this, short of taking notes on how this could generally be done better for servo.  So I'd also be ok with a wontfix.
Blocks: 918620
You need to log in before you can comment on or make changes to this bug.