Closed Bug 735898 Opened 12 years ago Closed 12 years ago

BuildDisplayList takes too long on CNN.com

Categories

(Core :: Layout, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla14
Tracking Status
firefox14 --- verified
firefox15 --- verified
blocking-fennec1.0 --- beta+

People

(Reporter: jrmuizel, Assigned: roc)

References

Details

(Whiteboard: [layout])

Attachments

(10 files, 3 obsolete files)

1.32 KB, patch
Details | Diff | Splinter Review
1.18 KB, patch
Details | Diff | Splinter Review
7.49 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
11.82 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
2.54 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
2.56 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
21.41 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
7.93 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
1.17 KB, patch
dbaron
: review+
Details | Diff | Splinter Review
1.62 KB, patch
Details | Diff | Splinter Review
about 5% of our PresShell::Paint time is in BuidlDisplayList (14ms). It would be nice if this was cheaper.
This patch forces display list construction and layer construction to run 100 times for every paint. It helps when profiling those phases on desktop.

I use this bookmarklet:
javascript:(function() { var d = document.createElement("div"); d.setAttribute("style", "position:absolute; top:0; left:0; width:2px; height:2px;"); document.body.appendChild(d); var iters = 0; var start = Date.now(); function doframe() { d.style.background = d.style.background ? "" : "black"; mozRequestAnimationFrame(doframe); if (++iters == 100) { alert((Date.now() - start) + "ms"); } } doframe(); })();
to force continuous tiny repaint of the page (cnn.com in this case) and report an elapsed time to paint 100 frames. On my laptop I'm getting results like this:
35861ms
38900ms
37763ms
38083ms
So each display-list construction and layer construction is taking 3-4ms maybe. This is for a browser window with outer dimensions 1146x1080.
Assignee: nobody → roc
36% of the profile is under BuildDisplayListForStackingContext (display list construction), 35% is under PaintRoot (mostly layer construction), and 25% under ComputeVisibilityForRoot.
Results after this patch:
36880ms
34236ms
34375ms
Significant improvement.
Attachment #612513 - Flags: review?(matt.woodrow)
Comment on attachment 612513 [details] [diff] [review]
Part 1: Cache bounds (union of all child item bounds) in nsDisplayWrapList

Do we never change the list after create the wrap list item? Say in deleting items that aren't visible?
That's a good point, but it's not a problem because it's OK for the bounds to be conservative.

There is the issue that nsDisplayWrapList::GetOpaqueRegion uses GetBounds, which makes everything a bit confusing when GetBounds is conservative (although I think still correct). I make that clear by having it use mVisibleRect instead.
Comment on attachment 612513 [details] [diff] [review]
Part 1: Cache bounds (union of all child item bounds) in nsDisplayWrapList

Review of attachment 612513 [details] [diff] [review]:
-----------------------------------------------------------------

As long as it isn't possible to create an nsDisplayWrapList without ever calling GetBounds on it (and I don't think that it is), this looks good.
Attachment #612513 - Flags: review?(matt.woodrow) → review+
nsDisplayTransform::GetBounds() might not call GetBounds on its child list if we are prerendering. Don't know if that's worth worrying about though.
blocking-fennec1.0: --- → ?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> There is the issue that nsDisplayWrapList::GetOpaqueRegion uses GetBounds,
> which makes everything a bit confusing when GetBounds is conservative
> (although I think still correct). I make that clear by having it use
> mVisibleRect instead.

Actually, I don't need to change that. The point is that GetOpaqueRegion returns a region where every visible pixel is opaque. It's OK for that region to include the bounds of items that were not opaque, but were also not visible. I will add a patch to change comments to make this clear.
The patches here are on top of the patch in bug 733607.
Depends on: 733607
(In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> nsDisplayTransform::GetBounds() might not call GetBounds on its child list
> if we are prerendering. Don't know if that's worth worrying about though.

No, I don't think it is.
Results after this patch:
33300ms
32436ms
32106ms
Significant improvement.
Attachment #613083 - Flags: review?(matt.woodrow)
The improvement in part 3 is probably not going to be seen on mobile since theming isn't important there.
Attached patch Part 1 v2Splinter Review
This actually had a bad bug that triggered assertions: TryMerge implementations need to combine their mBounds. Moved the nsDisplayWrapList merging code to a MergeFrom helper.
Attachment #613092 - Flags: review?(matt.woodrow)
Attached patch box-shadow patch (obsolete) — Splinter Review
We create an nsDisplayBoxShadowOuter and an nsDisplayBoxShadowInner every time we see a box-shadow. We generally only need one of them (usually outer). This patch should let us avoid creating the other one.

For some reason I completely fail to understand, it seems to make things a bit slower :-(.
Attachment #612513 - Attachment is obsolete: true
blocking-fennec1.0: ? → beta+
Attachment #613092 - Flags: review?(matt.woodrow) → review+
Attachment #613084 - Flags: review?(matt.woodrow) → review+
Attachment #613083 - Flags: review?(matt.woodrow) → review+
Can we land the r+'ed patches now and do the rest in follow up?
> [inbound merger: please leave this bug open.]

I've put [leave open] in the whiteboard;  please remove once all the patches have landed on inbound.
Whiteboard: [leave open]
Looks like the performance issues were measurement error. This patch doesn't actually help much on the CNN page (the only box-shadows are in the Firefox chrome, and 2 of the 3 elements affected use both inner and outer shadows), but it would help on any page that used a lot of box-shadows, especially since inset shadows are rarely used on the Web.
Attachment #613096 - Attachment is obsolete: true
Attachment #613879 - Flags: review?(matt.woodrow)
One of these patches turned android mochitest-robocop perma-orange.
6 INFO TEST-UNEXPECTED-FAIL | test_bug720538 | Checking for iframe a few pixels from the right edge - Color rgba(0,130,0,255) not close enough to expected rgb(0,0,255)
Some Windows reftests are perma-orange as well. I'll back out.
Depends on: 744384
Whiteboard: [leave open] → [leave open][layout]
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #20)
> Looks like the performance issues were measurement error. This patch doesn't
> actually help much on the CNN page (the only box-shadows are in the Firefox
> chrome, and 2 of the 3 elements affected use both inner and outer shadows),
> but it would help on any page that used a lot of box-shadows, especially
> since inset shadows are rarely used on the Web.

So what's the state of the issue here that's a beta blocker for Fennec native?
Part 2 caused reftest failures because the clip-rect is snapped to nearest pixel edges, so even if (before snapping) it contains the bounds of some item, after snapping some partially covered pixel of the item might be clipped out. So part 2 doesn't quite preserve rendering.

I'll land parts 1 and 3 again. They seem to be green on try.
(In reply to David Baron [:dbaron] from comment #26)
> So what's the state of the issue here that's a beta blocker for Fennec
> native?

The patches here should improve things a bit but probably not enough. I'm still working on this bug.
Matt, can you do these reviews asap please?
Attachment #613879 - Flags: review?(matt.woodrow) → review+
Attachment #614741 - Flags: review?(matt.woodrow) → review+
Comment on attachment 614741 [details] [diff] [review]
Part 5: Simple tweaks to avoid wrapping empty lists and to reorder boolean tests

Review of attachment 614741 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +2163,5 @@
>    } else if (disp->IsFloating()) {
> +    if (!list.IsEmpty()) {
> +      rv = aLists.Floats()->AppendNewToTop(new (aBuilder)
> +          nsDisplayWrapList(aBuilder, child, &list));
> +      NS_ENSURE_SUCCESS(rv, rv);

Do we need to wrap when the list consists of a single item?
(In reply to Matt Woodrow (:mattwoodrow) from comment #32)
> ::: layout/generic/nsFrame.cpp
> @@ +2163,5 @@
> >    } else if (disp->IsFloating()) {
> > +    if (!list.IsEmpty()) {
> > +      rv = aLists.Floats()->AppendNewToTop(new (aBuilder)
> > +          nsDisplayWrapList(aBuilder, child, &list));
> > +      NS_ENSURE_SUCCESS(rv, rv);
> 
> Do we need to wrap when the list consists of a single item?

We can probably get away with not wrapping these at all, since nothing ever tries to dig into the Floats() list and stuff in it. I had a patch to do that but it didn't seem to make much difference and it's maybe a bit more risky than some of the other things I'm trying. I'll probably revive it later.
DisplayBackgroundBorderOutline uses 7-8% of the profile, so I tried a patch that adds a frame state bit indicating that DisplayBBO doesn't need to do anything because the frame has no background/border/outline. Surprisingly it didn't help much because only about 10% of the 360-ish calls to DisplayBBO in a single display list construction pass fail to generate anything. I think this means that most frames that call DisplayBBO (and are visible, i.e. have nonempty overflow area) actually do have a background or border.
The call to nsIFrame::GetOffsetTo from nsDisplayItem::nsDisplayItem was using a significant chunk of the profile (around 7%). GetOffsetTo is expensive because we have to walk all the way up the (crossdoc) frame tree to the reference frame, which is typically the root frame of the root document.

This patch makes that go away by caching a frame/offset pair in nsDisplayListBuilder and efficiently updating that cache (without calling GetOffsetTo) in BuildDisplayListForChild. nsDisplayListBuilder::ToReferenceFrame gets a hit rate of over 99%. The remaining calls to GetOffsetTo in display list construction are mainly BuildDisplayListForChild entering an out-of-flow frame and setting up the cached frame/offset pair from scratch.

I'm switched to a saved copy of CNN.com so these numbers aren't quite comparable to the ones above:

Before:
27928ms
27074ms
26730ms
27351ms
26899ms

After:
25771ms
25403ms
25572ms

Significant improvement.
Turns out none of the sites that call nsIFrame::HasBorder really need to do all the work of getting the used-border. It's enough to just check nsStyleBorder. If our style doesn't have a border, then we're not going to draw one even if GetUsedBorder returns something non-zero (due to reflow not having run yet).

This is a small perf win, a bit hard to verify on my noisy environment, but the profiler shows it as an improvement in DisplayBackgroundBorderOutline.
Attachment #615254 - Flags: review?(dbaron)
nsDisplayList::SortByZOrder was about 4% of the profile. This patch reduces it to 1%. Most of the cost was comparing the order of the display items in the content tree, but we don't really need to do that; where we call SortByZOrder, the items will already have been put into the PositionedDescendants() list in frame order (usually via placeholders), which follows content order.
Attachment #615305 - Flags: review?(dbaron)
I have a bunch of patches which make all the display item creation methods, including BuildDisplayList, return void instead of nsresult, removing a lot of conditional branches. I suspect they probably won't help perf much, although they simplify code and we should land them. But they're big patches so I'll hold off until after the next uplift.
Comment on attachment 615239 [details] [diff] [review]
Part 6: avoid nsIFrame::GetOffsetTo calls

Review of attachment 615239 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, the removal of mEffectsFrame probably should be a separate patch though.
Attachment #615239 - Flags: review?(matt.woodrow) → review+
Depends on: 745934
Comment on attachment 615254 [details] [diff] [review]
Part 7: remove expensive nsIFrame::HasBorder checks

r=dbaron
Attachment #615254 - Flags: review?(dbaron) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39)
> nsDisplayList::SortByZOrder was about 4% of the profile. This patch reduces
> it to 1%. Most of the cost was comparing the order of the display items in
> the content tree, but we don't really need to do that; where we call
> SortByZOrder, the items will already have been put into the
> PositionedDescendants() list in frame order (usually via placeholders),
> which follows content order.

Is that true if, say, one of them was inside a float or abs pos element and one of them wasn't?  (nsBlockFrame::BuildDisplayList definitely builds display lists for floats not through placeholders.)
Whiteboard: [leave open][layout] → [leave open][layout][has patches][needs review dbaron]
(In reply to David Baron [:dbaron] from comment #44)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39)
> > nsDisplayList::SortByZOrder was about 4% of the profile. This patch reduces
> > it to 1%. Most of the cost was comparing the order of the display items in
> > the content tree, but we don't really need to do that; where we call
> > SortByZOrder, the items will already have been put into the
> > PositionedDescendants() list in frame order (usually via placeholders),
> > which follows content order.
> 
> Is that true if, say, one of them was inside a float or abs pos element and
> one of them wasn't?

Yes, I think so.

> (nsBlockFrame::BuildDisplayList definitely builds
> display lists for floats not through placeholders.)

No, it doesn't, it builds them through placeholders (except for the floats pushed into it from prev-in-flows).
Comment on attachment 615305 [details] [diff] [review]
Part 8: speed up SortByZOrder

ok, r=dbaron

(I hope we have good test coverage for the edge cases here, even so.  They're interesting cases to test...)
Attachment #615305 - Flags: review?(dbaron) → review+
That was part 8.

The reftest failure is in text-overflow/anonymous-block.html; a marker that's not visible before part 8 is now visible. We add overflow markers at the end of the PositionedDescendants() list, during DisplayLine for each line. Before part 8, we'd eventually sort the PositionedDescendants() into content order, and because the marker content is the block itself, this moved the markers to be before (below) all PositionedDescendants() contributed by the block descendants. In this testcase a marker is moved to be below a rel-pos element contributed by a previous line, and the rel-pos element covers it. With part 8, the marker isn't moved and the rel-pos element doesn't cover it anymore.

I think probably markers for line N should be able to appear on top of PositionedDescendants() contributed by lines < N. So, I'll just change the test.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #50)
> I think probably markers for line N should be able to appear on top of
> PositionedDescendants() contributed by lines < N. So, I'll just change the
> test.

Although it's sort of disturbing to have things be out of content order like this.

Why not just put the marker at the end of the Content() list, after the in-flow non-positioned content on the line? That leads to simpler code too.
> Why not just put the marker at the end of the Content() list

Wouldn't that put the ellipsis below backgrounds of the block's inline children?

I think you can add the markers to the bottom of PositionedDescendants() instead of
top (in TextOverflow::CreateMarkers) to keep the current text-overflow rendering.
(In reply to Mats Palmgren [:mats] from comment #52)
> Wouldn't that put the ellipsis below backgrounds of the block's inline
> children?

It would put the ellipsis below backgrounds of inline children on the following lines. That's OK isn't it?

> I think you can add the markers to the bottom of PositionedDescendants()
> instead of
> top (in TextOverflow::CreateMarkers) to keep the current text-overflow
> rendering.

That wouldn't quite work because it would put the ellipses below PostionedDescendants() of earlier siblings of the block too.
> It would put the ellipsis below backgrounds of inline children on the
> following lines.

That's not what I'm seeing.  Backgrounds of child inlines goes on the Content()
list in DisplayLine:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsBlockFrame.cpp#6211
so if we put the markers on that list and someone eventually content-sorts
that list the markers will be sorted below.  Your suggested fix breaks:

data:text/html,<div style="overflow:hidden;text-overflow:ellipsis;width:50px;"><span style="padding:0 40px;background:lime"></span></div>

(and many other text-overflow reftests)

> That wouldn't quite work because it would put the ellipses below
> PostionedDescendants() of earlier siblings of the block too.

The lists that the TextOverflow works on is a local collection created in
DisplayLine, I don't see how my suggested change can affect earlier siblings
of the block.
This pass layout/reftests/text-overflow/reftest.list locally on Linux.
For a future correct solution we should probably collect the markers for all
lines on a special list, and add them after all lines are done, in
nsBlockFrame::BuildDisplayList.  It doesn't seem worth the risk doing that
in this cycle though.
(In reply to Mats Palmgren [:mats] from comment #54)
> That's not what I'm seeing.  Backgrounds of child inlines goes on the
> Content()
> list in DisplayLine:
> http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsBlockFrame.
> cpp#6211
> so if we put the markers on that list and someone eventually content-sorts
> that list the markers will be sorted below.  Your suggested fix breaks:
> 
> data:text/html,<div
> style="overflow:hidden;text-overflow:ellipsis;width:50px;"><span
> style="padding:0 40px;background:lime"></span></div>
> 
> (and many other text-overflow reftests)

Yeah, I suppose that's true if someone content-sorts the list.

> > That wouldn't quite work because it would put the ellipses below
> > PostionedDescendants() of earlier siblings of the block too.
> 
> The lists that the TextOverflow works on is a local collection created in
> DisplayLine, I don't see how my suggested change can affect earlier siblings
> of the block.

I misunderstood which PositionedDescendants list you were talking about. I think you're right.
(In reply to Mats Palmgren [:mats] from comment #55)
> Created attachment 616824 [details] [diff] [review]
> my suggested part 9 for clarity
> 
> This pass layout/reftests/text-overflow/reftest.list locally on Linux.

Actually no, I was right the first time. mMarkerList isn't a list from the DisplayLine collection, it's the PositionedDescendants() list passed into nsBlockFrame::BuildDisplayList.

I'm not going to land this again for now, so there's no urgency. When I get back to it, I'll try putting the ellipses in the PositionedDescendants() list before all the regular positioned descendants of the block.
Was review.

Roc, if we have all we're going to take for now, can we close this bug to get it off the blocker list and file separate follow up bugs?
Whiteboard: [leave open][layout][has patches][needs review dbaron] → [leave open][layout][has patches]
Sure.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Filed bug 748646 to follow-up on comment 58 (put text-overflow markers in content order).
Target Milestone: --- → mozilla14
cnn.com is loading pretty fast on the latest Nightly and Aurora builds. 
Closing bug as verified fixed on:

Firefox 15.0a1 (2012-05-29)
Firefox 14.0a2 (2012-05-29)

Device: Galaxy Nexus
OS: Android 4.0.2
Status: RESOLVED → VERIFIED
Blocks: 772690
(In reply to Mats Palmgren [:mats] from comment #61)
> Filed bug 748646 to follow-up on comment 58 (put text-overflow markers in
> content order).

...and now that that's been addressed, I spun off bug 772690 to re-land part 8 from here (the patch that was backed out in comment 49)
Whiteboard: [leave open][layout][has patches] → [layout][has patches]
Version: unspecified → Trunk
Whiteboard: [layout][has patches] → [layout]
You need to log in before you can comment on or make changes to this bug.