Closed Bug 1220466 Opened 4 years ago Closed 4 years ago

[APZ] 73% performance regression on IE mazesolver(40x40)

Categories

(Core :: Layout, defect)

Unspecified
Windows 7
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla48
Tracking Status
firefox45 --- disabled
firefox46 + wontfix
firefox47 + fixed
firefox48 --- fixed

People

(Reporter: alice0775, Assigned: mattwoodrow)

References

(Depends on 1 open bug, Blocks 3 open bugs, )

Details

(Keywords: perf, regression)

Attachments

(3 files)

Build Identifier:
https://hg.mozilla.org/mozilla-central/rev/c1a94d5365e48f86de6a7bb6a3c4c7d82b9d660e
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:45.0) Gecko/20100101 Firefox/45.0 ID:20151031030207


Steps to reproduce:
1. Exstract attached zip
2. Open Default.html
3. Select 40x40 and Click [Start]


Actual Results:
With e10s + APZ    : 116s
With e10s + disabled APZ : 67s
Summary: [APZ] 73% performance regression on IE mazesolver(40x40) since Firefox 39 → [APZ] 73% performance regression on IE mazesolver(40x40)
Attached file Demos-mazesolver.zip
Blocks: 1220459
Depends on: apz-talos
No longer depends on: 1220459
 Build ID  20160201030241
 User Agent  Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0

I have tested this on Win 7 x64 with Nightly 47.0a1 and this is are the results:

First time running the test:  
With e10s + APZ = 246 sec
With e10s + APZ disabled = 88 sec

Second time running the test:
With e10s + APZ = 121 sec
With e10s + APZ disabled = 95 sec

Third time running the test:
With e10s + APZ = 107 sec
With e10s + APZ disabled = 94 sec

Fourth time running the test:
With e10s + APZ = 123 sec
With e10s + APZ disabled = 85 sec
I got a profile (from OS X, not Windows), at https://cleopatra.io/#report=2b6548491191610c8685ed33fe1484b9ecd69480 - it definitely shows the content thread is flat-out busy painting so maybe it can point to some worthwhile optimization targets.
Blocks: paint-fast
Component: Panning and Zooming → Layout
[Tracking Requested - why for this release]:
Milan, can you help coordinate a fix for this for 46 assuming APZ ships with 46? This is marked as a regression tracking 46, which means we won't be shipping 46 with this...
Is this a regression from the bigger display port or from the event region display items?
The single biggest stack in the content process ends up with nsLayoutUtils::GetDisplayPort with 6.3%.

Total actual rasterization time is 0.9%.

We're getting hammered on display list and layer builder here. APZ is probably adding more display items to process (thanks to display ports), and increasing the total work per item (since we have GetDisplayPort calls and event handling etc).

We can probably do a bunch of small fixes to help reduce the pain here, or we could do something specific to really optimize this particular case.

I doubt the latter is useful though, since this page is fairly unique and I don't think we care about winning on it specifically.
From my testing this regression is mostly from event regions. The regression persists if I disable APZ but enable event regions. The page isn't very large, it's just barely scrollable so having a page-sized displayport doesn't hurt us too much here.
Commenting out the hunk at [1] makes most of the regression go away. I'm guessing there's a lot of pseudo-stacking-context items on this page so this code is getting really hot.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsFrame.cpp?rev=7c7fe4da388c&mark=2557-2563#2557
With the 40x40 grid we end up creating ~8000 event regions display items per paint. That number goes down steadily as the maze finding progresses, ending at around ~7500. That's still a lot. There's a chunk of the display list which is just reams and reams of adjacent LayerEventRegions items that looks like this:

  LayerEventRegions p=0x163144680 f=0x1627304d0( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x163144ab0 f=0x162730200( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x16476fa88 f=0x162729d30( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x16476feb8 f=0x162729880( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x164770350 f=0x1627294c0( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x164770780 f=0x162729010( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x164770bb0 f=0x162728c50( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)
  LayerEventRegions p=0x164771020 f=0x1627287a0( class:marker) bounds(0,0,0,0) layerBounds(0,0,0,0) visible(0,0,0,0) componentAlpha(0,0,0,0) clip() scrollClip() ref=0x119e691d8 agr=0x119e691d8 (maybeHitRegion < (x=0, y=0, w=360, h=360); >)

We should maybe not create so many of the same thing...
Reaching the limit of my layout knowledge here. Matt/Markus - can you think of any conditions where we can combine some of these pseudo stacking context event regions instead of creating a new one for each frame?
My understanding is that we need a new event regions display item each time we might have a new layer, so that each item can be assigned to a single layer.

pseudo stacking contexts only exist at the layout level, they shouldn't be visible to FrameLayerBuilder, so I can't see why we need a new event regions item for each one.

It seems like this code can be conditioned with buildingForChild.IsAnimatedGeometryRoot(), just like the non-pseudo stacking context path.

Adding ni? for Markus and tn, since it's been too long since I thought about this code.
Flags: needinfo?(mstange)
Flags: needinfo?(tnikkel)
Hmm, wouldn't we need to make sure we get a new layer event regions for anything that could build a container layer? So transform/opacity create pseudo stacking contexts so they have been getting a new layer event regions item, but won't after that change.
Flags: needinfo?(tnikkel)
Talked to matt on irc. transform/opacity create full stacking contexts, so they'd still get a layer event regions item.

We could likely avoid even more layer event regions items if we just created them for things that created container layers, instead of all stacking contexts, but that can be something for another bug.
(In reply to Timothy Nikkel (:tnikkel) from comment #15)
> We could likely avoid even more layer event regions items if we just created
> them for things that created container layers, instead of all stacking
> contexts, but that can be something for another bug.

That seems like a good idea, assuming this bug goes well.

Can we also skip creating new event region items for inactive container layers?

It seems like as soon as we enter an inactive container, we can just block all new event region items for descendants.

Unfortunately we don't really know when that happens at display list building time, but I think we could (bug 1205129?).
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> Can we also skip creating new event region items for inactive container
> layers?

Currently AddFrame assumes the frames it adds are in the same coordinate system as the display items underlying frame. FrameLayerBuilder handles the task of transform the event regions when pulling them out of inactive container layers. So we'd need a solution to that. Either make AddFrame more complicated and handle transforms, or skip this optimization for transforms.
Assignee: nobody → matt.woodrow
Attachment #8728181 - Flags: review?(mstange)
Comment on attachment 8728152 [details] [diff] [review]
less-event-regions

These patches take my local run time from 29s to 16s.
Attachment #8728152 - Flags: review?(mstange)
That sounds like a nice win - should be comparable to the non-APZ score \o/
Comment on attachment 8728152 [details] [diff] [review]
less-event-regions

Review of attachment 8728152 [details] [diff] [review]:
-----------------------------------------------------------------

How does this ensure that we build a new event regions item when we enter a transform?
Attachment #8728152 - Flags: review?(mstange) → review+
Comment on attachment 8728181 [details] [diff] [review]
Avoid work for inactive layers

Review of attachment 8728181 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/FrameLayerBuilder.cpp
@@ +3984,5 @@
>          // This is the case for scrollbar thumbs, for example. In that case the
>          // clip we care about is the overflow:hidden clip on the scrollbar.
>          mPaintedLayerDataTree.AddingOwnLayer(animatedGeometryRoot->mParentAGR,
>  					     clipPtr,
> +					     IsInInactiveLayer() ? nullptr : uniformColorPtr);

Please do this in the initialization of the uniformColorPtr variable, so that you don't have to repeat it.
Attachment #8728181 - Flags: review?(mstange) → review+
(In reply to Markus Stange [:mstange] from comment #22)
> Comment on attachment 8728152 [details] [diff] [review]
> less-event-regions
> 
> Review of attachment 8728152 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> How does this ensure that we build a new event regions item when we enter a
> transform?

Transforms force a real stacking context, so we take the BuildDisplayListForStackingContext branch above.

That function has an unconditional creation of a new event regions item.
Oh, good.
Flags: needinfo?(mstange)
https://hg.mozilla.org/mozilla-central/rev/09b94960086e
https://hg.mozilla.org/mozilla-central/rev/1b7d730f11bf
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
I ran the 40x40 maze on OS X March 10 nightly which has the above fixes. I ran it 5 times each in three different configurations:

Config   no-e10s    e10s+APZ     e10s no-APZ
Samples   25         23           21
          24         23           22
          24         25           21
          24         24           24
          25         24           21
--------------------------------------------
Average  24.4s      23.8s        21.6s
--------------------------------------------

So APZ is better than no-e10s, but still ~10% worse than e10s without APZ. This is hugely better than the 70% regression so I'm verifying this as fixed. We should uplift this fix to 47 and 46.
Status: RESOLVED → VERIFIED
Whiteboard: [Dupe me]
Excellent! Is the remainder of this regression tracked somewhere else?
Flags: needinfo?(bugmail.mozilla)
Bug 1220459 covers the general regression in this benchmark. I can file another one for the 10% regression that comes with APZ assuming e10s is enabled.
Flags: needinfo?(bugmail.mozilla)
Great! What I care about is that this is tracked *somewhere* so we don't ship APZ, or any other features for that matter, with known regressions we didn't intend to ship with, so please ensure this gets tracked, either as part of the existing bug for this this benchmark, or separately.
Can you request uplift to aurora and beta as suggested in comment 28? Thanks!
Flags: needinfo?(matt.woodrow)
Though we may want to stick with only uplifting to 47, since I don't think we are actually planning to enable apz with 46.
Comment on attachment 8728152 [details] [diff] [review]
less-event-regions

Approval Request Comment
[Feature/regressing bug #]: APZ
[User impact if declined]: Poor performance on mazesolver (and other websites with similarly complex dynamic HTML changes).
[Describe test coverage new/current, TreeHerder]: Manually tested.
[Risks and why]: Low risk, just reduces unnecessary work.
[String/UUID change made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8728152 - Flags: approval-mozilla-aurora?
Comment on attachment 8728152 [details] [diff] [review]
less-event-regions

Perf improvement that was verified, Aurora47+
Attachment #8728152 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.