Closed Bug 1247979 Opened 4 years ago Closed 4 years ago
scrolling-boxes is really slow with apz
testcase: http://people.mozilla.org/~roc/scrolling-boxes.html <The_8472> speaking of performance, this is like 10x slower under e10s: http://people.mozilla.org/~roc/scrolling-boxes.html <RyanVM> 40x slower than Chrome on my system <RyanVM> and yeah, non-e10s is 4x slower than Chrome <RyanVM> it's APZ <chutten> but fixed backgrounds are the devil <jimm> sounds similar to bug 1242969 <RyanVM> e10s w/o APZ is basically the same (if not slightly faster) than non-e10s
We're doing a lot of repainting here and we probably shouldn't. Shrinking the displayport significantly speeds this up.
Profile at  (taken from Nightly, where the test takes about ~11 seconds vs ~1 second on Chrome). This is spending about 35% of the time in nsRegion::ScaleToOutsidePixels. Jeff, can that be pixman code be optimized any more? We might also be building some absurd region containing a ton of 10x10 rects and then doing region operations on it.  https://cleopatra.io/#report=2258ab3fc58453546fcbbe144826f43730b293e7
(In reply to Kartikaya Gupta (email:firstname.lastname@example.org) from comment #2) > Profile at  (taken from Nightly, where the test takes about ~11 seconds > vs ~1 second on Chrome). > > This is spending about 35% of the time in nsRegion::ScaleToOutsidePixels. > Jeff, can that be pixman code be optimized any more? We might also be > building some absurd region containing a ton of 10x10 rects and then doing > region operations on it. > >  https://cleopatra.io/#report=2258ab3fc58453546fcbbe144826f43730b293e7 ScaleToOutsidePixels can be optimized a lot. I will do so.
This does a union of all the rectangles at once, instead of one-by-one. It also cuts down a lot on allocation churn. It would be possible to avoid an additional allocation/copy if we change the interface of this function to mutate the existing region instead of returning a new one. Note, I have not tested the correctness of this patch, but it should be ok.
On my local debug build that patch improves performance on this page by ~28% (sample size of 1). We should get it landed and run another profile.
This keeps around a temporary nsRegion so that we can avoid modifying these regions in place. Modifying regions in place forces a new allocation and we'd rather not.
This time without other changes.
Attachment #8718957 - Attachment is obsolete: true
Alternatively we could use AccumRegions for these regions. See bug 1248044.
Still slow with e10s. About 10.5 seconds.
NI on myself to get another profile once this is in a Nightly. I don't want to do an opt build locally.
On the latest nightly on OS X for me it takes ~5700ms compared to Chrome's ~900ms. A huge improvement already. Here's a new profile: https://cleopatra.io/#report=aaf726c63fcbd881eac6de2c739e72fe5102628c - we're still spending a lot of time in pixman region operations, for example around 5.4% of time in PaintedLayerData::AccumulateEventRegions which is all in _moz_pixman_region32_union.
Actually that 5.4% goes up to like 22% if you look at a subset of the profile that includes many paints trims out the empty space on the beginning/end of the profile. Jeff, can you take a look at this profile and see if there is more that can be done?
I expect that using the AccumRegions from bug 1248044 should help with the rest of the time. I'm on PTO this week but I can look at writing a patch the week after.
Depends on: 1248044
Did somebody save this testcase locally? Roc's people is gone and he doesn't have a backup.
I'm nominating this to block APZ rollout, but that decision probably ultimately lives with Jeff (or maybe mbest). It's an artificial testcase but one that's likely to get attention.
Thanks for providing a copy of the test, Alice! I've uploaded it to my people account so it's easier for people to run.
Bug 1253860 fixes this bug; the test now takes comparable time to non-e10s. I ran it on March 10 nightly on my OS X machine 5 times in each of three configurations: Config no-e10s e10s+APZ e10s no-APZ Samples 1882 2233 1660 1824 1485 1350 1614 1478 1562 1977 1509 1465 1772 2111 1682 -------------------------------------------- Average 1813.8 1763.2 1543.8 -------------------------------------------- e10s without APZ is still faster, but e10s with APZ is still better than non-e10s so I call this a win. We can improve this further by making the event regions code faster; the win in bug 1253860 comes from skipping the display list building entirely when we can.
The patch that landed here actually landed in 47 so updating flags according to that. However the bug as described "scrolling-boxes is really slow with APZ" isn't actually fixed in 47; it requires bug 1253860 to be really fixed. I hope leave-open dies in a fire.
I'm changing the target milestone to 48 as bug 1253860 is wontfix for 47. Verified as fixed using latest Aurora 48.0a2 under Win 7 64-bit and Mac OS X 10.9.5, the average I obtained was 1382ms under Win and 1115ms under Mac.
You need to log in before you can comment on or make changes to this bug.