Closed Bug 937519 Opened 11 years ago Closed 11 years ago

UX (Australis) branch Windows 7/8 Tab Animation Regression Test (TART) regression on 2013-11-07

Categories

(Core :: Layout, defect)

All
Windows 7
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: MattN, Assigned: mattwoodrow)

References

()

Details

(Keywords: perf, Whiteboard: [Australis:P1][Australis:M9])

Attachments

(1 file)

There was a major UX branch regression on the TART talos test starting 2013-11-07 for Windows 7 and Windows 8 only:
* Windows 7 Non-PGO: http://bit.ly/181tCT0 (37% regression)
* Windows 7 PGO:     http://bit.ly/17kSXU8 (38% regression)
* Windows 8 Non-PGO: http://bit.ly/19fZnUf (24% regression)
* Windows 8 PGO:     http://bit.ly/1fy7Jfo (22% regression)

Windows 7 regressed to be much worse than m-c. Windows 8 regressed to be the same as m-c instead of faster.

UX Regression Range: 9f6d6fa60cf0 to 88104a2deda5
http://hg.mozilla.org/projects/ux/pushloghtml?changeset=88104a2deda5

My first guess would be bug 934860 because it's specifically for the same platforms and could affect animations. I'm going to try bisect now (starting with a backout of bug 934860).
That backout failed to build locally so here's another try run on top of 88104a2deda5 (instead of UX tip) just in case since I don't have time to wait for the builds to finish as I'm going to bed: 
https://tbpl.mozilla.org/?tree=Try&showall=1&rev=9298f6cdc333

Someone else can feel free to bisect further while I'm sleeping if the backout doesn't solve the problem.
MattN's build likely failed because he needed a clobber - those first try pushes worked fine, and seem to confirm that bug 934860 is the cause of this TART regression.

I'll spin out some comparison profiles today to see where we're spending the time.
Blocks: 934860
Here's a comparison profile for Win7 on the icon-open-DPI1 subtest, between UX and UX with bug 934860 backed out.

The "before" (all samples weighted -1) is UX with bug 934860 backed out. "after" is UX with bug 934860 (df30212c9a7c).

http://tests.themasta.com/cleopatra/
Can you upload the individual profiles too please, I don't find the comparison very easy to interpret.
Did this improve at all after bug 930451 landed? i.e. after https://hg.mozilla.org/mozilla-central/rev/1574b1da1773 (in November 9th Nightly).
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> Can you upload the individual profiles too please, I don't find the
> comparison very easy to interpret.

Hrm, looking more closely at the profiles I've extracted, I'm not confident I've done it properly. I'll generate some new profiles and post them shortly.
Ok, the comparison profile I posted actually makes it look like the after profile (normal UX) was faster than the before (UX without bug 934860) profile.

Having examined the raw data, I think this is because the after profile (normal UX) is drawing fewer frames than the before profile (UX without bug 934860) because computing those frames is taking longer. Because those frames are skipped, we skip drawing them.

So comparison profiles are probably not going to be useful here - the overall time within a subtest (like icon-open-DPI1) is equal (or even shorter!) in the bad case... and the comparison profiles won't help us figure out *why* those frames are taking longer to compute.

So, full profiles it is. Coming right up.
One thing that jumps out for me is that nsCSSRendering::PaintGradient takes 12x longer with UX proper than with UX + backout patch.
Assignee: MattN+bmo → matt.woodrow
(In reply to Mike Conley (:mconley) from comment #12)
> One thing that jumps out for me is that nsCSSRendering::PaintGradient takes
> 12x longer with UX proper than with UX + backout patch.

Where do you see this? I only see 19 samples vs 27. The extra time seems to be split evenly between CreateBrushForPattern and ComputeLinearGradientLine. Both of these could probably be cached, but it's fairly insignificant.

I'm looking into us spending a huge chunk of our time rendering SVG images.
I'm doing the work for SVG caching in bug 923341. It looks like a makes up a decent amount of our lost time, but not all of it.

https://tbpl.mozilla.org/?tree=Try&rev=2bc0b4ff7d81
Depends on: 923341
Other things that we regress on:

Flushing D2D after painting - 35 samples.
CreateOffscreenSurface inside CreateSamplingRestrictedDrawable - 17 samples.
jrmuizel asked me to log the calls to gfxAlphaBoxBlur::Paint and nsCSSRendering::PaintGradient over the course of a single TART run for both UX proper and UX with the backout patch.

UX with backout:

gfxAlphaBoxBlur::Paint: 994 calls
nsCSSRendering::PaintGradient: 1444 calls

UX proper:

gfxAlphaBoxBlur::Paint: 1299 calls (+305)
nsCSSRendering::PaintGradient: 2128 calls (+684)
mattwoodrow:

Here's the profile from Win7 with your caching patches that you asked me to kick off last night:

http://tests.themasta.com/cleopatra/?report=d7f142b07c021f6e2ded89aad9ad71fcd154ccc1

What do you think of gavin's proposal[1] to back out 934860 while we work on this?

[1]: https://mail.mozilla.org/pipermail/firefox-dev/2013-November/001123.html
Flags: needinfo?(matt.woodrow)
Depends on: 938395
Some more things:

We're spending a lot more time inside SSE2 code doing blurring for box shadows on win7. From the code, it looks like we always allocate a surface for the full box shadow, and blur all of it, even if only part of it will get drawn back.

Trying to clip the blurred surface size to what we actually need doesn't help much since we still need to pad the surface with space for the blur/spread radius (both of which are large in the australis use case).

I tried skipping this padding step and it's a significant win, but obviously renders wrong. I'm not sure if there's a way to do this correctly.

Caching the blurs would be a big win here too.


The other problem we have is the blurred element is using opacity:0.85. This forces a css stacking context, and goes through some fairly expensive paths since we have to ensure that the opacity is applied to everything drawn by the element and all of it's descendants (even though there are none in this case).

This is adding quite a bit of overhead even without my patches, though it is exacerbated by going through that code multiple times.

I believe we can move the opacity onto the color of the box blur without changing the visible rendering, see https://hg.mozilla.org/try/rev/ae927eee2be6. I think mconley is going to confirm that this doesn't regress anything visually.

With both of the above fixes applied, we're actually faster than we were before my patches landed. I'll still waiting on results without the incorrect blur clipping patch applied, but I expect we should be approximately even.

For reference:

With all of my patches backed out: 
https://tbpl.mozilla.org/?tree=Try&rev=f83d63c3282e (has SPS profiler enabled)
win8 profile: http://people.mozilla.org/~bgirard/cleopatra/#report=4e5aea5120655cbeacae95c32692fe5d2f44d2fa
win7 profile: http://people.mozilla.org/~bgirard/cleopatra/#report=5aa3d5fdf549c6b098008c27d6f8f6022b47c49a

With all my patches so far (including the broken one):
https://tbpl.mozilla.org/?tree=Try&rev=ae927eee2be6
https://tbpl.mozilla.org/?tree=Try&rev=993edcdb2e90 (has SPS profiler enabled)
win 8 profile: http://people.mozilla.org/~bgirard/cleopatra/#report=9be468954f7a1bdbe9386c5e0e47b262f6f23628
win 7 profile: http://people.mozilla.org/~bgirard/cleopatra/#report=1be66b5a54fe3cc065c7750bc535c7fc3bd65381

With all patches except the broken one:
https://tbpl.mozilla.org/?tree=Try&rev=367aa58b8507
https://tbpl.mozilla.org/?tree=Try&rev=d1f6151e3726 (has SPS profiler enabled)

Will post profiles for this last one once they complete, unless someone beats me to it.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #18)
> Will post profiles for this last one once they complete, unless someone
> beats me to it.

Here it is: http://tests.themasta.com/cleopatra/?report=5f5639c94d430320bcc5291d32cf4840bd589cab
Bug 934860 got backed out, so this regression is gone.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Technically it was only hidden behind a (hidden) pref: layout.paint_rects_separately. From what I've seen, it will likely be re-enabled after some other optimizations for D2D clipping land (e.g. bug 938395), and will hopefully not reintroduce this regression at that time.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: