Closed Bug 1607746 Opened 4 years ago Closed 4 years ago

2.01% sessionrestore (linux64-shippable-qr) regression on push a99bcdc5c85992a6ab33eadd3bafc1d0324019bf (Mon January 6 2020)

Categories

(Testing :: Performance, defect)

Version 3
defect
Not set
normal

Tracking

(firefox-esr68 unaffected, firefox72 unaffected, firefox73 unaffected, firefox74 fixed)

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- unaffected
firefox73 --- unaffected
firefox74 --- fixed

People

(Reporter: Bebe, Assigned: cbrewster)

References

(Regression)

Details

(4 keywords)

Attachments

(2 files)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=a99bcdc5c85992a6ab33eadd3bafc1d0324019bf

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

2% sessionrestore linux64-shippable-qr opt e10s stylo 983.46 -> 1,003.25

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=24615

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/TestEngineering/Performance/Talos

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Blocks: 1607747
Flags: needinfo?(connorbrewster)
Regressions: 1606742

Looking into this, my initial thought is that adding support for the alpha elements in the color matrix slows down some CSS filters which don't transform the alpha component at all (sepia, saturate, etc...). I'll try adding a fast-path for these filters that don't touch the alpha component, while still supporting alpha transforms with the generic feColorMatrix filter primitive.

Regressed by: 1606742
No longer regressions: 1606742

After some investigation, I believe the regression is due to changing the flat varying mat3 vColorMat to a mat4. This is inside the brush_blend.glsl shader which is used for all CSS filters and is also used for handling opacity in some cases. The change from mat3 to mat4 adds a performance penalty for all effects going through this shader.

In the case of the sessionrestore test, the only time the brush_blend shader is used is for opacity effects.

If a stacking context has an opacity of < 1, we manually add an opacity filter op here:
https://searchfox.org/mozilla-central/rev/d4d6f81e0ab479cde192453bae83d5e3edfb39d6/gfx/webrender_bindings/src/bindings.rs#2387-2409

The opacity filter goes through the brush_blend shader which has quite a bit of branching and a lot of logic for handling various filters. It would probably make sense to handle opacity on a stacking context in a different way than sending it through the filter path.

I don't really see a way to fix this regression without backing out the changes from bug 1606742 or giving stacking context opacity its own faster path. Any thoughts?

Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(gwatson)

yep it'd be worth putting opacity in its own simple shader. Likewise isolating the color matrix filter which has a bunch more data than the others could be nice.

Flags: needinfo?(nical.bugzilla)
Assignee: nobody → connorbrewster
Flags: needinfo?(gwatson)
Flags: needinfo?(connorbrewster)

Opacity is a common effect that is used and the opacit filter path is also used when a stacking
context has an opacity of < 1. The brush_blend shader is slow since it has support for a large
portion of CSS filters; however, opacity is used much more often than the rest of the filters.
This patch adds a simple shader for opacity effects which bypasses the extra overhead in the
brush_blend shader.

I removed the old opacity filter path in the brush_blend shader and cleaned up the filter mode
constants in the shader so there are less magic numbers. This should help if/when we move more
filters to their own shaders.

Depends on D59610

Pushed by dvarga@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b0226237c55e
Part 1: Move opacity to its own shader in WebRender r=nical
Pushed by rmaries@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5cd63d35010f
Part 2: Clean up passing filter mode to shader r=nical
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

== Change summary for alert #24673 (as of Wed, 15 Jan 2020 10:44:01 GMT) ==

Regressions:

41% raptor-tp6-yahoo-mail-firefox-cold loadtime windows10-64-shippable-qr opt 943.00 -> 1,333.83
25% raptor-tp6-slides-firefox-cold fcp windows10-64-shippable-qr opt 1,193.50 -> 1,491.83
24% raptor-tp6-slides-firefox-cold fcp windows10-64-shippable-qr opt 1,192.42 -> 1,477.17
16% raptor-tp6-slides-firefox-cold fcp linux64-shippable-qr opt 1,212.25 -> 1,400.50
9% raptor-tp6-yahoo-mail-firefox-cold windows10-64-shippable-qr opt 527.52 -> 575.10
6% raptor-tp6-slides-firefox-cold windows10-64-shippable-qr opt 1,212.80 -> 1,290.84
4% raptor-tp6-slides-firefox-cold linux64-shippable-qr opt 1,236.37 -> 1,286.65
2% raptor-tp6-slides-firefox-cold loadtime linux64-shippable-qr opt 1,956.54 -> 1,997.00

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24673

I think we should open this bug and investigate further.
Thanks!

Flags: needinfo?(connorbrewster)

Looks like cold startup regressed due to compiling an extra shader (opacity). One possible way to deal with this could be to allow the blend shader to do opacity in addition to having a dedicated opacity shader. The renderer could chose which shader to use based on some criterion (for example how many shaders were need to be compiled this frame or simply just wait for a number of frames before compiling and using the dedicated opacity shader to avoid doing too much work during startup). It'd make reproducing issues with either shaders a bit annoying but we could use a pref or environment variable to force a certain behavior instead of having the heuristics.

Just to make sure I understand the idea, we would need to track how many shaders we are needing to compile within a frame (or use some other heuristic) and based on that switch out which shader we compile/bind to handle opacity? This would not affect anything in the batching code as the batching would happen before the renderer is able to determine which shader it wants to bind. Unfortunately, this means there could be cases where batching could occur but doesn't when the blend shader is used for opacity and there are other filter effects in the frame.

== Change summary for alert #24660 (as of Tue, 14 Jan 2020 10:41:05 GMT) ==

Improvements:

11% sessionrestore linux64-shippable-qr opt e10s stylo 1,002.75 -> 894.33
9% startup_about_home_paint linux64-shippable-qr opt e10s stylo 1,277.67 -> 1,164.67
9% startup_about_home_paint_realworld_webextensions linux64-shippable-qr opt e10s stylo 1,298.00 -> 1,183.50

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24660

Yep that's the idea. Indeed we would miss some batching opportunities compared to always having a single shader for blending and opacity, but taking as a baseline that opacity and blending are separate shaders, this would only be a way to avoid shader compilation when we are already very busy.

We could improve upon that by making the batching code aware of which shader configurations are compiled and required to guide batching decisions, but that's something we can move towards incrementally. I don't think it would be worth the effort just for opacity vs blending, but I would like to gradually move the batching and shader infrastructure towards being able to decide between slow/multi-function shaders and small/fast ones during batching to balance per-draw-call and per-pixel cost depending on what's hurting the frame most (which we can reasonably estimate during batching).

That was something that crossed my mind when I saw the reported regression, but it's only a suggestion. Feel free to tackle it some other way if you are inspired!

I'm not seeing improvements in the new regressions from delaying compilation of the opacity shader. Even if I always route the opacity effect through the blend shader, which means opacity never gets compiled, there are still no improvements. Raptor perf results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=54b1bdeb484fbaa82c2598192da817d673584d75&framework=10&selectedTimeRange=172800

I wonder if the regressions could be due to less optimal batching because I split opacity into its own shader. I'll look at the specific tests which regressed to see how much batching was affected by moving opacity to its own shader.

It's odd because typically the *-cold tests are sensible to shader compilation times since we don't get any hit in the shader cache, while it's rare for batching regressions to specifically affect cold startup tests.

I didn't really get anywhere with improving cold starts tests by delaying compilation of certain shaders, but since then there have been some optimizations around shader compilation (bug 1604615)

Flags: needinfo?(connorbrewster)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: