2.01% sessionrestore (linux64-shippable-qr) regression on push a99bcdc5c85992a6ab33eadd3bafc1d0324019bf (Mon January 6 2020)
Categories
(Testing :: Performance, defect)
Tracking
(firefox-esr68 unaffected, firefox72 unaffected, firefox73 unaffected, firefox74 fixed)
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox72 | --- | unaffected |
firefox73 | --- | unaffected |
firefox74 | --- | fixed |
People
(Reporter: Bebe, Assigned: cbrewster)
References
(Regression)
Details
(4 keywords)
Attachments
(2 files)
Talos has detected a Firefox performance regression from push:
As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
2% sessionrestore linux64-shippable-qr opt e10s stylo 983.46 -> 1,003.25
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=24615
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/TestEngineering/Performance/Talos
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
Looking into this, my initial thought is that adding support for the alpha elements in the color matrix slows down some CSS filters which don't transform the alpha component at all (sepia, saturate, etc...). I'll try adding a fast-path for these filters that don't touch the alpha component, while still supporting alpha transforms with the generic feColorMatrix
filter primitive.
Updated•5 years ago
|
Assignee | ||
Comment 2•5 years ago
•
|
||
After some investigation, I believe the regression is due to changing the flat varying mat3 vColorMat
to a mat4
. This is inside the brush_blend.glsl
shader which is used for all CSS filters and is also used for handling opacity in some cases. The change from mat3
to mat4
adds a performance penalty for all effects going through this shader.
In the case of the sessionrestore
test, the only time the brush_blend shader is used is for opacity effects.
If a stacking context has an opacity of < 1, we manually add an opacity filter op here:
https://searchfox.org/mozilla-central/rev/d4d6f81e0ab479cde192453bae83d5e3edfb39d6/gfx/webrender_bindings/src/bindings.rs#2387-2409
The opacity filter goes through the brush_blend shader which has quite a bit of branching and a lot of logic for handling various filters. It would probably make sense to handle opacity on a stacking context in a different way than sending it through the filter path.
I don't really see a way to fix this regression without backing out the changes from bug 1606742 or giving stacking context opacity its own faster path. Any thoughts?
Assignee | ||
Updated•5 years ago
|
Comment 3•5 years ago
|
||
yep it'd be worth putting opacity in its own simple shader. Likewise isolating the color matrix filter which has a bunch more data than the others could be nice.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Opacity is a common effect that is used and the opacit filter path is also used when a stacking
context has an opacity of < 1. The brush_blend shader is slow since it has support for a large
portion of CSS filters; however, opacity is used much more often than the rest of the filters.
This patch adds a simple shader for opacity effects which bypasses the extra overhead in the
brush_blend shader.
Assignee | ||
Comment 5•5 years ago
|
||
I removed the old opacity filter path in the brush_blend shader and cleaned up the filter mode
constants in the shader so there are less magic numbers. This should help if/when we move more
filters to their own shaders.
Depends on D59610
Assignee | ||
Comment 6•5 years ago
|
||
After moving opacity to its own shader, we have some pretty nice perf improvements on the talos tests:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=d568bc0aae4ca84f421f10b72a07a28d1675f564&framework=1&selectedTimeRange=172800
Comment 9•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b0226237c55e
https://hg.mozilla.org/mozilla-central/rev/5cd63d35010f
Updated•5 years ago
|
Comment 10•5 years ago
|
||
== Change summary for alert #24673 (as of Wed, 15 Jan 2020 10:44:01 GMT) ==
Regressions:
41% raptor-tp6-yahoo-mail-firefox-cold loadtime windows10-64-shippable-qr opt 943.00 -> 1,333.83
25% raptor-tp6-slides-firefox-cold fcp windows10-64-shippable-qr opt 1,193.50 -> 1,491.83
24% raptor-tp6-slides-firefox-cold fcp windows10-64-shippable-qr opt 1,192.42 -> 1,477.17
16% raptor-tp6-slides-firefox-cold fcp linux64-shippable-qr opt 1,212.25 -> 1,400.50
9% raptor-tp6-yahoo-mail-firefox-cold windows10-64-shippable-qr opt 527.52 -> 575.10
6% raptor-tp6-slides-firefox-cold windows10-64-shippable-qr opt 1,212.80 -> 1,290.84
4% raptor-tp6-slides-firefox-cold linux64-shippable-qr opt 1,236.37 -> 1,286.65
2% raptor-tp6-slides-firefox-cold loadtime linux64-shippable-qr opt 1,956.54 -> 1,997.00
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24673
I think we should open this bug and investigate further.
Thanks!
Comment 11•5 years ago
|
||
Looks like cold startup regressed due to compiling an extra shader (opacity). One possible way to deal with this could be to allow the blend shader to do opacity in addition to having a dedicated opacity shader. The renderer could chose which shader to use based on some criterion (for example how many shaders were need to be compiled this frame or simply just wait for a number of frames before compiling and using the dedicated opacity shader to avoid doing too much work during startup). It'd make reproducing issues with either shaders a bit annoying but we could use a pref or environment variable to force a certain behavior instead of having the heuristics.
Assignee | ||
Comment 12•5 years ago
|
||
Just to make sure I understand the idea, we would need to track how many shaders we are needing to compile within a frame (or use some other heuristic) and based on that switch out which shader we compile/bind to handle opacity? This would not affect anything in the batching code as the batching would happen before the renderer is able to determine which shader it wants to bind. Unfortunately, this means there could be cases where batching could occur but doesn't when the blend shader is used for opacity and there are other filter effects in the frame.
Comment 13•5 years ago
|
||
== Change summary for alert #24660 (as of Tue, 14 Jan 2020 10:41:05 GMT) ==
Improvements:
11% sessionrestore linux64-shippable-qr opt e10s stylo 1,002.75 -> 894.33
9% startup_about_home_paint linux64-shippable-qr opt e10s stylo 1,277.67 -> 1,164.67
9% startup_about_home_paint_realworld_webextensions linux64-shippable-qr opt e10s stylo 1,298.00 -> 1,183.50
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24660
Comment 14•5 years ago
|
||
Yep that's the idea. Indeed we would miss some batching opportunities compared to always having a single shader for blending and opacity, but taking as a baseline that opacity and blending are separate shaders, this would only be a way to avoid shader compilation when we are already very busy.
We could improve upon that by making the batching code aware of which shader configurations are compiled and required to guide batching decisions, but that's something we can move towards incrementally. I don't think it would be worth the effort just for opacity vs blending, but I would like to gradually move the batching and shader infrastructure towards being able to decide between slow/multi-function shaders and small/fast ones during batching to balance per-draw-call and per-pixel cost depending on what's hurting the frame most (which we can reasonably estimate during batching).
That was something that crossed my mind when I saw the reported regression, but it's only a suggestion. Feel free to tackle it some other way if you are inspired!
Assignee | ||
Comment 15•5 years ago
|
||
I'm not seeing improvements in the new regressions from delaying compilation of the opacity shader. Even if I always route the opacity effect through the blend shader, which means opacity never gets compiled, there are still no improvements. Raptor perf results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=54b1bdeb484fbaa82c2598192da817d673584d75&framework=10&selectedTimeRange=172800
I wonder if the regressions could be due to less optimal batching because I split opacity into its own shader. I'll look at the specific tests which regressed to see how much batching was affected by moving opacity to its own shader.
Comment 16•5 years ago
|
||
It's odd because typically the *-cold tests are sensible to shader compilation times since we don't get any hit in the shader cache, while it's rare for batching regressions to specifically affect cold startup tests.
Assignee | ||
Comment 17•5 years ago
|
||
I didn't really get anywhere with improving cold starts tests by delaying compilation of certain shaders, but since then there have been some optimizations around shader compilation (bug 1604615)
Description
•