Shader compilation takes too much time on macOS
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox77 | --- | fixed |
People
(Reporter: mstange, Assigned: jnicol)
References
(Blocks 1 open bug)
Details
(Keywords: perf-alert)
Attachments
(6 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
Bug 1604615 - Use cargo linker wrapper for native sanitizer builds, but don't set problematic flags.
47 bytes,
text/x-phabricator-request
|
Details | Review |
On macOS, WebRender is suffering from very long shader compilation times. Turning on WebRender on macOS currently causes large regressions in the Talos startup numbers, such as ts_paint and various sessionrestore tests.
More specifically, shader compilation slows down the opening and drawing of the first window. But it can also cause janks during regular operation, for example during scrolling, when elements come on-screen which require shaders that haven't been used previously.
In a profile that includes startup, page load of https://www.w3.org/conf/2013sf/, and scrolling on that page, I can see:
- 644ms in
glCompileShader
andglLinkProgram
, https://perfht.ml/35xW0gT - 934ms during draw calls doing GPU-specific compilation, https://perfht.ml/2PtM4zl (60%/40% split for vertex/fragment shaders)
Reporter | ||
Comment 1•6 years ago
|
||
There are multiple things we should try here:
- Reduce the complexity of the shader code that we pass to the driver, for example by running it through a GLSL optimizer at build time.
- Compile fewer shaders. We should audit our use of
WR_FEATURE_
defines and check whether a given flag speeds up the shader sufficiently over a runtime check to warrant having a separate shader. - Compile shaders at a different time. There is a time during startup before the first paint. With some work we could do some of the shader compilation work early, before the first paint is kicked off, in parallel to main thread startup work.
Reporter | ||
Comment 2•6 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #1)
- Reduce the complexity of the shader code that we pass to the driver, for example by running it through a GLSL optimizer at build time.
:jrmuizel has started experimenting with this approach. For the text shader (vertex + fragment shader), feeding optimized input into GL improves the time for glCompileShader
+ glLinkProgram
from ~30ms to ~8ms on his machine. That's a 3x to 4x improvement. We have not measured how much impact it has on the time that is spent on the GPU-specific compilation that happens during the draw call.
The optimization was performed with glsl-optimizer
. We had to make some adjustments to the GLSL source because glsl-optimizer does not support switch statements (#124) or mix(bvec2, ...)
.
This disadvantage of this approach is that we'd be adding yet another GLSL compiler to our system, and we'd be committing ourselves to maintaining glsl-optimizer.
Comment 3•6 years ago
|
||
I fixed the switch statement and mix problems in glsl-optimizer and put it here: https://github.com/jrmuizel/glsl-optimizer/commits/master
Updated•6 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Update Cargo.lock files and vendor sources in to tree.
Updated•5 years ago
|
Assignee | ||
Comment 5•5 years ago
|
||
glslopt's build script ignores environment variables such as
CC_$TARGET
and CFLAGS_$TARGET
. This is because we are building it
as a host build-dependency, but in cases where the host and target are
the same (ie when not cross-compiling), CFLAGS_$TARGET
and
CFLAGS_$HOST
are the same variable, and gecko's build system ensures
the target flags take priority. Attempting to build glslopt as a
build-dependency using the target flags causes a few issues, so we
ignore those variables. However, there are some additional changes
along those lines which must also be made:
-
For native sanitizer builds, we currently do not set the cargo
linker wrapper, or LDFLAGS, as they can cause crashes in some
crate's build scripts. However, this results in glslopt's build
using the wrong linker, and being unable to find libstdc++. Instead,
set the linker wrapper, but omit the problematic LDFLAGS. -
For wrench builds, the clang toolchain must now be fetched and be
present in PATH for bindgen to work. When building OSMesa, we must
therefore setLLVM_CONFIG=no
so that it does not attempt to build
llvmpipe. -
For wrench Mac cross-compiles, we must be careful to expose CFLAGS
and similar variables using the target-specific variable names, so
that host builds do not attempt to use flags intended for the
target. When building OSMesa we must use the generic variableCC
,
so now we additionally setHOST_CC
, so that host builds use the
host variable rather than the generic one. -
Similarily, for wrench android builds we must use a fork of
cargo-apk which sets the target-specific variables rather than the
generic ones. Otherwise we would attempt to use the NDK toolchain
for host builds.
Depends on D70030
Assignee | ||
Comment 6•5 years ago
|
||
Move more shader parsing code to webrender_build, so it can be used
both at runtime and build time.
At build time optimize a set of shaders and feature flag combinations,
using glslopt. Some features are skipped because they are not
supported by the gl version, because the optimizer does not support
them, or because webrender does not need them currently.
Use build-parallel to ensure the optimization is performed in parallel
using the make jobserver. Write the optimized shader source to a
hashmap to be used at runtime, in addition to the unoptimized source.
Depends on D70031
Assignee | ||
Comment 7•5 years ago
|
||
Add a gecko pref "gfx.webrender.use-optimized-shaders". If enabled,
then when attempting to compile a webrender shader first look for the
optimized source. If the optimized source is not present, emit a
warning and fall back to the unoptimized source.
Use the optimized source by default in wrench, and add the flag
"--use-unoptimized-shaders" to override this.
Depends on D70032
Assignee | ||
Comment 8•5 years ago
|
||
Depends on D70033
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 9•5 years ago
|
||
For native sanitizer builds, we currently do not pass the linker flags
to cargo, as they were causing crashes in some build scripts. Without
this, however, the linker is unable to find libstdc++. Instead, do
tell cargo to use the linker wrapper, but omit the problematic flags
from MOZ_CARGO_WRAP_LDFLAGS.
Depends on D70030
Updated•5 years ago
|
Comment 10•5 years ago
|
||
Comment 11•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5ab330177479
https://hg.mozilla.org/mozilla-central/rev/7dbc0adfe69c
https://hg.mozilla.org/mozilla-central/rev/3fa4a6511d94
https://hg.mozilla.org/mozilla-central/rev/416526bbe3da
https://hg.mozilla.org/mozilla-central/rev/37870a77af53
https://hg.mozilla.org/mozilla-central/rev/b4ebfaf0e6f3
Comment 12•5 years ago
|
||
== Change summary for alert #25684 (as of Thu, 23 Apr 2020 04:30:33 GMT) ==
Improvements:
17% about_newtab_with_snippets responsiveness linux64-shippable-qr opt e10s stylo 0.28 -> 0.23
15% sessionrestore linux64-shippable-qr opt e10s stylo 890.50 -> 753.33
11% startup_about_home_paint linux64-shippable-qr opt e10s stylo 1,147.83 -> 1,016.42
11% startup_about_home_paint_realworld_webextensions linux64-shippable-qr opt e10s stylo 1,165.17 -> 1,035.50
11% startup_about_home_paint_realworld_webextensions linux64-shippable-qr opt e10s stylo 1,161.92 -> 1,037.58
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25684
Comment 13•5 years ago
|
||
== Change summary for alert #25676 (as of Wed, 22 Apr 2020 08:18:54 GMT) ==
Improvements:
8% raptor-tp6-reddit-firefox-cold fcp windows10-64-shippable-qr opt 302.83 -> 277.25
8% raptor-tp6-slides-firefox-cold fcp windows10-64-shippable-qr opt 1,579.00 -> 1,459.83
7% raptor-tp6-yandex-firefox-cold fcp windows10-64-shippable-qr opt 297.00 -> 276.08
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25676
Updated•5 years ago
|
Comment 14•5 years ago
|
||
FWIW, this also caused a massive regression in the size of libxul on all platforms. https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=autoland,1921056,1,2&series=autoland,1921101,1,2&timerange=1209600
Comment 15•5 years ago
|
||
Note that the labels on the Y axis on that graph are missing a leading 1.
Description
•