Closed Bug 1604615 Opened 6 years ago Closed 5 years ago

Shader compilation takes too much time on macOS

Categories

(Core :: Graphics: WebRender, defect, P3)

All
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla77
Tracking Status
firefox77 --- fixed

People

(Reporter: mstange, Assigned: jnicol)

References

(Blocks 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(6 files)

On macOS, WebRender is suffering from very long shader compilation times. Turning on WebRender on macOS currently causes large regressions in the Talos startup numbers, such as ts_paint and various sessionrestore tests.

More specifically, shader compilation slows down the opening and drawing of the first window. But it can also cause janks during regular operation, for example during scrolling, when elements come on-screen which require shaders that haven't been used previously.

In a profile that includes startup, page load of https://www.w3.org/conf/2013sf/, and scrolling on that page, I can see:

There are multiple things we should try here:

  • Reduce the complexity of the shader code that we pass to the driver, for example by running it through a GLSL optimizer at build time.
  • Compile fewer shaders. We should audit our use of WR_FEATURE_ defines and check whether a given flag speeds up the shader sufficiently over a runtime check to warrant having a separate shader.
  • Compile shaders at a different time. There is a time during startup before the first paint. With some work we could do some of the shader compilation work early, before the first paint is kicked off, in parallel to main thread startup work.

(In reply to Markus Stange [:mstange] from comment #1)

  • Reduce the complexity of the shader code that we pass to the driver, for example by running it through a GLSL optimizer at build time.

:jrmuizel has started experimenting with this approach. For the text shader (vertex + fragment shader), feeding optimized input into GL improves the time for glCompileShader + glLinkProgram from ~30ms to ~8ms on his machine. That's a 3x to 4x improvement. We have not measured how much impact it has on the time that is spent on the GPU-specific compilation that happens during the draw call.
The optimization was performed with glsl-optimizer. We had to make some adjustments to the GLSL source because glsl-optimizer does not support switch statements (#124) or mix(bvec2, ...).
This disadvantage of this approach is that we'd be adding yet another GLSL compiler to our system, and we'd be committing ourselves to maintaining glsl-optimizer.

Depends on: 1604567

I fixed the switch statement and mix problems in glsl-optimizer and put it here: https://github.com/jrmuizel/glsl-optimizer/commits/master

Priority: -- → P3

Update Cargo.lock files and vendor sources in to tree.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED

glslopt's build script ignores environment variables such as
CC_$TARGET and CFLAGS_$TARGET. This is because we are building it
as a host build-dependency, but in cases where the host and target are
the same (ie when not cross-compiling), CFLAGS_$TARGET and
CFLAGS_$HOST are the same variable, and gecko's build system ensures
the target flags take priority. Attempting to build glslopt as a
build-dependency using the target flags causes a few issues, so we
ignore those variables. However, there are some additional changes
along those lines which must also be made:

  • For native sanitizer builds, we currently do not set the cargo
    linker wrapper, or LDFLAGS, as they can cause crashes in some
    crate's build scripts. However, this results in glslopt's build
    using the wrong linker, and being unable to find libstdc++. Instead,
    set the linker wrapper, but omit the problematic LDFLAGS.

  • For wrench builds, the clang toolchain must now be fetched and be
    present in PATH for bindgen to work. When building OSMesa, we must
    therefore set LLVM_CONFIG=no so that it does not attempt to build
    llvmpipe.

  • For wrench Mac cross-compiles, we must be careful to expose CFLAGS
    and similar variables using the target-specific variable names, so
    that host builds do not attempt to use flags intended for the
    target. When building OSMesa we must use the generic variable CC,
    so now we additionally set HOST_CC, so that host builds use the
    host variable rather than the generic one.

  • Similarily, for wrench android builds we must use a fork of
    cargo-apk which sets the target-specific variables rather than the
    generic ones. Otherwise we would attempt to use the NDK toolchain
    for host builds.

Depends on D70030

Move more shader parsing code to webrender_build, so it can be used
both at runtime and build time.

At build time optimize a set of shaders and feature flag combinations,
using glslopt. Some features are skipped because they are not
supported by the gl version, because the optimizer does not support
them, or because webrender does not need them currently.

Use build-parallel to ensure the optimization is performed in parallel
using the make jobserver. Write the optimized shader source to a
hashmap to be used at runtime, in addition to the unoptimized source.

Depends on D70031

Add a gecko pref "gfx.webrender.use-optimized-shaders". If enabled,
then when attempting to compile a webrender shader first look for the
optimized source. If the optimized source is not present, emit a
warning and fall back to the unoptimized source.

Use the optimized source by default in wrench, and add the flag
"--use-unoptimized-shaders" to override this.

Depends on D70032

Depends on D70033

Attachment #9138858 - Attachment is obsolete: true
Attachment #9138858 - Attachment is obsolete: false

For native sanitizer builds, we currently do not pass the linker flags
to cargo, as they were causing crashes in some build scripts. Without
this, however, the linker is unable to find libstdc++. Instead, do
tell cargo to use the linker wrapper, but omit the problematic flags
from MOZ_CARGO_WRAP_LDFLAGS.

Depends on D70030

Attachment #9138855 - Attachment description: Bug 1604615 - Fix build issues on CI arising from glslopt. r?jrmuizel → Bug 1604615 - Fix wrench build issues on CI arising from glslopt. r?jrmuizel
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5ab330177479 Add glslopt and build-parallel as build dependencies of webrender. r=jrmuizel https://hg.mozilla.org/integration/autoland/rev/7dbc0adfe69c Use cargo linker wrapper for native sanitizer builds, but don't set problematic flags. r=glandium https://hg.mozilla.org/integration/autoland/rev/3fa4a6511d94 Fix wrench build issues on CI arising from glslopt. r=jrmuizel https://hg.mozilla.org/integration/autoland/rev/416526bbe3da Optimize webrender shaders at build time. r=gw https://hg.mozilla.org/integration/autoland/rev/37870a77af53 Use optimized shader source in webrender. r=jrmuizel https://hg.mozilla.org/integration/autoland/rev/b4ebfaf0e6f3 Adjust reftest expectations. r=jrmuizel
Regressions: 1631778
Regressions: 1632025
Regressions: 1632316

== Change summary for alert #25684 (as of Thu, 23 Apr 2020 04:30:33 GMT) ==

Improvements:

17% about_newtab_with_snippets responsiveness linux64-shippable-qr opt e10s stylo 0.28 -> 0.23
15% sessionrestore linux64-shippable-qr opt e10s stylo 890.50 -> 753.33
11% startup_about_home_paint linux64-shippable-qr opt e10s stylo 1,147.83 -> 1,016.42
11% startup_about_home_paint_realworld_webextensions linux64-shippable-qr opt e10s stylo 1,165.17 -> 1,035.50
11% startup_about_home_paint_realworld_webextensions linux64-shippable-qr opt e10s stylo 1,161.92 -> 1,037.58

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25684

== Change summary for alert #25676 (as of Wed, 22 Apr 2020 08:18:54 GMT) ==

Improvements:

8% raptor-tp6-reddit-firefox-cold fcp windows10-64-shippable-qr opt 302.83 -> 277.25
8% raptor-tp6-slides-firefox-cold fcp windows10-64-shippable-qr opt 1,579.00 -> 1,459.83
7% raptor-tp6-yandex-firefox-cold fcp windows10-64-shippable-qr opt 297.00 -> 276.08

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25676

Keywords: perf-alert

Note that the labels on the Y axis on that graph are missing a leading 1.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: