Closed Bug 1697899 Opened 3 years ago Closed 3 years ago

Add fast path version of composite shader

Categories

(Core :: Graphics: WebRender, task)

RESOLVED FIXED
88 Branch
Tracking Status
firefox88 --- fixed

People

(Reporter: jnicol, Assigned: jnicol)

References

(Blocks 1 open bug)

Attachments

(1 file)

In bug 1697663 we see lots of poor performance on a Mali-T830 MP1. This GPU has a single shader core and is frequently fragment shader bound. Using the Arm Graphics Analyzer I can see that we often spend a significant number of cycles in the composite_TEXTURE_2D fragment shader. It is already a very simple shader, but it touches a large number of pixels.

Running the Mali Offline Compiler on the composite_TEXTURE_2D fragment shader, targeting Mali-T830, gives the following stats (A = arithmetic, LS = load/store, T = texture):

                                A      LS       T    Bound
Total instruction cycles:    1.50    3.00    1.00       LS
Shortest path cycles:        1.00    3.00    1.00       LS
Longest path cycles:         1.00    3.00    1.00       LS

indicating we are load/store pipe bound. This is due to the varyings:

flat varying vec4 vColor;
varying vec2 vUv;
flat varying vec4 vUVBounds;

In the common case of compositing picture cache tiles we always set the colour to WHITE and render the entire texture. Therefore we should be able to remove vColor and vUVBounds. So I propose making a FAST_PATH shader variant which doesn't multiply by a colour and doesn't clamp UVs.
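To illustrate why dropping the two varyings is safe in that case, here is a minimal CPU-side sketch in Rust of the per-fragment math. It is not the WebRender shader source: the function names, the procedural `checker_texture` sampler, and the `(min_u, min_v, max_u, max_v)` layout of the UV bounds are assumptions made for illustration.

```rust
// Hypothetical CPU emulation of the composite fragment math.
// All names here are illustrative, not WebRender identifiers.

type Rgba = [f32; 4];

const WHITE: Rgba = [1.0, 1.0, 1.0, 1.0];

// Stand-in for a texture fetch: a 4x4 checkerboard over normalized UVs.
fn checker_texture(uv: [f32; 2]) -> Rgba {
    let cell = ((uv[0] * 4.0).floor() as i32 + (uv[1] * 4.0).floor() as i32) % 2;
    if cell == 0 {
        [0.9, 0.9, 0.9, 1.0]
    } else {
        [0.2, 0.2, 0.2, 1.0]
    }
}

// Full path: clamp the UV to the bounds, sample, then modulate by the colour.
// Assumes uv_bounds is laid out as (min_u, min_v, max_u, max_v).
fn composite_full(
    tex: fn([f32; 2]) -> Rgba,
    uv: [f32; 2],
    color: Rgba,
    uv_bounds: [f32; 4],
) -> Rgba {
    let u = uv[0].clamp(uv_bounds[0], uv_bounds[2]);
    let v = uv[1].clamp(uv_bounds[1], uv_bounds[3]);
    let t = tex([u, v]);
    [t[0] * color[0], t[1] * color[1], t[2] * color[2], t[3] * color[3]]
}

// Fast path: no colour and no bounds inputs, just the sample.
fn composite_fast(tex: fn([f32; 2]) -> Rgba, uv: [f32; 2]) -> Rgba {
    tex(uv)
}
```

With `WHITE` and bounds covering the whole `[0, 1]` range, the clamp is a no-op and the multiply is by 1.0, so both functions return the same value for any in-range UV. With a tint or a sub-rectangle they can differ, which is why the full variant still has to exist.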

Local profiling shows a noticeable improvement on a similar device, though this change alone is not enough to fix the performance problem.

On low-powered Android devices it has been observed that we are GPU
bound on many pages. The composite shader, despite being relatively
simple, can account for a large proportion of these cycles due to the
large number of fragments it touches.

On Mali-T GPUs, the composite fragment shader is bound by loading the
varyings (for example, this takes 3 cycles on a Mali-T830). This patch
adds a fast path variant of the shader which removes the vColor and
vUVBounds varyings, reducing the number of cycles per fragment to 1 on
this GPU. This variant can only be used where the shader does not need
to modulate the output by a color (i.e. aColor is white), and when the
UV coordinates do not need to be clamped (e.g. because the entire
texture is being composited). Fortunately both of these conditions are
true in the common case of compositing picture cache tiles.
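
The two conditions above could be checked per tile roughly as follows. This is a hedged sketch only: `CompositeVariant`, `CompositeTile`, `choose_variant`, and the normalized `uv_rect` layout are hypothetical names and assumptions, not the types used in the actual patch.

```rust
// Hypothetical variant selection; none of these names come from WebRender.

#[derive(Debug, Clone, Copy, PartialEq)]
enum CompositeVariant {
    // Full shader: clamps UVs and multiplies by the colour.
    Default,
    // Fast path: samples the texture and writes the result unmodified.
    FastPath,
}

struct CompositeTile {
    // Colour the output would be modulated by (RGBA).
    color: [f32; 4],
    // Assumed layout: normalized (u0, v0, u1, v1) of the sampled region.
    uv_rect: [f32; 4],
}

fn choose_variant(tile: &CompositeTile) -> CompositeVariant {
    const OPAQUE_WHITE: [f32; 4] = [1.0, 1.0, 1.0, 1.0];
    const FULL_TEXTURE: [f32; 4] = [0.0, 0.0, 1.0, 1.0];

    // Exact float comparison is deliberate: anything that is not exactly
    // white and full-texture conservatively takes the full shader.
    if tile.color == OPAQUE_WHITE && tile.uv_rect == FULL_TEXTURE {
        CompositeVariant::FastPath
    } else {
        CompositeVariant::Default
    }
}
```

A tile with a white colour and a full-texture UV rect would select `FastPath`; a tinted tile or one sampling a sub-rectangle would fall back to `Default`.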

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75ff3efd147a
Add fast path composite shader. r=gw
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch