Open Bug 1762603 Opened 3 years ago Updated 6 months ago

Implement a different approach for rectangle AA in WebRender

Categories

(Core :: Graphics: WebRender, enhancement, P3)

People

(Reporter: nical, Unassigned)

References

(Blocks 2 open bugs)

Attachments

(1 file)

Anti-aliasing is currently computed by inflating the quad in local space by two pixels, which is large enough for most cases. The fragment shader then compares the local space fragment position with a local space representation of the rectangle and computes a simple distance to determine the AA. The scaling factor between local and screen space is compensated for by taking fwidth into account.

The main problem with that is that the way the scaling factor is computed only works when the scaling is mostly uniform in x and y. If the scaling is stronger on one axis than the other, the aa gets squashed on one axis and very blurry on the other. (See bug 1759526)
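To make the non-uniform scaling problem concrete, here is a minimal sketch. It is not WebRender's actual shader: it assumes a single isotropic AA range taken as half the length of fwidth of the local position, and the function name is made up for illustration. It shows how one local-space range turns into different ramp widths in device pixels on each axis.

```python
import math

def aa_ramp_widths(scale_x, scale_y):
    """Width in device pixels of the AA ramp across a vertical and a horizontal edge."""
    # fwidth(local_pos) under an axis-aligned scale: local units covered by
    # one device pixel on each axis.
    fwidth_x, fwidth_y = 1.0 / scale_x, 1.0 / scale_y
    # Assumption: one isotropic AA range in local units, derived from the
    # length of fwidth (the 0.5 factor is illustrative).
    aa_range = 0.5 * math.hypot(fwidth_x, fwidth_y)
    # The same local-space range covers a different number of device pixels
    # on each axis once the scale is applied.
    return aa_range * scale_x, aa_range * scale_y

uniform = aa_ramp_widths(1.0, 1.0)
stretched = aa_ramp_widths(4.0, 1.0)
```

With a uniform scale both ramps have the same width. With a 4x horizontal stretch the ramp across vertical edges is four times wider than the one across horizontal edges, which is the blurry/squashed behavior described above.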

In addition it is more complicated and expensive than I'd want it to be.

The advantage is that the aa does not require extra geometry (although we are missing the opportunity to move pixels to the opaque pass when not using extra geometry).

Extruding the aa in screen-space

I propose to replace this aa method with one where all the work is done in the vertex shader. The aa is always handled by extra geometry.

Quad vertices are extruded in screen space by half a pixel inward/outward in order to generate thin bands that deal with the anti-aliasing. Since the extrusion happens in screen space, we can have these bands always be exactly one pixel wide, which means no conservative inflation, and it is correct under any (invertible) transform.

The transparency falloff is no longer computed via a distance function. Instead it is baked into a vertex varying (0.0 for vertices extruded outward, 1.0 for vertices extruded inward). The alpha value is obtained from the interpolation of this vertex varying, so no computation is needed in the fragment shader.
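As a sanity check on the interpolated varying, here is a small sketch for the simplest case: a vertical edge in screen space with the shape on its left. The function names are hypothetical. It compares the alpha a rasterizer would interpolate at a pixel center with the exact fraction of that pixel covered by the shape.

```python
def band_alpha(edge_x, pixel_center_x):
    """Alpha interpolated across a one-pixel band around a vertical edge."""
    # Inner vertex carries alpha 1.0, outer vertex carries alpha 0.0.
    inner_x, outer_x = edge_x - 0.5, edge_x + 0.5
    t = (pixel_center_x - inner_x) / (outer_x - inner_x)
    # Outside the band the pixel is either fully interior or not drawn.
    t = min(max(t, 0.0), 1.0)
    return 1.0 - t

def box_coverage(edge_x, pixel_center_x):
    """Exact fraction of the pixel [c - 0.5, c + 0.5] lying left of the edge."""
    return min(max(edge_x - (pixel_center_x - 0.5), 0.0), 1.0)

alpha = band_alpha(10.25, 10.5)
coverage = box_coverage(10.25, 10.5)
```

For an axis-aligned edge the two agree, so the one-pixel band with a linear falloff behaves like a box filter without any per-fragment distance computation.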

Since the fragment shader also needs some information interpolated in local space (for example for uvs), the vertex shader does the following steps in pseudo-code:

#ifdef WR_FEATURE_ANTIALIASING
    vec2 local_normal = get_normal(edge_flags, aPosition);
    // Adjust the length after normalization depending on whether the normal is aligned with the rectangle or diagonal.
    float screen_length = local_normal.x == 0.0 || local_normal.y == 0.0 ? 1.0 : sqrt(2.0) / 2.0;
    float sign = outward ? 1.0 : -1.0;
    vec2 screen_normal = transform.transform_vector(local_normal).normalize() * screen_length;
    vec2 screen_position = transform.transform_point(local_position) + screen_normal * 0.5 * sign;
    // Have to reproject the adjusted screen position in local space. 
    vec2 adjusted_local_position = inv_transform.transform_point(screen_position);
#else
    vec2 adjusted_local_position = local_position;
#endif

v_uv = adjusted_local_position;
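
The steps above can be tried out on the CPU. The following is a minimal sketch under simplifying assumptions: it uses a 2D affine transform stored as a 6-tuple instead of WebRender's real transform types, it only handles normals aligned with an edge (no diagonal corner adjustment), and `apply`, `inverse` and `extrude_vertex` are hypothetical helpers, not WebRender functions.

```python
import math

def apply(m, v, w):
    """Apply m = (a, b, c, d, tx, ty) to v; w is 1.0 for points, 0.0 for vectors."""
    a, b, c, d, tx, ty = m
    x, y = v
    return (a * x + c * y + w * tx, b * x + d * y + w * ty)

def inverse(m):
    """Invert a 2D affine transform (assumes a non-zero determinant)."""
    a, b, c, d, tx, ty = m
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, ic, id_, -(ia * tx + ic * ty), -(ib * tx + id_ * ty))

def extrude_vertex(m, local_position, local_normal, outward):
    """Move a vertex half a pixel along its screen-space normal, then reproject."""
    sign = 1.0 if outward else -1.0
    nx, ny = apply(m, local_normal, 0.0)
    length = math.hypot(nx, ny)
    sx, sy = apply(m, local_position, 1.0)
    screen_position = (sx + nx / length * 0.5 * sign,
                       sy + ny / length * 0.5 * sign)
    # The adjusted local position is what would feed varyings such as v_uv.
    adjusted_local_position = apply(inverse(m), screen_position, 1.0)
    return screen_position, adjusted_local_position

# Non-uniform scale (4 in x, 0.5 in y) plus a translation of (10, 20).
m = (4.0, 0.0, 0.0, 0.5, 10.0, 20.0)
outer_screen, outer_local = extrude_vertex(m, (1.0, 0.0), (1.0, 0.0), True)
inner_screen, inner_local = extrude_vertex(m, (1.0, 0.0), (1.0, 0.0), False)
```

Even with the 4x horizontal stretch, the band between the inner and outer vertices is exactly one device pixel wide, while the reprojected local positions move by only 0.125 local units on either side of the original vertex.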

There are more matrix transformations happening in the vertex shader. I don't expect they will make a big difference, although we could have fast paths for when we know the transform to be axis-aligned or to scale uniformly. I think it's more likely that the speedup of having no work in the fragment shader will outweigh the extra ALU in the vertex shader.

The potentially more complicated part of implementing this is that the batching code now has to output an extra quad for each edge that needs anti-aliasing. I don't think it changes much from what is currently done, but we have to make sure it integrates properly with how brush segments already generate multiple quads per primitive. One of the questions to answer is whether the aa bands should be segments themselves.

An idea:
Currently we submit 2-triangle/4-vertex instances. In practice GPUs today don't put multiple instances into the same wavefronts in vertex shaders, so we have very, very low vertex shader occupancy. Maybe we could make our instances always be 10-triangle/12-vertex (2-tri/4-verts for the interior + 8-tri/8-verts for the aa bands). When an edge does not need aa, it is not extruded, which results in the aa triangles having no area and not contributing to the final image. When we want to render the interior in the opaque pass, we can squash it into a zero-area quad in the instance that renders the aa in the alpha pass (hence the center area not sharing its vertices with the aa bands).
Perhaps that won't add noticeable overhead since we are adding geometry in currently unused lanes, and all of the extra lanes will be fetching the same data from the gpu cache (so I don't expect extra latency). It would make it very easy to integrate the aa into the batching code. Opaque batches could remain with two triangles per instance.
That is a good candidate for a prototype outside of webrender.
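
A very rough starting point for such a prototype could look like the sketch below. It works directly in device pixels on an axis-aligned rectangle, and the vertex ordering, index layout and names (`build_instance`, `triangle_area`) are assumptions made for illustration rather than anything WebRender does today.

```python
def build_instance(rect, aa_edges):
    """Build a 12-vertex/10-triangle instance for rect = (x0, y0, x1, y1).

    aa_edges is a set containing any of "left", "top", "right", "bottom".
    """
    x0, y0, x1, y1 = rect

    def ring(offset):
        # Only edges flagged for aa are moved; the others stay on the rect.
        left = x0 - offset if "left" in aa_edges else x0
        top = y0 - offset if "top" in aa_edges else y0
        right = x1 + offset if "right" in aa_edges else x1
        bottom = y1 + offset if "bottom" in aa_edges else y1
        return [(left, top), (right, top), (right, bottom), (left, bottom)]

    inner = ring(-0.5)        # alpha 1.0
    outer = ring(0.5)         # alpha 0.0
    interior = list(inner)    # separate vertices so the interior can be squashed
    vertices = interior + inner + outer
    alphas = [1.0] * 8 + [0.0] * 4
    triangles = [(0, 1, 2), (0, 2, 3)]
    for i in range(4):        # bands in order: top, right, bottom, left
        j = (i + 1) % 4
        triangles.append((4 + i, 8 + i, 8 + j))
        triangles.append((4 + i, 8 + j, 4 + j))
    return vertices, alphas, triangles

def triangle_area(a, b, c):
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

vertices, alphas, triangles = build_instance((0.0, 0.0, 10.0, 10.0),
                                             {"left", "top", "right"})
areas = [triangle_area(*(vertices[k] for k in tri)) for tri in triangles]
```

In this example the bottom edge is not flagged, so the two triangles of its band (the seventh and eighth in the list) are degenerate and rasterize nothing, while the remaining triangles tile the extruded rectangle without overlap.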

Attached image vertex aa illustration

A simple illustration of what the geometry would look like. On the right side the bottom edge was not flagged for aa so its aa "band" is squashed.

Back when I wrote this I didn't think about how it would work with 3d transforms and near plane clipping. Correctly extruding by exactly 1 device pixel won't just work when one or several corners are behind the view.

That said, we can still keep a conservative extrusion for the AA like we do today, combined with the general idea from my earlier comments of how the primitive is split between the AA parts and the opaque part.

Also, while we are at it, the current AA code does not deal well with non-uniform scales.
One solution is to store in an interpolated varying vec4 the signed distance in screen space between the vertex and the line defined by each of the edges. In the fragment shader you can just take the max of the 4 floats, add 0.5, clamp to 0..1 and that gives you the AA.
Lee implemented this approach in DrawTargetWebGL: https://phabricator.services.mozilla.com/D143832
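
For illustration, here is a minimal sketch of the per-edge signed distance idea. It is not taken from the DrawTargetWebGL patch. It assumes a convex quad given in screen space in top-left, top-right, bottom-right, bottom-left order with y pointing down, and it assumes distances are positive outside the shape, so the coverage is written as 0.5 minus the max (with the opposite sign convention the same thing reads as min plus 0.5). The distances are evaluated directly at the fragment here; since the distance to a line is affine in position, interpolating per-vertex values would give the same result for affine transforms.

```python
import math

def edge_distances(quad, p):
    """Signed distance from p to each edge line of the quad, positive outside."""
    distances = []
    for i in range(4):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % 4]
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)
        # (ey, -ex) is the outward normal for this winding with y pointing down.
        distances.append(((p[0] - ax) * ey - (p[1] - ay) * ex) / length)
    return distances

def coverage(quad, p):
    """AA coverage of the pixel centered at p, clamped to 0..1."""
    d = max(edge_distances(quad, p))
    return min(max(0.5 - d, 0.0), 1.0)

# A quad already stretched 4x horizontally in screen space.
quad = [(0.0, 0.0), (8.0, 0.0), (8.0, 2.0), (0.0, 2.0)]
on_right_edge = coverage(quad, (8.0, 1.0))
below_bottom_edge = coverage(quad, (4.0, 2.25))
```

Because everything is measured in screen space, the falloff is one pixel wide across both the right and the bottom edge regardless of how the quad was scaled.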

An alternative to the splitting scheme is to only use 1 quad for the non-aa part, hidden behind the opaque part. That way there are fewer edge cases to deal with (if the primitive is small enough, don't bother drawing the smaller opaque part).

Blocks: wr-todos