Ok so here's the situation with this bug:
The image mask shader (cs_clip_image.glsl) has a difficult task: it takes in an image
(instanced for each tile if the input image was tiled) that is in local space as
well as a target rectangle in screen space (subrect / actual_rect).
This is a problem because we may have two rectangles in different spaces that we need to render the intersection of. In particular, the subrect is always axis-aligned, but the local rect may not be. The current implementation approaches the problem by having the vertex shader generate geometry for the subrect while using discard in the fragment shader to mask out the shape of the input image (tile). The bug is ultimately that this doesn't properly perform AA on the boundaries of the image (tile), so we can end up rendering garbage there when the edges are non-trivial (when the image is not axis aligned).
An interesting side-effect of this approach is that the image-mask shader does not initialize the pixels of the subrect that aren't touched by the image (tile). This is actually necessary for tiling to work, because each tile instance is unaware of the other tiles, so no instance has the information necessary for us to initialize the pixels outside our own tile. It is conjectured that this is "fine" because anything that uses this image-mask will naturally clip itself to the image's rect, and therefore not source the out-of-bounds pixels (which contain whatever garbage was in the texture beforehand).
I was working on changing our approach so that the vertex shader generates geometry for the local image (tile), and use a scissor rect to clip to the subrect.
The benefit of this is that we could use the same edge AA logic we use for other prims, and the GPU should theoretically be able to do less work because it can clearly see the geometry and avoid fragment shading, instead of relying on discard
The downside of this is that changing the scissor rect breaks batching. This wouldn't hurt the tiled case, as each tile shares the same subrect, but it would hurt batching of other alpha tasks together. A potential optimization we can perform is to identify when the image rect is axis-aligned, in which case we can avoid using a scissor and instead make the vertices represent the intersection of the subrect and the image rect.
My patch is incomplete in a few regards:
- I was still just testing out the approach, so my implementation is a bit messy
- I hadn't integrated my approach with the rest of the cs_clip_shared family of shaders, but I'm not sure
that's actually necessary/desirable. cs_clip_image is the only one that handles tiling, so it's genuinely a bit different.
- I hadn't exactly finished figuring out how to properly apply the prim_transform and clip_transform; there is some suggestion that this is legacy over-engineered for things gecko won't ever emit.
- I hadn't yet piped through the logic for the scissor rect, which I had been intending to ask gw to handle (since he seemed to think he could do it relatively easily, while it would have taken me a while to learn all the relevant code)
- I hadn't implemented the axis-aligned optimization, although I was intending to only seriously consider that as a follow-up if the theoretical batching issues show up for real.