This patch fixes some wasted GPU time on mobile devices due to
redundant resolve / copy steps.
In the first case, we would previously do:
- Global clear of color / depth on main framebuffer.
- Bind and draw off-screen targets.
- Bind main framebuffer and draw scene.
Between step 1 and 2, a resolve step is triggered on tiled GPU
drivers, wasting a lot of GPU time. To fix this, the clear is
now deferred until the framebuffer of the first document is
drawn. This does slightly change the semantics of how WR does
clear operations, but I think it works fine and makes more sense.
In the second case, we would previously do:
- Draw main framebuffer
- End frame and invalidate the contents of input textures.
- Bind main framebuffer and draw debug overlay.
This also introduces an extra resolve / copy step, even if the
debug overlay is not enabled. To fix this, the invalidation step
of the input textures to the main framebuffer pass is deferred
until all drawing is complete on the main framebuffer, by doing
the invalidation in the end_frame() call of the texture resolver.
Together, these save a very significant amount of ms per frame
in GPU time on the mobile devices I tested.