Several redundant full-size tile composite passes
Categories
(Core :: Graphics: WebRender, defect, P3)
People
(Reporter: nical, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file: 1.23 MB, image/png)
I'm not sure how to interpret what I'm seeing in RenderDoc, but I get a first composite pass that renders the content picture tiles, followed by two z-rejected full-size draw calls over the content area with tile-sized quads at different vertical offsets, and then a final draw call for the tabs.
See the attached image for the three passes (the wireframe is drawn in yellow and is a bit hard to see).
Comment 1•5 years ago
Previously noted in https://bugzilla.mozilla.org/show_bug.cgi?id=1584794#c17 (see point 5). Might be related to how we have multiple picture caching slices now (for fixed-position stuff versus the scrolled stuff).
Comment 2•5 years ago
Yes, these are (currently) expected, from the background tiles in the content and the UI.
They should be getting z-rejected (it appears they are, from your comment above), so they shouldn't be a huge cost, even though they are not ideal.
My plan is to occlude these tiles on the CPU - I need to do a small amount of refactoring to allow this. The refactoring will allow us to not only skip compositing those tiles, but also know early enough to skip allocating a texture and rasterizing those occluded tiles.
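The CPU-side occlusion described above could look roughly like the following. This is a minimal sketch, not WebRender's actual implementation: the `Rect` and `Tile` types and the `visible_tiles` function are hypothetical names introduced for illustration, and the test is deliberately conservative (a tile is only skipped when a single opaque tile in front of it covers it completely).

```rust
/// Axis-aligned rectangle in device pixels (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x0: i32,
    pub y0: i32,
    pub x1: i32,
    pub y1: i32,
}

impl Rect {
    /// True if `self` fully covers `other`.
    pub fn contains(&self, other: &Rect) -> bool {
        self.x0 <= other.x0
            && self.y0 <= other.y0
            && self.x1 >= other.x1
            && self.y1 >= other.y1
    }
}

/// A composited tile (hypothetical): its screen rect and whether it is opaque.
#[derive(Clone, Copy, Debug)]
pub struct Tile {
    pub rect: Rect,
    pub is_opaque: bool,
}

/// Returns the indices of the tiles that still need to be composited.
/// `tiles` must be ordered front to back.
pub fn visible_tiles(tiles: &[Tile]) -> Vec<usize> {
    // Rects of opaque tiles already accepted, all in front of the current tile.
    let mut occluders: Vec<Rect> = Vec::new();
    let mut visible = Vec::new();

    for (index, tile) in tiles.iter().enumerate() {
        // Conservative test: one opaque rect must cover the whole tile.
        let occluded = occluders.iter().any(|o| o.contains(&tile.rect));
        if occluded {
            // Skipped tiles need no composite draw call and, if this is known
            // early enough, no texture allocation or rasterization either.
            continue;
        }
        visible.push(index);
        if tile.is_opaque {
            occluders.push(tile.rect);
        }
    }
    visible
}
```

With an opaque content tile in front, a background tile at the same position is dropped, while a background tile at a different vertical offset that is only partly covered is kept. A real implementation would also want to handle tiles covered by the union of several occluders.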
Comment 3•5 years ago
I did some profiling of this on my (reasonably old) Intel HD4600 integrated GPU on a 4k screen in RenderDoc.
I did a WR capture, and then replayed this in wrench with the --no-batch option, and got draw call timings in RenderDoc for each of the tile blits.
The tile blits for the content tiles at the front of the screen are ~0.45 to ~0.5 ms per tile. The timings for the occluded tiles are typically ~0.002 to ~0.02 ms per tile.
All up, the reported GPU time for the blits of the occluded tiles on that GPU was 0.21 ms / frame, which is ~1.3% of the GPU frame budget time.
I suspect that this is due to the use of hi-z allowing the tiles to be rejected quickly and with minimal memory bandwidth, but we should perhaps try to verify this with GPA which can probably get the Intel hardware counters.
Given this, I think we can probably make this a low priority for now (there are much bigger GPU wins to be had elsewhere), unless we can find a system where the profiler reports significant GPU time in these passes. Does that sound reasonable?
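As a rough check of the ~1.3% figure quoted above: it is consistent with a 60 Hz frame budget of about 16.7 ms. The refresh rate is an assumption here (the comment does not state it), and `budget_percent` is a hypothetical helper written only for this sketch.

```rust
/// Percentage of the per-frame budget consumed by `gpu_ms` milliseconds of
/// GPU time, for a display refreshing at `refresh_hz` (assumed, not measured).
pub fn budget_percent(gpu_ms: f64, refresh_hz: f64) -> f64 {
    // One frame lasts 1000 / refresh_hz milliseconds.
    let frame_budget_ms = 1000.0 / refresh_hz;
    100.0 * gpu_ms / frame_budget_ms
}
```

Calling `budget_percent(0.21, 60.0)` gives roughly 1.26, which rounds to the ~1.3% reported for the occluded tile blits.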
Comment 4•5 years ago
Yes, this is totally low priority. Note that Hi-Z may be subject to HW/driver differences, so your footprint on Android, for example, may be different.
Reporter
Comment 5•5 years ago
It makes the picture caching debugging view very slow and hard to read, though. It would be good to address that at least.
Comment 6•5 years ago
In various discussions, it became clear that having the tile occlusion would provide several other benefits too (see https://bugzilla.mozilla.org/show_bug.cgi?id=1591526#c1 for more information).
Given those reasons, I'm planning to make this a higher priority - hopefully get it done in the next week or so.