Closed Bug 1534187 Opened 5 years ago Closed 4 years ago

WR render target allocation strategy can be very inefficient.

Categories

(Core :: Graphics: WebRender, enhancement, P2)

RESOLVED FIXED

People

(Reporter: gw, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [wr-amvp][wr-q2])

I'm profiling espn.com on a mobile device, which is running much slower than expected.

In this case, I'm seeing ~85% of the GPU time spent in a render pass that is drawing clip masks.

On this mobile device, both drawing the clip masks and resolving the framebuffer tiles are very expensive.

I added some logging to show the utilization of this render target:

Size: 1833 x 2037 (area 3733821)
Actual used area (sum of area of allocated rects): 198518.

So the utilization of the target is ~5%.
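
The utilization figure can be reproduced with a small helper. This is only a sketch of the arithmetic; `Size` and `utilization` are hypothetical names, not WebRender types, and it assumes the allocated rects do not overlap.

```rust
/// Hypothetical size type in device pixels (not a WebRender type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Size {
    pub width: u32,
    pub height: u32,
}

impl Size {
    /// Area in pixels, widened to u64 so large targets cannot overflow.
    pub fn area(self) -> u64 {
        u64::from(self.width) * u64::from(self.height)
    }
}

/// Fraction of `target` covered by `allocations`.
/// Assumes the allocations do not overlap each other.
pub fn utilization(target: Size, allocations: &[Size]) -> f64 {
    let used: u64 = allocations.iter().map(|s| s.area()).sum();
    let total = target.area();
    if total == 0 {
        0.0
    } else {
        used as f64 / total as f64
    }
}
```

With the logged numbers, 198518 / 3733821 is roughly 0.053, i.e. about 5%.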

In this case, it's because there are some very long and thin clip mask allocations on each axis, and so placing them in the same render target results in a very large allocated rect, even though the used pixels in the target are a small fraction of that.

So, we need to come up with a better strategy for render target allocation. It might make sense to have a heuristic where allocations that have one dimension much larger than the other go into a separate rectangular render target, perhaps?
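
A minimal sketch of what such a heuristic could look like. The `TargetBucket` enum, the `classify` function, and the aspect-ratio threshold are all assumptions made for illustration; none of them come from the WebRender code.

```rust
/// Hypothetical buckets for routing an allocation to a render target.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TargetBucket {
    /// Neither dimension dominates; use the shared target.
    Regular,
    /// Width is much larger than height.
    Wide,
    /// Height is much larger than width.
    Tall,
}

/// Classify an allocation by aspect ratio.
/// `max_aspect` is an assumed tuning knob: an allocation counts as "thin"
/// when one dimension exceeds the other by more than this factor.
pub fn classify(width: u32, height: u32, max_aspect: u32) -> TargetBucket {
    // Widen to u64 so the multiplication cannot overflow.
    let (w, h) = (u64::from(width), u64::from(height));
    let k = u64::from(max_aspect);
    if w > h * k {
        TargetBucket::Wide
    } else if h > w * k {
        TargetBucket::Tall
    } else {
        TargetBucket::Regular
    }
}
```

With a threshold of 8, a 2000x1 mask would land in `Wide`, a 1x2000 mask in `Tall`, and a 256x256 mask in `Regular`.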

Thoughts / suggestions?

Flags: needinfo?(dmalyshau)
Flags: needinfo?(bobbyholley)

Would this be solved by more aggressive tiling of the clip masks? I.e. just split everything into chunks of 512 in each dimension, like we do for blob images.
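
To make the tiling idea concrete, here is a rough sketch of splitting a mask into fixed-size tiles. `Tile` and `split_into_tiles` are hypothetical names; this is not how blob image tiling is implemented, just the same idea in miniature.

```rust
/// Hypothetical tile rect, in pixels relative to the mask origin.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tile {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Split a `width` x `height` mask into tiles no larger than `tile_size`
/// in either dimension. Edge tiles are clamped to the mask bounds.
pub fn split_into_tiles(width: u32, height: u32, tile_size: u32) -> Vec<Tile> {
    assert!(tile_size > 0, "tile_size must be non-zero");
    let mut tiles = Vec::new();
    let mut y = 0;
    while y < height {
        let h = tile_size.min(height - y);
        let mut x = 0;
        while x < width {
            let w = tile_size.min(width - x);
            tiles.push(Tile { x, y, width: w, height: h });
            x += w;
        }
        y += h;
    }
    tiles
}
```

A 1900x10 mask with a tile size of 512 becomes four tiles, none wider than 512, so no single allocation can force the target to be ~2000 pixels on a side.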

We could even go as far as doing the tile splits according to the available storage in the current target. For example, supposing we need to allocate a 1900x10 piece, we look at the current 2048x2048 surface and see that we can fit 1000x10 at most, so we cut it there into 2 pieces and draw the other half in the next layer. On second thought, this might be complex to implement: we only know the available RT size when we assign to passes, and only at that point would we know how the instances are built on the dependent passes...
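
The cut itself is simple arithmetic; the hard part described above is knowing the available space early enough. A sketch of just the cut, with a hypothetical `split_to_fit` helper operating along one axis:

```rust
/// Split a requested extent along one axis into the part that fits in the
/// space currently available and the remainder that spills into the next
/// layer. Returns `(fits, spills)`; `spills` is 0 when everything fits.
/// Hypothetical helper, for illustration only.
pub fn split_to_fit(requested: u32, available: u32) -> (u32, u32) {
    let fits = requested.min(available);
    (fits, requested - fits)
}
```

For the 1900x10 example with 1000 pixels available, this yields a 1000x10 piece in the current layer and a 900x10 piece in the next.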

> So, we need to come up with a better strategy for render target allocation. It might make sense to have a heuristic where allocations that have one dimension much larger than the other go into a separate rectangular render target, perhaps?

Not sure how this would work though: if we have a separate target with a separate size, it has to be processed in a separate pass. This means more RT/texture switches, and it's not clear how the pass dependencies would work out, logically, within the current framework. Nicola has been looking into making the RT graph more generalized, so this might be handy.

Perhaps, what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?

Flags: needinfo?(dmalyshau)
Blocks: 1533833

(In reply to Dzmitry Malyshau [:kvark] from comment #1)

> Perhaps, what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?

I'm not sure how a better allocator can help here. If we have one mask that's 1x2000 and another that's 2000x1, we need to either drop the requirement that everything goes in the same texture array or allocate a 2000x2001 texture.
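
The arithmetic behind that example, as a sketch. `stacked_bounds` is a hypothetical helper that assumes the masks are simply stacked top to bottom in one target; it is not the actual allocator.

```rust
/// Size of the smallest target that holds all `sizes` when they are stacked
/// vertically: width is the widest item, height is the sum of the heights.
/// Each entry is `(width, height)`.
pub fn stacked_bounds(sizes: &[(u32, u32)]) -> (u32, u32) {
    let width = sizes.iter().map(|&(w, _)| w).max().unwrap_or(0);
    let height: u32 = sizes.iter().map(|&(_, h)| h).sum();
    (width, height)
}
```

For a 2000x1 mask and a 1x2000 mask this gives 2000x2001, i.e. 4,002,000 pixels allocated for 4,000 pixels actually used, or roughly 0.1% utilization.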

It feels to me that the right general solution is tiling, like kvark suggested. I remember talking with Glenn while I was working on the current RT allocator and he suggested that it'd just be a stop-gap until we got our tiling situation in order. Forcing the render output into dimensions that we control seems like the only way to avoid being at the mercy of pessimal geometry in web content.

Flags: needinfo?(bobbyholley)

(In reply to Bobby Holley (:bholley) from comment #2)

> (In reply to Dzmitry Malyshau [:kvark] from comment #1)
>
> > Perhaps, what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?
>
> I'm not sure how a better allocator can help here. If we have one mask that's 1x2000 and another that's 2000x1, we need to either drop the requirement that everything goes in the same texture array or allocate a 2000x2001 texture.

I was referring to the split_guillotine logic, which is rough and clearly sub-optimal in some cases. In your example, allocating a 2000x2001 texture is not necessarily the end of the game for us. Suppose we over-allocate to 2048x2048: we end up fitting all the thin allocations and still have lots of space for the rest of the texture data. Efficiently using the space we already have is what a better allocator could help with.
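
To illustrate the point, here is a generic guillotine-style free-list allocator. It is a sketch under stated assumptions, not WebRender's split_guillotine: `Rect`, `GuillotineAllocator`, the best-area-fit choice, and the split rule are all picked for illustration.

```rust
/// Hypothetical rect type: origin (x, y) and size (w, h) in pixels.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Generic guillotine allocator over a single target (illustrative only).
pub struct GuillotineAllocator {
    free: Vec<Rect>,
}

impl GuillotineAllocator {
    pub fn new(width: u32, height: u32) -> Self {
        Self { free: vec![Rect { x: 0, y: 0, w: width, h: height }] }
    }

    /// Place a `w` x `h` rect in the smallest free rect that can hold it,
    /// then cut the leftover space into two new free rects.
    pub fn allocate(&mut self, w: u32, h: u32) -> Option<Rect> {
        if w == 0 || h == 0 {
            return None;
        }
        // Best-area fit: pick the smallest free rect that is large enough.
        let index = self
            .free
            .iter()
            .enumerate()
            .filter(|(_, r)| r.w >= w && r.h >= h)
            .min_by_key(|(_, r)| u64::from(r.w) * u64::from(r.h))
            .map(|(i, _)| i)?;
        let slot = self.free.swap_remove(index);
        let (right_w, below_h) = (slot.w - w, slot.h - h);
        // Guillotine cut: give the full slot extent to the larger leftover.
        let (right, below) = if right_w >= below_h {
            (
                Rect { x: slot.x + w, y: slot.y, w: right_w, h: slot.h },
                Rect { x: slot.x, y: slot.y + h, w, h: below_h },
            )
        } else {
            (
                Rect { x: slot.x + w, y: slot.y, w: right_w, h },
                Rect { x: slot.x, y: slot.y + h, w: slot.w, h: below_h },
            )
        };
        // Keep only leftovers that have non-zero area.
        for r in [right, below] {
            if r.w > 0 && r.h > 0 {
                self.free.push(r);
            }
        }
        Some(Rect { x: slot.x, y: slot.y, w, h })
    }
}
```

In a 2048x2048 target, allocating 2000x1 and then 1x2000 with this sketch places them at (0, 0) and (0, 1), and most of the remaining area stays available as one large free rect for other allocations.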

Priority: -- → P2
Whiteboard: [wr-amvp][wr-q2]

@Glenn: Is this still relevant?

Blocks: wr-android-perf
No longer blocks: wr-android
Flags: needinfo?(gwatson)

No, this is no longer an issue.

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(gwatson)
Resolution: --- → FIXED