1534187 - WR render target allocation strategy can be very inefficient.

Glenn Watson [:gw]

Reporter

Description

•

6 years ago

I'm profiling espn.com on a mobile device, which is running much slower than expected.

In this case, I'm seeing ~85% of the GPU time spent in a render pass that is drawing clip masks.

On this mobile device, both drawing the clip masks and resolving the framebuffer tiles are very expensive.

I added some logging to show the utilization of this render target:

Size: 1833 x 2037 (area 3733821)
Actual used area (sum of area of allocated rects): 198518.

So the utilization of the target is ~5%.
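
For reference, the utilization figure above is just the used area divided by the target area. A minimal Rust sketch of that arithmetic follows; `Size` and `utilization` are hypothetical names for illustration, not WebRender types, and the sketch assumes the allocated rects do not overlap.

```rust
/// Hypothetical size type for this sketch (not a WebRender type).
#[derive(Clone, Copy, Debug)]
pub struct Size {
    pub width: u32,
    pub height: u32,
}

impl Size {
    /// Area in pixels, widened to u64 so large targets cannot overflow.
    pub fn area(self) -> u64 {
        self.width as u64 * self.height as u64
    }
}

/// Fraction of the target covered by allocated rects.
/// Assumes the rects do not overlap; returns 0.0 for an empty target.
pub fn utilization(target: Size, allocated: &[Size]) -> f64 {
    let total = target.area();
    if total == 0 {
        return 0.0;
    }
    let used: u64 = allocated.iter().map(|s| s.area()).sum();
    used as f64 / total as f64
}
```

With the numbers logged above, 198518 / 3733821 comes out at roughly 0.053, i.e. the ~5% quoted.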

This happens because there are some very long, thin clip mask allocations along each axis, so placing them in the same render target forces a very large target rect, even though the used pixels are a small fraction of it.

So, we need to come up with a better strategy for render target allocation. It might make sense to have a heuristic where allocations that have one dimension much larger than the other go into a separate rectangular render target, perhaps?
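
One possible shape for such a heuristic, as a rough sketch only: classify each allocation by aspect ratio and route strips to their own target. The `TargetBucket` enum, the `classify` function, and the `max_aspect` threshold are all assumptions made for illustration; nothing like this exists in WebRender as far as this bug shows.

```rust
/// Hypothetical buckets for routing an allocation to a render target.
#[derive(Debug, PartialEq, Eq)]
pub enum TargetBucket {
    /// Roughly square allocations share the regular target.
    Regular,
    /// Much wider than tall.
    WideStrip,
    /// Much taller than wide.
    TallStrip,
}

/// Classify an allocation by comparing one dimension against the other
/// scaled by `max_aspect` (an assumed tuning parameter).
pub fn classify(width: u32, height: u32, max_aspect: u32) -> TargetBucket {
    // Widen to u64 so the multiplication cannot overflow.
    let (w, h, k) = (width as u64, height as u64, max_aspect as u64);
    if w > h * k {
        TargetBucket::WideStrip
    } else if h > w * k {
        TargetBucket::TallStrip
    } else {
        TargetBucket::Regular
    }
}
```

With a threshold of 8, for example, a 2000x1 mask would land in `WideStrip`, a 1x2000 mask in `TallStrip`, and a 256x256 mask in `Regular`.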

Thoughts / suggestions?

Glenn Watson [:gw]

Reporter

Updated

•

6 years ago

Flags: needinfo?(dmalyshau)

Flags: needinfo?(bobbyholley)

Dzmitry Malyshau [:kvark]

Comment 1

•

6 years ago

Would this be solved by more aggressive tiling of the clip masks? I.e. just split everything into 512-pixel chunks in each dimension, like we do for blob images.

We could even go as far as doing the tile splits according to the available storage in the current target. Suppose we need to allocate a 1900x10 piece: we look at the current 2048x2048 surface, see that we can fit 1000x10 at most, cut the piece in two there, and draw the other half in the next layer. On second thought, this might be complex to implement: we only know the available RT size when we assign to passes, and only at that point would we know how the instances are built on the dependent passes...
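
To make the fixed-size variant concrete, here is a minimal Rust sketch of cutting a mask into chunks of at most 512 pixels per side. `Tile` and `split_into_tiles` are hypothetical names, and this is not how blob image tiling is actually implemented in WebRender; it only illustrates the idea.

```rust
/// Hypothetical tile rect, in pixels relative to the mask origin.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tile {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Split a `width` x `height` mask into tiles no larger than
/// `tile_size` on either axis. Edge tiles keep the remainder.
pub fn split_into_tiles(width: u32, height: u32, tile_size: u32) -> Vec<Tile> {
    assert!(tile_size > 0, "tile_size must be non-zero");
    let mut tiles = Vec::new();
    let mut y = 0;
    while y < height {
        let h = tile_size.min(height - y);
        let mut x = 0;
        while x < width {
            let w = tile_size.min(width - x);
            tiles.push(Tile { x, y, width: w, height: h });
            x += w;
        }
        y += h;
    }
    tiles
}
```

For the 1900x10 piece mentioned above and a tile size of 512, this yields four tiles, the last one 364 pixels wide, each of which can be placed independently.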

> So, we need to come up with a better strategy for render target allocation. It might make sense to have a heuristic where allocations that have one dimension much larger than the other go into a separate rectangular render target, perhaps?

I'm not sure how this would work, though: a separate target with a separate size has to be processed in a separate pass. That means more RT/texture switches, and it's not clear how the pass dependencies would work out, logically, within the current framework. Nicola has been looking into making the RT graph more generalized, so this might be handy.

Perhaps, what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?

Flags: needinfo?(dmalyshau)

Bobby Holley (:bholley)

Updated

•

6 years ago

Blocks: 1533833

Bobby Holley (:bholley)

Comment 2

•

6 years ago

(In reply to Dzmitry Malyshau [:kvark] from comment #1)

> Perhaps, what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?

I'm not sure how a better allocator can help here. If we have one mask that's 1x2000 and another that's 2000x1, we need to either drop the requirement that everything goes in the same texture array or allocate a 2000x2001 texture.

It feels to me that the right general solution is tiling, like kvark suggested. I remember talking with Glenn while I was working on the current RT allocator and he suggested that it'd just be a stop-gap until we got our tiling situation in order. Forcing the render output into dimensions that we control seems like the only way to avoid being at the mercy of pessimal geometry in web content.

Flags: needinfo?(bobbyholley)

Dzmitry Malyshau [:kvark]

Comment 3

•

6 years ago

(In reply to Bobby Holley (:bholley) from comment #2)

> (In reply to Dzmitry Malyshau [:kvark] from comment #1)
>
> > Perhaps, what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?
>
> I'm not sure how a better allocator can help here. If we have one mask that's 1x2000 and another that's 2000x1, we need to either drop the requirement that everything goes in the same texture array or allocate a 2000x2001 texture.

I was referring to the split_guillotine logic, which is rough and clearly sub-optimal in some cases. In your example, allocating a 2000x2001 texture is not necessarily game over for us. Suppose we over-allocate to 2048x2048: we end up fitting all the thin allocations and still have lots of space for the rest of the texture data. Efficiently allocating the space we have is what a better allocator could help with.
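
To illustrate the point, here is a generic guillotine-style allocator sketched in Rust. It is not WebRender's split_guillotine code: `Rect`, `GuillotineAllocator`, the best-area-fit choice, and the cut heuristic are all assumptions made for this example.

```rust
/// Hypothetical rect type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
}

/// Generic guillotine allocator: keeps a list of free rects and
/// cuts each one in two after placing an allocation in its corner.
pub struct GuillotineAllocator {
    free: Vec<Rect>,
}

impl GuillotineAllocator {
    pub fn new(width: u32, height: u32) -> Self {
        Self { free: vec![Rect { x: 0, y: 0, w: width, h: height }] }
    }

    pub fn allocate(&mut self, w: u32, h: u32) -> Option<Rect> {
        if w == 0 || h == 0 {
            return None;
        }
        // Best-area fit: the smallest free rect that can hold the request.
        let idx = self
            .free
            .iter()
            .enumerate()
            .filter(|(_, r)| r.w >= w && r.h >= h)
            .min_by_key(|(_, r)| r.w as u64 * r.h as u64)
            .map(|(i, _)| i)?;
        let r = self.free.swap_remove(idx);
        let (right_w, below_h) = (r.w - w, r.h - h);
        // Give the full extent to whichever leftover is larger.
        let (right, below) = if right_w >= below_h {
            (
                Rect { x: r.x + w, y: r.y, w: right_w, h: r.h },
                Rect { x: r.x, y: r.y + h, w, h: below_h },
            )
        } else {
            (
                Rect { x: r.x + w, y: r.y, w: right_w, h },
                Rect { x: r.x, y: r.y + h, w: r.w, h: below_h },
            )
        };
        // Only keep leftovers that have a non-zero area.
        for leftover in [right, below] {
            if leftover.w > 0 && leftover.h > 0 {
                self.free.push(leftover);
            }
        }
        Some(Rect { x: r.x, y: r.y, w, h })
    }
}
```

In a 2048x2048 target, placing the 2000x1 and 1x2000 masks this way leaves a single 2047x2047 free rect, so a later 1024x1024 allocation still fits in the same target.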

Jeff Muizelaar [:jrmuizel]

Updated

•

6 years ago

Blocks: wr-android-mvp

Priority: -- → P2

Jessie [:jbonisteel] pls NI

Updated

•

6 years ago

Whiteboard: [wr-amvp][wr-q2]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

5 years ago

Blocks: wr-android
No longer blocks: wr-android-mvp

Kris Taeleman (:ktaeleman)

Comment 4

•

4 years ago

@Glenn: Is this still relevant?

Blocks: wr-android-perf
No longer blocks: wr-android

Flags: needinfo?(gwatson)

Glenn Watson [:gw]

Reporter

Comment 5

•

4 years ago

No, this is no longer an issue.

Status: NEW → RESOLVED

Closed: 4 years ago

Flags: needinfo?(gwatson)

Resolution: --- → FIXED

Categories

(Core :: Graphics: WebRender, enhancement, P2)

Tracking

()

People

(Reporter: gw, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [wr-amvp][wr-q2])

Security

(public)
