Closed Bug 1480172 Opened 7 years ago Closed 5 years ago

Partial-present for webrender

Categories: Core :: Graphics: WebRender, enhancement, P3

Status: RESOLVED DUPLICATE of bug 1582624

Reporter: bholley; Assignee: Unassigned
Jeff thinks there isn't a bug on file for this, so I'm filing one.
I was just profiling CPU usage on [1], and found WR to be significantly higher than stock Firefox (30% vs 17%). Jeff suspect this was likely related to redrawing the browser chrome, and sure enough, rendering full screen got the numbers much closer (20% vs 22%). Here are profiles, which show about twice as much work being done on the Render thread when the browser-chrome is visible: * WR Animation profile normal: https://perfht.ml/2M6qoFd * WR Animation profile full-screen: https://perfht.ml/2KgLoHM Jeff says the plan is to fix this with document splitting (bug 1441308). However, it seems to me that we should be able to fix this with partial present. Jeff says that some Render CPU work will scale with the size of the scenegraph. But that seems problematic to me, since complex pages with with small async animations (i.e. transform/opacity) would perform worse than they do under the current setup. Thoughts Glenn? [1] https://staktrace.com/kats/
Flags: needinfo?(gwatson)
(In reply to Bobby Holley (:bholley) from comment #1)
> [1] https://staktrace.com/kats/

I think you mean https://mozilla.staktrace.com/tmp/anim.html
(In reply to Markus Stange [:mstange] from comment #2)
> (In reply to Bobby Holley (:bholley) from comment #1)
> > [1] https://staktrace.com/kats/
> 
> I think you mean https://mozilla.staktrace.com/tmp/anim.html

Yes, sorry. Doing too many things at once. :-)
Priority: -- → P3
Partial present is unlikely to help with CPU usage - it is more about saving GPU memory bandwidth, and thus power. Document splitting is one solution to this - there are some other possibilities that we can discuss if we decide that there is too much work in Gecko to allow us to enable document splitting soon.
Flags: needinfo?(gwatson)
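For concreteness, here is a minimal sketch of the buffer-age style of partial present, using made-up names (`Surface`, `present`, `buffer_age`) rather than WebRender's real API. It shows why the savings are on the GPU side: the dirty region only shrinks the pixels the GPU writes, while the frame itself still has to be produced by the CPU.

```rust
// Sketch only: hypothetical types, not WebRender's actual code. Semantics
// mirror EGL_EXT_buffer_age / eglSwapBuffersWithDamageKHR-style APIs.

#[derive(Clone, Copy, Debug)]
struct Rect { x: i32, y: i32, w: i32, h: i32 }

impl Rect {
    fn union(self, o: Rect) -> Rect {
        let x0 = self.x.min(o.x);
        let y0 = self.y.min(o.y);
        let x1 = (self.x + self.w).max(o.x + o.w);
        let y1 = (self.y + self.h).max(o.y + o.h);
        Rect { x: x0, y: y0, w: x1 - x0, h: y1 - y0 }
    }
}

struct Surface {
    /// Damage rects of previously presented frames, newest first.
    past_damage: Vec<Rect>,
}

impl Surface {
    /// How many frames ago this back buffer was last presented
    /// (0 = contents undefined), as EGL_EXT_buffer_age reports it.
    fn buffer_age(&self) -> usize {
        2 // stand-in for a driver query
    }

    fn present(&mut self, damage: Rect) {
        // An eglSwapBuffersWithDamageKHR-style swap would happen here.
        self.past_damage.insert(0, damage);
        self.past_damage.truncate(8);
    }
}

fn render_frame(surface: &mut Surface, damage: Rect, full: Rect) {
    let age = surface.buffer_age();
    // Redraw this frame's damage plus everything the reused buffer missed.
    let dirty = if age == 0 || age > surface.past_damage.len() + 1 {
        full // buffer contents unknown: repaint everything
    } else {
        surface.past_damage[..age - 1]
            .iter()
            .fold(damage, |acc, r| acc.union(*r))
    };
    // Scissor the GPU to `dirty`; pixels outside keep their old contents.
    println!("redrawing {:?}", dirty);
    surface.present(damage);
}

fn main() {
    let full = Rect { x: 0, y: 0, w: 1920, h: 1080 };
    let mut surface = Surface {
        past_damage: vec![Rect { x: 0, y: 0, w: 220, h: 120 }],
    };
    // e.g. a small animated rect near the top-left corner
    render_frame(&mut surface, Rect { x: 10, y: 10, w: 200, h: 100 }, full);
}
```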
My question from comment 1 is more general. Does the design of WR fundamentally mean that CPU work on the Render thread will scale with the complexity of the scene graph? If so, that means WR will use more CPU in a lot of situations, which seems problematic.
Flags: needinfo?(gwatson)
There's a heap of subtleties involved here - it may be easier to discuss on vidyo. In particular:

There's the RenderBackend thread, and the Render thread. You've referred to the Render thread above, but I'm not totally sure if that was deliberate?

Broadly speaking, we can say:

- The Render thread time is *somewhat* dependent on the complexity of the scene graph, but not linearly. For example, the more text runs you have that are visible, the more vertices need to be sent to the GPU each frame. In general cases, it's "approximately" independent of the scene graph complexity - for example, in most real-world cases you typically end up with *roughly* the same number of draw calls, shader switches etc., regardless of the content complexity.

- Currently we don't cache any of the GPU submission data (like vertex buffers), even when replaying the same frame with different animation. There's nothing technically difficult about this; it just hasn't been a priority yet.

- The RenderBackend time is dependent on the scene graph complexity in a few ways. First, the larger the scene graph, the more serialization / deserialization we need to do. This is roughly linear in primitive + clip count, although it's typically a small amount of the total RenderBackend time (except in stress tests). Second, there is a certain amount of work that needs to be done per item - primarily checking if it's visible. This is usually also a small part of the overall time. Finally, there's a certain amount of work that needs to be done for each visible primitive - although larger primitives tend to be more expensive to process than smaller ones (due to segmentation), so a small number of large items can sometimes be similar in cost to a large number of smaller items!

- Most of the above only really applies when a new display list is being sent. For example, in the case of a transform / property animation, there is ~0 work needed on the RenderBackend thread. Once we start to cache GPU submission data, that would also very significantly reduce the cost of the Render thread time in the case of property animation.

Does that help?
Flags: needinfo?(gwatson)
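To illustrate the cost split above, a hedged sketch with hypothetical types (this is not WebRender's actual message protocol): a new display list pays per-primitive costs on the RenderBackend, while a property update does work proportional only to the number of animated properties, independent of scene size.

```rust
// Sketch only: illustrative names, not WebRender's real API.
use std::collections::HashMap;

type PropertyId = u32;

struct Primitive {
    visible: bool,
    // geometry, clips, segments, ...
}

/// GPU-ready output. Per the comment above, the vertex buffers are rebuilt
/// every frame today, but could in principle be cached and replayed.
struct BuiltFrame {
    visible_count: usize,
    animated: HashMap<PropertyId, f32>,
}

enum BackendMsg {
    NewDisplayList(Vec<Primitive>),
    UpdateProperties(HashMap<PropertyId, f32>),
}

fn render_backend(msg: BackendMsg, cached: &mut Option<BuiltFrame>) {
    match msg {
        // Cost scales with the scene: deserialize, cull, segment.
        BackendMsg::NewDisplayList(prims) => {
            let visible_count = prims.iter().filter(|p| p.visible).count();
            *cached = Some(BuiltFrame {
                visible_count,
                animated: HashMap::new(),
            });
        }
        // ~O(animated properties): patch values into the cached frame
        // rather than rebuilding it.
        BackendMsg::UpdateProperties(props) => {
            if let Some(frame) = cached {
                frame.animated.extend(props);
            }
        }
    }
}
```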
(In reply to Glenn Watson [:gw] from comment #6)
> There's a heap of subtleties involved here - it may be easier to discuss on
> vidyo. In particular:
> 
> There's the RenderBackend thread, and the Render thread. You've referred to
> the Render thread above, but I'm not totally sure if that was deliberate?

It was - the profiles in comment 0 show ~twice the CPU usage on the Render thread for a simple transform animation when the browser UI is visible (compared to full screen).

> Broadly speaking, we can say:
> 
> - The Render thread time is *somewhat* dependent on the complexity of the
> scene graph, but not linearly. For example, the more text runs you have that
> are visible, the more vertices need to be sent to the GPU each frame. In
> general cases, it's "approximately" independent of the scene graph
> complexity - for example, in most real-world cases you typically end up with
> *roughly* the same number of draw calls, shader switches etc., regardless of
> the content complexity.
> 
> - Currently we don't cache any of the GPU submission data (like vertex
> buffers), even when replaying the same frame with different animation.
> There's nothing technically difficult about this; it just hasn't been a
> priority yet.
> 
> - The RenderBackend time is dependent on the scene graph complexity in a few
> ways. First, the larger the scene graph, the more serialization /
> deserialization we need to do. This is roughly linear in primitive + clip
> count, although it's typically a small amount of the total RenderBackend
> time (except in stress tests).

This is the serialization of GPU commands to be executed on the Render thread, is that right?

> Second, there is a certain amount of work
> that needs to be done per item - primarily checking if it's visible. This is
> usually also a small part of the overall time. Finally, there's a certain
> amount of work that needs to be done for each visible primitive - although
> larger primitives tend to be more expensive to process than smaller ones
> (due to segmentation), so a small number of large items can sometimes be
> similar in cost to a large number of smaller items!

So I think the main question I'm getting at is whether we actually need to do all this work for everything in the scene graph during partial present, or if there's a way to quickly skip primitives outside the area being rendered.

> - Most of the above only really applies when a new display list is being
> sent. For example, in the case of a transform / property animation, there is
> ~0 work needed on the RenderBackend thread. Once we start to cache GPU
> submission data, that would also very significantly reduce the cost of the
> Render thread time in the case of property animation.

So this answers my question in comment 0, I think. If I understand you correctly, then this (rather than either partial present or split documents) is the way to hit parity with FLB on the spinning rectangle animation. Is that right? Would this also apply to scrolling (assuming the scroll is small enough that we can use the existing scene graph)?
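A sketch of the kind of skip being asked about here, again with hypothetical types: if only a dirty rect needs rendering, each primitive can be rejected with one cheap bounds test before any of the more expensive visibility/segmentation work runs, so the per-frame cost tracks the dirty area's contents rather than the full scene graph.

```rust
// Sketch only: illustrative names, not WebRender's real culling code.

#[derive(Clone, Copy)]
struct Rect { x: i32, y: i32, w: i32, h: i32 }

impl Rect {
    fn intersects(self, o: Rect) -> bool {
        self.x < o.x + o.w && o.x < self.x + self.w &&
        self.y < o.y + o.h && o.y < self.y + self.h
    }
}

struct Primitive { bounds: Rect }

/// Keep only primitives that can touch the dirty region. This is still an
/// O(n) walk over the scene, but each rejected primitive skips all of the
/// heavier per-primitive processing.
fn cull<'a>(
    prims: &'a [Primitive],
    dirty: Rect,
) -> impl Iterator<Item = &'a Primitive> {
    prims.iter().filter(move |p| p.bounds.intersects(dirty))
}
```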

Was this resolved by bug 1582624?

There are two types of partial present.

[1] for Windows is enabled on Nightly, so we can close this bug.

[2] is expected to be used on PCs where [1] cannot be used. [2] is not completed yet; it is tracked by bug 1595014.
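As a rough sketch of that split (the enum and function below are illustrative, not Gecko's actual configuration code):

```rust
// Sketch only: hypothetical names for the two paths described above.
enum PartialPresentMode {
    /// [1]: compositor-assisted partial present on Windows,
    /// enabled on Nightly.
    Compositor,
    /// [2]: buffer-age/damage-based partial present for platforms
    /// where [1] cannot be used (tracked in bug 1595014).
    BufferAge,
    /// Neither supported: redraw the full window every frame.
    None,
}

fn choose_mode(has_compositor: bool, has_buffer_age: bool) -> PartialPresentMode {
    if has_compositor {
        PartialPresentMode::Compositor
    } else if has_buffer_age {
        PartialPresentMode::BufferAge
    } else {
        PartialPresentMode::None
    }
}
```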

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE