Closed Bug 1437091 Opened 7 years ago Closed 6 years ago

Categories

(Core :: Graphics: WebRender, defect, P2)

Tracking

RESOLVED FIXED

People

(Reporter: jrmuizel, Assigned: gw)

Details

The WrBackend thread is now the bottleneck for us: https://perfht.ml/2nOH3mE
Assignee: nobody → gwatson
This was tested without the changes from bug 1434243. That bug should make things quite a bit better.
Blocks: 1426770
I did a quick profile with a current release build. On my machine (Haswell i7-4790 CPU @ 3.60GHz), I see:

Primitives: ~20,000
Gecko DisplayList: 9 - 11 ms / frame
Gecko RenderLayers: 3 - 5 ms / frame
WR Backend: ~12.5 ms / frame

Of the time in WR backend, we have:

66% in Document::render
    ~ 50% of that in prepare_prim_for_render
    ~ 25% of that in batching
34% in create_frame_builder
    Mostly ends up in add_primitive and deserialization

So there's nothing *obviously* wrong here, that's where I would expect the time to be spent - it's not that any of those methods are super-slow, there's just a lot of primitives to process so they are called a lot.

We've spent zero time on CPU optimization, so there's probably several ms that can be saved from that time. Pushing some of that work onto multiple CPU threads is an option, although may not help much on low end machines with low core counts.

If we need more than that, we probably need to consider other options, such as incremental display list support etc.
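To put the percentages above into absolute terms, here is a rough sketch that converts them into milliseconds per frame and an average cost per primitive. It only reuses the approximate figures quoted in the profile (~12.5 ms, ~20,000 primitives, 66% / 34% split); the `Breakdown` struct and the function names are made up for illustration and are not part of WebRender.

```rust
// Approximate figures quoted in the profile above.
const BACKEND_MS_PER_FRAME: f64 = 12.5;
const PRIMITIVE_COUNT: f64 = 20_000.0;

/// Per-frame time in milliseconds attributed to each part of the WR backend.
/// Hypothetical struct, for illustration only.
struct Breakdown {
    document_render: f64,
    prepare_prim_for_render: f64,
    batching: f64,
    create_frame_builder: f64,
}

/// Apply the quoted percentages to a total backend time.
fn breakdown(total_ms: f64) -> Breakdown {
    let document_render = total_ms * 0.66;
    Breakdown {
        document_render,
        // "~50% of that" and "~25% of that" are fractions of Document::render.
        prepare_prim_for_render: document_render * 0.50,
        batching: document_render * 0.25,
        create_frame_builder: total_ms * 0.34,
    }
}

/// Average backend cost per primitive, in nanoseconds.
fn ns_per_primitive(total_ms: f64, primitives: f64) -> f64 {
    total_ms * 1_000_000.0 / primitives
}

fn main() {
    let b = breakdown(BACKEND_MS_PER_FRAME);
    println!("Document::render:        {:.1} ms", b.document_render);
    println!("prepare_prim_for_render: {:.1} ms", b.prepare_prim_for_render);
    println!("batching:                {:.1} ms", b.batching);
    println!("create_frame_builder:    {:.1} ms", b.create_frame_builder);
    println!(
        "average per primitive:   {:.0} ns",
        ns_per_primitive(BACKEND_MS_PER_FRAME, PRIMITIVE_COUNT)
    );
}
```

With the quoted numbers this works out to roughly 8 ms in Document::render, about 4 ms each in prepare_prim_for_render and create_frame_builder, and a little over 600 ns of backend time per primitive on average, which fits the observation that no single method is slow and the cost comes from the primitive count.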
An optimization idea I just had that could help significantly with this test case, and also other real world sites: https://github.com/servo/webrender/issues/2627
We should also be able to make the batching code significantly faster in the near-ish future. Once all primitives are brush primitives, we can compress the size of each instance from 32 bytes down to 16-20 bytes. On a scene with this many primitives, that could have quite an effect on CPU time (and also compositor GPU upload time).
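As a back-of-the-envelope illustration of the instance size reduction mentioned above, the sketch below compares per-frame instance data for ~20,000 primitives at 32 bytes versus 16 bytes per instance. The struct names and layouts are hypothetical stand-ins chosen only to have those sizes, and it assumes one instance per primitive, which a real scene will not match exactly.

```rust
use std::mem::size_of;

/// Hypothetical 32-byte instance layout (eight 32-bit words).
#[allow(dead_code)]
#[repr(C)]
struct WideInstance {
    data: [i32; 8],
}

/// Hypothetical 16-byte instance layout (four 32-bit words).
#[allow(dead_code)]
#[repr(C)]
struct CompactInstance {
    data: [i32; 4],
}

/// Bytes of instance data written per frame for `instance_count` instances of `T`.
fn instance_buffer_bytes<T>(instance_count: usize) -> usize {
    instance_count * size_of::<T>()
}

fn main() {
    // Assumption: one instance per primitive, using the ~20,000 figure from the profile.
    let count = 20_000;
    let wide = instance_buffer_bytes::<WideInstance>(count);
    let compact = instance_buffer_bytes::<CompactInstance>(count);
    println!("32-byte instances: {} bytes per frame", wide);
    println!("16-byte instances: {} bytes per frame", compact);
    println!("saved:             {} bytes per frame", wide - compact);
}
```

Under these assumptions the instance data drops from 640,000 to 320,000 bytes per frame (400,000 at 20 bytes per instance), which is less memory for the batching code to write and less data to upload to the GPU.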
Priority: P1 → P2
Blocks: stage-wr-next
No longer blocks: stage-wr-trains

With picture caching enabled, the backend CPU time for this is now down to ~6ms on my machine (~half what it was previously). We can still improve with general optimizations, but this seems reasonable for ~20k primitives.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED