Bug 1437091
Opened 7 years ago
Closed 6 years ago
Too much time spent in WrBackend on https://jrmuizel.github.io/implementation-tests/dl-test.html
Categories: Core :: Graphics: WebRender, defect, P2
Status: RESOLVED FIXED
People: Reporter: jrmuizel, Assigned: gw
Description
The WrBackend thread is now the bottleneck for us.
https://perfht.ml/2nOH3mE
Reporter | Updated • 7 years ago
Assignee: nobody → gwatson
Reporter | Comment 1 • 7 years ago
This was tested without the changes from bug 1434243. That bug should make things quite a bit better.
Updated • 7 years ago
Blocks: stage-wr-trains
Priority: -- → P1
Assignee | Comment 2 • 7 years ago
I did a quick profile with a current release build. On my machine (Haswell i7-4790 CPU @ 3.60GHz), I see:
Primitives: ~20,000
Gecko DisplayList: 9 - 11 ms / frame
Gecko RenderLayers: 3 - 5 ms / frame
WR Backend: ~12.5 ms / frame
Of the time in the WR backend:
- 66% in Document::render
  - ~50% of that in prepare_prim_for_render
  - ~25% of that in batching
- 34% in create_frame_builder
  - mostly in add_primitive and deserialization
So there's nothing *obviously* wrong here; that's where I would expect the time to be spent. None of those methods is especially slow; there are just a lot of primitives to process, so they are called a lot.
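Since the percentages above are nested shares of the ~12.5 ms backend time, converting them to absolute milliseconds shows the scale more directly. A minimal Python sketch, using only the approximate figures from this comment plus an assumed 60 Hz frame budget; the numbers are derived, not measured:

```python
# Back-of-the-envelope conversion of the profile shares into milliseconds.
# Inputs are the approximate figures quoted in this comment; the 60 Hz
# frame budget is an added assumption for comparison only.

WR_BACKEND_MS = 12.5          # ~12.5 ms / frame in the WR backend
PRIMITIVES = 20_000           # ~20,000 primitives in the test case
FRAME_BUDGET_MS = 1000 / 60   # assumed 60 Hz display, ~16.7 ms per frame

render_ms = 0.66 * WR_BACKEND_MS          # Document::render
prepare_ms = 0.50 * render_ms             # prepare_prim_for_render (share of render)
batching_ms = 0.25 * render_ms            # batching (share of render)
frame_builder_ms = 0.34 * WR_BACKEND_MS   # create_frame_builder

# Average backend cost per primitive, in microseconds.
per_prim_us = WR_BACKEND_MS * 1000 / PRIMITIVES

if __name__ == "__main__":
    for name, ms in [
        ("Document::render", render_ms),
        ("  prepare_prim_for_render", prepare_ms),
        ("  batching", batching_ms),
        ("create_frame_builder", frame_builder_ms),
    ]:
        print(f"{name:<28} {ms:5.2f} ms")
    print(f"per primitive: {per_prim_us:.3f} us")
    print(f"share of a 60 Hz frame budget: {WR_BACKEND_MS / FRAME_BUDGET_MS:.0%}")
```

With these inputs, Document::render works out to about 8 ms and create_frame_builder to about 4 ms, while the average backend cost is well under a microsecond per primitive: consistent with the point above that no single method is slow, but 20,000 calls add up.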
We've spent zero time on CPU optimization, so there are probably several ms that can be saved. Pushing some of that work onto multiple CPU threads is an option, although it may not help much on low-end machines with few cores.
If we need more than that, we will probably need to consider other options, such as incremental display list support.
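Whether spreading work over threads pays off on low core counts can be estimated with Amdahl's law, a standard model swapped in here for illustration (the comment does not use it). In this Python sketch the parallelizable fractions are hypothetical, since the profile does not say how much backend work could actually run in parallel:

```python
# Amdahl's law: ideal speedup on n cores when a fraction p of the work
# parallelizes and the rest stays serial. The p values used below are
# hypothetical; nothing in the profile gives the real fraction.

def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup for parallel fraction p (0..1) on n cores."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)

def backend_ms(total_ms: float, p: float, n: int) -> float:
    """Estimated per-frame time after parallelizing fraction p across n cores."""
    return total_ms / amdahl_speedup(p, n)

if __name__ == "__main__":
    TOTAL_MS = 12.5  # WR backend time per frame from the profile above
    for p in (0.5, 0.8):          # hypothetical parallelizable fractions
        for cores in (2, 4, 8):
            print(f"p={p:.1f} cores={cores}: ~{backend_ms(TOTAL_MS, p, cores):.1f} ms")
```

With half the work parallelizable, two cores still leave roughly 9.4 ms of the 12.5 ms, so the gain on low-end machines is modest, in line with the caveat above.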
Assignee | Comment 3 • 7 years ago
An optimization idea I just had that could help significantly with this test case, as well as with other real-world sites:
https://github.com/servo/webrender/issues/2627
Assignee | Comment 4 • 7 years ago
We should also be able to make the batching code significantly faster in the near-ish future. Once all primitives are brush primitives, we can compress the size of each instance from 32 bytes down to 16-20 bytes. On a scene with this many primitives, that could have quite an effect on CPU time (and also compositor GPU upload time).
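Because instance data is written once per primitive, the saving scales linearly with the number of primitives. A minimal Python sketch using the standard struct module; the layouts (eight, five, or four 32-bit integers) are hypothetical stand-ins chosen only to match the quoted sizes, not WebRender's actual instance formats:

```python
import struct

# Hypothetical instance layouts: only the sizes come from the comment
# (32 bytes today, 16-20 bytes once all primitives are brush primitives).
CURRENT_INSTANCE = struct.Struct("<8i")  # 8 x 4-byte ints = 32 bytes
COMPRESSED_20 = struct.Struct("<5i")     # 5 x 4-byte ints = 20 bytes
COMPRESSED_16 = struct.Struct("<4i")     # 4 x 4-byte ints = 16 bytes

def upload_bytes(instance: struct.Struct, primitives: int) -> int:
    """Bytes of instance data written (and uploaded) per frame."""
    return instance.size * primitives

def pack_instances(instance: struct.Struct, rows) -> bytes:
    """Pack rows of ints into one contiguous buffer, as a batcher might."""
    buf = bytearray(instance.size * len(rows))
    for i, row in enumerate(rows):
        instance.pack_into(buf, i * instance.size, *row)
    return bytes(buf)

if __name__ == "__main__":
    PRIMITIVES = 20_000  # approximate count from Comment 2
    for label, layout in [("32 B", CURRENT_INSTANCE),
                          ("20 B", COMPRESSED_20),
                          ("16 B", COMPRESSED_16)]:
        total = upload_bytes(layout, PRIMITIVES)
        print(f"{label}: {total / 1024:.0f} KiB per frame, "
              f"{total * 60 / 1e6:.1f} MB/s at 60 Hz")
```

For ~20,000 primitives this is 640,000 bytes per frame at 32 bytes per instance, versus 320,000 to 400,000 bytes at 16-20 bytes per instance.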
Reporter | Updated • 6 years ago
Priority: P1 → P2
Reporter | Updated • 6 years ago
Assignee | Comment 5 • 6 years ago
With picture caching enabled, the backend CPU time for this test case is now down to ~6 ms on my machine (about half what it was previously). General optimizations can still improve on this, but it seems reasonable for ~20k primitives.
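Conceptually, picture caching lets the backend skip work for regions whose content has not changed since the previous frame. The Python sketch below illustrates that general idea under stated assumptions: TileCache, the tile size, and the fingerprinting scheme are hypothetical and do not reproduce how WebRender actually tracks tile invalidation:

```python
from dataclasses import dataclass, field

# Conceptual sketch of picture caching: the scene is split into fixed-size
# tiles, each tile remembers a fingerprint of the primitives touching it,
# and only tiles whose fingerprint changed are reported as dirty.
# TileCache and its fingerprint scheme are hypothetical illustrations.

@dataclass
class TileCache:
    tile_size: int
    fingerprints: dict = field(default_factory=dict)  # (tx, ty) -> hash

    def tiles_for(self, rect):
        """Yield tile coords overlapped by rect = (x, y, w, h), w and h > 0."""
        x, y, w, h = rect
        for ty in range(y // self.tile_size, (y + h - 1) // self.tile_size + 1):
            for tx in range(x // self.tile_size, (x + w - 1) // self.tile_size + 1):
                yield (tx, ty)

    def update(self, primitives):
        """Return the set of tiles that must be re-prepared this frame.

        primitives: iterable of (prim_id, rect, content_key) tuples,
        in display-list order.
        """
        per_tile = {}
        for prim in primitives:
            for tile in self.tiles_for(prim[1]):
                per_tile.setdefault(tile, []).append(prim)
        new_fingerprints = {t: hash(tuple(p)) for t, p in per_tile.items()}
        # Dirty = tiles whose content changed, plus tiles that became empty.
        dirty = {t for t, fp in new_fingerprints.items()
                 if self.fingerprints.get(t) != fp}
        dirty |= set(self.fingerprints) - set(new_fingerprints)
        self.fingerprints = new_fingerprints
        return dirty
```

In a frame where only one primitive changes, only the tiles it touches come back as dirty; the rest can reuse cached results.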
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED