Taskify WebRender
Categories
(Core :: Graphics: WebRender, task, P3)
People
(Reporter: kvark, Unassigned)
References
(Blocks 1 open bug)
Details
Currently, the work in WebRender is pipelined across 3+ threads:
- scene builder
- frame builder
- renderer
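For illustration only, the fixed layout above can be modeled as three dedicated threads joined by channels. This is a minimal sketch that assumes hypothetical `Scene`/`Frame` payloads and a hypothetical `run_rigid_pipeline` helper. It is not WebRender's actual code, which uses its own message types.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the data handed between stages.
struct Scene { epoch: u32 }
struct Frame { epoch: u32 }

/// Pushes `n` transactions through three dedicated threads connected by
/// channels, mirroring a fixed scene builder -> frame builder -> renderer
/// layout. Returns the epochs the "renderer" presented, in order.
fn run_rigid_pipeline(n: u32) -> Vec<u32> {
    let (scene_tx, scene_rx) = mpsc::channel::<Scene>();
    let (frame_tx, frame_rx) = mpsc::channel::<Frame>();

    // Stage 1: scene builder, always on its own thread.
    let scene_builder = thread::spawn(move || {
        for epoch in 0..n {
            scene_tx.send(Scene { epoch }).unwrap();
        }
        // scene_tx is dropped here, which ends stage 2's loop.
    });

    // Stage 2: frame builder, also always on its own thread.
    let frame_builder = thread::spawn(move || {
        for scene in scene_rx {
            frame_tx.send(Frame { epoch: scene.epoch }).unwrap();
        }
        // frame_tx is dropped here, which ends stage 3's loop.
    });

    // Stage 3: renderer, modeled here by the calling thread.
    let presented: Vec<u32> = frame_rx.iter().map(|f| f.epoch).collect();

    scene_builder.join().unwrap();
    frame_builder.join().unwrap();
    presented
}

fn main() {
    println!("{:?}", run_rigid_pipeline(3)); // [0, 1, 2]
}
```

Every transaction crosses both thread boundaries no matter how cheap each stage is for that particular content.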
The cost of each of these stages varies with the content as well as the platform, even though we try to minimize it across the board. Some pages are batch-heavy and stress the renderer, some have many blobs that slow down scene building, and some have deep picture stacks that make frame building expensive. On dual-core systems, we'd have too much thread contention. Having multiple windows open may also increase the CPU load unreasonably.
The structure of this threading is fairly rigid: the only corner we can cut is skipping the scene building thread when building some of the lighter scenes. Yet we aim for at most 2 frames of latency, and this fixed pipelining can stretch the overall document processing pipeline (which includes Gecko stages in addition to WR) to be too long.
It would be good if our stages could execute in a pipelined way only when that is beneficial, as determined in a concrete situation (based on the aforementioned content and platform differences). The only solution I see is removing the rigid threading and instead turning every step we have today into a "task" (hence the term "taskifying" in the subject) that has dependencies.
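As a minimal sketch of "steps as tasks with dependencies", assume a hypothetical `Task` struct that lists its dependencies by index. Kahn's topological sort then yields an order in which every step runs after its inputs, or reports a cycle. The task names are illustrative, not WebRender identifiers.

```rust
use std::collections::VecDeque;

/// Hypothetical unit of work: a name plus indices of the tasks it depends on.
struct Task {
    name: &'static str,
    deps: Vec<usize>,
}

/// Kahn's algorithm: returns an order in which every task appears after all
/// of its dependencies, or None if the dependency graph contains a cycle.
fn execution_order(tasks: &[Task]) -> Option<Vec<&'static str>> {
    let n = tasks.len();
    let mut pending = vec![0usize; n]; // unmet dependency count per task
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n]; // reverse edges
    for (i, t) in tasks.iter().enumerate() {
        pending[i] = t.deps.len();
        for &d in &t.deps {
            dependents[d].push(i);
        }
    }
    // Start with every task that has no dependencies.
    let mut ready: VecDeque<usize> = (0..n).filter(|&i| pending[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(i) = ready.pop_front() {
        order.push(tasks[i].name);
        // Finishing task i may unblock the tasks that depend on it.
        for &j in &dependents[i] {
            pending[j] -= 1;
            if pending[j] == 0 {
                ready.push_back(j);
            }
        }
    }
    // Tasks stuck on a cycle never become ready.
    (order.len() == n).then_some(order)
}

fn main() {
    let tasks = vec![
        Task { name: "scene_build", deps: vec![] },
        Task { name: "blob_raster", deps: vec![] },
        Task { name: "frame_build", deps: vec![0, 1] },
        Task { name: "render", deps: vec![2] },
    ];
    println!("{:?}", execution_order(&tasks));
    // Some(["scene_build", "blob_raster", "frame_build", "render"])
}
```

The sort only fixes one valid serial order. Because `scene_build` and `blob_raster` have no dependencies on each other, a scheduler would be free to run them concurrently or back to back on one thread, depending on the situation.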
We already have the task graph in WR, but it's made specifically for rendering jobs. Instead, it would be great to have a more general task scheduler that we could push tasks into. The scheduler would decide where each task gets executed, thus affecting the "pipelined" property of the whole frame. Such systems have long been popular in complex game engines:
- http://advances.realtimerendering.com/destiny/gdc_2015/Tatarchuk_GDC_2015__Destiny_Renderer_web.pdf
- http://twvideo01.ubm-us.net/o1/vault/gdc2015/presentations/Gyrling_Christian_Parallelizing_The_Naughty.pdf
Also, given that Gecko is trying to introduce a scheduler for all of its tasks (see design document), doing this prep work would make it easy for us to hook up to an external/global scheduler (once it supports GPU process tasks), which may be desirable for the overall complexity and performance of Firefox in the future.
Concrete steps we could take in order to reach this state:
- Define a "task" interface with dependencies and potential context/thread pinning
- Wrap the existing major pipeline stages into tasks (scene building, frame building, rendering)
- Introduce a scheduler that owns a thread pool and runs the tasks in roughly the same configuration they run in today
- Optimize the scheduler and split up the major tasks into smaller ones
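The first three steps could look roughly like this sketch. Everything in it (`Task`, `Affinity`, `run`) is hypothetical. `Affinity::Main` stands in for pinning a task to a context-bound thread, such as the one owning the GPU context. For brevity, the scheduler spawns scoped threads for each wave of ready tasks instead of owning a persistent pool.

```rust
use std::collections::HashMap;
use std::thread;

/// Where a task may run. `Main` models a context-pinned thread (hypothetical).
#[derive(Clone, Copy, PartialEq)]
enum Affinity {
    Any,
    Main,
}

/// Hypothetical task interface: name, dependencies (by name), thread
/// affinity, and the work itself, which produces a value.
struct Task {
    name: &'static str,
    deps: Vec<&'static str>,
    affinity: Affinity,
    work: Box<dyn Fn() -> u64 + Send + Sync>,
}

/// Runs tasks in waves: each wave takes every task whose dependencies are
/// done. `Any` tasks run on scoped worker threads, at most `workers` at a
/// time. `Main` tasks run on the calling thread. Returns each task's result.
fn run(tasks: &[Task], workers: usize) -> HashMap<&'static str, u64> {
    let mut done: HashMap<&'static str, u64> = HashMap::new();
    while done.len() < tasks.len() {
        let ready: Vec<&Task> = tasks
            .iter()
            .filter(|t| !done.contains_key(t.name) && t.deps.iter().all(|d| done.contains_key(d)))
            .collect();
        assert!(!ready.is_empty(), "dependency cycle or missing dependency");
        let (pinned, free): (Vec<&Task>, Vec<&Task>) =
            ready.into_iter().partition(|t| t.affinity == Affinity::Main);
        // Unpinned tasks: fan out over worker threads in batches.
        for batch in free.chunks(workers.max(1)) {
            thread::scope(|s| {
                let handles: Vec<_> = batch
                    .iter()
                    .map(|t| s.spawn(move || (t.name, (t.work)())))
                    .collect();
                for h in handles {
                    let (name, value) = h.join().unwrap();
                    done.insert(name, value);
                }
            });
        }
        // Pinned tasks: always on the calling thread.
        for t in pinned {
            done.insert(t.name, (t.work)());
        }
    }
    done
}

fn main() {
    let tasks = vec![
        Task { name: "scene_build", deps: vec![], affinity: Affinity::Any, work: Box::new(|| 1u64) },
        Task { name: "frame_build", deps: vec!["scene_build"], affinity: Affinity::Any, work: Box::new(|| 2u64) },
        // Pinned: must run on the calling ("main") thread.
        Task { name: "render", deps: vec!["frame_build"], affinity: Affinity::Main, work: Box::new(|| 3u64) },
    ];
    let results = run(&tasks, 2);
    println!("render -> {}", results["render"]); // render -> 3
}
```

A real scheduler would dispatch each task as soon as its own dependencies finish, for example with work stealing, rather than waiting for a whole wave. The wave approach only keeps the dependency and pinning rules easy to see.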
Updated•5 years ago
Comment 1•5 years ago
Hi Dzmitry, thanks for describing the context in detail and for sharing your thoughts on the problem. It would be great to discuss this more when we meet, so in preparation, here are a few initial thoughts and questions I had.
My understanding of the benefit of "traditional" taskification in games is that it's aimed at improving core utilization (see e.g. page 17 in the Destiny slides). That is, taking long-running functions and breaking them down into smaller parts that can run in parallel.
So first of all, I'm not sure how you see taskification helping with overloading the CPU (dual-core and/or multiple windows), other than having fewer worker threads instead of a fixed number of threads, each dedicated to one system.
Also, if the main problem is delays due to gating at the boundaries between the 3 pipeline stages, then being able to jump that gate and fast-track some of the work within the same frame is something we could investigate. I'm not sure we need full-on taskification for that; maybe we have different visions of what taskification means :) That's why we need to have coffee together :)
If a new scheduler is coming, it feels potentially wasteful to develop our own scheduler and taskification API, since we might have to rework it considerably to fit into the new one. So we need to find the balance between getting some traction and doing redundant work.
On the tooling front, to improve core utilization we need good visualizers that can show us idle cores, which (worker) thread is blocked on what and why, and which functions are good candidates for taskifying. That could be a mix of the Firefox Profiler, WPA, or even PIX, and maybe other tools. This is also an area where I'd expect some overlap with the scheduling R&D; surely they'll have similar tooling requirements once they start taskifying?
TLDR 1/ are we thinking about the same scope :) and 2/ let's wait and see a bit?
Thanks :)
Comment 2•5 years ago (Reporter)
We had a good conversation about this at the Berlin All Hands. Shaping the work as a graph of tasks with dependencies has a lot of synergy with the whole wr-complexity issue. We don't, however, want to start that restructuring right away. There are concerns about:
- whether this will truly be faster for cases we care about
- losing the compile-time enforcement of resource sharing policies
Our next step is to record a profile of the scheduling work we do today, then run it off-line through a generic taskifier-like scheduler and evaluate the potential benefit. These experiments will inform our decision on taskification.
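The off-line experiment could be prototyped as a greedy list-scheduling simulation over recorded task durations. This sketch makes several assumptions: the `Recorded` format is hypothetical, tasks are listed in a valid dependency order, and scheduling overhead is ignored. The durations in the example are made-up numbers, not measurements.

```rust
/// One task from a (hypothetical) recorded profile: its duration and the
/// indices of earlier tasks it depends on. Tasks must be listed in a valid
/// dependency order, e.g. the order in which they were recorded.
struct Recorded {
    duration_us: u64,
    deps: Vec<usize>,
}

/// Greedy list scheduling: each task, in list order, goes to the worker
/// that becomes idle first and starts no earlier than its dependencies
/// finish. Returns the simulated makespan (total wall time) in microseconds.
fn simulate(tasks: &[Recorded], workers: usize) -> u64 {
    let mut finish = vec![0u64; tasks.len()];
    let mut worker_free = vec![0u64; workers.max(1)];
    for (i, t) in tasks.iter().enumerate() {
        // Earliest time all inputs are available.
        let ready_at = t.deps.iter().map(|&d| finish[d]).max().unwrap_or(0);
        // Pick the worker that becomes idle soonest.
        let (w, &free_at) = worker_free
            .iter()
            .enumerate()
            .min_by_key(|&(_, &f)| f)
            .unwrap();
        let start = ready_at.max(free_at);
        finish[i] = start + t.duration_us;
        worker_free[w] = finish[i];
    }
    finish.into_iter().max().unwrap_or(0)
}

fn main() {
    // Made-up durations for one frame's worth of hypothetical tasks.
    let frame = vec![
        Recorded { duration_us: 4000, deps: vec![] },        // 0: scene building
        Recorded { duration_us: 3000, deps: vec![] },        // 1: independent job A
        Recorded { duration_us: 3000, deps: vec![] },        // 2: independent job B
        Recorded { duration_us: 2000, deps: vec![0, 1, 2] }, // 3: frame building
        Recorded { duration_us: 3000, deps: vec![3] },       // 4: rendering
    ];
    for workers in [1, 2, 4] {
        println!("{workers} worker(s): {} us", simulate(&frame, workers));
    }
    // 1 worker(s): 15000 us
    // 2 worker(s): 11000 us
    // 4 worker(s): 9000 us
}
```

With enough workers, the makespan reaches the critical path (4000 + 2000 + 3000 us here). Comparing these numbers against today's fixed threading on real profiles is the kind of evidence that would show whether taskification pays off.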
Updated•3 years ago