WebRender program cache / shader initialization blocks the main thread via the sync IPC message PWebRenderBridge::EnsureConnected (25ms-40ms startup penalty)
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
Performance Impact | medium |
People
(Reporter: mstange, Assigned: mstange)
References
(Depends on 1 open bug, Blocks 4 open bugs)
Details
(Keywords: perf:responsiveness)
Profile: https://perfht.ml/39BkB5A (edit 2025-03-26 New profile: https://share.firefox.dev/4j4SFLy )
During startup, the main thread blocks until WebRender is initialized, via the following call path:
nsWindow::Create -> nsWindow::CreateLayerManager -> nsBaseWidget::CreateCompositor -> nsBaseWidget::CreateCompositorSession -> WebRenderLayerManager::Initialize -> bridge->SendEnsureConnected()
The time spent blocking in this function should be minimized. The main thread has more things to do before it can start the initial page load.
However, the reply to this message seems to be delayed by shader initialization: The WR program cache reads cached shaders from disk and passes them on to GL. I don't know whether this work is happening as a result of the EnsureConnected() call, or whether it started early and just happens to still run at the time the EnsureConnected() call comes in. (edit 2025-03-26: it's the latter, see comment 3.) In any case, it would be good to do this work asynchronously so that EnsureConnected can finish as quickly as possible and the main thread can move on.
Updated•5 years ago
|
Comment 2•5 years ago
|
||
Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is P3
(Backlog,) indicating it has been triaged, the bug's Severity is being updated to S3
(normal.)
Updated•3 years ago
|
Updated•1 year ago
|
Assignee | ||
Comment 3•3 months ago
•
|
||
New profile: https://share.firefox.dev/4j4SFLy
This bug is a huge problem on the A55 devices that we run the applink startup performance test on, in CI. These devices are affected by bug 1954587 and end up blocking the Parent process Gecko main thread for more than 500ms.
Moving the initialization of shaders out of the EnsureConnected
blocking path would unblock page loading and other things.
Here's how the parent process Gecko main thread ends up blocking on shader initialization:
- Parent process Gecko main thread calls
WebRenderBridgeChild::SendEnsureConnected
. (sync IPC) - GPU process Compositor thread is blocked in
CompositorBridgeParent::AllocPWebRenderBridgeParent
, specifically in thetask.Wait()
call inWebRenderAPI::Create
, waiting for theNewRenderer
event to run on the Renderer thread. - The Renderer thread is busy in
InitDeviceTask()
, specifically inShaders::new
, whereLazilyCompiledShader::new
calls for startup shaders end up callingDevice::create_program
. - Once the Renderer thread is done with
InitDeviceTask()
, it can process theNewRenderer
task. - Once the Renderer thread is done with the
NewRenderer
task, thetask.Wait()
call in the compositor thread completes. - Once the compositor thread is done with
CompositorBridgeParent::AllocPWebRenderBridgeParent
, it can runWebRenderBridgeChild::RecvEnsureConnected
. - Once that's done,
WebRenderBridgeChild::SendEnsureConnected
completes in the parent process Gecko main thread.
Assignee | ||
Comment 4•3 months ago
|
||
So we'll need to find a way to ensure that the Renderer thread can run the NewRenderer
task before the shader initialization happens.
Assignee | ||
Comment 5•3 months ago
|
||
On the other hand, we also want to get started on shaders early, so that we can do the driver work in parallel to what the main thread is doing, and be ready to render quickly once Gecko has gotten around to opening the window and rendering the first page. See bug 1624030 for some previous work in this direction.
So maybe what we really want is a way to run the NewRenderer
task in the middle of shader compilation.
Ok, so under the assumptions that 1. We have to keep EnsureConnected synchronous, and 2. The NewRenderer work must finish before we can return from AllocPWebRenderBridgeParent (I don't know if these are reasonable assumptions), here's a suggestion:
We could split the shader initialization into small tasks, one task per shader. For example, the Shaders struct could keep a list of shaders that still remain to be initialized, and the Renderer thread could keep redispatching a ResumeShaderInit
task to itself, each of which could call a method on Shaders which handles a single shader and then returns a boolean "done" value which indicates whether it needs to be called again. If we do this, then the call to Shaders::new
during InitDeviceTask()
would complete quickly, and the NewRenderer
task could run between the ResumeShaderInit
tasks on the Renderer thread, which would allow the synchronous task.Wait()
on the compositor thread to complete quickly. And then the compositor thread becomes free to process the RecvEnsureConnected
message.
Assignee | ||
Comment 6•3 months ago
|
||
(In reply to Markus Stange [:mstange] from comment #5)
We could split the shader initialization into small tasks, one task per shader.
I have a prototype of this at https://treeherder.mozilla.org/jobs?repo=try&revision=955fe189dab5e2153c8f8aadc64acccdba9b0539 and it seems to work well. On the A55 (with the glProgramBinary weirdness), the time that the parent process gecko main thread is blocked in EnsureConnected is reduced from 759ms to 81ms in these profiles: https://share.firefox.dev/4i4HwcC (before) vs https://share.firefox.dev/4cuLXwf (after)
Comment 7•3 months ago
|
||
Does it also improve things on Windows?
Assignee | ||
Comment 8•3 months ago
•
|
||
Unlikely - with ANGLE on D3D, glProgramBinary is fast.
In comment 0 I also mentioned the work of reading cached shaders from disk, which may have some impact on Windows. The prototype in the try push doesn't break that work out from the InitDeviceTask yet.
Assignee | ||
Comment 9•3 months ago
|
||
I've put those patches up in bug 1959846, and I've filed bug 1959860 for the follow-up to also move the disk cache reading out of the InitDeviceTask.
I'll leave this bug open until both bugs are fixed and until we've confirmed that EnsureConnected completes faster.
Description
•