Open Bug 1626816 Opened 5 years ago Updated 3 months ago

WebRender program cache / shader initialization blocks the main thread via the sync IPC message PWebRenderBridge::EnsureConnected (25ms-40ms startup penalty)

Categories

(Core :: Graphics: WebRender, defect, P3)

defect

Tracking

()

ASSIGNED
Performance Impact medium

People

(Reporter: mstange, Assigned: mstange)

References

(Depends on 1 open bug, Blocks 4 open bugs)

Details

(Keywords: perf:responsiveness)

Profile: https://perfht.ml/39BkB5A (edit 2025-03-26 New profile: https://share.firefox.dev/4j4SFLy )

During startup, the main thread blocks until WebRender is initialized, via the following call path:
nsWindow::Create -> nsWindow::CreateLayerManager -> nsBaseWidget::CreateCompositor -> nsBaseWidget::CreateCompositorSession -> WebRenderLayerManager::Initialize -> bridge->SendEnsureConnected()

The time spent blocking in this function should be minimized. The main thread has more things to do before it can start the initial page load.

However, the reply to this message seems to be delayed by shader initialization: The WR program cache reads cached shaders from disk and passes them on to GL. I don't know whether this work is happening as a result of the EnsureConnected() call, or whether it started early and just happens to still run at the time the EnsureConnected() call comes in. (edit 2025-03-26: it's the latter, see comment 3.) In any case, it would be good to do this work asynchronously so that EnsureConnected can finish as quickly as possible and the main thread can move on.

Priority: -- → P3

(might be even p1)

Whiteboard: [qf] → [qf:p2:responsiveness]

Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is P3 (Backlog,) indicating it has been triaged, the bug's Severity is being updated to S3 (normal.)

Severity: normal → S3
Performance Impact: --- → P2
Whiteboard: [qf:p2:responsiveness]
Blocks: wr-todos
Blocks: perf-android
Blocks: 1894804
No longer blocks: 1894804

New profile: https://share.firefox.dev/4j4SFLy

This bug is a huge problem on the A55 devices that we run the applink startup performance test on, in CI. These devices are affected by bug 1954587 and end up blocking the Parent process Gecko main thread for more than 500ms.

Moving the initialization of shaders out of the EnsureConnected blocking path would unblock page loading and other things.

Here's how the parent process Gecko main thread ends up blocking on shader initialization:

  • Parent process Gecko main thread calls WebRenderBridgeChild::SendEnsureConnected. (sync IPC)
  • GPU process Compositor thread is blocked in CompositorBridgeParent::AllocPWebRenderBridgeParent, specifically in the task.Wait() call in WebRenderAPI::Create, waiting for the NewRenderer event to run on the Renderer thread.
  • The Renderer thread is busy in InitDeviceTask(), specifically in Shaders::new, where LazilyCompiledShader::new calls for startup shaders end up calling Device::create_program.
  • Once the Renderer thread is done with InitDeviceTask(), it can process the NewRenderer task.
  • Once the Renderer thread is done with the NewRenderer task, the task.Wait() call in the compositor thread completes.
  • Once the compositor thread is done with CompositorBridgeParent::AllocPWebRenderBridgeParent, it can run WebRenderBridgeChild::RecvEnsureConnected.
  • Once that's done, WebRenderBridgeChild::SendEnsureConnected completes in the parent process Gecko main thread.

So we'll need to find a way to ensure that the Renderer thread can run the NewRenderer task before the shader initialization happens.

On the other hand, we also want to get started on shaders early, so that we can do the driver work in parallel to what the main thread is doing, and be ready to render quickly once Gecko has gotten around to opening the window and rendering the first page. See bug 1624030 for some previous work in this direction.

So maybe what we really want is a way to run the NewRenderer task in the middle of shader compilation.

Ok, so under the assumptions that 1. We have to keep EnsureConnected synchronous, and 2. The NewRenderer work must finish before we can return from AllocPWebRenderBridgeParent (I don't know if these are reasonable assumptions), here's a suggestion:

We could split the shader initialization into small tasks, one task per shader. For example, the Shaders struct could keep a list of shaders that still remain to be initialized, and the Renderer thread could keep redispatching a ResumeShaderInit task to itself, each of which could call a method on Shaders which handles a single shader and then returns a boolean "done" value which indicates whether it needs to be called again. If we do this, then the call to Shaders::new during InitDeviceTask() would complete quickly, and the NewRenderer task could run between the ResumeShaderInit tasks on the Renderer thread, which would allow the synchronous task.Wait() on the compositor thread to complete quickly. And then the compositor thread becomes free to process the RecvEnsureConnected message.

(In reply to Markus Stange [:mstange] from comment #5)

We could split the shader initialization into small tasks, one task per shader.

I have a prototype of this at https://treeherder.mozilla.org/jobs?repo=try&revision=955fe189dab5e2153c8f8aadc64acccdba9b0539 and it seems to work well. On the A55 (with the glProgramBinary weirdness), the time that the parent process gecko main thread is blocked in EnsureConnected is reduced from 759ms to 81ms in these profiles: https://share.firefox.dev/4i4HwcC (before) vs https://share.firefox.dev/4cuLXwf (after)

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED

Does it also improve things on Windows?

Unlikely - with ANGLE on D3D, glProgramBinary is fast.

In comment 0 I also mentioned the work of reading cached shaders from disk, which may have some impact on Windows. The prototype in the try push doesn't break that work out from the InitDeviceTask yet.

Depends on: 1959846
Depends on: 1959860

I've put those patches up in bug 1959846, and I've filed bug 1959860 for the follow-up to also move the disk cache reading out of the InitDeviceTask.

I'll leave this bug open until both bugs are fixed and until we've confirmed that EnsureConnected completes faster.

You need to log in before you can comment on or make changes to this bug.