Open Bug 1464086 Opened 2 years ago Updated Last year

SImplify async scene building threading flow

Categories

(Core :: Graphics: WebRender, defect, P2)

Other Branch
defect

Tracking

()

Tracking Status
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fix-optional

People

(Reporter: kats, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [gfx-noted])

For both snapshotting (e.g. for reftests) and the SyncWithCompositor API, we basically need a synchronous mechanism to flush the WR pending actions (e.g. scene builds, frame renders). Right now that appears to not actually be possible to do because of the convoluted thread flow. This is what happens when we get e.g. a new display list in the parent over PWebRenderBridge, with async scene building enabled:

1. Compositor thread gets RecvSetDisplayList [1]
2. Compositor thread sends display list to RenderBackend thread [2]
3. RenderBackend thread sends display list to SceneBuilder thread [3]
4. SceneBuilder builds scene (possibly a long operation) [4]
5. SceneBuilder sends built scene back to RenderBackend thread [5]
6. SceneBuilder invokes post-swap hook [6]
7. post-swap hook queues a task on the Compositor thread [7]
8. Compositor thread calls ScheduleGenerateFrame [8]
9. At the next vsync, CompositeToTarget is called on the compositor thread, which sends a GenerateFrame transaction to the RenderBackend thread [9]
10. The RenderBackend thread publishes stuff to the Renderer thread [10]
11. The Renderer thread does the render [11]
12. The Renderer thread posts a PipelineRendered message back to the Compositor thread [12]

The problematic part of this flow is steps 7 and 8, because that goes back to the compositor thread. So if sometime during steps 3-6, we get a call to WebRenderBridgeParent::RecvGetSnapshot, there's no way we can reach step 12 before returning from RecvGetSnapshot, because that call also runs on the compositor thread. In other words, the very fact that the compositor thread is running RecvGetSnapshot means that it can't run the task from step 7, and so the flow gets blocked until RecvGetSnapshot returns.

One of the reasons the flow is done this way is because of the frame counter machinery in RenderThread.cpp, and to ensure that doesn't get into a bad state. It seems to me that if we can figure out a better solution to throttling frames (i.e. a replacement for the TooManyPendingFrames API) then we should be able to clean up this flow somewhat. This bug is to tracking doing that.

[1] https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/gfx/layers/wr/WebRenderBridgeParent.cpp#601
[2] https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/gfx/layers/wr/WebRenderBridgeParent.cpp#678
[3] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender/src/render_backend.rs#979
[4] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender/src/scene_builder.rs#140
[5] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender/src/scene_builder.rs#165
[6] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender/src/scene_builder.rs#179
[7] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender_bindings/RenderThread.cpp#576
[8] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender_bindings/RenderThread.cpp#576
[9] https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/gfx/layers/wr/WebRenderBridgeParent.cpp#1379
[10] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender/src/render_backend.rs#1078
[11] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender_bindings/RenderThread.cpp#278
[12] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/gfx/webrender_bindings/RenderThread.cpp#287
mstange pointed out another potential source of sadness on IRC: "with non-webrender, resizing the window only blocks on parent process painting + compositing, and the compositing step composites whatever layer contents it has of the content processes, it doesn't wait for the content processes to paint"

In the WR case we might end up blocking on scene building during window resize, and scene building would include content process scenes. I don't know offhand which sync IPC calls are made during window resize.
The window resize seems to be calling the sync version of FlushRendering, which comes into WRBP at [1]. I'm not sure what the expectation is with respect to what gets "flushed" here, but right now it's probably not meeting the expectation even without async scene building. The function pushes a GenerateFrame transaction to the render backend thread, and then waits for a task on the Renderer thread. What's missing from this is that it also needs to wait for the document to get published from the RB thread to the Renderer thread before it waits on the Renderer thread. Otherwise it can return before the stuff it wanted to wait for is actually done.

[1] https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/gfx/layers/wr/WebRenderBridgeParent.cpp#1453
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> The function pushes a
> GenerateFrame transaction to the render backend thread, and then waits for a
> task on the Renderer thread. What's missing from this is that it also needs
> to wait for the document to get published from the RB thread to the Renderer
> thread before it waits on the Renderer thread. Otherwise it can return
> before the stuff it wanted to wait for is actually done.

This comment is wrong. RunOnRenderThread actually posts a task to the render backend thread to post a task to the Renderer thread, so the behaviour here should actually be correct.
Batch updating status flags based on WR schedule - please adjust if needed.
After some more fiddling I think it's not necessary to ship this to turn on ASB in automation. The solution I have just results in an extra frame render sometimes but it should be fine for now. I'll still leave this as blocking the async-scene-building metabug because we'll want to do something about this before too long.
Assignee: bugmail → nobody
Blocks: 1455591
No longer blocks: 1417784, 1452845
Priority: -- → P2
Assignee: nobody → bugmail
The various flushing changes that have landed since this bug was filed have resolved the issues I had originally filed this bug for. The "we'll block on content scene build during resize" from comment 1 is still an issue. I would have expected that resizing the browser window on e.g. https://static.mozilla.com/moco/en-US/images/mozilla_eoy_2013_EN.svg would be really slow and janky but happily that's not really the case. Maybe because the slow part is blob rendering and that gets cached so it doesn't happen on every scene build?

Also I still would like to clean up the TooManyPendingFrames machinery but so far it seems to be working ok and I no obvious ideas on how to improve it come to mind, so maybe we can defer that. Bumping this bug to the backlog.
Blocks: stage-wr-backlog
No longer blocks: stage-wr-nightly
Summary: SImplify async scene building threading flow to allow sync flushing → SImplify async scene building threading flow
Assignee: bugmail → nobody
You need to log in before you can comment on or make changes to this bug.