Async transactions don't integrate with the tab switcher

RESOLVED FIXED

Status

()

defect
P1
normal
RESOLVED FIXED
9 months ago
7 months ago

People

(Reporter: mstange, Assigned: nical)

Tracking

(Depends on 1 bug, Blocks 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox63 disabled)

Details

Attachments

(3 attachments)

(Reporter)

Description

9 months ago
Steps to reproduce:
 1. Enable WebRender.
 2. Open a background tab that's slow to paint, for example https://static.mozilla.com/moco/en-US/images/mozilla_eoy_2013_EN.svg .
 3. Switch to that tab.

Expected results:
After 400ms, the UI should update to visually select the clicked tab and show a white tab with a spinner in the center. Once the tab has finished painting, the spinner should go away and the actual tab contents should be shown.
The browser UI should be responsive while the tab is painting, and it should be possible to switch back and forth tabs, or to other tabs entirely.

Actual results:
The whole UI "hangs", and the tab bar is inoperable, until rasterization has completed. The only thing you can do while the tab is painting is to scroll the original tab.
(Reporter)

Comment 1

9 months ago
I think this requires two changes:
 1. Dispatch the MozLayerTreeReady and MozLayerTreeCleared events at the right times. This seems to boil down to delaying the call to TabParent::LayerTreeUpdate until after the transaction has been processed by WebRender.
 2. Allow "ignoring" pending transactions from pipelines that are not visible on the screen, while presenting newer frames from pipelines that don't have any pending transactions.
(In reply to Markus Stange [:mstange] from comment #1)
> I think this requires two changes:
>  1. Dispatch the MozLayerTreeReady and MozLayerTreeCleared events at the
> right times. This seems to boil down to delaying the call to
> TabParent::LayerTreeUpdate until after the transaction has been processed by
> WebRender.

This seems easy enough to do. It should just involve moving the code at [1] to be invoked when the scene swap happens. (Or maybe after the frame build?)

>  2. Allow "ignoring" pending transactions from pipelines that are not
> visible on the screen, while presenting newer frames from pipelines that
> don't have any pending transactions.

I'm not sure I understand this part. If we have a pending transaction from a pipeline that's not yet visible, I think that transaction would always get processed on the scene builder thread, and it will be in the queue behind the transaction that makes that pipeline visible. Meanwhile the RB thread can still present new frames from the currently visible pipeline. So this should already be happening, I think.

[1] https://searchfox.org/mozilla-central/rev/d160ac574a5798010eda4e33e91ee628aa268a33/gfx/layers/wr/WebRenderBridgeParent.cpp#829-831
(Reporter)

Comment 3

9 months ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> >  2. Allow "ignoring" pending transactions from pipelines that are not
> > visible on the screen, while presenting newer frames from pipelines that
> > don't have any pending transactions.
> 
> I'm not sure I understand this part. If we have a pending transaction from a
> pipeline that's not yet visible, I think that transaction would always get
> processed on the scene builder thread, and it will be in the queue behind
> the transaction that makes that pipeline visible. Meanwhile the RB thread
> can still present new frames from the currently visible pipeline.

Let's call the current pipeline pipeline A and the pending pipeline pipeline B.
What if we now want to display pipeline C? Will it be in the queue behind pipeline B? We'll want to have a way to present pipeline C without waiting for pipeline B to be finished.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> This seems easy enough to do. It should just involve moving the code at [1]
> to be invoked when the scene swap happens. (Or maybe after the frame build?)

I spent some time working on this and I have a decent WIP here. However it seems like getting the async tab switching working with WR is going to require some serious surgery.

For one thing, one of the assumptions built into the design of the async tab switcher is that there is an expensive "paint" part and a cheap "composite" part to rendering a tab, and it's possible to do the expensive paint on a background tab and cache the result. With WR the "expensive paint" would correspond to a scene/frame build, but it's not currently possible to do that on a background tab at all. We can send the display list to WR, but the flattening step will just ignore it, because the root pipeline doesn't have an iframe item that links to the pipeline of the background tab. So the scene build for the background tab is just skipped, and will only happen for the first time after the tab switch is complete.

The other problem is that the tab switcher assumes it will be able to display a spinner if the expensive paint takes too long. For this to work, the work needed to paint and render the spinner must be able to run in parallel with the expensive paint. Without WR this happens because the spinner gets painted (I think) in the UI process while the expensive paint is happening in the content process. With WR these do not run in parallel, but instead run serially on the WR threads. I guess this is what Markus meant in comment 3 - pipeline C being the spinner which needs to be presented without waiting for pipeline B which is the expensive paint.

At any rate, I think both of these problems basically require that we be able to do the expensive part (scene/frame build) of a background tab in the background without affecting anything else, and I think that would be much easier after we implement document splitting (bug 1441308). I can't think of a good way to do it without document splitting.
Depends on: document-splitting
(Reporter)

Comment 7

9 months ago
That would make document splitting a hard ship blocker and not just an optimization, I think.
Priority: -- → P1
(Reporter)

Comment 8

9 months ago
Hmm, how about this scenario:
 - There's a main thread animation running in the browser UI.
 - The currently visible tab changes its contents to something that's expensive to rasterize.

Will the rasterization of the new content jank the animation running in the UI? I expect it will. That's not great.
Yeah, I think that scenario was one of the primary motivations for document splitting.
(Reporter)

Comment 10

9 months ago
kats, jrmuizel and I just had a meeting where we found some solutions to these problems, without the need for document splitting.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> For one thing, one of the assumptions built into the design of the async tab
> switcher is that there is an expensive "paint" part and a cheap "composite"
> part to rendering a tab, and it's possible to do the expensive paint on a
> background tab and cache the result. With WR the "expensive paint" would
> correspond to a scene/frame build, but it's not currently possible to do
> that on a background tab at all.

Right - scene building is done on the entire window, and you can't eagerly scene build just a "pipeline".

However, instead of considering the entire scene build as the "expensive paint" part, we can simplify things a bit and only consider blob rasterization as the expensive part. The rest of scene building is usually pretty quick.

We could add an API to WR that triggers rasterization for a specified list of blob images. This API could be called before scene building and even before the display list that contains these blob images is submitted to WR. WR would then rasterize the entire "visible area" for each of the specified blob images. Then, once a display list is submitted which uses the images, when WR then runs scene building on the window it'll notice that the necessary blob images don't have any invalid area, and scene building will be quick.

Whenever a transaction from a content process arrives at the compositor thread, we can process the transaction as follows:
 1. Trigger rasterization for the blobs in this transaction.
 2. Wait for rasterization to be completed, asynchronously.
 3. Once rasterization has completed, submit the new display list to WR.
 4. Send the notification to the tab switcher to indicate that this tab is now ready to be composited.

Then the tab switcher triggers the submission of a new parent process display list which includes an iframe item which points to the new tab. On the compositor side, we submit this new display list to WR. This triggers scene building on the new display list which contains the new tab's contents. Scene building is quick because no blob images need to be rasterized. After scene building has finished, on the next vsync the new scene is presented to the screen.

> The other problem is that the tab switcher assumes it will be able to
> display a spinner if the expensive paint takes too long. For this to work,
> the work needed to paint and render the spinner must be able to run in
> parallel with the expensive paint. Without WR this happens because the
> spinner gets painted (I think) in the UI process while the expensive paint
> is happening in the content process. With WR these do not run in parallel,
> but instead run serially on the WR threads.

With the new plan, rasterization in WR still happens on the WRWorker threads, but does not go through scene building. So when the parent process wants to put the spinner on the screen, the WRWorker threads will be busy but the SceneBuilder thread will be available. If the tab spinner requires no rasterization, then this scene can be presented to the screen instantly.
If the tab spinner *does* require rasterization, then it'll need to wait for a WRWorker thread to become available. We might be able to play some games with work prioritization here to make this happen more quickly - e.g. any pending "eager rasterization" work could be lower priority when the WRWorker thread chooses its next work item.

(In reply to Markus Stange [:mstange] from comment #8)
> Hmm, how about this scenario:
>  - There's a main thread animation running in the browser UI.
>  - The currently visible tab changes its contents to something that's
> expensive to rasterize.
> 
> Will the rasterization of the new content jank the animation running in the
> UI? I expect it will. That's not great.

The plan outlined above will solve this problem, too: The rasterization of the new content will happen before the content's new display list is submitted to WR, so WR still has the old display list for the content which is ready to be drawn quickly. As the parent process's new display list arrive, WR can keep drawing those new display lists with the old content. Only once the rasterization of the new content blob images has completed will the new content's display list be submitted to WR, and at that point it'll be quick to draw.
Good summary of our conclusions, thanks!

(In reply to Markus Stange [:mstange] from comment #10)
> The plan outlined above will solve this problem, too: The rasterization of
> the new content will happen before the content's new display list is
> submitted to WR, so WR still has the old display list for the content which
> is ready to be drawn quickly. As the parent process's new display list
> arrive, WR can keep drawing those new display lists with the old content.
> Only once the rasterization of the new content blob images has completed
> will the new content's display list be submitted to WR, and at that point
> it'll be quick to draw.

I realized that for this scenario, any blobs that need to be rasterized for the parent process will also need to prioritized ahead of the content process' blobs. So it makes sense to just always give the UI process blobs higher priority. Or maybe reserve one worker thread such that it will never work on content process things.
(Reporter)

Comment 12

9 months ago
Makes sense.

Another thing I forgot to mention is that this plan would move WebRender's async processing model a bit closer to the current (non-WebRender) model: With non-WebRender, rasterization happens in the content process before the new layer tree is submitted to the compositor. With this plan in WebRender, rasterization will happen in the WR process but it'll happen *before* the display list is submitted to WebRender. This similarity might make it a bit easier to convince ourselves that some performance characteristics will look similar between non-WebRender and WebRender.

For example, take FlushRendering: If you have a content page with expensive blob images, with this plan, things like window focus changes or window resizes will no longer block on rasterization of those blob images: The rasterization of those blob images happens before the new display list has been submitted, so FlushRendering will only flush the *old* content scene, which is fast. In other words, our plan addresses bug 1464086 comment 1.
Quick update: with nical's help I wrote a patch to move the blob rendering so that it happens off the scene builder thread, before the scene is built. Patch is at [1]. However we need to ensure that the transactions for a given pipeline don't get reordered (i.e. if a content process sends transaction A with many blobs followed by transaction B with no blobs, they are processed in the same order on the scene builder thread). We also need to add a mechanism for the parent process blobs to get priority. And finally we'll need to delay the layers observer update notification until after the scene build completes. I have WIP patches from before that do this at [2], they need more testing.

I'm working on the prioritization of parent process stuff; the easiest way might just be to use our dedicated WRWorker threadpool only for parent process blobs and bump all the other blobs to the rayon global threadpool. I didn't find any mechanism within rayon to prioritize some jobs ahead of others. Even with that though, we might need to adjust thread priorities explicitly.

[1] https://github.com/staktrace/gecko-dev/commit/e2eed93ce7ae4180a47bfe1bc39a8ca361b0d297
[2] https://github.com/staktrace/gecko-dev/commits/asyncswitch
After some more fiddling, I discovered the approach I implemented in comment 13 doesn't work properly once we have the UI process blobs rasterizing concurrently with the content process blobs. A scenario I ran into:

(a) content process sends a display list update with expensive-to-rasterize blobs
(b) UI process sends a display list update which cheap/no blobs
(c) UI blob rasterization completes, goes to scene build and render
(d) content blob rasterization completes, goes to scene build and render

The problem here is that in step (a), a scene request is created with a clone of all the current pending scene data, and by the time we get to (d) that data is stale because the UI scene pipeline is different. Trying to render the stale scene can panic because of images that have since been deleted, etc.

Conceptually the point at which the scene request is created seems to be the first place in the data flow where the data from different processes get merged. In order to allow concurrent blob rasterization for the two processes we need it to happen before that point.

I wrote up a modified version [1] that introduces a "rasterizer" module that is similar to the scene_builder module, in that it has a dedicated thread where it receives transactions (which in this case are blob rasterization requests), does the blocking rasterize() call, and sends the result back to the render_backend. The result is then pumped back into update_document, at which point the scene request is built and sent to the scene builder thread. Except that I have two rasterizers, one for the UI process transactions and one for the content process transactions. They both use the same worker pool for the actual rasterization, but I plan to change that. It's unfortunate that this approach introduces yet more threads, but I think that if we ditch rayon and use a custom threadpool with support for job prioritization we should be able to consolidate some of the threads.

@nical, question for you: What scenario is the code at [2] used for? Is it used in servo, or maybe left over from an attempt to support async blob rasterization without async scene building on by default? I don't think we should ever hit this codepath in gecko but maybe I'm wrong.

[1] https://github.com/staktrace/gecko-dev/commits/eagerblob
[2] https://searchfox.org/mozilla-central/rev/f0c15db995198a1013e1c5f5b5bea54ef83f1049/gfx/webrender/src/render_backend.rs#1012-1014
Forgot to needinfo for the question in the previous comment
Flags: needinfo?(nical.bugzilla)
(Assignee)

Comment 17

9 months ago
> What scenario is the code at [2] used for? Is it used in servo, or maybe left over from an attempt to support async blob rasterization without async scene building on by default? I don't think we should ever hit this codepath in gecko but maybe I'm wrong.

If there is a blob image to rasterize within the transaction, since we want to rasterize blobs on the scene builder and want the blob rasterization to be visible at the same time as the rest of the scene changes we have to not skip the scene builder thread. So I felt that it was healthy to put this check here to avoid bogus transactions that dispatch rasterization and apply other updates right away without waiting for the rasterization to be done, even though if gecko carefully builds the commands we don't technically need this check.

If you need to relax this, that's fine. The only risk is that we send a transaction that updates a blob image without changing the scene, and updates a bunch of other things like a non-blob image or an animated property, and then the blob rasterization could be out of sync with the other updates if the transaction is set to skip the scene builder. But as long as gecko doesn't generate this type of thing that's fine. We could even allow this by not rasterizing blobs on the scene builder in this situation if need be.
Servo doesn't rely on this (doesn't use blobs at all AFAIK).
Flags: needinfo?(nical.bugzilla)
(Assignee)

Comment 18

9 months ago
I am not very enthused by the prospect of having a gazillion thread pools. Having prioritization would be cool but I'm not convinced that we really need it in this case. The way rayon's thread pool works when doing fork-join parallelism is that the dispatching thread makes jobs that can be stolen by workers of the pool, but if the thread pool is already super busy, what tends to happen is that the workers just don't steal any job and the dispatching thread just runs the jobs sequentially (as a bonus we don't get latency from waiting for the last job to finish on a worker).
In general having a heavy blob workload coming from content should not block the UI's rasterization, just reduce the amount of concurrency we potentially get there. Adding a separate thread pool doesn't necessarily help because if the content's thread pool keeps the CPU busy, the ui's thread pool threads won't necessarily wake up to steal jobs and do work. On the other hand if we have many threads we increase the likelihood of a UI blob rasterization to be interrupted by the scheduler giving CPU time to something else like the heavy content blob workload, in which case the UI stuff is blocked until the scheduler assigns CPU resources to the worker thread again.
Ok, thanks for the explanation. My latest WIP is at https://treeherder.mozilla.org/#/jobs?repo=try&revision=429ae14da24c938e3ef8d135148098fc1568cf79 - there are some failures that need investigation but I don't know if I'll have the time to look into that for the next ~6 weeks; feel free to steal. The patches as written do allow async tab switching (at least the scenario from comment 0) to work, in my local testing. That is, the spinner gets shown while the slow content rasterizes blobs.
(Just realized that the test failures might be because I didn't update the FlushSceneBuild code path to also flush the rasterizer, but it should do that for snapshotting to work properly. So that's probably the first place to look)
No longer blocks: stage-wr-trains
(Reporter)

Updated

8 months ago
Duplicate of this bug: 1426164
Assignee: nobody → nical.bugzilla
(Assignee)

Comment 23

8 months ago
I rebased Kats' latest patch queue and replaced the pair of rasterizer threads with a single low priority thread, see https://github.com/servo/webrender/pull/2989.
The idea is the same except that high priority blobs are still rasterized on the scene builder thread while the low priority ones first go to a special thread that does the rasterization before forwarding the transaction along with the rasterized blobs to the regular scene builder.

This simplifies things a little by not having to wake up the render backend after rasterization and avoiding the FlushContext (flush is just forwarded to the regular scene builder through the low-priority one), etc.

> +------------------+     +------------------+
> | LowPSceneBuilder | --> |   SceneBuilder   |
> +------------------+     +------------------+
>               ^           ^     |
>               |           |     |
>               |           |     v
>          +-----------------------+      +-----------+
> api ---> |     RenderBackend     | ---> |  Renderer |
>          +-----------------------+      +-----------+
Do you have a try push or something with everything rebased/applied? I'm having a hard time seeing how this will avoid the problem I described at the start of comment 15 but I also don't know what the final code looks like with everything applied.
(Assignee)

Comment 25

8 months ago
I had to move the pending scene data structure to the scene builder thread [1]. I am in the process of rebasing everything, I'll put up a try push soon. The problem is resolved by lettung the scene builder always have an up to date view of the latest scene changes so that it doesn't need to message the render backend for the latest/coherent state.


[1] https://github.com/servo/webrender/pull/2998
(Assignee)

Updated

7 months ago
Blocks: 1479912
We know what we need to do for this and it should be done soon. We don't need to block Nightly.
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
Comment on attachment 9008059 [details]
Bug 1477819 - Use WebRender low priority transactions for web content. r=jrmuizel

Jeff Muizelaar [:jrmuizel] has approved the revision.
Attachment #9008059 - Flags: review+

Comment 29

7 months ago
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/627738d5f791
Use WebRender low priority transactions for web content. r=jrmuizel
(Assignee)

Updated

7 months ago
Keywords: leave-open

Comment 30

7 months ago
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3bf051cda7f6
Use WebRender low priority transactions for web content. r=jrmuizel
(Assignee)

Comment 35

7 months ago
I got the tab spinner to show up using these patches (and artificially making all blob rasterization take 4 seconds). There's a tiny modification of the rust code that needs to wait for the next WebRender update.
Depends on: 1493473
No longer depends on: 1493473
(Assignee)

Comment 36

7 months ago
I rebased the two patches, they are ready for review and landing. Try push looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=752ed90bbede0172a1ff89f971dc3c932d4c5d38
Comment on attachment 9011011 [details]
Bug 1477819 - Expose Transaction::Notify in WebRender's C++ wrapper. r=jrmuizel

Jeff Muizelaar [:jrmuizel] has approved the revision.
Attachment #9011011 - Flags: review+
Comment on attachment 9011010 [details]
Bug 1477819 - Defer ObserveLayersUpdate to after scene building. r=jrmuizel

Jeff Muizelaar [:jrmuizel] has approved the revision.
Attachment #9011010 - Flags: review+
(Reporter)

Comment 39

7 months ago
(In reply to Nicolas Silva [:nical] from comment #36)
> I rebased the two patches, they are ready for review and landing. Try push
> looks good:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=752ed90bbede0172a1ff89f971dc3c932d4c5d38

The try build is showing some odd behavior in some cases, but I think that behavior is caused by bug 1491906, whose fix is not included in this build. (I checked - the fix for bug 1491906 was pushed to central in https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=b418f5bf31cb , but the try build is based on the earlier push https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=08592337ced1 .)

Comment 40

7 months ago
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/87f63e01f33b
Expose Transaction::Notify in WebRender's C++ wrapper. r=jrmuizel
https://hg.mozilla.org/integration/mozilla-inbound/rev/67f51e6c1683
Defer ObserveLayersUpdate to after scene building. r=jrmuizel
(Assignee)

Updated

7 months ago
Status: NEW → RESOLVED
Last Resolved: 7 months ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.