Tab warming of background tabs janks the browser UI

RESOLVED FIXED

Status

()

defect
P2
normal
RESOLVED FIXED
Last year
9 months ago

People

(Reporter: mstange, Assigned: nical)

Tracking

(Blocks 1 bug)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox63 affected)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Steps to reproduce:
 1. Open https://static.mozilla.com/moco/en-US/images/mozilla_eoy_2013_EN.svg in a new tab and switch to it.
 2. Switch back to the previous tab, scroll up and down a bit, and wait for things to settle down.
 3. Move your mouse over the other tab in the tab bar. (This triggers tab warming.)
 4. Move your mouse button over toolbar buttons or try scrolling the foreground tab.

Expected results:
Everything should stay smooth.

Actual results:
Hovering the background tab janks the UI and pauses scrolling.


In a profile, you can see that hovering the tab triggers blob image rasterization activity on the WRWorker threads.
This is very surprising because the current scene does not contain an iframe item for the background tab, so scene building on it shouldn't be triggering any work for the background tab's blob images.
It looks like every time we send a display list transaction to WR, the blob image updates in that transaction get rasterized, regardless of which pipeline they are attached to.

The blobs are pulled from the transaction at [1] and get passed through to the scene builder which rasterizes them at [2]. The fact that the display list is for some unconnected sub-pipeline is never checked.

[1] https://searchfox.org/mozilla-central/rev/196560b95f191b48ff7cba7c2ba9237bba6b5b6a/gfx/webrender/src/render_backend.rs#978
[2] https://searchfox.org/mozilla-central/rev/196560b95f191b48ff7cba7c2ba9237bba6b5b6a/gfx/webrender/src/scene_builder.rs#155
Assignee: nobody → nical.bugzilla
Priority: -- → P1
The ongoing work on de-prioritizing content blob rasterization should help a lot here, although this begs the question of whether we should optimistically rasterize blob images during warming if we aren't sure they are going to be shown (given how much time CPU blob image rasterization can consume).
We can't release this to the field, but we can let this ride to beta.
Priority: P1 → P2
Depends on: 1477819
(In reply to Nicolas Silva [:nical] from comment #2)
> The ongoing work on de-prioritizing content blob rasterization should help a
> lot here, although this begs the question of whether we should
> optimistically rasterize blob images during warming if we aren't sure they
> are going to be shown (given how much time CPU blob image rasterization can
> consume).

I think we probably should, but ideally at a low priority. The whole point of tab warming is to do speculative (and intensive) work for stuff we think is likely to be shown soon, and blob rasterization should fit into that model.

I don't fully understand why we need the de-prioritization to solve this. Until bug 1477819 is fixed, my understanding of the current state of affairs is that blob rasterization happens during composite. But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?
Flags: needinfo?(nical.bugzilla)
The de-prioritization work for tab switching happens to solve this because expensive content blob images end up not blocking the UI transactions anymore. We could fix it some other way but I need the de-prioritization for tab switching anyway.

> But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?

Blob images are mostly rasterized when building the scene which happens when WebRender receives a display list and the tab warming code appears to be generating display lists.

I'm not sure if by composition you mean the moment when we execute the GPU commands (triggered by vsync) or if it encompasses everything that webrender does on the compositor process (including scene building triggered by display list updates).
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #5)
> The de-prioritization work for tab switching happens to solve this because
> expensive content blob images end up not blocking the UI transactions
> anymore. We could fix it some other way but I need the de-prioritization for
> tab switching anyway.

Right, I get that part.

> > But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?
> 
> Blob images are mostly rasterized when building the scene which happens when
> WebRender receives a display list and the tab warming code appears to be
> generating display lists.
> 
> I'm not sure if by composition you mean the moment when we execute the GPU
> commands (triggered by vsync) or if it encompasses everything that webrender
> does on the compositor process (including scene building triggered by
> display list updates).

I think my question is why we're calling SetDisplayList before the tab switcher activates the tab. The async tab switcher tells the content process to generate a DL, and then gets a notification when that's done. It seems like we do want to do eager blob rasterization before the tab switcher pulls the trigger, but it seems less good to do scene building (which IIUC discards the old scene) when we may not actually switch to the tab.
Flags: needinfo?(nical.bugzilla)
> It seems like we do want to do eager blob rasterization before the tab switcher pulls the trigger

Certainly. That's the main point of tab warming.

> but it seems less good to do scene building

Does scene building here even see the background tab or does it just rebuild the current scene of the window, which displaying the foreground tab?
We discussed this on IRC, which led to bug 1490751 and bug 1490752 being filed.
Flags: needinfo?(nical.bugzilla)
This was fixed by the low-priority transaction work.
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Doesn't seem fixed unfortunately :(

Steps to reproduce:
 (0. Enable gfx.webrender.debug.new-frame-indicator and gfx.webrender.debug.new-scene-indicator.)
 1. Open https://web.archive.org/web/20150401205722/http://www.pokecommunity.com/showthread.php?t=161616 in a background tab and wait for it to load.
 2. Open a few more tabs.
 3. Mouse around the tab bar.

Expected results:
The tab hover animation should be responsive at all times.

Actual results:
Shortly after hovering the pokecommunity tab, the browser UI freezes for a while.
During this time, the new-frame-indicator keeps spinning but the new-scene-indicator is stuck.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Interestingly, the compositor thread isn't receiving any display list during the freeze, but it's not blocked and we are still getting webrender to schedule frames, which would indicate that some other mechanism is holding off on sending SetDisplayList messages to the WebRenderBridgeParent.
Oh dear.

On linux it appears that we synchronously flush everything a lot more than we should. The main thread looks like this:

> pthread_cond_wait
> 0x55dcc715b89a
> mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*)
> mozilla::layers::PCompositorBridgeChild::SendFlushRendering()
> PCompositorBridge::Msg_FlushRendering
> mozilla::layers::WebRenderLayerManager::FlushRendering()
> nsViewManager::Refresh(nsView*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&)
> nsViewManager::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&)
> nsView::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>)
> draw_window_of_widget(_GtkWidget*, _GdkWindow*, _cairo*)
> expose_event_cb(_GtkWidget*, _cairo*)
> gtk_main_do_event

Flush rendering flushes *everything* including low priority transactions.

From there I think that there are two things we can do independently:

 - Figure out why we are doing these synchronous flushes while merely hovering tabs
 - Not block on low priority work when flushing (or only when we really need to)
See Also: → 1455597
Depends on: 1495228
(In reply to Nicolas Silva [:nical] from comment #14)
> Created attachment 9013968 [details]
> Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of
> blobs. r=jrmuizel

Does this make an noticeable performance difference? i.e. Can you include some more motivation in the commit message?
Flags: needinfo?(nical.bugzilla)
I kicked a couple of try pushes with 5 and 3 blobs as the threshold, we'll see what talos thinks about it. As far as I'm concerned I didn't notice a difference but that was from browsing with a powerful computer and it'd be worth experimenting with the constant on slower hardware.
Flags: needinfo?(nical.bugzilla)
Attachment #9013968 - Attachment is obsolete: true
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bea77a4e9699
Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel
Keywords: leave-open
There isn't much justification in the commit message. Can you explain why it solves the bug as filed? Or does it not solve it and that's why the bug is going to stay open?
> There isn't much justification in the commit message. Can you explain why it solves the bug as filed? Or does it not solve it and that's why the bug is going to stay open?

There are several sources of jank:
 - priority inversion happening when a high priority blob rasterization request for the UI has to wait for a busy thread pool full of expensive low-priority work to pick it up,
 - any time we call FlushSceneBuilder on the compositor thread or other blocking api method when there is a lot of work happening on the scene builder thread.

The patch that just landed addresses most of the former scenario by avoiding rayon when there are few enough jobs that potential parallelism might not be worth the cost of dispatching to the thread pool. This could be avoided if the job scheduling system we used was able to have the caller thread start doing work while letting the workers steal from it, alas rayon doesn't work that way.

The other patch on this bug hasn't landed yet (it needed something to be done on the webrender side first) but it should be good now (I just need to rebase it), and addresses one of the big offenders of the second source of jank: when closing a tab, the cleanup code does synchronous work which can take a long time and freeze the compositor thread. There will still be similar potential UI jank after these two patches due to remaining synchronous calls But I think these two patches get rid of most of the issues I was able to reproduce.
We gained perf improvements:

== Change summary for alert #16663 (as of Wed, 10 Oct 2018 16:45:02 GMT) ==

Improvements:

  8%  tsvgx linux64-qr opt e10s stylo          263.44 -> 242.28
  4%  tsvgx windows10-64-qr opt e10s stylo     267.15 -> 255.19

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16663
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59b465985117
Clear WebRender resources asynchronously. r=sotaro
Status: REOPENED → RESOLVED
Closed: 10 months ago9 months ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.