Closed
Bug 1479912
Opened 6 years ago
Closed 6 years ago
Tab warming of background tabs janks the browser UI
Categories
(Core :: Graphics: WebRender, defect, P2)
Core
Graphics: WebRender
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox63 | --- | affected |
People
(Reporter: mstange, Assigned: nical)
References
Details
Attachments
(2 files, 1 obsolete file)
Steps to reproduce:
1. Open https://static.mozilla.com/moco/en-US/images/mozilla_eoy_2013_EN.svg in a new tab and switch to it.
2. Switch back to the previous tab, scroll up and down a bit, and wait for things to settle down.
3. Move your mouse over the other tab in the tab bar. (This triggers tab warming.)
4. Move your mouse button over toolbar buttons or try scrolling the foreground tab.
Expected results:
Everything should stay smooth.
Actual results:
Hovering the background tab janks the UI and pauses scrolling.
In a profile, you can see that hovering the tab triggers blob image rasterization activity on the WRWorker threads.
This is very surprising because the current scene does not contain an iframe item for the background tab, so scene building on it shouldn't be triggering any work for the background tab's blob images.
Comment 1•6 years ago
|
||
It looks like every time we send a display list transaction to WR, the blob image updates in that transaction get rasterized, regardless of which pipeline they are attached to.
The blobs are pulled from the transaction at [1] and get passed through to the scene builder which rasterizes them at [2]. The fact that the display list is for some unconnected sub-pipeline is never checked.
[1] https://searchfox.org/mozilla-central/rev/196560b95f191b48ff7cba7c2ba9237bba6b5b6a/gfx/webrender/src/render_backend.rs#978
[2] https://searchfox.org/mozilla-central/rev/196560b95f191b48ff7cba7c2ba9237bba6b5b6a/gfx/webrender/src/scene_builder.rs#155
Updated•6 years ago
|
Assignee | ||
Comment 2•6 years ago
|
||
The ongoing work on de-prioritizing content blob rasterization should help a lot here, although this begs the question of whether we should optimistically rasterize blob images during warming if we aren't sure they are going to be shown (given how much time CPU blob image rasterization can consume).
Comment 3•6 years ago
|
||
We can't release this to the field, but we can let this ride to beta.
Priority: P1 → P2
Comment 4•6 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #2)
> The ongoing work on de-prioritizing content blob rasterization should help a
> lot here, although this begs the question of whether we should
> optimistically rasterize blob images during warming if we aren't sure they
> are going to be shown (given how much time CPU blob image rasterization can
> consume).
I think we probably should, but ideally at a low priority. The whole point of tab warming is to do speculative (and intensive) work for stuff we think is likely to be shown soon, and blob rasterization should fit into that model.
I don't fully understand why we need the de-prioritization to solve this. Until bug 1477819 is fixed, my understanding of the current state of affairs is that blob rasterization happens during composite. But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 5•6 years ago
|
||
The de-prioritization work for tab switching happens to solve this because expensive content blob images end up not blocking the UI transactions anymore. We could fix it some other way but I need the de-prioritization for tab switching anyway.
> But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?
Blob images are mostly rasterized when building the scene which happens when WebRender receives a display list and the tab warming code appears to be generating display lists.
I'm not sure if by composition you mean the moment when we execute the GPU commands (triggered by vsync) or if it encompasses everything that webrender does on the compositor process (including scene building triggered by display list updates).
Flags: needinfo?(nical.bugzilla)
Comment 6•6 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #5)
> The de-prioritization work for tab switching happens to solve this because
> expensive content blob images end up not blocking the UI transactions
> anymore. We could fix it some other way but I need the de-prioritization for
> tab switching anyway.
Right, I get that part.
> > But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?
>
> Blob images are mostly rasterized when building the scene which happens when
> WebRender receives a display list and the tab warming code appears to be
> generating display lists.
>
> I'm not sure if by composition you mean the moment when we execute the GPU
> commands (triggered by vsync) or if it encompasses everything that webrender
> does on the compositor process (including scene building triggered by
> display list updates).
I think my question is why we're calling SetDisplayList before the tab switcher activates the tab. The async tab switcher tells the content process to generate a DL, and then gets a notification when that's done. It seems like we do want to do eager blob rasterization before the tab switcher pulls the trigger, but it seems less good to do scene building (which IIUC discards the old scene) when we may not actually switch to the tab.
Flags: needinfo?(nical.bugzilla)
Reporter | ||
Comment 7•6 years ago
|
||
> It seems like we do want to do eager blob rasterization before the tab switcher pulls the trigger
Certainly. That's the main point of tab warming.
> but it seems less good to do scene building
Does scene building here even see the background tab or does it just rebuild the current scene of the window, which displaying the foreground tab?
Comment 8•6 years ago
|
||
We discussed this on IRC, which led to bug 1490751 and bug 1490752 being filed.
Updated•6 years ago
|
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 9•6 years ago
|
||
This was fixed by the low-priority transaction work.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 10•6 years ago
|
||
Doesn't seem fixed unfortunately :(
Steps to reproduce:
(0. Enable gfx.webrender.debug.new-frame-indicator and gfx.webrender.debug.new-scene-indicator.)
1. Open https://web.archive.org/web/20150401205722/http://www.pokecommunity.com/showthread.php?t=161616 in a background tab and wait for it to load.
2. Open a few more tabs.
3. Mouse around the tab bar.
Expected results:
The tab hover animation should be responsive at all times.
Actual results:
Shortly after hovering the pokecommunity tab, the browser UI freezes for a while.
During this time, the new-frame-indicator keeps spinning but the new-scene-indicator is stuck.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 11•6 years ago
|
||
Interestingly, the compositor thread isn't receiving any display list during the freeze, but it's not blocked and we are still getting webrender to schedule frames, which would indicate that some other mechanism is holding off on sending SetDisplayList messages to the WebRenderBridgeParent.
Assignee | ||
Comment 12•6 years ago
|
||
Oh dear.
On linux it appears that we synchronously flush everything a lot more than we should. The main thread looks like this:
> pthread_cond_wait
> 0x55dcc715b89a
> mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*)
> mozilla::layers::PCompositorBridgeChild::SendFlushRendering()
> PCompositorBridge::Msg_FlushRendering
> mozilla::layers::WebRenderLayerManager::FlushRendering()
> nsViewManager::Refresh(nsView*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&)
> nsViewManager::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&)
> nsView::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>)
> draw_window_of_widget(_GtkWidget*, _GdkWindow*, _cairo*)
> expose_event_cb(_GtkWidget*, _cairo*)
> gtk_main_do_event
Flush rendering flushes *everything* including low priority transactions.
From there I think that there are two things we can do independently:
- Figure out why we are doing these synchronous flushes while merely hovering tabs
- Not block on low priority work when flushing (or only when we really need to)
Assignee | ||
Comment 13•6 years ago
|
||
Assignee | ||
Comment 14•6 years ago
|
||
Comment 15•6 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #14)
> Created attachment 9013968 [details]
> Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of
> blobs. r=jrmuizel
Does this make an noticeable performance difference? i.e. Can you include some more motivation in the commit message?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 16•6 years ago
|
||
I kicked a couple of try pushes with 5 and 3 blobs as the threshold, we'll see what talos thinks about it. As far as I'm concerned I didn't notice a difference but that was from browsing with a powerful computer and it'd be worth experimenting with the constant on slower hardware.
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 17•6 years ago
|
||
parallel rasterization with 3 blobs or more: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9ae03cdc5bf8&newProject=try&newRevision=d733e9b56e012187f4da5ff3b52dbabf7fc8e921&framework=1&showOnlyComparable=1
parallel rasterization with 5 blobs or more: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9ae03cdc5bf8&newProject=try&newRevision=76a6862ff28b19a675aa16cd023a523973988bfa&framework=1
5+ blobs compared to 3+ blobs: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=76a6862ff28b&newProject=try&newRevision=d733e9b56e012187f4da5ff3b52dbabf7fc8e921&framework=1
It doesn't look like there's very significant difference between the two, but they both appear to improve tsvgx.
I'm kicking some more jobs to reduce the noise.
Assignee | ||
Comment 18•6 years ago
|
||
Updated•6 years ago
|
Attachment #9013968 -
Attachment is obsolete: true
Comment 19•6 years ago
|
||
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bea77a4e9699
Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel
Assignee | ||
Updated•6 years ago
|
Keywords: leave-open
Reporter | ||
Comment 20•6 years ago
|
||
There isn't much justification in the commit message. Can you explain why it solves the bug as filed? Or does it not solve it and that's why the bug is going to stay open?
Comment 21•6 years ago
|
||
bugherder |
Assignee | ||
Comment 22•6 years ago
|
||
> There isn't much justification in the commit message. Can you explain why it solves the bug as filed? Or does it not solve it and that's why the bug is going to stay open?
There are several sources of jank:
- priority inversion happening when a high priority blob rasterization request for the UI has to wait for a busy thread pool full of expensive low-priority work to pick it up,
- any time we call FlushSceneBuilder on the compositor thread or other blocking api method when there is a lot of work happening on the scene builder thread.
The patch that just landed addresses most of the former scenario by avoiding rayon when there are few enough jobs that potential parallelism might not be worth the cost of dispatching to the thread pool. This could be avoided if the job scheduling system we used was able to have the caller thread start doing work while letting the workers steal from it, alas rayon doesn't work that way.
The other patch on this bug hasn't landed yet (it needed something to be done on the webrender side first) but it should be good now (I just need to rebase it), and addresses one of the big offenders of the second source of jank: when closing a tab, the cleanup code does synchronous work which can take a long time and freeze the compositor thread. There will still be similar potential UI jank after these two patches due to remaining synchronous calls But I think these two patches get rid of most of the issues I was able to reproduce.
Comment 23•6 years ago
|
||
We gained perf improvements:
== Change summary for alert #16663 (as of Wed, 10 Oct 2018 16:45:02 GMT) ==
Improvements:
8% tsvgx linux64-qr opt e10s stylo 263.44 -> 242.28
4% tsvgx windows10-64-qr opt e10s stylo 267.15 -> 255.19
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16663
Comment 24•6 years ago
|
||
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/59b465985117
Clear WebRender resources asynchronously. r=sotaro
Comment 25•6 years ago
|
||
bugherder |
Assignee | ||
Updated•6 years ago
|
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•