Closed Bug 1479912 Opened 7 years ago Closed 7 years ago

Tab warming of background tabs janks the browser UI

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

firefox63

---

affected

People

(Reporter: mstange, Assigned: nical)

References

Details

Attachments

(2 files, 1 obsolete file)

Bug 1479912 - Clear WebRender resources asynchronously. r=sotaro 7 years ago Nicolas Silva [:nical] 46 bytes, text/x-phabricator-request		Details \| Review
Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel 7 years ago Nicolas Silva [:nical] 46 bytes, text/x-phabricator-request		Details \| Review
Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel 7 years ago Nicolas Silva [:nical] 46 bytes, text/x-phabricator-request		Details \| Review

Markus Stange [:mstange]

Reporter

Description

•

7 years ago

Steps to reproduce: 1. Open https://static.mozilla.com/moco/en-US/images/mozilla_eoy_2013_EN.svg in a new tab and switch to it. 2. Switch back to the previous tab, scroll up and down a bit, and wait for things to settle down. 3. Move your mouse over the other tab in the tab bar. (This triggers tab warming.) 4. Move your mouse button over toolbar buttons or try scrolling the foreground tab. Expected results: Everything should stay smooth. Actual results: Hovering the background tab janks the UI and pauses scrolling. In a profile, you can see that hovering the tab triggers blob image rasterization activity on the WRWorker threads. This is very surprising because the current scene does not contain an iframe item for the background tab, so scene building on it shouldn't be triggering any work for the background tab's blob images.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 1

•

7 years ago

It looks like every time we send a display list transaction to WR, the blob image updates in that transaction get rasterized, regardless of which pipeline they are attached to. The blobs are pulled from the transaction at [1] and get passed through to the scene builder which rasterizes them at [2]. The fact that the display list is for some unconnected sub-pipeline is never checked. [1] https://searchfox.org/mozilla-central/rev/196560b95f191b48ff7cba7c2ba9237bba6b5b6a/gfx/webrender/src/render_backend.rs#978 [2] https://searchfox.org/mozilla-central/rev/196560b95f191b48ff7cba7c2ba9237bba6b5b6a/gfx/webrender/src/scene_builder.rs#155

Jeff Muizelaar [:jrmuizel]

Updated

•

7 years ago

Assignee: nobody → nical.bugzilla

Blocks: stage-wr-trains

Priority: -- → P1

Nicolas Silva [:nical]

Assignee

Comment 2

•

7 years ago

The ongoing work on de-prioritizing content blob rasterization should help a lot here, although this begs the question of whether we should optimistically rasterize blob images during warming if we aren't sure they are going to be shown (given how much time CPU blob image rasterization can consume).

Maire Reavy [:mreavy]

Comment 3

•

7 years ago

We can't release this to the field, but we can let this ride to beta.

Priority: P1 → P2

Nicolas Silva [:nical]

Assignee

Updated

•

7 years ago

Depends on: 1477819

Bobby Holley (:bholley)

Comment 4

•

7 years ago

(In reply to Nicolas Silva [:nical] from comment #2) > The ongoing work on de-prioritizing content blob rasterization should help a > lot here, although this begs the question of whether we should > optimistically rasterize blob images during warming if we aren't sure they > are going to be shown (given how much time CPU blob image rasterization can > consume). I think we probably should, but ideally at a low priority. The whole point of tab warming is to do speculative (and intensive) work for stuff we think is likely to be shown soon, and blob rasterization should fit into that model. I don't fully understand why we need the de-prioritization to solve this. Until bug 1477819 is fixed, my understanding of the current state of affairs is that blob rasterization happens during composite. But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 5

•

7 years ago

The de-prioritization work for tab switching happens to solve this because expensive content blob images end up not blocking the UI transactions anymore. We could fix it some other way but I need the de-prioritization for tab switching anyway. > But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right? Blob images are mostly rasterized when building the scene which happens when WebRender receives a display list and the tab warming code appears to be generating display lists. I'm not sure if by composition you mean the moment when we execute the GPU commands (triggered by vsync) or if it encompasses everything that webrender does on the compositor process (including scene building triggered by display list updates).

Flags: needinfo?(nical.bugzilla)

Bobby Holley (:bholley)

Comment 6

•

7 years ago

(In reply to Nicolas Silva [:nical] from comment #5) > The de-prioritization work for tab switching happens to solve this because > expensive content blob images end up not blocking the UI transactions > anymore. We could fix it some other way but I need the de-prioritization for > tab switching anyway. Right, I get that part. > > But we shouldn't be doing any WR work at all for non-active tabs until the tab switcher decides to activate them and send them to the compositor, right? > > Blob images are mostly rasterized when building the scene which happens when > WebRender receives a display list and the tab warming code appears to be > generating display lists. > > I'm not sure if by composition you mean the moment when we execute the GPU > commands (triggered by vsync) or if it encompasses everything that webrender > does on the compositor process (including scene building triggered by > display list updates). I think my question is why we're calling SetDisplayList before the tab switcher activates the tab. The async tab switcher tells the content process to generate a DL, and then gets a notification when that's done. It seems like we do want to do eager blob rasterization before the tab switcher pulls the trigger, but it seems less good to do scene building (which IIUC discards the old scene) when we may not actually switch to the tab.

Flags: needinfo?(nical.bugzilla)

Markus Stange [:mstange]

Reporter

Comment 7

•

7 years ago

> It seems like we do want to do eager blob rasterization before the tab switcher pulls the trigger Certainly. That's the main point of tab warming. > but it seems less good to do scene building Does scene building here even see the background tab or does it just rebuild the current scene of the window, which displaying the foreground tab?

Bobby Holley (:bholley)

Comment 8

•

7 years ago

We discussed this on IRC, which led to bug 1490751 and bug 1490752 being filed.

Bobby Holley (:bholley)

Updated

•

7 years ago

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 9

•

7 years ago

This was fixed by the low-priority transaction work.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Markus Stange [:mstange]

Reporter

Comment 10

•

7 years ago

Doesn't seem fixed unfortunately :( Steps to reproduce: (0. Enable gfx.webrender.debug.new-frame-indicator and gfx.webrender.debug.new-scene-indicator.) 1. Open https://web.archive.org/web/20150401205722/http://www.pokecommunity.com/showthread.php?t=161616 in a background tab and wait for it to load. 2. Open a few more tabs. 3. Mouse around the tab bar. Expected results: The tab hover animation should be responsive at all times. Actual results: Shortly after hovering the pokecommunity tab, the browser UI freezes for a while. During this time, the new-frame-indicator keeps spinning but the new-scene-indicator is stuck.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Nicolas Silva [:nical]

Assignee

Comment 11

•

7 years ago

Interestingly, the compositor thread isn't receiving any display list during the freeze, but it's not blocked and we are still getting webrender to schedule frames, which would indicate that some other mechanism is holding off on sending SetDisplayList messages to the WebRenderBridgeParent.

Nicolas Silva [:nical]

Assignee

Comment 12

•

7 years ago

Oh dear. On linux it appears that we synchronously flush everything a lot more than we should. The main thread looks like this: > pthread_cond_wait > 0x55dcc715b89a > mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) > mozilla::layers::PCompositorBridgeChild::SendFlushRendering() > PCompositorBridge::Msg_FlushRendering > mozilla::layers::WebRenderLayerManager::FlushRendering() > nsViewManager::Refresh(nsView*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) > nsViewManager::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) > nsView::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>) > draw_window_of_widget(_GtkWidget*, _GdkWindow*, _cairo*) > expose_event_cb(_GtkWidget*, _cairo*) > gtk_main_do_event Flush rendering flushes *everything* including low priority transactions. From there I think that there are two things we can do independently: - Figure out why we are doing these synchronous flushes while merely hovering tabs - Not block on low priority work when flushing (or only when we really need to)

Nicolas Silva [:nical]

Assignee

Comment 13

•

7 years ago

Attached file Bug 1479912 - Clear WebRender resources asynchronously. r=sotaro — Details

Sotaro Ikeda [:sotaro]

Updated

•

7 years ago

Comment 14

•

7 years ago

Attached file Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel (obsolete) — Details

Nicolas Silva [:nical]

Assignee

Updated

•

7 years ago

Depends on: 1495228

Jeff Muizelaar [:jrmuizel]

Comment 15

•

7 years ago

(In reply to Nicolas Silva [:nical] from comment #14) > Created attachment 9013968 [details] > Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of > blobs. r=jrmuizel Does this make an noticeable performance difference? i.e. Can you include some more motivation in the commit message?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 16

•

7 years ago

I kicked a couple of try pushes with 5 and 3 blobs as the threshold, we'll see what talos thinks about it. As far as I'm concerned I didn't notice a difference but that was from browsing with a powerful computer and it'd be worth experimenting with the constant on slower hardware.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 17

•

7 years ago

parallel rasterization with 3 blobs or more: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9ae03cdc5bf8&newProject=try&newRevision=d733e9b56e012187f4da5ff3b52dbabf7fc8e921&framework=1&showOnlyComparable=1 parallel rasterization with 5 blobs or more: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9ae03cdc5bf8&newProject=try&newRevision=76a6862ff28b19a675aa16cd023a523973988bfa&framework=1 5+ blobs compared to 3+ blobs: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=76a6862ff28b&newProject=try&newRevision=d733e9b56e012187f4da5ff3b52dbabf7fc8e921&framework=1 It doesn't look like there's very significant difference between the two, but they both appear to improve tsvgx. I'm kicking some more jobs to reduce the noise.

Nicolas Silva [:nical]

Assignee

Comment 18

•

7 years ago

Attached file Bug 1479912 - Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel — Details

Phabricator Automation

Updated

•

7 years ago

Attachment #9013968 - Attachment is obsolete: true

Pulsebot

Comment 19

•

7 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bea77a4e9699 Don't rasterize blobs in parallel when there isn't a lot of blobs. r=jrmuizel

Nicolas Silva [:nical]

Assignee

Updated

•

7 years ago

Keywords: leave-open

Markus Stange [:mstange]

Reporter

Comment 20

•

7 years ago

There isn't much justification in the commit message. Can you explain why it solves the bug as filed? Or does it not solve it and that's why the bug is going to stay open?

Tiberius Oros[:tiberius_oros]

Comment 21

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bea77a4e9699

Nicolas Silva [:nical]

Assignee

Comment 22

•

7 years ago

> There isn't much justification in the commit message. Can you explain why it solves the bug as filed? Or does it not solve it and that's why the bug is going to stay open? There are several sources of jank: - priority inversion happening when a high priority blob rasterization request for the UI has to wait for a busy thread pool full of expensive low-priority work to pick it up, - any time we call FlushSceneBuilder on the compositor thread or other blocking api method when there is a lot of work happening on the scene builder thread. The patch that just landed addresses most of the former scenario by avoiding rayon when there are few enough jobs that potential parallelism might not be worth the cost of dispatching to the thread pool. This could be avoided if the job scheduling system we used was able to have the caller thread start doing work while letting the workers steal from it, alas rayon doesn't work that way. The other patch on this bug hasn't landed yet (it needed something to be done on the webrender side first) but it should be good now (I just need to rebase it), and addresses one of the big offenders of the second source of jank: when closing a tab, the cleanup code does synchronous work which can take a long time and freeze the compositor thread. There will still be similar potential UI jank after these two patches due to remaining synchronous calls But I think these two patches get rid of most of the issues I was able to reproduce.

Ionuț Goldan [:igoldan]

Comment 23

•

7 years ago

We gained perf improvements: == Change summary for alert #16663 (as of Wed, 10 Oct 2018 16:45:02 GMT) == Improvements: 8% tsvgx linux64-qr opt e10s stylo 263.44 -> 242.28 4% tsvgx windows10-64-qr opt e10s stylo 267.15 -> 255.19 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16663

Pulsebot

Comment 24

•

7 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/59b465985117 Clear WebRender resources asynchronously. r=sotaro

Daniel Varga [:dvarga]

Comment 25

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/59b465985117

Nicolas Silva [:nical]

Assignee

Updated

•

7 years ago

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Keywords: leave-open

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.