Closed Bug 1476368 Opened 7 years ago Closed 4 years ago

https://old.reddit.com/r/nier/ does not run smoothly at 120fps

Categories

(Core :: Graphics: WebRender, defect, P5)

63 Branch
x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: pnm79623, Assigned: nical, NeedInfo)

References

(Depends on 1 open bug, )

Details

(Keywords: nightly-community)

Attachments

(14 files)

4.72 MB, video/webm
17.16 KB, text/plain
9.99 MB, video/webm
9.01 MB, video/webm
9.95 MB, video/webm
4.49 MB, video/webm
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
47 bytes, text/x-phabricator-request
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0
Build ID: 20180713213322

Steps to reproduce:

Open https://old.reddit.com/r/nier/
When the page finishes loading, scrolling becomes consistently choppy. If the animation is blocked, scrolling becomes normal again.

Actual results:

Pages with complex animations that use transparency degrade scrolling performance when using WebRender.

Expected results:

Scrolling is smooth, as it is when using Direct3D 11 Advanced Layers compositing.
Attached video Video demonstration
Video demonstration, first half WebRender OFF second half WebRender ON.
Could you please open about:support, click on the "Copy text to clipboard" button, paste it into a text file and upload it here (Attach File)? Thanks!
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
This seems to run well for me on Mac. I'll try it on Windows.
Flags: needinfo?(jmuizelaar)
What's the refresh rate of your monitor? Can you also turn on gfx.webrender.debug.gpu-time-queries and gfx.webrender.debug.gpu-sample-queries?
Flags: needinfo?(jmuizelaar) → needinfo?(pnm79623)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)
> What's the refresh rate of your monitor? Can you also turn on
> gfx.webrender.debug.gpu-time-queries and
> gfx.webrender.debug.gpu-sample-queries?

Native 120Hz display. Attaching new video.
Flags: needinfo?(pnm79623)
Great. It looks like we might be bottlenecked on the CPU side. Can you attach another video that has layers.acceleration.draw-fps on and WebRender off?
Flags: needinfo?(pnm79623)
CPU is i7-4790k@4.4 on all cores (power states/steps/saving disabled)
Flags: needinfo?(pnm79623)
Summary: Animations with transparency causes choppy scrolling → https://old.reddit.com/r/nier/ does not run smoothly at 120fps
Depends on: 1477358
So I remembered that gfx.webrender.debug.gpu-time-queries and gfx.webrender.debug.gpu-sample-queries can have a big impact on performance. If you turn those off and set gfx.webrender.debug.compact-profiler=true, do you get a reasonably consistent frame rate of 120fps during scrolling? If so, is it still choppy even though the frame rate says 120fps?
Flags: needinfo?(pnm79623)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #11)
> So I remembered that gfx.webrender.debug.gpu-time-queries and
> gfx.webrender.debug.gpu-sample-queries can have a big impact on performance.
> If you turn those off and set gfx.webrender.debug.compact-profiler=true do
> you get a reasonably consistent frame rate of 120fps during scrolling? If so
> is it still choppy even though the frame-rate says 120fps?

With WebRender I'm getting 120FPS while "stationary". During scrolling FPS drops but rarely to double digits. You can see it in this attachment. https://bug1476368.bmoattachments.org/attachment.cgi?id=8993540

With advanced layers FPS stays very high even during scrolling; it still dips to 117-113, but it doesn't feel weird like it does when WR is used.

On another note, yesterday I tried forcing maximum performance mode on the GPU for Firefox, but it didn't help at all. Is there anything else I can do that might help?
Flags: needinfo?(pnm79623)
There might be a frame rate consistency issue here. We'll try to add some better metrics to the HUD to get a better idea of what's going on. Can you install the Gecko profiler add-on https://perf-html.io/, open the add-on and go to settings, add the following threads "RenderBackend,Renderer,WebRender,Wr" to the list and then get a profile of the scrolling slowdown?
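
To make the "reports 120fps but still feels choppy" question concrete, here is a minimal Rust sketch of the kind of pacing metric that could expose it. It is not the metric that was added to the WebRender HUD; `PacingStats`, `pacing_stats`, and the 1.5x "missed frame" threshold are all hypothetical names and assumptions chosen for illustration.

```rust
/// Summary of frame pacing over a run of presented frames.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct PacingStats {
    /// Frames per second averaged over the whole run.
    pub average_fps: f64,
    /// Longest gap between two consecutive frames, in milliseconds.
    pub worst_interval_ms: f64,
    /// Number of gaps longer than 1.5x the refresh period (assumed threshold).
    pub missed_frames: usize,
}

/// Computes pacing statistics from frame presentation timestamps (milliseconds).
/// `refresh_hz` is the display refresh rate, e.g. 120.0 for a 120Hz monitor.
/// Returns `None` when there are fewer than two timestamps or no elapsed time.
pub fn pacing_stats(timestamps_ms: &[f64], refresh_hz: f64) -> Option<PacingStats> {
    if timestamps_ms.len() < 2 || refresh_hz <= 0.0 {
        return None;
    }
    // Time budget for one frame at the given refresh rate.
    let budget_ms = 1000.0 / refresh_hz;
    // Gaps between consecutive frames.
    let intervals: Vec<f64> = timestamps_ms.windows(2).map(|w| w[1] - w[0]).collect();
    let total_ms: f64 = intervals.iter().sum();
    if total_ms <= 0.0 {
        return None;
    }
    let worst = intervals.iter().cloned().fold(0.0_f64, f64::max);
    let missed = intervals.iter().filter(|&&dt| dt > 1.5 * budget_ms).count();
    Some(PacingStats {
        average_fps: intervals.len() as f64 * 1000.0 / total_ms,
        worst_interval_ms: worst,
        missed_frames: missed,
    })
}
```

With timestamps such as `[0.0, 2.0, 16.0, 18.0, 32.0]` the average works out above 120fps, yet half of the intervals exceed the assumed threshold, which is the sort of uneven pacing an FPS counter alone would hide.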
Flags: needinfo?(pnm79623)
Priority: -- → P3
Attached video CPU usage
So I just quickly checked whether anything had changed in almost a month of Nightly development, but unfortunately even more problems have appeared. CPU usage is very high on that page (4 times higher than without WebRender: 20% vs 5%). CPU usage drops to 0 if I close the tab or minimize Firefox. I will test performance with that add-on later.
Flags: needinfo?(pnm79623)
Priority: P3 → P4
Jeff -- Have you ever reproduced this on Windows/nvidia? Have we added metrics (ref: Comment 13)?
Reporter -- Does this still happen for you? What driver are you using? (nvidia? Intel? something else?) Have you only seen it on one machine so far?

Thanks!
Flags: needinfo?(pnm79623)
Flags: needinfo?(jmuizelaar)
I still see some weird frame pacing on this profile: https://perfht.ml/2EaT6oV
Flags: needinfo?(jmuizelaar)
(In reply to Maire Reavy [:mreavy] Plz needinfo from comment #15)
> Jeff -- Have you ever reproduced this on Windows/nvidia? Have we added
> metrics (ref: Comment 13)?
> Reporter -- Does this still happen for you? What driver are you using?
> (nvidia? Intel? something else?) Have you only seen it on one machine so
> far?
>
> Thanks!

This is still happening, and CPU usage is very high compared to Advanced Layers. NVIDIA GTX 1070, driver 416.34, Windows 10 1809.
Flags: needinfo?(pnm79623)

How does this look for you now?

Flags: needinfo?(pnm79623)
Priority: P4 → P5

This page performs poorly on my X1 Carbon laptop (Linux + Intel). From a quick look in perf there seems to be some low-hanging fruit to pick. Here's what stands out at a glance:

  • The average number of retired instructions per cycle is low (which usually means unhappy caches). On this CPU I usually get about 2.0 ins/cycle for instruction-bound workloads and here I'm getting 0.6.
  • There are a lot of page faults happening in RenderTaskTree::add. It looks like we can't recycle the allocation because the tree is sent to the render thread, but we can record the previously allocated size and pre-allocate the vectors each frame (*).
  • A lot of instruction cache misses on the render thread in driver code and in draw_instanced_batch. Dzmitry's suggestion to remove redundant GL calls might help here (Edit: I got mixed up, the suggestion was in another bug).
  • PrimitiveStore::update_visibility is high in number of samples and also in the number of data cache misses. This one might not be low-hanging fruit, but I'm pointing it out because this function has consistently shown up at the top of profiles for me lately.

(*): Actually, even though we send the RenderTaskTree to the renderer it looks like we only use it there for debugging purposes ... aaand no we do use it for non-debug things as well.
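
The pre-allocation idea from the second bullet can be sketched roughly as follows. This is a minimal illustration under assumptions, not the patch that landed: `TaskTree` is a hypothetical stand-in for the real RenderTaskTree, and `TaskTreeSizeHint` is an invented helper that lives on the side that builds the tree, so it survives when the tree itself is sent to the render thread.

```rust
/// Hypothetical stand-in for a per-frame task list.
/// This is not WebRender's actual RenderTaskTree.
pub struct TaskTree {
    pub tasks: Vec<u32>,
}

/// Remembers how many tasks the previous frame produced so the next
/// frame's vector can be allocated once, up front. Hypothetical helper.
#[derive(Default)]
pub struct TaskTreeSizeHint {
    prev_task_count: usize,
}

impl TaskTreeSizeHint {
    /// Creates this frame's tree with room for as many tasks as last frame,
    /// avoiding repeated growth (and the page faults that come with it).
    pub fn new_tree(&self) -> TaskTree {
        TaskTree {
            tasks: Vec::with_capacity(self.prev_task_count),
        }
    }

    /// Records the finished tree's size before it is handed off elsewhere.
    pub fn record(&mut self, tree: &TaskTree) {
        self.prev_task_count = tree.tasks.len();
    }
}
```

The first frame still grows its vector on demand; from the second frame on, a page with a stable task count should fill the vector without reallocating.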

On a machine with a more powerful CPU and a 4k screen the biggest problem is GPU times with lots of time spent in B_Blend.
(Edit: I had picture caching disabled, my bad, it does wonders on this page).

On the CPU side it clearly doesn't help that the banner at the top has a css animation on background-position which is causing us to continuously go through DL building, scene building, frame building and rendering even when it is off-screen.

Optimizations will help, but the best way to really cut CPU times for this type of page is to add support for more animated properties.

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b7ad79d07c44 Preallocate the render task tree. r=kvark
Keywords: leave-open
Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/df0e32716df1 Preallocate a few more items in the render task tree. r=kvark

Here is a series of small changes in a general attempt to reduce the __memmove_avx_unaligned_erms samples that were towards the top of the profile and consolidated into a single group. Unfortunately, perf wouldn't give me the stacks for these samples, so I resorted to gdb breakpoints to figure out where a lot of these memmoves come from.

Removing some of the memmoves might not yield real perf wins if the time was mostly spent waiting for cold misses (the read will still happen), but if anything these patches reduce the amount of perf samples that fall into the __memmove_avx_unaligned_erms bucket and the potential cache misses move into hopefully more helpful symbols. Also the changes are generally trivial.

One of the most notable sources of frequent small __memmove_avx_unaligned_erms calls on the render backend is moving TransformUpdateState in ClipScrollTree::update_tree. It isn't as straightforward to reduce as the first wave of changes, though.

Another thing that came up while profiling this page is the cost of moving/hashing/cloning FontInstance, but that required more involved surgery so I filed bug 1529272 for that.

Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b512bb508796 Pre-allocate vectors in HitTester::read_clip_scroll_tree. r=kvark
https://hg.mozilla.org/integration/mozilla-inbound/rev/932ab91a896e Pre-allocate primitives vector in setup_picture_caching. r=gw
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0c93c9acd66 Avoid moving texture cache entries when evicting them. r=gw
https://hg.mozilla.org/integration/mozilla-inbound/rev/659354cec17d Reserve storage for the dynamic property vectors in the bindings before filling them. r=kvark
https://hg.mozilla.org/integration/mozilla-inbound/rev/9af23e1d86c8 Avoid moving picture primitives when destroying them. r=gw
https://hg.mozilla.org/integration/mozilla-inbound/rev/7102801e2ca8 Add VecHelper::take/take_and_preallocate. r=gw
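
For readers unfamiliar with the last commit's naming, a helper pair like take/take_and_preallocate might look something like the sketch below. Only the trait and method names come from the commit message; the signatures and bodies are assumptions written for illustration and may differ from the code that actually landed.

```rust
use std::mem;

/// Sketch of a `Vec` extension trait. The method bodies are an assumption,
/// not a copy of the landed patch.
pub trait VecHelper<T> {
    /// Moves the contents out, leaving an empty vector with no allocation.
    fn take(&mut self) -> Self;
    /// Moves the contents out, leaving an empty vector that already has
    /// room for as many elements as were just taken.
    fn take_and_preallocate(&mut self) -> Self;
}

impl<T> VecHelper<T> for Vec<T> {
    fn take(&mut self) -> Self {
        // Swapping in an empty Vec only moves the pointer, length, and
        // capacity; the elements themselves are not copied.
        mem::replace(self, Vec::new())
    }

    fn take_and_preallocate(&mut self) -> Self {
        // Assume the next frame needs about as much space as this one.
        let len = self.len();
        mem::replace(self, Vec::with_capacity(len))
    }
}
```

The second variant suits vectors that are refilled every frame with a similar number of items, since the replacement starts out at the right size instead of growing step by step.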

The leave-open keyword is there and there is no activity for 6 months.
:jbonisteel, maybe it's time to close this bug?

Flags: needinfo?(jbonisteel)

We can leave it open for now

Flags: needinfo?(jbonisteel)

The leave-open keyword is there and there is no activity for 6 months.
:jbonisteel, maybe it's time to close this bug?

Flags: needinfo?(jbonisteel)

Does this still need to be left open, Nical?

Flags: needinfo?(jbonisteel) → needinfo?(nical.bugzilla)

Performance isn't great but not awful on this page these days. I don't have a way to test at 120fps, though. We could leave it open and revisit when progress is made on the two dependencies.

Flags: needinfo?(nical.bugzilla)

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jmathies)

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(jmathies)
Resolution: --- → FIXED
Assignee: nobody → nical.bugzilla