1678935 - Extremely low fps with translateZ since 83.0

erik.faulhaber

Reporter

Description

4 years ago

Attached file performance_analysis.zip

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0

Steps to reproduce:

I created a minimal example to reproduce:
https://codepen.io/erik-f/pen/PozrrPb

I could reproduce the bug on multiple devices running Windows 10 with Firefox 83.0.
In Firefox 83.0 on Archlinux everything works fine.
In Firefox 82.0.3 on Windows everything works as expected too.

Actual results:

While developing a website with parallax scrolling (using translateZ), I started experiencing horrible lag a few days ago. While reducing the source code to a minimal example, I accidentally broke the parallax effect, but the performance issues persist (that's what you can see in the codepen above).

Both the working parallax website and the codepen above, which doesn't have any visible parallax effect now, lag horribly while scrolling and resizing the window.

I did a performance analysis while slowly resizing the window in both 83.0 and 82.0.3 on Windows 10. I attached both results below.
In 82.0.3 the frame rate is constantly at 60 fps. In 83.0 the frame rate is mostly around 5 fps. GPU usage (GTX 1080) is around 10% on 82.0.3 and around 60% on 83.0.

The problem seems to scale with the screen resolution. In a maximized window on a 1440p monitor it's horrible, while it's not very noticeable in a small window.

Note: Even after removing the translateZ lines from the CSS file, resizing is still not as smooth as in 82.0.3. This does not seem to be specific to SVG files: we tried PNGs instead, and while performance was a lot better, there was still noticeable lag with very large PNGs (again, everything works fine in 82.0.3).

Expected results:

I expect smooth scrolling and resizing at 60 fps like in 82.0.3.

Daniel Bodea [:danibodea]

Comment 1

4 years ago

I have to mention that, in my experience, this issue only occurs in maximized window mode and does not occur when the window is smaller than the full screen.
Secondly, the lag while zooming in and out is already present in versions as old as Nightly v78.0a1, so this issue will focus on the scrolling lag.
Thirdly, the same issue is seen in Firefox Release v82.0.1, but going further back I observed that Nightly v78.0a1 scrolls smoothly, so I ran a regression and these are my results:

2020-11-26T14:01:20: DEBUG : Found commit message:
Bug 1623715 - [8.2] Move media fullscreen event to JS and extend its metadata. r=geckoview-reviewers,snorp,alwu
Differential Revision: https://phabricator.services.mozilla.com/D86350
2020-11-26T14:01:20: DEBUG : Did not find a branch, checking all integration branches
2020-11-26T14:01:20: INFO : The bisection is done.
2020-11-26T14:01:20: INFO : Stopped
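The log above is the tail of a bisection run (comment 7 names mozregression as the tool used in this bug). Such a tool narrows the range between a known-good and a known-bad build by binary search. The sketch below illustrates only that idea; it is not mozregression's code, and the build labels and the `is_good` callback are hypothetical stand-ins for launching a build and checking the test case by hand. It assumes a single good-to-bad step, which is exactly the assumption the later comments call into question.

```python
from typing import Callable, Sequence


def bisect_builds(builds: Sequence[str], is_good: Callable[[str], bool]) -> tuple[str, str]:
    """Return (last_good, first_bad) for builds ordered oldest to newest.

    Assumes builds[0] is good, builds[-1] is bad, and a single good-to-bad
    step: once a build is bad, every later build is bad too.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[lo], builds[hi]


if __name__ == "__main__":
    # Hypothetical build labels; in a real run each is_good() call means
    # launching that build and checking the test case by hand.
    builds = [f"build-{i:02d}" for i in range(32)]
    first_bad_index = 21  # hypothetical location of the regression
    print(bisect_builds(builds, lambda b: builds.index(b) < first_bad_index))
    # ('build-20', 'build-21'), after 5 is_good() calls
```

The number of builds that need testing grows with the logarithm of the range size, which is why a range spanning months of nightlies can be narrowed to a single push in a handful of steps.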

This issue is not observed on Mac OS 10.15.6 or Ubuntu 20.04.

I have chosen the (Core) Web Painting component for this issue. Please set a more appropriate one if incorrect.

Status: UNCONFIRMED → NEW

status-firefox83: --- → affected

status-firefox84: --- → affected

status-firefox85: --- → affected

status-firefox-esr78: --- → unaffected

Component: Untriaged → Web Painting

Ever confirmed: true

Keywords: regression

OS: Unspecified → Windows

Product: Firefox → Core

Regressed by: 1623715

Hardware: Unspecified → Desktop

4 years ago

I also crashed trying to test this on MacOS, in tex_sub_image_2d_pbo.

Matt Woodrow (:mattwoodrow)

Updated

4 years ago

Blocks: gfx-triage

Alastor Wu [:alwu]

Updated

4 years ago

No longer regressed by: 1623715

erik.faulhaber

Reporter

Comment 6

4 years ago

I spent the last few days investigating this bug.
After a lot of manual bisecting (unfortunately I didn't know of the mozregression tool until today) I found that the extremely laggy resizing actually originated in 2019 (0e4d7f204a27).
However, the scrolling lag seems to be independent of this. The scrolling definitely works fine in 2020-01-01-09-29-38 and is definitely broken in 2020-09-01-09-45-42. I tried the regression tool multiple times, but it seems to me that it's not one particular revision that is causing the lag. It rather seems to me that there are several versions that are "a bit worse" and they build up to the very broken version 2020-09-01-09-45-42.

Both the scrolling and the resizing are definitely broken in 2020-09-01-09-45-42. We made another codepen where the parallax effect actually works and the scrolling seems to be even worse than in the other one: https://codepen.io/lucamarcelpeters/full/OJRJgBR

I wondered why it worked in 82.0.3 (and 82.0 and 82.0b1), but not in 83.0.
It must have been fixed somewhere after 2020-09-01-09-45-42 (which is 82.0a1), but it seems that this fix didn't make it into 83.0 for some reason. Shouldn't the release branch containing the fix have been merged back to central and beta?
I tried following the bug fix back from NIGHTLY_82_END (cecca8e30949), where it didn't work, to 82.0b1, where everything works fine. Right after the merge revision (acc3d41c2c93) everything works fine. To me, it seems like the bug fix was already in the beta branch before 82 beta and didn't get merged back to central. I still don't know why it's broken again in 83.0, though.

I hope this is useful somehow, as I spent way too much time on it. I'll leave this to someone who actually knows what they're doing now. Please let me know if I'm somehow right with my "the bug fix is in beta but didn't get merged back to nightly" theory or if that is all nonsense.

Flags: needinfo?(erik.faulhaber)

Ryan VanderMeulen [:RyanVM]

Comment 7

4 years ago

(In reply to erik.faulhaber from comment #6)

I hope this is useful somehow, as I spent way too much time on it. I'll leave this to someone who actually knows what they're doing now. Please let me know if I'm somehow right with my "the bug fix is in beta but didn't get merged back to nightly" theory or if that is all nonsense.

Not all features that are enabled on Nightly remain enabled when a release goes to Beta. So what you're seeing is very possibly due to Nightly vs. Beta configuration issues (in particular differences in WebRender being enabled or not). In general, all code changes land on Nightly before being merged into Beta, so that's not likely to be the explanation here.

FWIW, on my Win10 system, I see a noticeable drop in scrolling performance on release between Fx80 and Fx81. I was able to bisect with mozregression:

 3:09.34 INFO: Last good revision: 4b8de762e09740f9d140a0a097922fbccc4d1406
 3:09.34 INFO: First bad revision: c8ca1d1866e7e3591d2df84c2a4f0204d43386ed
 3:09.34 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=4b8de762e09740f9d140a0a097922fbccc4d1406&tochange=c8ca1d1866e7e3591d2df84c2a4f0204d43386ed

This fits the regression range found in comment 1, with the notable difference of including some WebRender changes before the Android ones noted in that comment. I don't know whether bug 1623792 or bug 1658182 is more likely to be the culprit here, but those at least seem plausible.

status-firefox83: affected → wontfix

status-firefox84: affected → wontfix

Flags: needinfo?(daniel.bodea) → needinfo?(gwatson)

erik.faulhaber

Reporter

Comment 8

4 years ago

Thank you, I was really stupid. I can confirm that's where it breaks.

(In reply to erik.faulhaber from comment #6)

I tried the regression tool multiple times, but it seems to me that it's not one particular revision that is causing the lag. It rather seems to me that there are several versions that are "a bit worse" and they build up to the very broken version 2020-09-01-09-45-42.

It turned out that I was launching the wrong builds with mozregression --launch because I didn't pass --repo autoland. I had run several regressions and always ended up with the same output as you, but when I then tested those two builds without --repo autoland, I couldn't find a difference (duh!).
However, I wasn't completely wrong. It does get "a bit worse" before it breaks completely. I did another regression, and this is what I came up with (using the minimal example from my last comment, https://codepen.io/lucamarcelpeters/full/OJRJgBR):

1. As stated in my first comment, the laggy resizing starts in 0e4d7f204a27. Scrolling is still 100% smooth there, though.
2. The first (subtle) drop in scrolling performance led me to these regression results:

   4:15.15 INFO: Last good revision: 2d55f2c0fc33eda6c995ea77bb7fe59b86bba6f0
   4:15.15 INFO: First bad revision: 8f4b47079a44eeea87caa560b3b072148551aa3c

   In 2d55f2c0fc33 I get a solid 60 fps minimum when scrolling. In 8f4b47079a44 it drops to 35 fps.
3. The second regression led me to the same results as [:RyanVM]: in 4b8de762e097 I get a 28 fps minimum, and in c8ca1d1866e7 scrolling only runs at 10 fps.
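This comment also suggests why repeated bisections can give different answers: when performance degrades in steps, "good" versus "bad" depends on the fps cut-off the tester has in mind. The sketch below bisects on a minimum-fps threshold. The fps figures are the ones reported above; the chronological order of the four revisions and the thresholds are assumptions for illustration, and `first_bad` is a hypothetical helper, not part of any tool.

```python
from typing import Sequence


def first_bad(fps_by_build: Sequence[tuple[str, float]], min_fps: float) -> str:
    """Bisect builds (oldest first) for the first one scrolling below min_fps.

    Assumes performance only gets worse across the range, so "good" means
    fps >= min_fps and there is a single good/bad boundary.
    """
    lo, hi = 0, len(fps_by_build) - 1
    if fps_by_build[lo][1] < min_fps or fps_by_build[hi][1] >= min_fps:
        raise ValueError("range must start good and end bad for this threshold")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fps_by_build[mid][1] >= min_fps:
            lo = mid
        else:
            hi = mid
    return fps_by_build[hi][0]


# Minimum scrolling fps reported in this comment; the ordering is assumed.
builds = [
    ("2d55f2c0fc33", 60),
    ("8f4b47079a44", 35),
    ("4b8de762e097", 28),
    ("c8ca1d1866e7", 10),
]

print(first_bad(builds, min_fps=50))  # 8f4b47079a44 (the subtle first drop)
print(first_bad(builds, min_fps=20))  # c8ca1d1866e7 (the severe second drop)
```

A strict threshold lands on the first, subtle regression, while a lenient one only catches the severe one, which matches the two separate ranges found in this bug.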

Jim Mathies [:jimm]

Comment 9

4 years ago

https://hg.mozilla.org/mozilla-central/rev/8f4b47079a44
https://hg.mozilla.org/mozilla-central/rev/c8ca1d1866e7

Glenn Watson [:gw]

Assignee

Comment 10

4 years ago

Might be fixed by https://phabricator.services.mozilla.com/D98043?

Flags: needinfo?(gwatson)

Matt Woodrow (:mattwoodrow)

Comment 11

4 years ago

Jamie, I think you recently changed our behaviour around managing large textures in the texture cache. Did that fix this, and if not, should it?

Flags: needinfo?(jnicol)

Ryan VanderMeulen [:RyanVM]

Comment 12

4 years ago

Doesn't scroll any differently for me on a current Nightly build.

erik.faulhaber

Reporter

Comment 13

4 years ago

I just tested 0ee685602a7f (as suggested by [:gw]) and I get 28 fps again. Same in today's Nightly build, @[:RyanVM].

So the issues introduced in c8ca1d1866e7 seem to be fixed now.
I still don't get near 60 fps like before 8f4b47079a44 though.

Jamie Nicol [:jnicol] out of office until 6th Jan

Comment 14

4 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #11)

Jamie, I think you recently changed our behaviour around managing large textures in the texture cache. Did that fix this, and if not, should it?

4 years ago

I tried to look into this, but for some reason I can't explain, the SVG files on that domain won't load for me. I tried two different internet connections and both fail. tracepath also times out on that domain, somewhere in the US.

It's probably some temporary routing issue? But if someone is able to attach the test case directly to the bug, that would be great.

Assignee: nobody → gwatson

Flags: needinfo?(gwatson) → needinfo?(erik.faulhaber)

erik.faulhaber

Reporter

Comment 18

4 years ago

Attached file parallax_svgs.zip

Flags: needinfo?(erik.faulhaber)

erik.faulhaber

Reporter

Comment 19

4 years ago

I attached the SVG files. Hopefully, that's just a temporary routing issue, as we will eventually deploy our application to this domain and server.

Glenn Watson [:gw]

Assignee

Comment 20

4 years ago

Yes, it seems like it was a temporary routing issue, the page is loading correctly for me here now. Thanks!

Glenn Watson [:gw]

Assignee

Comment 21

4 years ago

OK, there are multiple issues involved here, some of which are related:

1. The texture cache eviction bug referenced above (which is now fixed, and has improved things).
2. The SVGs are rasterized at a very large size (12k x 4k) on my screen. I think this is due to the scale transform. There is some planned work to improve this by working out a better scale at which to rasterize SVG files.
3. We currently treat all rasterized SVG files as possibly translucent; if we can detect that the background layer(s) are opaque, WR will use that to reduce the blending cost for those layers.
4. Since the rasterized images are very large, they get split into tiles and spread across multiple texture cache pages. WR currently does a bad job of batching tiled images that span multiple texture pages.
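Item 2 is easy to quantify. The sketch below computes the uncompressed size of one 12k x 4k rasterization, assuming 4 bytes per pixel (RGBA8; the actual texture format is not stated in this bug). It also shows one plausible way a scale transform can inflate raster size: the common CSS parallax pattern of `perspective: p` on the scroll container plus `translateZ(-d) scale(1 + d/p)` on each layer. The CSS values are hypothetical (the codepen's CSS is not reproduced here), and whether Firefox rasterizes at that full compensating scale is an assumption, not something this bug confirms.

```python
import math


def compensating_scale(perspective_px: float, depth_px: float) -> float:
    """Scale that undoes perspective shrinking for translateZ(-depth_px).

    With `perspective: p`, a layer pushed back by d pixels is projected at
    p / (p + d) of its size, so scale(1 + d / p) restores its apparent size.
    """
    return 1.0 + depth_px / perspective_px


def raster_cost(width_px: int, height_px: int, bytes_per_pixel: int = 4) -> tuple[int, float]:
    """Pixel count and memory in MiB for one uncompressed rasterization.

    bytes_per_pixel=4 assumes an RGBA8 format.
    """
    pixels = width_px * height_px
    return pixels, pixels * bytes_per_pixel / 2**20


# The 12k x 4k size reported in comment 21:
pixels, mib = raster_cost(12_000, 4_000)
print(pixels, round(mib, 1))  # 48000000 183.1

# Hypothetical layer: 1920 CSS px wide at devicePixelRatio 1, perspective 1px,
# translateZ(-3px). If it were rasterized at the full compensating scale:
scale = compensating_scale(1, 3)
print(scale, math.ceil(1920 * scale))  # 4.0 7680
```

At roughly 183 MiB per layer under the RGBA8 assumption, a handful of parallax layers quickly exceeds what fits comfortably in a texture cache, which ties item 2 to items 1 and 4.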

I will look into a fix for 4 today; that should be reasonably simple, and then we can hand this off to people who will be looking at 3 and 4 for further improvements.
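Item 4 concerns batching. In a GPU renderer, a batch can generally only sample from the textures bound for it, so if consecutive tiles of one image live on different texture cache pages, every page switch can start a new draw call. The sketch below is a generic illustration of that effect, not WebRender's batching code; the page assignment and both helper names are hypothetical.

```python
from itertools import groupby
from typing import Sequence


def batch_count(page_per_tile: Sequence[int]) -> int:
    """Draw calls if a new batch starts whenever the texture page changes."""
    return sum(1 for _ in groupby(page_per_tile))


def batch_count_grouped(page_per_tile: Sequence[int]) -> int:
    """Draw calls if one image's tiles are first grouped by texture page."""
    return len(set(page_per_tile))


# Hypothetical: 12 tiles of one image, allocated round-robin across 3 pages.
pages = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
print(batch_count(pages), batch_count_grouped(pages))  # 12 3
```

Grouping by page is safe here only because the tiles of a single image do not overlap, so their relative draw order cannot change the blended result.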

Glenn Watson [:gw]

Assignee

Comment 22

4 years ago

Another issue causing high draw calls is that when we have a tiled image, we only calculate the visible tiles once, and then replay that visible image tile list across all picture cache tiles.

In this case, each SVG file has ~135 visible tiles, which are replayed across 5 to 6 picture cache tiles, and that is multiplied by the number of parallax layers.

Even with the fix for (4) above, this is still a high draw call count. To fix this, we need to calculate a visible set of image tiles per picture cache tile (I'm planning to implement this in the new year, for other reasons).
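To make the arithmetic in comment 22 concrete: replaying ~135 visible image tiles across 5 to 6 picture cache tiles means roughly 675 to 810 image-tile primitives per parallax layer. The sketch below compares that replay strategy with computing a visible set per picture cache tile, as proposed above. The geometry (a 2048 x 1024 viewport, 256 px image tiles, 1024 x 512 picture cache tiles) is hypothetical, and the `Rect`/`grid`/`tile_draws` helpers are illustrative, not WebRender's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def intersects(self, other: "Rect") -> bool:
        # Strict comparisons: rects that only share an edge do not intersect.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def grid(width: int, height: int, tile_w: int, tile_h: int) -> list[Rect]:
    """Split a width x height area into tile_w x tile_h rects (edges clipped)."""
    return [
        Rect(x, y, min(x + tile_w, width), min(y + tile_h, height))
        for y in range(0, height, tile_h)
        for x in range(0, width, tile_w)
    ]


def tile_draws(visible_image_tiles: list[Rect], cache_tiles: list[Rect]) -> tuple[int, int]:
    """Image-tile draws for one layer: (replay everywhere, per-cache-tile culling)."""
    replayed = len(visible_image_tiles) * len(cache_tiles)
    culled = sum(
        1 for c in cache_tiles for t in visible_image_tiles if t.intersects(c)
    )
    return replayed, culled


# Hypothetical geometry: one image layer covering the whole viewport.
image_tiles = grid(2048, 1024, 256, 256)   # 8 x 4 = 32 image tiles
cache_tiles = grid(2048, 1024, 1024, 512)  # 2 x 2 = 4 picture cache tiles
print(tile_draws(image_tiles, cache_tiles))  # (128, 32)
```

With per-cache-tile culling, each image tile is drawn once for every picture cache tile it overlaps, so the total stays close to the number of visible image tiles instead of growing with the picture cache tile count.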

Glenn Watson [:gw]

Assignee

Updated

4 years ago

Depends on: 1683962

Julien Cristau [:jcristau]

Updated

4 years ago

status-firefox85: affected → wontfix

status-firefox86: --- → affected

Jim Mathies [:jimm]

8 months ago

Blocks: wr-perf

performance_analysis.zip (4 years ago, erik.faulhaber, 5.38 MB, application/x-zip-compressed)
parallax_svgs.zip (4 years ago, erik.faulhaber, 87.71 KB, application/x-zip-compressed)