Extremely low fps with translateZ since 83.0
Categories
(Core :: Graphics: WebRender, defect, P3)
People
(Reporter: erik.faulhaber, Assigned: gw)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(2 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0
Steps to reproduce:
I created a minimal example to reproduce:
https://codepen.io/erik-f/pen/PozrrPb
I could reproduce the bug on multiple devices running Windows 10 with Firefox 83.0.
In Firefox 83.0 on Archlinux everything works fine.
In Firefox 82.0.3 on Windows everything works as expected too.
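In case the pen is unavailable, the setup boils down to roughly the following (an illustrative sketch, not the exact pen contents; the file names are placeholders - the real page uses large SVG background layers):

<style>
  /* the scroll container establishes the 3D context for translateZ */
  .wrapper { height: 100vh; overflow-y: scroll; perspective: 1px; }
  .layer   { position: absolute; top: 0; width: 100%; }
  /* translateZ pushes a layer "away" so it scrolls slower; scale() compensates the apparent shrink */
  .back  { transform: translateZ(-2px) scale(3); }
  .front { transform: translateZ(-1px) scale(2); }
</style>
<div class="wrapper">
  <img class="layer back" src="background.svg">
  <img class="layer front" src="foreground.svg">
  <main>...tall scrollable content...</main>
</div>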
Actual results:
While developing a website with parallax scrolling (using translateZ), I experienced horrible lags a few days ago. While reducing the source code to a minimal example, I accidentally broke the parallax effect, but the performance issues still persist (that's what you can see in the codepen above).
Both the working parallax website and the codepen above, which doesn't have any visible parallax effect now, lag horribly while scrolling and resizing the window.
I did a performance analysis while slowly resizing the window in both 83.0 and 82.0.3 on Windows 10. I attached both results below.
In 82.0.3 the frame rate is constantly at 60 fps. In 83.0 the frame rate is mostly around 5 fps. GPU usage (GTX 1080) is around 10% on 82.0.3 and around 60% on 83.0.
The problem seems to scale with the screen resolution. In a maximized window on a 1440p monitor it's horrible, while it's not very noticeable in a small window.
Note: Even when removing the translateZ lines from the CSS file, resizing is still not 100% as smooth as in 82.0.3. Also, it doesn't seem to happen exclusively with SVG files: we tried PNGs instead, and while the performance was a lot better, there was still a noticeable lag with very large PNGs (again, everything works fine in 82.0.3).
Expected results:
I expect smooth scrolling and resizing at 60 fps like in 82.0.3.
Comment 1•4 years ago
I have to mention that, in my experience, this issue only occurs in maximized window mode; it does not occur when the window is smaller than the whole screen.
Secondly, the lag while zooming in and out is still seen in versions as old as Nightly v78.0a1, so this issue will focus on the scrolling.
Thirdly, the same issue is seen in Firefox Release v82.0.1, but I went further back and observed that Nightly v78.0a1 does have smooth scrolling, so I performed a regression and these are my results:
2020-11-26T14:01:20: DEBUG : Found commit message:
Bug 1623715 - [8.2] Move media fullscreen event to JS and extend its metadata. r=geckoview-reviewers,snorp,alwu
Differential Revision: https://phabricator.services.mozilla.com/D86350
2020-11-26T14:01:20: DEBUG : Did not find a branch, checking all integration branches
2020-11-26T14:01:20: INFO : The bisection is done.
2020-11-26T14:01:20: INFO : Stopped
This issue is not observed on Mac OS 10.15.6 or Ubuntu 20.04.
I have chosen the (Core) Web Painting component for this issue. Please set a more appropriate one if incorrect.
Comment 2•4 years ago
That is impossible; bug 1623715 is Android-only and has nothing to do with painting.
Comment 3•4 years ago
Can you please look at (or copy) the graphics section of about:support for the good and bad cases here?
Comment 4•4 years ago
My profile on Windows for this: https://share.firefox.dev/37clRNh
It looks like translateZ is pushing each svg into a separate blob image, and WebRender is really struggling with giant uploads while scrolling.
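For a rough sense of scale (back-of-envelope, not taken from the profile): even if each blob were only the size of a maximized 2560x1440 window, a single RGBA rasterization is already about 2560 x 1440 x 4 bytes ≈ 14 MB, and re-rasterizing and re-uploading several such parallax layers while scrolling adds up quickly.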
Comment 5•4 years ago
I also crashed trying to test this on MacOS, in tex_sub_image_2d_pbo.
Reporter
Comment 6•4 years ago
I spent the last few days investigating this bug.
After a lot of manual bisecting (unfortunately, I didn't know about the mozregression tool until today), I found that the extremely laggy resizing actually originated in 2019 (0e4d7f204a27).
However, the scrolling lag seems to be independent of this. The scrolling definitely works fine in 2020-01-01-09-29-38 and is definitely broken in 2020-09-01-09-45-42. I tried the regression tool multiple times, but it seems to me that it's not one particular revision that is causing the lag. It rather seems to me that there are several versions that are "a bit worse" and they build up to the very broken version 2020-09-01-09-45-42.
Both the scrolling and the resizing are definitely broken in 2020-09-01-09-45-42. We made another codepen where the parallax effect actually works and the scrolling seems to be even worse than in the other one: https://codepen.io/lucamarcelpeters/full/OJRJgBR
I wondered why it worked in 82.0.3 (and 82.0 and 82.0b1), but not in 83.0.
It must have been fixed somewhere after 2020-09-01-09-45-42 (which is 82.0a1), but it seems that this fix didn't make it into 83.0 for some reason. Shouldn't the release branch containing the fix have been merged back to central and beta?
I tried following the bug fix back from NIGHTLY_82_END (cecca8e30949), where it didn't work, to 82.0b1, where everything works fine. Right after the merge revision (acc3d41c2c93), everything works fine. To me, it seems like the bug fix was already in the beta branch before 82 beta and that it didn't get merged back to central. I still don't know why it's broken again in 83.0, though.
I hope this is useful somehow, as I spent way too much time on it. I'll leave this to someone who actually knows what they're doing now. Please let me know if I'm somehow right with my "the bug fix is in beta but didn't get merged back to nightly" theory or if that is all nonsense.
Comment 7•4 years ago
(In reply to erik.faulhaber from comment #6)
I hope this is useful somehow, as I spent way too much time on it. I'll leave this to someone who actually knows what they're doing now. Please let me know if I'm somehow right with my "the bug fix is in beta but didn't get merged back to nightly" theory or if that is all nonsense.
Not all features that are enabled on Nightly remain enabled when a release goes to Beta. So what you're seeing is very possibly due to Nightly vs. Beta configuration issues (in particular differences in WebRender being enabled or not). In general, all code changes land on Nightly before being merged into Beta, so that's not likely to be the explanation here.
FWIW, on my Win10 system, I see a noticeable drop in scrolling performance on release between Fx80 and Fx81. I was able to bisect with mozregression:
3:09.34 INFO: Last good revision: 4b8de762e09740f9d140a0a097922fbccc4d1406
3:09.34 INFO: First bad revision: c8ca1d1866e7e3591d2df84c2a4f0204d43386ed
3:09.34 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=4b8de762e09740f9d140a0a097922fbccc4d1406&tochange=c8ca1d1866e7e3591d2df84c2a4f0204d43386ed
This fits the regression range found in comment 1, with the notable difference of including some WebRender changes prior to the Android ones noted in that comment. I don't know whether bug 1623792 or bug 1658182 is more likely to be the culprit here, but those at least seem plausible.
Reporter
Comment 8•4 years ago
Thank you, I was really stupid. I can confirm that's where it breaks.
(Quoting myself from comment 6:)
I tried the regression tool multiple times, but it seems to me that it's not one particular revision that is causing the lag. It rather seems to me that there are several versions that are "a bit worse" and they build up to the very broken version 2020-09-01-09-45-42.
It turned out that I was launching the wrong builds with mozregression --launch because I didn't pass --repo autoland. I did several regression runs and always ended up with the same output as you. Unfortunately, I then tried testing those two builds again without --repo autoland and couldn't find a difference (duh!).
However, I wasn't completely wrong. It is getting "a bit worse" before it's completely breaking. I did another regression and that's what I came up with (using the minimal example from my last comment, https://codepen.io/lucamarcelpeters/full/OJRJgBR):
1. As stated in my first comment, the laggy resizing starts in 0e4d7f204a27. Scrolling is still 100% smooth here, though.
2. The first (subtle) drop in scrolling performance led me to these regression results:
4:15.15 INFO: Last good revision: 2d55f2c0fc33eda6c995ea77bb7fe59b86bba6f0
4:15.15 INFO: First bad revision: 8f4b47079a44eeea87caa560b3b072148551aa3c
In 2d55f2c0fc33 I get a solid 60 fps minimum when scrolling. In 8f4b47079a44 it drops to 35 fps.
3. The second regression led me to the same results as [:RyanVM]: in 4b8de762e097 I get 28 fps minimum, in c8ca1d1866e7 scrolling only runs at 10 fps.
Comment 9•4 years ago
Assignee
Comment 10•4 years ago
Might be fixed by https://phabricator.services.mozilla.com/D98043?
Comment 11•4 years ago
Jamie, I think you recently changed our behaviour around managing large textures in the texture cache. Did that fix this, and if not, should it?
Comment 12•4 years ago
Doesn't scroll any differently for me on a current Nightly build.
Reporter
Comment 13•4 years ago
I just tested 0ee685602a7f (as suggested by [:gw]) and I get 28 fps again. Same in today's Nightly build, @[:RyanVM].
So the issues introduced in c8ca1d1866e7 seem to be fixed now.
I still don't get near 60 fps like before 8f4b47079a44 though.
Comment 14•4 years ago
(In reply to comment #11)
Jamie, I think you recently changed our behaviour around managing large textures in the texture cache. Did that fix this, and if not, should it?
That's the patch Glenn mentions in comment 10, which seems to have helped. Presumably that part was due to bug 1658182 as identified in comment 7.
As for the earlier regression caused by bug 1616901: we no longer use texture arrays and have changed to 2048x2048 2D textures, but the effect is the same - the cache is now split into multiple fixed-size textures instead of fewer massive ones. On my computer I see some frames with really high draw call counts, so I suspect the remaining slowness is due to this. We definitely don't want to go back to larger textures, but maybe batching could be improved on this page somehow.
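To put rough numbers on that (back-of-envelope, not measured): each cache texture holds at most 2048 x 2048 ≈ 4.2 million pixels, about 16 MB at 4 bytes per pixel, so a very large rasterized layer inevitably ends up spread across many separate cache textures, and instances sampling from different textures presumably can't share a batch - hence the high draw call counts.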
Comment 15•4 years ago
Here's an updated profile from MacOS: https://share.firefox.dev/2JSOTtM
Comment 16•4 years ago
The profile just matches Jamie's explanation from comment 14. We're spending a lot of time issuing draw calls, so improving batching would be the main thing we could do to fix this.
Assignee
Comment 17•4 years ago
I tried to look into this - for some reason I can't explain, the SVG files on that domain won't load for me. I tried two different internet connections; both of them fail. tracepath also times out on that domain, somewhere in the US.
It's probably a temporary routing issue, but if someone is able to attach the test case directly to the bug, that would be great.
Reporter
Comment 18•4 years ago
Reporter
Comment 19•4 years ago
I attached the SVG files. Hopefully, that's just a temporary routing issue, as we will eventually deploy our application to this domain and server.
Assignee
Comment 20•4 years ago
Yes, it seems like it was a temporary routing issue; the page is loading correctly for me now. Thanks!
Assignee
Comment 21•4 years ago
OK, there are multiple issues involved here - some of them are related:
1. The texture cache eviction bug referenced above (which is now fixed, and has improved things).
2. The SVGs are rasterized at a very large size (12k x 4k) on my screen. I think this is due to the scale transform. There is some planned work to improve this by working out a better scale at which to rasterize SVG files.
3. We currently treat all rasterized SVG files as being possibly translucent - if we can detect that the background layer(s) are opaque, WR will use that to reduce the blending cost for those layers.
4. Since the rasterized images are very large, they get split into tiles and spread across multiple texture cache pages. WR currently does a bad job of batching tiled images which span multiple texture pages.
I will look into a fix for (4) today - that should be a reasonably simple fix, and then we can hand this off to the people who will be looking at (3) and (4) for further improvements.
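For a rough sense of the sizes involved in (2): a 12k x 4k rasterization is roughly 12,000 x 4,000 x 4 bytes ≈ 192 MB of pixel data per SVG layer (back-of-envelope, assuming 4 bytes per pixel), which is why both the rasterization scale and the tiling/batching behaviour matter so much here.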
Assignee
Comment 22•4 years ago
Another issue causing high draw calls is that when we have a tiled image, we only calculate the visible tiles once, and then replay that visible image tile list across all picture cache tiles.
In this case, each SVG file has ~135 visible tiles, which get replayed across 5-6 picture cache tiles, multiplied by the number of parallax layers.
Even with the fix for (4) above, this still results in a high draw call count. To fix this, we need to calculate a visible set of image tiles per picture cache tile (I'm planning to implement this in the new year, for other reasons).
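For scale: ~135 visible image tiles x ~6 picture cache tiles is already around 800 image instances per layer, and with a handful of parallax layers (say 4 - an illustrative count, not measured from the page) that is well over 3,000 instances per frame before batching.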