Multi-second lags on Firefox UI operations in GFX code
Categories
(Core :: Graphics, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox70 | + | wontfix |
firefox71 | - | fix-optional |
People
(Reporter: bzbarsky, Assigned: bradwerth)
Details
(Keywords: perf, regression)
Attachments
(1 file)
385.67 KB,
application/x-gzip
|
Details |
For a while now I've been seeing Firefox on Mac get really laggy after running for a few days. Opening new tabs, opening windows, closing tabs, switching tabs, closing windows, all laggy.
Today I finally managed to get some profiles of this in a recent-ish nightly. This nightly performed fine when I first started it 5 days ago. In case it matters, I have somewhere between 60 and 70 windows open.
https://perfht.ml/2JGnys1 is a profile from switching tabs, iirc what I was doing before that pause. It covers about 5.5 seconds, most of which was laggy; the longest pause in there is 3.5 seconds.
https://perfht.ml/2JDnc5d is a profile from opening a new window. It's again about 4.5s, with the longest single pause in there about 1.25s.
Both profiles show a bunch of time taken in memmove and ARGBSetRow_X86 in the parent process, both under graphics code. I don't know to what extent the issue may be swapping, but about:memory does not show that many hard page faults in the parent process....
Both profiles also show about 1GB of memory allocation taking place over the 4.5s of the profile, if I read them correctly. That's about how big the total gfx-textures number is in about:memory for the parent process.
This feels like a regression sometime in late spring or early summer. I definitely don't recall this sort of behavior in April... Sorry about not filing this earlier, but I didn't have hard profile data until today.
Are there any known changes that could have caused us to touch a bunch more texture memory sometime in the last few months in multiple-windows situations?
![]() |
Reporter | |
Comment 1•6 years ago
|
||
Another profile just now closing a tab, with a 4.8s pause: https://perfht.ml/2JDpa5t
Comment 2•6 years ago
|
||
What surprises me the most in the first profile is the fact that there are about 60 paints within a single refresh tick. What could cause that?
![]() |
Reporter | |
Comment 3•6 years ago
|
||
What could cause that?
Would having about 60 windows open do it?
![]() |
Reporter | |
Comment 4•6 years ago
|
||
I had to restart the browse to capture this log, so at the moment it's not showing the bad lagginess, but I figured I'd get it uploaded for a start. Once the lag comes back I will do another dump.
Comment 5•6 years ago
|
||
Could this have something to do with downloading? During downloading we paint all the windows.
![]() |
Reporter | |
Comment 6•6 years ago
|
||
I was not downloading during those lags, though I had downloaded previously during the session. I am aware of the laggy "repaint all the progress indicators" thing downloading does...
It's also possible that the profiler icon itself, which is present in all windows, is triggering the paints, I guess. But the lags were happening even with the profiler stopped...
Comment 7•6 years ago
|
||
It looks like for some reason we decided that we needed to repaint all 60 windows, it's hard to tell what caused that :(
It also looks like we've discarded all our tile backbuffers (to save memory), so the vast majority of the main thread profile is allocating tiles, clearing them and then copying the front buffer to the new back buffer (I think we're touching every pixel twice unfortunately).
Compositing is just slow inside swap buffers, but we do asynchronously uploading of texture data so all those brand new tiles being uploaded is likely hidden inside this time.
Comment 8•6 years ago
|
||
Not clearing new tile buffers when we're also going to copy the front buffer in would help a bit, though we're working around a skia issue where it doesn't properly understand RGBX, so we have to initialize the alpha channel to opaque.
It'd also be interesting to see if we could find out from the OS (or from focus?) which window(s) were on top, and prioritize painting them first. The second paint on the focused window would still be slow though, unless we can do something really magical.
It looks like compositing is the long pole though, and nothing stands out as obviously broken there.
In general though, trying to paint 60 windows at the same time is probably always going to be pretty slow, so trying to figure out why we're doing that seems like the most important thing.
![]() |
Reporter | |
Comment 9•6 years ago
|
||
One interesting thing about the "repaint all 60 windows" bit: I have the windows on multiple spaces (on OS X). There were definitely not 60 windows that were even potentially visible at the time when I took those profiles. I don't recall which exact space I was on, but it was somewhere between 3 and maybe 40.
I can pretty easily run a custom browser against my session and log whatever if people want; it's possible that we paint all 60 windows all along but haven't discarded the tile backbuffers until a while into the session, and then we might be able to at least get some info on why we're painting all those windows. Any suggestions for things to log?
Comment 10•6 years ago
|
||
I'd guess we'd want to know when we're painting all 60 windows. Which I'd guess is just PresShell::Paint. Once we can see clearly when all 60 windows are painting we'd have to work backwards to figure out the cause.
Updated•6 years ago
|
![]() |
Reporter | |
Comment 11•6 years ago
|
||
Which I'd guess is just PresShell::Paint
Sure, being called from nsRefreshDriver::Tick
via nsViewManager::ProcessPendingUpdates
. At least that's what I assume Markus was referring to in the attached profiles.
So isn't the real question "why are we ticking all the refresh drivers"? Esp. on Mac, where I thought we didn't do that for things that are occluded.
Comment 12•6 years ago
|
||
OK, as I understand it there may be a perf issue here with Macs that we haven't seen widely reported and aren't seeing in our test suite. It sounds like a bad issue but maybe something of an edge case. Jeff, bz, does that sound fair? Should this be P3 or do you think it still needs more investigation?
![]() |
Reporter | |
Comment 13•6 years ago
|
||
The wideness of the reporting is unclear: it's actually very common for people to complain about the lagginess of our UI performance on Mac. That said, the behavior I am seeing seems to be a regression, and one recent enough to not be in release yet. If true, then on the one hand, it's good because it means it's not really biting most users, but on the other hand it would reduce the extent to which it would be reported.
I agree that this is somewhat of an edge case in terms of number of open windows.
This needs more investigation at least on my part, because if nothing else it absolutely craters my productivity to have multi-second lags any time I try to do anything in Bugzilla. It's bad enough I've been considering a different browser for my daily browsing.... What I could really use is a hand in figuring out what needs to be poked at, exactly. I am still running a browser instance that I can use to capture the logs akin to comment 4 that mstange asked for, but so far it's not showing the performance issue, as far as I can tell.
Comment 14•6 years ago
|
||
Oh, good point, since this is a recent issue then we may start to see more reports of this problem as 69 goes to release. I just want to call this out for the gfx team folks in this bug since marking it P3 usually means it goes into backlog.
Comment 15•6 years ago
|
||
Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information
![]() |
Reporter | |
Comment 16•6 years ago
|
||
So I am seeing this happening again. Any data I can gather? Would a log like comment 4 be helpful?
Comment 17•6 years ago
|
||
Following up by emailing Jeff.
Comment 18•6 years ago
|
||
bz, are you sure that this is a regression? Is it still lagging your Mac ? Maybe a log would be useful.
Comment 19•6 years ago
|
||
Boris, can you confirm that the problem only happens when we're painting all of the windows? and if so can you find out why all the windows are painting?
![]() |
Reporter | |
Comment 20•6 years ago
|
||
bz, are you sure that this is a regression?
I'd say about 60% sure. I don't really recall this happening until this summer, but it's possible that something about my usage patterns changed too...
Is it still lagging your Mac ?
I can keep an eye on it. I've been restarting pretty often in an attempt to avoid this problem and the memory leaks described in bug 1549819.
Maybe a log would be useful.
What sort of log?
can you confirm that the problem only happens when we're painting all of the windows?
I don't know how to tell whether that's what's happening or not. If that information is available from profiles, I can try catching the problem in the profiler again and seeing what's going on there.
if so can you find out why all the windows are painting?
Any suggestions on how I do that, especially given the somewhat-intermittent nature of the problem?
Comment 21•6 years ago
|
||
I had a look at the profile again. And in this range it clearly looks like we're doing a bunch of display list builds all in a row. This very much suggests painting a bunch of windows. If this happens to you again, we can probably confirm that the number of display list builds corresponds roughly to the number of windows that you have open.
The next thing to do is probably to turn on paint flashing to see if you can figure out what's being painted in all of the other windows.
![]() |
Reporter | |
Comment 22•6 years ago
|
||
Thanks, I will give that a shot. I'll also end up running a build which has the fix for bug 1574538, in case that matters.
Comment 23•6 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #11)
So isn't the real question "why are we ticking all the refresh drivers"? Esp. on Mac, where I thought we didn't do that for things that are occluded.
We still do it for parent process refresh drivers. Just not for tab content refresh drivers. I've filed bug 1578123 about that.
Updated•6 years ago
|
Comment 24•6 years ago
|
||
Fix-optional for 71 as we are in beta now.
Comment 25•3 years ago
|
||
In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.
Comment 26•2 years ago
|
||
The severity field is not set for this bug.
:bhood, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 27•2 years ago
|
||
Brad, do you suppose this report is still relevant in current versions of Fx on macOS?
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 28•2 years ago
|
||
Based on how this was happening years ago, if it wasn't already fixed intentionally, it will have been fixed by the landing of Bug 1768495.
Description
•