Terrible lag opening new window due to parent process waiting for 15+s (!) on a condition variable in IPC from ClientLayerManager
Categories
(Core :: Graphics, defect, P3)
Tracking
()
People
(Reporter: bzbarsky, Unassigned)
Details
Attachments
(4 files)
See profile at https://perfht.ml/2lzUOrF
It looks like the parent process is doing mozilla::layers::PLayerTransactionChild::SendGetTide
which blocks for 15+s waiting on a condvar under mozilla::ipc::MessageChannel::Send
.
The user-perceived behavior is I hit Cmd+N and then beachball for 15s until the new window opens...
Possibly-relevant: about:memory claims 3GB of heap-unclassified in the parent process, and I am definitely very much swapping (though to SSD, in case that matters).
Reporter | ||
Comment 1•5 years ago
|
||
Hmm. Opening other windows (e.g. Thunderbird composition window or iTunes or Safari) is also either really laggy or doesn't happen at all. So maybe there is something on the operating system side that is being a problem in terms of responding quickly and we're not handling it too well (though better than iTunes or Safari, apparently!).
Comment 2•5 years ago
|
||
The monitor is being waited on until the compositor thread sends its reply to the sync IPC message being sent (SendGetTextureFactoryIdentifier).
https://searchfox.org/mozilla-central/rev/f43ae7e1c43a4a940b658381157a6ea6c5a185c1/gfx/layers/ipc/PLayerTransaction.ipdl#113-114
The target thread of the sync IPC is the compositor thread, which is blocked due to calling into the OS, preventing main thread -> compositor thread IPC from being processed.
https://searchfox.org/mozilla-central/rev/f43ae7e1c43a4a940b658381157a6ea6c5a185c1/gfx/gl/GLContextProviderCGL.mm#187-232
Moving to Graphics, as they are the user of this parent-process main thread sync IPC call.
Comment 3•5 years ago
|
||
Appears to be Mac-specific, if that helps find someone to triage it.
Reporter | ||
Comment 4•5 years ago
|
||
For what it's worth, I have now restarted Firefox. Before I did that, total memory + swap usage was about 22GB (on a 16GB system). When I quit Firefox, that goes down to 8GB or so. When I start Firefox again, it goes to 18GB. Before the restart, pretty much anything that involved creating a new window in any app was laggy-to-broken (e.g. Zoom would not open its window, Emacs would not manage to open a window, etc). The one exception seems to be XQuartz: I could create new xterm windows all I wanted.
Now that I've restarted, things are back to normal.
Between that and the heap-unclassified bit, I wonder whether we had managed to leak a bunch of resources both on our side and on the kernel and window manager side somehow.
Comment 5•5 years ago
|
||
Markus, Jeff, any ideas on why it is taking so long to create a context?
Updated•5 years ago
|
Updated•5 years ago
|
Comment 6•5 years ago
|
||
Is the GPU switching during that time? 15s is really really long, but it's pooossible this is a slow GPU->GPU data migration, and could even be swapping. I usually see GPU switches taking about a second, but that's also on a fairly fast machine. It even sounds like yours might have more resources allocated (heap-unclassified) for some reason.
Firefox definitely shouldn't be using that much RAM, even in many-tabs situations.
Reporter | ||
Comment 7•5 years ago
|
||
Is the GPU switching during that time?
I don't know. How would I tell? And again, this problem wasn't being limited to Firefox (though Firefox was definitely the trigger, in that quitting it made things normal again for all apps).
Possibly-relevant: I have "Automatic graphics switching" turned off in system preferences. Would I still have GPU switches going on?
Firefox definitely shouldn't be using that much RAM, even in many-tabs situations.
Note that I have a many-windows situation (~60 windows), not just many-tabs, in case that matters.
Comment 8•5 years ago
|
||
(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #4)
Between that and the heap-unclassified bit, I wonder whether we had managed to leak a bunch of resources both on our side and on the kernel and window manager side somehow.
I think this is what must be happening. Bug 1581081 also indicates that something accumulates over time.
Comment 9•5 years ago
|
||
Boris, the next time this happens for you, can you run ioclasscount
in the bad state, after closing Firefox, and after reopening Firefox? It might give us some indication about what leaks. I'll also try to reproduce this locally.
Reporter | ||
Comment 10•5 years ago
|
||
can you run ioclasscount in the bad state, after closing Firefox, and after reopening Firefox?
Absolutely! Didn't know that existed; that looks pretty useful!
Reporter | ||
Comment 11•5 years ago
|
||
Reporter | ||
Comment 12•5 years ago
|
||
Reporter | ||
Comment 13•5 years ago
|
||
Markus, does anything in there look interesting?
Comment 14•5 years ago
|
||
A bunch of things look interesting. AMDRadeonX4150_AMDAccelVidMemory (13296/2321/4754), IGAccelGLContext(320/15/74), IOMultiMemoryDescriptor (197/90/90). I'll take a closer look tomorrow.
Reporter | ||
Comment 15•5 years ago
|
||
Did another restart today after things got into the bad state. Here are more counts...
Updated•2 years ago
|
Comment 16•2 years ago
|
||
I don't think we ever got to the bottom of the IOSurface count increase.
Description
•