Open Bug 1583922 Opened 5 years ago Updated 2 years ago

Terrible lag opening new window due to parent process waiting for 15+s (!) on a condition variable in IPC from ClientLayerManager

Categories

(Core :: Graphics, defect, P3)

Desktop
macOS
defect

Tracking

()

People

(Reporter: bzbarsky, Unassigned)

Details

Attachments

(4 files)

See profile at https://perfht.ml/2lzUOrF

It looks like the parent process is doing mozilla::layers::PLayerTransactionChild::SendGetTide which blocks for 15+s waiting on a condvar under mozilla::ipc::MessageChannel::Send.

The user-perceived behavior is I hit Cmd+N and then beachball for 15s until the new window opens...

Possibly-relevant: about:memory claims 3GB of heap-unclassified in the parent process, and I am definitely very much swapping (though to SSD, in case that matters).

Hmm. Opening other windows (e.g. Thunderbird composition window or iTunes or Safari) is also either really laggy or doesn't happen at all. So maybe there is something on the operating system side that is being a problem in terms of responding quickly and we're not handling it too well (though better than iTunes or Safari, apparently!).

The monitor is being waited on until the compositor thread sends its reply to the sync IPC message being sent (SendGetTextureFactoryIdentifier).
https://searchfox.org/mozilla-central/rev/f43ae7e1c43a4a940b658381157a6ea6c5a185c1/gfx/layers/ipc/PLayerTransaction.ipdl#113-114

The target thread of the sync IPC is the compositor thread, which is blocked due to calling into the OS, preventing main thread -> compositor thread IPC from being processed.
https://searchfox.org/mozilla-central/rev/f43ae7e1c43a4a940b658381157a6ea6c5a185c1/gfx/gl/GLContextProviderCGL.mm#187-232

Moving to Graphics, as they are the user of this parent-process main thread sync IPC call.

Component: IPC → Graphics

Appears to be Mac-specific, if that helps find someone to triage it.

OS: Unspecified → macOS

For what it's worth, I have now restarted Firefox. Before I did that, total memory + swap usage was about 22GB (on a 16GB system). When I quit Firefox, that goes down to 8GB or so. When I start Firefox again, it goes to 18GB. Before the restart, pretty much anything that involved creating a new window in any app was laggy-to-broken (e.g. Zoom would not open its window, Emacs would not manage to open a window, etc). The one exception seems to be XQuartz: I could create new xterm windows all I wanted.

Now that I've restarted, things are back to normal.

Between that and the heap-unclassified bit, I wonder whether we had managed to leak a bunch of resources both on our side and on the kernel and window manager side somehow.

Markus, Jeff, any ideas on why it is taking so long to create a context?

Flags: needinfo?(jgilbert)
Priority: -- → P3
Flags: needinfo?(mstange)
Hardware: Unspecified → Desktop

Is the GPU switching during that time? 15s is really really long, but it's pooossible this is a slow GPU->GPU data migration, and could even be swapping. I usually see GPU switches taking about a second, but that's also on a fairly fast machine. It even sounds like yours might have more resources allocated (heap-unclassified) for some reason.

Firefox definitely shouldn't be using that much RAM, even in many-tabs situations.

Flags: needinfo?(jgilbert) → needinfo?(bzbarsky)

Is the GPU switching during that time?

I don't know. How would I tell? And again, this problem wasn't being limited to Firefox (though Firefox was definitely the trigger, in that quitting it made things normal again for all apps).

Possibly-relevant: I have "Automatic graphics switching" turned off in system preferences. Would I still have GPU switches going on?

Firefox definitely shouldn't be using that much RAM, even in many-tabs situations.

Note that I have a many-windows situation (~60 windows), not just many-tabs, in case that matters.

Flags: needinfo?(bzbarsky)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #4)

Between that and the heap-unclassified bit, I wonder whether we had managed to leak a bunch of resources both on our side and on the kernel and window manager side somehow.

I think this is what must be happening. Bug 1581081 also indicates that something accumulates over time.

Flags: needinfo?(mstange)

Boris, the next time this happens for you, can you run ioclasscount in the bad state, after closing Firefox, and after reopening Firefox? It might give us some indication about what leaks. I'll also try to reproduce this locally.

can you run ioclasscount in the bad state, after closing Firefox, and after reopening Firefox?

Absolutely! Didn't know that existed; that looks pretty useful!

Markus, does anything in there look interesting?

Flags: needinfo?(mstange)

A bunch of things look interesting. AMDRadeonX4150_AMDAccelVidMemory (13296/2321/4754), IGAccelGLContext(320/15/74), IOMultiMemoryDescriptor (197/90/90). I'll take a closer look tomorrow.

Flags: needinfo?(mstange)

Did another restart today after things got into the bad state. Here are more counts...

Flags: needinfo?(mstange)
Severity: normal → S3

I don't think we ever got to the bottom of the IOSurface count increase.

Flags: needinfo?(mstange.moz)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: