Closed Bug 1350635 Opened 8 years ago Closed 8 years ago

PGPU::Msg_AddLayerTreeIdMapping sync IPC is extremely inefficient

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Performance Impact high

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [gfx-noted])

According to our telemetry data on Nightly, the median time for this one is 35.96ms, over 2 frame budgets. :( David, can you please take this?
Flags: needinfo?(dvander)
Is there any more context? Like how far into startup we're getting long send times? Or how that compares to timing on beta/aurora (where the GPU process does not relaunch on kill?) Or how often this message is being sent? Background: This message is sent to the compositor process, which shouldn't be busy. And it just adds an integer to a map. There's two cases where it's sent. The first is when you open a tab. That currently has to be synchronous due to ordering issues (the child could try to connect its layers before the compositor association has been made). And in fact we send an additional sync message immediately after - so even if we drop *this* sync message, or combine them into one, we will still block the UI on the compositor thread. The second case is when the GPU process first launches. During that time it's probably doing busy work like loading d3d11.dll, so I could see it taking longer. We do need to block on that message being received, but it can be delayed up until the first paint. And for the first launch of the GPU process the message is probably empty and doesn't even need to be sent. Anyway, having some more information would help narrow down a plan of attack.
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #1) > Is there any more context? Like how far into startup we're getting long send > times? No, I filed this based on telemetry data. More investigation may be needed. > Or how that compares to timing on beta/aurora (where the GPU process > does not relaunch on kill?) Or how often this message is being sent? You can check the IPC_SYNC_MAIN_LATENCY_MS on those branches, it's available on them all (but not on release.) Note that it's a keyed probe, once you load it look for the "AddLayerTreeIdMapping" key in the histogram search UI. > Background: This message is sent to the compositor process, which shouldn't > be busy. And it just adds an integer to a map. > > There's two cases where it's sent. The first is when you open a tab. That > currently has to be synchronous due to ordering issues (the child could try > to connect its layers before the compositor association has been made). And > in fact we send an additional sync message immediately after - so even if we > drop *this* sync message, or combine them into one, we will still block the > UI on the compositor thread. This means pausing the entire UI thread and all event handling and everything else, which is really bad especially at tab opening time, and doing that two times is exceptionally bad, so at the very least we have to drop them to one for efficient tab opening. Have you done profiling around this in practice to see what happens for example when the machine is a bit under load, or when you have a video playing, or a few tabs open, etc.? In my experience, sync IPCs have drastically different timings under these types of different environments. (I haven't profiled this on Windows extensively myself yet.) > The second case is when the GPU process first launches. During that time > it's probably doing busy work like loading d3d11.dll, so I could see it > taking longer. We do need to block on that message being received, but it > can be delayed up until the first paint. So it doesn't need to be synchronous in that case, right? Only need to have a conditional variable that you can wait on that will be signaled upon the first paint or some such... > And for the first launch of the GPU > process the message is probably empty and doesn't even need to be sent. Great! > Anyway, having some more information would help narrow down a plan of attack. As a next step I suggest doing some local profiling under some load situations to get a sense of how things look like. Also (and please take the following with a big grain of salt since I don't know anything about this code) I find the need for synchronicity in order to guarantee ordering a bit suspect. Let me give you an example. For the creation of windows, we have this situation where either the child process, or the parent process can at any time request a window to be created, and they need to set up an IPC actor in the process and send some initialization messages in both directions before the initialization is considered done. Right now part of this setup relies on the content process to send a sync IPC message to the parent process to get a simple way to order things, but I'm changing that by basically breaking that message up into a request and response pair and waiting in the middle (very carefully!) and dealing with the cases where some code somewhere tries to use something from the window setup before we're ready... I don't know how much of the same concepts would apply to the ordering questions here, but if there is a way to use async messaging and guarantee ordering with a bit of a more complicated setup, it may be worth considering...
The ordering problem is "real" in that, if you make the UI message async, it can be out of order with respect to another message coming from the content process's compositor channel. (bug 1316632 comment #33 has the scenario.) Like you suggested I doubt it's fundamental and some careful restructuring might be able to make it sync only for content. This code is from when e10s and layers were first integrated. The constraints are that the UI process has to do this mapping for security reasons, and content has to wait for it to complete. Anyway - I will file a bug for the easy tasks of making the launch message async and combining the allocate/notify sync messages into one. Even if the problem is ultimately elsewhere it still seems like a good idea.
Whiteboard: [gfx-noted] → [gfx-noted][qf]
Whiteboard: [gfx-noted][qf] → [gfx-noted][qf:p1]
This doesn't sound like an easy win at this point. More research is needed.
It looks like the number of submissions in Telemetry has plummeted after this landed, so maybe this is good enough for now.
With bug 1350660, can we look at the telemetry again and see if this is still at the top?
Whiteboard: [gfx-noted][qf:p1] → [gfx-noted][qf]
According to the evolution dashboard, the number of reports for it fell dramatically on Nightly.
Whiteboard: [gfx-noted][qf] → [gfx-noted][qf:p1]
After a conversation with :bas, we're going to mark this as fixed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Performance Impact: --- → P1
Whiteboard: [gfx-noted][qf:p1] → [gfx-noted]
You need to log in before you can comment on or make changes to this bug.