Open Bug 1612564 Opened 5 years ago Updated 2 years ago

Experiment with increased IPC buffer sizes

Categories

(Core :: IPC, task, P3)

ARM Android

People

(Reporter: acreskey, Unassigned)

References

(Blocks 1 open bug)

Attachments

(1 file)

Attached image requestStart_gve.png

We've found evidence that early network requests are being delayed by e10s, at least on Android.
(See the attachment, which compares requestStart from the Navigation Timing API on Fennec, single-process builds of geckoview_example, and the default e10s build, gve_beta.)

This bug is to experimentally increase the buffer sizes of the IPC pipes (which appear to be socketpairs) to see whether that improves IPC performance.

See:
https://searchfox.org/mozilla-central/rev/e076e40ab1290f4e5e67ebd21dc8af753fc05be6/ipc/chromium/src/chrome/common/ipc_channel_posix.cc#253

Assignee: nobody → adam

I did hack together a patch which increases the socket send/receive buffer sizes (the SO_SNDBUF and SO_RCVBUF socket options) on the IPC sockets:
https://hg.mozilla.org/try/rev/971e91fe195cea46e21eed8271e688e30b36b7c4

The buffers appear to be larger than we expected: 224k by default on Android.
https://paste.rs/i66
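For anyone reproducing this, here's a minimal, hypothetical sketch of the approach (not the try patch itself), assuming the channel is an AF_UNIX socketpair as in ipc_channel_posix.cc; the SOCK_STREAM type and 1 MiB size are placeholders:

  #include <cstdio>
  #include <sys/socket.h>

  // Hypothetical helper (not the actual try patch): bump SO_SNDBUF/SO_RCVBUF
  // on one end of an IPC socketpair and read back what the kernel granted.
  static void SetIpcSocketBuffers(int fd, int bytes) {
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));

    // Linux doubles the requested value for bookkeeping and clamps it to
    // net.core.{w,r}mem_max, so always check the effective size; calling
    // getsockopt before any setsockopt is how a platform default (like the
    // 224k above) can be observed.
    int actual = 0;
    socklen_t len = sizeof(actual);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len);
    std::printf("fd %d: effective SO_RCVBUF = %d bytes\n", fd, actual);
  }

  int main() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 1;
    SetIpcSocketBuffers(fds[0], 1 * 1024 * 1024);  // size is illustrative only
    SetIpcSocketBuffers(fds[1], 1 * 1024 * 1024);
    return 0;
  }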

In an overnight run, I was not able to discern any performance improvements from the change (see column fenix_socket_buffers)
https://docs.google.com/spreadsheets/d/1p6EabhB3pjSwpK9wB_IxzPPkOQB4gsafUcaf4pDeA1U/edit#gid=294628143

> We've found evidence that early network requests are being delayed by e10s, at least on Android.

jld has pointed out in other latency bugs that "early" requests might have high latency because the process is still launching. Could this be an issue here?

(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)

> > We've found evidence that early network requests are being delayed by e10s, at least on Android.
>
> jld has pointed out in other latency bugs that "early" requests might have high latency because the process is still launching. Could this be an issue here?

I don't think that's the case, because we've been using a browser conditioning script which loads a dummy site (https://www.example.com) prior to loading the actual site where we collect the metrics.

So the Android content process (e.g. org.mozilla.firefox:tab) should have already been created by that time.

(I verified this with adb shell ps | grep org.mozilla)

We have since simplified the conditioning to skip the example.com load, as it wasn't reducing noise but was significantly increasing test duration (there was also a 30 second delay).
I'm still seeing the same delays there, but that's a good point about process launch time that we need to be careful about.

See Also: → 1604326

Adam,
I wasn't able to find any performance wins from increasing the socket send/rcv buffers.
Do you have any other ideas?

Flags: needinfo?(adam)

(In reply to Andrew Creskey [:acreskey] [he/him] from comment #4)

> I wasn't able to find any performance wins from increasing the socket send/rcv buffers.
> Do you have any other ideas?

I have some vague notions that I hope to nail down in the next week or two. In my early testing on Mac and Windows, I'm seeing IPC delays (which are approaching 70 ms on a fast machine) being dwarfed by the delays introduced by the main thread being too busy to service the incoming IPC messages (on the order of 1.5 seconds in some cases). Without instrumenting the IPC calls (which is what Haik has been working on), it's hard to tease these apart.

For what it's worth, I do suspect the buffer size may play a role in IPC delay when large transfers are underway. For example, I'm seeing the IPC latency on a single queue climb from ~0.02 ms to ~40 ms as a long series of PAltDataOutputStream::Msg_WriteData messages (each 128 KB + 116 bytes of overhead) are sent between processes. This is under Windows on a Ryzen Threadripper 2920X, so I imagine these numbers will be substantially higher on a mobile device. So if we're focusing exclusively on IPC, there may still be a benefit to finding a better buffer size.
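To make the queueing effect concrete, here's a rough, hypothetical microbenchmark (not Gecko code): it pushes a burst of 128 KB writes through an AF_UNIX socketpair with a deliberately slow reader, so per-write latency jumps once the kernel buffers fill. The 5 ms drain interval and message count are arbitrary.

  #include <sys/socket.h>
  #include <unistd.h>
  #include <chrono>
  #include <cstdio>
  #include <thread>
  #include <vector>

  int main() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 1;

    const size_t kMsgSize = 128 * 1024;  // same order as the Msg_WriteData chunks
    const int kMsgCount = 64;
    std::vector<char> payload(kMsgSize, 'x');

    // Reader: drain one message every 5 ms to mimic a busy receiving thread.
    std::thread reader([&] {
      std::vector<char> buf(kMsgSize);
      for (int i = 0; i < kMsgCount; ++i) {
        size_t got = 0;
        while (got < kMsgSize) {
          ssize_t n = read(fds[1], buf.data(), kMsgSize - got);
          if (n <= 0) return;
          got += static_cast<size_t>(n);
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
      }
    });

    // Writer: time each blocking write. Early writes return almost instantly;
    // once SO_SNDBUF/SO_RCVBUF are full, each write waits on the reader.
    bool write_error = false;
    for (int i = 0; i < kMsgCount && !write_error; ++i) {
      auto start = std::chrono::steady_clock::now();
      size_t sent = 0;
      while (sent < kMsgSize) {
        ssize_t n = write(fds[0], payload.data() + sent, kMsgSize - sent);
        if (n <= 0) { write_error = true; break; }
        sent += static_cast<size_t>(n);
      }
      double ms = std::chrono::duration<double, std::milli>(
                      std::chrono::steady_clock::now() - start).count();
      std::printf("msg %2d: %.2f ms\n", i, ms);
    }

    close(fds[0]);  // EOF for the reader (also lets it exit if we bailed early)
    reader.join();
    close(fds[1]);
    return 0;
  }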

The other approach that might be worth investigating for these large buffers that seem to slow things down is using shared memory to pass them between processes rather than pipes (either by baking logic into IPC that automatically moves the information into a shared buffer when it's above a threshold size, or for even less copying, by tracking down the sites that generate large buffers and modifying them and their recipients to use shared memory buffers directly.) These approaches would need some security scrutiny, since e.g., a child process tweaking a buffer after it has left its own domain could cause issues. I think Tom Ritter's Taint<> work here might help, but you'd need to coordinate with him to be sure.
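As a sketch of that idea, using plain Linux memfd_create rather than Gecko's actual Shmem machinery (the names and sizes here are made up for illustration): the sender stages the payload in an anonymous shared-memory file, and only a small descriptor would need to cross the channel.

  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE
  #endif
  #include <sys/mman.h>
  #include <unistd.h>
  #include <cstdio>
  #include <cstring>

  int main() {
    const size_t kSize = 128 * 1024;  // same order as the Msg_WriteData chunks

    // Anonymous shared-memory file (roughly what shared-memory IPC primitives
    // wrap on Linux).
    int fd = memfd_create("ipc-payload", MFD_CLOEXEC);
    if (fd < 0 || ftruncate(fd, kSize) != 0) return 1;

    void* map = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) return 1;

    // The sender writes the payload directly into shared memory...
    std::memset(map, 'x', kSize);

    // ...and only { fd, kSize } would need to cross the channel (e.g. as an
    // fd attachment), so the bulk data never transits the socketpair buffers.
    std::printf("payload staged in shared memory: fd=%d, size=%zu\n", fd, kSize);

    munmap(map, kSize);
    close(fd);
    return 0;
  }

Sealing the memfd (e.g. F_SEAL_WRITE) or copying the data out before use would be one way to address the tampering concern mentioned above.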

In any case, I hope to work out a breakdown of IPC latency versus main-thread latency on Android in the near future, and I hope that helps to shed some light here.

Flags: needinfo?(adam)

Thank you for the perspective, Adam.
With our short and medium term focus on Fenix, any light that can be shed on Android IPC is very welcome.

Priority: -- → P3

Somewhat belatedly, I can confirm that in my own testing on Windows, different IPC pipe buffer sizes did not have an appreciable impact on performance, even when the IPC message buffer segment size was increased to match. I experimented with 64k, 128k, and 256k buffers, with no improvement in IPC latency. Unless someone wants to take a third run at buffer size changes, I would propose that we close this bug.

Unassigning, but leaving open in case someone has other ideas about how to approach this.

Assignee: adam → nobody

Any idea where this stands? I ran into what I believe is the issue reported above.

Short version: IPC writes exceeded MSGMAX, which was causing stalls and artificial delays in rendering. I was seeing writes over 100k, with 32k being the average. The default, non-tweaked value (at least on CentOS) is 8192.

Bumping /proc/sys/kernel/msgmax to 512k works around it for now, but the underlying problem(s) still remain. Bigger writes incur higher, measurable latency for a couple of reasons. One is likely that the data simply doesn't fit within L1/L2 cache. It gets worse if the two threads are on different CPUs or cross NUMA nodes (as is the case with higher-end CPUs like Ryzen/Threadripper, Xeons, etc.). Firefox can and does easily exceed the non-boosted IPC limits.
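In case it helps others compare, a tiny hypothetical helper like this prints both the SysV message-queue limit mentioned above and the socket buffer defaults/ceilings discussed earlier in the bug (standard Linux procfs paths; values vary by distro):

  #include <cstdio>

  // Hypothetical helper: print a procfs limit, or note that it is unavailable.
  static void PrintLimit(const char* path) {
    FILE* f = std::fopen(path, "r");
    if (!f) { std::printf("%s: <unavailable>\n", path); return; }
    long value = 0;
    if (std::fscanf(f, "%ld", &value) == 1) std::printf("%s = %ld\n", path, value);
    std::fclose(f);
  }

  int main() {
    PrintLimit("/proc/sys/kernel/msgmax");         // SysV message queue: max message size
    PrintLimit("/proc/sys/net/core/wmem_default"); // default socket send buffer
    PrintLimit("/proc/sys/net/core/rmem_default"); // default socket receive buffer
    PrintLimit("/proc/sys/net/core/wmem_max");     // ceiling for SO_SNDBUF requests
    PrintLimit("/proc/sys/net/core/rmem_max");     // ceiling for SO_RCVBUF requests
    return 0;
  }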

I've looked at the code, but I don't quite understand why there are exceptions for macOS and even BSD, but not Linux?

./ipc/chromium/src/chrome/common/ipc_channel_posix.cc:

 43 #if defined(IOV_MAX)
 44 static const size_t kMaxIOVecSize = IOV_MAX;
 45 #elif defined(ANDROID)
 46 static const size_t kMaxIOVecSize = 256;
 47 #else
 48 static const size_t kMaxIOVecSize = 16;
 49 #endif

656     struct iovec iov[kMaxIOVecSize];
703 #if defined(OS_MACOSX) || defined(OS_NETBSD)
704           // (Note: this comment is copied from https://crrev.com/86c3d9ef4fdf6;
705           // see also bug 1142693 comment #73.)

https://crrev.com/86c3d9ef4fdf6:

Recognize EMSGSIZE as non-fatal on OS X.

BUG=29225
TEST=PageCyclerTest.Intl2File test should succeed.
Review URL: http://codereview.chromium.org/460102

As per Chrome, they added macOS and NetBSD (with no comment), but not Linux?

Or am I completely missing something?

(In reply to Milenko from comment #9)

> I've looked at the code, but I don't quite understand why there are exceptions for macOS and even BSD, but not Linux?

That's a workaround for a bug in macOS (and NetBSD; that part was contributed in bug 1553389), where sending a message would sometimes fail with EMSGSIZE but retrying it later would succeed. Specifically, this happens if the message has attached file descriptors, and the buffer has at least one byte free — causing it to poll as writeable and not give EAGAIN — but not enough space for the fd attachment. It's unrelated to the amount of data being written. Without that patch, we'd treat that as an unrecoverable I/O error and probably end up with at least a tab crash. As far as I know Linux doesn't exhibit that behavior, and it's separate from the performance issues with large messages.
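For readers following along, a paraphrased sketch of that handling (the enum and helper names here are made up; see the real logic around the OS_MACOSX/OS_NETBSD block quoted above):

  #include <cerrno>
  #include <sys/socket.h>
  #include <sys/types.h>

  enum class SendResult { Sent, RetryLater, Fatal };

  // Paraphrase of the behaviour described above: on macOS/NetBSD, EMSGSIZE
  // with fd attachments is treated like EAGAIN (retry once the socket is
  // writeable again); any other failure is a real I/O error.
  SendResult TrySend(int fd, const msghdr* msg, bool has_fd_attachments) {
    ssize_t n = sendmsg(fd, msg, 0);
    if (n >= 0) return SendResult::Sent;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return SendResult::RetryLater;
  #if defined(__APPLE__) || defined(__NetBSD__)
    if (errno == EMSGSIZE && has_fd_attachments) return SendResult::RetryLater;
  #endif
    return SendResult::Fatal;
  }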

EMSGSIZE will also be returned when the message is too long, though, no? That is what I was seeing on Linux. Increasing the max message size fixed at least the overhead of it. IOV_MAX for me is 1024, which is potentially a 16.7 MB write at 16k per message. What I'm getting at is: if the receiver is blocked, could that not also trigger it?

Unless there is something else going on locally, I can confirm that increasing msgmax has completely removed the stalls I was seeing. There's still an awful lot of idle IPC traffic, with sizes that grow with the number of tabs, but sendmsg activity is considerably lower.

Severity: normal → S3