Closed Bug 915733 Opened 11 years ago Closed 6 years ago

Use Linux AIO instead of IO threads for IPC

Categories

(Firefox OS Graveyard :: General, defect, P3)

All
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:-)

RESOLVED WONTFIX
blocking-b2g -

People

(Reporter: sinker, Assigned: cyu)

References

Details

(Keywords: perf, Whiteboard: [c=progress p= s= u=])

Attachments

(1 file)

AIO is more efficient because it avoids the context-switch latency introduced by the IO threads, and it issues fewer syscalls. Without AIO, dispatching an IPC message requires 3 context switches; with AIO it requires only one.
I think this would make things a whole lot faster. Great idea.
We're talking about Linux Async I/O (io_setup/destroy/submit/getevents) not POSIX Async I/O (aio_read/write/etc) right?
Yes, we are talking Linux Async I/O, not POSIX Async I/O.
Summary: Use AIO instead of IO threads for IPC → Use Linux AIO instead of IO threads for IPC
This sounds great! Maybe it can help the performance functional team. Thanks, Thinker. So, who will work on this?
blocking-b2g: --- → 1.3?
Keywords: perf
Whiteboard: [c=progress p= s= u=]
Assignee: nobody → cyu
Do we have evidence that context switches show up as a reason for latency or raw throughput slowness? cjones did a bunch of testing of this way back when we started using the IPC layer, and you can send thousands of messages per second even on single-core phone hardware.

Don't want to rain on this parade if it will actually help, but I suspect that it might be better to focus on the application layer, or to work on coalescing messages or message handling to reduce the total load. And I'm worried about putting the Linux code on a completely separate codepath without an I/O thread, because we'll undoubtedly end up with Linux-specific bugs, and our testing population on Linux desktop and phones is small enough that we're much less likely to catch those bugs early.
Thousands of messages per second sounds reasonable even for the low-end hardware we currently ship, but that's throughput. Do we have latency numbers from the previous tests? Since many IPC messages are used to delegate things like input, events, and rendering, even a simple tap on the virtual keyboard can involve many round trips between the processes.

We can first measure the latency and write a small proof-of-concept patch to test whether it really helps. Then we can decide whether the improved latency is worth the maintenance cost of a Linux-specific code path.
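(A minimal sketch of the kind of latency measurement proposed above, using only the Python standard library on a POSIX host. It times round trips over a socketpair between a parent and a forked echo child; the function name and parameters are illustrative, and this measures the raw kernel path only, without Gecko's IO-thread hop.)

```python
import os
import socket
import statistics
import time

def measure_roundtrip(n=1000, payload=b"x" * 64):
    """Time n ping-pong round trips between two processes over a socketpair.

    Returns (mean, median) round-trip latency in microseconds.
    """
    parent, child = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: echo every message back. Small payloads on a socketpair
        # arrive whole in practice, so a single recv per iteration suffices.
        parent.close()
        for _ in range(n):
            child.sendall(child.recv(len(payload)))
        child.close()
        os._exit(0)
    child.close()
    samples = []
    for _ in range(n):
        t0 = time.monotonic()
        parent.sendall(payload)
        parent.recv(len(payload))
        samples.append((time.monotonic() - t0) * 1e6)  # microseconds
    os.waitpid(pid, 0)
    parent.close()
    return statistics.mean(samples), statistics.median(samples)
```

Comparing these figures against the same loop routed through a worker thread would give a first-order estimate of the IO-thread overhead discussed in this bug.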
So far all the slow cases I've seen involve the main thread event loops of either/both the parent and child processes. I would love to be proven wrong but even if we manage to increase the IPC speeds somehow I think we'll never notice the difference due to the current event loop problems.
Status: NEW → ASSIGNED
Target Milestone: --- → 1.2 C3(Oct25)
I ran some experiments and measured the time spent sending an IPC message. It takes hundreds of microseconds from scheduling the IO request to starting to run it, and it normally takes < 100 microseconds to run the IO request itself (i.e. sendmsg()). That is, we normally spend < 1 ms sending an IPC request. On the receiving side, if the main thread is not loaded, I'd guess the figures are comparable.

We could improve responsiveness by categorizing IPC requests so that not all of them queue up equally. Input events are a good example: we want the device to respond to input even when it's under heavy load. If we can make TabChild receive input events faster under load, we might be able to improve responsiveness and hence perceived performance.
Do you mean with async I/O?

With async I/O, two problems are supposed to be solved: 1. avoiding the overhead of the additional context switches, and 2. reducing the mean and variance of the latency of dispatching IPC messages.

For input events, maybe we should put a throttle on them instead. Under heavy load we may be scrolling or running an animation, which means handling several input events but generating only one frame; in other words, input events are already overloading the main thread. We should aggregate input events to avoid that, not try to deliver them faster.
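(The aggregation idea above can be sketched in a few lines: collapse runs of consecutive move events into the newest one, while keeping discrete events such as down/up intact. This is an illustrative stdlib-only sketch, not code from the tree; the event representation is assumed.)

```python
def coalesce_moves(events):
    """Collapse consecutive 'move' events, keeping only the latest of each run.

    Each event is a (kind, payload) tuple; discrete events like 'down' and
    'up' are never dropped, so the observable gesture is preserved.
    """
    out = []
    for ev in events:
        if ev[0] == "move" and out and out[-1][0] == "move":
            out[-1] = ev  # newer move supersedes the still-pending one
        else:
            out.append(ev)
    return out
```

Under load the queue accumulates moves faster than the main thread drains them, so this turns an arbitrarily long backlog of moves into a single event per drain, which is exactly the "handle several input events but generate only one picture" behavior described above.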
Delivering IPC out of order really requires a separate root IPDL actor: one of the important guarantees that makes IPDL "safe" is that the messages are checked and delivered in order; if there is a state machine, you can statically prove that you won't be sending messages to dead actors, etc.

It's certainly possible to have a separate root actor for important versus unimportant messages, but we'd have to do a bunch of additional thinking about how that might affect security checking.
They are still delivered in order.
Cervantes, any update on this? Also, what tools did you use to gather the measurements you quoted in comment 8?
blocking-b2g: 1.3? → -
Priority: -- → P4
Target Milestone: 1.2 C3(Oct25) → ---
Mike, my plate is full of the Nuwa process bugs that need to make it into 1.3. I'll come back to this after that. The measurements are from an adaptation of the WIP in bug 908005.
Priority: P4 → P1
See Also: → 991420
I still don't think we've seen any evidence that this is worth engineering effort.
Attached file ipc_latency.py
This is a simple simulator, written in Python with simpy, of the overhead of the IO thread. By changing the parameters at lines 160~164 you can see that the IO thread increases latency considerably.

For example, with the default settings the mean latency is ~38 ms with the IO thread, but only ~20 ms without it.
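(The attached simulator needs simpy; the same qualitative point can be made with a stdlib-only toy model. Each message pays the sendmsg() cost, and routing it through an IO thread additionally pays a scheduling/context-switch delay before the send starts. The cost parameters below are illustrative, loosely based on the microsecond figures quoted in comment 8, not measurements.)

```python
import random

def simulate(n_msgs=1000, send_us=80, switch_us=300, with_iothread=True, seed=1):
    """Toy model of per-message dispatch latency, in microseconds.

    Draws exponentially distributed send times; when with_iothread is True,
    each message also pays an exponentially distributed scheduling delay
    for the hop to the IO thread. Returns the mean latency.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_msgs):
        lat = rng.expovariate(1.0 / send_us)
        if with_iothread:
            lat += rng.expovariate(1.0 / switch_us)
        total += lat
    return total / n_msgs
```

Even this crude model shows the IO-thread hop dominating the send cost itself, which matches the direction (if not the exact magnitude) of the ~38 ms vs ~20 ms figures from the simpy script.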
Priority: P1 → P3
Firefox OS is not being worked on
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX