Closed Bug 998379 Opened 11 years ago Closed 11 years ago

(silk) Combine sendTouchEvent + Vsync event into 1 IPC message to Content Process

Categories

(Core :: Graphics, defect, P3)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: mchang, Assigned: mchang)

References

Details

(Keywords: perf, Whiteboard: [c=uniformity p=3 s=2014.06.20 u=])

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #991420 +++

Currently for Project Butter, we send 2 IPC messages on a vsync to the content process:

1) sendTouchEvent
2) vsync event to start drawing

On the homescreen app, this causes the vsync event to show a ~2ms delay from vsync to the content process starting to draw. This is also caused by a 1.3ms delay in the Homescreen app itself responding to the sendTouchEvent. This bug is to track the work to combine the sendTouchEvent + vsync message into one IPC message, reducing the 2ms delay.

Previous work has been done on Chromium for Android in their Zero Input Latency Chrome project [1]. The specific issue is using touch events to improve vsync scheduling [2]. The design doc that combines touch events + vsync is [3]. They show a ~1ms performance improvement on a Nexus 4, which is what we're seeing in bug 991420.

[1] https://docs.google.com/document/d/1HmS0YQtWg2ToY67fE8A33PJUyPSwGUwUCLMk_zjK7ik/edit
[2] https://codereview.chromium.org/11854013/
[3] https://docs.google.com/document/d/16822du6DLKDZ1vQVNWI3gDVYoSqCSezgEmWZ0arvkP8/edit
(In reply to Mason Chang [:mchang] from comment #0)
> On the homescreen app, this causes the vsync event to show a ~2ms delay from
> vsync to the content process starting to draw. This is also caused by a
> 1.3ms delay in the Homescreen app itself responding to the sendTouchEvent.
> This bug is to track the work to combine the sendTouchEvent + vsync message
> into one IPC message, reducing the 2 ms delay.

I don't see how combining the messages will help. You still need to wait for the touch event to be processed before you can actually do any drawing, so at best you can hope to save the 0.7ms, if that.

Also keep in mind that there isn't always a touch event to piggyback the vsync event on, and non-move touch events like touchdown and touchup don't get interpolated, so this will add more complexity to the code overall. I'm not convinced it's worth it.
Since both the touch event and the vsync are async messages, the IPC overhead should not be counted the way Mason describes here. The kernel doesn't deliver messages immediately, and buffering in the kernel aggregates them. So combining doesn't actually save 2ms of overhead if the vsync and touch events are close enough in time.

The reason 2ms matters here is that the system is too busy to update frames in time. That means the content process doesn't handle vsync and touch events immediately, so there is no additional overhead for delivering the events except system calls.

IO threads are overhead now, but they would be removed by bug 915733. With bug 915733, the IPC overhead would drop to 1 context switch instead of 3. So even if there is an improvement, it would be neither 2ms nor 1.7ms. It is more likely to be less than 0.7ms in my estimation.
Thanks for the feedback Thinker and Kats. I went back through the original systrace in bug 991420, and I think combining events is still worth it. There are cases when the first sendTouchEvent is processed immediately, and the drawing starts 2ms later due to bad scheduling decisions. Assuming the Homescreen app doesn't get pre-empted, combining the messages would save the 2ms delay here.

Please take a look at the attachment. We see a 0.184ms delay between the sendTouchEvent on the b2g process and the Homescreen app starting to execute OnMessageReceived. Then there is a 2.287ms delay between the sendVsyncNotification and the Homescreen app executing OnMessageReceived, because the OS does not schedule the Gecko_IOThread, which sends the IPC message, until a bit later. The Homescreen app then goes back to sleep waiting for messages for 1.343ms. If we combined the vsync message + sendTouchEvent message, we would save 1.5ms in this case, or 9% of our frame budget. Of course, this assumes we want the Homescreen app to be executing for responsiveness reasons.

This pattern, where the first sendTouchEvent message is processed right away and the vsync message follows after a ~1-2ms delay, doesn't occur very frequently, but often enough (timestamps 1268.8ms, 1352, 1484, 1868, 2167).

In the best case, when the Gecko_IOThread + Homescreen app are scheduled right away, Kats is right: the delay between the two OnMessageReceived calls in the Homescreen app is only 0.05ms. However, the delay depends a lot on the OS scheduling the Gecko_IOThread. So even if we combine these events, the OS scheduler might still pre-empt the Homescreen app and we would not begin to draw, but at least the OS scheduler would have one less context switch. The OS wouldn't have to context switch twice: once to schedule in the Gecko_IOThread and once more to schedule in the content app again. We would also avoid the extra IPC overhead.
There are a few alternatives we could explore here if we really don't want to combine the messages. We could look at elevating the Gecko_IOThread priority to ensure the second message gets sent right away. The other is to tweak the OS scheduler to automatically schedule what we believe to be important, but this seems riskier.

(In reply to Thinker Li [:sinker] from comment #2)
> The reason to aim on 2ms here is for it is too busy to update pictures in
> time. So, it means content process don't handle vsync and touch events
> immediately, so there is no additional overhead for deliver events except
> system calls. IO threads are overhead now, but it would be removed by bug
> 915733. With bug 915733, the overhead of IPC would be dropped to 1
> context-switching instead of 3. So, even there has improvement, it would be
> neither 2ms nor 1.7ms. It more like to be less than 0.7ms in my estimation.

Isn't there overhead if the vsync + touch events are not sent in the same batch in the kernel? I'm probably misunderstanding something here because I'm not very familiar with the IPC system. Can you please be a bit more detailed here?

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> Also keep in mind that there isn't always a touch event to piggyback the
> vsync event on, and non-move touchevents like touchdown and touchup don't
> get interpolated so this will add more complexity to the code overall. I'm
> not convinced it's worth it.

Yes, this is true. I want to double check the delay on non-homescreen apps and regular APZ scrolling apps to see if we experience the scheduling issue in those cases. I can imagine the delay between the sendTouchEvent + vsync occurring more often, as the other apps shouldn't have to process anything since scrolling is done by APZ.
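The "9% of our frame budget" figure in the comment above can be checked with a few lines of Python. This is only a sketch: the 60 Hz refresh rate is an assumption (the thread never states it, but 1.5ms is 9% of a 16.67ms frame), and `budget_fraction` is a name made up for illustration.

```python
# Assumption: a 60 Hz display, so one frame lasts 1000/60 ms (about 16.67 ms).
REFRESH_HZ = 60
FRAME_BUDGET_MS = 1000.0 / REFRESH_HZ


def budget_fraction(saved_ms, frame_budget_ms=FRAME_BUDGET_MS):
    """Return the share of one frame budget that saved_ms represents."""
    return saved_ms / frame_budget_ms


if __name__ == "__main__":
    # 1.5 ms is the saving quoted for the trace in the attachment.
    print(f"{budget_fraction(1.5):.1%} of a {FRAME_BUDGET_MS:.2f} ms frame")
    # The 2.287 ms vsync delivery delay from the same trace, for comparison.
    print(f"{budget_fraction(2.287):.1%} of a {FRAME_BUDGET_MS:.2f} ms frame")
```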
Flags: needinfo?(tlee)
Attached image scheduleDelay.png
(In reply to Mason Chang [:mchang] from comment #3)
> Isn't there overhead if the vsync + touch events are not sent in the same
> batch in the kernel?

I'll explain it in two cases.

Case 1: the vsync and touch events are very close in time. The Linux kernel doesn't switch to another process/thread after the first IPC message (vsync or touch) is sent, so the b2g process can handle the later event (touch or vsync) and send another IPC message immediately after. The kernel buffers both messages and delivers them to the content process once it is running. That means there is no additional overhead for the content process to receive two messages except an additional syscall, and a syscall is relatively cheap in this case.

Case 2: the vsync and touch events are not close enough in time. If the CPU is not busy, this case doesn't matter; I'm only talking about the busy case. Since the touch and vsync events are not close, there are two context switches to the b2g process to handle the vsync and the touch event, even with combining. Without combining and under high load, the later IPC message, e.g. the touch event, would sit in the kernel's buffer. Once the content process finishes handling the first message, it can handle the later one immediately without additional context switching, with no additional overhead except the minor syscalls.

The graph in comment 4 is obviously a scheduling issue. I believe the delay would rise to hundreds of ms with a music player running in the background. It could be partly resolved by removing context switching for IO threads and prioritizing some important threads.
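The kernel buffering described in case 1 can be illustrated with a local socket pair. This is a minimal sketch, not Gecko's IPC code: the function name `send_back_to_back` and the payloads are hypothetical, and a plain stream socket pair stands in for the real channel. It only shows that two messages written back to back sit in the kernel buffer until the receiver runs and reads them.

```python
import socket


def send_back_to_back(messages):
    """Write every message to a local socket pair, then drain the receiver.

    Returns (received_bytes, recv_calls), where recv_calls counts the
    recv() calls that returned data.
    """
    sender, receiver = socket.socketpair()
    try:
        # The "b2g" side: send both messages before the receiver runs at all.
        for message in messages:
            sender.sendall(message)
        sender.shutdown(socket.SHUT_WR)  # mark end of stream for the reader

        # The "content" side: everything is already buffered in the kernel.
        chunks = []
        recv_calls = 0
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            recv_calls += 1
            chunks.append(chunk)
        return b"".join(chunks), recv_calls
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    data, calls = send_back_to_back([b"touch", b"vsync"])
    print(data, "received in", calls, "recv call(s)")
```

The sender never has to be scheduled between the two reads, which is the point made above: the cost of the second message is extra syscalls, not an extra context switch.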
Flags: needinfo?(tlee)
What is the disadvantage of combining the events? Why are we arguing against it?
What real problem does this try to resolve? Are these problems valid? I haven't found any real, valid problem yet. Maybe I have missed some key points, but for now, as far as I understand, there is still a lot of misunderstanding about the problems.
(In reply to Mason Chang [:mchang] from comment #3)
> Thanks for the feedback Thinker and Kats. I went back through the original
> systrace in bug 991420, and I think combining events is still worth it.
> There are cases when the first sendTouchEvent is processed immediately, and
> the drawing starts 2ms later due to bad scheduling decisions. Assuming the
> Homescreen app doesn't get pre-empted, combining the messages would save the
> 2ms delay here. Please take a look at the attachment.

Looking at the image in comment 4, you will find that the 2ms of latency is caused by waiting for the IOThread and by scheduling. These context switches would disappear with the AIO solution. I really suggest waiting for AIO unless there is an urgent need.
Attached file ipc_latency.py
This is a simple simulator for studying the IPC latency of this bug. It is written in Python with simpy. The simulator runs an IPC sender and receiver with a given background CPU load. By changing lines 197~203, you can run the simulation with different combinations of parameters. With the default settings, it gives ~43ms of latency on average with the iothread, and ~20ms of latency on average without the iothread, with AIO.
(In reply to Thinker Li [:sinker] from comment #5)
> For case 2, vsync and touch event are not close enough in time. If CPU is
> no busy, it is out of the case. I talk only the case of being busy. Since
> touch event and vsync event are no close, there has two context switch to
> b2g process for handling vsync and touch event even with combining. Without
> combining and with high-loading, the IPC message of later one, ex. touch
> event, would be in the buffer of the kernel. Once the content process
> finish handling first message, it could handle the later one immediately
> without additional context switching, no addition overhead except the minor
> ones syscalls.

I'm confused about case 2. With Project Butter, we combine the vsync + touch event, so they will occur at the same time. Touch events will only be sent on vsyncs. Without combining, we have two context switches to the b2g process: once for the vsync, once for the touch event. With combining, we only have one context switch, to send the vsync + touch combined event. We would save one context switch here. With combining, we would not have a case where the vsync + touch event are not close in time.

(In reply to Thinker Li [:sinker] from comment #7)
> What the real problem does it try to resolve? Are these problems valid? I
> don't find any real valid problem yet. Maybe I have missed some key points,
> but, for now, there is still a lot misunderstanding on problems according my
> understanding.

The problem we are trying to solve is that with Project Butter, we send 2 IPC messages on every vsync to the content process: 1) the touch event, 2) the vsync message. Each message has a different latency between the actual hardware vsync and the content process receiving it. This bug enables us to send only 1 IPC message on every vsync. This can sometimes reduce latency, as shown in the attachment in comment 4. The sendTouchEvent was sent and processed right away on the content app. Because the vsync is a second message, and was unfortunately sent much later because the Gecko_IOThread was scheduled out between the vsync and touch event, the Homescreen app went to sleep. If we combine the messages, the Homescreen app could keep processing both the touch event + vsync to reduce some of the bad scheduling decisions. Does that explain it?
For case 2, if the vsync and touch events are not close in time, the latency of the vsync would not be affected, or only slightly, by the touch event, and vice versa. If they are close, it falls into the case I already mentioned. Reducing context switching is good, but in this case scheduling is the major part of the latency, and the overhead of context switching seems negligible. I suggest doing experiments on the overhead of context switching to make sure we are working from the right assumptions.
Attached file cswitch.c
This is a test program to evaluate the overhead of context switching. It forks into a parent and a child process. The parent sends one byte to the child, waits for the child to reply, and repeats this 1M times. That means 2M context switches. I ran it on unagi and it took 33.153s. That means 1.65765e-05 seconds per context switch on average, i.e. roughly 16.6 us or 0.0166 ms.
This experiment doesn't include how the CPU cache is affected by context switching.
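For readers without the attachment, the ping-pong measurement described above can be sketched in Python. This is not the attached cswitch.c: the names `ping_pong` and `per_switch_seconds` are made up for illustration, it assumes a Unix-like system where `os.fork` is available, and Python's interpreter overhead makes the absolute numbers higher than a C version would report. On a multi-core machine the two processes may also run on different cores, so the result is better read as a round-trip wake-up cost than a pure context-switch cost.

```python
import os
import time


def ping_pong(rounds):
    """Bounce one byte between parent and child `rounds` times.

    Returns the elapsed wall-clock time in seconds, measured in the parent.
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back until the parent closes its pipe.
        os.close(to_child_w)
        os.close(to_parent_r)
        while os.read(to_child_r, 1):
            os.write(to_parent_w, b"x")
        os._exit(0)

    # Parent: send one byte, block until the reply arrives, repeat.
    os.close(to_child_r)
    os.close(to_parent_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(to_child_w, b"x")
        os.read(to_parent_r, 1)
    elapsed = time.perf_counter() - start

    os.close(to_child_w)   # EOF makes the child leave its loop
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return elapsed


def per_switch_seconds(elapsed, rounds):
    """Each round trip implies two switches: parent to child and back."""
    return elapsed / (2 * rounds)


if __name__ == "__main__":
    rounds = 100_000
    elapsed = ping_pong(rounds)
    print(f"{per_switch_seconds(elapsed, rounds) * 1e6:.2f} us per switch")
```

Applying `per_switch_seconds` to the figures reported above, `per_switch_seconds(33.153, 1_000_000)`, reproduces the 1.65765e-05 second average.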
Agree with Kats' comment 1. Furthermore, the two events have totally different uses; what if the page only plays a JS animation without any touch event? Thinker already proposed a possible solution (bug 915733) for this fundamental issue. So far, I think it's not worth putting resources into this bug for "Project Silk" (we already renamed it). Anyway, the 3 links look interesting. I will also study them to see if there's anything useful for us to learn.
(In reply to Mason Chang [:mchang] from comment #0)
> +++ This bug was initially created as a clone of Bug #991420 +++
>
> Currently for Project Butter, we send 2 IPC messages on a vsync to the
> content process.
>
> 1) sendTouchEvent
> 2) vsync event to start drawing
>
> On the homescreen app, this causes the vsync event to show a ~2ms delay from
> vsync to the content process starting to draw. This is also caused by a
> 1.3ms delay in the Homescreen app itself responding to the sendTouchEvent.
> This bug is to track the work to combine the sendTouchEvent + vsync message
> into one IPC message, reducing the 2 ms delay.
>
> Previous work has been done on Chromium for Android in their Zero Input
> Latency Chrome project [1]. The specific issue is using touch events to
> improve vsync scheduling [2]. The design doc that combines touch events +
> vsync is [3]. They show a ~1ms performance improvement on a Nexus 4, which
> is what we're seeing in bug 991420.
>
> [1] https://docs.google.com/document/d/1HmS0YQtWg2ToY67fE8A33PJUyPSwGUwUCLMk_zjK7ik/edit
> [2] https://codereview.chromium.org/11854013/
> [3] https://docs.google.com/document/d/16822du6DLKDZ1vQVNWI3gDVYoSqCSezgEmWZ0arvkP8/edit

I have some comments on my copy of [3] to share with you.
https://docs.google.com/document/d/1hCJQIjKU8xAZkGS-vhz6zLh1igUxhqCQvj4GJvasYiI/edit?usp=sharing
Priority: P1 → P3
Removing the target milestone for now, since Project Silk is a feature for 2.1.
Target Milestone: 2.0 S1 (9may) → ---
From the proposed design in 930939, we want to move touch events off the main thread. Since the APZ controller can read the touch events off main thread, we don't need to combine the events anymore. Instead, we send a vsync signal to the APZ Controller. Then the APZ Controller reads the touch event data instead of getting a message. We don't need this bug anymore. Resolving as invalid.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Whiteboard: [c=uniformity p=4 s= u=] → [c=uniformity p=3 s=2014.06.20 u=]
See Also: → 913942