Closed Bug 941984 Opened 8 years ago Closed 8 years ago

[FlatFish] InputReader takes ~18ms to get each event. Makes FPS bounded at ~50fps.

Categories

(Firefox OS Graveyard :: GonkIntegration, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WONTFIX
2.0 S1 (9may)

People

(Reporter: vlin, Assigned: vlin, NeedInfo)

References

Details

(Keywords: perf, Whiteboard: [c=effect p= s=2014.05.09.t u=flatfish])

Attachments

(5 files)

1. Disable profiler. (Or overhead will bias some function calls significantly.)
2. Wiping between two homescreen pages. (no painting, only touch events and compositions.)
3. Profiling by Timer and logcat.

We found the FPS was bounded at 50 FPS. After investigating, the bottleneck was the call epoll_wait() to in EventHub::getEvents(). This event polling to kernel takes ~18ms. Then homescreen received only ~50 time touch events.

Statistics of epoll_wait() cost with 100 sampling times
avg:18023.500000 us, max:18174 us, min:17843 us, std:55.630657
ni Terry.
Flags: needinfo?(phterry)
Hey, Vincent,
epoll_wait() is waiting for next available input events.  So are you saying that sampling rate of input events from kernel is not enough?  Let me loop Jeff Chuang to look into this issue.
Flags: needinfo?(phterry)
(In reply to Terry Li from comment #2)
> Hey, Vincent,
> epoll_wait() is waiting for next available input events.  So are you saying
> that sampling rate of input events from kernel is not enough?  Let me loop
> Jeff Chuang to look into this issue.

Yes. 
I've also talked to Jeff in Skype. 
Please keep updating this issue.
Thanks~
We measure the report rate of touch driver(from touch firmware to input core) is over 57 fps(avg.). It is fine with touch driver. We will check other parts of kernel.
(In reply to Jeff Chuang from comment #4)
> Created attachment 8337583 [details]
> The touch driver report rate is over 57 fps(avg. )
> 
> We measure the report rate of touch driver(from touch firmware to input
> core) is over 57 fps(avg.). It is fine with touch driver. We will check
> other parts of kernel.

How about the status ?

Can you find the root cause before the week we work in Toronto ?
Flags: needinfo?(phterry)
Flags: needinfo?(jeff.cy.chuang)
Assignee: nobody → vlin
I am sorry that I did not find the root cause yet.
I am not confident to find the root cause of it by the week because I have not finished the code tracing from input core to epoll_wait().
Flags: needinfo?(jeff.cy.chuang)
(In reply to Jeff Chuang from comment #6)
> I am sorry that I did not find the root cause yet.
> I am not confident to find the root cause of it by the week because I have
> not finished the code tracing from input core to epoll_wait().

How about the period of epoll_wait() running Android on FlatFish ?
Flags: needinfo?(jeff.cy.chuang)
(In reply to Vincent Lin[:vilin] from comment #7)
> How about the period of epoll_wait() running Android on FlatFish ?
It is the same as b2g runing b2g. epoll_wait often spends 18 ms when wiping in Android Homescreen.
Flags: needinfo?(jeff.cy.chuang)
After investigating, I am wondering whether the epoll_wait() is bottleneck of frame rate according to the follow findings:
1.According to the logcat messages(this attachment) of epoll_wait(), raw events in InputReader, touch events: Not all raws events will be considered and transformed to touch events(ex: touchstart, touchmove), ex: Only the first 22 raw events are transformed into touchmove event(@Line 233) in 1st wiping action and others are filtered. In 2nd wiping, only 21 raw events are considered.
From these observations, I think frame rate shall not be limited by raw event report rate, that is, 1000 ms/epoll_wait() spent time.
2.Working with systrace to check the timeline of touch event and composition, I find the InputReader(EventHub), homescreen and compositor are different processes. I guest the composition shall not be blocking the waiting of epoll_wait().
3.According to dumpsys gfxinfo results in Flatfish Android image, the render time of each frame is less than 7 ms(means 60 fps is possible) even the spent time of epoll_wait() is also 18 ms. So it shall be also true in FxOS
Please let me know if I have anything misunderstanding.
Could you provide any information about why you think epoll_wait() waiting time is the bottleneck?

BTW, the raw event report rate shall be 55 event/s(1000/18=55.6), not 50...:)
From the results, InputReader&compositor are process 1269, homescreen is process 1610.
Sorry, I made a mistake in last comment. InputReader and compositor are the same process. I will check whether they are in the same thread later.
In Android, the render time of each frame is less than 7 ms even the spent time of epoll_wait() is 18 ms, too.
(In reply to Jeff Chuang from comment #9)
> Created attachment 8346347 [details]
> The logcat messages include epoll_wait, InputReader raw events, touch events
> in Homescreen app
> Could you provide any information about why you think epoll_wait() waiting
> time is the bottleneck?
> 
> BTW, the raw event report rate shall be 55 event/s(1000/18=55.6), not 50...:)

Homescreen will wait for trigger from touch event then do reflow in the next tick.
This Bug is just one of those issues make 60fps unreachable. 
Bug 944564 tracks the others.

BTW.
InputReader and Compositor are in the same process, but in different threads.
Attached file trace_touch_event.html
We have refresh driver to trigger Homescreen reflow ([Timer]Fire) with 1/60 sec. period. But it must have received at least one touch event(RecvRealTouchMoveEvent) from last tick to the current tick. So AsyncSendRealTouchMoveEvent must send at least 60 times in one AsyncSendRealTouchMoveEvent period strongly depends on InputReaderThread::threadLoop.
Flags: needinfo?(jeff.cy.chuang)
We are preparing for open source program until now. Will update later once we have progress on it.
FxOS Perf P1 as this has a direct negative impact on Frame Uniformity.

Vincent, do these findings also apply to our non-flatfish devices?
Status: NEW → ASSIGNED
Flags: needinfo?(vlin)
Keywords: perf
Priority: -- → P1
Whiteboard: [c=effect p= s= u=]
This is a flatfish-only bug i.e the finding only exists in flatfish.
Flags: needinfo?(vlin)
I find the touch interrupt handler is triggered with the rate around 60-55 times/sec
The rate is dropping when the number of the continuous events is accumulating:
Under 12 events: 60 times/sec, under 16 events: 59 times/sec, under 30 events: 57 times/sec, over 70 events: 55  times/sec

There are 2 possible root causes:
1)The auto reduce report rate mechanism of touch firmware: There is a register to control the mechanism is on/off according to the TP datasheet. Currently, it is enabled.
2)The interrupt pending bit is not cleared immediately: The platform uses a proprietary function to register ISR and ISR needs to return IRQ_NONE instead of IRQ_HANDLED.I did not any expression to clear pending bit in ISR or work queue.
Flags: needinfo?(jeff.cy.chuang)
After disabled auto reduce report rate mechanism by setting the register, the dropping rate is not gone.
We will check it with touch panel vendor next week and try to add the expressions of clearing pending bit in touch driver
After traced the flow of interrupt handling on Flatfish, the interrupt pending bit is just cleared before touch ISR is invoked and touch ISR&work queue finishes processed an interrupt within 2.2 ms(avg.).
So I think the cause of the dropping report rate should be the touch firmware.
After checked the touch vendor FAE, he said the expected report rate is between 50~60 points/sec and thought 55 points/sec should be a reasonable rate. When talking about the possibility of increasing the report rate to 60-80 points/sec, he thinks it is not possible on flatfish because the current firmware have been optimized under the interference from current panel and charging components. About the dropping report rate, he says the register is obsolete and the report rate should be consistent. He suggest us to check the problem with oscilloscope.
After measure the waveform of touchscreen interrupt pin, the interrupt pin is pulled down every 18.1 ms to trigger the touch ISR and the report rate is consistent at 55 points per second. So we can not provide better touch report rate with the current panel and charging components on flatfish.
About the dropping report rate mentioned in comment 17, it is caused by the wrong algorithm on calculating report rate. After modified the test code to simply report the trigger time of ISR, it confirmed that the interrupt handler is triggered every 18 ms.
(In reply to Jeff Chuang from comment #21)
> After measure the waveform of touchscreen interrupt pin, the interrupt pin
> is pulled down every 18.1 ms to trigger the touch ISR and the report rate is
> consistent at 55 points per second. So we can not provide better touch
> report rate with the current panel and charging components on flatfish.
> About the dropping report rate mentioned in comment 17, it is caused by the
> wrong algorithm on calculating report rate. After modified the test code to
> simply report the trigger time of ISR, it confirmed that the interrupt
> handler is triggered every 18 ms.

Vincent, according to this measurement, it's hardware limitation. There is no way to fix this problem at driver or gecko side. Please close this issue after double confirm
Flags: needinfo?(vlin)
As mentioned in Comment 22.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(vlin)
Resolution: --- → WONTFIX
Whiteboard: [c=effect p= s= u=] → [c=effect p= s=2014.05.09.t u=flatfish]
Target Milestone: --- → 2.0 S1 (9may)
You need to log in before you can comment on or make changes to this bug.