Closed Bug 1177335 Opened 10 years ago Closed 10 years ago

Quickly scrolling in the x86 emulator causes the screen to jump around and makes most apps unusable

Categories

(Firefox OS Graveyard :: Emulator, defect, P3)

All
Gonk (Firefox OS)
defect

Tracking

(firefox43 fixed)

RESOLVED FIXED
FxOS-S5 (21Aug)
Tracking Status
firefox43 --- fixed

People

(Reporter: gsvelto, Assigned: freesamael)

References

Details

Attachments

(2 files)

STR:
- build the emulator-x86-kk & launch it
- try scrolling the homescreen or the settings app
- if scrolling very slowly everything works correctly; as soon as one picks up a little speed the screen starts jumping up and down seemingly erratically, which makes using those apps almost impossible

After poking at the code I think that this is caused by touch resampling being activated even though there's no hardware vsync signal present in the emulator. It seems to me that for KitKat touch resampling just assumes that the hardware vsync signal is present.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
This patch ensures that touch resampling is only enabled if the hardware compositor (and thus the hardware vsync signal) are available. This solves the problem with the scroll stuttering in the KitKat emulator which obviously does not have a real hardware compositor available and thus never generates vsync signals.
Attachment #8626053 - Flags: review?(mchang)
This seems wrong. Even though there isn't hardware vsync, we should be using software vsync, which should still enable touch resampling. If we have a bug with touch resampling, that could be a different story, but it shouldn't depend on having actual hardware vsync.

Can you verify that software vsync is occurring and that we're touch resampling? Software vsync should be happening at [1]. You can try disabling touch resampling at [2].

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/SoftwareVsyncSource.cpp?from=SoftwareVsyncSource.cpp&case=true#107
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPrefs.h?from=gfxPrefs.h&case=true#238
Attachment #8626053 - Flags: review?(mchang)
(In reply to Mason Chang [:mchang] from comment #2)
> This seems wrong. Even though there isn't hardware vsync, we should be using
> software vsync, which should still enable touch resampling. If we have a bug
> with touch resampling, that could be a different story, but it shouldn't
> depend on having actual hardware vsync.

OK, thanks for clarifying that.

> Can you verify that Software vsync is occurring and that we're touch
> resampling? Software vsync should be happening at [1]. You can try disabling
> touch resampling at [2].

Software vsync is indeed occurring and touch resampling is enabled (disabling it makes the problem go away, which is what fooled me into thinking that the issue was with vsyncs not being delivered). I'm having a look at the timestamps coming out of it to figure out why the end result is so jittery.
OK, after some further investigation I've figured out what's causing the jank (but not why it's happening): the diff between two timestamps as seen by the resampling code is sometimes negative, i.e. vsyncTouchDiff here [1] regularly becomes negative while scrolling.

[1] https://dxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp#178
This usually means that by the time the GeckoTouchDispatcher was able to process the touch event, another touch event has already come in from the input thread on the device. This is by default 20ms. It probably means that either the APZ Controller Thread (not sure what it is on an emulator, but on a device it's the compositor thread), or that the input thread is severely delayed in dispatching input events, so we're throwing away touch events in an effort to keep up. Is your CPU usage at like 100%?
(In reply to Mason Chang [:mchang] from comment #5)
> This usually means that by the time the GeckoTouchDispatcher was able to
> process the touch event, another touch event has already come in from the
> input thread on the device. This is by default 20ms. It probably means that
> either the APZ Controller Thread (not sure what it is on an emulator, but on
> a device it's the compositor thread), or that the input thread is severely
> delayed in dispatching input events, so we're throwing away touch events in
> an effort to keep up. Is your CPU usage at like 100%?

CPU usage is hovering around 80% while scrolling. Is there a way to figure out if we're throwing away events? Shouldn't we just stop resampling in that case? BTW, I've noticed that the timestamp in the touch event is created using TimeStamp::FromSystemTime(), so I was wondering whether on the emulator build it's really using the same time source as TimeStamp::Now().
OK, it looks like the timestamp is coming from systemTime(SYSTEM_TIME_MONOTONIC) which should use the same clock source as TimeStamp::Now() so that shouldn't be an issue (even though I'm still not 100% sure).
(In reply to Gabriele Svelto [:gsvelto] from comment #6)
> (In reply to Mason Chang [:mchang] from comment #5)
> > This usually means that by the time the GeckoTouchDispatcher was able to
> > process the touch event, another touch event has already come in from the
> > input thread on the device. This is by default 20ms. It probably means that
> > either the APZ Controller Thread (not sure what it is on an emulator, but on
> > a device it's the compositor thread), or that the input thread is severely
> > delayed in dispatching input events, so we're throwing away touch events in
> > an effort to keep up. Is your CPU usage at like 100%?
>
> CPU usage is hovering around 80% while scrolling. Is there a way to figure
> out if we're throwing away events? Shouldn't we just stop resampling in that
> case? BTW I've noticed that the timestamp in the touch event is create using
> TimeStamp::FromSystemTime() so I was wondering if on the emulator build it's
> really using the same timesource as TimeStamp::Now().

We should be throwing away events if we're consistently delayed. We check whether we should resample at [1], and if the thread is too busy to resample, we just use the last touch event. We could be throwing away lots of touch events. You can check whether we're resampling and, if not, how many events we're throwing away at [2], as we only send the last touch event received.

[1] https://dxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp#183
[2] https://dxr.mozilla.org/mozilla-central/source/widget/gonk/GeckoTouchDispatcher.cpp#186
I've just done a quick check and it seems we're always resampling (i.e. [1] in your comment is always true). It seems to me that the most likely explanation for this behavior is that the clock source for those events is just unreliable but I wasn't able to prove it yet.
Does SYSTEM_TIME_MONOTONIC correspond to the respective POSIX clock CLOCK_MONOTONIC? CLOCK_MONOTONIC is affected by adjtime and can show clock skew. Is there a way to try CLOCK_MONOTONIC_RAW instead? This is a raw clock that is not affected by anything.
'man clock_gettime' has the details.
In bluetoothd we use Android's CLOCK_BOOTTIME because CLOCK_MONOTONIC is still affected by suspends. CLOCK_BOOTTIME ticks even through suspends.
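To compare the clocks mentioned in the last few comments, something like the following Linux-only sketch could be used to read each of them through clock_gettime. The helper names ReadClockMs and PrintClocks are made up for illustration and are not part of Gecko; CLOCK_MONOTONIC_RAW and CLOCK_BOOTTIME are Linux-specific, so they are guarded with #ifdef.

```cpp
#include <cassert>
#include <cstdio>
#include <time.h>

// Hypothetical helper (not a Gecko API): returns the current value of the
// given POSIX clock in milliseconds, or a negative value on failure.
double ReadClockMs(clockid_t aClock) {
  struct timespec ts;
  if (clock_gettime(aClock, &ts) != 0) {
    return -1.0;
  }
  // POSIX guarantees tv_nsec is in [0, 1e9).
  assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
  return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

// Prints the clocks discussed above side by side so they can be compared.
void PrintClocks() {
  std::printf("CLOCK_MONOTONIC     %.3f ms\n", ReadClockMs(CLOCK_MONOTONIC));
#ifdef CLOCK_MONOTONIC_RAW
  std::printf("CLOCK_MONOTONIC_RAW %.3f ms\n", ReadClockMs(CLOCK_MONOTONIC_RAW));
#endif
#ifdef CLOCK_BOOTTIME
  std::printf("CLOCK_BOOTTIME      %.3f ms\n", ReadClockMs(CLOCK_BOOTTIME));
#endif
}
```

Calling PrintClocks() repeatedly while scrolling would show whether the clocks drift apart, but it would not by itself prove which clock the touch timestamps use.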
Sorry for the delay, I've dug further into this and this is what I found by dumping out as much data as possible:

- When the bizarre backwards jump happens in [1], aFrameDiff is negative and aTouchDiff is less than 1ms (i.e. it's printed out as 0 in [2]). The result of the interpolation is a point that is quite far from the current position. Here's a sample of this from the log:

I/Gonk ( 988): interpolate base (204, 146), current (204, 144) to (204, 308) alpha -81.145760, touch diff 0, frame diff -20

- Looking at the events received in [3], I can often see that when the above happens, more than two events in a row arrive before we have a chance to resample, i.e. GeckoTouchDispatcher::ResampleTouchMoves() is being called only once every 3-4 events. In this scenario the last two events have a very small time difference between them, while the time difference between the oldest one and the most recent is bigger. Sometimes this happens with only two events, though, so this might not be relevant.

[1] http://hg.mozilla.org/mozilla-central/file/0b901209064c/widget/gonk/GeckoTouchDispatcher.cpp#l229
[2] http://hg.mozilla.org/mozilla-central/file/0b901209064c/widget/gonk/GeckoTouchDispatcher.cpp#l254
[3] http://hg.mozilla.org/mozilla-central/file/0b901209064c/widget/gonk/nsAppShell.cpp#l728

Since I'm not very familiar with this code I can't really tell what the root cause is, but the practical effect is that the resampling code ends up emitting points that are quite far from the real sample position, which causes the screen to jump around when scrolling. Mason, what would be the proper fix in this scenario? Flat-out disabling resampling fixes the issue, but I suppose we'd like a more generic fix that always copes with the above scenario correctly.
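The numbers in the log line above can be reproduced with a small sketch. It assumes plain linear interpolation from the base point through the current point, and it assumes alpha is the frame diff divided by the touch diff; both are inferred from the logged values rather than copied from GeckoTouchDispatcher.cpp, and the names Point, Interpolate and ComputeAlpha are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a screen point; not the Gecko type.
struct Point {
  double x;
  double y;
};

// Linear interpolation from aBase through aCurrent. aAlpha == 0 yields
// aBase and aAlpha == 1 yields aCurrent; values outside [0, 1] extrapolate
// beyond the two samples.
Point Interpolate(const Point& aBase, const Point& aCurrent, double aAlpha) {
  return { aBase.x + (aCurrent.x - aBase.x) * aAlpha,
           aBase.y + (aCurrent.y - aBase.y) * aAlpha };
}

// Assumption: alpha is the ratio of the two time differences in the log.
// A touch diff far below 1ms makes the magnitude of alpha very large.
double ComputeAlpha(double aFrameDiffMs, double aTouchDiffMs) {
  assert(aTouchDiffMs != 0.0);
  return aFrameDiffMs / aTouchDiffMs;
}
```

With base (204, 146), current (204, 144) and alpha -81.145760, the y coordinate comes out at about 308.29, matching the logged (204, 308): a 2-pixel movement is extrapolated into a jump of roughly 162 pixels in the opposite direction.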
Flags: needinfo?(mchang)
(In reply to Gabriele Svelto [:gsvelto] from comment #13)
> Sorry for the delay, I've dug further into this and this is what I found by
> dumping out as much data as possible:
>
> - When the bizarre backwards jump happens in [1] aFrameDiff is negative and
> aTouchDiff is less than 1ms (i.e. it's printed out as 0 in [2]). The result
> of the interpolation is a point that is quite far from the current position.
> Here's a sample of this from the log:
>
> I/Gonk ( 988): interpolate base (204, 146), current (204, 144) to (204,
> 308) alpha -81.145760, touch diff 0, frame diff -20
>
> - Looking at the events received in [3] I can often see that when the above
> happens more than two events in a row are happening before we have a chance
> to resample, i.e. GeckoTouchDispatcher::ResampleTouchMoves() is being called
> only once every 3-4 events. In this scenario the last two events have a very
> small time difference between them while between the oldest one and the most
> recent the time difference is bigger. Sometimes this happens with only two
> events though so this might not be relevant
>

Thanks for finding this! Multiple touch events happening before we have a chance to resample is OK; what's odd is that the last two events are right next to each other. At least on physical hardware, touch input events, like vsync, occur at pretty regular intervals. For example, on a Flame, a touch event occurs roughly every 13 ms. If aTouchDiff is 0, that means we're getting two touch events at the same time. Do we know why this happens so differently compared to physical hardware? On physical hardware, we get input events at [1] on a different libui thread. Can you log the time difference between the touch events and check whether they are clustering up at specific times? E.g. 2 events occur right next to each other, then a pause of about 50ms, then another 2 events? Are input events not occurring at regular intervals?
If they are bunching up together, we probably have to set limits on how far ahead we can interpolate or figure out why they are bunching up together instead of being dispatched at regular intervals. [1] https://dxr.mozilla.org/mozilla-central/source/widget/gonk/nsAppShell.cpp#730
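One way to check for the clustering described above is to record the touch timestamps and count how many consecutive pairs arrive closer together than some threshold. The sketch below assumes the timestamps have already been collected in milliseconds and sorted; CountClusteredPairs is a hypothetical helper, not an existing Gecko function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts consecutive event pairs that are closer together than
// aThresholdMs. aTimestampsMs must be sorted in ascending order.
std::size_t CountClusteredPairs(const std::vector<double>& aTimestampsMs,
                                double aThresholdMs) {
  std::size_t clustered = 0;
  for (std::size_t i = 1; i < aTimestampsMs.size(); ++i) {
    double delta = aTimestampsMs[i] - aTimestampsMs[i - 1];
    assert(delta >= 0.0);  // input is expected to be sorted
    if (delta < aThresholdMs) {
      ++clustered;
    }
  }
  return clustered;
}
```

Evenly spaced events, such as the roughly 13 ms cadence mentioned for the Flame, would give a count of zero, while events that bunch up in pairs would give a non-zero count.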
Flags: needinfo?(mchang) → needinfo?(gsvelto)
Priority: -- → P3
(In reply to Mason Chang [:mchang] from comment #14)
> Thanks for finding this! Multiple touch events happening before we have a
> chance to resample is ok, what's odd is that the last two events are next to
> each other. At least on physical hardware, the touch input events like vsync
> occur at pretty regular intervals. For example, on a flame, a touch event
> occurs roughly every 13 ms. If aTouchDiff is 0, that means we're getting two
> touch events at the same time. Do we know why this happens so differently
> compared to physical hardware? On physical hardware, we get input events at
> [1] on a different libui thread. Can you log the time difference between the
> touch events and see that they aren't clustering up at specific times? e.g.
> 2 events occur right next to each other, then a pause for like 50ms, then
> another 2 events? Are input events not occurring at regular intervals?
>
> If they are bunching up together, we probably have to set limits on how far
> ahead we can interpolate or figure out why they are bunching up together
> instead of being dispatched at regular intervals.
>
> [1]
> https://dxr.mozilla.org/mozilla-central/source/widget/gonk/nsAppShell.cpp#730

I have some interesting findings. The emulator uses a 60Hz GUI refresh rate [1]. On each refresh, qemulator_refresh [2] polls the SDL event queue for pending events, passes those events to goldfish_event_device [3], and eventually delivers them to the goldfish kernel. In the usual case this works as expected and mouse events are delivered roughly every 16ms. However, if the host mouse polling rate is higher than 60Hz, there can be multiple pending mouse events in the event queue when the emulator polls it, causing multiple touch events to occur together intermittently.
I added some logs to SDL's X11 video subsystem [4], and there can be up to 2 or 3 pending mouse events (note that the B2G build system uses libSDL.a under the ${B2G}/prebuilts directory, so you'll have to overwrite that file if you modify SDL). Another interesting observation is that the symptom does not seem to happen on Mac OS X. No matter how slow the refresh rate is (I tried down to 10Hz), at most one mouse event is pending. It is probably related to how Cocoa delivers input events; I didn't dig into that.

[1] https://github.com/mozilla-b2g/platform_external_qemu/blob/b2g-kitkat/console.h#L22
[2] https://github.com/mozilla-b2g/platform_external_qemu/blob/b2g-kitkat/android/qemulator.c#L496
[3] https://github.com/mozilla-b2g/platform_external_qemu/blob/b2g-kitkat/hw/goldfish_events_device.c#L266
[4] https://github.com/mozilla-b2g/platform_external_qemu/blob/master/distrib/sdl-1.2.12/src/video/x11/SDL_x11events.c#L923
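The batching effect described in this comment can be illustrated with a toy model. This is only a sketch under simplifying assumptions: mouse events arrive at exact multiples of a fixed period, the emulator drains everything pending on each refresh, and the function names are hypothetical. It does not model SDL or the goldfish device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// For each poll, returns how many mouse events arrived since the previous
// poll. Events are assumed to arrive at exact multiples of aMousePeriodUs,
// and polls happen at exact multiples of aPollPeriodUs.
std::vector<int> PendingEventsPerPoll(int64_t aMousePeriodUs,
                                      int64_t aPollPeriodUs,
                                      int aPolls) {
  assert(aMousePeriodUs > 0 && aPollPeriodUs > 0);
  std::vector<int> batches;
  int64_t seen = 0;
  for (int n = 1; n <= aPolls; ++n) {
    // Total number of events generated up to and including this poll.
    int64_t total = (n * aPollPeriodUs) / aMousePeriodUs;
    batches.push_back(static_cast<int>(total - seen));
    seen = total;
  }
  return batches;
}

// Largest batch seen across all polls, or 0 if there were no polls.
int MaxPending(const std::vector<int>& aBatches) {
  return aBatches.empty()
             ? 0
             : *std::max_element(aBatches.begin(), aBatches.end());
}
```

In this model, a 125Hz mouse (8000us period) against a 60Hz refresh (about 16667us) produces batches of two events with an occasional batch of three, which is in line with the 2 or 3 pending events seen in the SDL logs. A mouse slower than the refresh rate never produces more than one pending event per poll.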
Hi Gabriele, Not sure if you're still working on this. Do you mind if I take it?
Assignee: gsvelto → sawang
Comment on attachment 8649661
Skip resampling if the time difference of touches is less than 2ms. r=mchang

Hi Mason,

Since no real hardware should generate input events less than 2ms apart, it looks to me that Android's 2ms minimum delta [1] was designed to avoid the emulator issue (the Android emulator has the same behavior of sending multiple mouse events together).

I would like to adopt Android's minimum delta design in Gecko's resampling algorithm, so that it doesn't affect other use cases where software vsync applies, while also avoiding the screen jumping on the emulator.

I've verified the patch on both emulator-kk and emulator-x86-kk and it works as expected. Would you help review the patch?

[1] http://androidxref.com/5.1.1_r6/xref/frameworks/native/libs/input/InputTransport.cpp#52
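The idea behind the patch can be sketched as a guard in front of the alpha computation. This is not the attached diff: the constant and function names are hypothetical, the alpha formula is the same assumption inferred from the earlier log line, and what the caller does when resampling is skipped (for example dispatching the latest touch unmodified) is also an assumption.

```cpp
#include <cassert>

// Hypothetical constant name; the 2ms value comes from the patch title.
const double kMinTouchDeltaMs = 2.0;

// Returns true and stores the interpolation factor in *aOutAlpha when the
// two touches are far enough apart to resample. Returns false, leaving
// *aOutAlpha untouched, when the touches are less than 2ms apart.
bool TryComputeResampleAlpha(double aFrameDiffMs, double aTouchDiffMs,
                             double* aOutAlpha) {
  assert(aOutAlpha);
  if (aTouchDiffMs < kMinTouchDeltaMs) {
    return false;  // touches too close together: skip resampling
  }
  *aOutAlpha = aFrameDiffMs / aTouchDiffMs;
  return true;
}
```

With the values from the earlier log (frame diff -20, touch diff well under 1ms) the guard rejects the sample, so the very large alpha that caused the jump would never be computed.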
Attachment #8649661 - Flags: review?(mchang)
Attachment #8649661 - Flags: review?(mchang) → review+
Flags: needinfo?(gsvelto)
(In reply to Samael Wang [:freesamael][:sawang] from comment #18)
> Comment on attachment 8649661 [details] [diff] [review]
> Skip resampling if the time difference of touches is less than 2ms
>
> Hi Mason,
>
> Since there should be no real hardware which generates input events in less
> than 2ms, it looks to me that Android's 2ms minimal delta [1] was designed
> to avoid the emulator issue (Android emulator has the same behavior of
> sending multiple mouse events together).
>
> I would like to adapt Android's minimal delta design in Gecko's resampling
> algorithm, so it doesn't impact other use cases when software vsync applies,
> while avoid screen jumping on emulator as well.
>
> I've verified the patch on both emulator-kk and emulator-x86-kk and it works
> as expected. Would you help to review the patch?
>
> [1]
> http://androidxref.com/5.1.1_r6/xref/frameworks/native/libs/input/InputTransport.cpp#52

Thanks for digging into this and fixing it!
Thanks, Mason!
Comment on attachment 8649661 [details] [diff] [review] Skip resampling if the time difference of touches is less than 2ms. r=mchang Try server result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a03ae17f3571
Attachment #8649661 - Attachment description: Skip resampling if the time difference of touches is less than 2ms → Skip resampling if the time difference of touches is less than 2ms. r=mchang
(In reply to Samael Wang [:freesamael][:sawang] from comment #16)
> Not sure if you're still working on this. Do you mind if I take it?

Thanks for taking care of this! I've been too busy with other stuff, and since disabling resampling worked around the problem I kept that in the meantime. Now I understand why this was happening on my machines: I've got some fairly old Razer mice, but they're all set up for at least 125Hz sampling (and one of them has a 1kHz sampling frequency, yuck! I had completely forgotten it supported that kind of sampling rate).
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S5 (21Aug)
Thank you so much for fixing this bug.