Closed Bug 1026527 Opened 10 years ago Closed 7 years ago

Touch interactions are sometimes ignored on Flame

Categories

(Testing Graveyard :: Eideticker, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: davehunt, Unassigned)

Details

Attachments

(2 files)

It appears that fresh after a flash the first B2G app launch test (contacts) works, however subsequent runs of this test fail to launch the application. Switching to test-orng.py I'm still unable to send the tap. If I send swipe_left then the homescreen swipes, and from then on I can successfully launch the app using the tap command.

After running the b2g-startup-contacts test the old homescreen will be in use. The coordinates I've been using to launch the contacts app are: 300, 759.
Spent the afternoon trying to debug this. I'm pretty sure the problem is in gecko somehow, not in orangutan or at the kernel level. Doing "tap 200 300" fixes the issue just as well as swiping. Oddly, it always works if I physically touch the device, even when I modify orangutan to send *exactly* the same events.

I'm not sure of a good way to dig deeper into this short of going deep into the gonk/gecko input layer, which I've done before, but don't really want to do again (it's a huge time sink). :/

A quick workaround would be to modify the startup test to do the above tap before actually trying to scroll the homescreen or tap on icons. It shouldn't affect anything. As a followup, we probably want to engage someone on the b2g team who can help us figure this out.
(In reply to William Lachance (:wlach) from comment #1)
> A quick workaround would be to modify the startup test to do the above tap
> before actually trying to scroll the homescreen or tap on icons. It
> shouldn't effect anything. As a followup, we probably want to engage someone
> on the b2g team who can help us figure this out.

I'm not clear on what's needed for this workaround. Are you suggesting performing a tap at [200, 300] before the tap on the icon? Isn't there a chance that this would tap another icon, causing a different app to launch?
(In reply to Dave Hunt (:davehunt) from comment #2)
> (In reply to William Lachance (:wlach) from comment #1)
> > A quick workaround would be to modify the startup test to do the above tap
> > before actually trying to scroll the homescreen or tap on icons. It
> > shouldn't effect anything. As a followup, we probably want to engage someone
> > on the b2g team who can help us figure this out.
> 
> I'm not clear on what's needed for this workaround. Are you suggesting
> performing a tap at [200, 300] before the tap on the icon? Isn't there a
> chance that this would tap another icon, causing a different app to launch?

Well, I think it normally starts on the everything.me page, so it shouldn't affect anything? Another alternative might be to just swipe, I guess. I don't want to block deploying the flame on the fundamental issue, if it's really difficult to fix. I'll dig into this more today and try a few more things.
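
The workaround discussed above amounts to sending one harmless gesture before the real one. A minimal sketch of that idea follows; it is not the attached patch. The helper name with_warmup is hypothetical, and the command strings simply reuse the test-orng prompt syntax quoted in this bug ("tap 200 300", "swipe_left").

```python
def with_warmup(commands, warmup="tap 200 300"):
    """Return a new command list with one warm-up gesture prepended.

    `warmup` defaults to the tap reported in this bug to "unstick" input;
    pass "swipe_left" instead if a tap risks hitting an icon.
    """
    return [warmup] + list(commands)


if __name__ == "__main__":
    # Commands that would be typed at the test-orng prompt, in order.
    for command in with_warmup(["tap 300 759"]):
        print(command)
```

Using a swipe as the warm-up avoids the concern raised in comment #2 about accidentally launching a different app.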
Assignee: nobody → wlachance
Ok, I've been looking at this all day without apparent success, although I did find out that we were calculating touch positions wrong (patch for that forthcoming). Here's what confuses me: if I capture the event stream using getevent -i -l during the unrecognized tap, here's what I see:

[   38178.758461] EV_ABS       ABS_MT_TRACKING_ID   00000000            
[   38178.760847] EV_SYN       SYN_REPORT           00000000            
[   38178.870286] EV_ABS       ABS_MT_TRACKING_ID   ffffffff            
[   38178.871736] EV_SYN       SYN_REPORT           00000000             rate 9

If I execute a tap on another portion of the touchscreen, all of a sudden I start seeing this:

[   37735.978886] EV_ABS       ABS_MT_TRACKING_ID   00000000            
[   37735.980733] EV_ABS       ABS_MT_POSITION_X    0000012c            
[   37735.981150] EV_ABS       ABS_MT_POSITION_Y    0000031b            
[   37735.981844] EV_ABS       ABS_MT_TOUCH_MAJOR   0000007f            
[   37735.982208] EV_ABS       ABS_MT_WIDTH_MAJOR   00000004            
[   37735.982894] EV_SYN       SYN_REPORT           00000000            
[   37736.094577] EV_ABS       ABS_MT_TRACKING_ID   ffffffff            
[   37736.096705] EV_SYN       SYN_REPORT           00000000             rate 8

Some small fraction of the time (10%?) this works as expected the first time. Other times only the x coordinate gets eaten. But most of the time the expected events carrying the x/y coordinates of the gesture are seemingly never received, and so the whole event is a no-op. You would think that this was a driver issue, except that this seems to happen after I restart the b2g process, which has nothing to do with the driver? I'm very confused.

Tapping a region higher up on the touchscreen (e.g. 100 100) always fixes the problem. The failure seems to occur only when tapping relatively near the edge of the screen.
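
To spot the "eaten" coordinates automatically rather than by eye, the getevent output can be checked with a small script. This is only a sketch, assuming the getevent -l text format shown above; the function names are made up for illustration.

```python
import re

# Matches lines such as:
# [   37735.980733] EV_ABS       ABS_MT_POSITION_X    0000012c
# The bracketed timestamp is optional; trailing text like "rate 8" is ignored.
LINE_RE = re.compile(r"^(?:\[\s*[\d.]+\]\s+)?(EV_\w+)\s+(\w+)\s+([0-9a-fA-F]{8})")

TRACKING_ID_RELEASED = 0xFFFFFFFF  # getevent prints -1 as ffffffff


def parse_events(text):
    """Return a list of (type, code, value) tuples from getevent -l output."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ev_type, code, value = match.groups()
            events.append((ev_type, code, int(value, 16)))
    return events


def split_reports(events):
    """Group events into reports, each terminated by SYN_REPORT."""
    reports, current = [], []
    for event in events:
        if event[1] == "SYN_REPORT":
            reports.append(current)
            current = []
        else:
            current.append(event)
    return reports


def tap_has_position(events):
    """True if the first touch-down report carries both X and Y coordinates."""
    for report in split_reports(events):
        is_touch_down = any(
            code == "ABS_MT_TRACKING_ID" and value != TRACKING_ID_RELEASED
            for _, code, value in report
        )
        if is_touch_down:
            codes = {code for _, code, _ in report}
            return {"ABS_MT_POSITION_X", "ABS_MT_POSITION_Y"} <= codes
    return False
```

Feeding the first trace above to tap_has_position(parse_events(...)) flags it as missing coordinates, while the second trace passes.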

Pinging :mwu, who might have an idea about how the input subsystem on the flame might work? The orangutan source is at https://github.com/mozilla-b2g/orangutan
Flags: needinfo?(mwu)
Attached patch: Workaround
Attachment #8442879 - Flags: review?(dave.hunt)
Summary: Interactions are sometimes ignored on Flame → Touch interactions are sometimes ignored on Flame
I think I need more specific STR to help here.
Flags: needinfo?(mwu)
Ok, somewhat involved reproduction steps. On Linux do something like:

virtualenv orng-test
cd orng-test
source bin/activate
pip install gaiatest
wget http://people.mozilla.com/~wlachance/orng
wget http://people.mozilla.com/~wlachance/test-orng
wget http://people.mozilla.com/~wlachance/switch-to-homescreen
wget http://people.mozilla.com/~wlachance/reset-homescreen

# flame must be plugged in from here onwards. marionette must be enabled on flame.
adb forward tcp:2828 tcp:2828
adb push orng /data/local/orng
adb shell chmod 777 /data/local/orng
python reset-homescreen # only needs running once
adb shell stop b2g && adb shell start b2g && python switch-to-homescreen
python test-orng --input-device /dev/input/event0
   type -> tap 300 800

expected result: contacts app is opened 100% of the time
actual result: contacts is not opened 80% of the time (sometimes it works)

if, instead, you do:

   type -> swipe_right
   type -> tap 300 800

Contacts is always opened
Flags: needinfo?(mwu)
Comment on attachment 8442879 [details] [diff] [review]
Workaround

Review of attachment 8442879 [details] [diff] [review]:
-----------------------------------------------------------------

Works great, thanks!
Attachment #8442879 - Flags: review?(dave.hunt) → review+
No longer blocks: 1020215
Just tried with the latest build of today, still reproducible.
Jed, gwagner suggested you might be a good person to help investigate this problem.  Do you have time to help out here?
Flags: needinfo?(jed+bmo)
Is there a fade in/out in use in Gaia here?
https://bugzilla.mozilla.org/show_bug.cgi?id=869512

I've seen similar touch failures but only immediately after a transition so there may be other transitions affected. Usually a short sleep resolves it.
(In reply to Zac C (:zac) from comment #12)
> Is there fade in/out is use in Gaia here?
> https://bugzilla.mozilla.org/show_bug.cgi?id=869512
> 
> I've seen similar touch failures but only immediately after a transition so
> there may be other transitions affected. Usually a short sleep resolves it.

No, this is just on the old home screen, not doing anything at all.
(In reply to Jonathan Griffin (:jgriffin) PTO July 10,11 from comment #11)
> Jed, gwagner suggested you might be a good person to help investigate this
> problem.  Do you have time to help out here?

I'll take a look now, but I don't expect I'll be the right person to shine much light on this.  I wonder if :mdas has any ideas, as she has deep knowledge of event handling in tests?
Flags: needinfo?(jed+bmo) → needinfo?(mdas)
(In reply to Jed Parsons (use needinfo, please) [:jedp, :jparsons] from comment #14)
> (In reply to Jonathan Griffin (:jgriffin) PTO July 10,11 from comment #11)
> > Jed, gwagner suggested you might be a good person to help investigate this
> > problem.  Do you have time to help out here?
> 
> I'll take a look now, but I don't expect I'll be the right person to shine
> much light on this.  I wonder if :mdas has any ideas, as she has deep
> knowledge of event handling in tests?

Too many Jeds :) We were talking about jld. jld, can you take a look here?
Flags: needinfo?(jld)
Flags: needinfo?(mdas)
Chatted a bit about how to fix this with mwu on irc:

<wlach> jld: yeah orangutan was a bit finnicky a year ago... but I think that problem is fixed. in any case, the problem now seems to be that some low-level input events are getting eaten
<mwu> that's the wrong input device
<mwu> wait
<mwu> no
<mwu> that's right
<wlach> mwu: yeah it changed for some reason
<mwu> sitting too far away from the monitor to read 8 and 0 right
<mwu> Goodix-TS
<mwu> kinda weird that events would get eaten though - everything is being fed by userspace
<mwu> but I don't know how that mechanism works
<wlach> mwu: I don't know how to explain the behaviour... I've never seen it before. it seems like when you physically interact with the touchscreen, the events get generated, but when we insert them with orangutan they get eaten/ignored and not emitted
<wlach> mwu: hence it feels like a kernel-level problem to me
<mwu> wlach: have you tested other devices recently?
<wlach> mwu: more or less the same orangutan code was working fine with the hamachi, inari, and tarako
<mwu> and you tested recently?
<mwu> within the last few months, we added code which eats late events
<wlach> mwu: the tarako quite recently, running 1.3t. the inari and hamachi I haven't tested
<wlach> ah
<wlach> mwu: at the kernel level?
<mwu> gecko level
<wlach> hmm
<mwu> so it runs on every device
<wlach> let me test something
<mwu> and I don't remember if it takes timestamps from the event or from when it receives events
<wlach> mwu: this sounds like it could be the culprit... except that it seems like it's the actual kernel events that are getting eaten
<wlach> mwu: like not appearing when I run getevent
<mwu> sounds different then
<mwu> you should dig into the input layer then
<wlach> mwu: but I guess one odd thing is that this behaviour is most reproducible when I restart gecko
<wlach> mwu: is there a about:config flag to turn this new behaviour off?
<mwu> nope
<wlach> mwu: do you have a bug #?
<mwu> you can hack it off in widget/gonk/nsAppShell.cpp, but by the way you describe it, I doubt it'll help
<mwu> wlach: bug 971633 . but the code has been changed a bit
<wlach> mwu: I feel like the fastest route is to instrument the kernel driver to debug what's happening
<mwu> I'd printk everything in the input subsystem
<wlach> mwu: yeah that makes sense. this isn't normally work I do, so I don't have a flame checkout or anything. I don't mind doing it if neither you or jld has time, but it'll probably take me a lot longer
<mwu> wlach: it's in my NI queue at least, so I'll probably get to it eventually
<mwu> is the current work around insufficient?
<wlach> mwu: it works okish. our scrolling tests don't currently work (it seems like orangutan is hanging) I suspect due to this issue
<mwu> ok
Just verified that this is reproducible with 1.4.
I'm going to be busy with sandboxing for the foreseeable future, and I know approximately nothing about the input stack, so it's probably better if someone else looks at this.
Flags: needinfo?(jld)
The bug is in orangutan. The input layer needs ABS_MT_SLOT sent first.
Flags: needinfo?(mwu)
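
For readers unfamiliar with the suggestion, here is a rough sketch of what a tap looks like when ABS_MT_SLOT is sent first, as in the Linux multi-touch "type B" protocol. It is not orangutan's code. The event codes are the standard values from the kernel input headers; the 16-byte struct layout assumes a 32-bit little-endian kernel, and tap_events/pack_event are hypothetical helper names. Note that the next comment reports that adding the slot event did not resolve this bug.

```python
import struct

# Event type and code values from the Linux input headers.
EV_SYN = 0x00
EV_ABS = 0x03
SYN_REPORT = 0x00
ABS_MT_SLOT = 0x2F
ABS_MT_POSITION_X = 0x35
ABS_MT_POSITION_Y = 0x36
ABS_MT_TRACKING_ID = 0x39

# struct input_event, ASSUMING a 32-bit little-endian kernel:
# timeval (2 x 32-bit), u16 type, u16 code, s32 value = 16 bytes.
EVENT_FORMAT = "<IIHHi"


def pack_event(ev_type, code, value):
    """Pack one input_event with a zeroed timestamp."""
    return struct.pack(EVENT_FORMAT, 0, 0, ev_type, code, value)


def tap_events(x, y, slot=0, tracking_id=0):
    """Return (type, code, value) tuples for a tap that selects a slot first."""
    touch_down = [
        (EV_ABS, ABS_MT_SLOT, slot),
        (EV_ABS, ABS_MT_TRACKING_ID, tracking_id),
        (EV_ABS, ABS_MT_POSITION_X, x),
        (EV_ABS, ABS_MT_POSITION_Y, y),
        (EV_SYN, SYN_REPORT, 0),
    ]
    touch_up = [
        (EV_ABS, ABS_MT_SLOT, slot),
        (EV_ABS, ABS_MT_TRACKING_ID, -1),  # -1 releases the contact
        (EV_SYN, SYN_REPORT, 0),
    ]
    return touch_down + touch_up


if __name__ == "__main__":
    payload = b"".join(pack_event(*event) for event in tap_events(300, 800))
    print(len(payload), "bytes for", len(tap_events(300, 800)), "events")
```

The packed bytes are what a tool would write to the device node (here /dev/input/event0); this sketch only builds them and does not touch any device.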
So I tried adding an ABS_MT_SLOT change to the orangutan tap command, but this doesn't seem to fix the issue. Also, I don't see this event being sent when I listen for events sent to the touchscreen with getevent -l /dev/input/event0:

EV_ABS       ABS_MT_TRACKING_ID   00000000            
EV_ABS       ABS_MT_POSITION_X    00000146            
EV_ABS       ABS_MT_POSITION_Y    0000028c            
EV_ABS       ABS_MT_TOUCH_MAJOR   0000001d            
EV_ABS       ABS_MT_WIDTH_MAJOR   0000001d            
EV_SYN       SYN_REPORT           00000000            
EV_ABS       ABS_MT_TRACKING_ID   ffffffff            
EV_SYN       SYN_REPORT           00000000            

Could you describe your reasoning for thinking the problem was a missing slot message?
Attachment #8461621 - Flags: feedback?(mwu)
If it turns out that the bug is in gecko, smaug may be able to help.  CCing him preemptively.
I'm definitely not working on this. I think we had some irc discussions since I last commented on the bug, and we're pretty sure it's a kernel issue. It looks like no one has time to look into this though...
Assignee: wlachance → nobody
Comment on attachment 8461621 [details] [diff] [review]
Non-working patch

Cancelling feedback request on patch which we know doesn't address the issue.
Attachment #8461621 - Flags: feedback?(mwu)
Hi Ting, it sounds like you fixed this exact issue in bug 1168269? If so, please resolve this as a duplicate. :)
Flags: needinfo?(janus926)
The problem here should be the same as bug 1168269 comment 22, but I only added a reset command to orng, so the test case still needs to trigger a reset before it starts the test. I didn't add code to reset every time orng runs, since a test could run it many times but only needs the reset once.

Changing attachment 8442879 [details] [diff] [review] to use a reset command should work.
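
The "reset once per test, not once per orng run" idea could be sketched in a test harness as follows. Everything here is an assumption for illustration: the OrngSession class, the send callable, and the literal "reset" string (the actual command syntax added in bug 1168269 is not shown in this bug).

```python
class OrngSession:
    """Issue a single reset before the first batch of commands in a test."""

    def __init__(self, send, reset_command="reset"):
        # `send` is any callable that delivers one command string to orng.
        # `reset_command` is a HYPOTHETICAL spelling of the reset command.
        self._send = send
        self._reset_command = reset_command
        self._reset_done = False

    def run(self, commands):
        """Send commands, preceded by a reset only on the first call."""
        if not self._reset_done:
            self._send(self._reset_command)
            self._reset_done = True
        for command in commands:
            self._send(command)


if __name__ == "__main__":
    session = OrngSession(print)
    session.run(["tap 300 800"])  # prints the reset, then the tap
    session.run(["tap 300 800"])  # prints only the tap
```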
Flags: needinfo?(janus926)
Ok, we're not running Eideticker against B2G at the moment, so let's leave this unfixed for now. It's good to know a solution exists though. :)
Status: NEW → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → NEW
Ever confirmed: true
Eideticker and Firefox OS are no longer active projects.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Product: Testing → Testing Graveyard