Closed Bug 1047639 Opened 6 years ago Closed 6 years ago

Unable to open any app from home screen.

Categories

(Core Graveyard :: Widget: Gonk, defect, P1, blocker)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, firefox32 wontfix, firefox33 wontfix, firefox34 fixed, b2g-v1.3T affected, b2g-v1.4 unaffected, b2g-v2.0 fixed, b2g-v2.1 fixed)

RESOLVED FIXED
2.1 S2 (15aug)
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v1.3T --- affected
b2g-v1.4 --- unaffected
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: tkundu, Assigned: kats)

References

()

Details

(Keywords: regression, Whiteboard: [caf priority: p3][CR 700015][systemsfe])

Attachments

(5 files, 2 obsolete files)

Test steps:
1. Tap on camera app to open.
2. While opening camera app, press power key and tap on “Turn airplane mode on”
3.  after that tap on that screen multiple times.
4. Press cancel in that screen. (Long press power button screen)
5. Not responding. Press home key.
6. Tap on any app in home screen. No response.
Observations:
1. Time is updating on UI.
2. Power button and Volume button are functional.
3. Long press on home key “no recent apps”
4. Press on home key gives vibration.
5. Tapping on any icon in home key no response. Unable to scroll the screen.
6. It is not a memleak issue.
7  if we kill homescreen |adb shell kill pid| then homescreen restarts but still it is stuck.

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=0a864988f5dce7f9f3dea9609e8ef054679c30ff
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=745b486db495248e4d4503039e374cb8d5bb244f


I knew that STR will be very difficult to reproduce . But if you give us some patch which can produce additional logs then it may help you to analyze more here.
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.0?
Did the build you saw this on have the patch for bug 1038854 in it before it was backed out?
Flags: needinfo?(tkundu)
Whiteboard: [CR 700015] → [caf priority: p1][CR 700015]
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #2)
> Did the build you saw this on have the patch for bug 1038854 in it before it
> was backed out?

No. We can test with that patch and confirm you again.
Flags: needinfo?(tkundu)
Flags: needinfo?(tkundu)
The patch that landed caused a number of regressions, including apps not starting, so I want to make sure that you do *not* have it.
Kyle -- we do not have the patch in our build.
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #0)
> 6. Tap on any app in home screen. No response.
> 5. Tapping on any icon in home key no response. Unable to scroll the screen.

The fact that the homescreen renders icons, but you can not even scroll makes me feel that this is probably a panning and zooming bug, though bug 1043689 was recently handed off to me and describes that we may be instantiating multiple instances of dragdrop. I have no idea if this could be causing the problem, but it seems like there might be a chance that we are doing weird stuff with touch events. Going to depend on that for now and get a patch in.
Depends on: 1043689
Whiteboard: [caf priority: p1][CR 700015] → [caf priority: p1][CR 700015][systemsfe]
After being able to reproduce bug 1043689, this is not related.
No longer depends on: 1043689
blocking-b2g: 2.0? → 2.0+
The log shows:
08-01 19:56:09.885  1464  1464 E GeckoConsole: Content JS ERROR at app://verticalhome.gaiamobile.org/gaia_build_defer_index.js:392 in GridItem.prototype.doRenderIcon/<: Error fetching icon [Exception... "File error: Not found"  nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)"  location: "JS frame :: app://verticalhome.gaiamobile.org/gaia_build_defer_index.js :: fetchBlob/< :: line 374"  data: no]
Assignee: nobody → kgrandon
(In reply to Gregor Wagner [:gwagner] from comment #8)
> The log shows:

Sounds like an error we should squelch, but would not lead to this. I've also noticed that we are unable to launch apps in desktop firefox. Seems like they might be related, and that this should go into the mozapps component, but will try to verify.
Can we add qawanted here for verification?

Ideally I would like to see a video/screenshot and potentially reduced STR if possible.
Keywords: qawanted
There is also:
8-01 15:53:51.368   226   226 E GeckoConsole: [JavaScript Error: "TypeError: StackManager.getCurrent(...) is undefined" {file: "app://system.gaiamobile.org/js/edge_swipe_detector.js" line: 119}]
Unable to reproduce this issue on Flame 2.0 or Buri 2.0 devices.

Environmental Variables:
Device: Flame 2.0
BuildID: 20140804060132
Gaia: 4ab7384db7aee130be165a699472cc19405a4456
Gecko: 5e94ab16ec71
Version: 32.0 (2.0) 
Firmware Version: v122
---------------------------------------------------
Environmental Variables:
Device: Buri 2.0
BuildID: 20140804060132
Gaia: 4ab7384db7aee130be165a699472cc19405a4456
Gecko: 5e94ab16ec71
Version: 32.0 (2.0) 
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
QA Contact: croesch
Tapas - We've also just landed and uplifted bug 1043293 to v2.0. Can you test with the patch from the latest v2.0 (or ensure that you have the one from that bug), and see if this issue reproduces?

That bug changes the way that we listen to click/touch events, and I just wanted to be sure that it does or does not fix the problem. Thanks!
Flags: needinfo?(tkundu)
(In reply to Kevin Grandon :kgrandon from comment #13)
> Tapas - We've also just landed and uplifted bug 1043293 to v2.0. Can you
> test with the patch from the latest v2.0 (or ensure that you have the one
> from that bug), and see if this issue reproduces?
> 
> That bug changes the way that we listen to click/touch events, and I just
> wanted to be sure that it does or does not fix the problem. Thanks!

Kevin, Thanks for your quick help in this. I will update you asap on this :)
I don't think this is a type of bug we are going to be able to reproduce manually, as this comes across as a bug only seen in a stress-style automation setup. Our best option here is to setup logging, reproduce the bug, and diagnose it via the avenue.
Target Milestone: --- → 2.1 S2 (15aug)
Tapas: could you clarify some things about your STR to help us understand?

> Test steps:
> 1. Tap on camera app to open.

Have you tried to reproduce this with apps other than the Camera?  In particular, does it reproduce for other full-screen apps like the Gallery?  The status bar is not displayed in Camera or Gallery, but turning airplane mode on will want to update the status bar.  If this is not camera specific but is full-screen specific that would be a big hint.

> 2. While opening camera app, press power key and tap on “Turn airplane mode
> on”

Does step 2 have to happen immediately after step 1? Literally while the app is launching?

> 3.  after that tap on that screen multiple times.

I don't understand this part: doesn't the power dialog go away as soon as you tap "turn on airplane mode"?  How can you tap on the screen multiple times?

> 4. Press cancel in that screen. (Long press power button screen)

Again, I don't understand why this is even possible. Once you've tapped the airplane mode button the cancel button should not be visible. I guess you're saying that there is also a bug in the way the power button menu is behaving, not just in the home screen, is that right?

> 5. Not responding. Press home key.

Do you mean that the cancel button does not dismiss the power button menu?  But that the home key does?

> 6. Tap on any app in home screen. No response.

Hmm. I wonder if the system app has made the power menu transparent but has not actually removed it? Are the tap events being directed to an invisible overlay and never getting to the homescreen?

Finally, is this on a 256mb device?  Does it feel sluggish (like zmem is thrashing) when you reproduce this?
(In reply to David Flanagan [:djf] from comment #16)
> Tapas: could you clarify some things about your STR to help us understand?
> 
> > Test steps:

I am sorry to tell you that this issue has came stability testing and this STR will never help you in reproducing this issue.
PLEASE DON'T TRY TO REPRODUCE IT YOUR OWN DEVICE. THIS WILL SAVE YOUR TIME.

> 
> Hmm. I wonder if the system app has made the power menu transparent but has
> not actually removed it? Are the tap events being directed to an invisible
> overlay and never getting to the homescreen?
> 

Nice find. Could you please give us a logging patch which can confirm this theory ? 

> Finally, is this on a 256mb device?  
YES

> Does it feel sluggish (like zmem is
> thrashing) when you reproduce this?

I think that LMK should kill app before systems becomes very slow. 

BTW, here is the memory log when issue has happened on device :

                           |      megabytes     |
           NAME   PID PPID  CPU(s) NICE  USS  PSS  RSS SWAP VSIZE OOM_ADJ USER
            b2g   228    1 11760.2    0 46.8 47.7 49.8 19.7 245.8       0 root
         (Nuwa)  1074  228    87.5    0  0.0  0.1  0.6  7.8  53.9       0 root
     Homescreen  4313  228    29.2    1  0.0  0.9  2.8 15.8  80.0       2 u0_a4313
(Preallocated a 12103 1074     2.2   18  0.0  0.8  2.5 10.8  60.9       1 u0_a12103

System memory info:

            Total 167.6 MB
        SwapTotal 192.0 MB
     Used - cache 127.3 MB
  B2G procs (PSS)  49.5 MB
    Non-B2G procs  77.7 MB
     Free + cache  40.4 MB
             Free  13.5 MB
            Cache  26.8 MB
         SwapFree 113.5 MB

Low-memory killer parameters:

  notify_trigger 9216 KB

  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 10240 KB
             
This does not tell me that we are really running very low memory when issue has happened. 

Please note that we have seen this issue in latest gaia/gecko
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=9e5907995c9327f14cb5d182cee5ff16b1743ed4
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=ed87f0a54baf646eb0b20b4debc090c8016d2104

and we also have fix from bug bug 1043293 in that test build
Flags: needinfo?(dflanagan)
Flags: needinfo?(tkundu)
Attached patch Patch - Debug APZ (obsolete) — Splinter Review
Kats - I'm looking for a patch to help debug this issue. Because Comment 0 states that the screen can not even be scrolled, I think getting some APZ logs might help. Do you think this patch will help us get the necessary information, and is there additional logging we should add before asking Tapas to test it?
Flags: needinfo?(bugmail.mozilla)
(In reply to Kevin Grandon :kgrandon from comment #18)
> Created attachment 8468869 [details] [diff] [review]
> Patch - Debug APZ
> 
> Kats - I'm looking for a patch to help debug this issue. Because Comment 0
> states that the screen can not even be scrolled, I think getting some APZ
> logs might help. Do you think this patch will help us get the necessary
> information, and is there additional logging we should add before asking
> Tapas to test it?

I would also enable APZCTM_LOG at the top of gfx/layers/apz/src/APZCTreeManager.cpp.
Attached patch Patch - Debug APZ (obsolete) — Splinter Review
Tapas - would you be able to apply these gecko patches and try to reproduce the issue? I think they may help us track it down. Thank you!
Attachment #8468869 - Attachment is obsolete: true
Flags: needinfo?(tkundu)
Attached file logs_mozilla.tar.bz2
here is latest log when this issue happens again.
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #21)
> Created attachment 8468879 [details]
> logs_mozilla.tar.bz2
> 
> here is latest log when this issue happens again.

Tapas - do those logs contain the patch that I attached previously? I don't see any of the expected debug statements in them.
Flags: needinfo?(tkundu)
(In reply to Kevin Grandon :kgrandon from comment #22)
> (In reply to Tapas Kumar Kundu from comment #21)
> > Created attachment 8468879 [details]
> > logs_mozilla.tar.bz2
> > 
> > here is latest log when this issue happens again.
> 
> Tapas - do those logs contain the patch that I attached previously? I don't
> see any of the expected debug statements in them.

I will try this patch. Those logs are from a device which does not have your patch. I asked our test team to test with your patch and we will update soon


BTW, if you look into logs of #Comment 20, then you will see we have lots of "queued-ipc-messages" in 

0 (100.0%) -- queued-ipc-messages
├──0 (100.0%) ── content-parent((Preallocated), pid=12103, open channel, 0xac672800, refcnt=15)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac1a5400, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac338c00, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac431800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac464000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac466800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac469000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46b000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46b400, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46b800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46bc00, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46c800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46d000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46d800, refcnt=1)
............................many others are present. ..................
├──0 (100.0%) ── content-parent(Homescreen, pid=4313, open channel, 0xac17dc00, refcnt=51)
├──0 (100.0%) ── content-parent(Settings, pid=-1, closed channel, 0xac80fc00, refcnt=1)
└──0 (100.0%) ── content-parent(Settings, pid=-1, closed channel, 0xad3bbc00, refcnt=1)


My question : Are we leaking IPC message from b2g ? Could you please tell me story what can show this kind of queued IPC message in memory report of affected test device ? It would be great if you can also give us some logging patch to debug it in this direction as well. Thanks a lot.
Flags: needinfo?(tkundu) → needinfo?(kgrandon)
(In reply to Kevin Grandon :kgrandon from comment #20)
> Created attachment 8468874 [details] [diff] [review]
> Patch - Debug APZ
> 
> Tapas - would you be able to apply these gecko patches and try to reproduce
> the issue? I think they may help us track it down. Thank you!

Could you please re-base your patch for FFOS v2.0 ?
Tapas - Here is an updated patch for v2.0. Thanks!
Attachment #8468874 - Attachment is obsolete: true
Flags: needinfo?(kgrandon) → needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #23)
> My question : Are we leaking IPC message from b2g ? Could you please tell me
> story what can show this kind of queued IPC message in memory report of
> affected test device ? It would be great if you can also give us some
> logging patch to debug it in this direction as well. Thanks a lot.

I will also check with some people who might know more about IPC than I know.
Clearing needinfo for now; the patch you have should provide the relevant logging from the APZ side. (Thanks botond!)

Tapas, if this is going to be run in a stress-test type scenario, the log will likely be very large. It would help to make sure the log has timestamps and that you know the approximate timestamp at which the error manifested. We can then work backwards from that point to see where things went wrong.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Clearing needinfo for now; the patch you have should provide the relevant
> logging from the APZ side. (Thanks botond!)
> 
> Tapas, if this is going to be run in a stress-test type scenario, the log
> will likely be very large. It would help to make sure the log has timestamps
> and that you know the approximate timestamp at which the error manifested.
> We can then work backwards from that point to see where things went wrong.

This issue does not produce any crash . Stability team will run this test for 48 hours and it will be difficult to say when this issue has occurred after 48hours.  We cannot ask them to watch device 48 hours continuously.

But I can assure you that all logs will always have device timestamps. 

Could you please add any log which can confirm approximate time stamp of this UI hang issue ? Otherwise our test team can confirm approximate time stamp only when they come to office (i.e. may be after 1 day of testing ) . Sorry for it.
(In reply to Tapas Kumar Kundu from comment #28)
> This issue does not produce any crash . Stability team will run this test
> for 48 hours and it will be difficult to say when this issue has occurred
> after 48hours.  We cannot ask them to watch device 48 hours continuously.

Maybe it would help if you explain the kind of setup you're using to reproduce this. Based on the STR you provided, I assume you have the device in a physical harness with some sort of mechanical finger that you're using to stress test the device. 48 hours later you go to the device and see that it's stuck on the homescreen and that tapping anywhere has no effect. So you conclude that somewhere during the stress test things went bad, but you have no clear STR because you're not sure at what point the problem happened.

Is that accurate? If not please describe in as much detail as you can what is going on.

> Could you please add any log which can confirm approximate time stamp of
> this UI hang issue ? Otherwise our test team can confirm approximate time
> stamp only when they come to office (i.e. may be after 1 day of testing ) .
> Sorry for it.

If we don't know what is causing the problem then there's no way to add logging to catch this problem. If you can describe the setup that causes this issue to occur we might be able to narrow down the cause enough to add some targetted logging.
(In reply to Tapas Kumar Kundu from comment #23)
> My question : Are we leaking IPC message from b2g ? Could you please tell me
> story what can show this kind of queued IPC message in memory report of
> affected test device ? It would be great if you can also give us some
> logging patch to debug it in this direction as well. Thanks a lot.

No, we're not leaking IPC messages.  You can see that the count of pending messages is 0.  What we are leaking is ContentParent objects, the parent side IPC machinery.  I'm investigating.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> (In reply to Tapas Kumar Kundu from comment #23)
> > My question : Are we leaking IPC message from b2g ? Could you please tell me
> > story what can show this kind of queued IPC message in memory report of
> > affected test device ? It would be great if you can also give us some
> > logging patch to debug it in this direction as well. Thanks a lot.
> 
> No, we're not leaking IPC messages.  You can see that the count of pending
> messages is 0.  What we are leaking is ContentParent objects, the parent
> side IPC machinery.  I'm investigating.

I spun this off as bug 1050423 with some information about what's happening.
Attached file logs with APZ traces
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Clearing needinfo for now; the patch you have should provide the relevant
> logging from the APZ side. (Thanks botond!)
> 
> Tapas, if this is going to be run in a stress-test type scenario, the log
> will likely be very large. It would help to make sure the log has timestamps
> and that you know the approximate timestamp at which the error manifested.
> We can then work backwards from that point to see where things went wrong.


Approximate timestamp when this issue happens is: 01-08 02:11 .

Please note if we kill homescreen then it does not restart again. 

I am also seeing following log in logcat :

 01-08 02:13:27.319  1445  1445 I GeckoIPC: [time:612807326939][1445->229][PTextureChild] Sending Msg_RemoveTexture([TODO])

1445 is homescreen pid and it says that homescreen main thread is doing some activity even after this issue has occurred. 

Probably, b2g is stuck somewhere . Any idea how to further debugging here ?  

It seems to me interesting that homescreen process is not killed
Flags: needinfo?(tkundu) → needinfo?(bugmail.mozilla)
The last set of touch events received in the APZ code was at 01-08 02:05:53.019. This is a while before the timestamp you say things go bad at. So the problem is somewhere earlier in the flow of input events, likely somewhere in the root process event handling code, or the edge swipe detector (see [1] for reference).

Also kind of interesting is that the last set of touch events received by the APZ code had two touch-start events and one touch-end event, and then the controller instance handling them was destroyed (presumably because the layer tree changed). Although kind of weird, I don't think this would have left anything in a permanently-bad state in the APZ code, because the next touch-start event received should have reset the state and found a new controller to handle it. It's just that we didn't get any events after that point.

Assuming your test harness is flawless (a rather big assumption, given that you haven't answered my question about how it works) I think the most likely place for things to go wrong is in the edge swipe detector code [2].

[1] https://bug930939.bugzilla.mozilla.org/attachment.cgi?id=8437066
[2] https://github.com/mozilla-b2g/gaia/blob/e28dd2237b052024866e5cfa8776373f4d6820a9/apps/system/js/touch_forwarder.js#L96
Flags: needinfo?(bugmail.mozilla)
Not sure if bug 1050423 should be in the dependency chain or not, so just doing a "see also" for now so it's easy to find.
See Also: → 1050423
In terms of additional logging, I would suggest adding a log in nsAppShell or nsWindow where the events are actually created and dispatched, and another log in the edge swipe detector code to see if it's mishandling them.
Tapas, can we also verify that bug 1050423 is not related? Same steps as bug 1047645 comment 21.
:fabrice, given you helped in bug 1050423, can we get more predictable info that the bug you fixed has a high probability of helping p here that Kevin is suspecting? Or should we start looking in some other direction.
Flags: needinfo?(fabrice)
(In reply to bhavana bajaj [:bajaj] from comment #37)
> :fabrice, given you helped in bug 1050423, can we get more predictable info
> that the bug you fixed has a high probability of helping p here that Kevin
> is suspecting? Or should we start looking in some other direction.

comment 23 shows ContentParent leaks related to the camera app that are very likely fixed in bug 1050423 but the ones related to the Settings app won't. bug 1050423 fixed issues related to the permission prompts that we don't use anywhere in the context of the settings app.

Kyle, any idea what's holding the settings app contentparent alive?
Flags: needinfo?(fabrice) → needinfo?(khuey)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
I'm not aware of any way to determine which iframe belongs to which app from the CC log, so I would have to do an exhaustive search of the 49 iframes, identifying what is holding each one of them alive.  I don't really have time to do that unfortunately.  If you can point me at some way to tell which iframe belongs to which app (is there a property with the app URI hanging off of the JS object or something?) I can take a look, but otherwise I think we have to wait for a log with bug 1050423 fixed.

Regardless, I think the ContentParent leaks are unlikely to be related to the root cause of this bug.  I've never seen those sorts of leaks cause any problems other than increased memory usage before.
Flags: needinfo?(khuey)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #39)
> I'm not aware of any way to determine which iframe belongs to which app from
> the CC log, so I would have to do an exhaustive search of the 49 iframes,
> identifying what is holding each one of them alive.  I don't really have
> time to do that unfortunately.  If you can point me at some way to tell
> which iframe belongs to which app (is there a property with the app URI
> hanging off of the JS object or something?) I can take a look, but otherwise
> I think we have to wait for a log with bug 1050423 fixed.
> 
> Regardless, I think the ContentParent leaks are unlikely to be related to
> the root cause of this bug.  I've never seen those sorts of leaks cause any
> problems other than increased memory usage before.

The mozapp attribute gives the manifest url, but I don't think it's reflected to property.
I debugged a device in this state with Tapas at CAF today.  We found that the device was in a state where mTouchEventsFiltered[0] was true and mTouchDownCount[1] is 2.  Because of this, we discard all touch events, and we do not recheck the timestamps because this code believes we are still in the middle of some touch.

[0] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/nsAppShell.cpp#759
[1] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/nsAppShell.cpp#726
Component: Gaia::Homescreen → Widget: Gonk
Product: Firefox OS → Core
Thanks for digging into this Kyle. Being that the fix for this is in Gonk, I'm probably not the best person to look into this if it's urgent. Let me know if I can help though.
Assignee: kgrandon → nobody
Assignee: nobody → mwu
After looking at the widget code a bit I believe we should be using the id instead of the index when updating the mTouchDown bitset. This is likely a regression from bug 988110 which added the code. I have a patch that I'm building to do a sanity test on, will post it shortly.
Passing to kats for now, but if this doesn't work out, feel free to pass it back to me and I'll look at it. I have some other ideas to try if this doesn't work.
Assignee: mwu → bugmail.mozilla
This patch is based off 2.0 but the code hasn't changed since then so the same patch should apply to master.
Attachment #8471103 - Flags: review?(mwu)
Attachment #8471103 - Flags: feedback?(khuey)
Comment on attachment 8471103 [details] [diff] [review]
Use pointer id instead of pointer index when tracking pointers

Redirecting f? to CAF for inclusion in their test runs.
Attachment #8471103 - Flags: feedback?(khuey) → feedback?(tkundu)
I'm slightly worried about touch ids that are greater than 31, since we're using a 32 bit bitfield. Those ids would break the touch down counter. Probably not a big deal on real devices, but maybe we can avoid incrementing the counter for touch ids we can't handle.

Also, we can remove the index variable now that we're using id.
(In reply to Michael Wu [:mwu] from comment #47)
> I'm slightly worried about touch ids that are greater than 31, since we're
> using a 32 bit bitfield. Those ids would break the touch down counter.
> Probably not a big deal on real devices, but maybe we can avoid incrementing
> the counter for touch ids we can't handle.

If the hardware ever generates touch ids > 31 then the code in libui/InputReader.cpp will break before it ever gets here. See for example http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libui/InputReader.cpp?rev=0b914f0e61ab#3951 which is where libui determines which touch points are going up or down, by comparing the active touch ids against the last set of touch ids. That also uses a BitSet32 and so makes the same assumption.

> 
> Also, we can remove the index variable now that we're using id.

Well, I'm using the index to get the id. I can inline that if you like but it seems to cleaner the way it is now.
Comment on attachment 8471103 [details] [diff] [review]
Use pointer id instead of pointer index when tracking pointers

Got it. I didn't see you used index to get the id.
Attachment #8471103 - Flags: review?(mwu) → review+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #46)
> Comment on attachment 8471103 [details] [diff] [review]
> Use pointer id instead of pointer index when tracking pointers
> 
> Redirecting f? to CAF for inclusion in their test runs.

We started running test with this patch and we will update as soon as we reproduces this issue.
https://hg.mozilla.org/mozilla-central/rev/99b9e0a6a909
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Flags: needinfo?(dflanagan)
Whiteboard: [caf priority: p1][CR 700015][systemsfe] → [caf priority: p3][CR 700015][systemsfe]
For this to be covered, a new test case must be created.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(srapanan)
(In reply to Tapas Kumar Kundu from comment #51)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #46)
> > Comment on attachment 8471103 [details] [diff] [review]
> > Use pointer id instead of pointer index when tracking pointers
> > 
> > Redirecting f? to CAF for inclusion in their test runs.
> 
> We started running test with this patch and we will update as soon as we
> reproduces this issue.

This is fixed. Thanks to all and especially Kyle for good work :)
Attachment #8471103 - Flags: feedback?(tkundu) → feedback+
Test case added in moztrap:

https://moztrap.mozilla.org/manage/case/14369/
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(srapanan)
Flags: in-moztrap+
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.