1047639 - Unable to open any app from home screen.

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Description

•

10 years ago

Attached file logs when device is stuck in homescreen — Details

Test steps:
1. Tap on camera app to open.
2. While opening camera app, press power key and tap on “Turn airplane mode on”
3.  after that tap on that screen multiple times.
4. Press cancel in that screen. (Long press power button screen)
5. Not responding. Press home key.
6. Tap on any app in home screen. No response.
Observations:
1. Time is updating on UI.
2. Power button and Volume button are functional.
3. Long press on home key “no recent apps”
4. Press on home key gives vibration.
5. Tapping on any icon in home key no response. Unable to scroll the screen.
6. It is not a memleak issue.
7  if we kill homescreen |adb shell kill pid| then homescreen restarts but still it is stuck.

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=0a864988f5dce7f9f3dea9609e8ef054679c30ff
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=745b486db495248e4d4503039e374cb8d5bb244f


I knew that STR will be very difficult to reproduce . But if you give us some patch which can produce additional logs then it may help you to analyze more here.

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 1

•

10 years ago

[Blocking Requested - why for this release]:

blocking-b2g: --- → 2.0?

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 2

•

10 years ago

Did the build you saw this on have the patch for bug 1038854 in it before it was backed out?

Flags: needinfo?(tkundu)

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [CR 700015] → [caf priority: p1][CR 700015]

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 3

•

10 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #2)
> Did the build you saw this on have the patch for bug 1038854 in it before it
> was backed out?

No. We can test with that patch and confirm you again.

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

10 years ago

Flags: needinfo?(tkundu)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 4

•

10 years ago

The patch that landed caused a number of regressions, including apps not starting, so I want to make sure that you do *not* have it.

Inder

Comment 5

•

10 years ago

Kyle -- we do not have the patch in our build.

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

10 years ago

status-b2g-v2.0: --- → affected

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

10 years ago

Flags: needinfo?(tkundu)

Kevin Grandon :kgrandon

Comment 6

•

10 years ago

(In reply to Tapas Kumar Kundu from comment #0)
> 6. Tap on any app in home screen. No response.
> 5. Tapping on any icon in home key no response. Unable to scroll the screen.

The fact that the homescreen renders icons, but you can not even scroll makes me feel that this is probably a panning and zooming bug, though bug 1043689 was recently handed off to me and describes that we may be instantiating multiple instances of dragdrop. I have no idea if this could be causing the problem, but it seems like there might be a chance that we are doing weird stuff with touch events. Going to depend on that for now and get a patch in.

Depends on: 1043689

Candice Serran (:cserran)

Updated

•

10 years ago

Whiteboard: [caf priority: p1][CR 700015] → [caf priority: p1][CR 700015][systemsfe]

Kevin Grandon :kgrandon

Comment 7

•

10 years ago

After being able to reproduce bug 1043689, this is not related.

No longer depends on: 1043689

bhavana bajaj [:bajaj]

Updated

•

10 years ago

blocking-b2g: 2.0? → 2.0+

Gregor Wagner [:gwagner]

Comment 8

•

10 years ago

The log shows:
08-01 19:56:09.885  1464  1464 E GeckoConsole: Content JS ERROR at app://verticalhome.gaiamobile.org/gaia_build_defer_index.js:392 in GridItem.prototype.doRenderIcon/<: Error fetching icon [Exception... "File error: Not found"  nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)"  location: "JS frame :: app://verticalhome.gaiamobile.org/gaia_build_defer_index.js :: fetchBlob/< :: line 374"  data: no]

Assignee: nobody → kgrandon

Kevin Grandon :kgrandon

Comment 9

•

10 years ago

(In reply to Gregor Wagner [:gwagner] from comment #8)
> The log shows:

Sounds like an error we should squelch, but would not lead to this. I've also noticed that we are unable to launch apps in desktop firefox. Seems like they might be related, and that this should go into the mozapps component, but will try to verify.

Kevin Grandon :kgrandon

Comment 10

•

10 years ago

Can we add qawanted here for verification?

Ideally I would like to see a video/screenshot and potentially reduced STR if possible.

Keywords: qawanted

Johan Lorenzo [:jlorenzo]

Updated

•

10 years ago

Keywords: steps-wanted

Gregor Wagner [:gwagner]

Comment 11

•

10 years ago

There is also:
8-01 15:53:51.368   226   226 E GeckoConsole: [JavaScript Error: "TypeError: StackManager.getCurrent(...) is undefined" {file: "app://system.gaiamobile.org/js/edge_swipe_detector.js" line: 119}]

Cody Roesch [:croesch]

Comment 12

•

10 years ago

Unable to reproduce this issue on Flame 2.0 or Buri 2.0 devices.

Environmental Variables:
Device: Flame 2.0
BuildID: 20140804060132
Gaia: 4ab7384db7aee130be165a699472cc19405a4456
Gecko: 5e94ab16ec71
Version: 32.0 (2.0) 
Firmware Version: v122
---------------------------------------------------
Environmental Variables:
Device: Buri 2.0
BuildID: 20140804060132
Gaia: 4ab7384db7aee130be165a699472cc19405a4456
Gecko: 5e94ab16ec71
Version: 32.0 (2.0) 
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

QA Whiteboard: [QAnalyst-Triage?]

Flags: needinfo?(jmitchell)

QA Contact: croesch

Kevin Grandon :kgrandon

Comment 13

•

10 years ago

Tapas - We've also just landed and uplifted bug 1043293 to v2.0. Can you test with the patch from the latest v2.0 (or ensure that you have the one from that bug), and see if this issue reproduces?

That bug changes the way that we listen to click/touch events, and I just wanted to be sure that it does or does not fix the problem. Thanks!

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 14

•

10 years ago

(In reply to Kevin Grandon :kgrandon from comment #13)
> Tapas - We've also just landed and uplifted bug 1043293 to v2.0. Can you
> test with the patch from the latest v2.0 (or ensure that you have the one
> from that bug), and see if this issue reproduces?
> 
> That bug changes the way that we listen to click/touch events, and I just
> wanted to be sure that it does or does not fix the problem. Thanks!

Kevin, Thanks for your quick help in this. I will update you asap on this :)

Jason Smith [:jsmith]

Comment 15

•

10 years ago

I don't think this is a type of bug we are going to be able to reproduce manually, as this comes across as a bug only seen in a stress-style automation setup. Our best option here is to setup logging, reproduce the bug, and diagnose it via the avenue.

Keywords: qawanted, steps-wanted

Candice Serran (:cserran)

Updated

•

10 years ago

Target Milestone: --- → 2.1 S2 (15aug)

David Flanagan [:djf]

Comment 16

•

10 years ago

Tapas: could you clarify some things about your STR to help us understand?

> Test steps:
> 1. Tap on camera app to open.

Have you tried to reproduce this with apps other than the Camera?  In particular, does it reproduce for other full-screen apps like the Gallery?  The status bar is not displayed in Camera or Gallery, but turning airplane mode on will want to update the status bar.  If this is not camera specific but is full-screen specific that would be a big hint.

> 2. While opening camera app, press power key and tap on “Turn airplane mode
> on”

Does step 2 have to happen immediately after step 1? Literally while the app is launching?

> 3.  after that tap on that screen multiple times.

I don't understand this part: doesn't the power dialog go away as soon as you tap "turn on airplane mode"?  How can you tap on the screen multiple times?

> 4. Press cancel in that screen. (Long press power button screen)

Again, I don't understand why this is even possible. Once you've tapped the airplane mode button the cancel button should not be visible. I guess you're saying that there is also a bug in the way the power button menu is behaving, not just in the home screen, is that right?

> 5. Not responding. Press home key.

Do you mean that the cancel button does not dismiss the power button menu?  But that the home key does?

> 6. Tap on any app in home screen. No response.

Hmm. I wonder if the system app has made the power menu transparent but has not actually removed it? Are the tap events being directed to an invisible overlay and never getting to the homescreen?

Finally, is this on a 256mb device?  Does it feel sluggish (like zmem is thrashing) when you reproduce this?

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 17

•

10 years ago

(In reply to David Flanagan [:djf] from comment #16)
> Tapas: could you clarify some things about your STR to help us understand?
> 
> > Test steps:

I am sorry to tell you that this issue has came stability testing and this STR will never help you in reproducing this issue.
PLEASE DON'T TRY TO REPRODUCE IT YOUR OWN DEVICE. THIS WILL SAVE YOUR TIME.

> 
> Hmm. I wonder if the system app has made the power menu transparent but has
> not actually removed it? Are the tap events being directed to an invisible
> overlay and never getting to the homescreen?
> 

Nice find. Could you please give us a logging patch which can confirm this theory ? 

> Finally, is this on a 256mb device?  
YES

> Does it feel sluggish (like zmem is
> thrashing) when you reproduce this?

I think that LMK should kill app before systems becomes very slow. 

BTW, here is the memory log when issue has happened on device :

                           |      megabytes     |
           NAME   PID PPID  CPU(s) NICE  USS  PSS  RSS SWAP VSIZE OOM_ADJ USER
            b2g   228    1 11760.2    0 46.8 47.7 49.8 19.7 245.8       0 root
         (Nuwa)  1074  228    87.5    0  0.0  0.1  0.6  7.8  53.9       0 root
     Homescreen  4313  228    29.2    1  0.0  0.9  2.8 15.8  80.0       2 u0_a4313
(Preallocated a 12103 1074     2.2   18  0.0  0.8  2.5 10.8  60.9       1 u0_a12103

System memory info:

            Total 167.6 MB
        SwapTotal 192.0 MB
     Used - cache 127.3 MB
  B2G procs (PSS)  49.5 MB
    Non-B2G procs  77.7 MB
     Free + cache  40.4 MB
             Free  13.5 MB
            Cache  26.8 MB
         SwapFree 113.5 MB

Low-memory killer parameters:

  notify_trigger 9216 KB

  oom_adj min_free
        0  4096 KB
       58  5120 KB
      117  6144 KB
      352  7168 KB
      470  8192 KB
      588 10240 KB
             
This does not tell me that we are really running very low memory when issue has happened. 

Please note that we have seen this issue in latest gaia/gecko
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=9e5907995c9327f14cb5d182cee5ff16b1743ed4
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=ed87f0a54baf646eb0b20b4debc090c8016d2104

and we also have fix from bug bug 1043293 in that test build

Flags: needinfo?(dflanagan)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

10 years ago

Flags: needinfo?(tkundu)

Kevin Grandon :kgrandon

Comment 18

•

10 years ago

Attached patch Patch - Debug APZ (obsolete) — Details — Splinter Review

Kats - I'm looking for a patch to help debug this issue. Because Comment 0 states that the screen can not even be scrolled, I think getting some APZ logs might help. Do you think this patch will help us get the necessary information, and is there additional logging we should add before asking Tapas to test it?

Flags: needinfo?(bugmail.mozilla)

Botond Ballo [:botond]

Comment 19

•

10 years ago

(In reply to Kevin Grandon :kgrandon from comment #18)
> Created attachment 8468869 [details] [diff] [review]
> Patch - Debug APZ
> 
> Kats - I'm looking for a patch to help debug this issue. Because Comment 0
> states that the screen can not even be scrolled, I think getting some APZ
> logs might help. Do you think this patch will help us get the necessary
> information, and is there additional logging we should add before asking
> Tapas to test it?

I would also enable APZCTM_LOG at the top of gfx/layers/apz/src/APZCTreeManager.cpp.

Kevin Grandon :kgrandon

Comment 20

•

10 years ago

Attached patch Patch - Debug APZ (obsolete) — Details — Splinter Review

Tapas - would you be able to apply these gecko patches and try to reproduce the issue? I think they may help us track it down. Thank you!

Attachment #8468869 - Attachment is obsolete: true

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 21

•

10 years ago

Attached file logs_mozilla.tar.bz2 — Details

here is latest log when this issue happens again.

Flags: needinfo?(tkundu)

Kevin Grandon :kgrandon

Comment 22

•

10 years ago

(In reply to Tapas Kumar Kundu from comment #21)
> Created attachment 8468879 [details]
> logs_mozilla.tar.bz2
> 
> here is latest log when this issue happens again.

Tapas - do those logs contain the patch that I attached previously? I don't see any of the expected debug statements in them.

Flags: needinfo?(tkundu)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 23

•

10 years ago

(In reply to Kevin Grandon :kgrandon from comment #22)
> (In reply to Tapas Kumar Kundu from comment #21)
> > Created attachment 8468879 [details]
> > logs_mozilla.tar.bz2
> > 
> > here is latest log when this issue happens again.
> 
> Tapas - do those logs contain the patch that I attached previously? I don't
> see any of the expected debug statements in them.

I will try this patch. Those logs are from a device which does not have your patch. I asked our test team to test with your patch and we will update soon


BTW, if you look into logs of #Comment 20, then you will see we have lots of "queued-ipc-messages" in 

0 (100.0%) -- queued-ipc-messages
├──0 (100.0%) ── content-parent((Preallocated), pid=12103, open channel, 0xac672800, refcnt=15)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac1a5400, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac338c00, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac431800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac464000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac466800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac469000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46b000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46b400, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46b800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46bc00, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46c800, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46d000, refcnt=1)
├──0 (100.0%) ── content-parent(Camera, pid=-1, closed channel, 0xac46d800, refcnt=1)
............................many others are present. ..................
├──0 (100.0%) ── content-parent(Homescreen, pid=4313, open channel, 0xac17dc00, refcnt=51)
├──0 (100.0%) ── content-parent(Settings, pid=-1, closed channel, 0xac80fc00, refcnt=1)
└──0 (100.0%) ── content-parent(Settings, pid=-1, closed channel, 0xad3bbc00, refcnt=1)


My question : Are we leaking IPC message from b2g ? Could you please tell me story what can show this kind of queued IPC message in memory report of affected test device ? It would be great if you can also give us some logging patch to debug it in this direction as well. Thanks a lot.

Flags: needinfo?(tkundu) → needinfo?(kgrandon)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 24

•

10 years ago

(In reply to Kevin Grandon :kgrandon from comment #20)
> Created attachment 8468874 [details] [diff] [review]
> Patch - Debug APZ
> 
> Tapas - would you be able to apply these gecko patches and try to reproduce
> the issue? I think they may help us track it down. Thank you!

Could you please re-base your patch for FFOS v2.0 ?

Kevin Grandon :kgrandon

Comment 25

•

10 years ago

Attached patch Patch - Debug APZ — Details — Splinter Review

Tapas - Here is an updated patch for v2.0. Thanks!

Attachment #8468874 - Attachment is obsolete: true

Flags: needinfo?(kgrandon) → needinfo?(tkundu)

Kevin Grandon :kgrandon

Comment 26

•

10 years ago

(In reply to Tapas Kumar Kundu from comment #23)
> My question : Are we leaking IPC message from b2g ? Could you please tell me
> story what can show this kind of queued IPC message in memory report of
> affected test device ? It would be great if you can also give us some
> logging patch to debug it in this direction as well. Thanks a lot.

I will also check with some people who might know more about IPC than I know.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 27

•

10 years ago

Clearing needinfo for now; the patch you have should provide the relevant logging from the APZ side. (Thanks botond!)

Tapas, if this is going to be run in a stress-test type scenario, the log will likely be very large. It would help to make sure the log has timestamps and that you know the approximate timestamp at which the error manifested. We can then work backwards from that point to see where things went wrong.

Flags: needinfo?(bugmail.mozilla)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 28

•

10 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Clearing needinfo for now; the patch you have should provide the relevant
> logging from the APZ side. (Thanks botond!)
> 
> Tapas, if this is going to be run in a stress-test type scenario, the log
> will likely be very large. It would help to make sure the log has timestamps
> and that you know the approximate timestamp at which the error manifested.
> We can then work backwards from that point to see where things went wrong.

This issue does not produce any crash . Stability team will run this test for 48 hours and it will be difficult to say when this issue has occurred after 48hours.  We cannot ask them to watch device 48 hours continuously.

But I can assure you that all logs will always have device timestamps. 

Could you please add any log which can confirm approximate time stamp of this UI hang issue ? Otherwise our test team can confirm approximate time stamp only when they come to office (i.e. may be after 1 day of testing ) . Sorry for it.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 29

•

10 years ago

(In reply to Tapas Kumar Kundu from comment #28)
> This issue does not produce any crash . Stability team will run this test
> for 48 hours and it will be difficult to say when this issue has occurred
> after 48hours.  We cannot ask them to watch device 48 hours continuously.

Maybe it would help if you explain the kind of setup you're using to reproduce this. Based on the STR you provided, I assume you have the device in a physical harness with some sort of mechanical finger that you're using to stress test the device. 48 hours later you go to the device and see that it's stuck on the homescreen and that tapping anywhere has no effect. So you conclude that somewhere during the stress test things went bad, but you have no clear STR because you're not sure at what point the problem happened.

Is that accurate? If not please describe in as much detail as you can what is going on.

> Could you please add any log which can confirm approximate time stamp of
> this UI hang issue ? Otherwise our test team can confirm approximate time
> stamp only when they come to office (i.e. may be after 1 day of testing ) .
> Sorry for it.

If we don't know what is causing the problem then there's no way to add logging to catch this problem. If you can describe the setup that causes this issue to occur we might be able to narrow down the cause enough to add some targetted logging.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 30

•

10 years ago

(In reply to Tapas Kumar Kundu from comment #23)
> My question : Are we leaking IPC message from b2g ? Could you please tell me
> story what can show this kind of queued IPC message in memory report of
> affected test device ? It would be great if you can also give us some
> logging patch to debug it in this direction as well. Thanks a lot.

No, we're not leaking IPC messages.  You can see that the count of pending messages is 0.  What we are leaking is ContentParent objects, the parent side IPC machinery.  I'm investigating.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 31

•

10 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> (In reply to Tapas Kumar Kundu from comment #23)
> > My question : Are we leaking IPC message from b2g ? Could you please tell me
> > story what can show this kind of queued IPC message in memory report of
> > affected test device ? It would be great if you can also give us some
> > logging patch to debug it in this direction as well. Thanks a lot.
> 
> No, we're not leaking IPC messages.  You can see that the count of pending
> messages is 0.  What we are leaking is ContentParent objects, the parent
> side IPC machinery.  I'm investigating.

I spun this off as bug 1050423 with some information about what's happening.

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 32

•

10 years ago

Attached file logs with APZ traces — Details

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Clearing needinfo for now; the patch you have should provide the relevant
> logging from the APZ side. (Thanks botond!)
> 
> Tapas, if this is going to be run in a stress-test type scenario, the log
> will likely be very large. It would help to make sure the log has timestamps
> and that you know the approximate timestamp at which the error manifested.
> We can then work backwards from that point to see where things went wrong.


Approximate timestamp when this issue happens is: 01-08 02:11 .

Please note if we kill homescreen then it does not restart again. 

I am also seeing following log in logcat :

 01-08 02:13:27.319  1445  1445 I GeckoIPC: [time:612807326939][1445->229][PTextureChild] Sending Msg_RemoveTexture([TODO])

1445 is homescreen pid and it says that homescreen main thread is doing some activity even after this issue has occurred. 

Probably, b2g is stuck somewhere . Any idea how to further debugging here ?  

It seems to me interesting that homescreen process is not killed

Flags: needinfo?(tkundu) → needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 33

•

10 years ago

The last set of touch events received in the APZ code was at 01-08 02:05:53.019. This is a while before the timestamp you say things go bad at. So the problem is somewhere earlier in the flow of input events, likely somewhere in the root process event handling code, or the edge swipe detector (see [1] for reference).

Also kind of interesting is that the last set of touch events received by the APZ code had two touch-start events and one touch-end event, and then the controller instance handling them was destroyed (presumably because the layer tree changed). Although kind of weird, I don't think this would have left anything in a permanently-bad state in the APZ code, because the next touch-start event received should have reset the state and found a new controller to handle it. It's just that we didn't get any events after that point.

Assuming your test harness is flawless (a rather big assumption, given that you haven't answered my question about how it works) I think the most likely place for things to go wrong is in the edge swipe detector code [2].

[1] https://bug930939.bugzilla.mozilla.org/attachment.cgi?id=8437066
[2] https://github.com/mozilla-b2g/gaia/blob/e28dd2237b052024866e5cfa8776373f4d6820a9/apps/system/js/touch_forwarder.js#L96

Flags: needinfo?(bugmail.mozilla)

Kevin Grandon :kgrandon

Comment 34

•

10 years ago

Not sure if bug 1050423 should be in the dependency chain or not, so just doing a "see also" for now so it's easy to find.

Comment 35

•

10 years ago

In terms of additional logging, I would suggest adding a log in nsAppShell or nsWindow where the events are actually created and dispatched, and another log in the edge swipe detector code to see if it's mishandling them.

Kevin Grandon :kgrandon

Comment 36

•

10 years ago

Tapas, can we also verify that bug 1050423 is not related? Same steps as bug 1047645 comment 21.

bhavana bajaj [:bajaj]

Comment 37

•

10 years ago

:fabrice, given you helped in bug 1050423, can we get more predictable info that the bug you fixed has a high probability of helping p here that Kevin is suspecting? Or should we start looking in some other direction.

Flags: needinfo?(fabrice)

[:fabrice] Fabrice Desré

Comment 38

•

10 years ago

(In reply to bhavana bajaj [:bajaj] from comment #37)
> :fabrice, given you helped in bug 1050423, can we get more predictable info
> that the bug you fixed has a high probability of helping p here that Kevin
> is suspecting? Or should we start looking in some other direction.

comment 23 shows ContentParent leaks related to the camera app that are very likely fixed in bug 1050423 but the ones related to the Settings app won't. bug 1050423 fixed issues related to the permission prompts that we don't use anywhere in the context of the settings app.

Kyle, any idea what's holding the settings app contentparent alive?

Flags: needinfo?(fabrice) → needinfo?(khuey)

Joshua Mitchell (Inactive)

Updated

•

10 years ago

QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]

Flags: needinfo?(jmitchell)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 39

•

10 years ago

I'm not aware of any way to determine which iframe belongs to which app from the CC log, so I would have to do an exhaustive search of the 49 iframes, identifying what is holding each one of them alive.  I don't really have time to do that unfortunately.  If you can point me at some way to tell which iframe belongs to which app (is there a property with the app URI hanging off of the JS object or something?) I can take a look, but otherwise I think we have to wait for a log with bug 1050423 fixed.

Regardless, I think the ContentParent leaks are unlikely to be related to the root cause of this bug.  I've never seen those sorts of leaks cause any problems other than increased memory usage before.

Flags: needinfo?(khuey)

[:fabrice] Fabrice Desré

Comment 40

•

10 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #39)
> I'm not aware of any way to determine which iframe belongs to which app from
> the CC log, so I would have to do an exhaustive search of the 49 iframes,
> identifying what is holding each one of them alive.  I don't really have
> time to do that unfortunately.  If you can point me at some way to tell
> which iframe belongs to which app (is there a property with the app URI
> hanging off of the JS object or something?) I can take a look, but otherwise
> I think we have to wait for a log with bug 1050423 fixed.
> 
> Regardless, I think the ContentParent leaks are unlikely to be related to
> the root cause of this bug.  I've never seen those sorts of leaks cause any
> problems other than increased memory usage before.

The mozapp attribute gives the manifest url, but I don't think it's reflected to property.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 41

•

10 years ago

I debugged a device in this state with Tapas at CAF today.  We found that the device was in a state where mTouchEventsFiltered[0] was true and mTouchDownCount[1] is 2.  Because of this, we discard all touch events, and we do not recheck the timestamps because this code believes we are still in the middle of some touch.

[0] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/nsAppShell.cpp#759
[1] http://mxr.mozilla.org/mozilla-central/source/widget/gonk/nsAppShell.cpp#726

Component: Gaia::Homescreen → Widget: Gonk

Product: Firefox OS → Core

Kevin Grandon :kgrandon

Comment 42

•

10 years ago

Thanks for digging into this Kyle. Being that the fix for this is in Gonk, I'm probably not the best person to look into this if it's urgent. Let me know if I can help though.

Assignee: kgrandon → nobody

Hema Koka [:hema]

Updated

•

10 years ago

Assignee: nobody → mwu

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 43

•

10 years ago

After looking at the widget code a bit I believe we should be using the id instead of the index when updating the mTouchDown bitset. This is likely a regression from bug 988110 which added the code. I have a patch that I'm building to do a sanity test on, will post it shortly.

Michael Wu [:mwu]

Comment 44

•

10 years ago

Passing to kats for now, but if this doesn't work out, feel free to pass it back to me and I'll look at it. I have some other ideas to try if this doesn't work.

Assignee: mwu → bugmail.mozilla

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 45

•

10 years ago

Attached patch Use pointer id instead of pointer index when tracking pointers — Details — Splinter Review

This patch is based off 2.0 but the code hasn't changed since then so the same patch should apply to master.

Attachment #8471103 - Flags: review?(mwu)

Attachment #8471103 - Flags: feedback?(khuey)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 46

•

10 years ago

Comment on attachment 8471103 [details] [diff] [review]
Use pointer id instead of pointer index when tracking pointers

Redirecting f? to CAF for inclusion in their test runs.

Attachment #8471103 - Flags: feedback?(khuey) → feedback?(tkundu)

Michael Wu [:mwu]

Comment 47

•

10 years ago

I'm slightly worried about touch ids that are greater than 31, since we're using a 32 bit bitfield. Those ids would break the touch down counter. Probably not a big deal on real devices, but maybe we can avoid incrementing the counter for touch ids we can't handle.

Also, we can remove the index variable now that we're using id.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 48

•

10 years ago

(In reply to Michael Wu [:mwu] from comment #47)
> I'm slightly worried about touch ids that are greater than 31, since we're
> using a 32 bit bitfield. Those ids would break the touch down counter.
> Probably not a big deal on real devices, but maybe we can avoid incrementing
> the counter for touch ids we can't handle.

If the hardware ever generates touch ids > 31 then the code in libui/InputReader.cpp will break before it ever gets here. See for example http://mxr.mozilla.org/mozilla-central/source/widget/gonk/libui/InputReader.cpp?rev=0b914f0e61ab#3951 which is where libui determines which touch points are going up or down, by comparing the active touch ids against the last set of touch ids. That also uses a BitSet32 and so makes the same assumption.

> 
> Also, we can remove the index variable now that we're using id.

Well, I'm using the index to get the id. I can inline that if you like but it seems to cleaner the way it is now.

Michael Wu [:mwu]

Comment 49

•

10 years ago

Comment on attachment 8471103 [details] [diff] [review]
Use pointer id instead of pointer index when tracking pointers

Got it. I didn't see you used index to get the id.

Attachment #8471103 - Flags: review?(mwu) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 50

•

10 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/99b9e0a6a909

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 51

•

10 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #46)
> Comment on attachment 8471103 [details] [diff] [review]
> Use pointer id instead of pointer index when tracking pointers
> 
> Redirecting f? to CAF for inclusion in their test runs.

We started running test with this patch and we will update as soon as we reproduces this issue.

Nigel Babu [:nigelb]

Comment 52

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/99b9e0a6a909

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

10 years ago

status-b2g-v2.1: --- → fixed

Flags: needinfo?(dflanagan)

Ryan VanderMeulen [:RyanVM]

Comment 53

•

10 years ago

https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/f4baa2e76f81

status-b2g-v2.0: affected → fixed

status-firefox32: --- → wontfix

status-firefox33: --- → wontfix

status-firefox34: --- → fixed

cafbot (PoC: ggrisco)

Updated

•

10 years ago

Whiteboard: [caf priority: p1][CR 700015][systemsfe] → [caf priority: p3][CR 700015][systemsfe]

[:SalvadorR]

Comment 54

•

10 years ago

For this to be covered, a new test case must be created.

QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]

Flags: needinfo?(ktucker)

Flags: in-moztrap?(srapanan)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Comment 55

•

10 years ago

(In reply to Tapas Kumar Kundu from comment #51)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #46)
> > Comment on attachment 8471103 [details] [diff] [review]
> > Use pointer id instead of pointer index when tracking pointers
> > 
> > Redirecting f? to CAF for inclusion in their test runs.
> 
> We started running test with this patch and we will update as soon as we
> reproduces this issue.

This is fixed. Thanks to all and especially Kyle for good work :)

Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me)

Reporter

Updated

•

10 years ago

Attachment #8471103 - Flags: feedback?(tkundu) → feedback+

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 56

•

10 years ago

\o/

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

10 years ago

status-b2g-v1.3T: --- → affected

status-b2g-v1.4: --- → unaffected

Keywords: regression

KTucker [:KTucker][Inactive 3/4/2016]

Comment 57

•

10 years ago

Test case added in moztrap:

https://moztrap.mozilla.org/manage/case/14369/

URL: https://moztrap.mozilla.org/manage/ca...

QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]

Flags: needinfo?(ktucker)

Flags: in-moztrap?(srapanan)

Flags: in-moztrap+

BMO Automation

Updated

•

6 years ago

Product: Core → Core Graveyard

logs when device is stuck in homescreen 10 years ago Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) 41.95 KB, application/x-bzip		Details
Patch - Debug APZ 10 years ago Kevin Grandon :kgrandon 486 bytes, patch		Details \| Diff \| Splinter Review
Patch - Debug APZ 10 years ago Kevin Grandon :kgrandon 1012 bytes, patch		Details \| Diff \| Splinter Review
logs_mozilla.tar.bz2 10 years ago Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) 6.91 MB, application/x-bzip		Details
Patch - Debug APZ 10 years ago Kevin Grandon :kgrandon 1.16 KB, patch		Details \| Diff \| Splinter Review
logs with APZ traces 10 years ago Tapas[:tkundu on #b2g/gaia/memshrink/gfx] (always NI me) 464.06 KB, application/x-bzip		Details
Use pointer id instead of pointer index when tracking pointers 10 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.79 KB, patch	mwu : review+ tkundu : feedback+	Details \| Diff \| Splinter Review