Closed
Bug 988110
Opened 11 years ago
Closed 10 years ago
[tarako] touch event filtering can get stuck when up/down actions are unbalanced
Categories
(Core :: IPC, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
b2g-v1.3T | --- | fixed |
People
(Reporter: yaoyao.wu, Assigned: viralwang)
References
Details
(Keywords: regression, Whiteboard: [priority])
Attachments
(16 files, 2 obsolete files)
14.44 KB,
text/plain
|
Details | |
801.53 KB,
application/x-bzip
|
Details | |
123.15 KB,
image/png
|
Details | |
130.75 KB,
text/x-log
|
Details | |
127.13 KB,
text/x-log
|
Details | |
7.56 KB,
text/plain
|
Details | |
130.07 KB,
text/x-log
|
Details | |
15.84 KB,
image/jpeg
|
Details | |
9.06 KB,
image/png
|
Details | |
1.08 MB,
application/zip
|
Details | |
1.14 MB,
video/quicktime
|
Details | |
31.61 KB,
application/zip
|
Details | |
4.40 MB,
video/quicktime
|
Details | |
1.44 MB,
application/zip
|
Details | |
6.11 MB,
application/zip
|
Details | |
2.61 KB,
patch
|
Details | Diff | Splinter Review |
changeset from 191
=========================
gaia
commit 943ff7984dbb8fbe833c3daef18b09a62c139c1e
Merge: ea0653b f513242
Author: Ryan VanderMeulen <ryanvm@gmail.com>
Date: Fri Mar 21 10:53:54 2014 -0400
Merge branch 'v1.3' into v1.3t
commit ea0653b1329d78a569f2e9a19f763ad8881449e2
Merge: 1917b47 e33d80e
Author: Ryan VanderMeulen <rvandermeulen@mozilla.com>
Date: Fri Mar 21 08:40:25 2014 -0400
Merge pull request #17422 from timdream/keyboard-mempressure-race
Bug 984238 - Don't kill the keyboard when when it's being launched +shepherd
commit 1917b47b7099de1fab2c9786ae04219122966049
Merge: a170c6f 28e0261
Author: Yuren Ju <yurenju@gmail.com>
Date: Fri Mar 21 18:25:08 2014 +0800
Merge pull request #17433 from evanxd/v1.3t-bug-986345
Bug 986345 - Fix CI_ACTION=marionette_js failure on 1.3/1.3t branch. a=testonly
commit f5132420bef0d4dc27dfcb32bb9c5df2ab80530a
Merge: 864957c e66437c
Author: Yuren Ju <yurenju@gmail.com>
Date: Fri Mar 21 18:25:00 2014 +0800
Merge pull request #17432 from evanxd/v1.3-bug-986345
Bug 986345 - Fix CI_ACTION=marionette_js failure on 1.3/1.3t branch. a=testonly
commit a170c6f78a2a710e843e68243691ac39746d3456
Merge: a6919e4 ea4731f
Author: Yuren Ju <yurenju@gmail.com>
Date: Fri Mar 21 18:16:42 2014 +0800
Merge pull request #17436 from yurenju/wappush-v1.3t
Bug 986380 - fix unit test case "auth_test.js" in wappush app a=testonly
commit 864957ce10921155f702aa60413fddcf0c6b078e
Merge: 001f2e6 5b9c7ff
Author: Yuren Ju <yurenju@gmail.com>
Date: Fri Mar 21 18:08:20 2014 +0800
Merge pull request #17435 from yurenju/wappush
Bug 986380 - fix unit test case "auth_test.js" in wappush app a=testonly
commit e66437c5522499cd98d349081691349d29bce4b4
Author: Evan Xd <itoyxd@gmail.com>
Date: Fri Mar 21 17:17:25 2014 +0800
Bug 986345 - Add black list on Travis.
Summary: [tarako] B2G_NOOPT=1 build, monkey test无响应 → [tarako] B2G_NOOPT=1 build, 测试2个小时左右会出现monkey test无响应的问题
Updated•11 years ago
|
Summary: [tarako] B2G_NOOPT=1 build, 测试2个小时左右会出现monkey test无响应的问题 → [tarako] gaia no respond after monkey test start about 2 hours
Updated•11 years ago
|
Flags: needinfo?(ttsai)
Flags: needinfo?(ehung)
Comment 1•11 years ago
|
||
Spreadtrum hudson#191 build first met this issue, and the attachment is change list.
We can still meet this issue in latest hudson build. It block our monkey test now.
Comment 2•11 years ago
|
||
Do we know in which state gaia is blocked? can we get a screenshot?
Comment 3•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #2)
> Do we know in which state gaia is blocked? can we get a screenshot?
Homescreen.
Yaoyao, please provide slog here. Thanks.
Flags: needinfo?(yaoyao.wu)
Comment 4•11 years ago
|
||
Add Yang here for analyzing.
Comment 6•11 years ago
|
||
Can you attach a debugger to the hanging phone? It would be great to get stack traces for the hang situation. And maybe ps ax.
Comment 7•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #6)
> Can you attach a debugger to the hanging phone? It would be great to get
> stack traces for the hang situation. And maybe ps ax.
Hi Andreas,
Please see Yaoyao's attachment. ADB works well and no gecko/gaia error log, adb shell getevent can show touch panel log, home/power key also can work, so I think it's gaia issue.
Comment 8•11 years ago
|
||
Yang, please tell yaoyao how to run local monkey test and use gdb to attach homescreen process and print backtrace, thanks.
Flags: needinfo?(yaoyao.wu)
Flags: needinfo?(yang.zhao)
Comment 9•11 years ago
|
||
We can't use gdb to attach hudson build, please build in local PC and ./run-gdb.sh attach b2g-process or homescreen-process, and get backtrace and comment here.
Thanks.
Comment 10•11 years ago
|
||
Evelyn, can you please update to this bug on what you suggested over email to our partner? Thanks
Flags: needinfo?(yaoyao.wu)
Flags: needinfo?(yang.zhao)
Flags: needinfo?(ttsai)
Comment 11•11 years ago
|
||
Joe, please traige this bug. Its unconfirmed and its in the wrong component.
Comment 12•11 years ago
|
||
Bug 984238 - Don't kill the keyboard when when it's being launched +shepherd
Bug 959966: prevent the preallocated process from being killed by the lowmem killer by rasing its priority. r=khuey,gsvelto a=1.3t+
I recommend temporarily backing out these two patches to see if that fixes the problem.
Comment 13•11 years ago
|
||
(In reply to Joe Cheng [:jcheng] from comment #10)
> Evelyn, can you please update to this bug on what you suggested over email
> to our partner? Thanks
My suggestion over email aligned with comment 12, I also suspect these two bugs. Please try to remove these two patches and try your monkey test again.
Flags: needinfo?(ehung)
Comment 14•11 years ago
|
||
Run the monkey test,after about 3 hours,the phone stayed in a browser page and had no response at all.
adb shell b2g-info
| megabytes |
NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER
b2g 13504 1 2429.5 0 49.7 51.3 53.5 179.6 0 root
(Nuwa) 13556 13504 5.3 0 0.2 0.9 2.1 50.6 0 root
(Preallocated a 18276 13556 0.7 18 7.1 8.7 11.0 57.6 1 root
System memory info:
Total 97.5 MB
Used - cache 80.5 MB
B2G procs (PSS) 60.8 MB
Non-B2G procs 19.7 MB
Free + cache 17.0 MB
Free 2.2 MB
Cache 14.8 MB
Low-memory killer parameters:
notify_trigger 14336 KB
oom_adj min_free
0 1024 KB
1 2048 KB
2 4096 KB
6 18432 KB
8 18432 KB
10 24576 KB
Comment 15•11 years ago
|
||
And backtrace in gdb is:
#0 ~nsCOMPtr_base (this=0xbed22dd0, __in_chrg=<value optimized out>) at /home/yangzhao/B2G/B2G6821_test/gecko/media/webrtc/signaling/./src/peerconnection/PeerConnectionImpl.h:629
#1 0x413a2ba0 in ~nsCOMPtr (aNode=0x4678be70) at ../../../dist/include/nsCOMPtr.h:473
#2 nsContentUtils::IsUserFocusIgnored (aNode=0x4678be70) at /home/yangzhao/B2G/B2G6821_test/gecko/content/base/src/nsContentUtils.cpp:5925
#3 0x4142785e in nsEventStateManager::PostHandleEvent (this=0x460a3d40, aPresContext=<value optimized out>, aEvent=0xbed23590, aTargetFrame=0x47f16580, aStatus=0xbed2309c)
at /home/yangzhao/B2G/B2G6821_test/gecko/content/events/src/nsEventStateManager.cpp:3231
#4 0x415ff574 in PresShell::HandleEventInternal (this=0x44120d60, aEvent=0xbed23590, aStatus=<value optimized out>)
at /home/yangzhao/B2G/B2G6821_test/gecko/layout/base/nsPresShell.cpp:7149
#5 0x415ff704 in PresShell::HandlePositionedEvent (this=0x44120d60, aTargetFrame=0x4678be70, aEvent=0xbed23590, aEventStatus=0xbed2309c)
at /home/yangzhao/B2G/B2G6821_test/gecko/layout/base/nsPresShell.cpp:6867
#6 0x415ffddc in PresShell::HandleEvent (this=<value optimized out>, aFrame=0x47f16580, aEvent=0xbed23590, aDontRetargetEvents=<value optimized out>, aEventStatus=0xbed2309c)
at /home/yangzhao/B2G/B2G6821_test/gecko/layout/base/nsPresShell.cpp:6666
#7 0x4138ccbe in nsViewManager::DispatchEvent (this=<value optimized out>, aEvent=0xbed23590, aView=<value optimized out>, aStatus=<value optimized out>)
at /home/yangzhao/B2G/B2G6821_test/gecko/view/src/nsViewManager.cpp:753
#8 0x4138b9c4 in nsView::HandleEvent (this=<value optimized out>, aEvent=0xbed23590, aUseAttachedEvents=<value optimized out>)
at /home/yangzhao/B2G/B2G6821_test/gecko/view/src/nsView.cpp:1084
#9 0x4122dfbc in nsWindow::DispatchEvent (this=<value optimized out>, aEvent=0x0, aStatus=@0xbed230c4) at /home/yangzhao/B2G/B2G6821_test/gecko/widget/gonk/nsWindow.cpp:478
#10 0x4122e956 in nsWindow::DispatchInputEvent (aEvent=..., aWasCaptured=0x0) at /home/yangzhao/B2G/B2G6821_test/gecko/widget/gonk/nsWindow.cpp:282
#11 0x4122d67a in sendMouseEvent (this=0xbed230e8) at /home/yangzhao/B2G/B2G6821_test/gecko/widget/gonk/nsAppShell.cpp:175
#12 GeckoInputDispatcher::dispatchOnce (this=0xbed230e8) at /home/yangzhao/B2G/B2G6821_test/gecko/widget/gonk/nsAppShell.cpp:599
#13 0x4122c514 in nsAppShell::ProcessNextNativeEvent (this=0x435554c0, mayWait=<value optimized out>) at /home/yangzhao/B2G/B2G6821_test/gecko/widget/gonk/nsAppShell.cpp:927
#14 0x41235126 in nsBaseAppShell::DoProcessNextNativeEvent (this=0xbed23590, mayWait=false, recursionDepth=0)
at /home/yangzhao/B2G/B2G6821_test/gecko/widget/xpwidgets/nsBaseAppShell.cpp:137
#15 0x412351ec in nsBaseAppShell::OnProcessNextEvent (this=0x435554c0, thr=0x4031e8e0, mayWait=false, recursionDepth=0)
at /home/yangzhao/B2G/B2G6821_test/gecko/widget/xpwidgets/nsBaseAppShell.cpp:283
#16 0x40d4fb7c in nsThread::ProcessNextEvent (this=0x4031e8e0, mayWait=false, result=0xbed237df) at /home/yangzhao/B2G/B2G6821_test/gecko/xpcom/threads/nsThread.cpp:585
#17 0x40d229c0 in NS_ProcessNextEvent (thread=0x435554c0, mayWait=false) at /home/yangzhao/B2G/B2G6821_test/gecko/xpcom/glue/nsThreadUtils.cpp:263
#18 0x40e7b4b0 in mozilla::ipc::MessagePump::Run (this=0x40302d30, aDelegate=0x4034c0c0) at /home/yangzhao/B2G/B2G6821_test/gecko/ipc/glue/MessagePump.cpp:85
#19 0x40e71458 in MessageLoop::RunInternal (this=0x0) at /home/yangzhao/B2G/B2G6821_test/gecko/ipc/chromium/src/base/message_loop.cc:222
#20 0x40e714d6 in MessageLoop::RunHandler (this=0x4034c0c0) at /home/yangzhao/B2G/B2G6821_test/gecko/ipc/chromium/src/base/message_loop.cc:215
#21 MessageLoop::Run (this=0x4034c0c0) at /home/yangzhao/B2G/B2G6821_test/gecko/ipc/chromium/src/base/message_loop.cc:189
#22 0x41234bdc in nsBaseAppShell::Run (this=0x435554c0) at /home/yangzhao/B2G/B2G6821_test/gecko/widget/xpwidgets/nsBaseAppShell.cpp:161
#23 0x41867d60 in nsAppStartup::Run (this=0x43560130) at /home/yangzhao/B2G/B2G6821_test/gecko/toolkit/components/startup/nsAppStartup.cpp:276
#24 0x4183f9e8 in XREMain::XRE_mainRun (this=0xbed23a14) at /home/yangzhao/B2G/B2G6821_test/gecko/toolkit/xre/nsAppRunner.cpp:4059
#25 0x418423ba in XREMain::XRE_main (this=0xbed23a14, argc=<value optimized out>, argv=<value optimized out>, aAppData=0x21170)
at /home/yangzhao/B2G/B2G6821_test/gecko/toolkit/xre/nsAppRunner.cpp:4127
#26 0x41842524 in XRE_main (argc=1, argv=0xbed25bf4, aAppData=0x21170, aFlags=<value optimized out>) at /home/yangzhao/B2G/B2G6821_test/gecko/toolkit/xre/nsAppRunner.cpp:4337
#27 0x000098de in do_main (argc=1, argv=0xbed25bf4) at /home/yangzhao/B2G/B2G6821_test/gecko/b2g/app/nsBrowserApp.cpp:163
#28 main (argc=1, argv=0xbed25bf4) at /home/yangzhao/B2G/B2G6821_test/gecko/b2g/app/nsBrowserApp.cpp:256
Comment 16•11 years ago
|
||
(In reply to Evelyn Hung [:evelyn] from comment #13)
> (In reply to Joe Cheng [:jcheng] from comment #10)
> > Evelyn, can you please update to this bug on what you suggested over email
> > to our partner? Thanks
>
> My suggestion over email aligned with comment 12, I also suspect these two
> bugs. Please try to remove these two patches and try your monkey test again.
Yaoyao will provide test result of remove patch build.
Flags: needinfo?(yaoyao.wu)
Comment 17•11 years ago
|
||
Doesn't seem to be affected by the JavaScript engine, giving a quick glance at the backtrace. Thanks for checking whether tarako bugs are provoked by the JavaScript engine or not before triaging them in this component.
Component: JavaScript Engine → General
Product: Core → Firefox OS
Version: 28 Branch → unspecified
Updated•11 years ago
|
Assignee: nobody → yaoyao.wu
Updated•11 years ago
|
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Updated•11 years ago
|
Component: General → Gaia
Comment 18•11 years ago
|
||
triage: minus for now, please renom if we have further information
blocking-b2g: 1.3T? → -
Comment 19•11 years ago
|
||
(In reply to Evelyn Hung [:evelyn] from comment #13)
> (In reply to Joe Cheng [:jcheng] from comment #10)
> > Evelyn, can you please update to this bug on what you suggested over email
> > to our partner? Thanks
>
> My suggestion over email aligned with comment 12, I also suspect these two
> bugs. Please try to remove these two patches and try your monkey test again.
hi,Evelyn
I reverted these two bugs:
Bug 984238 - Don't kill the keyboard when when it's being launched +shepherd
Bug 959966: prevent the preallocated process from being killed by the lowmem killer by rasing its
It seems ok after running over a night monkey test.I'll test it once again.
Flags: needinfo?(ehung)
Comment 20•11 years ago
|
||
Our QA has 60% reproduce rate, please check these revert commit.
Bug 984238 - Don't kill the keyboard when when it's being launched +shepherd
Bug 959966: prevent the preallocated process from being killed by the lowmem killer by rasing its
blocking-b2g: - → 1.3T?
Comment 21•11 years ago
|
||
Need Gaia help, we still meet this issue when monkey test.
We need connect nightly to debug this issue.
But currently there is a prompt window when nightly connect!!!
Flags: needinfo?(ttsai)
Flags: needinfo?(styang)
Flags: needinfo?(kkuo)
Updated•11 years ago
|
Flags: needinfo?(gal)
Comment 23•11 years ago
|
||
Hi! Tim, Cervantes,
According to partner, the monkey test can go smoothly with reverting following patches.
Bug 984238 - Don't kill the keyboard when when it's being launched +shepherd
Bug 959966: prevent the preallocated process from being killed by the lowmem killer by rasing its
Need your help to check those patches again. Thanks
--
Keven
Flags: needinfo?(timdream)
Flags: needinfo?(kkuo)
Flags: needinfo?(cyu)
Comment 24•11 years ago
|
||
Can you please check with patch breaks the release? And then lets re-open and back out the offending patch.
Flags: needinfo?(gal)
Comment 25•11 years ago
|
||
:dwi2, please also monitor this issue in China next week. thanks.
Flags: needinfo?(styang) → needinfo?(tzhuang)
Comment 26•11 years ago
|
||
(In reply to yang.zhao from comment #19)
> (In reply to Evelyn Hung [:evelyn] from comment #13)
> > (In reply to Joe Cheng [:jcheng] from comment #10)
> > > Evelyn, can you please update to this bug on what you suggested over email
> > > to our partner? Thanks
> >
> > My suggestion over email aligned with comment 12, I also suspect these two
> > bugs. Please try to remove these two patches and try your monkey test again.
> hi,Evelyn
> I reverted these two bugs:
> Bug 984238 - Don't kill the keyboard when when it's being launched +shepherd
> Bug 959966: prevent the preallocated process from being killed by the lowmem
> killer by rasing its
>
> It seems ok after running over a night monkey test.I'll test it once
> again.
Today's monkey test is not ok.The phone stays on lockscreen ,and couldn't unlock the phone.There is no response at all.From the nightly,you can see that when you try to unlock the phone,there is no "touched" class added to lockscreen.But in a normal phone,there will add a class named "touched" when you put your finger on the unlock area.
Comment 27•11 years ago
|
||
Tim can you guys look at this?
Comment 28•11 years ago
|
||
One of the phone with the latest version is not ok.You could unlock the phone normally,but the homescreen has no response.From the nightly,I found that there were two "<div id="homescreen"...></div>“,the second one is empty.When I set "disply:none" to the second div,the phone come back to normal.Please see the picture in the attachment.
I think maybe there is something wrong in window_manager control.
Comment 29•11 years ago
|
||
Thanks for the diagnosis.
Comment 30•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #24)
> Can you please check with patch breaks the release? And then lets re-open
> and back out the offending patch.
We can reproduce this issue today without these patch. So we should fix it by Yang's nightly debug information.
Updated•11 years ago
|
Assignee: yaoyao.wu → anygregor
Component: Gaia → Gaia::System::Window Mgmt
Comment 31•11 years ago
|
||
Alive may know what happened in comment 28.
Flags: needinfo?(ehung) → needinfo?(alive)
Comment 32•11 years ago
|
||
I remember we workaround a similar issue on v1.1hd before. Window manager before v1.4 is not robust enough to deal with randomly creating/switching windows by monkey test. As I remember, workaround in v1.1hd was - we force to clean up those windows which are in an unknown states in window manager while every visibility change (window transition happens).
Luke can find the patch we had on v1.1hd .
Flags: needinfo?(lchang)
Comment 33•11 years ago
|
||
The patch in bug 959966 is just priority changes to the preallocated process to make it less likely to fall victim to the lowmem killer. This should not have impact to functionality.
Flags: needinfo?(cyu)
Comment 34•11 years ago
|
||
(In reply to yang.zhao from comment #28)
> Created attachment 8399055 [details]
> two div with the id= homescreen
>
> One of the phone with the latest version is not ok.You could unlock the
> phone normally,but the homescreen has no response.From the nightly,I found
> that there were two "<div id="homescreen"...></div>“,the second one is
> empty.When I set "disply:none" to the second div,the phone come back to
> normal.Please see the picture in the attachment.
> I think maybe there is something wrong in window_manager control.
I don't know how to do so called monkey test, but you could do some debugging here:
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L149
The view is injected by render() and it's called by
1. constructor
2. ensure
3. restart
Since one of the #homescreen div is empty I guess render is called twice and kill()[https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L142] doesn't happen for some reason.
Flags: needinfo?(alive)
Comment 36•11 years ago
|
||
(In reply to Evelyn Hung [:evelyn] from comment #32)
> I remember we workaround a similar issue on v1.1hd before. Window manager
> before v1.4 is not robust enough to deal with randomly creating/switching
> windows by monkey test. As I remember, workaround in v1.1hd was - we force
> to clean up those windows which are in an unknown states in window manager
> while every visibility change (window transition happens).
>
> Luke can find the patch we had on v1.1hd .
Bug 921829 is what Evelyn mentioned. However, the patch I attached in that bug is still not working for that issue so I'm not sure it's workable in this case.
Flags: needinfo?(lchang)
Comment 37•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #29)
> Thanks for the diagnosis.
One phone with the latest version stays in lockscreen and has no response after running monkey test again.And when I try to unlock the phone ,the "touchstart" event hasn't been triggered and handled by handleEvent in lockscreen.js ,but you can see the touchevent does received from 'adb shell getevent -l'.So maybe something wrong in gecko and cause the event couldn't be passed to gaia,I guess.
Comment 38•11 years ago
|
||
triage: we should not block before we have more actionable information.
let's not block on this before we have more information. thanks
blocking-b2g: 1.3T? → -
Comment 39•11 years ago
|
||
(In reply to yang.zhao from comment #37)
> (In reply to Andreas Gal :gal from comment #29)
> > Thanks for the diagnosis.
> One phone with the latest version stays in lockscreen and has no response
> after running monkey test again.And when I try to unlock the phone ,the
> "touchstart" event hasn't been triggered and handled by handleEvent in
> lockscreen.js ,but you can see the touchevent does received from 'adb shell
> getevent -l'.So maybe something wrong in gecko and cause the event couldn't
> be passed to gaia,I guess.
Reproduced again.From the nightly,I can see the homescreen div is recreated all the time."watch adb shell b2g-ps",you can see the homescreen process restarted again and again.
The b2g-info is as follows:
| megabytes |
NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER
b2g 776 1 4895.8 0 31.4 33.4 37.0 160.1 0 root
Communications 4919 1 619.5 0 0.6 1.3 3.1 82.9 1 app_4919
(Nuwa) 15409 776 6.6 0 0.8 3.4 7.5 50.5 0 root
(Preallocated a 17056 15409 0.5 18 4.5 7.2 11.4 55.5 2 root
System memory info:
Total 98.5 MB
Used - cache 71.9 MB
B2G procs (PSS) 45.2 MB
Non-B2G procs 26.7 MB
Free + cache 26.6 MB
Free 8.9 MB
Cache 17.7 MB
Low-memory killer parameters:
notify_trigger 14336 KB
oom_adj min_free
0 1024 KB
1 2048 KB
2 4096 KB
6 18432 KB
8 18432 KB
10 24576 KB
Comment 40•11 years ago
|
||
Does anyone can tell me why the homescreen couldn't be created sucessfully,and why the process restarted again and again?
Flags: needinfo?(james.zhang)
Flags: needinfo?(alive)
Reporter | ||
Comment 41•11 years ago
|
||
Run monkey test with latest version,and the phone has no response after about 1 hour.
The b2g info:
APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
b2g root 691 1 151776 38720 ffffffff 40033518 S /system/b2g/b2g
(Nuwa) root 759 691 51688 752 ffffffff 4010c518 S /system/b2g/plugin-container
Homescreen app_4519 4519 759 69556 12100 ffffffff 4010c518 S /system/b2g/plugin-container
Usage app_5106 5106 759 64176 16920 ffffffff 4010c518 S /system/b2g/plugin-container
b2g:
#0 js::VisitGrayWrapperTargets (zone=0x1, callback=0x466368c0, closure=0x421e8d10)
at /home/bingjiechen/B2G6821/gecko/js/src/jsfriendapi.cpp:662
#1 0x423c32ac in mozilla::dom::UIEventBinding::InterfaceObjectClass ()
from /home3/bingjiechen/B2G6821/objdir-gecko/dist/bin/libxul.so
#2 0x423c32ac in mozilla::dom::UIEventBinding::InterfaceObjectClass ()
from /home3/bingjiechen/B2G6821/objdir-gecko/dist/bin/libxul.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Homescreen:
#0 epoll_wait () at bionic/libc/arch-arm/syscalls/epoll_wait.S:10
#1 0x40096fd4 in __wrap_epoll_wait (epfd=<value optimized out>, events=<value optimized out>, maxevents=<value optimized out>,
timeout=<value optimized out>) at /home/bingjiechen/B2G6821/gecko/mozglue/build/Nuwa.cpp:962
#2 0x40c3dc2c in virtual thunk to GeckoInputReaderPolicy::~GeckoInputReaderPolicy() ()
at /home/bingjiechen/B2G6821/gecko/widget/gonk/nsAppShell.cpp:967
#3 0x00000000 in ?? ()
Flags: needinfo?(yaoyao.wu)
Comment 42•11 years ago
|
||
Do you have the phone still in this state? Could I come over and take a look?
Comment 43•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #42)
> Do you have the phone still in this state? Could I come over and take a look?
Yes,the phone still in this state.You can come to take a look.Thank you!
Comment 44•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #42)
> Do you have the phone still in this state? Could I come over and take a look?
I'm in Shanghai now,and my seat is near James.I'll keep the phone till you come here.
Comment 45•11 years ago
|
||
Actually I won't make it to you in time and to my next meeting so we will have to do this over bugzilla. Can you please post a full stack, then continue, and then break again, and post a stack again? I am trying to see whether Gecko is hanging in VisitGrayWrapperTargets or whether it was a coincidence that we broke in there.
Comment 46•11 years ago
|
||
Could we have the adb logcat and dmesg of the phone that is restarting the homescreen repeatedly?
Comment 47•11 years ago
|
||
Comment 48•11 years ago
|
||
Comment 49•11 years ago
|
||
Hi! Cervantes,
Both logcat and dmesg are captured under situation of restarting homescreen repeatly.
Nuwa is still alive.
--
Keven
Comment 50•11 years ago
|
||
It seems we have two bugs here. A homescreen restart and a homescreen hang. Please split the bug in 2 and assign people to investigate both cases.
Comment 51•11 years ago
|
||
We can't not reproduce the homescreen keep restarting issue.
From log, memory is not hit OOM criteria, so it's not because Homescreen is killed and system app's trying to restart it. Cervantes is helping on this restarting issue.
At the same time, we may be able to reproduce two homescreen <div> case after our monkey test, looking into detail now.
Comment 52•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #50)
> It seems we have two bugs here. A homescreen restart and a homescreen hang.
> Please split the bug in 2 and assign people to investigate both cases.
agree, I will file a bug for two homescreen div, and leave this one for homescreen restarting issue.
Updated•11 years ago
|
blocking-b2g: - → 1.3T?
Comment 53•11 years ago
|
||
(In reply to Evelyn Hung [:evelyn] from comment #52)
> (In reply to Andreas Gal :gal from comment #50)
> > It seems we have two bugs here. A homescreen restart and a homescreen hang.
> > Please split the bug in 2 and assign people to investigate both cases.
>
> agree, I will file a bug for two homescreen div, and leave this one for
> homescreen restarting issue.
bug 990365 filed.
Comment 54•11 years ago
|
||
May I know where to get your monkey test build? I found our QA doesn't have this kind of build, so it's hard to reproduce these issues.
Flags: needinfo?(yang.zhao)
Comment 55•11 years ago
|
||
(In reply to Evelyn Hung [:evelyn] from comment #54)
> May I know where to get your monkey test build? I found our QA doesn't have
> this kind of build, so it's hard to reproduce these issues.
I build this in my local machine so that I could use gdb.I updated the code yesterday.
Flags: needinfo?(yang.zhao)
Comment 56•11 years ago
|
||
(In reply to yang.zhao from comment #55)
> (In reply to Evelyn Hung [:evelyn] from comment #54)
> > May I know where to get your monkey test build? I found our QA doesn't have
> > this kind of build, so it's hard to reproduce these issues.
>
> I build this in my local machine so that I could use gdb.I updated the code
> yesterday.
And run "./_monkey.sh test-config/6821-hudson-config test" under folder monkey.test ,the time is 12 hours,so I run it in the night.
Comment 57•11 years ago
|
||
(In reply to yang.zhao from comment #56)
> (In reply to yang.zhao from comment #55)
> > (In reply to Evelyn Hung [:evelyn] from comment #54)
> > > May I know where to get your monkey test build? I found our QA doesn't have
> > > this kind of build, so it's hard to reproduce these issues.
> >
> > I build this in my local machine so that I could use gdb.I updated the code
> > yesterday.
>
> And run "./_monkey.sh test-config/6821-hudson-config test" under folder
> monkey.test ,the time is 12 hours,so I run it in the night.
Thanks, we are trying to make a monkey build which all settings (remote debug + DEVICE_DEBUG=1) are enabled. So once we can reproduce the issue, we are able to look into details.
Comment 58•11 years ago
|
||
From your dmesg log:
cervantes@cyu-dt:/tmp$ grep Homescreen dmesg.log
(many lines removed for brevity)
<4>0[48789.036494] select 30361 (Homescreen), adj 8, size 4744, to kill
<4>0[48789.036570] send sigkill to 30361 (Homescreen), adj 8, size 4744
<4>0[48806.850460] select 30383 (Homescreen), adj 8, size 5433, to kill
<4>0[48806.850525] send sigkill to 30383 (Homescreen), adj 8, size 5433
<4>0[48813.106587] select 30422 (Homescreen), adj 8, size 6172, to kill
<4>0[48813.106658] send sigkill to 30422 (Homescreen), adj 8, size 6172
<4>0[48837.716777] select 30457 (Homescreen), adj 8, size 5227, to kill
<4>0[48837.716849] send sigkill to 30457 (Homescreen), adj 8, size 5227
<4>0[48845.512687] select 30495 (Homescreen), adj 8, size 5988, to kill
<4>0[48845.512755] send sigkill to 30495 (Homescreen), adj 8, size 5988
<6>0[48883.209845] [30536] 40536 30536 17528 4285 0 2 134 Homescreen
<4>0[49190.955490] select 30536 (Homescreen), adj 8, size 5376, to kill
<4>0[49190.955558] send sigkill to 30536 (Homescreen), adj 8, size 5376
<4>0[49220.400303] select 30724 (Homescreen), adj 8, size 5692, to kill
<4>0[49220.400375] send sigkill to 30724 (Homescreen), adj 8, size 5692
From logcat:
04-01 10:41:23.270 776 776 I Gonk : Setting nice for pid 30361 to 18
04-01 10:41:23.270 776 776 I Gonk : Changed nice for pid 30361 from 0 to 18.
04-01 10:41:23.280 776 776 D Sensors : activate handle=3; drv=1
04-01 10:41:23.290 776 776 D Sensors : SensorTMD2771::setEnable=0= en=0; newState=0; what=0; mEnabled=1
04-01 10:41:23.290 776 776 D Sensors : taos ---Light::TAOS_IOCTL_ALS_OFF=== err=0
04-01 10:41:23.290 776 776 D Sensors : SensorTMD2771::setEnable=1= en=0; newState=0; what=0; mEnabled=0
04-01 10:41:23.380 776 776 I GeckoDump: XXX FIXME : Got a mozContentEvent: nfc-hardware-state-change
04-01 10:41:23.410 776 776 I power : *** set_screen_state 1
04-01 10:41:23.420 776 776 D Sensors : activate handle=3; drv=1
04-01 10:41:23.420 776 776 D Sensors : SensorTMD2771::setEnable=0= en=1; newState=1; what=0; mEnabled=0
04-01 10:41:23.530 776 776 D Sensors : taos ---Light::TAOS_IOCTL_ALS_ON=== err=0
04-01 10:41:23.530 776 776 D Sensors : SensorTMD2771::setEnable=1= en=1; newState=1; what=0; mEnabled=1
04-01 10:41:23.530 776 776 D Sensors : SensorTMD2771::getEnable===mEnabled=0x1; enable=1
04-01 10:41:23.620 776 776 I GeckoDump: XXX FIXME : Got a mozContentEvent: nfc-hardware-state-change
04-01 10:41:23.950 776 776 I Gonk : Setting nice for pid 30361 to 18
04-01 10:41:23.950 776 776 I Gonk : Changed nice for pid 30361 from 18 to 18.
04-01 10:41:23.950 776 776 I Gecko : [Parent 776] WARNING: waitpid failed pid:30361 errno:10: file /home/yangzhao/B2G/B2G6821_test/gecko/ipc/chromium/src/base/process_util_posix.cc, line 254
04-01 10:41:23.950 776 776 I power : *** set_screen_state 0
04-01 10:41:24.000 776 816 D Sensors : SensorTMD2771::readEvents()======data_fd=52,count=16
04-01 10:41:24.000 776 816 D Sensors : SensorTMD2771::processEvent()=====code=40::value=97
04-01 10:41:24.010 776 816 D Sensors : SensorTMD2771: mPendingEvents[Light].light = 97.000000
04-01 10:41:24.440 776 776 E [Gralloc-ERROR]: int gralloc_register_buffer(const gralloc_module_t*, const native_handle_t*):79 Unable to register handle 0x4a3f8d30 coming from different process: 776
04-01 10:41:24.440 776 776 E [Gralloc-ERROR]: int gralloc_register_buffer(const gralloc_module_t*, const native_handle_t*):79 Unable to register handle 0x4a3f8dd0 coming from different process: 776
04-01 10:41:25.290 30361 30361 I Gecko : ###################################### forms.js loaded
04-01 10:41:25.310 30361 30361 I Gecko : ############################### browserElementPanning.js loaded
04-01 10:41:25.330 30361 30361 I Gecko : ######################## BrowserElementChildPreload.js loaded
04-01 10:41:25.640 776 776 I Gonk : Setting nice for pid 30361 to 18
04-01 10:41:25.640 776 776 I Gonk : Changed nice for pid 30361 from 18 to 18.
04-01 10:41:25.850 776 776 I Gecko :
04-01 10:41:25.850 776 776 I Gecko : ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
04-01 10:41:25.850 776 776 I Gecko :
04-01 10:41:25.960 776 794 I Gecko : [Parent 776] WARNING: waitpid failed pid:30361 errno:10: file /home/yangzhao/B2G/B2G6821_test/gecko/ipc/chromium/src/base/process_util_posix.cc, line 254
The PIDs for the killed homescreen app are very close. We see the priority change from 0 to 18 and stays in 18 then the process got killed. So this means we restarted the homescreen app in BACKGROUND_HOMESCREEN priority and it got killed repeatedly. We need to check why the homescreen is restarted when it's in background.
Flags: needinfo?(cyu)
Comment 59•11 years ago
|
||
(In reply to Cervantes Yu from comment #58)
> We need to check why the homescreen is restarted
> when it's in background.
In case we could reproduce, check the callstack from these two functions:
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L116
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L173
Flags: needinfo?(alive)
Comment 60•11 years ago
|
||
(In reply to Alive Kuo [:alive][NEEDINFO!][God bless Taiwan.] from comment #59)
> https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/
> homescreen_window.js#L116
> https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/
> homescreen_window.js#L173
Before we can reproduce the issue here, Tzu-lin can help provide a patch to print some logs or trace stack via remote debugger.
@Tzu-lin, please update your finding.
Comment 61•11 years ago
|
||
Yang Zhao showed me the phone which homescreen continuously restart. I set two break points by app-manager at
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L116
and
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L173
I found out that HomescreenWindow.prototype.ensure() is kept invoked repeatedly.
So I made a patch (https://github.com/dwi2/gaia/commit/a4155c43fc2f6b65c4f27d9b5ebeea9bddf66e5a) to print out caller of ensure() and try to get more detail.
Partners are working on running monkey test against the patch now.
Flags: needinfo?(tzhuang)
Comment 62•11 years ago
|
||
the print log patch in comment 61 should be this link https://github.com/dwi2/gaia/commits/bug988110_printlog
Comment 63•11 years ago
|
||
(In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #62)
> the print log patch in comment 61 should be this link
> https://github.com/dwi2/gaia/commits/bug988110_printlog
reproduce again with your first patch.The following logs appeared repeatedlly.
E/GeckoConsole( 644): Content JS LOG at app://system.gaiamobile.org/js/homescreen_window.js:178 in hw_ensure: THIS.ELEMENT ID = homescreen
E/GeckoConsole( 644): Content JS WARN at app://system.gaiamobile.org/js/window.js:700 in aw_setOrientation: screen.mozLockOrientation() returned false for app://homescreen.gaiamobile.org orientation default
E/GeckoConsole( 644): Content JS LOG at app://system.gaiamobile.org/js/homescreen_window.js:123 in hw_restart: CALLER IS function (evt) {
E/GeckoConsole( 644): if (evt.detail.type == 'fatal') {
E/GeckoConsole( 644): evt.stopImmediatePropagation();
E/GeckoConsole( 644): self.restart();
E/GeckoConsole( 644): }
E/GeckoConsole( 644): }
E/GeckoConsole( 644): Content JS LOG at app://system.gaiamobile.org/js/homescreen_window.js:175 in hw_ensure: CALLER IS function (next, evt) {
E/GeckoConsole( 644): this.ensure();
E/GeckoConsole( 644): if (!AttentionScreen.isFullyVisible())
E/GeckoConsole( 644): this.setVisible(true);
E/GeckoConsole( 644): this.resetTransition();
E/GeckoConsole( 644): this.setOrientation();
E/GeckoConsole( 644): this.resize();
E/GeckoConsole( 644): }
Comment 64•11 years ago
|
||
(In reply to yang.zhao from comment #63)
> (In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #62)
> > the print log patch in comment 61 should be this link
> > https://github.com/dwi2/gaia/commits/bug988110_printlog
> reproduce again with your first patch.The following logs appeared
> repeatedlly.
>
>
>
> E/GeckoConsole( 644): Content JS LOG at
> app://system.gaiamobile.org/js/homescreen_window.js:178 in hw_ensure:
> THIS.ELEMENT ID = homescreen
> E/GeckoConsole( 644): Content JS WARN at
> app://system.gaiamobile.org/js/window.js:700 in aw_setOrientation:
> screen.mozLockOrientation() returned false for
> app://homescreen.gaiamobile.org orientation default
> E/GeckoConsole( 644): Content JS LOG at
> app://system.gaiamobile.org/js/homescreen_window.js:123 in hw_restart:
> CALLER IS function (evt) {
this is event handler of 'mozbrowsererror'
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L77
Which means homescreen app is crashed or killed.
> E/GeckoConsole( 644): if (evt.detail.type == 'fatal') {
> E/GeckoConsole( 644): evt.stopImmediatePropagation();
> E/GeckoConsole( 644): self.restart();
> E/GeckoConsole( 644): }
> E/GeckoConsole( 644): }
> E/GeckoConsole( 644): Content JS LOG at
> app://system.gaiamobile.org/js/homescreen_window.js:175 in hw_ensure: CALLER
> IS function (next, evt) {
This is one of the callbacks of transition state change
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L183
I think ensure() is the place where homescreen really restart. But I am still try to get to know the logic of these transition state change callbacks. Anyone has idea?
> E/GeckoConsole( 644): this.ensure();
> E/GeckoConsole( 644): if (!AttentionScreen.isFullyVisible())
> E/GeckoConsole( 644): this.setVisible(true);
> E/GeckoConsole( 644): this.resetTransition();
> E/GeckoConsole( 644): this.setOrientation();
> E/GeckoConsole( 644): this.resize();
> E/GeckoConsole( 644): }
I'll embed more log to see if homescreen app is in foreground or not when restar() is called after crash.
Comment 65•11 years ago
|
||
(In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #64)
> (In reply to yang.zhao from comment #63)
> > (In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #62)
> > > the print log patch in comment 61 should be this link
> > > https://github.com/dwi2/gaia/commits/bug988110_printlog
> > reproduce again with your first patch.The following logs appeared
> > repeatedlly.
> >
> >
> >
> > E/GeckoConsole( 644): Content JS LOG at
> > app://system.gaiamobile.org/js/homescreen_window.js:178 in hw_ensure:
> > THIS.ELEMENT ID = homescreen
> > E/GeckoConsole( 644): Content JS WARN at
> > app://system.gaiamobile.org/js/window.js:700 in aw_setOrientation:
> > screen.mozLockOrientation() returned false for
> > app://homescreen.gaiamobile.org orientation default
> > E/GeckoConsole( 644): Content JS LOG at
> > app://system.gaiamobile.org/js/homescreen_window.js:123 in hw_restart:
> > CALLER IS function (evt) {
>
> this is event handler of 'mozbrowsererror'
> https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/
> homescreen_window.js#L77
> Which means homescreen app is crashed or killed.
>
> > E/GeckoConsole( 644): if (evt.detail.type == 'fatal') {
> > E/GeckoConsole( 644): evt.stopImmediatePropagation();
> > E/GeckoConsole( 644): self.restart();
> > E/GeckoConsole( 644): }
> > E/GeckoConsole( 644): }
> > E/GeckoConsole( 644): Content JS LOG at
> > app://system.gaiamobile.org/js/homescreen_window.js:175 in hw_ensure: CALLER
> > IS function (next, evt) {
> This is one of the callbacks of transition state change
> https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/
> homescreen_window.js#L183
> I think ensure() is the place where homescreen really restart. But I am
> still try to get to know the logic of these transition state change
> callbacks. Anyone has idea?
>
Alive, any thought?
Flags: needinfo?(alive)
Comment 66•11 years ago
|
||
1.3T+, partner priority bug
blocking-b2g: 1.3T? → 1.3T+
Whiteboard: [priority]
Comment 67•11 years ago
|
||
The state machine is similar to v1.4:
http://alivedise.github.io/gaia-system-jsdoc/AppTransitionController.html
When the state is from closed to opening, _leave_closed function would be invoked.
It seems the homescreen is repeatedly killed 'at foreground' so homescreen_window would try to relaunch it again right away. But that shouldn't cause the problem if this.element is already there (https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L174).
I wonder that if we hit some bad LMK issue:
A mozbrowser iframe is inited as foreground state and if it's killed right away by some reason we will see homescreen is launched repeatedly.
The possible solution maybe to call ensure() only when home button is pressed. But I am not sure there's side effect or not. Maybe the user will see homescreen empty.
Flags: needinfo?(alive)
Comment 68•11 years ago
|
||
After discussion with :cyu and :eve,
I think homescreen_window mistakenly think homescreen is foreground because the decision is
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L126
and we will assume homescreen is launched as foreground:
https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L150
So I guess that:
0. Homescreen is opened(active, foreground)
1. Incoming call
2. After 3sec, Homescreen is sent to background(https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/window_manager.js#L1118) but still has 'active' class.
3. Homescreen is killed by LMK.(https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L78)
4. Homescreen is restarted since it's having active class.(https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L125)
5. Homescreen is killed right away. (gecko issue)
Repeated.
The possible solution is:
(1) In AppWindow.prototype.setVisible, add |this._gaiaVisibilityState = true/false| after setting.
(2) Check this._gaiaVisibilityState in restart function.
It's because |this._visibilityState| is changed by asynchronous event callback(https://github.com/mozilla-b2g/gaia/blob/v1.3t/apps/system/js/homescreen_window.js#L87) and gecko may kill the app before it sends the event so it's not reliable.
Also we in gaia assume a new mozbrowser iframe is foreground. If there's no setVisible call we won't get visibilitychange event.
Comment 69•11 years ago
|
||
https://github.com/mozilla-b2g/gaia/pull/17868 patch per comment 68, TzuLin, please check if this helps.
Comment 70•11 years ago
|
||
Hi Alive,
Thanks, I'll try this patch on partner side monkey test.
Comment 71•11 years ago
|
||
It is still happen after applying https://github.com/mozilla-b2g/gaia/pull/17868
Also I saw that b2g process and comms app consume lots of CPU
root@android:/ # b2g-info
| megabytes |
NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER
b2g 1343 1 786.0 0 26.4 28.9 33.6 143.4 0 root
(Nuwa) 1395 1343 18.2 0 0.1 0.2 0.5 50.5 0 root
Communications 2359 1395 81.7 0 0.8 1.4 3.3 84.1 1 app_2359
Homescreen 19317 1395 6.2 18 8.3 11.2 16.4 65.0 2 app_19317
(Preallocated a 19359 1395 2.7 18 1.7 3.6 7.8 57.5 1 root
System memory info:
Total 98.5 MB
Used - cache 72.3 MB
B2G procs (PSS) 45.3 MB
Non-B2G procs 27.0 MB
Free + cache 26.2 MB
Free 6.8 MB
Cache 19.4 MB
Low-memory killer parameters:
notify_trigger 14336 KB
oom_adj min_free
0 1024 KB
1 2048 KB
2 4096 KB
6 18432 KB
8 18432 KB
10 24576 KB
Comment 72•11 years ago
|
||
the log attached is generate with the patch https://github.com/dwi2/gaia/compare/536870d239...4cea602cf9
There is something weird that:
in hw_kill, before we remove homescreen iframe, 'this.element' did hold the iframe (see line 2), however, 'this.containerElement' did not have iframe (this.element) inside of it (see line 3 to line 16).
Comment 73•11 years ago
|
||
Comment 74•11 years ago
|
||
another weird part is that:
in hw_render (see line 48 of attachment 8401061 [details]), after inject <div id="homescreen" ...> into 'this.containerElement', there is still no homescreen div in it.
Comment 75•11 years ago
|
||
We can still meet this issue after apply alive's patch!
Flags: needinfo?(james.zhang)
Comment 76•11 years ago
|
||
(In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #74)
> another weird part is that:
> in hw_render (see line 48 of attachment 8401061 [details]), after inject
> <div id="homescreen" ...> into 'this.containerElement', there is still no
> homescreen div in it.
Tzu-lin's printing log patch here: https://github.com/dwi2/gaia/commit/17cc1ba7bc784b5f5658334606df5e01ffe75767
Comment 77•11 years ago
|
||
I am looking at this, will try to produce another patch (& patch for log) today.
Comment 78•11 years ago
|
||
https://github.com/timdream/gaia/compare/mozilla-b2g:v1.3t...v1.3t-crazy-monkey
Please try this. We suspect home screen is mistakenly relaunched when attention screen is being animated. There is an additional patch on top of existing logging commits and tentitive fix for it.
Updated•11 years ago
|
Severity: major → blocker
Priority: -- → P1
Updated•11 years ago
|
Attachment #8401061 -
Attachment mime type: text/x-log → text/plain
Comment 79•11 years ago
|
||
Comment 80•11 years ago
|
||
hi,all
I found another interesting phenomenon.
1)launch an app,such as contacts,keep it foreground.Kill homescreen by adb shell kill PID
2)Start homescreen by clicking "START" from the nightly.
3)You'll see picture1 in the attachment
4 [review])Press Home button,you'll see picture2 in the attachment
I don't know whether it's helpfull,just afford if for your information.
Comment 81•11 years ago
|
||
(In reply to yang.zhao from comment #80)
> 2)Start homescreen by clicking "START" from the nightly.
What you did is expected because you are launching Homescreen as a usual app, so it shows up in the cards view. It's not related to this bug.
Comment 82•11 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #78)
> https://github.com/timdream/gaia/compare/mozilla-b2g:v1.3t...v1.3t-crazy-
> monkey
>
> Please try this. We suspect home screen is mistakenly relaunched when
> attention screen is being animated. There is an additional patch on top of
> existing logging commits and tentitive fix for it.
I just push a new commit to the branch to count the # of restarts.
I think even if we fix the window management state management part of this issue, we still need to figure out why Gecko doesn't allow us to launch app repetitively within a short time. Gecko should have a better behavior for Gaia System to depend on.
Flags: needinfo?(tzhuang)
Comment 83•11 years ago
|
||
Hi Tim,
Thanks, I've ask partner to use the latest patch that you provided to run monkey test. Also I slightly modify some of log printing to avoid truncating. You can see them here https://github.com/dwi2/gaia/compare/mozilla-b2g:v1.3t...bug988110_tims_patch (I modify it on my commit only)
Flags: needinfo?(tzhuang)
Updated•11 years ago
|
Flags: needinfo?(james.zhang)
Comment 84•11 years ago
|
||
I have started new build with your latest patch.
Flags: needinfo?(james.zhang)
Comment 85•11 years ago
|
||
What's the log for the last run? Even without the new logging commits, it would be still nice if we know what was going on on that build.
Comment 86•11 years ago
|
||
Hi Tim,
I've attach latest logs. Partner runs monkey test with the patch you provide last night.
You can find gaia output in logcat.log in the attachement. I also get dmesg and b2g-info for investigation.
The screen still hangs during monkey test. However, there is no repetitive homescreen restart, because homescreen app is no longer existed (see b2g-info), and there is no homescreen div in system. (see app-manager-screenshot.png)
Comment 87•11 years ago
|
||
Here is a video showing the screen hanged
Comment 88•11 years ago
|
||
NI Tim and Alive.
Since you are discussing with Tzu-Lin.
Tzu-Lin mentioned homescreen recovered after 1 min.
--
Keven
Flags: needinfo?(timdream)
Flags: needinfo?(alive)
Comment 89•11 years ago
|
||
Here are b2g-info, logcat, dmesg output when I call "LockScreen.unlock()" in app-manager on the screen-hanged device. Also the DOM structure under #screen > #windows is in the file name DOM_screen_windows.
Homescreen did recover in 40 seconds after I call LockScreen.unlock.
Comment 90•11 years ago
|
||
Here is the video show what happened in comment 89
Comment 91•11 years ago
|
||
(In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #90)
> Created attachment 8401721 [details]
> FORCE_UNLOCK.MOV
>
> Here is the video show what happened in comment 89
I saw the bug with the phone from tzhuang.
We can unlock the screen ,enter into homscreen ,lunch app by app-manager, and the app seemed run ok.
But the touch event seemed lost from that time, all touch operation were not responsed.
Comment 92•11 years ago
|
||
I think we have solved the window management issue here. From comment 88, we need to find out why lock screen does not respond to touches. Do we log hardware touches in dmesg and console? Do we see them? If so we need to add logs to lockscreen touch event listeners and run it again.
Besides that, from comment 90, we have serious memory issue. The phone evidently runs slower as the test progresses. We need platform engineers to solve that, which IMHO is the real issue.
Thanks Tzu-Lin!
Comment 93•11 years ago
|
||
(In reply to ying.xu from comment #91)
> I saw the bug with the phone from tzhuang.
> We can unlock the screen ,enter into homscreen ,lunch app by app-manager,
> and the app seemed run ok.
> But the touch event seemed lost from that time, all touch operation were not
> responsed.
There might be touch event focusing issue here. Maybe touch events is being sent to the wrong frame?
(This happens before)
Flags: needinfo?(timdream)
Comment 94•11 years ago
|
||
Update on comment 92. Tzu-lin just said we can still receive touches with |adb shell getevent|.
So we need to look at Gecko and maybe Gaia now.
Comment 95•11 years ago
|
||
I sent the patch https://github.com/dwi2/gaia/compare/mozilla-b2g:v1.3t...bug988110_tims_patch to partner to run another round of monkey test.
I'd added debug log at touch event handler in lockscreen, homescreen and utility tray. But I think I still need all kind of methods to debug possible touch event problem no matter what layer it is.
Comment 96•11 years ago
|
||
(In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #95)
> I sent the patch
> https://github.com/dwi2/gaia/compare/mozilla-b2g:v1.3t...
> bug988110_tims_patch to partner to run another round of monkey test.
> I'd added debug log at touch event handler in lockscreen, homescreen and
> utility tray. But I think I still need all kind of methods to debug possible
> touch event problem no matter what layer it is.
With your patch, we reproduce this issue.
We can't find you gecko log and can find getevent log. Is it gecko issue ?
Please see modification and revert gecko commit.
diff --git a/dom/ipc/TabChild.cpp b/dom/ipc/TabChild.cpp
index 5e20e9f..e5fccd0 100644
--- a/dom/ipc/TabChild.cpp
+++ b/dom/ipc/TabChild.cpp
@@ -72,7 +72,9 @@
#include "nsILoadContext.h"
#include "ipc/nsGUIEventIPC.h"
#include "UnitTransforms.h"
-
+#include <android/log.h>
+using namespace android;
+#define LLOG(args...) __android_log_print(ANDROID_LOG_INFO, "tzhuang ==>" , ## args)
#ifdef DEBUG
#include "PCOMContentPermissionRequestChild.h"
#endif /* DEBUG */
@@ -1890,6 +1892,7 @@ bool
TabChild::RecvRealTouchEvent(const WidgetTouchEvent& aEvent,
const ScrollableLayerGuid& aGuid)
{
+ LLOG("Touch Receive: %d", aEvent.message);
WidgetTouchEvent localEvent(aEvent);
for (size_t i = 0; i < localEvent.touches.Length(); i++) {
aEvent.touches[i]->mRefPoint = APZCCallbackHelper::ApplyCallbackTransform(aEvent.touches[i]->mRefPoint, aGuid, mWidget->GetDefaultScale());
diff --git a/dom/ipc/TabParent.cpp b/dom/ipc/TabParent.cpp
index 2ff44b6..d0ee96c 100644
--- a/dom/ipc/TabParent.cpp
+++ b/dom/ipc/TabParent.cpp
@@ -58,7 +58,9 @@
#include "TabChild.h"
#include "nsNetCID.h"
#include <algorithm>
-
+#include <android/log.h>
+using namespace android;
+#define LLOG(args...) __android_log_print(ANDROID_LOG_INFO, "tzhuang ==>" , ## args)
using namespace mozilla::dom;
using namespace mozilla::ipc;
using namespace mozilla::layers;
@@ -786,6 +788,7 @@ bool TabParent::SendRealKeyEvent(WidgetKeyboardEvent& event)
bool TabParent::SendRealTouchEvent(WidgetTouchEvent& event)
{
+ LLOG("Touch Send: %d", event.message);
if (mIsDestroyed) {
return false;
}
Comment 97•11 years ago
|
||
Hi! James,
Is this still due to lack of ZRAM?
Will Homescreen recover in 40 seconds after call LockScreen.unlock?
--
Keven
Comment 98•11 years ago
|
||
Keven,
Homescreen recovers in 40 seconds after I call LockScreen.unlock() through app-manager.
dmesg, b2g-info and logcat are in the attachement. This monkey test runs with the patch https://github.com/dwi2/gaia/compare/mozilla-b2g:v1.3t...bug988110_tims_patch and the patch James posted on comment 96
Comment 99•11 years ago
|
||
Alive, let's clone a bug and have your window management patch lands on 1.3t/master with tests first.
People can continuing working on touch event issue here.
Component: Gaia::System::Window Mgmt → General
Updated•11 years ago
|
Component: General → IPC
Product: Firefox OS → Core
Version: unspecified → 28 Branch
Updated•11 years ago
|
Whiteboard: [priority] → [priority][MP_Blocker]
Comment 100•11 years ago
|
||
We did another round of monkey test at partner's office. Here's the log. This time we add debug log of touch event handling in both gecko and gaia (please refer to patches/ in attachment)
After several hour of monkey test. the screen hanged. (Lockscreen still could be unlocked with app-manager). |adb shell getevent| shows that we did receive touch events. But there seems no touch events passed in gecko and gaia.
Updated•11 years ago
|
Flags: needinfo?(alive)
Comment 101•11 years ago
|
||
So Viral is working on investigating why touch event didn't pass to Gecko.
Flags: needinfo?(vwang)
Updated•11 years ago
|
Whiteboard: [priority][MP_Blocker] → [priority]
Assignee | ||
Comment 102•11 years ago
|
||
(In reply to Tzu-Lin Huang [:dwi2][:tzhuang] from comment #100)
> Created attachment 8402442 [details]
> with_gecko_touch_log_2.zip
>
> We did another round of monkey test at partner's office. Here's the log.
> This time we add debug log of touch event handling in both gecko and gaia
> (please refer to patches/ in attachment)
>
> After several hour of monkey test. the screen hanged. (Lockscreen still
> could be unlocked with app-manager). |adb shell getevent| shows that we did
> receive touch events. But there seems no touch events passed in gecko and
> gaia.
log shows GeckoInputDispatcher didn't dispatch event when issue happened.
next step is to check EventHub.cpp and InputReader.cpp in gonk layer.
Now I'm setting two devices for monkey test and try to reproduce this bug.
Flags: needinfo?(vwang)
Comment 103•11 years ago
|
||
When the touch events were not responded. the key press can be handled.
And they share the same code flow.
Comment 104•11 years ago
|
||
Next time this issue is reproduced, can someone attach the orangutan script that reproduces it?
Comment 105•11 years ago
|
||
Were these monkey tests run to completion, or were they interrupted after a few hours?
Comment 106•11 years ago
|
||
I ran run-monkey.sh for several hours until it completed and did not see this issue.
Comment 107•11 years ago
|
||
(In reply to Michael Wu [:mwu] from comment #105)
> Were these monkey tests run to completion, or were they interrupted after a
> few hours?
Yes, currently it will run for 12hours
Comment 108•11 years ago
|
||
hi,all
I/Gonk (13087): bug988110 dispatchOnce data.type: 0
I/Gonk (13087): bug988110 dispatchOnce MOTION mTouchEventsFiltered: 1
The above is the logcat output when the phone reproduced this issue.It seems that the touch event has been filtered.The log patch is the belows:
diff --git a/widget/gonk/nsAppShell.cpp b/widget/gonk/nsAppShell.cpp
index 2cd7dae..d87b797 100644
--- a/widget/gonk/nsAppShell.cpp
+++ b/widget/gonk/nsAppShell.cpp
-
+ LOG("bug988110 dispatchOnce data.type: %d", data.type);
switch (data.type) {
case UserInputData::MOTION_DATA: {
if (!mTouchDownCount) {
// No pending events, the filter state can be updated.
mTouchEventsFiltered = isExpired(data);
}
-
int32_t action = data.action & AMOTION_EVENT_ACTION_MASK;
switch (action) {
case AMOTION_EVENT_ACTION_DOWN:
@@ -561,7 +561,7 @@ GeckoInputDispatcher::dispatchOnce()
default:
break;
}
-
+ LOG("bug988110 dispatchOnce MOTION mTouchEventsFiltered: %d", mTouchEventsFiltered);
if (mTouchEventsFiltered) {
return;
}
@@ -604,7 +604,7 @@ GeckoInputDispatcher::dispatchOnce()
// No pending events, the filter state can be updated.
mKeyEventsFiltered = isExpired(data);
}
-
+ LOG("bug988110 dispatchOnce KEY mTouchEventsFiltered: %d", mTouchEventsFiltered);
mKeyDownCount += (data.action == AKEY_EVENT_ACTION_DOWN) ? 1 : -1;
if (mKeyEventsFiltered) {
return;
Assignee | ||
Comment 109•11 years ago
|
||
I have monkey test in two devices.
One is still running over 12 hours and one met the case similar to comment #108.
In the defected device, log shows GeckoInputDispatcher did dispatch but gecko didn't send touch event in TabParent. (the result is not consistent with comment #100)
I would like to add more log between GeckoInputDispatcher and system app and do another round testing.
Assignee | ||
Comment 110•11 years ago
|
||
just figure out comment #108 is great help on this bug.
since this filter land on 3/21, shall we have any information like this bug only happened after 3/21?
Comment 111•11 years ago
|
||
Alright, so here is my theory:
Due to a bug in the driver we aren't seeing on other silicon mTouchDownCount gets out of balance and doesn't go back to 0, as a result mTouchEventsFiltered remains true and we filter all touch events.
Comment 112•11 years ago
|
||
Actually this might be some sort of a race condition since the touch events are sent by orangutan and not the actual driver. In any case the mTouchDownCount counting seems to be not stable and we should solve that somehow differently.
Comment 113•11 years ago
|
||
I think the filter Viral means is this patch: https://bugzilla.mozilla.org/show_bug.cgi?id=971633
(In reply to viral [:viralwang] from comment #110)
> just figure out comment #108 is great help on this bug.
> since this filter land on 3/21, shall we have any information like this bug
> only happened after 3/21?
Assignee | ||
Comment 114•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #111)
> Alright, so here is my theory:
>
> Due to a bug in the driver we aren't seeing on other silicon mTouchDownCount
> gets out of balance and doesn't go back to 0, as a result
> mTouchEventsFiltered remains true and we filter all touch events.
We do met mTouchDownCount out of balance and we will miss all touch events after that.
(failed rate: 2/2)
01-01 18:03:42.029 I/GonkTouch( 82): timeNowMs: 10221068, data.timeMs: 10188343 //mTouchEventsFiltered=1
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 1 //finger down
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 9, mTouchDownCount: 1 //hover enter
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 10, mTouchDownCount: 1 //hover exit
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 2 //finger up
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 1, mTouchDownCount: 1
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 2
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 1, mTouchDownCount: 1
Assignee | ||
Comment 115•11 years ago
|
||
bug 989234 is not blocking this bug and this could be an regression of bug 971633.
Keywords: regression
Comment 116•11 years ago
|
||
Are we sure it's a regression of bug 971633 ? Please only set dependencies while we have them verified!
Comment 117•11 years ago
|
||
So this is a bug in the driver. action is 0 for up and should be 1.
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 1 //finger down
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 9, mTouchDownCount: 1 //hover enter
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 10, mTouchDownCount: 1 //hover exit
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 2
James can you guys fix this?
Flags: needinfo?(james.zhang)
Updated•11 years ago
|
Summary: [tarako] gaia no respond after monkey test start about 2 hours → [tarako] touch event filtering can get stuck when up/down actions are unbalanced
Comment 118•11 years ago
|
||
Looking at the sequence here the 2nd finger down event is simply bogus. We should fix that in the driver but Android likely tolerates this, and we should do the same, because other vendors probably make the same mistake.
01-01 18:03:42.029 I/GonkTouch( 82): timeNowMs: 10221068, data.timeMs: 10188343 //mTouchEventsFiltered=1
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 1 //finger down
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 9, mTouchDownCount: 1 //hover enter
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 10, mTouchDownCount: 1 //hover exit
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0, mTouchDownCount: 2 //bogus!
01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 1, mTouchDownCount: 1
Comment 119•11 years ago
|
||
Comment 120•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #117)
> So this is a bug in the driver. action is 0 for up and should be 1.
AIUI, the monkey testing here involves events generated by a test program rather than by the driver itself.
Comment 121•11 years ago
|
||
I won't have time to work on this, please feel free to steal the patch and land it if it works.
Comment 122•11 years ago
|
||
Yeah, this might be an artificial failure that doesn't happen with real driver input. We have to unblock monkey testing either way though, so we can as well land the patch and make this work.
Comment 123•11 years ago
|
||
We make a test build with revised attachment 8403767 [details] [diff] [review] to monkey test again. We expect this revised patch can fix the side-effect of mTouchDownCount.
Flags: needinfo?(ttsai)
Assignee | ||
Comment 124•11 years ago
|
||
Hi Michael,
We already start another round testing now but still need hours to verify.
Could you please help to review it?
Thanks.
Attachment #8403861 -
Flags: review?(mwu)
Comment 125•11 years ago
|
||
What's the rationale of fixing this in Gecko instead of in the monkey test program?
Comment 126•11 years ago
|
||
(In reply to Andreas Gal :gal from comment #117)
> So this is a bug in the driver. action is 0 for up and should be 1.
>
> 01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0,
> mTouchDownCount: 1 //finger down
> 01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 9,
> mTouchDownCount: 1 //hover enter
> 01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 10,
> mTouchDownCount: 1 //hover exit
> 01-01 18:03:42.029 I/GonkTouch( 82): dispatchOnce action: 0,
> mTouchDownCount: 2
>
> James can you guys fix this?
Flags: needinfo?(james.zhang) → needinfo?(ying.xu)
Comment 127•11 years ago
|
||
This event is not come from touch panel driver, it come from monkey test tool -- orng.
Comment 128•11 years ago
|
||
Without bug 971633 commit, our monkey test has run about 6 hours, it won't stuck.
Comment 129•11 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #125)
> What's the rationale of fixing this in Gecko instead of in the monkey test
> program?
defense in depth. Be liberal with what you receive, and careful with what you send.
Updated•11 years ago
|
Attachment #8403861 -
Flags: review?(mwu) → review+
Comment 130•11 years ago
|
||
Thanks for the fix. We should definitely defend against this in Gecko. Drivers will get this wrong, too. Ship it!
Assignee | ||
Comment 131•11 years ago
|
||
We already have two devices pass over 12 hours monkey test and touch still working.
I will prepare a patch for landing.
Comment 132•11 years ago
|
||
Confirmed comment #131.
Assignee | ||
Comment 133•11 years ago
|
||
update the patch for landing.
I think both 1.3t and master should apply this patch.
Assignee: anygregor → vwang
Attachment #8403767 -
Attachment is obsolete: true
Attachment #8403861 -
Attachment is obsolete: true
Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Comment 134•11 years ago
|
||
Updated•11 years ago
|
Keywords: checkin-needed
Updated•11 years ago
|
status-b2g-v1.3T:
--- → affected
Comment 135•10 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Comment 136•10 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•