Closed
Bug 1016746
Opened 10 years ago
Closed 10 years ago
A few hours monkey test,the homescreen turns up a black screen,there is a lot of activity-window under windows div
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
People
(Reporter: yang.zhao, Assigned: bkelly)
References
Details
(Keywords: regression, Whiteboard: [sprd313838])
Attachments
(5 files)
3.00 KB,
image/png
|
Details | |
415.31 KB,
image/png
|
Details | |
397.93 KB,
image/png
|
Details | |
3.73 KB,
patch
|
khuey
:
review+
praghunath
:
approval-mozilla-b2g30+
|
Details | Diff | Splinter Review |
4.29 KB,
patch
|
Details | Diff | Splinter Review |
After a few hours monkey test ,several phones got the issue that the homescreen turns up a black screen .Press Home button can't make the phone return to homescreen. From the app-manager ,you can see a lot of activity-window are active.
Here is the app-manager screenshot. Hi,fabrice Is it related with the inproc homescreen ?
Flags: needinfo?(james.zhang)
Flags: needinfo?(fabrice)
Flags: needinfo?(alive)
Comment 2•10 years ago
|
||
I guess it's because homescreen inproc -> homescreen's activity is inproc -> activitydone is not fired well at this case. But I am not sure.
Flags: needinfo?(alive)
Comment 3•10 years ago
|
||
Can we check which activity this is to help reproduce?
Flags: needinfo?(fabrice)
Comment 5•10 years ago
|
||
ok thanks. I'll look at that tomorrow.
Assignee: nobody → fabrice
blocking-b2g: --- → 1.3?
Comment 6•10 years ago
|
||
If we can't fix it quickly, we'll enable home screen oop and cause performance issue again.
Assignee: fabrice → nobody
blocking-b2g: 1.3? → 1.3T?
status-b2g-v1.3T:
--- → affected
Flags: needinfo?(james.zhang)
After these things happen,when I press the power key to lock the phone,the following log will be printed out: E/GeckoConsole( 599): [JavaScript Error: "TypeError: this.frame is undefined" {file: "app://system.gaiamobile.org/js/window.js" line: 125}] Press the power key to light the phone,then unlock it,the following log will be printed out: E/GeckoConsole( 599): [JavaScript Error: "TypeError: this.frame is undefined" {file: "app://system.gaiamobile.org/js/window.js" line: 120}] E/GeckoConsole( 599): [JavaScript Error: "NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [inIDOMUtils.setContentState]" {file: "chrome://global/content/BrowserElementPanning.js" line: 59}]
I found the following logs: E GeckoConsole: [JavaScript Error: "TypeError: this.enableWithoutCover is not a function" {file: "app://system.gaiamobile.org/js/lockscreen.js" line: 446}] hi,Anthony Ricaud Since you removed this function in commit 449e0cd70ed999ffc4ae9e6152d57ab01193376d.It seems that you didn't remove clearly.Please help to check it.Thank you!
Flags: needinfo?(anthony)
Updated•10 years ago
|
blocking-b2g: 1.3T? → 1.3T+
Keywords: regression
Comment 9•10 years ago
|
||
I can't reproduce locally. Activities triggered both from the homescreen and from an oop app are closed as expected, in all cases (success, cancel or going back to the homescreen). One thing I note is that in the screenshot from comment 4, activity frames are direct children of <div id="windows">. This is not what I see locally, where they are children of the wrapper div for the app that called MozActivity(). Alive, any idea how we can end up in this situation?
Flags: needinfo?(alive)
Comment 10•10 years ago
|
||
We can meet this issue every day with 12 hours monkey test. Danny, do you also meet it on your side?
Flags: needinfo?(ttsai)
Flags: needinfo?(styang)
Flags: needinfo?(kkuo)
Flags: needinfo?(dliang)
Comment 11•10 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #9) > I can't reproduce locally. Activities triggered both from the homescreen and > from an oop app are closed as expected, in all cases (success, cancel or > going back to the homescreen). > > One thing I note is that in the screenshot from comment 4, activity frames > are direct children of <div id="windows">. This is not what I see locally, > where they are children of the wrapper div for the app that called > MozActivity(). > > Alive, any idea how we can end up in this situation? Good catch. We render the activity inside caller in bug https://bugzilla.mozilla.org/show_bug.cgi?id=977934, and it's also in 1.3t. but in the screenshot it seems they didn't take this patch. Yang, could you confirm?
Flags: needinfo?(alive)
Comment 12•10 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #9) > I can't reproduce locally. Activities triggered both from the homescreen and > from an oop app are closed as expected, in all cases (success, cancel or > going back to the homescreen). > > One thing I note is that in the screenshot from comment 4, activity frames > are direct children of <div id="windows">. This is not what I see locally, > where they are children of the wrapper div for the app that called > MozActivity(). > > Alive, any idea how we can end up in this situation? ...And from the screenshot, check the parentapp property of the mozbrowser iframe: some is gallery and some is sms. But it's also strange that this activity is not rendered inside gallery and sms.
Comment 13•10 years ago
|
||
Hi! Alive, Could you help? Based on the log yang.zhao provided. -- Keven
Flags: needinfo?(kkuo) → needinfo?(alive)
Reporter | ||
Comment 14•10 years ago
|
||
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #11) > (In reply to Fabrice Desré [:fabrice] from comment #9) > > I can't reproduce locally. Activities triggered both from the homescreen and > > from an oop app are closed as expected, in all cases (success, cancel or > > going back to the homescreen). > > > > One thing I note is that in the screenshot from comment 4, activity frames > > are direct children of <div id="windows">. This is not what I see locally, > > where they are children of the wrapper div for the app that called > > MozActivity(). > > > > Alive, any idea how we can end up in this situation? > > Good catch. > We render the activity inside caller in bug > https://bugzilla.mozilla.org/show_bug.cgi?id=977934, > and it's also in 1.3t. > but in the screenshot it seems they didn't take this patch. > > Yang, could you confirm? The patch in 977934 already landed on v1.3t,the commit is:1188d95f1c903098bfc9026efb9d5251566c0036
Comment 15•10 years ago
|
||
Be noted activity is found leak in bug 1007520.
Comment 16•10 years ago
|
||
Hi, could you see if the patch of bug 1007520 helps? I cannot reproduce locally too.
Flags: needinfo?(alive)
Reporter | ||
Comment 17•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #15) > Be noted activity is found leak in bug 1007520. Hi,Ting-Yu Does the three parts of patch in bug 1007520 are all needed?If not,please tell me which patch should I use for testing this issue.Thank you!
Comment 18•10 years ago
|
||
(In reply to James Zhang from comment #10) > We can meet this issue every day with 12 hours monkey test. > Danny, do you also meet it on your side? We also met this issue, 4 devices all are black screen under 24 hr monkey test. My build included part 1~3 of bug 1007520 and some debug patches, detail in https://github.com/dannyliang/tarako-patches
Flags: needinfo?(dliang)
Comment 19•10 years ago
|
||
(In reply to yang.zhao from comment #17) > Hi,Ting-Yu > Does the three parts of patch in bug 1007520 are all needed?If not,please > tell me which patch should I use for testing this issue.Thank you! According to comment 18, the patches do not help.
Flags: needinfo?(tchou)
Reporter | ||
Comment 20•10 years ago
|
||
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #16) > Hi, could you see if the patch of bug 1007520 helps? I cannot reproduce > locally too. hi,Alive Is it possible that it's Bug 1014704 - '[B2G][Tarako][E.me] Everything.me apps are not adding to homescreen when selecting 'Add to Homescreen' causes the issue ?
Flags: needinfo?(alive)
Comment 21•10 years ago
|
||
(In reply to yang.zhao from comment #7) > E/GeckoConsole( 599): [JavaScript Error: "TypeError: this.frame is > undefined" {file: "app://system.gaiamobile.org/js/window.js" line: 120}] Any idea that how can the frame be undefined? If an ActivityWindow is instantiated with a caller without frame, the containerElement will be "windows", which matches what Fabrice found in comment 9.
Comment 22•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #21) > (In reply to yang.zhao from comment #7) > > E/GeckoConsole( 599): [JavaScript Error: "TypeError: this.frame is > > undefined" {file: "app://system.gaiamobile.org/js/window.js" line: 120}] > > Any idea that how can the frame be undefined? If an ActivityWindow is > instantiated with a caller without frame, the containerElement will be > "windows", which matches what Fabrice found in comment 9. Might be, but I really don't know why this will happen. 1. Does inproc homescreen causes this bug? 2. Could you log down something in activity_window?
Flags: needinfo?(alive)
Reporter | ||
Comment 23•10 years ago
|
||
(In reply to Alive Kuo [:alive][NEEDINFO!] from comment #22) > (In reply to Ting-Yu Chou [:ting] from comment #21) > > (In reply to yang.zhao from comment #7) > > > E/GeckoConsole( 599): [JavaScript Error: "TypeError: this.frame is > > > undefined" {file: "app://system.gaiamobile.org/js/window.js" line: 120}] > > > > Any idea that how can the frame be undefined? If an ActivityWindow is > > instantiated with a caller without frame, the containerElement will be > > "windows", which matches what Fabrice found in comment 9. > > Might be, but I really don't know why this will happen. > 1. Does inproc homescreen causes this bug? > 2. Could you log down something in activity_window? This issues happens after the patch for inproc homescreen,but I don't think we could say that inproc homescreen caused this bug. I reverted the patch in Bug 1014704 ,and run monkey test on two phones,it seems that this issue didn't happen. If you want some log in activity_window,you can add some log,and I will help to run the monkey test again.
Comment 24•10 years ago
|
||
We have removed bug 1014704 with WIP patch.
Comment 25•10 years ago
|
||
(In reply to James Zhang from comment #24) > We have removed bug 1014704 with WIP patch. Not a good idea imho. You should rather take the followup in bug 1017331
Updated•10 years ago
|
Flags: needinfo?(yang.zhao)
Reporter | ||
Comment 26•10 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #25) > (In reply to James Zhang from comment #24) > > We have removed bug 1014704 with WIP patch. > > Not a good idea imho. You should rather take the followup in bug 1017331 Already get bug 1014704 back.
Flags: needinfo?(yang.zhao)
Comment 27•10 years ago
|
||
Fabrice, wonder if you are the right person to take this bug? Thanks
Updated•10 years ago
|
Flags: needinfo?(fabrice)
We believe this is a Gecko issue and bkelly is working on a patch.
Assignee: nobody → bkelly
Updated•10 years ago
|
Flags: needinfo?(fabrice)
Assignee | ||
Comment 29•10 years ago
|
||
So our theory is that ContentParent::NotifyTabDestroyed() is being executed in the activity case which runs ContentParent::ShutDownProcess() from a runnable. If ShutDownProcess() completes prior to ActorDestroy() then mMessageManager is set to null preventing the 'child-process-shutdown' message from being sent. Typically the Close() call in ShutDownProcess() should trigger ActorDestroy(). It only does this once, though, setting a flag to avoid running ActorDestroy() again. So the theory is we get this sequence: 1) ContentParent::NotifyTabDestroyed() is putting runnable on event queue 2) ShutDownProcess() is called as normal 3) ContentParent::Close() is called 4) MessageChannel::Close() 5) MessageChannel flushes pending runnables in event queue triggering ShutDownProcess() again via runnable from (1) 6) Second ShutdownProcess() call clears mMessageManager 7) ActorDestroy() runs from first ShutDownProcess() call and fails to send 'child-process-shutdown' 8) First ShutDownProcess() call completes I have not been able to verify this actually occurs. Danny, can you run an overnight monkey test with this patch plus the patches from bug 1007520?
Attachment #8431738 -
Flags: review?(khuey)
Attachment #8431738 -
Flags: feedback?(dliang)
Assignee | ||
Comment 30•10 years ago
|
||
Here is a v1.3t version of attachment 8431738 [details] [diff] [review].
Assignee | ||
Comment 31•10 years ago
|
||
(In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #29) > Created attachment 8431738 [details] [diff] [review] > Consistently send 'child-process-shutdown' from ContentParent destruction. > (v0) > > So our theory is that ContentParent::NotifyTabDestroyed() is being executed > in the activity case which runs ContentParent::ShutDownProcess() from a > runnable. If ShutDownProcess() completes prior to ActorDestroy() then > mMessageManager is set to null preventing the 'child-process-shutdown' > message from being sent. > > Typically the Close() call in ShutDownProcess() should trigger > ActorDestroy(). It only does this once, though, setting a flag to avoid > running ActorDestroy() again. > > So the theory is we get this sequence: > > 1) ContentParent::NotifyTabDestroyed() is putting runnable on event queue > 2) ShutDownProcess() is called as normal > 3) ContentParent::Close() is called > 4) MessageChannel::Close() > 5) MessageChannel flushes pending runnables in event queue triggering > ShutDownProcess() again via runnable from (1) > 6) Second ShutdownProcess() call clears mMessageManager > 7) ActorDestroy() runs from first ShutDownProcess() call and fails to send > 'child-process-shutdown' > 8) First ShutDownProcess() call completes My flush step in (5) may be bogus. Instead it appears if the Close() encounters an error we re-enter ShutDownProcess() and call CloseWithError(). Since CloseWithError() is async we then proceed with steps 6 to 8 here. In theory. We need to test the patch.
Attachment #8431738 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 32•10 years ago
|
||
The feedback I got on IRC was that due to deadlines, timezones, and the test turn-around, I should land this for now and mark the bug leave-open. The patch should cause no harm, even if it does not fix the root issue here.
Keywords: leave-open
Assignee | ||
Comment 33•10 years ago
|
||
https://hg.mozilla.org/integration/b2g-inbound/rev/6cdd0de5bcaa
Assignee | ||
Comment 34•10 years ago
|
||
Comment on attachment 8431738 [details] [diff] [review] Consistently send 'child-process-shutdown' from ContentParent destruction. (v0) Requesting for b2g28_v1.3t, although I don't see a specific flag for that. [Approval Request Comment] Bug caused by (feature/regressing bug #): Possibly bug 895204 User impact if declined: Memory leak Testing completed: Light device testing on v1.3t to verify nothing obvious broke. Need long running monkey test to test if this really fixes the bug here. Risk to taking this patch (and alternatives if risky): Very low. String or UUID changes made by this patch: None The patch needed some minor rebasing for v1.3t. See attachment 8431739 [details] [diff] [review].
Attachment #8431738 -
Flags: approval-mozilla-b2g28?
Updated•10 years ago
|
Attachment #8431738 -
Flags: approval-mozilla-b2g28?
Assignee | ||
Comment 35•10 years ago
|
||
TBPL b2g-inbound looks relatively good for the commit in comment 33, but I am hesitant to push to b2g28_v1_3t before it merges to mozilla-central. Also, I'm leaving on PTO basically now, so I would not be around to fix any unexpected breakage. So I'd like to leave the uplift to b2g28_v1_3t to the sheriffs if possible. Also, throwing this over to Kyle while I'm gone. Summary of next steps is to get this in v1.3t branch and re-run the monkey test to see if it helps. It would be nice to include the patches from bug 1007520 in the test as well.
Assignee: bkelly → khuey
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/d2e80455cbad We should land this on 1.4 too.
Component: Gaia → DOM
Product: Firefox OS → Core
Comment 38•10 years ago
|
||
With below commit of bug 1017331 and revert nothing, commit e68858693b71d917c9c5ee7e215f7ceea04635f7 Author: Fabrice Desre <fabrice@desre.org> Date: Thu May 29 21:24:21 2014 -0700 Bug 1017331 - [Tarako] Trying to create a contact from the SMS results does not create a contact. r=alive the result is positive on 24hrs monkey test w/ 4 devices. We don't find any black homescreen in monkey test.
Updated•10 years ago
|
status-b2g-v1.4:
--- → affected
status-b2g-v2.0:
--- → fixed
Comment 39•10 years ago
|
||
(In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #29) > Created attachment 8431738 [details] [diff] [review] > Consistently send 'child-process-shutdown' from ContentParent destruction. > (v0) > > So our theory is that ContentParent::NotifyTabDestroyed() is being executed > in the activity case which runs ContentParent::ShutDownProcess() from a > runnable. If ShutDownProcess() completes prior to ActorDestroy() then > mMessageManager is set to null preventing the 'child-process-shutdown' > message from being sent. > > Typically the Close() call in ShutDownProcess() should trigger > ActorDestroy(). It only does this once, though, setting a flag to avoid > running ActorDestroy() again. > > So the theory is we get this sequence: > > 1) ContentParent::NotifyTabDestroyed() is putting runnable on event queue > 2) ShutDownProcess() is called as normal > 3) ContentParent::Close() is called > 4) MessageChannel::Close() > 5) MessageChannel flushes pending runnables in event queue triggering > ShutDownProcess() again via runnable from (1) > 6) Second ShutdownProcess() call clears mMessageManager > 7) ActorDestroy() runs from first ShutDownProcess() call and fails to send > 'child-process-shutdown' > 8) First ShutDownProcess() call completes > > I have not been able to verify this actually occurs. Danny, can you run an > overnight monkey test with this patch plus the patches from bug 1007520? Monkey test has setup today, it included part 1~3 of bug 1007520, let's wait result tomorrow.
Comment 40•10 years ago
|
||
hi Danny, can you please udpate the results? Thanks
Flags: needinfo?(dliang)
Updated•10 years ago
|
Flags: needinfo?(styang)
Comment 41•10 years ago
|
||
(In reply to Danny Liang [:dliang] from comment #39) > (In reply to Ben Kelly (PTO, back Mon June 9) [:bkelly] from comment #29) > > Created attachment 8431738 [details] [diff] [review] > > Consistently send 'child-process-shutdown' from ContentParent destruction. > > (v0) > > > > So our theory is that ContentParent::NotifyTabDestroyed() is being executed > > in the activity case which runs ContentParent::ShutDownProcess() from a > > runnable. If ShutDownProcess() completes prior to ActorDestroy() then > > mMessageManager is set to null preventing the 'child-process-shutdown' > > message from being sent. > > > > Typically the Close() call in ShutDownProcess() should trigger > > ActorDestroy(). It only does this once, though, setting a flag to avoid > > running ActorDestroy() again. > > > > So the theory is we get this sequence: > > > > 1) ContentParent::NotifyTabDestroyed() is putting runnable on event queue > > 2) ShutDownProcess() is called as normal > > 3) ContentParent::Close() is called > > 4) MessageChannel::Close() > > 5) MessageChannel flushes pending runnables in event queue triggering > > ShutDownProcess() again via runnable from (1) > > 6) Second ShutdownProcess() call clears mMessageManager > > 7) ActorDestroy() runs from first ShutDownProcess() call and fails to send > > 'child-process-shutdown' > > 8) First ShutDownProcess() call completes > > > > I have not been able to verify this actually occurs. Danny, can you run an > > overnight monkey test with this patch plus the patches from bug 1007520? > > Monkey test has setup today, it included part 1~3 of bug 1007520, let's wait > result tomorrow. We still found the leak from content-parent by yesterday's monkey test. DUT1: No leak from content-parent 0 (100.0%) -- queued-ipc-messages ├──0 (100.0%) ── content-parent((Preallocated), pid=12337, open channel, 0x48e51400, refcnt=16) └──0 (100.0%) ── content-parent(Video, pid=12298, open channel, 0x474ea000, refcnt=157) DUT2: 0 (100.0%) -- queued-ipc-messages ├──0 (100.0%) ── content-parent(E-Mail, pid=-1, closed channel, 0x444b2800, refcnt=1) └──0 (100.0%) ── content-parent(Usage, pid=-1, closed channel, 0x43e8ec00, refcnt=1) DUT3: 0 (100.0%) -- queued-ipc-messages ├──0 (100.0%) ── content-parent((Preallocated), pid=14574, open channel, 0x46979000, refcnt=16) ├──0 (100.0%) ── content-parent(Clock, pid=13802, open channel, 0x46e15c00, refcnt=17) └──0 (100.0%) ── content-parent(E-Mail, pid=-1, closed channel, 0x4386d800, refcnt=1) DUT4: No leak from content-parent 0 (100.0%) -- queued-ipc-messages └──0 (100.0%) ── content-parent((Preallocated), pid=15919, open channel, 0x40b72c00, refcnt=16)
Flags: needinfo?(dliang)
Comment 42•10 years ago
|
||
Comment on attachment 8431738 [details] [diff] [review] Consistently send 'child-process-shutdown' from ContentParent destruction. (v0) By test result, there is still leak of content-parent.
Attachment #8431738 -
Flags: feedback?(dliang)
Was this patch in the monkey test build?
Flags: needinfo?(dliang)
So, actually, the most recent log doesn't have any activity-window-N divs in it. It appears we fixed the issue in comment 0 \o/ Let's move everything else back to bug 1007520.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(dliang)
Resolution: --- → FIXED
Comment on attachment 8431738 [details] [diff] [review] Consistently send 'child-process-shutdown' from ContentParent destruction. (v0) We should take this on 1.4. It fixes a leak that happens during an edge case in shutdown of app processes. Those leaks can build up in the parent process and slow down the phone. [Approval Request Comment] Bug caused by (feature/regressing bug #): N/A User impact if declined: Possible leaks if apps shut down in strange ways. Testing completed: Tested on tarako and m-c. Risk to taking this patch (and alternatives if risky): Low risk String or UUID changes made by this patch: None
Attachment #8431738 -
Flags: approval-mozilla-b2g30?
Assignee: khuey → bkelly
Comment 46•10 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #43) > Was this patch in the monkey test build? Yes, I based on gecko commit 426342f00390d0bd2831e782245dd035477bbfcd which included this patch. commit 426342f00390d0bd2831e782245dd035477bbfcd Author: B2G Bumper Bot <release+b2gbumper@mozilla.com> Date: Mon Jun 2 19:19:26 2014 -0700 Bumping manifests a=b2g-bump commit 78a21287b3cdab6e4c97ac1b9be2856945e441cd Author: Ben Kelly <ben@wanderview.com> Date: Mon Jun 2 19:08:09 2014 -0700 Bug 1016746: Consistently send 'child-process-shutdown' from ContentParent destruction. r=khuey a=1.3T+
Updated•10 years ago
|
Flags: needinfo?(anthony)
Updated•10 years ago
|
Keywords: leave-open
Target Milestone: --- → mozilla32
Comment 47•10 years ago
|
||
Please provide testing details and explain technical risks.
See comment 45?
Updated•10 years ago
|
Flags: needinfo?(praghunath)
Updated•10 years ago
|
Attachment #8431738 -
Flags: approval-mozilla-b2g30? → approval-mozilla-b2g30+
Updated•10 years ago
|
Flags: needinfo?(praghunath)
Reporter | ||
Comment 51•10 years ago
|
||
(In reply to James Zhang (Spreadtrum) from comment #50) > Please run dolphin monkey test to verify. It seems no black screen but have other issues when running monkey test,such as bug 1019419.
Flags: needinfo?(yang.zhao)
Updated•10 years ago
|
Flags: needinfo?(ttsai)
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•