[QRD KK Only] homescreen is not displayed after pressing home key

RESOLVED DUPLICATE of bug 1047645

Status

--
critical
RESOLVED DUPLICATE of bug 1047645
4 years ago
4 years ago

People

(Reporter: tkundu, Assigned: kgrandon)

Tracking

unspecified
ARM
Gonk (Firefox OS)
Dependency tree / graph
Bug Flags:
in-moztrap +

Firefox Tracking Flags

(blocking-b2g:2.0+)

Details

(Whiteboard: [caf priority: p1][CR 700028], URL)

Attachments

(12 attachments, 1 obsolete attachment)

Created attachment 8470004 [details]
logs when device is stuck

+++ This bug was initially created as a clone of Bug #1045332 +++

[Blocking Requested - why for this release]:

Steps to Reproduce:

1. Start camera
2. Switch to video mode
3. record video for 5 minutes or so
4. Stop recording
5. Press the Home button

Expectation:  homescreen should be displayed
Actual Result:  Camera app remains as the current app and the app is still functional.  Repeated presses of home button do not bring up homescreen, neither does holding the home button key.

Can we add more logging to understand why the homescreen is not displayed? Could you please give us a patch for this?

Please note that CPU usage is normal and it's not a memory leak.
blocking-b2g: --- → 2.0?
Flags: needinfo?(dflanagan)
Summary: [QRD KK Only] After 5 minutes of camera usage, homescreen can no longer be launched → [QRD KK Only] homescreen can no longer be launched
Summary: [QRD KK Only] homescreen can no longer be launched → [QRD KK Only] homescreen is not displayed after pressing home key
Tapas. What happens when you long press the home button? Can you switch to other opened applications? I'm wondering if the homescreen app is crashing.
Flags: needinfo?(tkundu)

Comment 3

4 years ago
Someone from systems frontend/homescreen should investigate this. The camera app has no control over when it goes to the background or closes. Perhaps the homescreen is getting killed while video recording is happening in this case. Adding Kevin and Gregor for investigation.

Kevin/Gregor: Ping Diego Marcos if you have camera questions or need help. David is on another P1.

Thanks
Hema
Flags: needinfo?(kgrandon)
Flags: needinfo?(dflanagan)
Flags: needinfo?(anygregor)
How does this differ from bug 1045332?
Flags: needinfo?(anygregor)
This does not reproduce on Flame/KK:
- gonk:  v162-3
- gecko: master:a7cf4142b4a5c50e95b6929de4141ca55b135d33
- gaia:  master:c97d1b6c3094e854377b6affa5f46b8d4b7316ce

With the STR in comment 0, observed: correct and timely return to the Homescreen.

Also, this Flame was configured with 319MB of RAM, and no LMKs were observed in the kernel log.
(Assignee)

Comment 6

4 years ago
Possibly another bug in ContentParent.cpp or nuwa?
Flags: needinfo?(kgrandon)
(Also, the Camera app was limited to 480p recording.)
With my Flame configured as in comment 5 but limited to 286 MB, I am unable to reproduce the issue with the STR in comment 0, though I noticed it took some time for the Camera app to respond to my touching the stop-recording button.

Furthermore, the kernel logs show no LMKs (confirmed by b2g-ps).

Useful data: with a mem limit of 286 MB, the Flame/KK build reports MemTotal as ~182MB:

# adb shell cat /proc/meminfo
MemTotal:         186916 kB
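
For reference, the ~182MB figure follows directly from the MemTotal line above (186916 kB divided by 1024). A minimal Python sketch of that conversion, where the helper name `mem_total_mib` is made up for illustration:

```python
def mem_total_mib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in MiB (1 MiB = 1024 kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:         186916 kB".
            kb = int(line.split()[1])
            return kb / 1024
    raise ValueError("no MemTotal line found")


# The value reported by the Flame/KK build with the 286 MB limit.
sample = "MemTotal:         186916 kB"
print(round(mem_total_mib(sample), 1))
```

The gap between the configured limit and MemTotal is expected to some degree, since MemTotal excludes memory reserved before the kernel's allocator takes over.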
Looking at the logcat, I see:

29:05: homescreen process 19449 is alive at this point

45:58: camera launched
46:01: done loading
46:27: error message, presumably from requirejs (?). I don't know if this is benign or not.
None of these modules would be involved in the described STR, but this message should be investigated:

GeckoConsole: [JavaScript Error: "Error: Timeout for modules: controllers/preview-gallery,controllers/confirm,views/confirm,views/preview-gallery,MediaFrame" {file: "app://camera.gaiamobile.org/js/main.js" line: 1115}]

46:29: the Search Results app is sigkilled because of OOM

46:48: repeated warnings from the window manager module of the system app:  GeckoConsole: [JavaScript Error: "Error: Timeout for modules: controllers/preview-gallery,controllers/confirm,views/confirm,views/preview-gallery,MediaFrame" {file: "app://camera.gaiamobile.org/js/main.js" line: 1115}]

I have seen these warnings before and usually I think they are benign. This message is repeated 40 times rapidly. I suspect this is repeated presses on the home button, but will need to check to see what causes this message to appear.

47:20: the window manager warning message appears again a couple of times at this point.

47:25: process 10351 looks like it is spawned at this point

47:36: the camera and settings apps are killed because of OOM.

48:22: homescreen (pid 19449) is killed for OOM

48:23: pid 10351 has a lot of activity here

48:37: error message from the homescreen (in new pid 10351). Likely a benign message, but it shows that the homescreen is running at this point:  GeckoConsole: [JavaScript Error: "Error: Timeout for modules: controllers/preview-gallery,controllers/confirm,views/confirm,views/preview-gallery,MediaFrame" {file: "app://camera.gaiamobile.org/js/main.js" line: 1115}]

The fact that the new homescreen process has a lower PID than the old one seems very strange to me. Does that mean that we've actually rolled over the PID counter all the way?  Why wouldn't the homescreen in 19449 have died long ago from OOM on this device if there was that much activity going on?  And could there be a bug somewhere in gecko or the system app that assumes that PIDs always increase and never decrease?  Or maybe I'm not interpreting that part of the logs correctly. Or maybe we're running a version of Linux that reuses old PIDs and doesn't just increment a counter?
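
On the PID question: Linux normally hands out PIDs sequentially and wraps around once the counter reaches the configured maximum (readable from /proc/sys/kernel/pid_max), so a newer process with a lower PID than an older one does not by itself indicate a bug. A toy sketch of that behaviour follows. It is not the kernel's actual allocator, and `next_pid`, the 32768 limit and the restart floor of 300 are assumptions for illustration only:

```python
def next_pid(last_pid, in_use, pid_max=32768, floor=300):
    """Toy model of sequential PID allocation with wraparound.

    pid_max and floor are assumed values for illustration; the real limit
    on a device can be read from /proc/sys/kernel/pid_max.
    """
    candidate = last_pid + 1
    for _ in range(pid_max):
        if candidate >= pid_max:
            candidate = floor  # counter wrapped: start again from the low end
        if candidate not in in_use:
            return candidate
        candidate += 1
    raise RuntimeError("no free PID")


# A long-lived process keeps PID 19449 while the counter wraps past pid_max.
in_use = {19449}
pid = next_pid(32767, in_use)
print(pid, pid < 19449)
```

Under this model, a homescreen relaunched after the counter wraps would get a PID below 19449, which would be consistent with what the log shows.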
Tapas: some questions for you

1) Do you think this is exactly the same bug as 1045332?  Are the logs in that bug different than the logs attached here?

2) I assumed this occurred during automated testing and video recording. Do you have a way to correlate STR to the timestamps in the logs?  For example, I suspect that at 46:48 in the log file the home button was pressed repeatedly. If that could be confirmed in the STR (that the home button was pressed at that time or 50 seconds after the camera was launched) that could be very helpful.

3) In the log I see that the camera app is eventually killed because of OOM at 47:36. Did things start working correctly at this point?  If you were still seeing the camera at this point that would be very weird. Can you confirm that you were able to get to the homescreen after that point in the logs?

4) not a question but a recommendation: please turn on metrics either in the FTU sequence or in Settings->Improve Firefox OS->Submit data.  If you do that then the log will contain "[App Usage]" information that tells us when apps switch and gives us information about what the system app window manager is doing. I don't know enough to produce a patch that would help debug this, but if it reproduces with those metrics turned on, that should help some.
Okay, in comment #9, I pasted the same error message three times.  I'll fix that soon.
(In reply to Diego Marcos [:dmarcos] from comment #2)
> Tapas. What happens when you long press the home button? Can you switch to
> other opened applications? I'm wondering if the homescreen app is crashing.

The homescreen app is not crashing. We didn't see any crash report on the device when this happens. You can see in the logs that the homescreen is still running and is able to process some IPC requests at:

01-09 00:52:40.649 10351 10351 I GeckoIPC: [time:694360657322][10351->234][PBrowserChild] Sending Msg_AsyncMessage([TODO])

You can also look at the b2g-info log for this.

(In reply to David Flanagan [:djf] from comment #10)
> Tapas: some questions for you
> 
> 1) Do you think this is exactly the same bug as 1045332? 

I am not sure about that. That bug disappeared some time back.
>  Are the logs in
> that bug different than the logs attached here?
> 
Yes. We have IPC logs, APZ logs, etc. in the bug. So if any IPC request is stuck, you should be able to tell by looking at the logs. I didn't find any such thing, but it would be better if you could take a look and confirm that for us.


> 2) I assumed this occurred during automated testing and video recording. Do
> you have a way to correlate STR to the timestamps in the logs?  For example,
> I suspect that at 46:48 in the log file the home button was pressed
> repeatedly. If that could be confirmed in the STR (that the home button was
> pressed at that time or 50 seconds after the camera was launched) that could
> be very helpful.

Sorry, I cannot confirm that with the test team. But could you please add a small log for home button presses in gecko? I think this should be very easy, and then we would not need any help from the test team to confirm the exact timestamps of the repeated home button presses. Please suggest.

Please also note that the STR mentioned in comment 0 is not accurate and we are hitting this issue randomly.
This means it would be wrong to assume that this issue occurs only during camera recording. It can happen with any app open: we press the home key and it does not return to the homescreen.

> 
> 3) In the log I see that the camera app is eventually killed because of OOM
> at 47:36. Did things start working correctly at this point?  If you were
> still seeing the camera at this point that would be very weird. Can you
> confirm that you were able to get to the homescreen after that point in the
> logs?
> 

We are not hitting this issue with the camera app in this log. So please note that even after the camera app is killed, some other app is in the foreground and it does not let us go to the homescreen when we press the home key.

Please ignore the STR in comment 0; this issue does not have any proper STR.

> 4) not a question but a recommendation: please turn on metrics either in the
> FTU sequence or in Settings->Improve Firefox OS->Submit data.  If you do
> that then the log will contain "[App Usage]" information that tells us when
> apps switch and gives us information about what the system app window
> manager is doing. I don't know enough to produce a patch that would help
> debug this, but if it reproduces with those metrics turned on, that should
> help some.

Very good idea. I will ask our test team to do this next time. Thanks for this idea. 


I have the following questions for you:

1) I am seeing 3 preallocated processes at timestamp 1970-01-09 00:47:42. Can you please check the timestamps in b2g-info, match them with logcat, and tell us whether this is expected? Please look at bug 1038854 comment 5 for this.

2) Please look at all logs between timestamps 1970-01-09 00:46:20 and 1970-01-09 00:54:48 in b2g-ps.XXX.log.txt.
   I am seeing multiple apps running with OOM_ADJ=2, but I believe OOM_ADJ=2 is only for the foreground app.
   Could you please confirm this and tell us why we are seeing multiple apps with OOM_ADJ=2 in this case?
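
The check in question 2 can be automated once the b2g-ps snapshots are parsed. A minimal Python sketch, assuming each snapshot has already been turned into a mapping of app name to OOM_ADJ value (the parsing step is left out because the exact column layout of the b2g-ps logs is not shown here, and the function name and sample data are invented for illustration):

```python
FOREGROUND_OOM_ADJ = 2  # value this thread associates with a foreground app


def snapshots_with_multiple_foreground(snapshots):
    """Return {timestamp: [apps]} for snapshots with more than one app at OOM_ADJ=2.

    `snapshots` maps a timestamp string to a {app_name: oom_adj} dict.
    """
    suspicious = {}
    for timestamp, apps in sorted(snapshots.items()):
        foreground = sorted(name for name, adj in apps.items()
                            if adj == FOREGROUND_OOM_ADJ)
        if len(foreground) > 1:
            suspicious[timestamp] = foreground
    return suspicious


# Invented sample data, for illustration only (not taken from the attached logs).
sample = {
    "00:00:01": {"app_a": 2, "app_b": 10},
    "00:00:02": {"app_a": 2, "app_b": 2},
}
print(snapshots_with_multiple_foreground(sample))
```

Any timestamp this reports is a point where two apps claim foreground priority at once, which is the condition being asked about.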
Flags: needinfo?(tkundu) → needinfo?(dflanagan)

Updated

4 years ago
Component: Gaia::Camera → Gaia::System
Flags: needinfo?(alive)
I pasted the wrong error messages into my long comment #9:

This is the system app window management warning that is repeated at 46:48:

E GeckoConsole: Content JS WARN at app://system.gaiamobile.org/js/app_window_manager.js:102 in awm_display: the app has been displayed.

(That window management message appears benignly when you press and hold the home button to bring up the task manager.  But the fact that it appears 40 times in rapid succession in the log makes me think that something else is triggering it in this case.)

This is the homescreen error at 48:37:

E GeckoConsole: [JavaScript Error: "TypeError: this.grid._grid.dragdrop is undefined" {file: "app://verticalhome.gaiamobile.org/gaia_build_defer_index.js" line: 332}]
(In reply to David Flanagan [:djf] from comment #13)
> I pasted the wrong error messages into my long comment #9:
> 
> This is the system app window management warning that is repeated at 46:48:
> 
> E GeckoConsole: Content JS WARN at
> app://system.gaiamobile.org/js/app_window_manager.js:102 in awm_display: the
> app has been displayed.
> 
> (That window management message appears benignly when you press and hold the
> home button to bring up the task manager.  But the fact that it appears 40
> times in rapid succession in the log makes me think that something else is
> triggering it in this case.)

Nice find.
I can see that both Email and Homescreen are running as foreground apps (OOM_ADJ=2) at timestamp 1970-01-09 00:46:47,
and both Email and Settings are running as foreground apps (OOM_ADJ=2) at timestamp 1970-01-09 00:46:53.

After pressing the home button at 46:48, the homescreen should become the foreground app, right? But the b2g-ps log shows that it is in the background at 46:53.

> 
> This is the homescreen error at 48:37:
> 
> E GeckoConsole: [JavaScript Error: "TypeError: this.grid._grid.dragdrop is
> undefined" {file:
> "app://verticalhome.gaiamobile.org/gaia_build_defer_index.js" line: 332}]

Again we are seeing both homescreen and gallery as foreground apps (OOM_ADJ=2) between timestamps 1970-01-09 00:48:34 and 1970-01-09 00:48:41, when the above error is printed in logcat.

Could you please give us a patch for debugging further in this direction? We may want to print more logs to confirm why we are hitting the error message "JavaScript Error: "TypeError: this.grid._grid.dragdrop is undefined"
Created attachment 8470256 [details]
patch to log hardware buttons and other chrome events in the system app

Tapas,

This is a gaia-level patch for the system app that will tell us what events the system app is receiving from Gecko and what it is doing with them. It will show us what is happening inside the hardware_buttons.js state machine and also log other mozChromeEvents that the system app receives, which might prove to be useful information.
Attachment #8470256 - Flags: feedback?(tkundu)
Flags: needinfo?(dflanagan)
(In reply to David Flanagan [:djf] from comment #15)
> Created attachment 8470256 [details]
> patch to log hardware buttons and other chrome events in the system app
> 
> Tapas,
> 
> This is a gaia-level patch for the system app that will tell us what events
> the system app is receiving from Gecko and what it is doing with them. It
> will show us what is happening inside the hardware_buttons.js state machine
> and also log other mozChromeEvents that the system app receives, which might
> prove to be useful information.

Thanks, we will try it and update you.
Comment on attachment 8470256 [details]
patch to log hardware buttons and other chrome events in the system app

Do you think it will confirm why we are seeing multiple preallocated processes or multiple foreground gaia apps (OOM_ADJ=2) at the same time?
Attachment #8470256 - Flags: feedback?(tkundu) → feedback-
Flags: needinfo?(dflanagan)
We might have multiple problems here.

1) multiple pre-allocated processes. This seems NUWA related. Can the Nuwa team take a look here?
2) Something seems strange with the window manager and apps in background/foreground. Alive, any idea what's going on?
Flags: needinfo?(tlee)
Flags: needinfo?(kk1fff)
Flags: needinfo?(cyu)
(In reply to Tapas Kumar Kundu from comment #14)

> nice find. 
> I can see both Email and Homescreen are running as foreground app(OOM_ADJ=2)
> in timestamp 1970-01-09 00:46:47 
> And both email and settings apps are running as foreground app(OOM_ADJ=2) in
> timestamp 1970-01-09 00:46:53
> 
> After pressing home button at 46:48, homescreen should become foreground app,
> right? But b2g-ps log shows that it is in background at 46:53

Actually both processes have NICE=1 and OOM_ADJ=2 starting at 46:15.  And I'm no longer sure that the error messages at 46:48 show where the home button was pressed.  That was maybe a press-and-hold event.

I'm a gaia person, so I don't know how our process management works, but this does seem wrong. If two apps have NICE=1 that would seem to indicate that both are foreground apps. My speculation is that the Home button was pressed around 46:15 and that the Homescreen app did correctly come to the foreground, but that for some reason the previously running Camera app did not go to the background and remained visible, obscuring the Homescreen.

It would be interesting to know how the window management code in the system app would behave if it thought there were two foreground apps. Could that cause it to display the "the app has been displayed" error?

Alive: do you have any thoughts about this?  (For context, from what I'm seeing in the logs, this is a situation of extreme load where multiple apps are being launched at about the same time and it can take more than 10 seconds between the time that new preallocated processes are created and the time that apps are running in the old preallocated processes.)  Could the window manager be malfunctioning when the system is so slow (with swapping, I assume) that it takes this long for apps to launch?

> > 
> > This is the homescreen error at 48:37:
> > 
> > E GeckoConsole: [JavaScript Error: "TypeError: this.grid._grid.dragdrop is
> > undefined" {file:
> > "app://verticalhome.gaiamobile.org/gaia_build_defer_index.js" line: 332}]
> 
> Again we are seeing both homescreen and gallery as foreground app
> (OOM_ADJ=2) between timestamp 1970-01-09 00:48:34 and 1970-01-09 00:48:41 
> when above error is printed in logcat. 
> 
> Could you please give us a patch for debugging more in this direction ? We
> may want to print more logs to confirm why we are hitting error mesg
> "JavaScript Error: "TypeError: this.grid._grid.dragdrop is undefined"

I'll see if I can figure out where that error is coming from, and if logging looks promising I'll attach something you can try out.
Is it possible to get the output from attaching gdb to the various preallocated processes and running the command 'thread apply all bt'?
The error message from the homescreen is coming from line 200 of the app.js file.  This is the code that makes the homescreen scroll to the top when the home button is pressed.  So when we see this error message it means that the home button events are being processed somehow.  The homescreen app receives them in the form of hashchange events, I suspect from the system app. And this presumably only happens when the system app thinks that the homescreen is the foreground app.

I haven't looked yet to find out why this.grid._grid.dragdrop is not defined and we're getting the error in the first place.
Flags: needinfo?(dflanagan)
The this.grid._grid.dragdrop error from the homescreen occurs because the dragdrop code is lazy loaded. So if the homescreen is killed while in the background because of OOM, and then the user rapidly taps on the home button, a new homescreen will be launched, but this.grid._grid.dragdrop will not be defined yet and we'll get that error message for the subsequent home button presses. If I adb shell kill my homescreen, I can reproduce this manually.

So this error message is unrelated to the cause of the bug, but it tells us that the homescreen app is running, was recently launched, and that the home button was pressed.
This sounds a lot like bug 1047645.  Output from b2g-info would confirm the symptoms are the same as what I refer to in bug 1047645 comment 26.
Filed bug 1051061 about the homescreen error message.  Tapas: I'm not going to attach a patch to log this activity in the homescreen because I think this error message is telling us as much as a logging statement would.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #23)
> This sounds a lot like bug 1047645.  Output from b2g-info would confirm the
> symptoms are the same as what I refer to in bug 1047645 comment 26.

Kyle,

The attached .tar.bz2 file does have b2g-info logs in it.

Note that I think these logs are all generated from automated test runs. There is apparently no known manual way to get this device into this state, so I doubt that CAF could attach gdb to the relevant processes. 

What they are hoping for is patches that they can apply to their build that will aid in diagnosis. So if you suspect something is going wrong here, and can create a patch that will produce logging output to confirm your suspicion, that is something that Tapas will be able to use.
Flags: needinfo?(khuey)
So it does.  The b2g-info logs look very similar to the ones in bug 1047645.
Flags: needinfo?(khuey)
I think that the lines in the logcat that match this string:

   [234->10351][PBrowserParent] Sending Msg_LoadURL

represent home button events being sent from the system app to the homescreen. This happens 93 times between 48:30 and 52:40

We see the _grid.dragdrop error 14 times from the homescreen between 48:37 and 48:41 
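
Counts like these can be reproduced with a short script. A minimal Python sketch, assuming the log lines follow the same layout as the GeckoIPC line quoted earlier in this bug (month-day, time with milliseconds, PIDs, tag); the function name and the sample lines other than that quoted one are made up for illustration:

```python
import re

# Matches the start of a logcat line such as
# "01-09 00:52:40.649 10351 10351 I GeckoIPC: ...", capturing HH:MM:SS.
TIME_RE = re.compile(r"^\d{2}-\d{2} (\d{2}:\d{2}:\d{2})\.\d{3} ")
NEEDLE = "[234->10351][PBrowserParent] Sending Msg_LoadURL"


def count_load_url(lines, start, end):
    """Count lines containing NEEDLE whose HH:MM:SS lies in [start, end]."""
    count = 0
    for line in lines:
        match = TIME_RE.match(line)
        # Zero-padded HH:MM:SS strings compare correctly as plain strings.
        if match and NEEDLE in line and start <= match.group(1) <= end:
            count += 1
    return count


# Sample lines in the assumed layout; only the second one is from the real log.
sample = [
    "01-09 00:48:31.000 234 234 I GeckoIPC: [time:1][234->10351][PBrowserParent] Sending Msg_LoadURL",
    "01-09 00:52:40.649 10351 10351 I GeckoIPC: [time:694360657322][10351->234][PBrowserChild] Sending Msg_AsyncMessage([TODO])",
    "01-09 00:55:00.000 234 234 I GeckoIPC: [time:2][234->10351][PBrowserParent] Sending Msg_LoadURL",
]
print(count_load_url(sample, "00:48:30", "00:52:40"))
```

Pointing the same function at the attached logcat with the 48:30 to 52:40 window would give the Msg_LoadURL count; swapping NEEDLE for the dragdrop error text would give the other.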

A big problem here is that CAF has not been able to tell us what the timestamp of the reported symptoms are. When was it that the home button was not working?  

My hunch is that the homescreen was launched correctly but that it was obscured by another app that never went to the background or that took a long time to go to the background.
Created attachment 8470312 [details]
a new version of the logging patch to get window manager debug messages plus hardware button and mozChromeEvents

Tapas: please use this version of the patch instead of the one I attached earlier. It also turns on window management debugging which should help diagnose this if it is window manager related.
Attachment #8470256 - Attachment is obsolete: true
Attachment #8470312 - Flags: feedback?(tkundu)
(Assignee)

Updated

4 years ago
Blocks: 1047645
@david: there is also a nice idea in bug 1047645 comment 26. Please take a look. He also has a debugging patch. I will ask the test team to run with that too.
Flags: needinfo?(dflanagan)
I really want the thread stacks of the homescreen after it gets into this state.  That will be far more useful than that logging.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> I really want the thread stacks of the homescreen after it gets into this
> state.  That will be far more useful than that logging.

Sure, we will get it for you :) . Thanks for the ideas.
Clearing ni. In line with Kyle's request, I will look into it once we have a backtrace log.
Flags: needinfo?(tlee)
Comment on attachment 8470312 [details]
a new version of the logging patch to get window manager debug messages plus hardware button and mozChromeEvents

We have AppWindow.prototype._DEBUG to log all AppWindow related messages.
I am going to provide another patch to turn it on.
Flags: needinfo?(alive)
Created attachment 8470424 [details] [review]
Turn on all window management debugging log

This one turns all the flags on.

Updated

4 years ago
blocking-b2g: 2.0? → 2.0+
Multiple preallocated processes is a bug, but it won't affect functionality. It just consumes more memory. We can fix that.
Flags: needinfo?(cyu)
The problem with multiple preallocated processes is reproduced in bug 1051745. I'll track the issue in that bug.
Passing this to :cyu as he's  helping with bug 1051745 which may help here.
Assignee: nobody → cyu

Comment 38

4 years ago
Kyle/Tapas,

Could you please provide more info from your debug sessions? Do we have logs with Alive's debug patch for further investigation?

Thanks
Hema
Flags: needinfo?(tkundu)
Flags: needinfo?(khuey)
If Tapas did not add anything, we have no additional information.  This was not one of the bugs we looked at today.
Flags: needinfo?(khuey)
I am waiting for our internal team to reproduce this bug again with the logging from comment 33.
I will update as soon as they reproduce this bug with those logs.
I am also trying to set up another remote gdb session for this bug.
(In reply to Tapas Kumar Kundu from comment #29)
> @david: we also have a nice idea on bug 1047645 Comment 26 . Please take a
> look. He also has some debugging patch. I will ask test team to run with
> that too.

That sounds good to me. As a Gaia programmer, that Gecko code is completely unfamiliar to me. I suppose that when the window management code in the system app moves an app to the foreground or background, it triggers code in ProcessPriorityManager.cpp to actually change the process priority.  With logging in both of those places we ought to have a better idea about what is going on and how we end up with two foreground apps.
Flags: needinfo?(dflanagan)
I see that the ProcessPriorityManager involves a timer, which makes me wonder whether there could be a race condition that manifests under extreme loads (when the system app does not respond quickly and the remote-browser-shown event is slow to arrive)

Updated

4 years ago
Flags: needinfo?(khuey)
Flags: needinfo?(cyu)
Flags: needinfo?(bbajaj)
Could this device be made available to us in gdb/etc?
Flags: needinfo?(ikumar)

Comment 45

4 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #44)
> Could this device be made available to us in gdb/etc?

Kyle -- yes, Tapas has the gdb session; he is going to get in touch with you.
Flags: needinfo?(ikumar)
Created attachment 8472616 [details]
Mozilla_logs.tar.bz2

Another set of logs from the affected device.
Created attachment 8472651 [details]
backtrace of all homescreen threads  after device is affected
Created attachment 8472657 [details]
dmesg_20-01-1970-00-53-31.log.txt

kernel logs for Comment 46
Flags: in-moztrap?(ychung)
Created attachment 8472675 [details]
memory report from affected device
Created attachment 8472702 [details]
backtrace of b2g process when it fails to send IPC request.tar.bz2

The b2g process is receiving touch events on the affected device, but it is not able to send IPC requests to the homescreen app although the homescreen is running.

The b2g process is reporting an error from here:


http://lxr.mozilla.org/mozilla-aurora/source/ipc/glue/MessageChannel.cpp#1472
The initial STR/result looks similar to bug 1016782. Are they related?
(Assignee)

Comment 52

4 years ago
Next steps: Add additional logging around gaia homescreen code to help narrow down the problem. It might also be helpful to add some logging inside the content processes/content parent.
As discussed in the meeting, we are waiting for another gdb session to figure out why the b2g process is not able to send IPC requests to the homescreen app although the homescreen is running.

Please note that it may be impossible to reproduce this issue with tons of IPC logging :(

Thanks a lot for your time and effort in gdb session :)
(In reply to Jed Davis [:jld] from comment #51)
> The initial STR/result looks similar to bug 1016782. Are they related?

Looking closer (and based on what I'm hearing out-of-band), probably not — this one requires doing video recording, and apparently doesn't require invoking the camera from the lockscreen.  Sorry for the noise.
(Assignee)

Comment 55

4 years ago
Tapas - could you have your team add this patch to your current build? It should provide some additional logging in the gaia system app so we can rule out 100% that it's being caused by gaia. Thanks!

https://github.com/mozilla-b2g/gaia/pull/22857
Flags: needinfo?(tkundu)
Please note that paying too much attention to multiple home screen might take us in the wrong direction. Multiple preallocated processes is a bug, but it is caused by a bug where we request more preallocated processes than needed (also note that the 2 have different parent processes in the log). If the preallocated processes are healthy, the existence of multiple prealloc processes won't prevent us from going back to the homescreen.

We need to check whether the prealloc processes are healthy:
* gdb attach to the process and show the backtraces of all threads
* check whether we ever sent a request to the prealloc process that then failed to complete the launch process. For example, in the IPC log
[xxx->yyy][PBrowserParent] Sending Msg_LoadURL
is present but the interaction between the b2g and the app process stopped after that.
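
The second check can be approximated with a script over the IPC log. A rough Python sketch, assuming every IPC line carries a "[sender->receiver][Protocol] Direction Message" part as in the examples in this bug; the function name is hypothetical, and treating any later message from the child as a sign of life is only a heuristic:

```python
import re

# Matches the IPC part of a log line, e.g.
# "[234->10351][PBrowserParent] Sending Msg_LoadURL".
IPC_RE = re.compile(r"\[(\d+)->(\d+)\]\[(\w+)\] (\w+) (\w+)")


def stalled_launches(lines):
    """Return child PIDs that were sent Msg_LoadURL but logged no IPC traffic afterwards."""
    waiting = set()
    for line in lines:
        match = IPC_RE.search(line)
        if not match:
            continue
        sender, receiver, _protocol, _direction, message = match.groups()
        if message == "Msg_LoadURL":
            waiting.add(receiver)  # launch requested for this child
        elif sender in waiting:
            waiting.discard(sender)  # child sent something after the request
    return sorted(waiting)


# PID 999 is an invented example of a child that never answers.
sample = [
    "[234->10351][PBrowserParent] Sending Msg_LoadURL",
    "[10351->234][PBrowserChild] Sending Msg_AsyncMessage([TODO])",
    "[234->999][PBrowserParent] Sending Msg_LoadURL",
]
print(stalled_launches(sample))
```

A PID reported here is only a candidate: it would still need the gdb backtrace from the first check to confirm the process is actually unhealthy.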

I'll wait for the next round of gdb log to see if we can have more info about this bug.

One more note for Tapas: regarding comment #50, it doesn't look to me like we're stuck at this line:
http://lxr.mozilla.org/mozilla-aurora/source/ipc/glue/MessageChannel.cpp#1472

It looks like we just captured this stack frame by chance. Next time you capture this, please let gdb continue several times, or try to finish the stack frame.
Flags: needinfo?(cyu)
Blocks: 1054011
No longer blocks: 1054011
Created attachment 8473246 [details]
memory report from affected device
Created attachment 8473254 [details]
screenshot of affected device
Created attachment 8473324 [details]
2014-08-15-03-56-53_after_browser_url_touched_in_affected_device.png
(Assignee)

Comment 60

4 years ago
By hacking some state in the system app we have observed the same issue reported here. The root cause is the same, but it appears differently because the browser app happens to be in-process, unlike the other apps.

It turns out that the fix we have in bug 1047645 also happens to fix this bug. I am also stealing this bug as I am working on landing a fix for it, and it should land soon.
Assignee: cyu → kgrandon
Component: Gaia::System → Gaia::System::Window Mgmt

Updated

4 years ago
QA Whiteboard: [2.0-signoff-need+]
(Assignee)

Comment 61

4 years ago
We have determined that the core issue here is the same as in bug 1047645. Since we have a patch there, let's use that bug so everyone is focused on the same thing.
No longer blocks: 1047645
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(kk1fff)
Flags: needinfo?(khuey)
Resolution: --- → DUPLICATE
Duplicate of bug: 1047645
New test case needs to be added. There is no existing test case.
QA Whiteboard: [2.0-signoff-need+] → [2.0-signoff-need+][QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Created attachment 8476020 [details]
tapass_cr_700028.zip

The issue is reproduced again with the fix. The attachment has the following logs:

1) Screenshot of affected device after pressing home key 
2) memory report from affected device after pressing home key 
3) kernel and logcat logs
Flags: needinfo?(tkundu) → needinfo?(kgrandon)
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Test case added to moztrap:

https://moztrap.mozilla.org/manage/case/14396/
QA Whiteboard: [2.0-signoff-need+][QAnalyst-Triage?] → [2.0-signoff-need+][QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(ychung)
Flags: in-moztrap+
(In reply to Tapas Kumar Kundu from comment #63)
> Created attachment 8476020 [details]
> tapass_cr_700028.zip
> 
> Issue is reproduced again with fix. It has following logs: 
> 
> 1) Screenshot of affected device after pressing home key 
> 2) memory report from affected device after pressing home key 
> 3) kernel and logcat logs

There are no cycle collector logs in there.  Can you get those?
Flags: needinfo?(tkundu)
I created another bug, https://bugzilla.mozilla.org/show_bug.cgi?id=1056216, and I have the logs for comment 65 in that bug.
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago
Flags: needinfo?(tkundu)
Resolution: --- → DUPLICATE
Duplicate of bug: 1047645
(Assignee)

Comment 67

4 years ago
Will help investigate in bug 1056216.
Flags: needinfo?(kgrandon)

Updated

4 years ago
Flags: needinfo?(bbajaj)
Attachment #8470312 - Flags: feedback?(tkundu) → feedback+