Closed Bug 1047645 Opened 5 years ago Closed 5 years ago

Home screen does not load after pressing home button

Categories

(Firefox OS Graveyard :: Gaia::System::Window Mgmt, defect, P1, blocker)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:2.0+, b2g-v1.3 affected, b2g-v1.3T affected, b2g-v1.4 verified, b2g-v2.0 verified, b2g-v2.1 verified)

VERIFIED FIXED
2.1 S2 (15aug)
blocking-b2g 2.0+
Tracking Status
b2g-v1.3 --- affected
b2g-v1.3T --- affected
b2g-v1.4 --- verified
b2g-v2.0 --- verified
b2g-v2.1 --- verified

People

(Reporter: tkundu, Assigned: kgrandon)

Details

(Whiteboard: [caf priority: p1][CR 703260][systemsfe])

Attachments

(14 files, 1 obsolete file)

115.61 KB, application/x-bzip
669.32 KB, application/x-bzip
5.04 MB, application/x-bzip
2.72 MB, text/plain
182.63 KB, text/plain
21.16 KB, text/plain
1.14 MB, application/x-bzip
6.27 MB, application/x-bzip
9.76 KB, image/png
3.83 MB, application/x-bzip
1.13 MB, application/x-bzip
46 bytes, text/x-github-pull-request (kgrandon: review+)
46 bytes, text/x-github-pull-request (kgrandon: review+)
46 bytes, text/x-github-pull-request

Test steps:
1. Open camera and take pictures.
2. Open video and play videos.
3. Play music. Open settings and do airplane mode on/off and BT on/off.
4. After playing videos, tap the home key.
5. Black screen is displayed.
Observations:
1. Complete black screen is displayed.
2. A short press of the power key turns the display off and on.
3. Volume up and down keys are updating the UI.
4. Device condition is shared in a video.
5. If we kill the homescreen with |adb shell kill pid|, the homescreen is not restarted.


https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/commit/?h=mozilla/v2.0&id=0a864988f5dce7f9f3dea9609e8ef054679c30ff
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/commit/?h=mozilla/v2.0&id=745b486db495248e4d4503039e374cb8d5bb244f


I know that the STR will be very difficult to reproduce. But if you give us a patch that produces additional logs, it may help you analyze this further.
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.0?
Whiteboard: [CR 703260] → [caf priority: p1][CR 703260]
Whiteboard: [caf priority: p1][CR 703260] → [caf priority: p1][CR 703260][systemsfe]
blocking-b2g: 2.0? → 2.0+
08-01 15:58:24.585 10098 10098 E GeckoConsole: Content JS ERROR at app://verticalhome.gaiamobile.org/gaia_build_defer_index.js:392 in GridItem.prototype.doRenderIcon/<: Error fetching icon [Exception... "File error: Not found"  nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)"  location: "JS frame :: app://verticalhome.gaiamobile.org/gaia_build_defer_index.js :: fetchBlob/< :: line 374"  data: no]
Seems like this is either going to be something in ContentParent again or the system app. Moving components, but will keep ownership for now unless someone wants to steal.
Component: Gaia::Homescreen → Gaia::System::Window Mgmt
Can we add qawanted here for verification?

Ideally I would like to see a video/screenshot and potentially reduced STR if possible.
Keywords: qawanted
I was unable to reproduce this issue on Buri 2.0 and Flame 2.0 devices.

Environmental Variables:
Device: Flame 2.0
BuildID: 20140804060132
Gaia: 4ab7384db7aee130be165a699472cc19405a4456
Gecko: 5e94ab16ec71
Version: 32.0 (2.0) 
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0


Environmental Variables:
Device: Buri 2.0
BuildID: 20140804060132
Gaia: 4ab7384db7aee130be165a699472cc19405a4456
Gecko: 5e94ab16ec71
Version: 32.0 (2.0) 
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Tapas - we are having trouble reproducing this bug. Can you provide the build id of where you first saw this issue?
Flags: needinfo?(jmitchell) → needinfo?(tkundu)
I also tried very hard to reproduce this with today's build, but no luck.
How old is your build?
(In reply to Joshua Mitchell [:Joshua_M] from comment #7)
> Tapas - we are having trouble reproducing this bug. Can you provide the
> build id of where you first saw this issue?

I provided the exact gaia/gecko SHA1s in Comment 0 when I first saw this bug.

Here are the SHA1s of the gaia/gecko builds that we tested and on which we did not see this issue:

https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gecko/tag/?h=mozilla/v2.0&id=AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.043
https://www.codeaurora.org/cgit/quic/lf/b2g/mozilla/gaia/tag/?h=0a25eff60fcc699687e45ba2ac8b9a3ab3782672&id=AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.043
Flags: needinfo?(tkundu)
(In reply to Tapas Kumar Kundu from comment #9)
> (In reply to Joshua Mitchell [:Joshua_M] from comment #7)
> > Tapas - we are having trouble reproducing this bug. Can you provide the
> > build id of where you first saw this issue?
> 
> I provided exact gaia/gecko SHA1's in Comment 0 when I first saw this bug. 


Yes, I saw that, and while I appreciate that the info was included in the original write-up, different departments operate differently. When we (QA-Wanted) cannot reproduce an issue in the latest build, we try to go to the exact build # that the reporter was using. Unfortunately, QA-Wanted does not have the ability to take a Gaia and Gecko commit # and cross-reference them to pull an applicable build # from that data.
(In reply to Joshua Mitchell [:Joshua_M] from comment #10)
> (In reply to Tapas Kumar Kundu from comment #9)
> > (In reply to Joshua Mitchell [:Joshua_M] from comment #7)
> > > Tapas - we are having trouble reproducing this bug. Can you provide the
> > > build id of where you first saw this issue?
> > 
> > I provided exact gaia/gecko SHA1's in Comment 0 when I first saw this bug. 
> 
> 
> Yes, I saw that and while I appreciate that that info was included in the
> original write-up different departments operate differently. When we
> (QA-Wanted) can not reproduce an issue in the latest we try to go to the
> exact build # that the reporter was using. Unfortunately QA-Wanted does not
> have the ability to take a Gaia and Gecko commit # and cross-reference them
> in some way to pull an applicable build # from that data.

I don't think this is a type of bug we are going to be able to reproduce manually, as it comes across as a bug seen only in a stress-style automation setup. Our best option here is to set up logging, reproduce the bug, and diagnose it via that avenue.
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+][lead-review+]
Keywords: qawanted
I saw an error message in the log:
08-01 15:51:06.858   226   226 E GeckoConsole: [JavaScript Error: "TypeError: this.element is null" {file: "app://system.gaiamobile.org/js/app_window.js" line: 1277}]

I guess it's when some appWindow is resizing but cannot get this.element since it's already killed. bug 1041467 may have solved it.
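
The hypothesis above can be illustrated with a minimal sketch: a window whose `element` becomes null once the app is killed, and a `resize` that returns early instead of dereferencing it. `FakeAppWindow` and its fields are hypothetical stand-ins for illustration, not the code in app_window.js.

```javascript
// Hypothetical stand-in for an app window; not the real AppWindow from Gaia.
class FakeAppWindow {
  constructor() {
    // A plain object stands in for the DOM node.
    this.element = { style: {} };
  }

  kill() {
    // After a kill the element is gone.
    this.element = null;
  }

  resize(width, height) {
    if (!this.element) {
      // Nothing left to resize; avoids "this.element is null".
      return false;
    }
    this.element.style.width = width + 'px';
    this.element.style.height = height + 'px';
    return true;
  }
}
```

With such a guard, a resize that arrives after the kill is ignored rather than throwing.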
Assignee: nobody → kgrandon
Target Milestone: --- → 2.1 S2 (15aug)
(In reply to George Duan [:gduan] [:喬智] from comment #12)
> I saw a error msg in the log ...
> 08-01 15:51:06.858   226   226 E GeckoConsole: [JavaScript Error:
> "TypeError: this.element is null" {file:
> "app://system.gaiamobile.org/js/app_window.js" line: 1277}]
> 
> I guess it's when some appWindow is resizing but cannot get this.element
> since it's already killed. bug 1041467 may have solved it.

Thanks for the comment George. Going to mark as a dependency then based on your hunch.
Depends on: 1041467
Tapas - Since we feel that bug 1041467 may have solved this, can you check to see if this reproduces with the patch from that bug? Thank you!
Flags: needinfo?(tkundu)
(In reply to Kevin Grandon :kgrandon from comment #14)
> Tapas - Since we feel that bug 1041467 may have solved this, can you check
> to see if this reproduces with the patch from that bug? Thank you!

I saw that the fix has already landed in v2.0. Our test team is testing a build which contains the fix from that bug. We will update you ASAP. Thanks for your help.
Mozilla MTBF team has completed a verification on this bug. It should not happen again. Please let me or pyang@mozilla.com know about it if you are still able to reproduce it.
Actually, I'm not sure this was fixed, since the symptoms are not the same. We will keep running the test and see if it happens again.
Attached file logset1
(In reply to Kevin Grandon :kgrandon from comment #14)
> Tapas - Since we feel that bug 1041467 may have solved this, can you check
> to see if this reproduces with the patch from that bug? Thank you!

It has happened again with the fix from that bug, and it's not a memory leak.

The homescreen main thread has the following log in logcat:
01-07 23:36:45.349  9723  9723 I GeckoIPC: [time:603405363254][9723<-232][PHalChild] Received Msg_NotifyWakeLockChange([TODO])

The homescreen main thread is not hung when this happens. We also have APZ and IPC logs in logcat. Could you please check whether any b2g IPC request is hanging?
Flags: needinfo?(tkundu)
Flags: needinfo?(kgrandon)
Attached file logset2
1) The homescreen main thread is not hung.


01-09 04:15:27.059  5461  5461 I GeckoIPC: [time:706527059640][5461<-238][PHalChild] Received Msg_NotifyWakeLockChange([TODO])

2) There are two preallocated processes running on the device, which is very strange. This should have been resolved by bug 1038854, but we are hitting the same issue again. @Sotaro: could you please advise on this?

3) No process is using high CPU.
4) Not a memory leak.

5) We killed the homescreen manually using |adb shell kill | after the issue was reproduced, but the homescreen was not restarted.
6) We also have IPC and APZ logs in logcat. Can anyone check whether the b2g IPC thread is hanging?
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(ikeda.sohtaroh)
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(ikeda.sohtaroh)
Flags: needinfo?(sotaro.ikeda.g)
Please also check the APZ logs and let us know whether you find anything interesting in them.
Tapas - I would like to rule out that this has the same root cause as bug 1050423. I think there are a few things we can do to check. Can we try the following:

1 - Check with the patches in bug 1050423 (R+'d and should be landing soon), to see if this issue reproduces.
2 - Get a memory report when this happens. Perhaps using get_about_memory.py?
Flags: needinfo?(kgrandon) → needinfo?(tkundu)
(In reply to Kevin Grandon :kgrandon from comment #21)
> Tapas - I would like to rule out that this is not the same root cause as bug
> 1050423. I think there are a few things we can do to check. Can we try the
> following:
> 
> 1 - Check with the patches in bug 1050423 (R+'d and should be landing soon),
> to see if this issue reproduces.
> 2 - Get a memory report when this happens. Perhaps using get_about_memory.py?

Thanks for your help. We will try this and update you asap.
(In reply to Kevin Grandon :kgrandon from comment #21)
> Tapas - I would like to rule out that this is not the same root cause as bug
> 1050423. I think there are a few things we can do to check. Can we try the
> following:
> 
> 1 - Check with the patches in bug 1050423 (R+'d and should be landing soon),
> to see if this issue reproduces.
> 2 - Get a memory report when this happens. Perhaps using get_about_memory.py?

I attached the b2g-ps log for comment 19.

I have the following questions:
1) I am seeing multiple (4) preallocated processes at timestamp 1970-01-09 04:07:37 in b2g-ps-XXXX.txt.
   I don't see such a thing on a normal device. Could you please also find out what is causing this?
2) Two or more apps are running with OOM_ADJ=2 after timestamp 1970-01-09 04:07:18 in the attached log.
   Could you please tell us why we are putting 2-3 apps in the foreground?

   I also see these two issues in bug 1050751 comment 12. Can we also find out what can cause this in gecko?
Kyle - Any ideas about comment 23, or what kind of logging we can add to help debug this?


(In reply to Tapas Kumar Kundu from comment #23)
>    I don't see such thing on normal device.

This is a bit concerning to me - are these not being run on a normal device?
Flags: needinfo?(khuey)
(In reply to Kevin Grandon :kgrandon from comment #24)
> This is a bit concerning to me - are these not being run on a normal device?

Just to clarify the term "normal device" that I used in the above comment: I meant to say that I don't see 2 FFOS apps running with OOM_ADJ=2 (foreground app) on my own device.

But our test team also runs tests on the SAME device; they have different stability tests, which they run manually for a very long time. So customers can see it if they use the device for a long time.
The first thing I would do is turn on the logging in the process priority manager.  That can be done by uncommenting the line at http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ProcessPriorityManager.cpp#46.

Let me know if you would like that in patch form.

Please confirm that the logs from comment 19 were taken with the final fixes for bug 1038854 landed.  The homescreen appears to have failed to completely fork from Nuwa because the USS/PSS is so low.  One of the preallocated processes is in the same state.  Please attach to these processes in gdb and provide the output of the command 'thread apply all bt'.
Flags: needinfo?(khuey)
Please note that this could be related to another bug 1050751 we have filed which also shows multiple preallocated processes and multiple foreground apps.
(In reply to Inder from comment #27)
> Please note that this could be related to another bug 1050751 we have filed
> which also shows multiple preallocated processes and multiple foreground
> apps.

The issue of seeing multiple pre-allocated processes definitely seems worth adding as a dependency here to help us debug the cause of this issue.
Depends on: 1050751
Attached file bt.txt
I caught the phone in a state where I can't wake it up again. The screen stays black even if I press the power button.

Attached the backtraces of all processes.
Kyle, anything interesting in there?
Flags: needinfo?(khuey)
Attached file log.txt
logcat as well.
E/GeckoConsole( 1355): Content JS WARN at app://system.gaiamobile.org/js/app_window.js:1363 in AppWindow.prototype.lockOrientation: screen.mozLockOrientation() returned false for app://verticalhome.gaiamobile.org orientation default
This warning comes from manually killing the home screen.
There's nothing obvious there.  All of the threads in all of the processes are just waiting for events.  It might be interesting to trace through the code that is supposed to handle receiving the power button press to see what happens.

Note that the b2g-info output is not like the ones CAF has.
Flags: needinfo?(khuey)
(In reply to Tapas Kumar Kundu from comment #19)
> Created attachment 8470030 [details]
> logset2
> 
> 1) homescreen main thread is not hanged.
> 
> 
> 01-09 04:15:27.059  5461  5461 I GeckoIPC:
> [time:706527059640][5461<-238][PHalChild] Received
> Msg_NotifyWakeLockChange([TODO])
> 
> 2) There is two preallocated process running on device which is very
> strange. It should be resolved by bug 1038854. But we are hitting same issue
> again. @Sotaro : COuld you please suggest on this ? 
> 
> 3) No one is using high CPU
> 4) Not a memeleak.
> 
> 5) We killed homescreen manually using |adb shell kill | after this issue is
> reproduced. But homescreen is not restarted. 

Well, we won't restart the homescreen if it's not the active app.
We will restart it if it's killed and it is going to be opened.

But if homescreen is already opened, the restart won't be invoked.

If you could use AppManager, go to system app, and type |homescreenLauncher.getHomescreen.ensure(true);| to see what happens.
Flags: needinfo?(sotaro.ikeda.g)
I am waiting for our internal team to reproduce this bug again with the logging from comment 26.
I will update as soon as they reproduce this bug with those logs. 
I am also trying to make another remote gdb session for this bug.
Tapas pointed out to me yesterday, while we were looking at another issue, that the b2g-info here shows that the homescreen is swapped out, not that it's using no memory.  I missed the swap column when I was looking at these.  That means this is unlikely to be a Nuwa issue, so we're back to having no idea what is going on.

An example:

Every 5s: b2g-info                                          1970-01-09 04:15:39

                           |     megabytes    |
           NAME   PID PPID CPU(s) NICE USS  PSS  RSS SWAP VSIZE OOM_ADJ USER     
            b2g   238    1  910.5    0 9.8 10.7 12.9 40.6 252.2       0 root     
         (Nuwa)  1147  238    3.8    0 0.0  0.1  0.6  7.8  54.1       0 root     
          Video  2376 1147   16.3   18 0.8  0.9  1.8 14.0  85.5      11 u0_a2376 
 Search Results  3405 1147    6.5   18 4.6  5.1  6.7  7.3  66.2      10 u0_a3405 
         Camera  3711 1147   11.1    7 4.1  4.9  7.1 12.7  77.4       6 u0_a3711 
     Homescreen  5461 1147   21.0    1 0.0  0.2  1.1 16.0  77.2       2 u0_a5461 
(Preallocated a 11634 1147    1.8   18 0.0  0.1  0.8 11.0  61.2       1 u0_a11634
(Preallocated a 11635  238    3.1    1 2.8  3.5  5.6  8.3  67.1       2 u0_a11635
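
The reading above (a process that is swapped out rather than using no memory) can be checked mechanically. The following is a rough sketch that parses one data row of the b2g-info sample and flags rows with almost no USS but a visible SWAP value. The function names and the thresholds are assumptions made for illustration; they are not part of b2g-info.

```javascript
// Parse one data row of b2g-info output. The column order is taken from the
// sample above; treating the last 11 fields as fixed is an assumption that
// lets the NAME column contain spaces.
function parseRow(line) {
  const parts = line.trim().split(/\s+/);
  const fields = parts.slice(-11);
  return {
    name: parts.slice(0, -11).join(' '),
    pid: Number(fields[0]),
    uss: Number(fields[4]),
    swap: Number(fields[7]),
    oomAdj: Number(fields[9]),
  };
}

// Hypothetical heuristic: almost nothing resident, but a visible swap footprint.
function looksSwappedOut(row) {
  return row.uss < 0.5 && row.swap > 1.0;
}

const row = parseRow(
  '     Homescreen  5461 1147   21.0    1 0.0  0.2  1.1 16.0  77.2       2 u0_a5461 '
);
console.log(row.name, 'swapped out?', looksSwappedOut(row));
```

Reading the SWAP column next to USS in this way avoids mistaking a swapped-out process for one that failed to fork from Nuwa.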
Flags: in-moztrap?(dharris)
Test case needs to be written to cover this issue
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?][lead-review+]
Flags: needinfo?(ktucker)
Tapas - do you have the ability to use the app manager in this state? Can you try the javascript in comment 32 inside of the app manager console to see what the state is?

homescreenLauncher.getHomescreen.ensure(true);
We have additional logs and live gdb session for this issue. Tapas is going to upload the logs and contact Kevin to look at the gdb session.
Attached file mozilla_logs.tar.bz2
another set of logs from affected device
Not sure if it's relevant, but one of the Preallocated processes was launched from b2g, and one of the preallocated processes was launched from Nuwa.
(In reply to Dave Hylands [:dhylands] from comment #39)
> Note sure if its relevant, but one of the Preallocated processes was
> launched from b2g, and one of the preallocated processes was launched from
> Nuwa.

I discussed this with Kyle. It seems to be a bug, although it may or may not be related to this bug. Could you please create a new bug for this?
Flags: needinfo?(dhylands)
screenshot of affected device
In this case the homescreen app has both the "active" and "render" classes, and the iframe does not contain the "hidden" class. We are left with two active apps open at the same time.
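
A state like the one described above could be detected with a small check over the class lists of the app windows. This is a sketch only: `findActiveWindows` is a hypothetical helper, and the snapshot data is made up to mirror the description (homescreen and camera both carrying "active"), not captured from the device.

```javascript
// Hypothetical helper: each entry describes one app window and its CSS classes.
function findActiveWindows(windows) {
  return windows
    .filter((w) => w.classes.includes('active'))
    .map((w) => w.name);
}

// Illustrative snapshot only; not taken from the attached logs.
const snapshot = [
  { name: 'homescreen', classes: ['active', 'render'] },
  { name: 'camera', classes: ['active'] },
  { name: 'video', classes: [] },
];

const active = findActiveWindows(snapshot);
if (active.length > 1) {
  console.log('more than one active window:', active.join(', '));
}
```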
FULL logs from affected device after pressing home button again
Depends on: 1053990
Blocks: 1054011
(In reply to Tapas Kumar Kundu from comment #41)
> (In reply to Dave Hylands [:dhylands] from comment #39)
> > Note sure if its relevant, but one of the Preallocated processes was
> > launched from b2g, and one of the preallocated processes was launched from
> > Nuwa.
> 
> I discussed with kyle. It seems to be a bug although it may/may not be
> related to this bug . Could you please create a new bug for this ?

I filed bug 1054011
Flags: needinfo?(dhylands)
kgrandon and I debugged this on-site with tkundu and found the following:

There are *two* 'active' apps, the homescreen and the camera.  The homescreen is being obscured by the camera.  That is, the homescreen iframe is behind the camera's iframe when painting, so it is not visible.  The camera app has crashed at some point in the past, so there are no contents of the iframe to paint, which gets us our black screen.  It appears we did successfully exit fullscreen mode so the status bar is visible.

When the camera app crashed Gecko should have fired a mozbrowsererror event at the iframe of the camera app.  That was Gaia's opportunity to remove the Camera app iframe from the DOM and clean up.  Based on the cycle collector log, we believe that Gecko fired the mozbrowsererror event.  The event listener added at https://github.com/mozilla-b2g/gaia/blob/v2.0/apps/system/js/app_window.js#L423 is visible as 0xadb49880 in the CC log, and it is attached to the event listener manager for the Camera app iframe's containing <div>.

Because of bug 1053990 we cannot use the continued presence of the event listener as proof that the event listener never fired.  But we can see that if destroy were called, the node would have been removed from the DOM or an exception would have been thrown.  Since there are no exceptions that appear to be relevant in the logcat provided and the node is still in the DOM, we believe that the event listener never fired.  The "_closed" event is driven off of a "swipeout" event fired from stack_manager.js.  We discontinued tracing the code at this point.

We do not understand why stack_manager.js did not fire the event, because the relevant state is not preserved after app transitions.  We can, however, off a timer or something, force that event to fire if stack_manager.js is not sending us the event.  This is incredibly hacky, but we're in the 11th hour of the release here and we don't have time to trace this further.  Based on the information that we gathered here, kgrandon was able to simulate the conditions that led to being in this state and reproduce the issue.  We were then able to verify that our proposed fix of firing the event via a "backup" method solves the issue.
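
The "backup" approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch that landed: `guardClose`, `onClosed`, and the injectable `schedule` parameter are hypothetical names, and the real system app wires its events differently.

```javascript
// Sketch of a "backup" close guard. All names here are hypothetical.
// `schedule` defaults to setTimeout and can be replaced by a fake in tests.
function guardClose(win, timeoutMs, schedule = setTimeout) {
  let done = false;

  const finish = (reason) => {
    if (done) {
      return; // destroy at most once, whichever path gets there first
    }
    done = true;
    win.destroy(reason);
  };

  // Normal path: the transition reports that the window has closed.
  win.onClosed = () => finish('event');

  // Backup path: if the event never arrives, destroy after the timeout.
  schedule(() => finish('timeout'), timeoutMs);
}
```

The `done` flag means a late event after the timeout, or a late timeout after the event, does nothing, so the window is never destroyed twice.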
No longer blocks: CAF-v2.0-CC-metabug, 1054011
blocking-b2g: 2.0+ → ---
Grrr. Wrong bug restoring
blocking-b2g: --- → 2.0+
No longer depends on: 1053990
After further investigation it appears that the correct event origin should be from app_transition_controller.

Furthermore we've determined that it's possible to put the device in the same state.  If you early return before this.app.publish(state);, then kill the process in adb shell you will get into the same state. E.g., 

if (state === 'closed') { return; }
this.app.publish(state);
No longer blocks: 1054011
See Also: → 1054011
I am working on a test for this now.
Comment on attachment 8473245 [details] [review]
Github pull request - Use manual timeout to guard _closed event from not firing

A test is added now. Alive or Tim - could you guys review this one? Somehow the transition is being missed, I'm not sure how. This will guard the transition with a call to destroy the element after a timeout.

We should find the root cause, but it seems like we should probably have these guards for all transition listeners in the system app.
Attachment #8473245 - Flags: review?(timdream)
Attachment #8473245 - Flags: review?(alive)
QA Whiteboard: [QAnalyst-Triage?][lead-review+] → [QAnalyst-Triage?][lead-review+][2.0-signoff-need+]
Duplicate of this bug: 1027626
Duplicate of this bug: 1019419
Duplicate of this bug: 1045980
Duplicate of this bug: 1031225
No longer depends on: 1050751
Duplicate of this bug: 1050751
Summary: Black screen is displayed in place of home screen. → Home screen does not load after pressing home button
No longer blocks: 1045980
Duplicate of this bug: 1045980
This needs to be r+d today.
Flags: needinfo?(timdream)
Flags: needinfo?(alive)
Comment on attachment 8473245 [details] [review]
Github pull request - Use manual timeout to guard _closed event from not firing

Alive should r+ this.
Attachment #8473245 - Flags: review?(timdream) → feedback+
Flags: needinfo?(timdream)
Hi Kevin,
  1) Some of the bugs you duplicated are on v1.4. Will you uplift the patch to v1.4?

  2) When working on bug 1031225, I found another issue. The log is:
E GeckoConsole: [JavaScript Error: "TypeError: this.app.element is null" {file: "app://system.gaiamobile.org/js/app_transition_controller.js" line: 156}]

This happens because, when the Home button is pressed to return to idle in a low-memory situation, the homescreen is killed before the app transition handler handles it. See bug 1031225 comment 71 through comment 81 for more detail. Could this patch also work for that issue?

I hope to hear from you, and thank you very much!
Flags: needinfo?(kgrandon)
Comment on attachment 8473245 [details] [review]
Github pull request - Use manual timeout to guard _closed event from not firing

r+ for a blocker. This is something in my mind as well.
I think for long term we need to figure out why the closed is not triggered.
Attachment #8473245 - Flags: review?(alive) → review+
Flags: needinfo?(alive)
(In reply to yang.zhao from comment #60)
> Hi,Kevin
>   1)Some of the bug you duplicate is on v1.4, will you uplift the patch to
> v1.4?

This is currently not blocking 1.4, so we should check with Bhavana to see what the process for uplifting it is. I don't immediately want to mark as 1.4? because then we will lose the 2.0+.
Flags: needinfo?(kgrandon) → needinfo?(bbajaj)
I don't think this can uplift cleanly on 1.4.
(In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from comment #64)
> I don't think this can uplift cleanly on 1.4.

The fix is fairly trivial to make in 1.4 as well: https://github.com/mozilla-b2g/gaia/blob/v1.4/apps/system/js/app_window.js#L405

It should not be difficult to write a patch if we need to.
(In reply to Kevin Grandon :kgrandon from comment #65)
> (In reply to Tim Guan-tin Chien [:timdream] (MoCo-TPE) (please ni?) from
> comment #64)
> > I don't think this can uplift cleanly on 1.4.
> 
> The fix is fairly trivial to make in 1.4 as well:
> https://github.com/mozilla-b2g/gaia/blob/v1.4/apps/system/js/app_window.
> js#L405
> 
> It should not be difficult to write a patch if we need to.

I suspect this could fix the issue on dolphin..
I think this patch could fix some issues on dolphin. Could anyone help uplift it to v1.4? Thank you!
Apparently this was backed out for causing some errors during UI testing. It should be possible to guard against these errors, so I will submit another patch.

E/GeckoConsole( 4262): [JavaScript Error: "TypeError: this.app.element is null" {file: "app://system.gaiamobile.org/js/app_transition_controller.js" line: 324}]

https://github.com/mozilla-b2g/gaia/commit/d0d773c277a9105288ee35da2121f4ae62709be8
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hey Alive - could you take another look?

The only change here is to do another similar guard as we do everywhere else in this file, by checking on app.element. This change is here: https://github.com/mozilla-b2g/gaia/pull/22928/files#diff-f70ce6c514b65fb6094cdde4d5382495R313
Attachment #8473678 - Flags: review?(alive)
Comment on attachment 8473678 [details] [review]
Github pull request - Take 2

This is the original patch with the single-line change to follow the convention of *everywhere* else in the file. I am fine carrying the original review here, and r=me for the new line of code. We will use bug 1031225 to examine the app_transition error in more detail.
Attachment #8473678 - Flags: review?(alive) → review+
Attachment #8473245 - Attachment is obsolete: true
Master: https://github.com/mozilla-b2g/gaia/commit/6f6df701391eccac1f38cb506b2fcf84e862323c

Since we failed to revert this from 2.0, I've cherry-picked both the revert and the new uplift to streamline the patching process for partners.

2.0 revert: https://github.com/mozilla-b2g/gaia/commit/a5283446aab3d4e730973ed3fc1de69b7e874c08
2.0 re-landing: https://github.com/mozilla-b2g/gaia/commit/5b7daa13fb14325f3571049118ff7e5ad1463c22
Status: REOPENED → RESOLVED
Closed: 5 years ago → 5 years ago
Resolution: --- → FIXED
Alive - After looking through some more code I feel that the problem might be happening when we OOM kill an app in the "closing" state. One thing that seems possible is that we call .kill() twice, and the second time isActive thinks it is false and we miss the closing event.

If this is happening, it should be possible to slow down the closing to a minute or so, then simulate an OOM kill event.

Adding a 'closing' check here might also be prudent: https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/app_window.js#L316
Flags: needinfo?(alive)
After the re-landing I am being informed that we are still seeing app.element errors in the logs. The problem is that there are several places where we might access the app.element object after it has been removed; this patch adds additional guards against that. I've gone through the entire file and I don't believe there are more occurrences of this. There could always be code outside of the transition controller that does this, though, so let's see what this does.

This pull request in question backs out the original commit, and relands with the additional guard for ease of uplifting and sheriffing.
Attachment #8473925 - Flags: review+
(In reply to Kevin Grandon :kgrandon from comment #74)
> Created attachment 8473925 [details] [review]
> Pull request - additional guard in app transition controller

As discussed on IRC, we have already stopped importing gaia/gecko code. Could you please provide us with a patch for v2.0?
Flags: needinfo?(tkundu) → needinfo?(kgrandon)
I am keeping this open until we get a patch for v2.0 :) If possible, please get it reviewed too :)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Re-landing due to additional logged problems being reported by our QA.

Re-landed in master: https://github.com/mozilla-b2g/gaia/commit/84ea29dc6b53732d6ed6665c70a6d0bc36a625db
Re-landed in v2.0: https://github.com/mozilla-b2g/gaia/commit/fb2dd31abed2803eb7ad67eb4c52abb48de1e0f7
Tapas - you will want to apply the latest patch in v2.0. This patch can be found here: https://github.com/mozilla-b2g/gaia/commit/7c0a9b420320ead6c0183cb0352b7bcbf24c73f2
Flags: needinfo?(kgrandon) → needinfo?(tkundu)
Status: REOPENED → RESOLVED
Closed: 5 years ago → 5 years ago
Resolution: --- → FIXED
(In reply to Kevin Grandon :kgrandon from comment #73)
> Alive - After looking through some more code I feel that the problem might
> be happening when we OOM kill an app in the "closing" state. One thing that
> seems possible is that we call .kill() twice, and the second time isActive
> thinks it is false and we miss the closing event.
> 
> If this is happening, it should be possible to slow down the closing to a
> minute or so, then simulate an OOM kill event.
> 
> Adding a 'closing' check here might also be prudent:
> https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/app_window.
> js#L316

I will follow-up on this in bug 1031225 - seems more relevant there now.
Duplicate of this bug: 990975
According to Kevin's comment in bug 990975, this bug might also be affecting v1.3.
Clearing ni. Keep contact in bug 1031225.
Flags: needinfo?(alive)
Test case added in moztrap:

https://moztrap.mozilla.org/manage/case/14356/
QA Whiteboard: [QAnalyst-Triage?][lead-review+][2.0-signoff-need+] → [QAnalyst-Triage+][lead-review+][2.0-signoff-need+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(dharris)
Flags: in-moztrap+
Attached file v1.4 branch patch
Comment on attachment 8475226 [details] [review]
v1.4 branch patch

Asking approval per bug 1031225 comment 96.
Attachment #8475226 - Flags: approval-gaia-v1.4?
Attachment #8475226 - Flags: approval-gaia-v1.4? → approval-gaia-v1.4+
Flags: needinfo?(bbajaj)
Duplicate of this bug: 1050751
Clearing NI. We are seeing the same issue again, and we are working on it in bug 1056216.
Flags: needinfo?(tkundu)
Issue is verified fixed in Flame 2.1, 2.0, 1.4  builds (Full Flash, nightly, 319 MB memory). 

Actual Results: Pressing Home takes user to the homescreen.  

Device: Flame 2.1
BuildID: 20141028001203
Gaia: a0174f7166745256aaca1cb3aa9f894033fbffa6
Gecko: 43bda3541f6b
Gonk: 6e51d9216901d39d192d9e6dd86a5e15b0641a89
Version: 34.0 (2.1)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Device: Flame 2.0
BuildID: 20141029000205
Gaia: 9f5b6f025e528fabfcc068782cb9b492cb51a7f9
Gecko: de8cfd54bf93
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 32.0 (2.0)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Device: Flame 1.4
BuildID: 20140922000333
Gaia: efa2b8cb095407df942fee7732a5547c7034ef9b
Gecko: 02154a103d43
Gonk: 
Version: 30.0 (1.4)
Firmware: V123
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+][lead-review+][2.0-signoff-need+] → [QAnalyst-Triage?][lead-review+][2.0-signoff-need+]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?][lead-review+][2.0-signoff-need+] → [QAnalyst-Triage+][lead-review+][2.0-signoff-need+]
Flags: needinfo?(ktucker)