Closed Bug 989713 Opened 10 years ago Closed 10 years ago

[tarako]MMS/Email/Contact may be killed by lockscreen

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:1.3T+, b2g-v1.3 wontfix, b2g-v1.3T affected, b2g-v1.4 unaffected, b2g-v2.0 unaffected)

RESOLVED WORKSFORME
2.0 S2 (23may)
blocking-b2g 1.3T+
Tracking Status
b2g-v1.3 --- wontfix
b2g-v1.3T --- affected
b2g-v1.4 --- unaffected
b2g-v2.0 --- unaffected

People

(Reporter: james.zhang, Assigned: gsvelto)

References

Details

(Keywords: perf, regression, Whiteboard: [c=memory p= s= u=tarako], OOM, eta:4/18 [sprd316016 ])

Attachments

(9 files, 1 obsolete file)

In MMS/Email apps, lock screen and unlock screen, the apps has been killed.
This issue happen in tarako daily build 20140329.
http://bugzilla.spreadtrum.com/bugzilla/show_bug.cgi?id=294315
blocking-b2g: --- → 1.3T?
Flags: needinfo?(styang)
Flags: needinfo?(fabrice)
Summary: [tarako]MMS/Email will be killed by lockscreen → [tarako]MMS/Email may be killed by lockscreen
ni? thinker
can you have an initial look?T thanks 

James, can you please provide better STR? Thanks
Flags: needinfo?(tlee)
Flags: needinfo?(james.zhang)
Flags: needinfo?(fabrice)
(In reply to James Zhang from comment #1)
> http://bugzilla.spreadtrum.com/bugzilla/show_bug.cgi?id=294315

This link says:
1. configuring email account in Email app
2. press power button twice and see lockscreen
3. unlock screen

Expected result:
see email app

Actual result:
see homescreen, because email app has been killed.
Is there any other app in the background?
Flags: needinfo?(tlee)
Could you provide procrank for bugs of likes.  We almost need procrank info for this kind of bug.
The application is killed by LMK as it falls to oom_adj 10.
Flags: needinfo?(james.zhang)
Triage: potentially a regression. 1.3T+
to ting to investigate if this is due to recent LMK changes
Assignee: nobody → tchou
blocking-b2g: 1.3T? → 1.3T+
Keywords: regression
See Also: → 989710
Attached file profile_lock.html
Captured a profile, it shows when power button is pressed, b2g process does a restyle and screenshot.
(In reply to Ting-Yu Chou [:ting] from comment #8)
> Captured a profile, it shows when power button is pressed, b2g process does
> a restyle and screenshot.

Correction: b2g process does 2 restyle (CSS::ProcessRestyles()), which are from:

1. lock: function ls_lock(instant) {
     ...
     this.overlay.focus();

2. AppWindow.prototype.getScreenshot = function aw_getScreenshot(callback) {
     ...
     var req = this.iframe.getScreenshot(
       this.iframe.offsetWidth, this.iframe.offsetHeight);

The second one includes a reflow (PresShell::ProcessReflowCommands()) as well.
Trace the path 2 of comment 9 a bit more. There are two event handlers registered for hidewindow event in window_manager.js, and both will be triggered when visibility_manager.js handles lock event, I am wondering:

1. Isn't this redundant? Can they be unified?

   window.addEventListener('hidewindow', function() {
     if (displayedApp !== HomescreenLauncher.origin) {
       runningApps[displayedApp].setVisible(false);
     } else {
       HomescreenLauncher.getHomescreen().setVisible(false);
     }
   });

   window.addEventListener('hidewindow', function(e) {
     runningApps[displayedApp].setVisible(false, true);
   });

2. visibiltiy_manager.js set screenshoting false to the event's detail, but it is not considered in window_manager's event handler?

   case 'lock':
     ...
     if (!this._normalAudioChannelActive) {
       this.publish('hidewindow', { screenshoting: false });
Flags: needinfo?(alive)
Please remove the second event handler, sad bug.
Flags: needinfo?(alive)
For the path 1 of comment 9, the restyle is triggered from PresShell::FlushPendingNotifications():

  // The FlushResampleRequests() above flushed style changes.
  if (!mIsDestroying) {
    ...
    mPresContext->RestyleManager()->ProcessPendingRestyles();
  }

but FlushResampleRequests() the comment mentioned actually is not called since HasAnimationController() is false:

  // Flush any requested SMIL samples.
  if (mDocument->HasAnimationController()) {
    mDocument->GetAnimationController()->FlushResampleRequests();
  }

I will dive in to understand is the restyle necessary here.
Severity: major → blocker
Priority: -- → P1
Whiteboard: [c=memory p= s= u=]
Status: NEW → ASSIGNED
Keywords: perf
Whiteboard: [c=memory p= s= u=] → [c=memory p= s= u=tarako]
Comment on attachment 8400546 [details] [review]
PR - remove redundant hidewindow event handler

Ask Tim for reviewing as Alive is not around.
Attachment #8400546 - Flags: review?(alive) → review?(timdream)
Comment on attachment 8400546 [details] [review]
PR - remove redundant hidewindow event handler

r=timdream in place of Alive.
Attachment #8400546 - Flags: review?(timdream) → review+
1.3t: https://github.com/mozilla-b2g/gaia/commit/8d212f640014db62c622e8c939ddbf8595f00438

I am not entirely sure we need this on master or not. Alive can confirm.
status-b2g-v1.4: --- → ?
status-b2g-v2.0: --- → ?
Flags: needinfo?(styang) → needinfo?(alive)
I don't always see a LMK's killing message if the applicaiton goes away when I turn screen on/off!
(In reply to Ting-Yu Chou [:ting] from comment #17)
> I don't always see a LMK's killing message if the applicaiton goes away when
> I turn screen on/off!

Double checked, the message can be found from dmesg, seems adb logcat drop it.
Hi! James,

Just wonder this case is also fixed by 989572, patch provided by Ying?

--
Keven
Flags: needinfo?(james.zhang)
I think this case is fixed by Ting-Yu's patch.
Flags: needinfo?(james.zhang)
Whiteboard: [c=memory p= s= u=tarako] → [c=memory p= s= u=tarako], OOM
(In reply to Keven Kuo [:kkuo] from comment #19)
> Just wonder this case is also fixed by 989572, patch provided by Ying?

No. patches in 989572 just keep LMK  kill process properly.
And this bug was caused by LMK.

I noticed that you people upgrade the oom_adjust of prelaunch process, the value is the same with FOREGROUND_HIGH. It would cause the  prelaunch process always alive.

pref("hal.processPriorityManager.gonk.PREALLOC.OomScoreAdjust", 67);
pref("hal.processPriorityManager.gonk.PREALLOC.Nice", 18);
pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.OomScoreAdjust", 67);

I did some tests. I killed the prelaunch process manually, then test this bug. It could pass.
Hi! James, Ying,

Could we have a conclusion that this case is fixed by Ting-Yu's patch?
If yes, let's land the patch and close this case.
Thanks

--
Keven
(In reply to Keven Kuo [:kkuo] from comment #22)
> Hi! James, Ying,
> 
> Could we have a conclusion that this case is fixed by Ting-Yu's patch?
> If yes, let's land the patch and close this case.
> Thanks

No.
With heavy workload, entering the second item BIG-THREAD-MIXED, it happens still
(In reply to ying.xu from comment #21)
> I did some tests. I killed the prelaunch process manually, then test this
> bug. It could pass.

I would like to keep preallocated always alive.  I am considering to remove changes of https://bugzilla.mozilla.org/show_bug.cgi?id=974308#c62 .  It seems to cause a lot of problem.  Is there any one trying if removing https://bugzilla.mozilla.org/show_bug.cgi?id=974308#c62 fix issues of likes?
(In reply to Thinker Li [:sinker] from comment #24)

> I would like to keep preallocated always alive.  I am considering to remove
> changes of https://bugzilla.mozilla.org/show_bug.cgi?id=974308#c62 .  It

The patch is OK, It just did not have very obvious improvement ,but should have no side effect.

To improve capabilities of background app running, I intend to disable only prelaunch process and keep nuwa.
From what I saw, the RSS of prelaunch process was about 6 MB or more. That's a lot of memory.
And the speed of launching app was faster than android2.3.5,
Maybe we can scarifies app launch time to improve capabilities of background.
(In reply to ying.xu from comment #25)
> The patch is OK, It just did not have very obvious improvement ,but should
> have no side effect.

I feel that change never being well test with good numbers in result.
Could you explain what the obvious improvement is?  I need this information to judge objectively.

We have argued, for a long while, of the approach of fixing these memory issues with endless of adjustments of LMK.  It causes a lot side-effect.  We are looking for new solutions, include a thrashing detector, to fix the problem more exactly.
(In reply to Thinker Li [:sinker] from comment #26)
> (In reply to ying.xu from comment #25)
> I feel that change never being well test with good numbers in result.
> Could you explain what the obvious improvement is?  I need this information
> to judge objectively.

I think I could understand the original idea of the patch.But the test result was not so good.So ,no obvious improvement.

ffos@ffos2:~/yingxu/6821/device/sprd$ git show b8ee760324a8b9d714fa761bed2b3ad823f8e9a5
commit b8ee760324a8b9d714fa761bed2b3ad823f8e9a5
Author: james.zhang <james.zhang@spreadtrum.com>
Date:   Tue Mar 25 21:53:03 2014 +0800
    Bug#269156 merge patch from mozilla, Bug 974308 - [tarako] Study memory thrashing and find out a proper way to hand
diff --git a/sp6821a_gonk/dev-pref.js b/sp6821a_gonk/dev-pref.js
index 080c14c..b9b222f 100644
--- a/sp6821a_gonk/dev-pref.js
+++ b/sp6821a_gonk/dev-pref.js
@@ -15,9 +15,11 @@ pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderKB", 2048);
 
 pref("hal.processPriorityManager.gonk.FOREGROUND.KillUnderKB", 4096);
 
-pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.KillUnderKB", 6144);
+pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.KillUnderKB", 18432);
 
-pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 20480);
+pref("hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.KillUnderKB", 18432);
+
+pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 24576);
 
 pref("hal.processPriorityManager.gonk.notifyLowMemUnderKB", 14336);
There's no window manager in 1.4/1.5 hence not affected.
Flags: needinfo?(alive)
(In reply to ying.xu from comment #27)
> I think I could understand the original idea of the patch.But the test
> result was not so good.So ,no obvious improvement.

I don't get the point.  Do you mean to revert the patch b8ee760324a8b9d714fa761bed2b3ad823f8e9a5 improves little?
(In reply to Thinker Li [:sinker] from comment #29)
> I don't get the point.  Do you mean to revert the patch
> b8ee760324a8b9d714fa761bed2b3ad823f8e9a5 improves little?

yes. And the value has been changed. No need to revert the commit b8ee760324a8b9d714fa761bed2b3ad823f8e9a5
(In reply to ying.xu from comment #30)
> yes. And the value has been changed. No need to revert the commit
> b8ee760324a8b9d714fa761bed2b3ad823f8e9a5

So you're saying you have changed the min_free, and after the change you don't meet the issue even you're using heavy workload, is my understanding correct? If it is correct, could you please let me know the min_free you have now?
Flags: needinfo?(ying.xu)
(In reply to Ting-Yu Chou [:ting] from comment #31)
> So you're saying you have changed the min_free, and after the change you
> don't meet the issue even you're using heavy workload, is my understanding
> correct? If it is correct, could you please let me know the min_free you
> have now?

The issue still happens. 
We just lower value of the min_free for music/FM app, keep them alive at background.
Flags: needinfo?(ying.xu)
Memory pressure (low-memory) will be notified when the application moves to background. Thinker proposes an idea to suppress it if the visibility change is due to lock screen.
(In reply to ying.xu from comment #32)
> The issue still happens. 
> We just lower value of the min_free for music/FM app, keep them alive at
> background.

min_free of oom_adj 10 was also changed (24576KB -> 18432KB), but can still reproduce the issue (as the steps of comment 23 without background application).
(In reply to Ting-Yu Chou [:ting] from comment #33)
> Memory pressure (low-memory) will be notified when the application moves to
> background. Thinker proposes an idea to suppress it if the visibility change
> is due to lock screen.

I tried

  if (aPriority < PROCESS_PRIORITY_FOREGROUND) {
      //unused << mContentParent->SendFlushMemory(NS_LITERAL_STRING("low-memory"));
  }

but it does not help, Email was still killed.
base on the discussions, 1.3T is still not fixed
Whiteboard: [c=memory p= s= u=tarako], OOM → [c=memory p= s= u=tarako], OOM, eta:4/18
Attached file profile_sms.html
Captured profile of reading BIG-THREAD-MIXED thread, it shows the messages of the thread and the threads list are rendered simultaneously. The application keeps loading threads/messages even after set it background, and will be likely killed in this case.
Attachment #8400450 - Attachment description: profile.html → profile_lock.html
Attachment #8400450 - Attachment filename: profile.html → profile_lock.html
Thinker suggested to monitor the memory consumption caused by lock screen from uss/pss/rss. Following is what I got from b2g-info before/after pressing power button to lock screen (foreground application is homescreen):

<before>
           NAME PID PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
            b2g  84    1   34.1    0 36.0 40.0 47.9 131.5       0 root
         (Nuwa) 330   84    0.9    0  3.3  6.0 12.5  51.3       0 root
     Homescreen 371  330    4.2    1  9.7 14.1 23.3  67.7       2 app_371
(Preallocated a 428  330    0.8   18  5.1  8.2 15.9  58.3       1 root

<after>
           NAME PID PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
            b2g  84    1   34.4    0 36.1 40.1 48.0 130.5       0 root
         (Nuwa) 330   84    0.9    0  3.3  6.0 12.5  51.3       0 root
     Homescreen 371  330    4.2   18  9.3 13.7 22.9  66.4       8 app_371
(Preallocated a 428  330    0.8   18  5.1  8.2 15.9  58.3       1 root

<after>
           NAME PID PPID CPU(s) NICE  USS  PSS  RSS VSIZE OOM_ADJ USER
            b2g  84    1   34.4    0 36.1 40.1 48.0 130.5       0 root
         (Nuwa) 330   84    0.9    0  3.3  6.0 12.5  51.3       0 root
     Homescreen 371  330    4.2   18  9.3 13.7 22.9  66.4       8 app_371
(Preallocated a 428  330    0.8   18  5.1  8.2 15.9  58.3       1 root
Tried to stop loading threads/messages if the document is hidden, but there're still chances SMS gets killed. Since after pressing power button to lock screen, the application's priority will be changed to background at first (ParticularProcessPriorityManager::SetPriorityNow()), then visibilitychange event callback.

diff --git a/apps/sms/js/message_manager.js b/apps/sms/js/message_manager.js
index 1cbf96a..ec04e9a 100644
--- a/apps/sms/js/message_manager.js
+++ b/apps/sms/js/message_manager.js
@@ -321,7 +321,8 @@ var MessageManager = {
 
         each && each(this.result);
 
-        this.continue();
+        if (!document.hidden)
+          this.continue();
         return;
       }
 
@@ -372,7 +373,7 @@ var MessageManager = {
           shouldContinue = each(this.result);
         }
         // if each returns false the iteration stops
-        if (shouldContinue !== false) { // if this is undefined this is fine
+        if (shouldContinue !== false && !document.hidden) { // if this is undef
           this.continue();
         } else {
           done && done();
Change application's oom_adj to background before visibilitychange event callback means the application may have no chances to lower its loading before gets killed.

But I also learned from Cervantes that there's a comment (http://goo.gl/vWeG38) explain this, it is about OOM concerns.
oom_adj is changed to background by following path:

  scm_turnScreenOff()
    ls_lock()
      aw_setVisible()
        aw__hideFrame()
          iframe.setVisible(false)
            nsFrameLoader::SetVisible
              ParticularProcessPriorityManager::SetPriorityNow())

I am now checking how is visibilitychange event fired.
This is how priority changed and visibiltiychange event fired:

# B2G
  scm_turnScreenOff()
    ls_lock()
      aw_setVisible()
        aw__hideFrame()
          iframe.setVisible(false)
            BrowserElementParent._setVisible(false)
              _sendAsyncMsg('set-visible')
              nsFrameLoader::SetVisible
                ParticularProcessPriorityManager::OnFrameloaderVisibleChanged()
                  ResetPriorityNow()
                    SetPriorityNow()

# Messages
  BrowserElementChildPreload._recvSetVisible()
    BrowserElementChildPreload._upredateVisibility()
      nsDocShell.SetIsActive()
        nsDocument::PostVisibilityUpdateEvent()
          NS_NewRunnableMethod(&nsDocument::UpdateVisibilityState)
      sendAsyncMsg('visibilitychange')
  ...
  nsDocument::UpdateVisibilityState()
    nsContentUtils::DispatchTrustedEvent('visibilitychange')
      nsINode::DispatchEvent('visibilitychange')
        nsEventDispatcher::DispatchDOMEvent('visibilitychange')
Depends on: 99892
Depends on: 993892
No longer depends on: 99892
For the Messages app:

  1. It does not stop loading threads/messages after change to background, filed
     bug 993892.

  2. oom_adj may be changed to background level before receiving visibilitychange
     event, i.e., LMK has chances to kill the app before it decides to suspend
     loading.

I discussed #2 with Thinker, he thinks it's not reasonable to guarantee visibilitychange to be done earlier than changing process's priority, we can only do it best-effort. Also handle visibilitychange of background application synchronously may affect foreground application lunch time.

So probably I will not do anything about #2, but it is welcome if anyone else have different opinions.
James, for this issue, you are expecting?

  1. The app should never be killed,
  2. The app can be killed but need to have some sorts of recovering, e.g., showing on cards view, or
  3. Others
Flags: needinfo?(james.zhang)
1
I think We must keep at least one background app running.
(In reply to ying.xu from comment #45)
> 1
> I think We must keep at least one background app running.

Agree.
We MUST keep at least one background app.
And We can have some sorts if more than one background app.
Flags: needinfo?(james.zhang)
I must clarify the comment of Ting.  For the gecko, it is reasonable to expect Gecko to keep one background app at least, since apps could use a lot of memory, not controlled by the gecko.  Keeping one background app is only true with some constrain.

Tim, could you help us to find the people who could handle this at app side? (MMS and EMail)
I hope we could also reduce size of apps from the app side.
Flags: needinfo?(timdream)
(In reply to Thinker Li [:sinker] from comment #47)
> I must clarify the comment of Ting.  For the gecko, it is reasonable to
                                               unreasonable ^^^^^^^^^^
I'll also check why Messages app needs that much memory for loading threads/messages.
Roughly estimation from b2g-info, Attachment.getThumbnail() takes around 5MB while rendering BIG-THREAD-MIXED, ThreadListUI.setContact() takes another 3MB for reference-workload-heavy.
about:memory after launch Email (page "New Account").
Attached file about_memory.sms
about:memory of Messages app while loading BIG-THREAD-MMS.
(In reply to Joe Cheng [:jcheng] from comment #36)
> base on the discussions, 1.3T is still not fixed

First of all, we really shouldn't be reopening a bug when the patch is not reverted.

Second of all, it seems that the message case is already covered in bug 993892, and it's being decided to be not blocking.

I don't know about the case of e-mail so ni Dylan.

If there is nothing to be done in Gecko/Gonk/LMK we should close this bug as invalid.
Flags: needinfo?(timdream) → needinfo?(doliver)
I also see background application to be killed by LMK when Preallocated process calls PreloadSlowThings().
Just noticed "image.mozsamplesize.enabled" is default false on Tarako.
(In reply to Ting-Yu Chou [:ting] from comment #55)
> Just noticed "image.mozsamplesize.enabled" is default false on Tarako.

Never mind, it does not affect certified app.
Flags: needinfo?(jcheng)
Depends on: 996516
Attachment #8406034 - Attachment description: about_memory.email → about_memory.email.new_account
about:memory at Email inbox, use moztpeqa@gmail.com as account, and have around 140 mails.
Throw to Gaia::E-mail to get more visibility.
Component: Performance → Gaia::E-Mail
It seems really weird to move a bug with 57 comments about system issues to the e-mail component.  I think the usual thing would be to file a new e-mail specific bug about e-mail's memory usage and make it block this bug as a meta bug.  I'm going to cc the e-mail developers and move this bug back since I expect the goal was to have all of us see it without having to know all our e-mails.  We can then spin off e-mail-specific bugs as appropriate.  I'll make a comment on other options for e-mail here.
Component: Gaia::E-Mail → Performance
A few starter questions:

0) What is the number in about:memory that we are looking to optimize to not get killed?  Does the kernel OOM killer understand the memory email is sharing with the parent process?

For example, naively reading the about:memory logs, I would think resident-unique might be a good number for us to optimize, but it barely moves at all between the 2 email attachments where we gain another 10 MiB in explicit use.


1) What is the memory usage goal for email that will get us not killed?

Comment 3 implies that the 14.27 MiB resident-unique just with our account config screen up in attachment 8406034 [details] will get us killed.  If that's the case, then there's effectively no way e-mail can optimize itself into not getting killed since then it's a question of code size that's killing us and not even all of our code is loaded at that point since the protocol-stuff will get lazy loaded during the account setup process.

All that leaves us then is persisting our state somehow and having the system app revive us when the screen unlocks.

Note that I am assuming that the e-mail app started up with no accounts created as is described there.  Memory usage will be very different if you get to that screen by deleting an existing account without closing the email app.


2) So in attachment 8406762 [details] I see email's process was using 26MiB with 6MiB free.  The 6MiB seems like a lot; does this imply the e-mail app is fragmenting memory badly and that we should try and clean up our allocation strategies?  Or does it mean that gecko is not trying to reclaim heap as aggressively as it should be?
And one more question:

3) Have we done anything about SpiderMonkey duplicately storing copies of the source code for toString() purposes? (Also, potentially lazy compilation.)  The email app categorically does not use toString() on its functions and doesn't need/want the costs for such things.
(In reply to Andrew Sutherland (:asuth) from comment #61)
> 3) Have we done anything about SpiderMonkey duplicately storing copies of
> the source code for toString() purposes? (Also, potentially lazy
> compilation.)  The email app categorically does not use toString() on its
> functions and doesn't need/want the costs for such things.

See bug 990353 for the current effort to improve this.

Note, we used to do compression for saved sources on b2g devices in v1.1, but this got disabled for other perf reasons some time in the v1.2 timeframe.
Flags: needinfo?(jcheng)
See Also: → 998633
Can this be repro now? And the depending bug 998633 is P3. Can this bug be "fixed" if it can't be repro.
Flags: needinfo?(james.zhang)
(In reply to thomas tsai from comment #63)
> Can this be repro now? And the depending bug 998633 is P3. Can this bug be
> "fixed" if it can't be repro.

Yes.
Flags: needinfo?(james.zhang)
Flags: needinfo?(doliver)
perhaps Bug 999563 did not help with this bug? ni? Gabriele
Thanks
Flags: needinfo?(gsvelto)
I noticed LMK is usually invoked from kswapd, some numbers:

  /proc/sys/vm/min_free_kbytes = 1350
  /proc/zoneinfo/pages_min     = 337
                 pages_low     = 549
                 pages_high    = 633

and the messages from lowmem_shrink():

  <4>0[  202.937636] 1244 free pages

Seems kswapd is woken unnecessarily according to the number of free pages?
James, do you know why the number of free pages from

  show_mem(), and
  zone_page_state(..., NR_FREE_PAGES)

are different if there's only one zone, did I miss anything?
Flags: needinfo?(james.zhang)
Loop Ying, and I will ask our kernel memory owner.
Flags: needinfo?(james.zhang) → needinfo?(ying.xu)
Following STR from Comment 3, unable to reproduce end result of e-mail app termination and return to homescreen. Attempted manual setup (3 times) and IMAP+SMTP (2) and all times e-mail app remained intact on latest 1.3T build.

Environmental Variables:
Device: Tarako 1.3T MOZ
BuildID: 20140502014001
Gaia: a8e0ff550de08e58e4bf75af3cecf175b9b71e70
Gecko: 71790bf476cb
Version: 28.1
Firmware Version: sp6821a-gonk4.0-4-29
Flags: needinfo?(nhirata)
Even though we can't repro around the STR, my suggestion is to anyway leave this bug open so that we can revisit the scenario and retest.   These steps might be working, but others may be broken as per 998663, so leaving open for now.
Status: ASSIGNED → NEW
Flags: needinfo?(nhirata)
This still can occur for me.
Launch email app, scroll a little bit down and then lock the screen.

Gaia      a8e0ff550de08e58e4bf75af3cecf175b9b71e70
Gecko     https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/71790bf476cb
BuildID   20140502014001
Version   28.1
ro.build.version.incremental=eng.cltbld.20140502.085951
ro.build.date=Fri May  2 08:59:59 EDT 2014
Tarako
Status: NEW → ASSIGNED
I'm late to this bug but I still want to try something here: I think that comment 40 is pretty close to nailing the issue down. When we turn off the screen the foreground app visibility is set to hidden, in the process priority manager we assume this means the app went into the background and thus lower its priority. This makes the LMK more likely to kill the former foreground app rather than the homescreen which always resides at a higher priority than other background apps. In this scenario however it would make more sense to kill the homescreen instead so that the foreground app is still available when the user turns the screen on again.

BTW as was already pointed out the actual killing is probably happening in response to us taking screenshots of the various apps; this causes a spike in memory usage that triggers the LMK. AFAIK we have no reasonable fix for this issue at the moment.
Flags: needinfo?(gsvelto)
(In reply to Gabriele Svelto [:gsvelto] from comment #72)
> This makes the LMK more likely to kill the former
> foreground app rather than the homescreen which always resides at a higher
> priority than other background apps.

Seems bug 994518 would handle this.
See Also: → 994518
(In reply to Ting-Yu Chou [:ting] from comment #66)
> Created attachment 8415144 [details]
> dmesg - LMK selects Email

> Seems kswapd is woken unnecessarily according to the number of free pages?

This case mean a big order(2, 4pages,or more) memory-block was being requiring but there is no match memory-block in the memory pool. So the kswapd kept working to release memory.
Flags: needinfo?(ying.xu)
(In reply to Ting-Yu Chou [:ting] from comment #67)
> James, do you know why the number of free pages from
> 
>   show_mem(), and
>   zone_page_state(..., NR_FREE_PAGES)
> 
> are different if there's only one zone, did I miss anything?

I don't know that
(In reply to ying.xu from comment #74)
> This case mean a big order(2, 4pages,or more) memory-block was being
> requiring but there is no match memory-block in the memory pool. So the
> kswapd kept working to release memory.

Ying, what I'd like to know is why following numbers are that different:

  <4>0[  202.935558] Normal free:2380kB
  <4>0[  202.937636] 1244 free pages

There're ~2.6MB free pages not belong to the zone, do you know what is it reserved for?
See Also: 994518
I'm trying to develop a fix for what I've described in comment 72 but I'm moving the work under bug 994518 as suggested by Ting-Yu to prevent confusion with the rest of the discussion here.
We may delay pushing the foreground app to background and keep at least one foreground app. I tried to implement this hack and it prevents the email app from being killed by the lock screen given that we always push the app to background before bringing the next app to foreground. But this is not the case for switching to the cards view, there the homescreen is brought to the foreground before we got the screenshot of the foreground app and send it to background.
I've toyed with different approaches but couldn't find a good one yet. I'm attaching a WIP I've been working on that discards foreground -> background transitions when the screen is turned off. It works but it's fairly fragile (it relies on the sequence of notifications to work) and because of this doesn't fix the issue when the lockscreen is enabled which causes the foreground app to become background *before* the screen is turned off.

Anyway until this point Cervantes' patch in attachment 8418699 [details] [diff] [review] seems to work better though I only gave it little testing.

I'm also playing with making the homescreen more likely to be killed when there's a foreground app and the screen is turned off.
Comment on attachment 8422457 [details] [diff] [review]
WIP: Discard foreground -> background transitions when the screen is off

Quick update, I've tested Cervantes' patch thoroughly and I found a flaw I hadn't noticed right away: it works only for the first foreground -> background transition. So if you turn the screen on and off twice the foreground app will be sent to the background anyway.

Fortunately - and I really have to thank Cervantes for this - he gave me the right idea on how to fix this which is why I'm obsoleting this patch. I'm cooking up another patch that tracks which is the current foreground app and degrades it's priority only if another foreground app appears. So whenever the foreground app becomes invisible but no other app comes to the foreground it's not degraded.

I'm currently testing it and it's looking good but I have to verify that this doesn't regress the time needed to launch a new app.
Attachment #8422457 - Attachment is obsolete: true
Taking this bug as I think I've finally got a robust solution for fixing it. This patch tracks the current foreground application and lowers its priority only when another foreground application comes up (ignoring the keyboard which is treated separately even when it's in the foreground). I still have to add a small modification, namely when a foreground application becomes hidden even if I don't lower its LMK priority immediately I still want to lower its CPU priority so that it doesn't impact the startup time of another application.

Cervantes, what do you think of this change? I gave it some testing on my Tarako and it seems to work well.
Assignee: tchou → gsvelto
Attachment #8423464 - Flags: feedback?(cyu)
Comment on attachment 8423464 [details] [diff] [review]
[PATCH WIP] Prevent the foreground application from being killed when we turn off the phone

Review of attachment 8423464 [details] [diff] [review]:
-----------------------------------------------------------------

I think you mean when we turn off the screen, not the phone, or that would be amazing :D.

The patch provides a neat solution and doesn't have the problem in cards view.
The extra quirk is when I test this on m-c, the foreground app is pushed to background when I turn on the screen, and the keyboard app becomes foreground. Is this what you mean in comment #82 to be handled separately (although this is a nonissue to tarako)?
Attachment #8423464 - Flags: feedback?(cyu) → feedback+
(In reply to Cervantes Yu from comment #83)
> I think you mean when we turn off the screen, not the phone, or that would
> be amazing :D.

Good point :D

> The patch provides a neat solution and doesn't have the problem in cards
> view.

I'm verifying that activities with multiple windows still work, that's another corner case I want to make right. One of the downsides of my patch is that I'll need to patch up almost all the mochitests in Gecko, but that's a small price for fixing this.

> The extra quirk is when I test this on m-c, the foreground app is pushed to
> background when I turn on the screen, and the keyboard app becomes
> foreground. Is this what you mean in comment #82 to be handled separately
> (although this is a nonissue to tarako)?

No, what I meant is that even though I don't want to change the foreground's app priority right away it would still be good to drop its CPU usage. Let me explain what scenario I have in mind:

- app A is in the foreground and working, app B is launched
- app A is sent into the background, we don't drop it's priority
- app B is starting so it's not visible and still in the background, app A is slowing it down
- app B goes into the foreground and finally app A priority drops

My idea is to do the following

- app A is in the foreground and working, app B is launched
- app A is sent into the background, we don't drop it's LMK priority but we drop it's CPU priority (i.e. increase its nice value)
- app B is starting so it's not visible and still in the background, app A doesn't slow it down
- app B goes into the foreground and finally app A LMK priority drops

The keyboard problem appears to be a separate issue. Apparently the keyboard application type was recently changed to "inputmethod" so we don't pick the right priority anymore. Changing the code ParticularProcessPriorityManager::ComputePriority() seems to solve this problem.
Ugh, the keyboard issue is actually worse than I thought. When we launch the keyboard app we set its priority before setting its application type so it gets the PROCESS_PRIORITY_FOREGROUND right away and sticks with it due to my patch. It picks the PROCESS_PRIORITY_FOREGROUND_KEYBOARD priority only when it's rescheduled; this will require yet another fix :-|
OK, I've done some further testing and I've found out that the issue described in comment 85 cannot be fixed easily on master. The issue here is whenever a hidden application launches (such as the keyboard) it's visible for a brief amount of time before it becomes hidden. This dupes the logic in attachment 8423464 [details] [diff] [review] into thinking that that's the currently visible app and thus potentially degrading the priority of another perfectly valid and visible application which should have been kept in the foreground.

In addition to this my change is somewhat brittle in the face of multi-window activities where all windows are considered visible at the same time. I've tried some scenarios with three applications opened (A calls activity that opens B that calls activity that opens C) and the resulting behavior is very inconsistent. Without my patch the three applications are either considered all in the foreground (when visible) or all in the background (when hidden). This is suboptimal but consistent. With my patch applied one of the three will be still considered in the foreground when the display is off, but it's essentially picked at random among the three apps.

On the bright side I did some further testing today using the latest PAC file provided (2014-05-12) as the phone's base system and I cannot reproduce the original problem anymore. I.e. turning off the phone does not kill the SMS/E-mail app anymore, even with the application populated by the reference workloads. I haven't checked how much memory was available with the old PAC file but my guess is that the kernel or some other component is using less memory which prevents the application from being reaped in spite of its OOM adjustment being raised. James could you confirm this too?

I'd be very, very glad to close this as WORKSFORME at this stage because all the approaches that we've tried are quite risky and prone to regressions.

The process priority manager badly needs an overhaul to prevent these situations in the future but the changes involved are large enough that there's no hope of doing them within the 5/20 deadline.
Flags: needinfo?(james.zhang)
MMS support draft saving now, WORKSFORME.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(james.zhang)
Resolution: --- → WORKSFORME
(In reply to James Zhang from comment #87)
> MMS support draft saving now, WORKSFORME.

Thanks, note that the e-mail app also stays alive when turning off and on the screen.
We found 100% reproduce path.
Import contact from sdcard, lock screen, contact app killed by lock screen and import failed.
Status: RESOLVED → REOPENED
Flags: needinfo?(shiwei.zhang)
Flags: needinfo?(arvin.zhang)
Resolution: WORKSFORME → ---
Summary: [tarako]MMS/Email may be killed by lockscreen → [tarako]MMS/Email/Contact may be killed by lockscreen
Whiteboard: [c=memory p= s= u=tarako], OOM, eta:4/18 → [c=memory p= s= u=tarako], OOM, eta:4/18 [sprd316016 ]
Flags: needinfo?(arvin.zhang)
importing contact doesn't dim the screen right? did you put the phone to lockscreen by pressing power key? thanks
Flags: needinfo?(james.zhang)
Yes, press power key.
Flags: needinfo?(james.zhang)
Scratch my previous comment, I just realized that the homescreen has been made in-process in bug 1014272. Can you still reproduce with the patch from that bug applied?
Flags: needinfo?(james.zhang)
Loop Arvin.
Flags: needinfo?(james.zhang) → needinfo?(arvin.zhang)
Just to give a clarification here this bug doesn't reproduce anymore because the root cause of the issue went away with the fix for bug 1014272. The foreground app here was being killed when the lockscreen appeared as it was sent to the background and thus got a lower oom_score_adj value than the homescreen. The LMK would then kill it instead of the homescreen. Since the homscreeen app has been moved into the main process this doesn't happen anymore and even with two apps open the former foreground app will be kept alive in this scenario as background apps are killed in least recently used order.

I'm waiting for a final feedback from the original reporter but from my part I'd close this bug already.
Since there's been no further action on this bug I'm closing it as I can't reproduce it anymore, neither with the original STR nor with the latest ones.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Resolution: FIXED → WORKSFORME
(In reply to Gabriele Svelto [:gsvelto] from comment #95)

> I'm waiting for a final feedback from the original reporter but from my part
> I'd close this bug already.

I can confirm the buy can not be reproduced. Thank everybody working on this.
Target Milestone: --- → 2.0 S2 (23may)
Flags: needinfo?(arvin.zhang)
Flags: needinfo?(shiwei.zhang)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: