Closed Bug 675259 Opened 13 years ago Closed 12 years ago

Understand what causes Android to kill other processes

Categories

(Firefox for Android Graveyard :: General, defect)

RESOLVED FIXED

People

(Reporter: stechz, Assigned: gbrown)

Details

(Keywords: mobile, perf, Whiteboard: mobilestartupshrink)

Attachments

(1 file)

I have noticed that if Android decides to kill processes while Fennec is opening, several services restart themselves immediately. This seems to starve Fennec of resources and greatly increases startup time.

It would help if we knew what kind of memory pressure causes Android to kill everything else. Perhaps we could optimize startup for that.
Whiteboard: mobilestartupshrink
Assignee: nobody → gbrown
There are comments on the web about Android's "low memory killer". Mostly people seem to be interested in tweaking it to free up more memory more often, in hopes of a more responsive UI and/or longer battery life.

I found this post helpful: http://forum.xda-developers.com/showthread.php?t=622666

Summary:
 - Every process has an oom_adj value, visible at /proc/<pid>/oom_adj. Processes with higher oom_adj values are more likely to be killed by the kernel's oom killer. The current foreground app has an oom_adj of 0.
 - There are oom_adj values associated with process classes such as "foreground app", "visible app", "secondary server", "hidden app", etc.
 - The low memory killer uses configurable rules based on free memory and oom_adj thresholds, i.e., rules state "if free memory < X1, kill processes with oom_adj >= Y1"
 - There is often one rule for each process class. However, it appears there is a limit of 6 rules in total, so sometimes there is one rule for 2 or more process classes.
 - The rules are configured succinctly using two parameters:

/sys/module/lowmemorykiller/parameters/adj=<list of oom_adj thresholds>
/sys/module/lowmemorykiller/parameters/minfree=<parallel list of memory thresholds, in 4K pages>


From my Samsung Galaxy S's /init.rc:

# Define the oom_adj values for the classes of processes that can be
# killed by the kernel.  These are used in ActivityManagerService.
    setprop ro.FOREGROUND_APP_ADJ 0
    setprop ro.VISIBLE_APP_ADJ 1
    setprop ro.SECONDARY_SERVER_ADJ 2
    setprop ro.BACKUP_APP_ADJ 2
    setprop ro.HOME_APP_ADJ 4
    setprop ro.HIDDEN_APP_MIN_ADJ 7
    setprop ro.CONTENT_PROVIDER_ADJ 14
    setprop ro.EMPTY_APP_ADJ 15

# Define the memory thresholds at which the above process classes will
# be killed.  These numbers are in pages (4k).
    setprop ro.FOREGROUND_APP_MEM 2560
    setprop ro.VISIBLE_APP_MEM 4096
    setprop ro.SECONDARY_SERVER_MEM 6144
    setprop ro.BACKUP_APP_MEM 6144
    setprop ro.HOME_APP_MEM 6144
    setprop ro.HIDDEN_APP_MEM 7168
    setprop ro.CONTENT_PROVIDER_MEM 8192
    setprop ro.EMPTY_APP_MEM 9216

# Write value must be consistent with the above properties.
# Note that the driver only supports 6 slots, so we have HOME_APP at the
# same memory level as services.
    write /sys/module/lowmemorykiller/parameters/adj 0,1,2,7,14,15
    write /sys/module/lowmemorykiller/parameters/minfree 2560,4096,6144,7168,8192,9216

I read this as:
 - if free memory goes below 9216x4K, kill EMPTY_APP processes;
 - if free memory goes below 8192x4K, kill CONTENT_PROVIDER and EMPTY_APP processes;
 - if free memory goes below 7168x4K, kill HIDDEN_APP, CONTENT_PROVIDER and EMPTY_APP processes;
 - ... etc.
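
A minimal Python sketch of that reading, using the adj/minfree lists and class values from the init.rc excerpt above. The helper names (parse_rules, classes_at_risk) are made up for illustration, and the ">=" comparison is an assumption about how the driver compares oom_adj against each threshold:

```python
PAGE_KIB = 4  # page size, per the init.rc comment "pages (4k)"

# oom_adj per process class, copied from the init.rc excerpt
CLASS_ADJ = {
    "FOREGROUND_APP": 0,
    "VISIBLE_APP": 1,
    "SECONDARY_SERVER": 2,
    "BACKUP_APP": 2,
    "HOME_APP": 4,
    "HIDDEN_APP": 7,
    "CONTENT_PROVIDER": 14,
    "EMPTY_APP": 15,
}


def parse_rules(adj, minfree):
    """Pair the two comma-separated parameter strings into (pages, adj) rules."""
    adjs = [int(x) for x in adj.split(",")]
    pages = [int(x) for x in minfree.split(",")]
    if len(adjs) != len(pages):
        raise ValueError("adj and minfree must have the same number of entries")
    return list(zip(pages, adjs))


def classes_at_risk(free_pages, rules):
    """Return the process classes eligible to be killed at this free-page count."""
    matching = [adj for pages, adj in rules if free_pages < pages]
    if not matching:
        return []
    min_adj = min(matching)  # the strictest rule that applies
    # Assumption: a class is eligible when its oom_adj >= the rule's threshold.
    return [name for name, adj in CLASS_ADJ.items() if adj >= min_adj]


RULES = parse_rules("0,1,2,7,14,15", "2560,4096,6144,7168,8192,9216")

for pages, adj in reversed(RULES):
    mib = pages * PAGE_KIB // 1024
    print(f"free < {pages} pages ({mib} MiB): kill oom_adj >= {adj}")
```

With these values the top rule fires below 36 MiB of free memory, and HOME_APP (oom_adj 4) only becomes eligible at the 6144-page rule it shares with the services, as the init.rc comment notes.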
Notice the algorithm:
http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/misc/lowmemorykiller.c;h=6be1b08a4c1589243b109ad7350cf4414ec19edc;hb=HEAD

Trigger is nr_free_pages (which seems to be the amount of free memory pages on the system--not virtual, global memory).

The metric used to kill processes is RSS. IIUC this means actual pages of virtual memory that are in real memory, which would include memory shared by other processes but not uncommitted memory from mmap.
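
To make the role of RSS concrete, here is a rough Python sketch of victim selection. It assumes, as one reading of the linked driver and not a confirmed description of it, that among processes at or above the active oom_adj threshold the one with the highest oom_adj is chosen, with larger RSS breaking ties. The Proc type, pick_victim, and the sample processes are all hypothetical:

```python
from typing import NamedTuple, Optional, Sequence


class Proc(NamedTuple):
    pid: int
    name: str
    oom_adj: int
    rss_pages: int


def pick_victim(procs: Sequence[Proc], min_adj: int) -> Optional[Proc]:
    """Pick one process to kill, or None if nothing is eligible.

    Assumption: eligibility is oom_adj >= min_adj; preference is highest
    oom_adj first, then largest RSS among equals.
    """
    candidates = [p for p in procs if p.oom_adj >= min_adj and p.rss_pages > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p.oom_adj, p.rss_pages))


# Hypothetical process list, for illustration only.
PROCS = [
    Proc(100, "foreground.app", 0, 20000),
    Proc(101, "hidden.small", 7, 3000),
    Proc(102, "hidden.large", 7, 9000),
    Proc(103, "empty.app", 15, 1500),
]
```

Under this assumption a large RSS only matters relative to other processes in the same oom_adj band: pick_victim(PROCS, 7) picks the empty app despite its small RSS, and the larger hidden app is preferred over the smaller one only once the empty app is gone.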
Zippity tracks RSS for Fennec. Check out the data here: http://zippityserver.appspot.com/metrics/graph
(In reply to Mark Finkle (:mfinkle) from comment #4)
> Zippity tracks RSS for Fennec. Check out the data here:
> http://zippityserver.appspot.com/metrics/graph

Yep, those are pretty bad.

So, things that will help us for Fennec:
* shrinking binary size (RSS includes code loaded into memory! This is about 32MB or so for Fennec now.) -- perhaps we could think about unloading a lot of the binary pages and lazily reload them when the process is reactivated?
* shrinking committed memory
** shrinking all caches, especially on hiding (it won't kill us when we're foreground, but when we're background we should attempt to remove all cache that we can from memory IMO--I don't know how our memory pressure event works, but I'm not sure I trust it)

Things that don't really help:
* sharing memory (RSS size will be the same, though it might prevent the trigger from occurring)
* shrinking uncommitted memory (I looked at the JS stack, which is listed as 8MB, but the majority of it is not committed so it will not be in RSS AIUI)
(In reply to Benjamin Stover (:stechz) from comment #5)
> (In reply to Mark Finkle (:mfinkle) from comment #4)
> > Zippity tracks RSS for Fennec. Check out the data here:
> > http://zippityserver.appspot.com/metrics/graph
> 
> Yep, those are pretty bad.
> 
> So, things that will help us for Fennec:
> * shrinking binary size (RSS includes code loaded into memory! This is about
> 32MB or so for Fennec now.) -- perhaps we could think about unloading a lot
> of the binary pages and lazily reload them when the process is reactivated?
> * shrinking committed memory
> ** shrinking all caches, especially on hiding (it won't kill us when we're
> foreground, but when we're background we should attempt to remove all cache
> that we can from memory IMO--I don't know how our memory pressure event
> works, but I'm not sure I trust it)

We do fire a "memory-pressure" notification when leaving the foreground. Some caches are cleared by Gecko in response. MXR can help show which ones are cleared and which ones we could add support for.
Keywords: mobile, perf
OS: All → Android
(In reply to Benjamin Stover (:stechz) from comment #5)
> So, things that will help us for Fennec:

Great observations here, but I wonder if we are straying from the original objective: avoid killing services when Fennec starts. If that is our primary objective, then it seems to me that we really want to avoid *triggering* the kill-off, which is based on global_page_state(NR_FREE_PAGES) and global_page_state(NR_FILE_PAGES). (A definition of NR_FREE_PAGES and NR_FILE_PAGES would be helpful, but I'm not finding anything definitive.)
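
A small Python sketch of the trigger, under the assumption that a rule fires only when both counters are below its minfree threshold and that the first matching rule wins. That is one reading of the linked driver and may not hold for other kernel versions. The function name select_min_adj is hypothetical; the adj/minfree values are the Galaxy S ones quoted earlier:

```python
ADJ = [0, 1, 2, 7, 14, 15]
MINFREE = [2560, 4096, 6144, 7168, 8192, 9216]  # in 4K pages


def select_min_adj(free_pages, file_pages, adj=ADJ, minfree=MINFREE):
    """Return the oom_adj threshold in force, or None if no rule fires.

    Assumption: both the free-page and file-page counts must be below a
    rule's threshold, and rules are scanned from the smallest threshold up.
    """
    for threshold, level in zip(minfree, adj):
        if free_pages < threshold and file_pages < threshold:
            return level
    return None


print(select_min_adj(3000, 3000))    # both counters low: a rule fires
print(select_min_adj(3000, 20000))   # plenty of file pages: no rule fires
```

If this reading is right, low free memory alone would not trigger kills while the file-page count stays high, so both counters are worth watching during Fennec startup.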
This is a quick-and-dirty Python script that uses adb to periodically execute ps on the device and report changes. For example:


--------------Mon Aug 29 22:35:29 2011--------------
              total         used         free       shared      buffers
  Mem:       346844       327256        19588            0         4756
 Swap:            0            0            0
Total:       346844       327256        19588
 
 >> NEW: app_83    23406 2765  230508 19024 ffffffff 809203c4 R org.mozilla.fennec_geoff

--------------Mon Aug 29 22:35:32 2011--------------
              total         used         free       shared      buffers
  Mem:       346844       343680         3164            0         4740
 Swap:            0            0            0
Total:       346844       343680         3164
 
 >> NEW: app_83    23444 23406 35944  4708  c02e3c44 afd0dc5c D /data/data/org.mozilla.fennec_geoff/plugin-container

--------------Mon Aug 29 22:35:42 2011--------------
              total         used         free       shared      buffers
  Mem:       346844       335932        10912            0         4760
 Swap:            0            0            0
Total:       346844       335932        10912
 
 >> KILLED: app_68    21634 2765  232784 22800 ffffffff afd0ee48 S com.sec.android.app.videoplayer
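
For readers without the attachment, the general approach can be sketched as below. This is not the attached script, only a guess at the technique: take ps snapshots over adb and diff them by PID. The function names are made up, and the column layout (USER PID ...) is assumed from the sample output above:

```python
import subprocess
import time


def parse_ps(text):
    """Map PID -> full ps line, skipping the header and blank lines."""
    procs = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: USER PID PPID VSIZE RSS WCHAN PC STATE NAME
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        procs[int(fields[1])] = line.strip()
    return procs


def diff_ps(before, after):
    """Report processes that appeared or disappeared between two snapshots."""
    events = [f">> NEW: {after[pid]}" for pid in sorted(after.keys() - before.keys())]
    events += [f">> KILLED: {before[pid]}" for pid in sorted(before.keys() - after.keys())]
    return events


def adb_ps():
    """Run ps on the attached device; requires adb on the PATH."""
    result = subprocess.run(["adb", "shell", "ps"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def watch(interval_s=3.0, rounds=10):
    """Poll the device and print changes, with a timestamp per change set."""
    previous = parse_ps(adb_ps())
    for _ in range(rounds):
        time.sleep(interval_s)
        current = parse_ps(adb_ps())
        events = diff_ps(previous, current)
        if events:
            print(f"--------------{time.ctime()}--------------")
            print("\n".join(events))
        previous = current
```

Calling watch() with a device attached prints a block per change set. A process that disappears is reported as KILLED whether the low memory killer removed it or it exited on its own, so the label is only a hint.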
Does anyone want me to investigate something further for this bug, or are we satisfied with the info we have now?
Marking as Fixed: Nothing was landed, but this bug was about gathering information and that's in the comments above.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED