Closed
Bug 974308
Opened 11 years ago
Closed 9 years ago
[tarako] Study memory thrashing and find out a proper way to handle it.
Categories
(Firefox OS Graveyard :: Performance, defect, P3)
Tracking
(blocking-b2g:-, tracking-b2g:-, b2g-v1.3T affected)
RESOLVED
INCOMPLETE
Tracking | Status | |
---|---|---|
b2g-v1.3T | --- | affected |
People
(Reporter: sinker, Assigned: ting)
References
Details
(Keywords: perf, Whiteboard: [c=memory p= s= u=tarako])
Attachments
(9 files, 7 obsolete files)
154 bytes,
text/html
|
Details | |
154 bytes,
text/html
|
Details | |
154 bytes,
text/html
|
Details | |
5.62 KB,
patch
|
Details | Diff | Splinter Review | |
1.28 MB,
application/gzip
|
Details | |
154 bytes,
text/html
|
Details | |
154 bytes,
text/html
|
Details | |
154 bytes,
text/html
|
Details | |
6.17 KB,
patch
|
Details | Diff | Splinter Review |
This bug is to study memory thrashing of 128MB devices, and find out a way to improve the response time.
The tarako would be totally blocked for a few seconds. It is some kind of memory thrashing. The solution can be changing parameters of OOM killer or a customized thrashing aware killer.
Assignee | ||
Comment 1•11 years ago
|
||
Modified vmstat a bit to gather the system information we're intrested. Will collect yaffs numbers as well later.
Assignee | ||
Updated•11 years ago
|
Attachment #8378186 -
Attachment description: the spreadsheet of collected system information → http://goo.gl/fNPxny
Assignee | ||
Updated•11 years ago
|
Attachment #8378186 -
Attachment mime type: text/plain → text/url
Assignee | ||
Updated•11 years ago
|
Attachment #8378186 -
Attachment mime type: text/url → text/x-url
Updated•11 years ago
|
blocking-b2g: --- → 1.3T?
Assignee | ||
Updated•11 years ago
|
Attachment #8378186 -
Attachment mime type: text/x-url → application/x-url
Assignee | ||
Comment 2•11 years ago
|
||
Attachment #8378186 -
Attachment is obsolete: true
Reporter | ||
Comment 3•11 years ago
|
||
Hi Ting, what the number is responsible for response time of the device? I mean how do I know the device is blocked or not at a given period?
Reporter | ||
Comment 4•11 years ago
|
||
Could you also explain what these fields meaning?
Assignee | ||
Comment 5•11 years ago
|
||
Comment on attachment 8378206 [details]
high_sys_load#1
running = /procs/stat procs_running
blocked = /procs/stat procs_blocked
free = /proc/meminfo MemFree
cached = /proc/meminfo Cached
mapped = /proc/meminfo Mapped
anonomous = /proc/meminfo AnonPages
zram = /sys/block/zram0/mem_used_total
major fault = /proc/vmstat pgmajfault
swap in = /proc/vmstat pswpin
swap out = /proc/vmstat pswpout
us = /proc/stat cpu user
ni = /proc/stat cpu nice
sy = /proc/stat cpu system
id = /proc/stat cpu idle
wa = /proc/stat cpu iowait
ir = /proc/stat cpu irq
si = /proc/stat cpu softirq
For the attachment 8378206 [details], system loading is always around 99, which is not
responding. I will add yaffs numbers and iterate more times to understand the
status before entering high system loading.
Attachment #8378206 -
Attachment description: spreadsheet_collected-stats → high_sys_load#1
Attachment #8378206 -
Attachment filename: spreadsheet.html → high_sys_load#1.html
Assignee | ||
Comment 6•11 years ago
|
||
Recorded another log, added two numbers:
yaffs_n_page_reads = summation of /proc/yaffs n_page_reads
yaffs_n_page_writes = summation of /proc/yaffs n_page_writes
and dropped 2:
cpu_irq
cpu_softirq
System loading is showed as stepped area at right vertical axis. Be noted system
loading goes up when yaffs_n_page_reads increases.
Assignee | ||
Comment 7•11 years ago
|
||
Updated link to the chart.
Attachment #8378206 -
Attachment is obsolete: true
Assignee | ||
Comment 8•11 years ago
|
||
Attachment #8378815 -
Attachment is obsolete: true
Assignee | ||
Comment 9•11 years ago
|
||
Captured another log, the high load started from tapping message button in a contact. I then tried to edit message and pressed keyboard several times.
Assignee | ||
Comment 10•11 years ago
|
||
I am thinking:
- Seek chances to improve NAND read performance
- Read ahead and utilize chip's feature to read with data cache
- bug 944457, comment 211 shows reading smaller file requires 200~400us for
reading a page (2KB), but from NAND chip's spec, it takes 25us maximum
- This will be a system wide improvement if we can make any
- Avoid swapping by killing background application earlier
- LMK actually is not triggered when the min_free level is reached. It is a
slab shrinker, which is called after kernel has tried hard to reclaim pages
- Possible options:
- ulmkd, a user space low memory killer daemon
- cgroup memory pressure notification, but it is introduced in kernel 3.10
- Invokes LMK by ourselves whenever we are going to launch an application
- Will we need some rules for killing, e.g., not to kill background music
application?
- Others worth checking
- Swap token, prohibits a process from page frame reclaiming for a period
Assignee | ||
Comment 11•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #10)
> - Avoid swapping by killing background application earlier
> - LMK actually is not triggered when the min_free level is reached. It is a
> slab shrinker, which is called after kernel has tried hard to reclaim pages
I was wrong, LMK is called quite frequently, but usually return earlier because 1) nr_to_scan == 0, or 2) NR_FILE_PAGES - NR_SHMEM - total_swapcache_pages is larger than maximum min_free.
Comment 12•11 years ago
|
||
So there are two ways to go in general:
1. Reduce the working set.
1.1 Kill the apps in comment 10.
1.2 We may as well suspend/sigstop the background apps. Compared to suspension, to kill an app releases it's dirty pages. Note that the benefit differs only by the compressed size provided that the overhead of zram is acceptable. Or to combine them; Killing is the last resort.
1.3 De-prioritize background apps so that they are unable to compete for precious frames. We may also make the message queue in chrome-process aware of process priority.
2. Improve IO
2.1 Improve the flash performance in comment 10.
2.2 Just in case, to make sure the filesystem is correctly aligned to the physical boundary that the flash controller can access efficiently.
2.3 A compressed filesystem to trade cpu times for less IO.
Comment 13•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #10)
> - cgroup memory pressure notification, but it is introduced in kernel 3.10
Just a note on this, we've experimented with cgroups in the past and even with kernels that supported it we run into very annoying bugs so we decided not to use it at the time. From what I could see the cgroups code in most of our kernels contains some Android-specific modifications compared to the Linux mainline kernel and those seem to have made it quite fragile.
Assignee | ||
Comment 14•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #13)
> Just a note on this, we've experimented with cgroups in the past and even
> with kernels that supported it we run into very annoying bugs so we decided
> not to use it at the time. From what I could see the cgroups code in most of
> our kernels contains some Android-specific modifications compared to the
> Linux mainline kernel and those seem to have made it quite fragile.
Gabriele, thank you for the information.
Comment 15•11 years ago
|
||
This wip patch implements a strategy that is as follows:
- when an app goes into the background, we add it to a list of killable app. The homescreen and apps that have the PERCEIVABLE_BACKGROUND priority don't make it in this list.
- when an app goes into the foreground, we try to kill all current background apps except the one that is moving to the foreground.
That doesn't seem too bad on a tarako. I've been playing a bit with launching apps while the music player is active in the background and while we still swap sometimes I did not experience long freezes.
Still to do: make that work properly with activities, ie. ensure we don't kill the activity caller and also maybe iac users (not sure if iac matters for 1.3 as the important user is the music app that we don't kill for other reasons).
The implementation is a quick hack, but if some of you could test it and also feel it's an improvement I'll clean it up (and add a pref to only use that on low memory devices).
Comment 16•11 years ago
|
||
I found that IsPreallocated() was still true on the first time we set an app the FOREGROUND priority because of the order of the test in MaybeTakePreallocatedAppProcess. This patch changes that, and that prevents killing the preallocated app which was sometimes happening with the previous patch.
Attachment #8380088 -
Attachment is obsolete: true
Assignee | ||
Comment 17•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #16)
> The implementation is a quick hack, but if some of you could test it and
> also feel it's an improvement I'll clean it up (and add a pref to only use
> that on low memory devices).
Thank you Fabrice, I just passed this information to Gonk team, see if they can include the patch in today's build and have more people try it out. I will also try it today.
Comment 18•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #15)
> Created attachment 8380088 [details] [diff] [review]
> kill-bg-apps.patch
>
> This wip patch implements a strategy that is as follows:
> - when an app goes into the background, we add it to a list of killable app.
> The homescreen and apps that have the PERCEIVABLE_BACKGROUND priority don't
> make it in this list.
> - when an app goes into the foreground, we try to kill all current
> background apps except the one that is moving to the foreground.
There are some concerns about "kill all current background apps except the one that is moving to the foreground."
We tested with playing music at background, then send SMS with attachment, with your patch, the music playback sometimes will be killed. It impacted user experience.
How about kill few apps with priority rather than kill all current background apps except one?
Comment 19•11 years ago
|
||
This patch should kill the music app (and was not for me) when it's playing music since it has then the BACKGROUND_PERCEIVABLE level and not BACKGROUND.
Comment 20•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #19)
> This patch should kill the music app (and was not for me) when it's playing
should *not* kill...
Comment 21•11 years ago
|
||
Hm, I think I know why you could hit that. I'll post a new patch tomorrow.
Updated•11 years ago
|
Attachment #8380375 -
Attachment is obsolete: true
Comment 22•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #15)
> Still to do: make that work properly with activities, ie. ensure we don't
> kill the activity caller and also maybe iac users (not sure if iac matters
> for 1.3 as the important user is the music app that we don't kill for other
> reasons).
Currently that's not a problem because apps invoking inline activities are still considered in the foreground. Until we fix bug 892371 that's still going to be the case. (And yes, I've wanted to fix that bug for a long time but still haven't got enough spare time to do so.)
Assignee | ||
Comment 23•11 years ago
|
||
Thinker prefers to improve the code above gonk, which benefits not only Tarako but also other projects. So I'll leave improving NAND read as the lowest priority. Now he'd like to test followings to see if it prevents system from freezing:
a. decrease the priority of either b2g or foreground content process
b. suspend background applications
Short summary: Both does not help, no matter it's done before or when system freezes.
The steps I used to reproduce the system freeze with reference-workload-light:
1. play music in background
2. launch browser
3. launch message app, enter BIG-THREAD-MMS, scroll up/down rapidly, and press
home key when the message is loading
For a), I renice either b2g or foreground content process to 19. For b), I suspended homescreen, music, and browser.
Comment 24•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #22)
> (In reply to Fabrice Desré [:fabrice] from comment #15)
> > Still to do: make that work properly with activities, ie. ensure we don't
> > kill the activity caller and also maybe iac users (not sure if iac matters
> > for 1.3 as the important user is the music app that we don't kill for other
> > reasons).
>
> Currently that's not a problem because apps invoking inline activities are
> still considered in the foreground. Until we fix bug 892371 that's still
> going to be the case. (And yes, I've wanted to fix that bug for a long time
> but still haven't got enough spare time to do so.)
It's very much a problem for non-inline activities, and we need to support them. I'll take another look at 892371 today.
Comment 25•11 years ago
|
||
New patch - I realized I could just reuse the BackgroundProcessLRUPool to track background apps instead of re-implementing that. I also added a pref to easily turn that on/off since we won't need that on all devices.
Assignee | ||
Comment 26•11 years ago
|
||
Fabrice, last time you asked me would it be helpful if letting application load less things. Could you please point me out where is the code about this? Thank you.
Flags: needinfo?(fabrice)
Comment 27•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #26)
> Fabrice, last time you asked me would it be helpful if letting application
> load less things. Could you please point me out where is the code about
> this? Thank you.
I don't remember what this was about :(
I've seen very low i/o performance when loading resources from app packages (that happens in nsJARChannel.cpp). Maybe we trash more because we don't limit the number of simultaneous loads. That may be worthwhile to try having a limited pool of jar channels.
Flags: needinfo?(fabrice)
Assignee | ||
Comment 28•11 years ago
|
||
I added a log into yaffs_readpage_nolock() with serial number, filename, and offset. Then tried to reproduce the system freeze by:
1. play music in background
2. launch browser and back to home screen
3. launch dialer and enter call log, scroll a bit and select a contact
4. press message button at contact's page
5. press "Message" to launch keyboard
Used shell command to capture dmesg:
$ while true; do adb shell dmesg -c >> dmesg.out; sleep 1; done
The system freezed during 1190.0 ~ 1410.5, from serial number there were 212966 yaffs_readpage_nolock() invocations, 96868 of them were captured:
63591 /system/b2g/libxul.so
6623 /system/lib/libMali.so
5088 /system/lib/libc.so
4025 /system/bin/toolbox
2102 /system/b2g/libmozglue.so
2020 /system/b2g/libnss3.so
1800 /system/bin/mksh
1434 /system/lib/libcutils.so
1379 /system/lib/libm.so
767 /system/bin/linker
685 /system/lib/libui.so
549 /system/lib/libEGL.so
460 /system/b2g/omni.ja
434 /system/lib/libutils.so
433 /system/b2g/b2g
375 /system/lib/libstlport.so
365 /data/local/webapps/keyboard.gaiamobile.org/application.zip
323 /system/lib/hw/audio.primary.sc8810.so
305 /data/local/storage/persistent/chrome/idb/3406066227csotncta.sqlite
299 /system/lib/libusbhost.so
269 /data/local/webapps/sms.gaiamobile.org/application.zip
260 /system/lib/liblog.so
236 /system/bin/slog
201 /system/lib/libstdc++.so
174 /system/lib/libUMP.so
157 /system/lib/libreference-ril_sp.so
150 /system/lib/libstagefright.so
146 /system/lib/libbinder.so
142 /system/lib/libmedia.so
130 /system/fonts/FiraSansOT-Regular.otf
124 /system/lib/hw/gralloc.sc8810.so
123 /data/local/storage/persistent/chrome/idb/2588645841ssegtnti.sqlite
117 /system/lib/hw/sensors.sc8810.so
111 /system/bin/phoneserver_2sim
98 /system/lib/libril_sp.so
96 /system/lib/libwilhelm.so
95 /data/local/storage/persistent/chrome/idb/226660312ssm.sqlite
91 /system/lib/libaudioflinger.so
86 /data/local/webapps/system.gaiamobile.org/application.zip
75 /system/fonts/FiraSansOT-Light.otf
75 /system/bin/eng_rf_nv_update
72 /system/lib/libGLESv1_CM.so
66 /system/bin/engpcclient
57 /system/lib/libpixelflinger.so
56 /system/b2g/libfreebl3.so
54 /system/lib/libGLESv2.so
54 /system/lib/egl/libGLESv2_mali.so
47 /system/lib/egl/libEGL_mali.so
40 /system/fonts/FiraSansOT-Medium.otf
39 /system/bin/vold
37 /system/lib/libstagefright_sprd_mp3dec.so
36 /system/bin/netd
32 /system/bin/vcharged
32 /system/b2g/libsoftokn3.so
27 /system/lib/libstagefright_omx.so
27 /system/bin/b2g-info
24 /system/lib/libstagefright_foundation.so
21 /system/bin/rild_sp
14 /system/lib/libsysutils.so
14 /system/lib/hw/audio_policy.default.so
14 /system/bin/rilproxy
12 /system/lib/libhardware_legacy.so
8 /data/local/tmp/mozilla-temp-337349264
6 /system/lib/libmediaplayerservice.so
6 /data/local/storage/persistent/8+f+app+++music.gaiamobile.org/idb/598554340MFe2d%icaiDsBu%m2F.sqlite
5 /data/local/storage/persistent/chrome/idb/2588645841ssegtnti.sqlite-journal
4 /system/bin/usbmond
3 /system/bin/servicemanager
2 /system/bin/b2g-ps
2 /sys/system/b2g/libxul.so
1 /system/bin/oom-msg-logger
1 /system/b2g/plugin-container
1 /data/local/tmp/etilqs_oX0DRIUiwhiDL1z
1 /data/local/storage/persistent/chrome/idb/3406/system/b2g/libxul.so
Assignee | ||
Comment 29•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #28)
> 63591 /system/b2g/libxul.so
Check http://goo.gl/3zZt6L for the number of times yaffs_readpage_nolock() reads libxul.so for each offset. The distribution is diffuse.
BTW, I couldn't map offset 0x00085000 ~ 0x00229000 to any section of libxul.so, still not quite sure how to resolve them to symbols.
Assignee | ||
Comment 30•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #29)
> BTW, I couldn't map offset 0x00085000 ~ 0x00229000 to any section of
> libxul.so, still not quite sure how to resolve them to symbols.
It maps to text section, I mixed up the address and offset.
Assignee | ||
Comment 31•11 years ago
|
||
Thinker & Ting-Yuan are thinking to let linker reorder the correlated functions of libxul.so to have them stayed closer. And then I found PGO (bug 970175), I will try to enable it and see how does it go.
Assignee | ||
Comment 32•11 years ago
|
||
I noticed nsIFrame.SchedulePaint() triggers RefreshDriver.ScheduleViewManagerFlush() at contact's profile page, even after pressing the message button and Messages app shows up. Not sure if it causes the freeze of comment 28.
Assignee | ||
Comment 33•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #32)
> I noticed nsIFrame.SchedulePaint() triggers
> RefreshDriver.ScheduleViewManagerFlush() at contact's profile page, even
> after pressing the message button and Messages app shows up. Not sure if it
> causes the freeze of comment 28.
Created bug 978712 for this.
Updated•11 years ago
|
Attachment #8380962 -
Flags: review?(cyu)
Comment 34•11 years ago
|
||
triage: 1.3T+, this is needed for tarako to avoid sudden lockup of device
blocking-b2g: 1.3T? → 1.3T+
Assignee | ||
Comment 35•11 years ago
|
||
Modified gonk-misc/Android.mk to enable PGO:
- $(MAKE) -C $(GECKO_PATH) -f client.mk -s && \
+ $(MAKE) -C $(GECKO_PATH) -f client.mk -s profiledbuild && \
but both arm-eabi-4.4.3 and arm-eabi-4.6 ran into:
{standard input}: Assembler messages:
{standard input}:112: Error: lo register required -- `cbz r9,2f'
{standard input}:202: Error: lo register required -- `cbz r9,2f'
make[7]: *** [/home/ting/w/fxos/tarako-v13t/objdir-gecko/security/nss/lib/freebl/mpi_arm.o] Error 1
Checked CFLAGS, there's only one difference from non-PGO build: "-fprofile-generage", is still trying to work around it.
Comment 36•11 years ago
|
||
Comment on attachment 8380962 [details] [diff] [review]
bg-apps-bye.patch
Review of attachment 8380962 [details] [diff] [review]:
-----------------------------------------------------------------
I remove the review flag because I don't think this is the route we should take, even we might finally decide to do this. If we set the OOM parameters and process priority correctly, we should able achieve the same or similar result. We should strive to fix the problem that background apps still do something (e.g. perform GC/CC) that causes swapping of the foreground app, exhausting the file cache and the like and cause such problem. Killing background apps should be used as the last resort.
::: b2g/app/b2g.js
@@ +620,1 @@
>
Shouldn't we set it to false and only make it true on 1.3T instead?
::: dom/ipc/ProcessPriorityManager.cpp
@@ +1320,5 @@
> + for (uint32_t i = 0; i < killed.Length(); i++) {
> + ContentParent* parent = killed[i];
> + LOG("About to force kill pid: %d", parent->Pid());
> + RemoveFromBackgroundLRUPool(parent);
> + unused << kill(parent->Pid(), SIGKILL);
We shouldn't kill() the process with SIGKILL in high-level code like ProcessPriorityManager. Use parent->KillHard() if we really need to.
Attachment #8380962 -
Flags: review?(cyu)
Comment 37•11 years ago
|
||
I don't disagree that we need a better solution Cervantes, but we have a deadline in less than 2 weeks so it may be time to cut corners to get a stable build we can test. I'll post a new patch fixing your comment.
Comment 38•11 years ago
|
||
This patch doesn't handle activities because both the caller and the activity have foreground priority. If we still have music playing in background, there is no one to kill from the background LRU pool. In this case we can still have system freeze. We need to think what we can do for it:
1. kill the background music
2. kill the activity
3. kill the activity caller
If we all agree that long system freeze (e.g. more than 5 seconds) is the worst and needs to be avoided, then we need to kill someone. For such case we might as well extend the background LRU pool and kill the one with lowest priority and least used.
Assignee | ||
Comment 39•11 years ago
|
||
Determine system status per attachment 8378843 [details], and attachment 8378844 [details]:
- going freezed: both pswpin & pswpout grows over 800 for 2 seconds
- freezed: cpu system usage over 80 for 3 seconds
Assignee | ||
Comment 40•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #35)
> {standard input}: Assembler messages:
> {standard input}:112: Error: lo register required -- `cbz r9,2f'
> {standard input}:202: Error: lo register required -- `cbz r9,2f'
The system image for generating profile comes out after following modifications, but it is too big to download to Tarako (189.1 MB).
diff --git a/security/nss/lib/freebl/Makefile b/security/nss/lib/freebl/Makefile
index 0d293f1..d30a910 100644
--- a/security/nss/lib/freebl/Makefile
+++ b/security/nss/lib/freebl/Makefile
@@ -201,7 +201,7 @@ ifeq ($(CPU_ARCH),x86)
#ECL_USE_FP = 1
endif
ifeq ($(CPU_ARCH),arm)
- DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
+# DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE
DEFINES += -DMP_USE_UINT_DIGIT
DEFINES += -DSHA_NO_LONG_LONG # avoid 64-bit arithmetic in SHA512
MPI_SRCS += mpi_arm.c
diff --git a/client.mk b/client.mk
index 561a294..9dfe936 100644
--- a/client.mk
+++ b/client.mk
@@ -243,10 +243,10 @@ endif
profiledbuild::
$(MAKE) -f $(TOPSRCDIR)/client.mk realbuild MOZ_PROFILE_GENERATE=1 MOZ_P
$(MAKE) -C $(PGO_OBJDIR) package MOZ_PGO_INSTRUMENTED=1 MOZ_INTERNAL_SIG
- rm -f ${PGO_OBJDIR}/jarlog/en-US.log
- MOZ_PGO_INSTRUMENTED=1 JARLOG_FILE=jarlog/en-US.log EXTRA_TEST_ARGS=10 $
- $(MAKE) -f $(TOPSRCDIR)/client.mk maybe_clobber_profiledbuild
- $(MAKE) -f $(TOPSRCDIR)/client.mk realbuild MOZ_PROFILE_USE=1
+# rm -f ${PGO_OBJDIR}/jarlog/en-US.log
+# MOZ_PGO_INSTRUMENTED=1 JARLOG_FILE=jarlog/en-US.log EXTRA_TEST_ARGS=10 $
+# $(MAKE) -f $(TOPSRCDIR)/client.mk maybe_clobber_profiledbuild
+# $(MAKE) -f $(TOPSRCDIR)/client.mk realbuild MOZ_PROFILE_USE=1
#####################################################
# Build date unification
Assignee | ||
Comment 41•11 years ago
|
||
Tried to enlarge min_free for LMK, which Ting-Yuan thinks it'd be helpful if we can keep ~20MB free+cache most of time. Meanwhile, Thinker prefers to keep as many background apps as possible (still with good user experience), so the test was done without Fabrice's patch (attachment 8380962 [details] [diff] [review]).
The test results:
http://goo.gl/gh1JUS
- Killing Homescreen doesn't feel all right, since I can see nothing but
wallpaper for 1~2 seconds when back to home screen.
- Smaller min_free earns us just one extra background app, which I prefer
smoothness than that.
Comment 42•11 years ago
|
||
When you run this kind of tests, can you also add app launch time (just turn on the time to load option in the Settings app) ?
I wrote this (admittedly aggressive) background app kill patch because I noticed that even if we kept apps around, it was in many cases slower to move them back into the foreground than to start them from scratch. This means that the overall user experience was actually way better, and this is the goal here.
Assignee | ||
Comment 43•11 years ago
|
||
I (In reply to Fabrice Desré [:fabrice] from comment #42)
> When you run this kind of tests, can you also add app launch time (just turn
> on the time to load option in the Settings app) ?
I see.
> I wrote this (admittedly aggressive) background app kill patch because I
> noticed that even if we kept apps around, it was in many cases slower to
> move them back into the foreground than to start them from scratch. This
> means that the overall user experience was actually way better, and this is
> the goal here.
Yes, this is also my understanding. Just Thinker'd like to know how many background apps can be there with larger min_free, so I need to test it without your patch.
Comment 44•11 years ago
|
||
Ting, since you have been working on this, assign to you for now. please reassign if you are not working on this. thanks
Assignee: nobody → tchou
Assignee | ||
Comment 45•11 years ago
|
||
There were discussions about killing application:
a) Extend background LRU to kill not just background app
b) Polling conditions may cause thrashing, and kill beforehand
c) Tuning LMK, probably more than one configuration for different situations
Assignee | ||
Comment 46•11 years ago
|
||
Hardly run into system freeze with this configuration today:
adb shell "echo 256,512,1024,4608,4608,6144 > /sys/module/lowmemorykiller/parameters/minfree"
oom_adj min_free
0 1024 KB
1 2048 KB
2 4096 KB
6 18432 KB
8 18432 KB
10 24576 KB
The cases I tested were:
a) Start message activity from a contact's details page while playing music,
b) Launch Gallery with 50 images while playing music, and
c) Launch Marketplace while playing music
The drawback is Homescreen app does not recover fast enough, which I can see only wallpaper when back to home screen.
The tests were done without Fabrice's patch attachment 8380962 [details] [diff] [review], if the patch is included, min_free for oom_adj 10 can be a small number such as 1024 KB to let LMK never select a victim from oom_adj 10 (which there won't be anyone with Fabrice's patch).
Comment 47•11 years ago
|
||
Ting-Yu, what were the app startup times with your latests parameters? (if you can test both without and with my patch that would be awesome).
Assignee | ||
Comment 48•11 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #47)
> Ting-Yu, what were the app startup times with your latests parameters? (if
> you can test both without and with my patch that would be awesome).
Sure, then do you have any particular scenarios/applications would like to test?
Comment 49•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #48)
> (In reply to Fabrice Desré [:fabrice] from comment #47)
> > Ting-Yu, what were the app startup times with your latests parameters? (if
> > you can test both without and with my patch that would be awesome).
>
> Sure, then do you have any particular scenarios/applications would like to
> test?
Not really. Whatever is covered by QA smoketests should be good.
Comment 50•11 years ago
|
||
Moving this to Runtime to get it out of the General component. This should probably go into a new "Performance" component.
Component: General → Runtime
Assignee | ||
Comment 51•11 years ago
|
||
Fabrice, the smoketests are quite simple, so I tried to get loading time with different scenarios.
Assignee | ||
Comment 52•11 years ago
|
||
Fixed attachment content-type.
Attachment #8389727 -
Attachment is obsolete: true
Comment 53•11 years ago
|
||
I am working on an active killer similar to fabrice's background killer. It's less aggressive and it can also kill non-background apps. This should act as our last resort and complement LMK parameter tuning.
Updated•11 years ago
|
Component: Runtime → Performance
Assignee | ||
Comment 54•11 years ago
|
||
We just had a team discussion, the conclusion is:
- Will try to detect thrashing and kill application accordingly, which is a
backup as we assume LMK can not handle all the cases, such as bug 973596.
Currently there're two proposals for the detection: a) timer skew, and b)
iowait.
- The patch of bug 972130 and LMK min_free changes of bug 974308, comment 46
will be included from next daily build. Will test it see if we still meet
system freeze.
I am assigned to check whether (a) and (b) can be used as an indicator for bug
973596.
Assignee | ||
Comment 55•11 years ago
|
||
From the test results, neither timer skew nor iowait are suitable indicator for the case of bug 973596. Better confirm is it thrashing at first.
Comment 56•11 years ago
|
||
What can I do on my side?
Updated•11 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [c=memory p= s= u=tarako]
Comment 57•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #55)
> Created attachment 8392767 [details]
> incoming_call_timer_skew_&_iowait
>
> From the test results, neither timer skew nor iowait are suitable indicator
> for the case of bug 973596. Better confirm is it thrashing at first.
How about page fault/context switch/user space CPU time?
Assignee | ||
Comment 58•11 years ago
|
||
Monitor system status for incoming call, also Gallery launching for comparison.
Comment 59•11 years ago
|
||
NI on :joe to help follow-up on next steps here as discussed.
Flags: needinfo?(jcheng)
Assignee | ||
Comment 60•11 years ago
|
||
James, could you please merge the patch http://goo.gl/SjBj7r which enlarges the min_free of oom_adj 6, 8, and 10?
Flags: needinfo?(james.zhang)
Comment 61•11 years ago
|
||
triage: minus for now
if we have concrete plans on how to move forward with this bug, please renom for 1.3T? thanks
blocking-b2g: 1.3T+ → -
Flags: needinfo?(jcheng)
Comment 62•11 years ago
|
||
Done.
commit b8ee760324a8b9d714fa761bed2b3ad823f8e9a5
Author: james.zhang <james.zhang@spreadtrum.com>
Date: Tue Mar 25 21:53:03 2014 +0800
Bug#269156 merge patch from mozilla, Bug 974308 - [tarako] Study memory thrashing and find out a proper way to handle it.
[bug number ]
[root cause ]
[changes ]
[side effects]
[self test ] mozilla test
[reviewers ]
Change-Id: I5f4673c7c27d88d8669eb3f497af6fd05472752b
Flags: needinfo?(james.zhang)
Assignee | ||
Comment 63•11 years ago
|
||
After killing background applications, there're still chances run into thrashing when multiple web activities are requested. In that case, is there anything else we can do except killing foreground application?
Comment 64•11 years ago
|
||
music app is very easy to be killed by:
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 24576);
Reporter | ||
Comment 65•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #63)
> After killing background applications, there're still chances run into
> thrashing when multiple web activities are requested. In that case, is there
> anything else we can do except killing foreground application?
The detection of thrashing is a complementary of OOM. It fixes the problem of OOM killer killing too late for some situation, and inflexible.
Assignee | ||
Comment 66•11 years ago
|
||
Tested multiple web activites Contacts -> SMS -> Gallery with bug 982491 applied, it was thrashing while picking image, note there's no background app.
Is it possible not to keep all the applications on the activity chain in foreground? For instance:
1. Contacts -> SMS
2. Contacts moves to background and may be killed
3. SMS -> Gallery
4. SMS moves to background and may be killed
5. An image is picked, post result to SMS and launch it if it has been killed
6. SMS done, launch Contacts if it has been killed
# There were two processes Communications and Gallery, seems Gallery is not grouped.
Comment 67•11 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #66)
> Is it possible not to keep all the applications on the activity chain in
> foreground? For instance:
>
> 1. Contacts -> SMS
> 2. Contacts moves to background and may be killed
> 3. SMS -> Gallery
We could have this if we fix bug 892371.
> 4. SMS moves to background and may be killed
> 5. An image is picked, post result to SMS and launch it if it has been
> killed
Re-starting SMS after it has been killed is impossible as there's no way to save and restore the last state the app was in. What we can hope is that sending the contacts app to the background is enough to have the SMS app and gallery alive during the activity.
Assignee | ||
Comment 68•11 years ago
|
||
I will come out a daemon detecting thrashing for the reason Thinker commented at comment 65, but still I am not sure how should it handle the case when there're only foreground applications. Killing seems not a good idea, user may never finish her task.
Reporter | ||
Comment 69•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #67)
> Re-starting SMS after it has been killed is impossible as there's no way to
> save and restore the last state the app was in. What we can hope is that
> sending the contacts app to the background is enough to have the SMS app and
> gallery alive during the activity.
I have raised a discussion of session restoring at dev-b2g mailing list. I am not sure how far it is from implementation, but it could solve the problem of restoring state of the app.
Assignee | ||
Comment 70•11 years ago
|
||
Created a thrashing detector based on pswpin/pswpout of /proc/vmstat, it determines thrashing if the system is swapping in/out more than 2MB each for 3 seconds continuously (per experimental data). For now it kills only background application if thrashing is detected, which does not help for the case of web activities chain.
It selects victim the same as LMK, maximum RSS and swap consumption with largest oom_adj.
Assignee | ||
Comment 71•11 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #67)
> Re-starting SMS after it has been killed is impossible as there's no way to
> save and restore the last state the app was in. What we can hope is that
> sending the contacts app to the background is enough to have the SMS app and
> gallery alive during the activity.
As web activities can be chained differently, guess it'd be difficult to handle by case.
Note there're different test cases for chaining web activities (see bug 972669, comment 32), I am worried how to pass them.
Assignee | ||
Comment 72•11 years ago
|
||
Some minor updates and print not only selected process id but also its name before killing.
Attachment #8399295 -
Attachment is obsolete: true
Updated•11 years ago
|
status-b2g-v1.3T:
--- → affected
Comment 73•11 years ago
|
||
I'm intend to modify watermark of memory zone in kernel, to advance the wakeup of kswapd and increase the work load of it.
So the kswapd would start to work after free memory was lower than WMARK_LOW,and would not stop until free memroy was higher than WMARK_HIGH.
I think it may reduce the thrashing, But I don't know how to test it.
ffos@ffos2:~/yingxu/6821/kernel$ git diff HEAD~1
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2500ec4..9f92ddc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5184,8 +5184,8 @@ void setup_per_zone_wmarks(void)
zone->watermark[WMARK_MIN] = tmp;
}
- zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp >> 2);
- zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
+ zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp);
+ zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp*2);
setup_zone_migrate_reserve(zone);
spin_unlock_irqrestore(&zone->lock, flags);
}
Comment 74•11 years ago
|
||
current LMK setting: music app in background can survive longer till 8172 KB.
notify_trigger 14336 KB
oom_adj min_free
0 1024 KB
1 2048 KB
2 4096 KB
6 8172 KB
8 16384 KB
10 18432 KB
commit 3945e13d8165bc6e0a5d865c6f12ce63954d910c
Author: ying.xu <ying.xu@spreadtrum.com>
Date: Sat Apr 5 13:31:09 2014 +0800
Bug#245770 change the LMK parameters
[self test ] kill process OK
Change-Id: I04043c0911463652bd927be028eafe28a5cd243c
Updated•11 years ago
|
Priority: P1 → P3
Updated•9 years ago
|
tracking-b2g:
--- → -
Comment 75•9 years ago
|
||
I don't think we'll ever touch a Tarako again so I'm closing this. Please reopen if you think this is still somehow relevant.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•