974308 - [tarako] Study memory thrashing and find out a proper way to handle it.

Reporter

Description

•

11 years ago

This bug is to study memory thrashing of 128MB devices, and find out a way to improve the response time. The tarako would be totally blocked for a few seconds. It is some kind of memory thrashing. The solution can be changing parameters of OOM killer or a customized thrashing aware killer.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 1

•

11 years ago

Attached file http://goo.gl/fNPxny (obsolete) — Details

Modified vmstat a bit to gather the system information we're intrested. Will collect yaffs numbers as well later.

Ting-Yu Chou [:ting] (away)

Assignee

Updated

•

11 years ago

Attachment #8378186 - Attachment description: the spreadsheet of collected system information → http://goo.gl/fNPxny

Ting-Yu Chou [:ting] (away)

Assignee

Updated

•

11 years ago

Attachment #8378186 - Attachment mime type: text/plain → text/url

Ting-Yu Chou [:ting] (away)

Assignee

Updated

•

11 years ago

Attachment #8378186 - Attachment mime type: text/url → text/x-url

Joe Cheng [:jcheng] (please needinfo)

Updated

•

11 years ago

blocking-b2g: --- → 1.3T?

Ting-Yu Chou [:ting] (away)

Assignee

Updated

•

11 years ago

Attachment #8378186 - Attachment mime type: text/x-url → application/x-url

Ting-Yu Chou [:ting] (away)

Assignee

Comment 2

•

11 years ago

Attached file high_sys_load#1 (obsolete) — Details

Attachment #8378186 - Attachment is obsolete: true

Thinker Li [:sinker]

Reporter

Comment 3

•

11 years ago

Hi Ting, what the number is responsible for response time of the device? I mean how do I know the device is blocked or not at a given period?

Thinker Li [:sinker]

Reporter

Comment 4

•

11 years ago

Could you also explain what these fields meaning?

Ting-Yu Chou [:ting] (away)

Assignee

Comment 5

•

11 years ago

Comment on attachment 8378206 [details] high_sys_load#1 running = /procs/stat procs_running blocked = /procs/stat procs_blocked free = /proc/meminfo MemFree cached = /proc/meminfo Cached mapped = /proc/meminfo Mapped anonomous = /proc/meminfo AnonPages zram = /sys/block/zram0/mem_used_total major fault = /proc/vmstat pgmajfault swap in = /proc/vmstat pswpin swap out = /proc/vmstat pswpout us = /proc/stat cpu user ni = /proc/stat cpu nice sy = /proc/stat cpu system id = /proc/stat cpu idle wa = /proc/stat cpu iowait ir = /proc/stat cpu irq si = /proc/stat cpu softirq For the attachment 8378206 [details], system loading is always around 99, which is not responding. I will add yaffs numbers and iterate more times to understand the status before entering high system loading.

Attachment #8378206 - Attachment description: spreadsheet_collected-stats → high_sys_load#1

Attachment #8378206 - Attachment filename: spreadsheet.html → high_sys_load#1.html

Ting-Yu Chou [:ting] (away)

Assignee

Comment 6

•

11 years ago

Attached file high_sys_load#2 (obsolete) — Details

Recorded another log, added two numbers: yaffs_n_page_reads = summation of /proc/yaffs n_page_reads yaffs_n_page_writes = summation of /proc/yaffs n_page_writes and dropped 2: cpu_irq cpu_softirq System loading is showed as stepped area at right vertical axis. Be noted system loading goes up when yaffs_n_page_reads increases.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 7

•

11 years ago

Attached file high_system_load#1 — Details

Updated link to the chart.

Attachment #8378206 - Attachment is obsolete: true

Ting-Yu Chou [:ting] (away)

Assignee

Comment 8

•

11 years ago

Attached file high_system_load#2 — Details

Attachment #8378815 - Attachment is obsolete: true

Ting-Yu Chou [:ting] (away)

Assignee

Comment 9

•

11 years ago

Attached file high_system_load#3 — Details

Captured another log, the high load started from tapping message button in a contact. I then tried to edit message and pressed keyboard several times.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 10

•

11 years ago

I am thinking: - Seek chances to improve NAND read performance - Read ahead and utilize chip's feature to read with data cache - bug 944457, comment 211 shows reading smaller file requires 200~400us for reading a page (2KB), but from NAND chip's spec, it takes 25us maximum - This will be a system wide improvement if we can make any - Avoid swapping by killing background application earlier - LMK actually is not triggered when the min_free level is reached. It is a slab shrinker, which is called after kernel has tried hard to reclaim pages - Possible options: - ulmkd, a user space low memory killer daemon - cgroup memory pressure notification, but it is introduced in kernel 3.10 - Invokes LMK by ourselves whenever we are going to launch an application - Will we need some rules for killing, e.g., not to kill background music application? - Others worth checking - Swap token, prohibits a process from page frame reclaiming for a period

Ting-Yu Chou [:ting] (away)

Assignee

Comment 11

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #10) > - Avoid swapping by killing background application earlier > - LMK actually is not triggered when the min_free level is reached. It is a > slab shrinker, which is called after kernel has tried hard to reclaim pages I was wrong, LMK is called quite frequently, but usually return earlier because 1) nr_to_scan == 0, or 2) NR_FILE_PAGES - NR_SHMEM - total_swapcache_pages is larger than maximum min_free.

Ting-Yuan Huang

Comment 12

•

11 years ago

So there are two ways to go in general: 1. Reduce the working set. 1.1 Kill the apps in comment 10. 1.2 We may as well suspend/sigstop the background apps. Compared to suspension, to kill an app releases it's dirty pages. Note that the benefit differs only by the compressed size provided that the overhead of zram is acceptable. Or to combine them; Killing is the last resort. 1.3 De-prioritize background apps so that they are unable to compete for precious frames. We may also make the message queue in chrome-process aware of process priority. 2. Improve IO 2.1 Improve the flash performance in comment 10. 2.2 Just in case, to make sure the filesystem is correctly aligned to the physical boundary that the flash controller can access efficiently. 2.3 A compressed filesystem to trade cpu times for less IO.

Gabriele Svelto [:gsvelto]

Comment 13

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #10) > - cgroup memory pressure notification, but it is introduced in kernel 3.10 Just a note on this, we've experimented with cgroups in the past and even with kernels that supported it we run into very annoying bugs so we decided not to use it at the time. From what I could see the cgroups code in most of our kernels contains some Android-specific modifications compared to the Linux mainline kernel and those seem to have made it quite fragile.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 14

•

11 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #13) > Just a note on this, we've experimented with cgroups in the past and even > with kernels that supported it we run into very annoying bugs so we decided > not to use it at the time. From what I could see the cgroups code in most of > our kernels contains some Android-specific modifications compared to the > Linux mainline kernel and those seem to have made it quite fragile. Gabriele, thank you for the information.

[:fabrice] Fabrice Desré

Comment 15

•

11 years ago

Attached patch kill-bg-apps.patch (obsolete) — Details — Splinter Review

This wip patch implements a strategy that is as follows: - when an app goes into the background, we add it to a list of killable app. The homescreen and apps that have the PERCEIVABLE_BACKGROUND priority don't make it in this list. - when an app goes into the foreground, we try to kill all current background apps except the one that is moving to the foreground. That doesn't seem too bad on a tarako. I've been playing a bit with launching apps while the music player is active in the background and while we still swap sometimes I did not experience long freezes. Still to do: make that work properly with activities, ie. ensure we don't kill the activity caller and also maybe iac users (not sure if iac matters for 1.3 as the important user is the music app that we don't kill for other reasons). The implementation is a quick hack, but if some of you could test it and also feel it's an improvement I'll clean it up (and add a pref to only use that on low memory devices).

[:fabrice] Fabrice Desré

Comment 16

•

11 years ago

Attached patch kill-bg-apps.patch (obsolete) — Details — Splinter Review

I found that IsPreallocated() was still true on the first time we set an app the FOREGROUND priority because of the order of the test in MaybeTakePreallocatedAppProcess. This patch changes that, and that prevents killing the preallocated app which was sometimes happening with the previous patch.

Attachment #8380088 - Attachment is obsolete: true

Ting-Yu Chou [:ting] (away)

Assignee

Comment 17

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #16) > The implementation is a quick hack, but if some of you could test it and > also feel it's an improvement I'll clean it up (and add a pref to only use > that on low memory devices). Thank you Fabrice, I just passed this information to Gonk team, see if they can include the patch in today's build and have more people try it out. I will also try it today.

Danny Liang [:dliang]

Comment 18

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #15) > Created attachment 8380088 [details] [diff] [review] > kill-bg-apps.patch > > This wip patch implements a strategy that is as follows: > - when an app goes into the background, we add it to a list of killable app. > The homescreen and apps that have the PERCEIVABLE_BACKGROUND priority don't > make it in this list. > - when an app goes into the foreground, we try to kill all current > background apps except the one that is moving to the foreground. There are some concerns about "kill all current background apps except the one that is moving to the foreground." We tested with playing music at background, then send SMS with attachment, with your patch, the music playback sometimes will be killed. It impacted user experience. How about kill few apps with priority rather than kill all current background apps except one?

[:fabrice] Fabrice Desré

Comment 19

•

11 years ago

This patch should kill the music app (and was not for me) when it's playing music since it has then the BACKGROUND_PERCEIVABLE level and not BACKGROUND.

[:fabrice] Fabrice Desré

Comment 20

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #19) > This patch should kill the music app (and was not for me) when it's playing should *not* kill...

[:fabrice] Fabrice Desré

Comment 21

•

11 years ago

Hm, I think I know why you could hit that. I'll post a new patch tomorrow.

[:fabrice] Fabrice Desré

Updated

•

11 years ago

Attachment #8380375 - Attachment is obsolete: true

Gabriele Svelto [:gsvelto]

Comment 22

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #15) > Still to do: make that work properly with activities, ie. ensure we don't > kill the activity caller and also maybe iac users (not sure if iac matters > for 1.3 as the important user is the music app that we don't kill for other > reasons). Currently that's not a problem because apps invoking inline activities are still considered in the foreground. Until we fix bug 892371 that's still going to be the case. (And yes, I've wanted to fix that bug for a long time but still haven't got enough spare time to do so.)

Ting-Yu Chou [:ting] (away)

Assignee

Comment 23

•

11 years ago

Thinker prefers to improve the code above gonk, which benefits not only Tarako but also other projects. So I'll leave improving NAND read as the lowest priority. Now he'd like to test followings to see if it prevents system from freezing: a. decrease the priority of either b2g or foreground content process b. suspend background applications Short summary: Both does not help, no matter it's done before or when system freezes. The steps I used to reproduce the system freeze with reference-workload-light: 1. play music in background 2. launch browser 3. launch message app, enter BIG-THREAD-MMS, scroll up/down rapidly, and press home key when the message is loading For a), I renice either b2g or foreground content process to 19. For b), I suspended homescreen, music, and browser.

[:fabrice] Fabrice Desré

Comment 24

•

11 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #22) > (In reply to Fabrice Desré [:fabrice] from comment #15) > > Still to do: make that work properly with activities, ie. ensure we don't > > kill the activity caller and also maybe iac users (not sure if iac matters > > for 1.3 as the important user is the music app that we don't kill for other > > reasons). > > Currently that's not a problem because apps invoking inline activities are > still considered in the foreground. Until we fix bug 892371 that's still > going to be the case. (And yes, I've wanted to fix that bug for a long time > but still haven't got enough spare time to do so.) It's very much a problem for non-inline activities, and we need to support them. I'll take another look at 892371 today.

[:fabrice] Fabrice Desré

Comment 25

•

11 years ago

Attached patch bg-apps-bye.patch — Details — Splinter Review

New patch - I realized I could just reuse the BackgroundProcessLRUPool to track background apps instead of re-implementing that. I also added a pref to easily turn that on/off since we won't need that on all devices.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 26

•

11 years ago

Fabrice, last time you asked me would it be helpful if letting application load less things. Could you please point me out where is the code about this? Thank you.

Flags: needinfo?(fabrice)

[:fabrice] Fabrice Desré

Comment 27

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #26) > Fabrice, last time you asked me would it be helpful if letting application > load less things. Could you please point me out where is the code about > this? Thank you. I don't remember what this was about :( I've seen very low i/o performance when loading resources from app packages (that happens in nsJARChannel.cpp). Maybe we trash more because we don't limit the number of simultaneous loads. That may be worthwhile to try having a limited pool of jar channels.

Flags: needinfo?(fabrice)

Ting-Yu Chou [:ting] (away)

Assignee

Comment 28

•

11 years ago

Attached file yaffs_readpage.txt.tar.gz — Details

I added a log into yaffs_readpage_nolock() with serial number, filename, and offset. Then tried to reproduce the system freeze by: 1. play music in background 2. launch browser and back to home screen 3. launch dialer and enter call log, scroll a bit and select a contact 4. press message button at contact's page 5. press "Message" to launch keyboard Used shell command to capture dmesg: $ while true; do adb shell dmesg -c >> dmesg.out; sleep 1; done The system freezed during 1190.0 ~ 1410.5, from serial number there were 212966 yaffs_readpage_nolock() invocations, 96868 of them were captured: 63591 /system/b2g/libxul.so 6623 /system/lib/libMali.so 5088 /system/lib/libc.so 4025 /system/bin/toolbox 2102 /system/b2g/libmozglue.so 2020 /system/b2g/libnss3.so 1800 /system/bin/mksh 1434 /system/lib/libcutils.so 1379 /system/lib/libm.so 767 /system/bin/linker 685 /system/lib/libui.so 549 /system/lib/libEGL.so 460 /system/b2g/omni.ja 434 /system/lib/libutils.so 433 /system/b2g/b2g 375 /system/lib/libstlport.so 365 /data/local/webapps/keyboard.gaiamobile.org/application.zip 323 /system/lib/hw/audio.primary.sc8810.so 305 /data/local/storage/persistent/chrome/idb/3406066227csotncta.sqlite 299 /system/lib/libusbhost.so 269 /data/local/webapps/sms.gaiamobile.org/application.zip 260 /system/lib/liblog.so 236 /system/bin/slog 201 /system/lib/libstdc++.so 174 /system/lib/libUMP.so 157 /system/lib/libreference-ril_sp.so 150 /system/lib/libstagefright.so 146 /system/lib/libbinder.so 142 /system/lib/libmedia.so 130 /system/fonts/FiraSansOT-Regular.otf 124 /system/lib/hw/gralloc.sc8810.so 123 /data/local/storage/persistent/chrome/idb/2588645841ssegtnti.sqlite 117 /system/lib/hw/sensors.sc8810.so 111 /system/bin/phoneserver_2sim 98 /system/lib/libril_sp.so 96 /system/lib/libwilhelm.so 95 /data/local/storage/persistent/chrome/idb/226660312ssm.sqlite 91 /system/lib/libaudioflinger.so 86 /data/local/webapps/system.gaiamobile.org/application.zip 75 /system/fonts/FiraSansOT-Light.otf 75 /system/bin/eng_rf_nv_update 72 /system/lib/libGLESv1_CM.so 66 /system/bin/engpcclient 57 /system/lib/libpixelflinger.so 56 /system/b2g/libfreebl3.so 54 /system/lib/libGLESv2.so 54 /system/lib/egl/libGLESv2_mali.so 47 /system/lib/egl/libEGL_mali.so 40 /system/fonts/FiraSansOT-Medium.otf 39 /system/bin/vold 37 /system/lib/libstagefright_sprd_mp3dec.so 36 /system/bin/netd 32 /system/bin/vcharged 32 /system/b2g/libsoftokn3.so 27 /system/lib/libstagefright_omx.so 27 /system/bin/b2g-info 24 /system/lib/libstagefright_foundation.so 21 /system/bin/rild_sp 14 /system/lib/libsysutils.so 14 /system/lib/hw/audio_policy.default.so 14 /system/bin/rilproxy 12 /system/lib/libhardware_legacy.so 8 /data/local/tmp/mozilla-temp-337349264 6 /system/lib/libmediaplayerservice.so 6 /data/local/storage/persistent/8+f+app+++music.gaiamobile.org/idb/598554340MFe2d%icaiDsBu%m2F.sqlite 5 /data/local/storage/persistent/chrome/idb/2588645841ssegtnti.sqlite-journal 4 /system/bin/usbmond 3 /system/bin/servicemanager 2 /system/bin/b2g-ps 2 /sys/system/b2g/libxul.so 1 /system/bin/oom-msg-logger 1 /system/b2g/plugin-container 1 /data/local/tmp/etilqs_oX0DRIUiwhiDL1z 1 /data/local/storage/persistent/chrome/idb/3406/system/b2g/libxul.so

Ting-Yu Chou [:ting] (away)

Assignee

Comment 29

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #28) > 63591 /system/b2g/libxul.so Check http://goo.gl/3zZt6L for the number of times yaffs_readpage_nolock() reads libxul.so for each offset. The distribution is diffuse. BTW, I couldn't map offset 0x00085000 ~ 0x00229000 to any section of libxul.so, still not quite sure how to resolve them to symbols.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 30

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #29) > BTW, I couldn't map offset 0x00085000 ~ 0x00229000 to any section of > libxul.so, still not quite sure how to resolve them to symbols. It maps to text section, I mixed up the address and offset.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 31

•

11 years ago

Thinker & Ting-Yuan are thinking to let linker reorder the correlated functions of libxul.so to have them stayed closer. And then I found PGO (bug 970175), I will try to enable it and see how does it go.

Ting-Yuan Huang

Updated

•

11 years ago

Comment 32

•

11 years ago

I noticed nsIFrame.SchedulePaint() triggers RefreshDriver.ScheduleViewManagerFlush() at contact's profile page, even after pressing the message button and Messages app shows up. Not sure if it causes the freeze of comment 28.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 33

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #32) > I noticed nsIFrame.SchedulePaint() triggers > RefreshDriver.ScheduleViewManagerFlush() at contact's profile page, even > after pressing the message button and Messages app shows up. Not sure if it > causes the freeze of comment 28. Created bug 978712 for this.

[:fabrice] Fabrice Desré

Updated

•

11 years ago

Attachment #8380962 - Flags: review?(cyu)

Joe Cheng [:jcheng] (please needinfo)

Comment 34

•

11 years ago

triage: 1.3T+, this is needed for tarako to avoid sudden lockup of device

blocking-b2g: 1.3T? → 1.3T+

Ting-Yu Chou [:ting] (away)

Assignee

Comment 35

•

11 years ago

Modified gonk-misc/Android.mk to enable PGO: - $(MAKE) -C $(GECKO_PATH) -f client.mk -s && \ + $(MAKE) -C $(GECKO_PATH) -f client.mk -s profiledbuild && \ but both arm-eabi-4.4.3 and arm-eabi-4.6 ran into: {standard input}: Assembler messages: {standard input}:112: Error: lo register required -- `cbz r9,2f' {standard input}:202: Error: lo register required -- `cbz r9,2f' make[7]: *** [/home/ting/w/fxos/tarako-v13t/objdir-gecko/security/nss/lib/freebl/mpi_arm.o] Error 1 Checked CFLAGS, there's only one difference from non-PGO build: "-fprofile-generage", is still trying to work around it.

Cervantes Yu [:cyu] [:cervantes]

Comment 36

•

11 years ago

Comment on attachment 8380962 [details] [diff] [review] bg-apps-bye.patch Review of attachment 8380962 [details] [diff] [review]: ----------------------------------------------------------------- I remove the review flag because I don't think this is the route we should take, even we might finally decide to do this. If we set the OOM parameters and process priority correctly, we should able achieve the same or similar result. We should strive to fix the problem that background apps still do something (e.g. perform GC/CC) that causes swapping of the foreground app, exhausting the file cache and the like and cause such problem. Killing background apps should be used as the last resort. ::: b2g/app/b2g.js @@ +620,1 @@ > Shouldn't we set it to false and only make it true on 1.3T instead? ::: dom/ipc/ProcessPriorityManager.cpp @@ +1320,5 @@ > + for (uint32_t i = 0; i < killed.Length(); i++) { > + ContentParent* parent = killed[i]; > + LOG("About to force kill pid: %d", parent->Pid()); > + RemoveFromBackgroundLRUPool(parent); > + unused << kill(parent->Pid(), SIGKILL); We shouldn't kill() the process with SIGKILL in high-level code like ProcessPriorityManager. Use parent->KillHard() if we really need to.

Attachment #8380962 - Flags: review?(cyu)

[:fabrice] Fabrice Desré

Comment 37

•

11 years ago

I don't disagree that we need a better solution Cervantes, but we have a deadline in less than 2 weeks so it may be time to cut corners to get a stable build we can test. I'll post a new patch fixing your comment.

Cervantes Yu [:cyu] [:cervantes]

Comment 38

•

11 years ago

This patch doesn't handle activities because both the caller and the activity have foreground priority. If we still have music playing in background, there is no one to kill from the background LRU pool. In this case we can still have system freeze. We need to think what we can do for it: 1. kill the background music 2. kill the activity 3. kill the activity caller If we all agree that long system freeze (e.g. more than 5 seconds) is the worst and needs to be avoided, then we need to kill someone. For such case we might as well extend the background LRU pool and kill the one with lowest priority and least used.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 39

•

11 years ago

Determine system status per attachment 8378843 [details], and attachment 8378844 [details]: - going freezed: both pswpin & pswpout grows over 800 for 2 seconds - freezed: cpu system usage over 80 for 3 seconds

Ting-Yu Chou [:ting] (away)

Assignee

Comment 40

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #35) > {standard input}: Assembler messages: > {standard input}:112: Error: lo register required -- `cbz r9,2f' > {standard input}:202: Error: lo register required -- `cbz r9,2f' The system image for generating profile comes out after following modifications, but it is too big to download to Tarako (189.1 MB). diff --git a/security/nss/lib/freebl/Makefile b/security/nss/lib/freebl/Makefile index 0d293f1..d30a910 100644 --- a/security/nss/lib/freebl/Makefile +++ b/security/nss/lib/freebl/Makefile @@ -201,7 +201,7 @@ ifeq ($(CPU_ARCH),x86) #ECL_USE_FP = 1 endif ifeq ($(CPU_ARCH),arm) - DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE +# DEFINES += -DMP_ASSEMBLY_MULTIPLY -DMP_ASSEMBLY_SQUARE DEFINES += -DMP_USE_UINT_DIGIT DEFINES += -DSHA_NO_LONG_LONG # avoid 64-bit arithmetic in SHA512 MPI_SRCS += mpi_arm.c diff --git a/client.mk b/client.mk index 561a294..9dfe936 100644 --- a/client.mk +++ b/client.mk @@ -243,10 +243,10 @@ endif profiledbuild:: $(MAKE) -f $(TOPSRCDIR)/client.mk realbuild MOZ_PROFILE_GENERATE=1 MOZ_P $(MAKE) -C $(PGO_OBJDIR) package MOZ_PGO_INSTRUMENTED=1 MOZ_INTERNAL_SIG - rm -f ${PGO_OBJDIR}/jarlog/en-US.log - MOZ_PGO_INSTRUMENTED=1 JARLOG_FILE=jarlog/en-US.log EXTRA_TEST_ARGS=10 $ - $(MAKE) -f $(TOPSRCDIR)/client.mk maybe_clobber_profiledbuild - $(MAKE) -f $(TOPSRCDIR)/client.mk realbuild MOZ_PROFILE_USE=1 +# rm -f ${PGO_OBJDIR}/jarlog/en-US.log +# MOZ_PGO_INSTRUMENTED=1 JARLOG_FILE=jarlog/en-US.log EXTRA_TEST_ARGS=10 $ +# $(MAKE) -f $(TOPSRCDIR)/client.mk maybe_clobber_profiledbuild +# $(MAKE) -f $(TOPSRCDIR)/client.mk realbuild MOZ_PROFILE_USE=1 ##################################################### # Build date unification

Ting-Yu Chou [:ting] (away)

Assignee

Comment 41

•

11 years ago

Tried to enlarge min_free for LMK, which Ting-Yuan thinks it'd be helpful if we can keep ~20MB free+cache most of time. Meanwhile, Thinker prefers to keep as many background apps as possible (still with good user experience), so the test was done without Fabrice's patch (attachment 8380962 [details] [diff] [review]). The test results: http://goo.gl/gh1JUS - Killing Homescreen doesn't feel all right, since I can see nothing but wallpaper for 1~2 seconds when back to home screen. - Smaller min_free earns us just one extra background app, which I prefer smoothness than that.

[:fabrice] Fabrice Desré

Comment 42

•

11 years ago

When you run this kind of tests, can you also add app launch time (just turn on the time to load option in the Settings app) ? I wrote this (admittedly aggressive) background app kill patch because I noticed that even if we kept apps around, it was in many cases slower to move them back into the foreground than to start them from scratch. This means that the overall user experience was actually way better, and this is the goal here.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 43

•

11 years ago

I (In reply to Fabrice Desré [:fabrice] from comment #42) > When you run this kind of tests, can you also add app launch time (just turn > on the time to load option in the Settings app) ? I see. > I wrote this (admittedly aggressive) background app kill patch because I > noticed that even if we kept apps around, it was in many cases slower to > move them back into the foreground than to start them from scratch. This > means that the overall user experience was actually way better, and this is > the goal here. Yes, this is also my understanding. Just Thinker'd like to know how many background apps can be there with larger min_free, so I need to test it without your patch.

Ting-Yu Chou [:ting] (away)

Assignee

Updated

•

11 years ago

Updated

•

11 years ago

Comment 44

•

11 years ago

Ting, since you have been working on this, assign to you for now. please reassign if you are not working on this. thanks

Assignee: nobody → tchou

Ting-Yu Chou [:ting] (away)

Assignee

Comment 45

•

11 years ago

There were discussions about killing application: a) Extend background LRU to kill not just background app b) Polling conditions may cause thrashing, and kill beforehand c) Tuning LMK, probably more than one configuration for different situations

Ting-Yu Chou [:ting] (away)

Assignee

Comment 46

•

11 years ago

Hardly run into system freeze with this configuration today: adb shell "echo 256,512,1024,4608,4608,6144 > /sys/module/lowmemorykiller/parameters/minfree" oom_adj min_free 0 1024 KB 1 2048 KB 2 4096 KB 6 18432 KB 8 18432 KB 10 24576 KB The cases I tested were: a) Start message activity from a contact's details page while playing music, b) Launch Gallery with 50 images while playing music, and c) Launch Marketplace while playing music The drawback is Homescreen app does not recover fast enough, which I can see only wallpaper when back to home screen. The tests were done without Fabrice's patch attachment 8380962 [details] [diff] [review], if the patch is included, min_free for oom_adj 10 can be a small number such as 1024 KB to let LMK never select a victim from oom_adj 10 (which there won't be anyone with Fabrice's patch).

[:fabrice] Fabrice Desré

Comment 47

•

11 years ago

Ting-Yu, what were the app startup times with your latests parameters? (if you can test both without and with my patch that would be awesome).

Ting-Yu Chou [:ting] (away)

Assignee

Comment 48

•

11 years ago

(In reply to Fabrice Desré [:fabrice] from comment #47) > Ting-Yu, what were the app startup times with your latests parameters? (if > you can test both without and with my patch that would be awesome). Sure, then do you have any particular scenarios/applications would like to test?

[:fabrice] Fabrice Desré

Comment 49

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #48) > (In reply to Fabrice Desré [:fabrice] from comment #47) > > Ting-Yu, what were the app startup times with your latests parameters? (if > > you can test both without and with my patch that would be awesome). > > Sure, then do you have any particular scenarios/applications would like to > test? Not really. Whatever is covered by QA smoketests should be good.

Andreas Gal :gal

Comment 50

•

11 years ago

Moving this to Runtime to get it out of the General component. This should probably go into a new "Performance" component.

Component: General → Runtime

Ting-Yu Chou [:ting] (away)

Assignee

Comment 51

•

11 years ago

Attached file time_to_load (obsolete) — Details

Fabrice, the smoketests are quite simple, so I tried to get loading time with different scenarios.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 52

•

11 years ago

Attached file time_to_load — Details

Fixed attachment content-type.

Attachment #8389727 - Attachment is obsolete: true

Cervantes Yu [:cyu] [:cervantes]

Comment 53

•

11 years ago

I am working on an active killer similar to fabrice's background killer. It's less aggressive and it can also kill non-background apps. This should act as our last resort and complement LMK parameter tuning.

[:fabrice] Fabrice Desré

Updated

•

11 years ago

Component: Runtime → Performance

Ting-Yu Chou [:ting] (away)

Assignee

Comment 54

•

11 years ago

We just had a team discussion, the conclusion is: - Will try to detect thrashing and kill application accordingly, which is a backup as we assume LMK can not handle all the cases, such as bug 973596. Currently there're two proposals for the detection: a) timer skew, and b) iowait. - The patch of bug 972130 and LMK min_free changes of bug 974308, comment 46 will be included from next daily build. Will test it see if we still meet system freeze. I am assigned to check whether (a) and (b) can be used as an indicator for bug 973596.

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

11 years ago

Keywords: perf

Ting-Yu Chou [:ting] (away)

Assignee

Comment 55

•

11 years ago

Attached file incoming_call_timer_skew_&_iowait — Details

From the test results, neither timer skew nor iowait are suitable indicator for the case of bug 973596. Better confirm is it thrashing at first.

James Zhang (Spreadtrum)

Comment 56

•

11 years ago

What can I do on my side?

Mike Lee [:mlee]

Updated

•

11 years ago

Status: NEW → ASSIGNED

Priority: -- → P1

Whiteboard: [c=memory p= s= u=tarako]

ying.xu

Comment 57

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #55) > Created attachment 8392767 [details] > incoming_call_timer_skew_&_iowait > > From the test results, neither timer skew nor iowait are suitable indicator > for the case of bug 973596. Better confirm is it thrashing at first. How about page fault/context switch/user space CPU time?

Ting-Yu Chou [:ting] (away)

Assignee

Comment 58

•

11 years ago

Attached file incoming_call_monitoring — Details

Monitor system status for incoming call, also Gallery launching for comparison.

bhavana bajaj [:bajaj]

Comment 59

•

11 years ago

NI on :joe to help follow-up on next steps here as discussed.

Flags: needinfo?(jcheng)

Ting-Yu Chou [:ting] (away)

Assignee

Comment 60

•

11 years ago

James, could you please merge the patch http://goo.gl/SjBj7r which enlarges the min_free of oom_adj 6, 8, and 10?

Flags: needinfo?(james.zhang)

Joe Cheng [:jcheng] (please needinfo)

Comment 61

•

11 years ago

triage: minus for now if we have concrete plans on how to move forward with this bug, please renom for 1.3T? thanks

blocking-b2g: 1.3T+ → -

Flags: needinfo?(jcheng)

James Zhang (Spreadtrum)

Comment 62

•

11 years ago

Done. commit b8ee760324a8b9d714fa761bed2b3ad823f8e9a5 Author: james.zhang <james.zhang@spreadtrum.com> Date: Tue Mar 25 21:53:03 2014 +0800 Bug#269156 merge patch from mozilla, Bug 974308 - [tarako] Study memory thrashing and find out a proper way to handle it. [bug number ] [root cause ] [changes ] [side effects] [self test ] mozilla test [reviewers ] Change-Id: I5f4673c7c27d88d8669eb3f497af6fd05472752b

Flags: needinfo?(james.zhang)

Ting-Yu Chou [:ting] (away)

Assignee

Comment 63

•

11 years ago

After killing background applications, there're still chances run into thrashing when multiple web activities are requested. In that case, is there anything else we can do except killing foreground application?

thomas tsai

Comment 64

•

11 years ago

music app is very easy to be killed by: pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 24576);

thomas tsai

Updated

•

11 years ago

Depends on: 988164

Thinker Li [:sinker]

Reporter

Comment 65

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #63) > After killing background applications, there're still chances run into > thrashing when multiple web activities are requested. In that case, is there > anything else we can do except killing foreground application? The detection of thrashing is a complementary of OOM. It fixes the problem of OOM killer killing too late for some situation, and inflexible.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 66

•

11 years ago

Tested multiple web activites Contacts -> SMS -> Gallery with bug 982491 applied, it was thrashing while picking image, note there's no background app. Is it possible not to keep all the applications on the activity chain in foreground? For instance: 1. Contacts -> SMS 2. Contacts moves to background and may be killed 3. SMS -> Gallery 4. SMS moves to background and may be killed 5. An image is picked, post result to SMS and launch it if it has been killed 6. SMS done, launch Contacts if it has been killed # There were two processes Communications and Gallery, seems Gallery is not grouped.

Gabriele Svelto [:gsvelto]

Comment 67

•

11 years ago

(In reply to Ting-Yu Chou [:ting] from comment #66) > Is it possible not to keep all the applications on the activity chain in > foreground? For instance: > > 1. Contacts -> SMS > 2. Contacts moves to background and may be killed > 3. SMS -> Gallery We could have this if we fix bug 892371. > 4. SMS moves to background and may be killed > 5. An image is picked, post result to SMS and launch it if it has been > killed Re-starting SMS after it has been killed is impossible as there's no way to save and restore the last state the app was in. What we can hope is that sending the contacts app to the background is enough to have the SMS app and gallery alive during the activity.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 68

•

11 years ago

I will come out a daemon detecting thrashing for the reason Thinker commented at comment 65, but still I am not sure how should it handle the case when there're only foreground applications. Killing seems not a good idea, user may never finish her task.

Thinker Li [:sinker]

Reporter

Comment 69

•

11 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #67) > Re-starting SMS after it has been killed is impossible as there's no way to > save and restore the last state the app was in. What we can hope is that > sending the contacts app to the background is enough to have the SMS app and > gallery alive during the activity. I have raised a discussion of session restoring at dev-b2g mailing list. I am not sure how far it is from implementation, but it could solve the problem of restoring state of the app.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 70

•

11 years ago

Attached patch thrashdog.patch (obsolete) — Details — Splinter Review

Created a thrashing detector based on pswpin/pswpout of /proc/vmstat, it determines thrashing if the system is swapping in/out more than 2MB each for 3 seconds continuously (per experimental data). For now it kills only background application if thrashing is detected, which does not help for the case of web activities chain. It selects victim the same as LMK, maximum RSS and swap consumption with largest oom_adj.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 71

•

11 years ago

(In reply to Gabriele Svelto [:gsvelto] from comment #67) > Re-starting SMS after it has been killed is impossible as there's no way to > save and restore the last state the app was in. What we can hope is that > sending the contacts app to the background is enough to have the SMS app and > gallery alive during the activity. As web activities can be chained differently, guess it'd be difficult to handle by case. Note there're different test cases for chaining web activities (see bug 972669, comment 32), I am worried how to pass them.

Ting-Yu Chou [:ting] (away)

Assignee

Comment 72

•

11 years ago

Attached patch thrashdog.patch — Details — Splinter Review

Some minor updates and print not only selected process id but also its name before killing.

Attachment #8399295 - Attachment is obsolete: true

Naoki Hirata :nhirata (please use needinfo instead of cc)

Updated

•

11 years ago

status-b2g-v1.3T: --- → affected

ying.xu

Comment 73

•

11 years ago

I'm intend to modify watermark of memory zone in kernel, to advance the wakeup of kswapd and increase the work load of it. So the kswapd would start to work after free memory was lower than WMARK_LOW,and would not stop until free memroy was higher than WMARK_HIGH. I think it may reduce the thrashing, But I don't know how to test it. ffos@ffos2:~/yingxu/6821/kernel$ git diff HEAD~1 diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2500ec4..9f92ddc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5184,8 +5184,8 @@ void setup_per_zone_wmarks(void) zone->watermark[WMARK_MIN] = tmp; } - zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp >> 2); - zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1); + zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp); + zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp*2); setup_zone_migrate_reserve(zone); spin_unlock_irqrestore(&zone->lock, flags); }

thomas tsai

Comment 74

•

11 years ago

current LMK setting: music app in background can survive longer till 8172 KB. notify_trigger 14336 KB oom_adj min_free 0 1024 KB 1 2048 KB 2 4096 KB 6 8172 KB 8 16384 KB 10 18432 KB commit 3945e13d8165bc6e0a5d865c6f12ce63954d910c Author: ying.xu <ying.xu@spreadtrum.com> Date: Sat Apr 5 13:31:09 2014 +0800 Bug#245770 change the LMK parameters [self test ] kill process OK Change-Id: I04043c0911463652bd927be028eafe28a5cd243c

:Eli Perelman

Updated

•

11 years ago

Priority: P1 → P3

Bobby Chien

Updated

•

10 years ago

tracking-b2g: --- → -

Gabriele Svelto [:gsvelto]

Comment 75

•

10 years ago

I don't think we'll ever touch a Tarako again so I'm closing this. Please reopen if you think this is still somehow relevant.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → INCOMPLETE

http://goo.gl/fNPxny 11 years ago Ting-Yu Chou [:ting] (away) 20 bytes, application/x-url		Details
high_sys_load#1 11 years ago Ting-Yu Chou [:ting] (away) 184 bytes, text/html		Details
high_sys_load#2 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
high_system_load#1 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
high_system_load#2 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
high_system_load#3 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
kill-bg-apps.patch 11 years ago [:fabrice] Fabrice Desré 3.22 KB, patch		Details \| Diff \| Splinter Review
kill-bg-apps.patch 11 years ago [:fabrice] Fabrice Desré 4.32 KB, patch		Details \| Diff \| Splinter Review
bg-apps-bye.patch 11 years ago [:fabrice] Fabrice Desré 5.62 KB, patch		Details \| Diff \| Splinter Review
yaffs_readpage.txt.tar.gz 11 years ago Ting-Yu Chou [:ting] (away) 1.28 MB, application/gzip		Details
time_to_load 11 years ago Ting-Yu Chou [:ting] (away) 162 bytes, text/plain		Details
time_to_load 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
incoming_call_timer_skew_&_iowait 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
incoming_call_monitoring 11 years ago Ting-Yu Chou [:ting] (away) 154 bytes, text/html		Details
thrashdog.patch 11 years ago Ting-Yu Chou [:ting] (away) 5.97 KB, patch		Details \| Diff \| Splinter Review
thrashdog.patch 11 years ago Ting-Yu Chou [:ting] (away) 6.17 KB, patch		Details \| Diff \| Splinter Review