Closed
Bug 989572
Opened 10 years ago
Closed 10 years ago
[Tarako] LMK don't work fine, cause kernel out of memory and kill a lot of process and phone no respond
Categories
(Firefox OS Graveyard :: Performance, defect, P1)
Tracking
(blocking-b2g:1.3T+, b2g-v1.3T fixed)
| Flag | Tracking | Status |
|---|---|---|
| b2g-v1.3T | --- | fixed |
People
(Reporter: james.zhang, Assigned: ying.xu, NeedInfo)
References
Details
(Keywords: perf, Whiteboard: [c=memory p= s= u=tarako] [POVB][priority][wait for sprd kernel patch])
Attachments
(8 files)
See the kernel log in the attachment; the keyword is 'Out of memory: Kill process'.

294706/slog_20140327145600_sp6821a_userdebug/external_storage/last_log/20140327142655/kernel/kernel.log

03-27 14:47:16.130 <3>0[ 1181.686917] Out of memory: Kill process 81 (b2g) score 223 or sacrifice child
03-27 14:47:16.130 <3>0[ 1181.686985] Killed process 81 (b2g) total-vm:139004kB, anon-rss:0kB, file-rss:5632kB
03-27 14:46:23.700 <3>0[ 1169.324988] Out of memory: Kill process 1111 (Music) score 260 or sacrifice child
03-27 14:46:23.700 <3>0[ 1169.325055] Killed process 1111 (Music) total-vm:83808kB, anon-rss:528kB, file-rss:4544kB
Reporter
Updated•10 years ago
blocking-b2g: --- → 1.3T?
Flags: needinfo?(ttsai)
Flags: needinfo?(styang)
Flags: needinfo?(kkuo)
Flags: needinfo?(gal)
Reporter
Updated•10 years ago
Flags: needinfo?(fabrice)
Comment 1•10 years ago
Hi Thinker, could someone on your team take a look at this? Thanks -- Keven
Flags: needinfo?(ttsai)
Flags: needinfo?(tlee)
Flags: needinfo?(kkuo)
Comment 2•10 years ago
From comment 0, it seems some process is using too much memory and zRAM is working so hard that memory is exhausted. How can I read the attached log? Its format is unknown to me. Do you have STR?
Flags: needinfo?(tlee)
Comment 3•10 years ago
Could you provide the output of procrank?
Reporter
Comment 4•10 years ago
(In reply to Thinker Li [:sinker] from comment #2)
> By comment 0, it seems like some process use too much memory and zRAM work
> too hard to eat out of memory. How could I read the attached log? It is in
> an unknown format for me. Do you have STR?

Use a text editor or vim to read this log; see
294706/slog_20140327145600_sp6821a_userdebug/external_storage/last_log/20140327142655/kernel/kernel.log

There are a lot of STR; please register and access our Bugzilla to check:
http://bugzilla.spreadtrum.com/bugzilla/show_bug.cgi?id=294616
Updated•10 years ago
Flags: needinfo?(fabrice)
Comment 5•10 years ago
I have seen similar effects before, where one process is allocating a large chunk of memory and isn't returning to the event loop quickly enough to handle the incoming memory-pressure events.
Flags: needinfo?(gal)
Comment 6•10 years ago
Flags: needinfo?(styang)
Comment 7•10 years ago
(In reply to James Zhang from comment #4)
> (In reply to Thinker Li [:sinker] from comment #2)
> > By comment 0, it seems like some process use too much memory and zRAM work
> > too hard to eat out of memory. How could I read the attached log? It is in
> > an unknown format for me. Do you have STR?
>
> Use edit or vim to ready this log.
> see
> 294706/slog_20140327145600_sp6821a_userdebug/external_storage/last_log/
> 20140327142655/kernel/kernel.log
>
> A lot of STR, please register and access our bugzilla to check.
> http://bugzilla.spreadtrum.com/bugzilla/show_bug.cgi?id=294616

I'm not able to access that bug; please add me to it. Attached the kernel log above.
Updated•10 years ago
Flags: needinfo?(tlee)
Comment 9•10 years ago
Ting will look into this.
Comment 10•10 years ago
LMK did work:

03-27 14:39:12.426 <4>0[ 741.897543] select 430 (Music), adj 10, size 6577, to kill
03-27 14:39:12.426 <4>0[ 741.897562] send sigkill to 430 (Music), adj 10, size 6577
03-27 14:42:55.247 <4>0[ 964.722728] select 601 (Gallery), adj 10, size 5837, to kill
03-27 14:42:55.247 <4>0[ 964.722794] send sigkill to 601 (Gallery), adj 10, size 5837

and the very first message from the OOM killer is:

03-27 14:45:14.690 <4>0[ 1103.027669] engpcclient invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-16, oom_score_adj=-941

James, how should I reproduce this?
Flags: needinfo?(james.zhang)
Comment 11•10 years ago
James, sorry, I overlooked the STR at comment 4.
Comment 12•10 years ago
Slab unreclaimable occupies a lot of memory, growing to around 87 MB:

03-27 14:45:14.690 <4>0[ 1103.029564] ...slab_unreclaimable:41016kB
03-27 14:47:46.980 <4>0[ 1254.463877] ...slab_unreclaimable:89548kB

I inspected the number when (a) b2g has just started, and (b) several applications are running; it wasn't really affected by zram:

a)
b2g 1109 1 14.9 0 34.7 38.8 46.6 145.8 0 root
(Nuwa) 1167 1109 0.8 0 3.7 7.2 15.1 51.3 0 root
Homescreen 1184 1167 2.8 1 10.7 15.8 26.1 68.4 2 app_1184
(Preallocated a 1265 1167 0.6 18 4.7 8.6 17.5 60.2 1 root

root@android:/ # cat /sys/block/zram0/mem_used_total
2420736
root@android:/ # cat /proc/meminfo
SUnreclaim: 6560 kB

b)
b2g 83 1 48.4 0 22.1 23.6 30.3 153.6 0 root
(Nuwa) 342 83 1.2 0 0.0 0.9 5.2 51.3 0 root
Homescreen 372 342 5.6 18 0.6 1.9 9.1 65.5 8 app_372
Music 686 342 10.0 7 2.6 4.0 11.2 68.9 6 app_686
Gallery 724 342 10.1 18 0.5 1.8 8.9 66.5 11 app_724
Settings 833 342 5.8 18 0.6 1.9 9.1 65.4 10 app_833
Communications 944 342 12.6 1 13.8 15.2 22.4 76.1 2 app_944
(Preallocated a 996 342 0.4 18 0.9 1.3 4.0 54.2 1 root

root@android:/ # cat /sys/block/zram0/mem_used_total
22597632
root@android:/ # cat /proc/meminfo
SUnreclaim: 6616 kB
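For reference, the SUnreclaim figures quoted above come straight from /proc/meminfo. A minimal sketch of pulling that field out with awk; the meminfo text below is a hypothetical sample for illustration, not output captured from the Tarako device:

```shell
# Extract the SUnreclaim value (in kB) from meminfo-style text.
# On a device this text would come from: adb shell cat /proc/meminfo
meminfo='MemTotal:         115416 kB
MemFree:            3228 kB
SUnreclaim:         6560 kB'

sunreclaim_kb=$(printf '%s\n' "$meminfo" | awk '/^SUnreclaim:/ {print $2}')
echo "SUnreclaim: ${sunreclaim_kb} kB"   # prints: SUnreclaim: 6560 kB
```

Sampling this value before and after opening applications, as done in (a) and (b) above, is what shows whether unreclaimable slab tracks zram usage or grows independently.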
Reporter
Comment 13•10 years ago
Is the kernel leaking memory? Ying, please check it with our kernel team.
Flags: needinfo?(james.zhang) → needinfo?(ying.xu)
Comment 14•10 years ago
Ting, zRAM usage jumped from 2.4 MB to 22.5 MB. Did you misread or mispaste the numbers?
Comment 15•10 years ago
I copied that directly from the console, and I think it does make sense, as there were several applications running.
Assignee
Comment 16•10 years ago
You should look at the last_msg log. In the kernel log you can find slab_unreclaimable:43348kB, and it keeps growing. I suspect there is a memory leak in the kernel. I will enable the kernel memory leak detector and run a monkey test, hoping to reproduce this bug.
Comment 17•10 years ago
Triage: 1.3T+, tracking the kernel memory leak bug. [POVB] Ying.xu, assigning to you for now as you seem to be working on it. Thanks.
Assignee: nobody → ying.xu
blocking-b2g: 1.3T? → 1.3T+
Whiteboard: [POVB]
Updated•10 years ago
Whiteboard: [POVB] → [POVB][priority]
Updated•10 years ago
Severity: critical → blocker
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P1
Whiteboard: [POVB][priority] → [c=memory p= s= u=tarako] [POVB][priority]
Reporter
Comment 18•10 years ago
With Ying's patch, we can't reproduce this issue now. Have Mozilla folks seen this issue again?

commit b23e47c77ecbca81b87edf58a7b358871f21e1d0
Author: ying.xu <ying.xu@spreadtrum.com>
Date: Sat Mar 29 17:20:42 2014 +0800

    Bug#294616 increase proc/sys/vm/min_free_kbytes to 4096 KBytes
    [self test ] pass
    Change-Id: Ib7b2bdb52481982c86d63c46bc408fd37f4c567e

diff --git a/sp6821a_gonk/init.rc b/sp6821a_gonk/init.rc
index 65c315a..5125f64 100644
--- a/sp6821a_gonk/init.rc
+++ b/sp6821a_gonk/init.rc
@@ -244,6 +244,7 @@ on boot
     # level system server to be able to adjust the kernel OOM driver
     # parameters to match how it is managing things.
     write /proc/sys/vm/overcommit_memory 1
+    write /proc/sys/vm/min_free_kbytes 4096
     write /proc/sys/vm/min_free_order_shift 4
     chown root system /sys/module/lowmemorykiller/parameters/adj
     chmod 0664 /sys/module/lowmemorykiller/parameters/adj
Reporter
Updated•10 years ago
Flags: needinfo?(tlee)
Flags: needinfo?(tchou)
Comment 19•10 years ago
I tried only 4 or 5 times and have never reproduced it. BTW, I am not sure whether Ying's patch is relevant to the slab continually growing.
Flags: needinfo?(tchou)
Reporter
Comment 20•10 years ago
(In reply to Ting-Yu Chou [:ting] from comment #19)
> I tried only 4 or 5 times, and have never reproduced it. BTW, I am not sure
> is Ying's patch relevant for slab continually growing.

But we are seeing a zram leak now, and it is causing a performance issue.
Updated•10 years ago
Flags: needinfo?(tlee)
Comment 21•10 years ago
(In reply to James Zhang from comment #20)
> But we meet zram leak now. It cause performance issue now.

Is there a bug filed for this "zram leak"? From comment 12, even though zram's memory use grows from 2.4 MB to 22.5 MB, slab unreclaimable stays around 6.4 MB.
Reporter
Comment 22•10 years ago
This bug still exists with Ying's min_free patch; see
http://bugzilla.spreadtrum.com/bugzilla/show_bug.cgi?id=297942
It now reproduces 100% of the time.
Assignee
Comment 23•10 years ago
(In reply to Ting-Yu Chou [:ting] from comment #10)
> LMK did work:
>
> 03-27 14:39:12.426 <4>0[ 741.897543] select 430 (Music), adj 10, size 6577, to kill
> 03-27 14:39:12.426 <4>0[ 741.897562] send sigkill to 430 (Music), adj 10, size 6577
> 03-27 14:42:55.247 <4>0[ 964.722728] select 601 (Gallery), adj 10, size 5837, to kill
> 03-27 14:42:55.247 <4>0[ 964.722794] send sigkill to 601 (Gallery), adj 10, size 5837
>
> and the very first message from OOM killer is:
>
> 03-27 14:45:14.690 <4>0[ 1103.027669] engpcclient invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=-16, oom_score_adj=-941
>
> James, how should I reproduce this?

We don't have STR, but it did happen many times, through various operations.
Flags: needinfo?(ying.xu)
Reporter
Comment 24•10 years ago
Assignee
Comment 25•10 years ago
(In reply to Ting-Yu Chou [:ting] from comment #19)
> I tried only 4 or 5 times, and have never reproduced it. BTW, I am not sure
> is Ying's patch relevant for slab continually growing.

No, it should have no relation to the value of slab_unreclaimable. The value of slab_unreclaimable was too high, and since most processes were killed by the OOM killer, it should have decreased. It did not, so I suspect there may be a memory leak in the kernel.
Assignee
Comment 26•10 years ago
After landing these two patches in the kernel, I think LMK should work properly.

commit 62841ea1775f84d5b0569ccb45c166944e3f5c90
Author: ying.xu <ying.xu@spreadtrum.com>
Date: Sat Apr 5 16:17:15 2014 +0800

    Bug#245770 add swap check condition of low memory notify function
    [self test ] test ok
    Change-Id: I1d90c3bc508e6fa31773b7ad2ebb02e297237ec2

commit dfca13faa3d0c12cea2115c0007e4f443b91954c
Author: ying.xu <ying.xu@spreadtrum.com>
Date: Sat Apr 5 13:09:14 2014 +0800

    Bug#245770 add swap free memory check function
    [self test ] kernel run ok, kill process ok
    Change-Id: I06b8ca31e323ccdf4e5face4a882fc575267ee3e
Reporter
Comment 27•10 years ago
While running the monkey test, I used "adb shell vmstat" to monitor memory status. It has run for about 12 hours, and I have not found a memory leak.
Flags: needinfo?(ttsai)
Comment 28•10 years ago
Current LMK settings; a Music app in the background can now survive longer, until free memory drops to 8172 KB.

notify_trigger 14336 KB

oom_adj  min_free
0        1024 KB
1        2048 KB
2        4096 KB
6        8172 KB
8        16384 KB
10       18432 KB

commit 3945e13d8165bc6e0a5d865c6f12ce63954d910c
Author: ying.xu <ying.xu@spreadtrum.com>
Date: Sat Apr 5 13:31:09 2014 +0800

    Bug#245770 change the LMK parameters
    [self test ] kill process OK
    Change-Id: I04043c0911463652bd927be028eafe28a5cd243c
Flags: needinfo?(ttsai)
Comment 29•10 years ago
Updating with something abnormal in the kernel log:

03-27 14:46:23.700 <3>0[ 1169.324988] Out of memory: Kill process 1111 (Music) score 260 or sacrifice child
03-27 14:46:23.700 <3>0[ 1169.325055] Killed process 1111 (Music) total-vm:83808kB, anon-rss:528kB, file-rss:4544kB
03-27 14:46:23.700 <4>0[ 1170.341033] slog: page allocation failure: order:0, mode:0x20058
...
03-27 14:46:23.702 <4>0[ 1170.343473] Normal: 4*4kB 26*8kB 33*16kB 15*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1360kB

Slog needed to allocate 4 kB (order 0), and there was enough memory: 4 free 4 kB pages and 1360 kB free in total. This looks abnormal, and it then caused the OOM killer to kill processes.

In this kernel log we found several memory allocation failures while memory was still available:

1.
03-27 14:45:31.470 <4>0[ 1120.033113] sync_supers: page allocation failure: order:0, mode:0x20058
...
03-27 14:45:31.470 <4>0[ 1120.034977] Normal: 0*4kB 36*8kB 31*16kB 14*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1360kB

2.
03-27 14:45:31.470 <4>0[ 1120.773613] sync_supers: page allocation failure: order:0, mode:0x20058
...
03-27 14:45:39.043 <4>0[ 1120.775490] Normal: 1*4kB 33*8kB 31*16kB 14*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1340kB

3.
03-27 14:46:23.700 <4>0[ 1170.341033] slog: page allocation failure: order:0, mode:0x20058
...
03-27 14:46:23.702 <4>0[ 1170.343473] Normal: 4*4kB 26*8kB 33*16kB 15*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1360kB

Hi James, please have a team member check whether there is a leak or a memory allocation issue here.
Flags: needinfo?(james.zhang)
Comment 30•10 years ago
This is interesting:

cervantes@cyu-dt:/tmp/294706/slog_20140327145600_sp6821a_userdebug/external_storage/last_log/20140327142655/kernel$ grep "slab pages" kernel.log
03-27 14:45:14.690 <4>0[ 1103.031831] 10543 slab pages
03-27 14:45:31.470 <4>0[ 1120.036695] 10541 slab pages
03-27 14:45:39.043 <4>0[ 1120.777375] 10587 slab pages
03-27 14:46:21.656 <4>0[ 1169.322580] 12717 slab pages
03-27 14:46:23.702 <4>0[ 1170.345162] 12731 slab pages
03-27 14:47:11.158 <4>0[ 1181.377434] 14990 slab pages
03-27 14:47:36.880 <4>0[ 1245.634627] 22606 slab pages
03-27 14:47:39.590 <4>0[ 1246.122733] 22639 slab pages
03-27 14:47:46.980 <4>0[ 1254.466034] 22633 slab pages
03-27 14:47:47.022 <4>0[ 1254.868664] 22693 slab pages
03-27 14:47:56.055 <4>0[ 1264.484709] 22733 slab pages
03-27 14:48:07.690 <4>0[ 1271.103246] 22691 slab pages
03-27 14:48:11.293 <4>0[ 1271.310682] 22640 slab pages
03-27 14:48:12.792 <4>0[ 1271.933731] 22712 slab pages
03-27 14:48:13.248 <4>0[ 1280.293407] 22720 slab pages
03-27 14:48:13.251 <4>0[ 1280.383515] 22777 slab pages
03-27 14:48:13.255 <4>0[ 1280.533541] 22791 slab pages
03-27 14:48:15.810 <4>0[ 1284.782953] 22808 slab pages
03-27 14:48:17.804 <4>0[ 1284.873316] 22815 slab pages
03-27 14:48:28.594 <4>0[ 1297.772646] 22786 slab pages
03-27 14:48:35.397 <4>0[ 1300.436390] 22869 slab pages
03-27 14:48:35.899 <4>0[ 1300.445163] 22869 slab pages
03-27 14:48:48.471 <4>0[ 1317.715951] 22988 slab pages

Obviously LMK isn't enough, and even though we have the OOM killer, the slab page count keeps growing.
By the end of this log the OOM killer continues to kill other processes even though all b2g processes are dead:

03-27 14:48:48.471 <4>0[ 1317.714482] Total swap = 65532kB
03-27 14:48:48.471 <4>0[ 1317.715860] 28854 pages of RAM
03-27 14:48:48.471 <4>0[ 1317.715894] 807 free pages
03-27 14:48:48.471 <4>0[ 1317.715921] 1589 reserved pages
03-27 14:48:48.471 <4>0[ 1317.715951] 22988 slab pages
03-27 14:48:48.471 <4>0[ 1317.715978] 113 pages shared
03-27 14:48:48.471 <4>0[ 1317.716005] 0 pages swap cached
03-27 14:48:48.471 <6>0[ 1317.716035] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
03-27 14:48:48.471 <6>0[ 1317.716110] [ 167] 1000 167 2089 0 0 -16 -941 engpcclient
03-27 14:48:48.471 <6>0[ 1317.716178] [ 169] 1000 169 464 0 0 -16 -941 engservice
03-27 14:48:48.471 <6>0[ 1317.716244] [ 312] 0 312 2856 0 0 -16 -941 slog
03-27 14:48:48.471 <6>0[ 1317.716307] [ 399] 2000 399 818 0 0 -16 -941 tcp
03-27 14:48:48.471 <6>0[ 1317.716369] [ 401] 0 401 2856 0 0 -16 -941 slog
03-27 14:48:48.471 <6>0[ 1317.716433] [ 1379] 0 1379 192 12 0 -16 -941 oom-msg-logger
03-27 14:48:48.471 <6>0[ 1317.716502] [ 1508] 0 1508 1453 0 0 -16 -941 b2g
03-27 14:48:48.471 <6>0[ 1317.716564] [ 1510] 1001 1510 1647 1 0 -16 -941 rild_sp
03-27 14:48:48.471 <6>0[ 1317.716629] [ 1513] 1001 1513 1647 1 0 -16 -941 rild_sp
03-27 14:48:48.471 <6>0[ 1317.716693] [ 1526] 0 1526 75 23 0 -16 -941 ueventd
03-27 14:48:48.471 <6>0[ 1317.716758] [ 1527] 1013 1527 382 0 0 -16 -941 mediaserver
03-27 14:48:48.471 <6>0[ 1317.716825] [ 1528] 1000 1528 207 1 0 -16 -941 servicemanager
03-27 14:48:48.471 <6>0[ 1317.716893] [ 1531] 0 1531 469 0 0 -16 -941 vold
03-27 14:48:48.471 <6>0[ 1317.716956] [ 1534] 1001 1534 205 1 0 -16 -941 rilproxy
03-27 14:48:48.471 <6>0[ 1317.717021] [ 1535] 0 1535 511 0 0 -16 -941 netd
03-27 14:48:48.473 173 0 0 -16 -941 debuggerd
03-27 14:48:48.473 <6>0[ 1317.717148] [ 1537] 1002 1537 329 0 0 -16 -941 dbus-daemon
03-27 14:48:48.473 <6>0[ 1317.717215] [ 1538] 1017 1538 432 0 0 -16 -941 keystore
03-27 14:48:48.473 <6>0[ 1317.717280] [ 1539] 1001 1539 205 1 0 -16 -941 rilproxy
03-27 14:48:48.473 <6>0[ 1317.717345] [ 1540] 0 1540 228 0 0 -16 -941 engmoded
03-27 14:48:48.473 <6>0[ 1317.717410] [ 1543] 0 1543 858 28 0 -16 -941 adbd
03-27 14:48:48.473 <6>0[ 1317.717472] [ 1547] 0 1547 176 0 0 -16 -941 sh
03-27 14:48:48.473 <3>0[ 1317.717531] Out of memory: Kill process 167 (engpcclient) score 1 or sacrifice child

This looks like a kernel leak.
Summary: [Tarako] gecko LMK don't work fine, cause kernel out of memory and kill a lot of process and phone no respond → [Tarako] LMK don't work fine, cause kernel out of memory and kill a lot of process and phone no respond
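The slab-page trend in comment 30 can be pulled out of a kernel log mechanically. A small sketch, run here against a three-line inline excerpt standing in for kernel.log (lines borrowed from the grep output above):

```shell
# First/last "slab pages" counts and their delta, from kernel.log-style text.
log='03-27 14:45:14.690 <4>0[ 1103.031831] 10543 slab pages
03-27 14:46:21.656 <4>0[ 1169.322580] 12717 slab pages
03-27 14:48:48.471 <4>0[ 1317.715951] 22988 slab pages'

# The count is the third-from-last whitespace field on each matching line.
first=$(printf '%s\n' "$log" | awk '/slab pages/ {print $(NF-2); exit}')
last=$(printf '%s\n' "$log" | awk '/slab pages/ {n=$(NF-2)} END {print n}')
echo "slab pages grew from $first to $last (delta $((last - first)))"
```

Against the full log this makes the leak's magnitude (roughly 12000 pages, about 48 MB at 4 kB/page, in a few minutes) easy to quote in a report.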
Comment 31•10 years ago
James, please make sure this is still reproducible, and check if it is a kernel leak.
Flags: needinfo?(james.zhang)
Comment 32•10 years ago
Do you think it is OK to close this bug? You could open another bug if, after confirming the leak is fixed, something is still wrong.
Flags: needinfo?(james.zhang)
Comment 33•10 years ago
In the kernel log from attachment 8398866 [details]:
- slab pages increase significantly, as described in comment 30

In the kernel log from attachment 8402136 [details]:
- slab pages stay stable at around 18xx

Ying, can you confirm whether this was fixed by the patches in comment 26, and whether there is still a memory leak in the kernel?
Flags: needinfo?(ying.xu)
Comment 34•10 years ago
From another point of view: LMK vs. OOM.

In the kernel log from attachment 8398866 [details]:
- OOM kills a lot of processes, including vold

In the kernel log from attachment 8402136 [details]:
- OOM kills b2g only twice

When b2g was killed, there were still some processes running, such as Homescreen, Marketplace, Settings, etc. If LMK were working properly, then when free memory plus cache drops below min_free, some processes should be killed and OOM should not be triggered.

Ying, can you help check whether there is any error in LMK and why OOM is still triggered?
Comment 35•10 years ago
(In reply to Kai-Zhen Li from comment #34)
> From another point of veiw, LMK vs OOM.
>
> In kernel log from attachment 8398866 [details],
> -OOM kill a lot of process include vold
>
> In kernel log from attachment 8402136 [details],
> -OOM kill b2g only twice.
>
> When b2g's being killed, there are still some processes under running, such
> as Homescreen, Marketplace and setting etc.
> Suppose if LMK is working properly, when free memory and cache is lower than
> min_free, some processes should be killed and OOM will not be triggered.
>
> Ying, can you help to check if there is any error in LMK and why OOM is
> still triggered?

Ying, OOM was triggered when free memory plus cache was lower than min_free. Could we set the value back to the default to see whether the issue still happens?

Author: ying.xu <ying.xu@spreadtrum.com>
Date: Sat Mar 29 17:20:42 2014 +0800

    Bug#294616 increase proc/sys/vm/min_free_kbytes to 4096 KBytes
    [self test ] pass
    Change-Id: Ib7b2bdb52481982c86d63c46bc408fd37f4c567e
Assignee
Comment 36•10 years ago
Yes. We're going to revert these settings (vm, oom_adj, KillUnderKB) to their initial values (as at MWC), then proceed by trial and error.
Flags: needinfo?(ying.xu)
Comment 37•10 years ago
James, as discussed today, please have a team member check LMK in the kernel to see whether any policy or algorithm needs refinement. Thanks.
Comment 38•10 years ago
Ying, I hacked the kernel OOM path to do nothing and return when out_of_memory is called, then ran a monkey test for 8 hours. I found that b2g and Nuwa were not killed by LMK. This is the b2g-info output at the end of the test (USS/PSS/RSS/VSIZE in megabytes):

#b2g-info
NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER
b2g 83 1 15191.6 0 23.1 27.3 33.8 172.1 0 root
(Nuwa) 356 83 254.4 0 0.2 1.2 3.4 77.6 0 root
Communications 9976 356 3188.5 0 4.6 8.3 14.1 109.3 1 app_9976
(Preallocated a 20750 356 0.9 18 5.1 7.3 11.6 80.6 2 root
Reporter
Comment 39•10 years ago
(In reply to Kai-Zhen Li from comment #38)
> Created attachment 8403745 [details]
> kernel.log
>
> Ying, I had a hack in kernel OOM, do nothing and return when out_of_memory
> is called and then I run monkey test for 8 hours. I found that b2g and nuwa
> will not be killed by LMK. This is b2g-info at the end of the test.
>
> #b2g-info
> NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER
> b2g 83 1 15191.6 0 23.1 27.3 33.8 172.1 0 root
> (Nuwa) 356 83 254.4 0 0.2 1.2 3.4 77.6 0 root
> Communications 9976 356 3188.5 0 4.6 8.3 14.1 109.3 1 app_9976
> (Preallocated a 20750 356 0.9 18 5.1 7.3 11.6 80.6 2 root

Can you try image.baidu.com with a userdebug build? It always causes OOM.
Flags: needinfo?(james.zhang)
Reporter
Comment 40•10 years ago
We have found a small memory leak in the kernel, but it's only 2~3 KB. We'll fix it as soon as possible.
Comment 41•10 years ago
(In reply to James Zhang from comment #39)
> Can you try image.baidu.com with userdebug build?
> It always cause OOM.

Browsing image.baidu.com with the 4/8 build, we did not see b2g killed by OOM.

The current settings are as follows:

1. adb shell cat /proc/sys/vm/min_free_kbytes
   1352

2. Low-memory killer parameters:

notify_trigger 14336 KB

oom_adj  min_free
0        4096 KB
1        5120 KB
2        6144 KB
6        7168 KB
8        8192 KB
10       20480 KB
Comment 42•10 years ago
(In reply to Danny Liang [:dliang] from comment #41)
> Browsing on image.baidu.com with build on 4/8, we don't meet b2g was killed
> by OOM.
>
> The current settings as below:
> 1. adb shell cat /proc/sys/vm/min_free_kbytes
> 1352

I think it is because 1352 is much lower than 4096, so the kernel's OOM killer is hard to trigger.
Comment 43•10 years ago
Should the current LMK settings be kept, but OOM lowered to less than 1024 KB?

pref("hal.processPriorityManager.gonk.MASTER.KillUnderKB", 1024);
pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderKB", 2048);
pref("hal.processPriorityManager.gonk.FOREGROUND.KillUnderKB", 4096);
pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.KillUnderKB", 8172);
pref("hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.KillUnderKB", 16384);
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 18432);
Reporter
Comment 44•10 years ago
(In reply to thomas tsai from comment #43)
> Should current LMK setting be kept, but lower OOM to less than 1024K?
> pref("hal.processPriorityManager.gonk.MASTER.KillUnderKB", 1024);
> pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderKB", 2048);
> pref("hal.processPriorityManager.gonk.FOREGROUND.KillUnderKB", 4096);
> pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.KillUnderKB", 8172);
> pref("hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.KillUnderKB", 16384);
> pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 18432);

These are the values Mozilla provided before; we have reverted all the changes. And we won't merge any LMK value change without a lot of test cycles.
Flags: needinfo?(ying.xu)
Assignee
Comment 45•10 years ago
(In reply to thomas tsai from comment #43)
> Should current LMK setting be kept, but lower OOM to less than 1024K?
> pref("hal.processPriorityManager.gonk.MASTER.KillUnderKB", 1024);
> pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderKB", 2048);
> pref("hal.processPriorityManager.gonk.FOREGROUND.KillUnderKB", 4096);

I think we should not lower the OOM trigger value in the kernel; currently min_free_kbytes is 1352 KB, which is a default set by the kernel. We can instead increase the KillUnderKB values, to at least greater than the OOM trigger.

We want to keep these values:

pref("hal.processPriorityManager.gonk.BACKGROUND_PERCEIVABLE.KillUnderKB", 8172);
pref("hal.processPriorityManager.gonk.BACKGROUND_HOMESCREEN.KillUnderKB", 16384);
pref("hal.processPriorityManager.gonk.BACKGROUND.KillUnderKB", 18432);

And update these values:

pref("hal.processPriorityManager.gonk.MASTER.KillUnderKB", 1024);
pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderKB", 2048);
pref("hal.processPriorityManager.gonk.FOREGROUND.KillUnderKB", 4096);

to:

pref("hal.processPriorityManager.gonk.MASTER.KillUnderKB", 4096);
pref("hal.processPriorityManager.gonk.FOREGROUND_HIGH.KillUnderKB", 5120);
pref("hal.processPriorityManager.gonk.FOREGROUND.KillUnderKB", 6144);
Flags: needinfo?(ying.xu)
Comment 46•10 years ago
Nuwa was still killed by OOM; we saw it twice in the monkey test.

1. Nuwa was killed by OOM but the Camera application (oom_adj=2) was still alive.
2. Nuwa was killed by OOM but Camera (oom_adj=2) and Communications (oom_adj=1) were still alive.

In these two cases LMK didn't act because free+cache was still higher than the threshold, but OOM was triggered because free memory was only 1360 KB.

Low-memory killer parameters:

notify_trigger 14336 KB

oom_adj  min_free
0        4096 KB
1        5120 KB
2        6144 KB
6        7168 KB
8        8192 KB
10       20480 KB

Moreover, with this LMK setting, the homescreen (oom_adj=8) in the background won't be killed by LMK, because the threshold for oom_adj=8 is 8192 KB, which is hard to reach.
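The mismatch described above — LMK comparing free+cache against its per-adj min_free while the allocator's OOM path reacts to raw free memory — can be sketched as two independent checks. The numbers are illustrative, taken loosely from this comment (free 1360 KB, LMK threshold 6144 KB for oom_adj=2, min_free_kbytes 4096 KB); the cache figure is a hypothetical stand-in:

```shell
# Two different low-memory checks that can disagree on the same system state.
free_kb=1360           # raw free memory (from the OOM report above)
cache_kb=20000         # file cache (hypothetical figure for illustration)
lmk_min_free_kb=6144   # LMK min_free for oom_adj=2
watermark_kb=4096      # kernel min_free_kbytes watermark

# LMK considers free + cache, so with a large cache it stays quiet...
if [ $((free_kb + cache_kb)) -lt "$lmk_min_free_kb" ]; then
    lmk="LMK would kill"
else
    lmk="LMK stays quiet"
fi

# ...while the allocator sees raw free memory below the watermark and can OOM.
if [ "$free_kb" -lt "$watermark_kb" ]; then
    oom="OOM path can trigger"
else
    oom="OOM path stays quiet"
fi

echo "$lmk; $oom"   # prints: LMK stays quiet; OOM path can trigger
```

When the cache is full of pages that cannot be reclaimed fast enough, the two checks diverge exactly as in the two Nuwa kills reported here.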
Comment 47•10 years ago
Comment 48•10 years ago
Comment on attachment 8405281 [details]
mtlog_monkey_0411.tar.gz
Attached kernel log and memory information.
See Also: → 995860
Comment 49•10 years ago
I ran a monkey test overnight with the following LMK settings, and b2g was not killed during the test:

---
oom_adj  min_free
0        1024 KB
1        2048 KB
2        4096 KB
6        6144 KB
8        16384 KB
10       20480 KB
---

But with this set of values, background processes will be killed earlier. I think it would be better to also run another test where min_free is 10240 KB for oom_adj=8. BTW, I also think /proc/sys/vm/min_free_kbytes should be less than the min_free of oom_adj=0.
Assignee
Comment 50•10 years ago
Updated the log. I find the oom-killer was invoked mostly by userspace processes, while LMK was invoked mostly by kswapd and only sometimes by userspace processes. If LMK could be invoked when a userspace process allocates memory, the problem would be gone.
Assignee
Comment 51•10 years ago
/20140418130229/kernel$ grep -ir 'invoked lowmemorykiller' kernel.log -c
77
/20140418130229/kernel$ grep -ir 'kswapd0 invoked lowmemorykiller' kernel.log -c
65

Here, the oom-killer should happen:

/20140418130229/kernel$ grep -ir 'invoked oom-killer' kernel.log -c
59

Now there are 2 suspicions:
1. LMK was not called by memory_shrink at all;
2. LMK was called by memory_shrink but did not meet the condition to kill a process.

I will add some logging to check this.
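The counts above are plain grep -c runs, and the same bookkeeping can be reproduced against any excerpt. A sketch over a three-line inline stand-in for kernel.log (the counts printed are for this sample, not the real log):

```shell
# Count LMK vs OOM-killer invocations in kernel.log-style text.
log='kswapd0 invoked lowmemorykiller: gfp_mask=0xd0, oom_adj=0
Compositor invoked lowmemorykiller: gfp_mask=0x286d2, oom_adj=0
Compositor invoked oom-killer: gfp_mask=0x286d2, order=0'

lmk=$(printf '%s\n' "$log" | grep -c 'invoked lowmemorykiller')
kswapd=$(printf '%s\n' "$log" | grep -c 'kswapd0 invoked lowmemorykiller')
oom=$(printf '%s\n' "$log" | grep -c 'invoked oom-killer')
echo "LMK: $lmk (from kswapd: $kswapd), OOM: $oom"
```

Subtracting the kswapd count from the LMK total (77 - 65 = 12 in the real log) gives the handful of LMK invocations that came from a userspace allocation path, which is what suspicion 1 and 2 are probing.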
Assignee
Comment 52•10 years ago
04-18 22:07:55.506 <4>0[ 3933.657617] kswapd0 invoked lowmemorykiller: gfp_mask=0xd0, oom_adj=0, oom_score_adj=0
04-18 22:07:55.506 <4>0[ 3933.657678] Backtrace:
04-18 22:07:55.506 <4>0[ 3933.657745] [<c4537ad8>] (dump_backtrace+0x0/0x110) from [<c4889f0c>] (dump_stack+0x18/0x1c)
04-18 22:07:55.506 <4>0[ 3933.657813] r7:c70f3f78 r6:000000d0 r5:c15c57a0 r4:c70f2000
04-18 22:07:55.506 <4>0[ 3933.657901] [<c4889ef4>] (dump_stack+0x0/0x1c) from [<c4779804>] (lowmem_shrink+0x460/0x528)
04-18 22:07:55.506 <4>0[ 3933.657978] [<c47793a4>] (lowmem_shrink+0x0/0x528) from [<c45a45c8>] (shrink_slab+0x114/0x1c0)
04-18 22:07:55.506 <4>0[ 3933.658051] [<c45a44b4>] (shrink_slab+0x0/0x1c0) from [<c45a4bfc>] (kswapd+0x588/0x924)
04-18 22:07:55.506 <4>0[ 3933.658122] [<c45a4674>] (kswapd+0x0/0x924) from [<c456b7d0>] (kthread+0x8c/0x94)
04-18 22:07:55.506 <4>0[ 3933.658192] [<c456b744>] (kthread+0x0/0x94) from [<c45547a4>] (do_exit+0x0/0x5fc)

04-18 22:43:39.183 <4>0[ 6077.330405] Compositor invoked lowmemorykiller: gfp_mask=0x286d2, oom_adj=0, oom_score_adj=0
04-18 22:43:39.183 <4>0[ 6077.330417] Backtrace:
04-18 22:43:39.183 <4>0[ 6077.330456] [<c4537ad8>] (dump_backtrace+0x0/0x110) from [<c4889f0c>] (dump_stack+0x18/0x1c)
04-18 22:43:39.183 <4>0[ 6077.330472] r7:c6281d80 r6:000286d2 r5:c59ef440 r4:c6280000
04-18 22:43:39.183 <4>0[ 6077.330510] [<c4889ef4>] (dump_stack+0x0/0x1c) from [<c4779804>] (lowmem_shrink+0x460/0x528)
04-18 22:43:39.183 <4>0[ 6077.330533] [<c47793a4>] (lowmem_shrink+0x0/0x528) from [<c45a45c8>] (shrink_slab+0x114/0x1c0)
04-18 22:43:39.183 <4>0[ 6077.330554] [<c45a44b4>] (shrink_slab+0x0/0x1c0) from [<c45a5180>] (try_to_free_pages+0x1e8/0x368)
04-18 22:43:39.183 <4>0[ 6077.330581] [<c45a4f98>] (try_to_free_pages+0x0/0x368) from [<c459ce98>] (__alloc_pages_nodemask+0x368/0x598)
04-18 22:43:39.183 <4>0[ 6077.330631] [<c459cb30>] (__alloc_pages_nodemask+0x0/0x598) from [<bf003ed4>] (os_allocate+0x11c/0x38c [ump])
04-18 22:43:39.183 <4>0[ 6077.330677] [<bf003db8>] (os_allocate+0x0/0x38c [ump]) from [<bf002b08>] (_ump_ukk_allocate+0x1f4/0x380 [ump])
04-18 22:43:39.183 <4>0[ 6077.330727] [<bf002914>] (_ump_ukk_allocate+0x0/0x380 [ump]) from [<bf0057cc>] (ump_secure_id_from_phys_blocks_wrapper+0x308/0x450 [ump])
04-18 22:43:39.183 <4>0[ 6077.330807] [<bf0056e0>] (ump_secure_id_from_phys_blocks_wrapper+0x21c/0x450 [ump]) from [<bf003754>] (ump_file_ioctl+0x11c/0x238 [ump])
04-18 22:43:39.183 <4>0[ 6077.330824] r8:c4534244 r7:0000000c r6:c0109002 r5:45bff908 r4:45bff908
04-18 22:43:39.183 <4>0[ 6077.330871] [<bf003638>] (ump_file_ioctl+0x0/0x238 [ump]) from [<c45d7eb0>] (do_vfs_ioctl+0x540/0x5c4)
Assignee
Comment 53•10 years ago
04-18 22:49:10.445 <4>0[ 6408.596004] Compositor invoked oom-killer: gfp_mask=0x286d2, order=0, oom_adj=0, oom_score_adj=0
04-18 22:49:10.445 <4>0[ 6408.596091] Backtrace:
04-18 22:49:10.445 <4>0[ 6408.596153] [<c4537ad8>] (dump_backtrace+0x0/0x110) from [<c4889f0c>] (dump_stack+0x18/0x1c)
04-18 22:49:10.445 <4>0[ 6408.596272] r7:000286d2 r6:000286d2 r5:00000000 r4:c6280000
04-18 22:49:10.445 <4>0[ 6408.596352] [<c4889ef4>] (dump_stack+0x0/0x1c) from [<c4599464>] (dump_header.clone.1+0x7c/0xac)
04-18 22:49:10.445 <4>0[ 6408.596428] [<c45993e8>] (dump_header.clone.1+0x0/0xac) from [<c45994d4>] (oom_kill_process.clone.0+0x40/0x26c)
04-18 22:49:10.445 <4>0[ 6408.596503] r6:00000000 r5:00000154 r4:c59ef7a0
04-18 22:49:10.445 <4>0[ 6408.596565] [<c4599494>] (oom_kill_process.clone.0+0x0/0x26c) from [<c45999b4>] (out_of_memory+0x2b4/0x3a0)
04-18 22:49:10.445 <4>0[ 6408.596649] [<c4599700>] (out_of_memory+0x0/0x3a0) from [<c459cfb4>] (__alloc_pages_nodemask+0x484/0x598)
04-18 22:49:10.445 <4>0[ 6408.596757] [<c459cb30>] (__alloc_pages_nodemask+0x0/0x598) from [<bf003ed4>] (os_allocate+0x11c/0x38c [ump])
04-18 22:49:10.445 <4>0[ 6408.596863] [<bf003db8>] (os_allocate+0x0/0x38c [ump]) from [<bf002b08>] (_ump_ukk_allocate+0x1f4/0x380 [ump])
04-18 22:49:10.445 <4>0[ 6408.596970] [<bf002914>] (_ump_ukk_allocate+0x0/0x380 [ump]) from [<bf0057cc>] (ump_secure_id_from_phys_blocks_wrapper+0x308/0x450 [ump])
04-18 22:49:10.445 <4>0[ 6408.597094] [<bf0056e0>] (ump_secure_id_from_phys_blocks_wrapper+0x21c/0x450 [ump]) from [<bf003754>] (ump_file_ioctl+0x11c/0x238 [ump])
04-18 22:49:10.445 <4>0[ 6408.597183] r8:c4534244 r7:0000000c r6:c0109002 r5:45bff908 r4:45bff908
04-18 22:49:10.445 <4>0[ 6408.597287] [<bf003638>] (ump_file_ioctl+0x0/0x238 [ump]) from [<c45d7eb0>] (do_vfs_ioctl+0x540/0x5c4)
Assignee
Comment 54•10 years ago
We should dig into the code of __alloc_pages_nodemask to find out what drives it down the two code branches, LMK versus the oom-killer.
Assignee
Comment 55•10 years ago
LMK was called before the oom-killer got involved, but it returned early because sc->nr_to_scan <= 0 at this check:

	if (sc->nr_to_scan <= 0 || min_adj == OOM_ADJUST_MAX + 1) {
		lowmem_print(5, "lowmem_shrink %lu, %x, return %d\n",
			     sc->nr_to_scan, sc->gfp_mask, rem);
		return rem;
	}

<4>0[ 144.806796] lowmem_shrink 0, 200da, return 15851
<4>0[ 144.806742] lowmem_shrink swapfree 5699, min_swapfree 22796, min_adj 16
<4>0[ 144.806796] lowmem_shrink 0, 200da, return 15851
<4>0[ 144.806860] lowmem_shrink swapfree 5699, min_swapfree 22796, min_adj 16
<4>0[ 144.806915] lowmem_shrink 0, 200da, return 15851
<4>0[ 144.806989] lowmem_shrink swapfree 5699, min_swapfree 22796, min_adj 16
<4>0[ 144.807044] lowmem_shrink 0, 200da, return 15851
<4>0[ 144.807166] lowmem_shrink swapfree 5699, min_swapfree 22796, min_adj 16
<4>0[ 144.807221] lowmem_shrink 0, 200da, return 15851
<4>0[ 144.807413] lowmem_shrink swapfree 5699, min_swapfree 22796, min_adj 16
<4>0[ 144.807469] lowmem_shrink 0, 200da, return 15851
<4>0[ 144.807825] (Preallocated a invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=1, oom_score_adj=67
<4>0[ 144.807897] Backtrace:
<4>0[ 144.807953] [<c4537ad8>] (dump_backtrace+0x0/0x110) from [<c4889f4c>] (dump_stack+0x18/0x1c)
<4>0[ 144.808019] r7:000200da r6:000200da r5:00000000 r4:c6f6e000
<4>0[ 144.808096] [<c4889f34>] (dump_stack+0x0/0x1c) from [<c4599464>] (dump_header.clone.1+0x7c/0xac)
<4>0[ 144.808173] [<c45993e8>] (dump_header.clone.1+0x0/0xac) from [<c45994d4>] (oom_kill_process.clone.0+0x40/0x26c)
<4>0[ 144.808247] r6:00000000 r5:00000290 r4:c2f40360
<4>0[ 144.808309] [<c4599494>] (oom_kill_process.clone.0+0x0/0x26c) from [<c45999b4>] (out_of_memory+0x2b4/0x3a0)
<4>0[ 144.808393] [<c4599700>] (out_of_memory+0x0/0x3a0) from [<c459cff4>] (__alloc_pages_nodemask+0x484/0x598)
<4>0[ 144.808477] [<c459cb70>] (__alloc_pages_nodemask+0x0/0x598) from [<c45bc248>] (read_swap_cache_async+0x58/0x1b0)
<4>0[ 144.808562] [<c45bc1f0>] (read_swap_cache_async+0x0/0x1b0) from [<c45bc42c>] (swapin_readahead+0x8c/0x94)
<4>0[ 144.808646] [<c45bc3a0>] (swapin_readahead+0x0/0x94) from [<c45ae800>] (handle_pte_fault+0x2bc/0x6cc)
<4>0[ 144.808726] [<c45ae544>] (handle_pte_fault+0x0/0x6cc) from [<c45af184>] (handle_mm_fault+0xd0/0xe4)
<4>0[ 144.808806] [<c45af0b4>] (handle_mm_fault+0x0/0xe4) from [<c453b808>] (do_page_fault+0xe8/0x28c)
<4>0[ 144.808888] [<c453b720>] (do_page_fault+0x0/0x28c) from [<c4528288>] (do_DataAbort+0x3c/0xa0)
<4>0[ 144.808968] [<c452824c>] (do_DataAbort+0x0/0xa0) from [<c4533dcc>] (ret_from_exception+0x0/0x10)
Comment 56•10 years ago
(In reply to ying.xu from comment #55)
> LMK has been called before involving oom-killer.
> But returned due to this code, here : sc->nr_to_scan <= 0, checked
>
> if (sc->nr_to_scan <= 0 || min_adj == OOM_ADJUST_MAX + 1) {
> 	lowmem_print(5, "lowmem_shrink %lu, %x, return %d\n",
> 		sc->nr_to_scan, sc->gfp_mask, rem);
> 	return rem;
> }
>
> <4>0[ 144.806796] lowmem_shrink 0, 200da, return 15851

From looking at the code, this is normal: |try_to_free_pages()| calls the memory shrinker with 0 as the value of the nr_to_scan argument so that the shrinker returns how many pages could potentially be freed.
Assignee
Comment 57•10 years ago
(In reply to Thinker Li [:sinker] from comment #56)
> For looking into the code, this is normal. |try_to_free_pages()| call
> memory shrinker with 0 as the value of the nr_to_scan argument to return how
> many pages are potentially to free.

Yes, but inside the function, nr_pages_scanned is checked and, when it is 0, updated to SWAP_CLUSTER_MAX (32):

	unsigned long shrink_slab(struct shrink_control *shrink,
				  unsigned long nr_pages_scanned,
				  unsigned long lru_pages)
	{
		struct shrinker *shrinker;
		unsigned long ret = 0;

		if (nr_pages_scanned == 0)
			nr_pages_scanned = SWAP_CLUSTER_MAX;
Assignee
Comment 58•10 years ago
And I think the check condition of the while loop may fail, so do_shrinker_shrink() with nr_to_scan != 0 may never be called:

	while (total_scan >= SHRINK_BATCH) {
		long this_scan = SHRINK_BATCH;
		int shrink_ret;
		int nr_before;

		nr_before = do_shrinker_shrink(shrinker, shrink, 0);
		shrink_ret = do_shrinker_shrink(shrinker, shrink, this_scan);
		if (shrink_ret == -1)
			break;
		if (shrink_ret < nr_before)
			ret += nr_before - shrink_ret;
		count_vm_events(SLABS_SCANNED, this_scan);
		total_scan -= this_scan;

		cond_resched();
	}
Comment 59•10 years ago
lmk returns a big number for |max_pass| in the same function, unless there is no killable process at all. So it is very unlikely that the second call of |do_shrinker_shrink()| is skipped while there is still something that could be killed by lmk. Maybe we should monitor this value and add a log entry for it to make sure it is running correctly.
Comment 60•10 years ago
(In reply to ying.xu from comment #53)
> (ump_secure_id_from_phys_blocks_wrapper+0x21c/0x450 [ump]) from [<bf003754>]
> (ump_file_ioctl+0x11c/0x238 [ump])
> 04-18 22:49:10.445 <4>0[ 6408.597183] r8:c4534244 r7:0000000c r6:c0109002
> r5:45bff908 r4:45bff908
> 04-18 22:49:10.445 <4>0[ 6408.597287] [<bf003638>] (ump_file_ioctl+0x0/0x238
> [ump]) from [<c45d7eb0>] (do_vfs_ioctl+0x540/0x5c4)

Some drivers request free pages from the kernel in a way that prohibits running the memory shrinker. That means that once free memory runs out, OOM is invoked without the memory shrinker ever being tried. I guess ump is one of them.
Comment 61•10 years ago
Forget my previous comment; I have found another path that does reach the memory shrinker.
Comment 62•10 years ago
(In reply to ying.xu from comment #50)
> Created attachment 8409282 [details]
> updated log

Your log is very noisy; there are too many useless files in the tarball, and I don't know which file is really useful. Could you attach only the useful files next time? Thanks!
Assignee
Comment 63•10 years ago
(In reply to Thinker Li [:sinker] from comment #62)
> Your log is very noisy, too much useless files in the tarball. I don't know
> which file is really useful. Could you attach only useful files here next
> time? Thanks!

Sorry. Please refer to the path slog_external/20140418130229 in the packed file:

slog_external/20140418130229$ du -ch
13M	./kernel	// kernel log
70M	./android	// android log
700K	./misc
Comment 65•10 years ago
I cannot find any abnormal messages in the log, except one unusual thing about gralloc:

04-18 21:31:18.370 89 2097 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 21:31:18.460 89 2097 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 21:45:30.270 89 3095 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 21:45:30.310 89 3095 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 22:35:02.870 89 6424 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 22:35:02.870 89 6424 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 23:49:47.550 10869 12089 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 23:49:47.550 10869 12089 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-18 23:49:47.550 10869 12089 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 01:06:19.660 10869 16611 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 01:06:19.660 10869 16611 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 01:06:19.710 10869 16611 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 02:12:53.550 10869 19054 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 02:12:53.560 10869 19054 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 02:49:42.400 10869 20380 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 02:49:42.400 10869 20380 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 02:49:42.420 10869 20380 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 04:35:10.380 10869 24643 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 04:35:10.410 10869 24643 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 06:12:16.650 10869 28589 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 06:12:16.650 10869 28589 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 06:12:16.670 10869 28589 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 06:30:03.290 10869 29298 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 06:30:03.310 10869 29298 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 07:37:03.130 10869 31883 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 07:37:03.130 10869 31883 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 07:37:03.180 10869 31883 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 07:57:48.000 10869 32702 E SprdCameraHardware: Could not dequeue gralloc buffer!
04-19 07:57:48.050 10869 32702 E SprdCameraHardware: Could not dequeue gralloc buffer!
Flags: needinfo?(schiu)
Comment 66•10 years ago
Hi all,

After discussion with Yiwen, I have listed some points below:
1. b2g/Nuwa were killed by OOM. In these cases there were still applications in the background/foreground that should have been killed first.
2. slog only recorded the log up to 18764 secs, but sysdump showed the system was still running at 138261 secs. Was slog killed by anything?
3. From the ps status dump from Yiwen, adbd looks like it restarted.
4. There are a lot of camera errors in the system dump, while free memory is more than 9MB.

Below are the action items for LMK/OOM:
1. Change the LMK threshold for the homescreen to 16MB to make sure the homescreen can be killed in the background.
2. Add a mechanism to kill applications between LMK and OOM.
3. Optimize the seek value in mm/vmscan.c so that LMK does not fail to run.
4. Make the calculation of the file cache more accurate.

Hi James,
Could you have a team member check the camera errors in sysdump?

OOM log:
04-19 23:08:47.948 <6>0[12519.033855] R (running) 14438 76796K 18752K 17600K 12631K 10004K 12964K Messages
04-19 23:08:47.948 <6>0[12519.035123] R (running) 14544 65320K 16752K 16748K 11524K 8656K 3684K E-Mail
...
04-19 23:08:47.948 <3>0[12519.035678] Out of memory: Kill process 629 (b2g) score 334 or sacrifice child
04-19 23:08:47.948 <3>0[12519.035713] Killed process 6229 ((Nuwa)) total-vm:51688kB, anon-rss:64kB, file-rss:788kB
04-19 23:23:51.700 <6>0[13422.718145] R (running) 15591 80088K 13452K 12876K 8077K 6488K 12936K Messages
04-19 23:23:51.700 <6>0[13422.718254] S (sleeping) 15608 1804K 116K 112K 6K 0K 148K hciattach
04-19 23:23:51.700 <6>0[13422.718362] S (sleeping) 15614 2140K 120K 120K 8K 0K 264K bluetoothd
04-19 23:23:51.700 <6>0[13422.719589] D (disk sleep) 15660 65920K 15496K 14920K 9745K 7792K 5096K Communications
...
04-19 23:23:51.700 <3>0[13422.720474] Out of memory: Kill process 629 (b2g) score 345 or sacrifice child
04-19 23:23:51.700 <3>0[13422.720560] Killed process 14736 ((Nuwa)) total-vm:51688kB, anon-rss:352kB, file-rss:2148kB
04-19 23:25:34.099 <6>0[13525.024321] R (running) 15724 72708K 15032K 13880K 10497K 8672K 10660K Homescreen
...
04-19 23:25:34.099 <3>0[13525.026461] Out of memory: Kill process 629 (b2g) score 341 or sacrifice child
04-19 23:25:34.099 <3>0[13525.026540] Killed process 15840 ((Nuwa)) total-vm:51688kB, anon-rss:0kB, file-rss:1816kB
04-19 23:25:41.321 <6>0[13532.335203] R (running) 15362 58852K 4584K 4580K 2880K 2320K 6860K Browser
04-19 23:25:41.321 <6>0[13532.336936] R (running) 15724 72772K 16948K 15796K 12338K 10312K 9120K Homescreen
...
04-19 23:25:41.321 <3>0[13532.338404] Out of memory: Kill process 629 (b2g) score 343 or sacrifice child
04-19 23:25:41.321 <3>0[13532.338477] Killed process 629 (b2g) total-vm:169604kB, anon-rss:29064kB, file-rss:7136kB

sysdump:
0[138258.814601] V4L2: fail: set_next_buffer filled buffer is 0.
0[138258.890422] V4L2: path1_done_buffer: Fail to this entry. last addr: 7a25200, buf addr: 7a25200
0[138258.895825] dcam_cb_ISRCapSOF.
0[138258.895873] V4L2: fail: set_next_buffer filled buffer is 0.
0[138258.921666] 116U
0[138258.971676] V4L2: path1_done_buffer: Fail to this entry. last addr: 7a25200, buf addr: 7a25200
0[138258.977113] dcam_cb_ISRCapSOF.
0[138258.977159] V4L2: fail: set_next_buffer filled buffer is 0.
0[138259.010461] sprd_debug_check_crash_key: Crash key count : 2
0[138259.010527] Kernel panic - not syncing: Crash Key
0[138259.010568] Mem-info:
0[138259.010593] Normal per-cpu:
0[138259.010623] CPU 0: hi: 42, btch: 7 usd: 12
0[138259.010675] active_anon:4794 inactive_anon:4880 isolated_anon:0
0[138259.010683] active_file:621 inactive_file:574 isolated_file:0
0[138259.010691] unevictable:83 dirty:0 writeback:0 unstable:0
0[138259.010698] free:2421 slab_reclaimable:326 slab_unreclaimable:1166
0[138259.010706] mapped:734 shmem:3392 pagetables:457 bounce:0
0[138259.010915] Normal free:9684kB min:512kB low:1152kB high:1280kB active_anon:19176kB inactive_anon:19520kB active_file:2484kB inactive_file:2296kB unevictable:332kB isolated(anon):0kB isolated(file):0kB present:114408kB mlocked:0kB dirty:0kB writeback:0kB mapped:2936kB shmem:13568kB slab_reclaimable:1304kB slab_unreclaimable:4664kB kernel_stack:2048kB pagetables:1828kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
0[138259.011179] lowmem_reserve[]: 0 0 0
0[138259.011225] Normal: 2217*4kB 98*8kB 2*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 9684kB
0[138259.011356] 4969 total pagecache pages
0[138259.011389] 289 pages in swap cache
0[138259.011423] Swap cache stats: add 2317039, delete 2316750, find 128810/1277541
0[138259.011479] Free swap = 0kB
0[138259.011506] Total swap = 65532kB
0[138259.012739] 28854 pages of RAM
0[138259.012770] 3055 free pages
0[138259.012797] 3771 reserved pages
0[138259.012826] 1109 slab pages
0[138259.012852] 3161 pages shared
0[138259.012880] 289 pages swap cached
0[138260.020092]
0[138260.020124] *****************************************************
0[138260.020173] * *
0[138260.020221] * Sysdump enter, preparing debug info to dump ... *
0[138260.020269] * *
0[138260.020316] *****************************************************
0[138260.020363]
0[138261.029526] vaddr is c7f00000,paddr is 07f00000
0[138261.029588] reason: Crash Key, sprd_sysdump_info->crash_key: 1
Flags: needinfo?(james.zhang)
Comment 67•10 years ago
kernel and system dump
Updated•10 years ago
Whiteboard: [c=memory p= s= u=tarako] [POVB][priority] → [c=memory p= s= u=tarako] [POVB][priority][wait for sprd kernel patch]
Reporter
Comment 68•10 years ago
Danny, please discuss the camera error log with dafeng.xu. His seat is near Ying's.
Flags: needinfo?(james.zhang) → needinfo?(Dafeng.Xu)
Assignee
Comment 69•10 years ago
(In reply to Danny Liang [:dliang] from comment #66)
> Below are the action items of LMK/OOM.
> 1. Change LMK threshold of homescreen to 16MB to make sure homescreen can be
> killed at background.

done

> 2. Add the mechanism to kill application between LMK and OOM.

WIP - patch and log

> 3. Optimize the seek value in mm/vmscan.c in order to avoid LMK didn't work.

done

> 4. Make calculation of file cache more accurate

No conclusion yet; we think there is no easy way to calculate this.
Assignee
Comment 70•10 years ago
Assignee
Comment 71•10 years ago
(In reply to ying.xu from comment #70)
> Created attachment 8410738 [details] [diff] [review]
> use_lmk_strategy_in_oom.patch

I have run for 12 hours with this patch, and I can see that the oom path falls into kill_bad_process_lmk:

04-23 02:39:16.621 <3>0[19321.450230] vmscan shrink_slab: bf02c80c nr=0 total_scan=7 04-23 02:39:16.621 <3>0[19321.450348] oom_killer send sigkill to 20216 (Homescreen), adj 8, size 4885 04-23 02:39:16.621 <4>0[19321.450405] b2g invoked oom-killer: gfp_mask=0x0, order=-1, oom_adj=0, oom_score_adj=0 04-23 02:39:16.621 <4>0[19321.450465] Backtrace: 04-23 02:39:16.621 <4>0[19321.450525] [<c4537ad8>] (dump_backtrace+0x0/0x110) from [<c488a11c>] (dump_stack+0x18/0x1c) 04-23 02:39:16.621 <4>0[19321.450591] r7:c49c3c48 r6:00000000 r5:ffffffff r4:c1bbc000 04-23 02:39:16.621 <4>0[19321.450669] [<c488a104>] (dump_stack+0x0/0x1c) from [<c4599464>] (dump_header.clone.1+0x7c/0xac) 04-23 02:39:16.621 <4>0[19321.450745] [<c45993e8>] (dump_header.clone.1+0x0/0xac) from [<c4599830>] (kill_bad_process_lmk+0x130/0x158) 04-23 02:39:16.621 <4>0[19321.450892] r6:00000008 r5:c65130e0 r4:00001315 04-23 02:39:16.621 <4>0[19321.450958] [<c4599700>] (kill_bad_process_lmk+0x0/0x158) from [<c45999d4>] (out_of_memory+0x17c/0x3b4) 04-23 02:39:16.621 <4>0[19321.451040] [<c4599858>] (out_of_memory+0x0/0x3b4) from [<c459d160>] (__alloc_pages_nodemask+0x484/0x598) 04-23 02:39:16.621 <4>0[19321.451127] [<c459ccdc>] (__alloc_pages_nodemask+0x0/0x598) from [<c45bc3d4>] (read_swap_cache_async+0x58/0x1b0) 04-23 02:39:16.621 <4>0[19321.451212] [<c45bc37c>] (read_swap_cache_async+0x0/0x1b0) from [<c45bc5b8>] (swapin_readahead+0x8c/0x94) 04-23 02:39:16.621 <4>0[19321.451299] [<c45bc52c>] (swapin_readahead+0x0/0x94) from [<c45ae98c>] (handle_pte_fault+0x2bc/0x6cc) 04-23 02:39:16.621 <4>0[19321.451379] [<c45ae6d0>] (handle_pte_fault+0x0/0x6cc) from [<c45af310>] (handle_mm_fault+0xd0/0xe4) 04-23 02:39:16.621 <4>0[19321.451460] [<c45af240>] (handle_mm_fault+0x0/0xe4) from [<c453b808>]
(do_page_fault+0xe8/0x28c) 04-23 02:39:16.621 <4>0[19321.451547] [<c453b720>] (do_page_fault+0x0/0x28c) from [<c4528288>] (do_DataAbort+0x3c/0xa0) 04-23 02:39:16.621 <4>0[19321.451629] [<c452824c>] (do_DataAbort+0x0/0xa0) from [<c4533dcc>] (ret_from_exception+0x0/0x10) 04-23 02:39:16.621 <4>0[19321.451698] Exception stack(0xc1bbdfb0 to 0xc1bbdff8) 04-23 02:39:16.621 <4>0[19321.451745] dfa0: 434e31e0 44aca000 44aaa110 44aaa104 04-23 02:39:16.621 <4>0[19321.451815] dfc0: 434e31e0 00000001 bec784d0 46f9e050 46f9e040 00000002 00000001 46ed8430 04-23 02:39:16.621 <4>0[19321.451883] dfe0: 00000001 bec78458 41b6d531 41b6d2d6 20800030 ffffffff 04-23 02:39:16.621 <4>0[19321.451935] r7:46f9e050 r6:bec784d0 r5:00000007 r4:0000040f 04-23 02:39:16.621 <4>0[19321.452001] Mem-info: 04-23 02:39:16.621 <4>0[19321.452026] Normal per-cpu: 04-23 02:39:16.621 <4>0[19321.452055] CPU 0: hi: 42, btch: 7 usd: 0 04-23 02:39:16.621 <4>0[19321.452107] active_anon:6858 inactive_anon:6888 isolated_anon:0 04-23 02:39:16.621 <4>0[19321.452115] active_file:2108 inactive_file:2 isolated_file:6 04-23 02:39:16.621 <4>0[19321.452122] unevictable:83 dirty:14 writeback:0 unstable:0 04-23 02:39:16.621 <4>0[19321.452130] free:340 slab_reclaimable:263 slab_unreclaimable:1042 04-23 02:39:16.621 <4>0[19321.452138] mapped:3061 shmem:1014 pagetables:439 bounce:0 04-23 02:39:16.646 ated(anon):0kB isolated(file):24kB present:114408kB mlocked:0kB dirty:56kB writeback:0kB mapped:12244kB shmem:4056kB slab_reclaimable:1052kB slab_unreclaimable:4168kB kernel_stack:2064kB pagetables:1756kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:596 all_unreclaimable? 
no 04-23 02:39:16.646 <4>0[19321.452609] lowmem_reserve[]: 0 0 0 04-23 02:39:16.646 <4>0[19321.452655] Normal: 34*4kB 1*8kB 0*16kB 4*32kB 9*64kB 4*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1360kB 04-23 02:39:16.646 <4>0[19321.452786] 3573 total pagecache pages 04-23 02:39:16.646 <4>0[19321.452819] 360 pages in swap cache 04-23 02:39:16.646 <4>0[19321.452853] Swap cache stats: add 3087750, delete 3087390, find 108563/1658455 04-23 02:39:16.646 <4>0[19321.452908] Free swap = 27232kB 04-23 02:39:16.646 <4>0[19321.452937] Total swap = 65532kB 04-23 02:39:16.646 <4>0[19321.454546] 28854 pages of RAM 04-23 02:39:16.646 <4>0[19321.454579] 961 free pages 04-23 02:39:16.646 <4>0[19321.454606] 3755 reserved pages 04-23 02:39:16.646 <4>0[19321.454634] 940 slab pages 04-23 02:39:16.646 <4>0[19321.454660] 8694 pages shared 04-23 02:39:16.646 <4>0[19321.454687] 360 pages swap cached 04-23 02:39:16.646 <4>0[19321.454725] zram0 status unit(page): 04-23 02:39:16.646 <4>0[19321.454731] mem_used_total: 3777 04-23 02:39:16.646 <4>0[19321.454737] compr_data_size: 3523 04-23 02:39:16.646 <4>0[19321.454742] orig_data_size: 9400 04-23 02:39:16.646 <4>0[19321.454748] num_reads: 1549867 04-23 02:39:16.646 <4>0[19321.454754] num_writes: 1536310 04-23 02:39:16.646 <6>0[19321.454908] stat pid vss rss rss_mapped pss uss swap 04-23 02:39:16.646 <6>0[19321.454975] ----------------------------------------------------------------- 04-23 02:39:16.646 <6>0[19321.455047] S (sleeping) 1 376K 108K 108K 45K 0K 124K init 04-23 02:39:16.646 <6>0[19321.455139] S (sleeping) 65 312K 88K 88K 44K 0K 92K ueventd 04-23 02:39:16.646 <6>0[19321.455235] S (sleeping) 80 828K 92K 88K 7K 4K 84K servicemanager 04-23 02:39:16.646 <6>0[19321.455357] S (sleeping) 81 4020K 96K 96K 3K 0K 264K vold 04-23 02:39:16.646 <6>0[19321.455438] S (sleeping) 82 248K 0K 0K 0K 0K 20K vcharged 04-23 02:39:16.646 <6>0[19321.455528] S (sleeping) 83 680K 68K 64K 2K 0K 72K sprd_monitor 04-23 02:39:16.646 <6>0[19321.455623] S 
(sleeping) 87 820K 92K 88K 3K 0K 92K rilproxy 04-23 02:39:16.646 <6>0[19321.455768] S (sleeping) 89 7420K 112K 112K 4K 0K 288K netd 04-23 02:39:16.646 <6>0[19321.455856] S (sleeping) 90 692K 68K 64K 2K 0K 72K debuggerd 04-23 02:39:16.646 <6>0[19321.455954] S (sleeping) 92 1364K 104K 100K 3K 0K 148K dbus-daemon 04-23 02:39:16.646 <6>0[19321.456054] S (sleeping) 93 1732K 60K 56K 1K 0K 160K keystore 04-23 02:39:16.646 <6>0[19321.456152] S (sleeping) 94 1276K 84K 80K 2K 0K 176K nvm_daemon 04-23 02:39:16.646 <6>0[19321.456253] S (sleeping) 95 1844K 72K 72K 2K 0K 96K modemd 04-23 02:39:16.646 <6>0[19321.456344] S (sleeping) 101 820K 92K 88K 3K 0K 92K rilproxy 04-23 02:39:16.646 <6>0[19321.456439] S (sleeping) 104 1112K 68K 64K 2K 0K 116K engmoded 04-23 02:39:16.646 <6>0[19321.456555] S (sleeping) 112 4464K 132K 132K 128K 128K 68K adbd 04-23 02:39:16.646 <6>0[19321.456852] S (sleeping) 155 28716K 88K 88K 3K 0K 460K phoneserver_2si 04-23 02:39:16.646 <6>0[19321.457008] S (sleeping) 158 8412K 84K 84K 2K 0K 424K engpcclient 04-23 02:39:16.646 <6>0[19321.457112] S (sleeping) 160 1872K 100K 100K 4K 0K 116K engmodemclient 04-23 02:39:16.646 <6>0[19321.457215] S (sleeping) 165 1856K 88K 88K 3K 0K 100K engservice 04-23 02:39:16.646 <6>0[19321.457351] S (sleeping) 319 5304K 240K 240K 115K 108K 88K slog 04-23 02:39:16.646 <6>0[19321.457447] R (running) 19595 764K 232K 228K 103K 60K 28K oom-msg-logger 04-23 02:39:16.680 tach 04-23 02:39:16.680 <6>0[19321.457658] S (sleeping) 19895 2144K 132K 132K 7K 0K 268K bluetoothd 04-23 02:39:16.680 <6>0[19321.457751] S (sleeping) 19994 756K 172K 168K 43K 0K 76K sh 04-23 02:39:16.680 <6>0[19321.457840] S (sleeping) 19996 688K 128K 128K 39K 36K 52K orng 04-23 02:39:16.680 <6>0[19321.457989] S (sleeping) 20139 7628K 196K 196K 17K 4K 240K rild_sp 04-23 02:39:16.680 <6>0[19321.458151] S (sleeping) 20140 9676K 200K 200K 17K 4K 260K rild_sp 04-23 02:39:16.680 <6>0[19321.458409] S (sleeping) 20149 20444K 264K 264K 31K 4K 1168K mediaserver 04-23 
02:39:16.680 <6>0[19321.460931] R (running) 20150 141124K 33132K 29880K 25160K 23360K 14868K b2g 04-23 02:39:16.680 <6>0[19321.461559] S (sleeping) 20202 51684K 2968K 2968K 582K 0K 8996K (Nuwa) 04-23 02:39:16.680 <6>0[19321.462579] R (running) 20216 66160K 10468K 10468K 6058K 4652K 9072K Homescreen 04-23 02:39:16.680 <6>0[19321.462697] S (sleeping) 20252 2504K 204K 204K 50K 40K 244K wpa_supplicant 04-23 02:39:16.680 <6>0[19321.464842] S (sleeping) 20254 96676K 33980K 31384K 26265K 24224K 5664K Camera 04-23 02:39:16.680 <6>0[19321.465758] R (running) 20349 59868K 8404K 8400K 4402K 3172K 5672K (Preallocated a
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
status-b2g-v1.3T:
--- → fixed
Updated•10 years ago
Target Milestone: --- → 1.4 S6 (25apr)