782477 - Cut-the-Rope subprocess is OOM'ing

Reporter

Description

•

12 years ago

Attached file stdout/stderr from gdb — Details

If I launch Cut-the-Rope on otoro, b2g dies. Even under gdb, the process gets wiped out, so unfortunately, no backtrace.

It looks like someone sent a SIGKILL which kills the process right out from under gdb.

Dave Hylands [:dhylands]

Reporter

Comment 1

•

12 years ago

Attached file logcat output — Details

Shian-Yow Wu [:swu]

Comment 2

•

12 years ago

As I know, the current version of Cut-the-Rope is designed for desktop, which hasn't been optimized for mobile on memory usage.  If you check by 'procrank' in adb shell, you'll see the memory exhausted and b2g got crashed.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 3

•

12 years ago

It's come out that the memory usage of the b2g process has regressed quite a bit since last time I measured --- it's using 40-50MB more than previously :(.  We need to investigate that.

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 4

•

12 years ago

Citing cjones:

> we've made a bunch of changes to free up memory for
> apps lately, but I've noticed one frustrating problem.  If you
>  - load Cut the Rope
>  - play the first level
>
> it works now.  But if you try to move on to the second level, the CTR
> app is lowmem killed.
>
> Its memory usage gets up around 90MB by that point, but we still have
> plenty of RAM in pagecache, 20MB or so.
>
> So the project is
>  - why isn't linux dropping those pagecache pages instead of
>    lowmem-killing?  is it locked from mapping or what?
>  - is there anything we can do about it?

Status: NEW → ASSIGNED

Thomas Zimmermann [:tzimmermann] [:tdz]

Updated

•

12 years ago

Assignee: nobody → tdz

Thomas Zimmermann [:tzimmermann] [:tdz]

Updated

•

12 years ago

OS: Linux → Android

Hardware: x86_64 → ARM

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 5

•

12 years ago

My results so far...

I managed to reproduce the OOM in my otoro phone. The free memory varies while CTR runs varies from 24 MiB to 3 MiB. I see a lot of pmem usage in logcat, with frequent OOMs

> D/memalloc( 2937): /dev/pmem: Mapped buffer base:0x47a00000 size:987136 offset:921600 fd:74
> D/memalloc( 2290): /dev/pmem: Allocated buffer base:0x4abb5000 size:614400 offset:4358144 > fd:168
> D/memalloc( 2937): /dev/pmem: Mapped buffer base:0x494ee000 size:4972544 offset:4358144 fd:77
> D/memalloc( 2937): /dev/pmem: Unmapping buffer base:0x4e8e2000 size:11075584 offset:11010048
> D/memalloc( 2290):  Out of PMEM. Dumping PMEM stats for debugging
> D/memalloc( 2290):  ------------- PRINT PMEM STATS --------------
> D/memalloc( 2290):  Node 0 -> Start Address : 0 Size 9600 Free info 0
> D/memalloc( 2290):  Node 1 -> Start Address : 9600 Size 9600 Free info 0
> D/memalloc( 2290):  Node 2 -> Start Address : 19200 Size 9600 Free info 0
> D/memalloc( 2290):  Node 3 -> Start Address : 28800 Size 2048 Free info 0
> D/memalloc( 2290):  Node 4 -> Start Address : 30848 Size 2048 Free info 0
> D/memalloc( 2290):  Node 5 -> Start Address : 32896 Size 5504 Free info 1
> D/memalloc( 2290):  Node 6 -> Start Address : 38400 Size 43008 Free info 0
> D/memalloc( 2290):  Node 7 -> Start Address : 81408 Size 18560 Free info 1
> D/memalloc( 2290):  Node 8 -> Start Address : 99968 Size 9600 Free info 0
> D/memalloc( 2290):  Node 9 -> Start Address : 109568 Size 13312 Free info 0
> D/memalloc( 2290):  Node 10 -> Start Address : 122880 Size 13312 Free info 0
> D/memalloc( 2290):  Node 11 -> Start Address : 136192 Size 19200 Free info 0
> D/memalloc( 2290):  Node 12 -> Start Address : 155392 Size 14976 Free info 1
> D/memalloc( 2290):  Node 13 -> Start Address : 170368 Size 60800 Free info 0
> D/memalloc( 2290):  Node 14 -> Start Address : 231168 Size 73728 Free info 0
> D/memalloc( 2290):  Node 15 -> Start Address : 304896 Size 19200 Free info 0
> D/memalloc( 2290):  Node 16 -> Start Address : 324096 Size 19200 Free info 0
> D/memalloc( 2290):  Node 17 -> Start Address : 343296 Size 3072 Free info 1
> D/memalloc( 2290):  Node 18 -> Start Address : 346368 Size 43008 Free info 0
> D/memalloc( 2290):  Node 19 -> Start Address : 389376 Size 73728 Free info 0
> D/memalloc( 2290):  Node 20 -> Start Address : 463104 Size 43008 Free info 0
> D/memalloc( 2290):  Node 21 -> Start Address : 506112 Size 18176 Free info 1
> D/memalloc( 2290):  Total Allocated: Total Free: 
> D/memalloc( 2290): ----------------------------------------------
> E/memalloc( 2290): /dev/pmem: No more pmem available
> W/memalloc( 2290): Falling back to ashmem
> D/memalloc( 2290): ashmem: Allocated buffer base:0x44f10000 size:614400 fd:171
> D/memalloc( 2937): ashmem: Mapped buffer base:0x45668000 size:614400 fd:62
> D/memalloc( 2290): /dev/pmem: Freeing buffer base:0x4af75000 size:425984 offset:3932160 fd:162
> D/memalloc( 2937): /dev/pmem: Unmapping buffer base:0x48dc0000 size:4358144 offset:3932160
> D/memalloc( 2290): /dev/pmem: Freeing buffer base:0x4af0d000 size:425984 offset:3506176 fd:56
> D/memalloc( 2937): /dev/pmem: Unmapping buffer base:0x48a00000 size:3932160 offset:3506176

Pmem hands out physically contiguous memory it receives from kmalloc, which could explain why the available pages are remain unused. IIRC kmalloc is implemented directly on top of the SLAB allocator and only uses the configured pools. The output of /proc/slabinfo before the crash is shown below. You can see that the allocator is short in memory.

> root@android:/ # cat /proc/slabinfo                                            
> slabinfo - version: 2.1
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> ...
> size-4194304           0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
> size-2097152           0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
> size-1048576           1      1 1048576    1  256 : tunables    1    1    0 : slabdata      1      1      0
> size-524288            1      1 524288    1  128 : tunables    1    1    0 : slabdata      1      1      0
> size-262144            0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
> size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
> size-65536             4      4  65536    1   16 : tunables    8    4    0 : slabdata      4      4      0
> size-32768             3      3  32768    1    8 : tunables    8    4    0 : slabdata      3      3      0
> size-16384            36     36  16384    1    4 : tunables    8    4    0 : slabdata     36     36      0
> size-8192             41     41   8192    1    2 : tunables    8    4    0 : slabdata     41     41      0
> size-4096            136    141   4096    1    1 : tunables   24   12    0 : slabdata    136    141      0
> size-2048            172    172   2048    2    1 : tunables   24   12    0 : slabdata     86     86      0
> size-1024            224    224   1024    4    1 : tunables   54   27    0 : slabdata     56     56      0
> size-512             861    864    512    8    1 : tunables   54   27    0 : slabdata    108    108      0
> size-256             165    165    256   15    1 : tunables  120   60    0 : slabdata     11     11      0
> size-192             148    160    192   20    1 : tunables  120   60    0 : slabdata      8      8      0
> size-128             328    330    128   30    1 : tunables  120   60    0 : slabdata     11     11      0
> size-96             1199   1240     96   40    1 : tunables  120   60    0 : slabdata     31     31      0
> size-64             4003   4012     64   59    1 : tunables  120   60    0 : slabdata     68     68      0
> size-32             6670   6780     32  113    1 : tunables  120   60    0 : slabdata     60     60      0
> kmem_cache           144    160     96   40    1 : tunables  120   60    0 : slabdata      4      4      0

Dave Hylands [:dhylands]

Reporter

Comment 6

•

12 years ago

There are actually 3 different kmalloc implementations, SLAB, SLOB, and SLUB. You may want to compare them to see what impact they have on performance and/or memory usage.

The otoro kernel is currently configured to use SLAB. IIRC SLAB is also a power of 2 allocator, so when that pmem request that allocates 11075584 would actually be consuming 2Mb.

Changing the SLAB/SLUB/SLOB setting will require all of the modules to be rebuilt as well (or at least it has in the past).

http://census-labs.com/news/2012/01/03/linux-kernel-heap-exploitation/
http://lwn.net/Articles/229984/
http://www.socallinuxexpo.org/scale9x/presentations/status-linux-slab-allocators.html

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 7

•

12 years ago

Running out of pmem is perfectly fine.  We just fall back on non-physically-contiguous shared memory (which forces our screen compositor code out of its fast path).

I'd like to find out why the kernel can't satisfy the allocation requests when there's apparently plenty of pagecache memory left.  Is it pinned somehow?  Does linux not allow it to fall below a certain threshold?  Etc.

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 8

•

12 years ago

Chris, are you sure you have 20 MiBs of pagecache left? I managed to get the output of /proc/meminfo shortly before CTR gets killed, which is while the first level is being loaded. But from the numbers, it doesn't seem like there is enough memory to run CTR.

> root@android:/ # cat /proc/meminfo
> MemTotal:         172880 kB
> MemFree:            3252 kB
> Buffers:               4 kB
> Cached:            25516 kB
> SwapCached:            0 kB
> Active:           139260 kB
> Inactive:           6040 kB
> Active(anon):     129020 kB
> Inactive(anon):       80 kB
> Active(file):      10240 kB

10 MiB of the pagecache are actively in use and cannot be reclaimed.

> Inactive(file):     5960 kB

These 6 MiB could be reclaimed.

> Unevictable:        9204 kB

It's probably worth investigating why there are 9 MiBs of unevictable memory. This means that the memory pages are mlock'd, or so.

> Mlocked:            8640 kB

See above.

> HighTotal:             0 kB
> HighFree:              0 kB
> LowTotal:         172880 kB
> LowFree:            3252 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:        129008 kB
> Mapped:            20744 kB
> Shmem:              8756 kB
> Slab:               7596 kB
> SReclaimable:       1312 kB
> SUnreclaim:         6284 kB
> KernelStack:        1672 kB
> PageTables:         1648 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:       86440 kB
> Committed_AS:     289468 kB
> VmallocTotal:     720896 kB
> VmallocUsed:       34444 kB
> VmallocChunk:     661052 kB

To release pages ASAP, I reduced the time for Linux to move pages from active to inactive state (/proc/sys/vm/dirty_expire_centisecs) to 1 cs, and increase the scheduling interval of the flusher thread to 10 cs (/proc/sys/vm/dirty_writeback_centisecs). This free'd up some memory earlier, but didn't improve the overall results.

From these results, I'd say this is not a problem within Linux, but CTR using too much memory.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 9

•

12 years ago

Thanks for the analysis, Thomas!

Here's the result I get from just before OOM death.  I'm actually not able to get through the first level anymore.

> MemTotal:         188248 kB

Thomas: make sure you have the kernel update 8.  It frees up more memory.

> MemFree:            2700 kB
> Buffers:              24 kB
> Cached:            17456 kB
> SwapCached:            0 kB
> Active:           158516 kB
> Inactive:           3240 kB
> Active(anon):     151804 kB
> Inactive(anon):       80 kB
> Active(file):       6712 kB
> Inactive(file):     3160 kB
> Unevictable:        7480 kB
> Mlocked:            6936 kB
> HighTotal:             0 kB
> HighFree:              0 kB
> LowTotal:         188248 kB
> LowFree:            2700 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:        151792 kB
> Mapped:            16968 kB
> Shmem:              7052 kB
> Slab:               8672 kB
> SReclaimable:       1480 kB
> SUnreclaim:         7192 kB
> KernelStack:        1672 kB
> PageTables:         1580 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:       94124 kB
> Committed_AS:     316376 kB
> VmallocTotal:     720896 kB
> VmallocUsed:       26444 kB
> VmallocChunk:     666460 kB

So my results look similar to yours.  I'm starting with more free memory, so it's odd that you're getting further along.  My kernel is also not able to reclaim about 3MB.

I'll try setting the parameters you suggest and see if I get any further.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 10

•

12 years ago

(In reply to Thomas Zimmermann from comment #8)
> From these results, I'd say this is not a problem within Linux, but CTR
> using too much memory.

I should have added here, this is indeed the case; the CTR app is egregious.  We're running code that was designed to run in IE on "desktop".  But it's a good test case for our low-memory handling.  We don't control the code of third-party apps like this, we'll need to deal with memory hogs.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 11

•

12 years ago

With the kernel params tweaked, I also see

> Inactive(file):     6620 kB

but the results are the same.  My logging isn't very precise, so maybe this memory is actually being reclaimed and the timing is just bad.

Andreas Gal :gal

Comment 12

•

12 years ago

The JS engine might mlock stacks. Grep the source for mlock.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Updated

•

12 years ago

Summary: Cut-the-Rope kills b2g on otoro (with SIGKILL) → Cut-the-Rope subprocess is OOM'ing

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 13

•

12 years ago

The PSS for the CTR process gets up to 90MB+ before it dies its OOM death.  This is in the neighborhood of the available memory we were targeting for app processes for v1, so this isn't necessarily a bug.

However, what's somewhat disturbing is that it appears the memory usage of the b2g process has gone up significantly since the last time I measured, possibly up to 8MB USS.  That probably explains why I can't play the first level anymore.

My kingdom for about:memory --> logcat! :)

Justin Lebar (not reading bugmail)

Comment 14

•

12 years ago

> My kingdom for about:memory --> logcat! :)

Actively working on it, bug 788021.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 15

•

12 years ago

Hum, after a reboot, the b2g process was using *10 MB* less USS, and I got past the first level on CTR.  Trying to load the second level, its PSS usage goes past 95MB and then it's OOM killed.

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 16

•

12 years ago

Thanks everyone. I'm currently investigating these unevictable pages. Searching through the sources hasn't been successful, so I'm working on a kernel module to output the relevant bits of the kernel's VMAs.

Justin Lebar (not reading bugmail)

Comment 17

•

12 years ago

> Thanks everyone. I'm currently investigating these unevictable pages.

Sorry...what exactly are you doing?

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 18

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #17)
> > Thanks everyone. I'm currently investigating these unevictable pages.
> 
> Sorry...what exactly are you doing?

Typically, almost all cached memory pages can be evicted from the pagecache. The output of /proc/meminfo in comments 8 and 9 shows 9 MiBs of memory that can not be evicted by Linux' flusher threads. I try to find the owner of this memory and afterwards I'll try to make the respective memory pages evictable.

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 19

•

12 years ago

I found that the unevictable pages are graphics buffers within '/system/b2g/b2g' (pid 105), which get allocated when CTR uses canvas elements. Shown below is the output of the kernel module is used for inspecting the kernel's VMAs. The format roughly resembles the content of /proc/<pid>/maps.

> <6>[09-11 22:21:57.067] [879: insmod]pid nr is 105
> <6>[09-11 22:21:57.067] [879: insmod]47900000-47a50000 rw-sl 00000000 00:04 3358
> <6>[09-11 22:21:57.067] [879: insmod] /dev/ashmem/gralloc-buffer-4a89dd40 (deleted)
> <6>[09-11 22:21:57.067] [879: insmod]
> <6>[09-11 22:21:57.067] [879: insmod]4d8fb000-4db3b000 rw-sl 00000000 00:04 3743
> <6>[09-11 22:21:57.067] [879: insmod] /dev/ashmem/gralloc-buffer-4a93d640 (deleted)
> <6>[09-11 22:21:57.067] [879: insmod]
> <6>[09-11 22:21:57.067] [879: insmod]4db3b000-4dd7b000 rw-sl 00000000 00:04 3781
> <6>[09-11 22:21:57.067] [879: insmod] /dev/ashmem/gralloc-buffer-4aa368c0 (deleted)
> <6>[09-11 22:21:57.067] [879: insmod]

We obviously cannot throw away the graphics buffers, but TowerJelly does not have unevictable pages.

 - Does this only depend on the application, or is there anything to optimize at the lower levels?
 - Or does this depend on the use of pmem vs ashmem?

It would be helpful to me if someone with knowledge in the graphics system could comment on these questions. Thanks!

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 20

•

12 years ago

Ah!  Nice find.

In the course of normal operation, we use pmem-backed "gralloc" memory for gfx.  But it's only allocated from a fixed-size pool, so it may run even when there's RAM still available.

This game is running at something like 16x the screen resolution of the otoro, and lord knows what crazy things it might be doing.  I always see it exhaust gralloc memory when it runs.

Usually exhausting gralloc just results in allocation failures, but the vendor allocator library here will fall back on ashmem when pmem is gone, transparent to the allocator.

The semantics of ashmem is a little odd --- as I recall it can be in one of two states, pinned or not-pinned.  When it's not pinned, the kernel is free to throw away the backing memory whenever it wants.  When it's pinned, the kernel can't do that.  IIRC ashmem segments start out pinned.

So I would bet that ashmem pinning is implemented by the same mechanism as mlock(), and the mlock()d memory here is for gfx buffers.

The problem is, we can't unpin this memory without causing crashes or graphical corruption.  So I guess we're doing about as well as we can do here :/.  Need to look for wins elsewhere.

(I have an idea that might let us not run out of pmem so soon.)

Thomas Zimmermann [:tzimmermann] [:tdz]

Comment 21

•

12 years ago

Ah, ok. Thanks for the info.

Michael Vines [:m1] [:evilmachines]

Comment 22

•

12 years ago

We could experiment with linking /dev/pmem and /dev/pmem_adsp together but then we need a in-gecko mechanism to evict gralloc buffers from pmem when it comes time to turn on the camera or do OMX video decoding.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 23

•

12 years ago

Good to know, thanks.

We have another 5MB win in hand, and I'm starting to think we may be able to get by with 8MB of gralloc, which would get us another 5MB.  We're already at the initial target for available app memory, so at this point it's probably not worth the risk of trying that.

Thomas Zimmermann [:tzimmermann] [:tdz]

Updated

•

12 years ago

Assignee: tzimmermann → nobody

Status: ASSIGNED → NEW

Jason Smith [:jsmith]

Updated

•

12 years ago

Whiteboard: [3rd-party-preloaded-apps]

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 24

•

12 years ago

Memory usage has improved enough to run the unmodified game.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Jason Smith [:jsmith]

Updated

•

12 years ago

Whiteboard: [3rd-party-preloaded-apps]

stdout/stderr from gdb 12 years ago Dave Hylands [:dhylands] 20.93 KB, text/plain		Details
logcat output 12 years ago Dave Hylands [:dhylands] 14.41 KB, text/plain		Details