Closed Bug 782477 Opened 12 years ago Closed 12 years ago

Cut-the-Rope subprocess is OOM'ing

Categories: Firefox OS Graveyard :: General, defect
Hardware: ARM
OS: Android
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dhylands, Unassigned)

Details

Attachments

(2 files)

Attached file stdout/stderr from gdb
If I launch Cut-the-Rope on otoro, b2g dies. Even under gdb, the process gets wiped out, so unfortunately, no backtrace.

It looks like something sent a SIGKILL, which killed the process right out from under gdb.
Attached file logcat output
As far as I know, the current version of Cut-the-Rope was designed for desktop and hasn't been optimized for mobile memory usage. If you check with 'procrank' in an adb shell (commands sketched below), you'll see memory get exhausted and b2g crash.
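For reference, I check it roughly like this from the host (procrank is normally present on engineering builds; if your build lacks it, /proc/meminfo at least gives the system-wide numbers):

  adb shell procrank
  adb shell cat /proc/meminfo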
It turns out that the memory usage of the b2g process has regressed quite a bit since the last time I measured --- it's using 40-50MB more than previously :(.  We need to investigate that.
Citing cjones:

> we've made a bunch of changes to free up memory for
> apps lately, but I've noticed one frustrating problem.  If you
>  - load Cut the Rope
>  - play the first level
>
> it works now.  But if you try to move on to the second level, the CTR
> app is lowmem killed.
>
> Its memory usage gets up around 90MB by that point, but we still have
> plenty of RAM in pagecache, 20MB or so.
>
> So the project is
>  - why isn't linux dropping those pagecache pages instead of
>    lowmem-killing?  is it locked from mapping or what?
>  - is there anything we can do about it?
Status: NEW → ASSIGNED
Assignee: nobody → tdz
OS: Linux → Android
Hardware: x86_64 → ARM
My results so far...

I managed to reproduce the OOM on my otoro phone. The free memory varies from 24 MiB down to 3 MiB while CTR runs. I see a lot of pmem usage in logcat, with frequent OOMs:

> D/memalloc( 2937): /dev/pmem: Mapped buffer base:0x47a00000 size:987136 offset:921600 fd:74
> D/memalloc( 2290): /dev/pmem: Allocated buffer base:0x4abb5000 size:614400 offset:4358144 fd:168
> D/memalloc( 2937): /dev/pmem: Mapped buffer base:0x494ee000 size:4972544 offset:4358144 fd:77
> D/memalloc( 2937): /dev/pmem: Unmapping buffer base:0x4e8e2000 size:11075584 offset:11010048
> D/memalloc( 2290):  Out of PMEM. Dumping PMEM stats for debugging
> D/memalloc( 2290):  ------------- PRINT PMEM STATS --------------
> D/memalloc( 2290):  Node 0 -> Start Address : 0 Size 9600 Free info 0
> D/memalloc( 2290):  Node 1 -> Start Address : 9600 Size 9600 Free info 0
> D/memalloc( 2290):  Node 2 -> Start Address : 19200 Size 9600 Free info 0
> D/memalloc( 2290):  Node 3 -> Start Address : 28800 Size 2048 Free info 0
> D/memalloc( 2290):  Node 4 -> Start Address : 30848 Size 2048 Free info 0
> D/memalloc( 2290):  Node 5 -> Start Address : 32896 Size 5504 Free info 1
> D/memalloc( 2290):  Node 6 -> Start Address : 38400 Size 43008 Free info 0
> D/memalloc( 2290):  Node 7 -> Start Address : 81408 Size 18560 Free info 1
> D/memalloc( 2290):  Node 8 -> Start Address : 99968 Size 9600 Free info 0
> D/memalloc( 2290):  Node 9 -> Start Address : 109568 Size 13312 Free info 0
> D/memalloc( 2290):  Node 10 -> Start Address : 122880 Size 13312 Free info 0
> D/memalloc( 2290):  Node 11 -> Start Address : 136192 Size 19200 Free info 0
> D/memalloc( 2290):  Node 12 -> Start Address : 155392 Size 14976 Free info 1
> D/memalloc( 2290):  Node 13 -> Start Address : 170368 Size 60800 Free info 0
> D/memalloc( 2290):  Node 14 -> Start Address : 231168 Size 73728 Free info 0
> D/memalloc( 2290):  Node 15 -> Start Address : 304896 Size 19200 Free info 0
> D/memalloc( 2290):  Node 16 -> Start Address : 324096 Size 19200 Free info 0
> D/memalloc( 2290):  Node 17 -> Start Address : 343296 Size 3072 Free info 1
> D/memalloc( 2290):  Node 18 -> Start Address : 346368 Size 43008 Free info 0
> D/memalloc( 2290):  Node 19 -> Start Address : 389376 Size 73728 Free info 0
> D/memalloc( 2290):  Node 20 -> Start Address : 463104 Size 43008 Free info 0
> D/memalloc( 2290):  Node 21 -> Start Address : 506112 Size 18176 Free info 1
> D/memalloc( 2290):  Total Allocated: Total Free: 
> D/memalloc( 2290): ----------------------------------------------
> E/memalloc( 2290): /dev/pmem: No more pmem available
> W/memalloc( 2290): Falling back to ashmem
> D/memalloc( 2290): ashmem: Allocated buffer base:0x44f10000 size:614400 fd:171
> D/memalloc( 2937): ashmem: Mapped buffer base:0x45668000 size:614400 fd:62
> D/memalloc( 2290): /dev/pmem: Freeing buffer base:0x4af75000 size:425984 offset:3932160 fd:162
> D/memalloc( 2937): /dev/pmem: Unmapping buffer base:0x48dc0000 size:4358144 offset:3932160
> D/memalloc( 2290): /dev/pmem: Freeing buffer base:0x4af0d000 size:425984 offset:3506176 fd:56
> D/memalloc( 2937): /dev/pmem: Unmapping buffer base:0x48a00000 size:3932160 offset:3506176

Pmem hands out physically contiguous memory that it receives from kmalloc, which could explain why the available pages remain unused. IIRC kmalloc is implemented directly on top of the SLAB allocator and only uses the configured pools. The output of /proc/slabinfo before the crash is shown below; you can see that the allocator is short on memory.

> root@android:/ # cat /proc/slabinfo                                            
> slabinfo - version: 2.1
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> ...
> size-4194304           0      0 4194304    1 1024 : tunables    1    1    0 : slabdata      0      0      0
> size-2097152           0      0 2097152    1  512 : tunables    1    1    0 : slabdata      0      0      0
> size-1048576           1      1 1048576    1  256 : tunables    1    1    0 : slabdata      1      1      0
> size-524288            1      1 524288    1  128 : tunables    1    1    0 : slabdata      1      1      0
> size-262144            0      0 262144    1   64 : tunables    1    1    0 : slabdata      0      0      0
> size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
> size-65536             4      4  65536    1   16 : tunables    8    4    0 : slabdata      4      4      0
> size-32768             3      3  32768    1    8 : tunables    8    4    0 : slabdata      3      3      0
> size-16384            36     36  16384    1    4 : tunables    8    4    0 : slabdata     36     36      0
> size-8192             41     41   8192    1    2 : tunables    8    4    0 : slabdata     41     41      0
> size-4096            136    141   4096    1    1 : tunables   24   12    0 : slabdata    136    141      0
> size-2048            172    172   2048    2    1 : tunables   24   12    0 : slabdata     86     86      0
> size-1024            224    224   1024    4    1 : tunables   54   27    0 : slabdata     56     56      0
> size-512             861    864    512    8    1 : tunables   54   27    0 : slabdata    108    108      0
> size-256             165    165    256   15    1 : tunables  120   60    0 : slabdata     11     11      0
> size-192             148    160    192   20    1 : tunables  120   60    0 : slabdata      8      8      0
> size-128             328    330    128   30    1 : tunables  120   60    0 : slabdata     11     11      0
> size-96             1199   1240     96   40    1 : tunables  120   60    0 : slabdata     31     31      0
> size-64             4003   4012     64   59    1 : tunables  120   60    0 : slabdata     68     68      0
> size-32             6670   6780     32  113    1 : tunables  120   60    0 : slabdata     60     60      0
> kmem_cache           144    160     96   40    1 : tunables  120   60    0 : slabdata      4      4      0
There are actually 3 different kmalloc implementations, SLAB, SLOB, and SLUB. You may want to compare them to see what impact they have on performance and/or memory usage.

The otoro kernel is currently configured to use SLAB. IIRC SLAB is also a power-of-2 allocator, so that 11075584-byte pmem request would get rounded up to the next power of two (16 MiB) and waste several megabytes.

Changing the SLAB/SLUB/SLOB setting will require all of the modules to be rebuilt as well (or at least it has in the past).

http://census-labs.com/news/2012/01/03/linux-kernel-heap-exploitation/
http://lwn.net/Articles/229984/
http://www.socallinuxexpo.org/scale9x/presentations/status-linux-slab-allocators.html
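If you want to double-check which allocator a given kernel tree is configured for before experimenting, grepping the config is enough (the defconfig name below is a placeholder; use whatever the otoro board actually builds with):

  grep -E 'CONFIG_SL[AUO]B' arch/arm/configs/<otoro_defconfig>
  grep -E 'CONFIG_SL[AUO]B' .config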
Running out of pmem is perfectly fine.  We just fall back on non-physically-contiguous shared memory (which forces our screen compositor code out of its fast path).

I'd like to find out why the kernel can't satisfy the allocation requests when there's apparently plenty of pagecache memory left.  Is it pinned somehow?  Does linux not allow it to fall below a certain threshold?  Etc.
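One crude experiment that might answer that (just a sketch; it needs a root adb shell and obviously isn't something we'd ship): ask the kernel to drop clean caches while the app is running and see whether Cached/MemFree actually move in /proc/meminfo. If they barely move, the pagecache really is pinned somehow.

  adb shell cat /proc/meminfo | grep -E 'MemFree|Cached|Unevictable|Mlocked'
  adb shell "echo 3 > /proc/sys/vm/drop_caches"
  adb shell cat /proc/meminfo | grep -E 'MemFree|Cached|Unevictable|Mlocked'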
Chris, are you sure you have 20 MiBs of pagecache left? I managed to get the output of /proc/meminfo shortly before CTR gets killed, which is while the first level is being loaded. But from the numbers, it doesn't seem like there is enough memory to run CTR.

> root@android:/ # cat /proc/meminfo
> MemTotal:         172880 kB
> MemFree:            3252 kB
> Buffers:               4 kB
> Cached:            25516 kB
> SwapCached:            0 kB
> Active:           139260 kB
> Inactive:           6040 kB
> Active(anon):     129020 kB
> Inactive(anon):       80 kB
> Active(file):      10240 kB

10 MiB of the pagecache are actively in use and cannot be reclaimed.

> Inactive(file):     5960 kB

These 6 MiB could be reclaimed.

> Unevictable:        9204 kB

It's probably worth investigating why there are 9 MiB of unevictable memory. It likely means these pages are mlock'd or pinned in some similar way.

> Mlocked:            8640 kB

See above.

> HighTotal:             0 kB
> HighFree:              0 kB
> LowTotal:         172880 kB
> LowFree:            3252 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:        129008 kB
> Mapped:            20744 kB
> Shmem:              8756 kB
> Slab:               7596 kB
> SReclaimable:       1312 kB
> SUnreclaim:         6284 kB
> KernelStack:        1672 kB
> PageTables:         1648 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:       86440 kB
> Committed_AS:     289468 kB
> VmallocTotal:     720896 kB
> VmallocUsed:       34444 kB
> VmallocChunk:     661052 kB

To release pages sooner, I reduced the age at which dirty pages become eligible for writeback (/proc/sys/vm/dirty_expire_centisecs) to 1 cs and shortened the flusher threads' wakeup interval (/proc/sys/vm/dirty_writeback_centisecs) to 10 cs. This freed up some memory earlier, but didn't improve the overall results.
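For the record, this is roughly what I set, from a root shell on the device (the values are the ones mentioned above):

  echo 1 > /proc/sys/vm/dirty_expire_centisecs
  echo 10 > /proc/sys/vm/dirty_writeback_centisecs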

From these results, I'd say this is not a problem within Linux, but CTR using too much memory.
Thanks for the analysis, Thomas!

Here's the result I get from just before OOM death.  I'm actually not able to get through the first level anymore.

> MemTotal:         188248 kB

Thomas: make sure you have the kernel update 8.  It frees up more memory.

> MemFree:            2700 kB
> Buffers:              24 kB
> Cached:            17456 kB
> SwapCached:            0 kB
> Active:           158516 kB
> Inactive:           3240 kB
> Active(anon):     151804 kB
> Inactive(anon):       80 kB
> Active(file):       6712 kB
> Inactive(file):     3160 kB
> Unevictable:        7480 kB
> Mlocked:            6936 kB
> HighTotal:             0 kB
> HighFree:              0 kB
> LowTotal:         188248 kB
> LowFree:            2700 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:        151792 kB
> Mapped:            16968 kB
> Shmem:              7052 kB
> Slab:               8672 kB
> SReclaimable:       1480 kB
> SUnreclaim:         7192 kB
> KernelStack:        1672 kB
> PageTables:         1580 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:       94124 kB
> Committed_AS:     316376 kB
> VmallocTotal:     720896 kB
> VmallocUsed:       26444 kB
> VmallocChunk:     666460 kB

So my results look similar to yours.  I'm starting with more free memory, so it's odd that you're getting further along.  My kernel is also not able to reclaim about 3MB.

I'll try setting the parameters you suggest and see if I get any further.
(In reply to Thomas Zimmermann from comment #8)
> From these results, I'd say this is not a problem within Linux, but CTR
> using too much memory.

I should have added here that this is indeed the case; the CTR app's memory use is egregious.  We're running code that was designed to run in IE on "desktop".  But it's a good test case for our low-memory handling: we don't control the code of third-party apps like this, so we'll need to be able to deal with memory hogs.
With the kernel params tweaked, I also see

> Inactive(file):     6620 kB

but the results are the same.  My logging isn't very precise, so maybe this memory is actually being reclaimed and the timing is just bad.
The JS engine might mlock stacks. Grep the source for mlock.
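Something along these lines over the gecko tree should confirm or rule that out (js/src is the obvious place to start, but the call could be anywhere):

  grep -rn 'mlock' js/src/
  grep -rln 'mlock(' .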
Summary: Cut-the-Rope kills b2g on otoro (with SIGKILL) → Cut-the-Rope subprocess is OOM'ing
The PSS for the CTR process gets up to 90MB+ before it dies its OOM death.  This is in the neighborhood of the available memory we were targeting for app processes for v1, so this isn't necessarily a bug.

However, what's somewhat disturbing is that it appears the memory usage of the b2g process has gone up significantly since the last time I measured, possibly up to 8MB USS.  That probably explains why I can't play the first level anymore.

My kingdom for about:memory --> logcat! :)
> My kingdom for about:memory --> logcat! :)

Actively working on it, bug 788021.
Hum, after a reboot, the b2g process was using *10 MB* less USS, and I got past the first level on CTR.  Trying to load the second level, its PSS usage goes past 95MB and then it's OOM killed.
Thanks everyone. I'm currently investigating these unevictable pages. Searching through the sources hasn't been successful, so I'm working on a kernel module to output the relevant bits of the kernel's VMAs.
> Thanks everyone. I'm currently investigating these unevictable pages.

Sorry...what exactly are you doing?
(In reply to Justin Lebar [:jlebar] from comment #17)
> > Thanks everyone. I'm currently investigating these unevictable pages.
> 
> Sorry...what exactly are you doing?

Typically, almost all cached memory pages can be evicted from the pagecache. The output of /proc/meminfo in comments 8 and 9 shows 9 MiB of memory that cannot be evicted by Linux's memory reclaim. I'm trying to find the owner of this memory, and afterwards I'll try to make the respective pages evictable.
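Before the kernel module is ready, a quicker cross-check is the per-process counters, which should at least tell us which process owns the locked pages (the pid below is a placeholder; VmLck has long been part of /proc/<pid>/status, and newer kernels also expose a per-mapping Locked: field in smaps):

  adb shell cat /proc/<b2g-pid>/status | grep VmLck
  adb shell cat /proc/<b2g-pid>/smaps | grep -B 15 'Locked: *[1-9]'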
I found that the unevictable pages are graphics buffers within '/system/b2g/b2g' (pid 105), which get allocated when CTR uses canvas elements. Shown below is the output of the kernel module I'm using to inspect the kernel's VMAs. The format roughly resembles the content of /proc/<pid>/maps.

> <6>[09-11 22:21:57.067] [879: insmod]pid nr is 105
> <6>[09-11 22:21:57.067] [879: insmod]47900000-47a50000 rw-sl 00000000 00:04 3358
> <6>[09-11 22:21:57.067] [879: insmod] /dev/ashmem/gralloc-buffer-4a89dd40 (deleted)
> <6>[09-11 22:21:57.067] [879: insmod]
> <6>[09-11 22:21:57.067] [879: insmod]4d8fb000-4db3b000 rw-sl 00000000 00:04 3743
> <6>[09-11 22:21:57.067] [879: insmod] /dev/ashmem/gralloc-buffer-4a93d640 (deleted)
> <6>[09-11 22:21:57.067] [879: insmod]
> <6>[09-11 22:21:57.067] [879: insmod]4db3b000-4dd7b000 rw-sl 00000000 00:04 3781
> <6>[09-11 22:21:57.067] [879: insmod] /dev/ashmem/gralloc-buffer-4aa368c0 (deleted)
> <6>[09-11 22:21:57.067] [879: insmod]

We obviously cannot throw away the graphics buffers, but TowerJelly does not have unevictable pages.

 - Does this only depend on the application, or is there anything to optimize at the lower levels?
 - Or does this depend on the use of pmem vs ashmem?

It would be helpful to me if someone with knowledge in the graphics system could comment on these questions. Thanks!
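In case it helps whoever looks at this: the same buffers should also be visible without the kernel module, by grepping the process' maps (pid 105 from above). The ashmem-backed ones carry the /dev/ashmem/gralloc-buffer-* name, and the pmem-backed ones map /dev/pmem:

  adb shell cat /proc/105/maps | grep -E 'gralloc|pmem'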
Ah!  Nice find.

In the course of normal operation, we use pmem-backed "gralloc" memory for gfx.  But it's only allocated from a fixed-size pool, so it may run out even when there's RAM still available.

This game is running at something like 16x the screen resolution of the otoro, and lord knows what crazy things it might be doing.  I always see it exhaust gralloc memory when it runs.

Usually exhausting gralloc just results in allocation failures, but the vendor allocator library here falls back on ashmem when pmem is gone, transparently to the caller.

The semantics of ashmem are a little odd --- as I recall, a segment can be in one of two states, pinned or not pinned.  When it's not pinned, the kernel is free to throw away the backing memory whenever it wants.  When it's pinned, the kernel can't do that.  IIRC ashmem segments start out pinned.

So I would bet that ashmem pinning is implemented by the same mechanism as mlock(), and the mlock()d memory here is for gfx buffers.

The problem is, we can't unpin this memory without causing crashes or graphical corruption.  So I guess we're doing about as well as we can do here :/.  Need to look for wins elsewhere.

(I have an idea that might let us not run out of pmem so soon.)
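For anyone who wants to watch the fallback happen, the memalloc lines quoted earlier in this bug are easy to filter out of logcat:

  adb logcat | grep -E 'Out of PMEM|Falling back to ashmem'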
Ah, ok. Thanks for the info.
We could experiment with linking /dev/pmem and /dev/pmem_adsp together, but then we'd need an in-gecko mechanism to evict gralloc buffers from pmem when it comes time to turn on the camera or do OMX video decoding.
Good to know, thanks.

We have another 5MB win in hand, and I'm starting to think we may be able to get by with 8MB of gralloc, which would get us another 5MB.  We're already at the initial target for available app memory, so at this point it's probably not worth the risk of trying that.
Assignee: tzimmermann → nobody
Status: ASSIGNED → NEW
Whiteboard: [3rd-party-preloaded-apps]
Memory usage has improved enough to run the unmodified game.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [3rd-party-preloaded-apps]