Crash [@ pages_decommit] (MOZ_CRASH) with OOM due to large ArrayBuffer
Categories
(Core :: Memory Allocator, defect, P2)
People
(Reporter: decoder, Assigned: gcp)
Details
(4 keywords, Whiteboard: [jsbugmon:update])
Attachments
(2 files)
The following testcase crashes on mozilla-central revision 4a692c812a3f (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off):
function testValue() {
  var ab = new ArrayBuffer(0x100000);
  var dbg = new testValue;
  var target = {};
}
testValue();
Backtrace:
received signal SIGSEGV, Segmentation fault.
#0 pages_decommit (aSize=1048576, aAddr=0x7feff9900000) at memory/build/mozjemalloc.cpp:1285
#1 0x000055555587827e in pages_decommit (aSize=1048576, aAddr=0x7feff9900000) at memory/build/mozjemalloc.cpp:3580
#2 arena_t::PallocHuge (this=<optimized out>, aSize=<optimized out>, aAlignment=<optimized out>, aZero=<optimized out>) at memory/build/mozjemalloc.cpp:3573
#3 0x000055555587a81b in BaseAllocator::calloc (aSize=1, aNum=1048576, this=<synthetic pointer>) at memory/build/mozjemalloc.cpp:3931
#4 Allocator<MozJemallocBase>::moz_arena_calloc (arg2=1, arg1=1048576, aArenaId=aArenaId@entry=10204206196954226593) at memory/build/malloc_decls.h:39
#5 moz_arena_calloc (arg1=arg1@entry=10204206196954226593, arg2=arg2@entry=1048576, arg3=arg3@entry=1) at memory/build/malloc_decls.h:117
#6 0x00005555559bf114 in js_arena_calloc (size=1, nmemb=1048576, arena=10204206196954226593) at dist/include/js/Utility.h:368
#7 js_pod_arena_calloc<unsigned char> (numElems=1048576, arena=10204206196954226593) at dist/include/js/Utility.h:573
#8 js::MallocProvider<JSContext>::maybe_pod_calloc<unsigned char> (arena=10204206196954226593, numElems=1048576, this=0x7ffff5f17000) at js/src/vm/MallocProvider.h:62
#9 JSContext::pod_callocCanGC<unsigned char> (arena=10204206196954226593, numElems=1048576, this=0x7ffff5f17000) at js/src/vm/JSContext.h:223
#10 AllocateArrayBufferContents (nbytes=1048576, cx=0x7ffff5f17000) at js/src/vm/ArrayBufferObject.cpp:450
#11 js::ArrayBufferObject::createZeroed (cx=<optimized out>, cx@entry=0x7ffff5f17000, nbytes=1048576, proto=proto@entry=...) at js/src/vm/ArrayBufferObject.cpp:1244
#12 0x00005555559bf47c in js::ArrayBufferObject::class_constructor (cx=0x7ffff5f17000, argc=<optimized out>, vp=<optimized out>) at js/src/vm/ArrayBufferObject.cpp:441
#13 0x00001527e8fdbdd6 in ?? ()
#14 0x0000000000000000 in ?? ()
rax 0x555557c20240 93825032913472
rbx 0x100000 1048576
rcx 0x7ffff6c1c2dd 140737333281501
rdx 0x555556aa9fd6 93825014603734
rsi 0x7ffff6eeb770 140737336227696
rdi 0x7ffff6eea540 140737336223040
rbp 0x7fffffeaeff0 140737486974960
rsp 0x7fffffeaeff0 140737486974960
r8 0x7ffff6eeb770 140737336227696
r9 0x7ffff7fe6cc0 140737354034368
r10 0x58 88
r11 0x7ffff6b927a0 140737332717472
r12 0x200000 2097152
r13 0x7ff38ec93680 140684049331840
r14 0x7fffffeaf020 140737486975008
r15 0x7feff9800000 140668659826688
rip 0x55555581e3fc <pages_decommit(size_t, void*)+38>
=> 0x55555581e3fc <pages_decommit(size_t, void*)+38>: movl $0x0,0x0
0x55555581e407 <pages_decommit(size_t, void*)+49>: ud2
Comment 1•5 years ago
memory/build/mozjemalloc.cpp:1285 is a MOZ_CRASH on error returned by mmap... why is mmap failing?
Reporter | ||
Comment 2•5 years ago
(In reply to Mike Hommey [:glandium] from comment #1)
memory/build/mozjemalloc.cpp:1285 is a MOZ_CRASH on error returned by mmap... why is mmap failing?
How would I find that out?
Comment 3•5 years ago
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:
The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/becf3a27c2bb
user: Gian-Carlo Pascutto
date: Wed Mar 20 10:45:15 2019 +0000
summary: Bug 1529922 - Add guard pages following huge allocations. r=glandium
This iteration took 532.923 seconds to run.
Comment 4•5 years ago
|
Reporter | ||
Comment 5•5 years ago
(In reply to Mike Hommey [:glandium] from comment #4)
Start with strace, and see what the error code is?
The only thing I see in strace is
mmap(0x7f649bb00000, 2097152, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
Comment 6•5 years ago
That doesn't match the size in the pages_decommit call in your crash stack trace. A more complete strace, along with the stack trace you get from that specific run would be useful.
Or a complete STR (comment 0 says "run with --fuzzing-safe..." but... run what?)
Reporter | ||
Comment 7•5 years ago
(In reply to Mike Hommey [:glandium] from comment #6)
That doesn't match the size in the pages_decommit call in your crash stack trace. A more complete strace, along with the stack trace you get from that specific run would be useful.
Or a complete STR (comment 0 says "run with --fuzzing-safe..." but... run what?)
This bug was reassigned from the JS engine component. All the bugs we file there are reproduced on the JS shell. I will also attach the full strace log.
Reporter | ||
Comment 8•5 years ago
Full strace log
Comment 9•5 years ago
Considering how many calls to mmap and munmap there are in this log, I'm tempted to say you're exhausting /proc/sys/vm/max_map_count. It also seems you're allocating a lot of memory.
Seeing this log, it also looks like the JS engine allocating 1MB chunks makes bug 1529922 waste a lot of address space. In this particular instance, that makes system memory resources run out faster. You were probably close to exhaustion before.
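One way to check the max_map_count theory on a Linux machine is to compare the number of mappings a process holds against the kernel limit. The following is a minimal sketch, not part of the JS shell or mozjemalloc; the function names are hypothetical, and it assumes /proc is mounted and that each line of /proc/self/maps describes one mapping.

```c
#include <assert.h>
#include <stdio.h>

/* Reads the first integer from a file such as /proc/sys/vm/max_map_count.
 * Returns -1 if the file cannot be opened or does not start with a number. */
long read_long_from_file(const char *path) {
  assert(path != NULL);
  FILE *f = fopen(path, "r");
  if (!f) {
    return -1;
  }
  long value = -1;
  if (fscanf(f, "%ld", &value) != 1) {
    value = -1;
  }
  fclose(f);
  return value;
}

/* Counts newline characters in a file. For /proc/<pid>/maps this is the
 * number of mappings. Returns -1 if the file cannot be opened. */
long count_lines(const char *path) {
  assert(path != NULL);
  FILE *f = fopen(path, "r");
  if (!f) {
    return -1;
  }
  long lines = 0;
  int c;
  while ((c = fgetc(f)) != EOF) {
    if (c == '\n') {
      lines++;
    }
  }
  fclose(f);
  return lines;
}

/* Prints how many mappings the calling process holds and the kernel limit. */
void report_map_usage(void) {
  long limit = read_long_from_file("/proc/sys/vm/max_map_count");
  long used = count_lines("/proc/self/maps");
  printf("mappings: %ld of %ld\n", used, limit);
}
```

Calling report_map_usage() periodically from a test harness would show whether the mapping count approaches the limit before the ENOMEM appears.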
Comment 10•5 years ago
It also seems you're allocating a lot of memory.
A crude calculation says close to 70GB, which implies you were likely mapping more than 35GB before bug 1529922.
Assignee | ||
Comment 11•5 years ago
So the summary is that:
a) We have a clownshoes issue because the code is allocating a 1M ArrayBuffer, which 1) goes through to our allocator unmodified because ArrayBuffers are special and 2) hits the worst case there.
b) The code is allocating 35GB. Due to the clownshoes, this doubles to 70GB, which is over the 65G limit that Linux defaults to.
The clownshoes are unfortunate but I don't see a simple (or complicated) way around it. Unless there's evidence that this worsens performance in the field, I think this is a WONTFIX from my side. A workaround would be to make the buffer 4096 bytes smaller or allocate a bit less in a single run.
If you need to keep the testcase unmodified, I guess you'll want to check whether the kernel settings on CI machines can be modified.
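The doubling described above can be illustrated with a little rounding arithmetic. This is a sketch under stated assumptions, not code from mozjemalloc: it assumes huge allocations are rounded up to a 1 MiB chunk size and that one 4 KiB guard page is added to the request before rounding. The constants and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; not read from mozjemalloc. */
#define SKETCH_CHUNK_SIZE ((size_t)1 << 20) /* 1 MiB */
#define SKETCH_PAGE_SIZE ((size_t)4096)     /* 4 KiB */

/* Rounds size up to the next multiple of alignment (must be a power of two). */
size_t round_up(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}

/* Address space consumed by one huge allocation plus its guard page,
 * under the assumptions stated above. */
size_t mapped_size_with_guard(size_t request) {
  return round_up(request + SKETCH_PAGE_SIZE, SKETCH_CHUNK_SIZE);
}
```

Under these assumptions, a request of 0x100000 bytes maps 2 MiB, which matches the 2097152-byte mmap seen in the strace output, while a request 4096 bytes smaller still fits in a single 1 MiB chunk.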
Reporter | ||
Comment 12•5 years ago
The problem is not so much that we crash, but rather that there is no visible message/indicator that this is an unhandled out-of-memory crash. In the JS engine, we have the contract that all intended crashes due to out-of-memory emit something on stderr so we can filter it out easily.
Since this is a regular MOZ_CRASH in non-JS code, there are two problems:
1) In opt builds, the messages are omitted (something I would very much like to change, even if it is only for fuzzing builds, because it makes triaging and signatures a lot easier).
2) The message itself doesn't have a clear indicator/prefix that this is an OOM situation that can't be easily recovered.
Assignee | ||
Comment 13•5 years ago
So in the scope of the bug, i.e. issue (2), it's just a matter of changing the MOZ_CRASH() into MOZ_CRASH("OOM: No more mappings available").
By the way, comment 1 says that this happens on the decommit call. In theory we could just ignore the failure there. But then an attacker can disable guard pages by nearly exhausting memory. Probably also not what we want.
glandium, it seems that using the DECOMMIT logic for normal allocations would mean we're liable to hit this max_map_count much faster because we'll now be DECOMMIT-ing pages in the middle of a normal chunk. Is this an argument for going back to doing guard regions with mprotect instead?
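As a rough illustration of the proposed message change, the sketch below decommits a range with the same kind of mmap call that appears in the strace output and aborts with an "OOM:" prefixed message on failure. It is not the mozjemalloc code or the patch that landed: the helper names are hypothetical, and plain fputs/abort stand in for MOZ_CRASH.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper: replaces [addr, addr + size) with an inaccessible
 * anonymous mapping. Returns 0 on success, -1 if mmap fails. */
int sketch_try_decommit(void *addr, size_t size) {
  assert(addr != NULL && size > 0);
  void *result = mmap(addr, size, PROT_NONE,
                      MAP_FIXED | MAP_PRIVATE | MAP_ANON, -1, 0);
  return result == MAP_FAILED ? -1 : 0;
}

/* Aborts with a message whose "OOM:" prefix lets log filters classify the
 * crash as out-of-memory. The trailing newline keeps the log line terminated. */
void sketch_decommit_or_crash(void *addr, size_t size) {
  if (sketch_try_decommit(addr, size) != 0) {
    fputs("OOM: No more mappings available\n", stderr);
    abort();
  }
}
```

Splitting the fallible step from the crashing wrapper keeps the failure path testable without actually exhausting max_map_count.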
Comment 14•5 years ago
Is this an argument for going back to doing guard regions with mprotect instead?
That makes no practical difference:
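#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(int argc, char *argv[]) {
  /* Reserve 2GB of address space as a single mapping. */
  size_t size = 2UL * 1024 * 1024 * 1024;
  char *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
  if (ptr == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  /* Make every other 4KB page PROT_NONE; each one splits an existing mapping.
     With an argument, use mmap(MAP_FIXED); without, use mprotect. */
  size_t i;
  for (i = 0; i < size / 8192; i++) {
    int fail;
    if (argc > 1) {
      fail = (mmap(ptr + i * 8192, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0) == MAP_FAILED);
    } else {
      fail = mprotect(ptr + i * 8192, 4096, PROT_NONE);
    }
    if (fail) {
      /* Print how many guard pages were created before the kernel refused. */
      printf("%zu\n", i);
      break;
    }
  }
  return 0;
}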
$ ./test
32754
$ ./test foo
32755
Comment 15•5 years ago
Hey, is anyone working on this and what is the priority?
Comment 16•5 years ago
gcp, are you going to take care of fixing part 2 as you noted in comment 13?
Assignee | ||
Comment 17•5 years ago
Comment 18•5 years ago
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d65ec56af997
Explicitly say not being able to mmap is likely OOM. r=decoder,glandium
Comment 19•5 years ago
bugherder
Assignee | ||
Comment 20•4 years ago
While copy-pasting this code to debug another issue, I found out that the log message is missing a trailing newline.