Closed Bug 1539133 Opened 2 years ago Closed 1 year ago

Crash [@ pages_decommit] (MOZ_CRASH) with OOM due to large ArrayBuffer

Categories

(Core :: Memory Allocator, defect, P2)

x86_64
Linux
defect

Tracking


RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- wontfix
firefox68 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- fixed

People

(Reporter: decoder, Assigned: gcp)

Details

(4 keywords, Whiteboard: [jsbugmon:update])

Crash Data

Attachments

(2 files)

The following testcase crashes on mozilla-central revision 4a692c812a3f (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off):

function testValue() {
  var ab = new ArrayBuffer(0x100000);
  var dbg = new testValue;
  var target = {};
}
testValue();

Backtrace:

received signal SIGSEGV, Segmentation fault.
#0  pages_decommit (aSize=1048576, aAddr=0x7feff9900000) at memory/build/mozjemalloc.cpp:1285
#1  0x000055555587827e in pages_decommit (aSize=1048576, aAddr=0x7feff9900000) at memory/build/mozjemalloc.cpp:3580
#2  arena_t::PallocHuge (this=<optimized out>, aSize=<optimized out>, aAlignment=<optimized out>, aZero=<optimized out>) at memory/build/mozjemalloc.cpp:3573
#3  0x000055555587a81b in BaseAllocator::calloc (aSize=1, aNum=1048576, this=<synthetic pointer>) at memory/build/mozjemalloc.cpp:3931
#4  Allocator<MozJemallocBase>::moz_arena_calloc (arg2=1, arg1=1048576, aArenaId=aArenaId@entry=10204206196954226593) at memory/build/malloc_decls.h:39
#5  moz_arena_calloc (arg1=arg1@entry=10204206196954226593, arg2=arg2@entry=1048576, arg3=arg3@entry=1) at memory/build/malloc_decls.h:117
#6  0x00005555559bf114 in js_arena_calloc (size=1, nmemb=1048576, arena=10204206196954226593) at dist/include/js/Utility.h:368
#7  js_pod_arena_calloc<unsigned char> (numElems=1048576, arena=10204206196954226593) at dist/include/js/Utility.h:573
#8  js::MallocProvider<JSContext>::maybe_pod_calloc<unsigned char> (arena=10204206196954226593, numElems=1048576, this=0x7ffff5f17000) at js/src/vm/MallocProvider.h:62
#9  JSContext::pod_callocCanGC<unsigned char> (arena=10204206196954226593, numElems=1048576, this=0x7ffff5f17000) at js/src/vm/JSContext.h:223
#10 AllocateArrayBufferContents (nbytes=1048576, cx=0x7ffff5f17000) at js/src/vm/ArrayBufferObject.cpp:450
#11 js::ArrayBufferObject::createZeroed (cx=<optimized out>, cx@entry=0x7ffff5f17000, nbytes=1048576, proto=proto@entry=...) at js/src/vm/ArrayBufferObject.cpp:1244
#12 0x00005555559bf47c in js::ArrayBufferObject::class_constructor (cx=0x7ffff5f17000, argc=<optimized out>, vp=<optimized out>) at js/src/vm/ArrayBufferObject.cpp:441
#13 0x00001527e8fdbdd6 in ?? ()
#14 0x0000000000000000 in ?? ()
rax	0x555557c20240	93825032913472
rbx	0x100000	1048576
rcx	0x7ffff6c1c2dd	140737333281501
rdx	0x555556aa9fd6	93825014603734
rsi	0x7ffff6eeb770	140737336227696
rdi	0x7ffff6eea540	140737336223040
rbp	0x7fffffeaeff0	140737486974960
rsp	0x7fffffeaeff0	140737486974960
r8	0x7ffff6eeb770	140737336227696
r9	0x7ffff7fe6cc0	140737354034368
r10	0x58	88
r11	0x7ffff6b927a0	140737332717472
r12	0x200000	2097152
r13	0x7ff38ec93680	140684049331840
r14	0x7fffffeaf020	140737486975008
r15	0x7feff9800000	140668659826688
rip	0x55555581e3fc <pages_decommit(size_t, void*)+38>
=> 0x55555581e3fc <pages_decommit(size_t, void*)+38>:	movl   $0x0,0x0
   0x55555581e407 <pages_decommit(size_t, void*)+49>:	ud2
Component: JavaScript Engine → Memory Allocator

memory/build/mozjemalloc.cpp:1285 is a MOZ_CRASH on error returned by mmap... why is mmap failing?

Flags: needinfo?(choller)

(In reply to Mike Hommey [:glandium] from comment #1)

memory/build/mozjemalloc.cpp:1285 is a MOZ_CRASH on error returned by mmap... why is mmap failing?

How would I find that out?

Flags: needinfo?(choller) → needinfo?(mh+mozilla)
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/becf3a27c2bb
user:        Gian-Carlo Pascutto
date:        Wed Mar 20 10:45:15 2019 +0000
summary:     Bug 1529922 - Add guard pages following huge allocations. r=glandium

This iteration took 532.923 seconds to run.
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]

Start with strace, and see what the error code is?

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #4)

Start with strace, and see what the error code is?

The only thing I see in strace is

mmap(0x7f649bb00000, 2097152, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
Flags: needinfo?(mh+mozilla)

That doesn't match the size in the pages_decommit call in your crash stack trace. A more complete strace, along with the stack trace you get from that specific run, would be useful.

Or a complete STR (comment 0 says "run with --fuzzing-safe..." but... run what?)

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #6)

That doesn't match the size in the pages_decommit call in your crash stack trace. A more complete strace, along with the stack trace you get from that specific run, would be useful.

Or a complete STR (comment 0 says "run with --fuzzing-safe..." but... run what?)

This bug was reassigned from the JS engine component. All the bugs we file there are reproduced in the JS shell. I will also attach the full strace log.

Flags: needinfo?(mh+mozilla)
Attached file log.gz

Full strace log

Considering how many calls to mmap and munmap there are in this log, I'm tempted to say you're exhausting /proc/sys/vm/max_map_count. It also seems you're allocating a lot of memory.

From this log, it also looks like the JS engine allocating 1MB chunks makes bug 1529922 a large waste of address space. In this particular instance, it makes the exhaustion of memory-related system resources happen faster. You were probably already close to exhaustion before.

Flags: needinfo?(mh+mozilla)

It also seems you're allocating a lot of memory.

A crude calculation says close to 70GB, which implies you were likely mapping more than 35GB before bug 1529922.

Flags: needinfo?(gpascutto)

So the summary is that:

a) We have a clownshoes issue because the code is allocating a 1MB ArrayBuffer, which happens to 1) go through to our allocator unmodified, because ArrayBuffers are special, and 2) hit the worst case there.

b) The code is allocating 35GB. Due to the clownshoes, this doubles to 70GB, i.e. roughly 70,000 1MB mappings, which is over the ~65K mapping limit (/proc/sys/vm/max_map_count) that Linux defaults to.

The clownshoes are unfortunate, but I don't see a simple (or complicated) way around it. Unless there's evidence that this worsens performance in the field, I think this is a WONTFIX from my side. A workaround would be to make the buffer 4096 bytes smaller, or to allocate a bit less in a single run.

If you need to keep the testcase unmodified, I guess you'll want to look into whether the kernel settings on the CI machines can be changed.

Flags: needinfo?(gpascutto)

The problem is not so much that we crash, but rather that there is no visible message/indicator that this is an unhandled out-of-memory crash. In the JS engine, we have the contract that all intended crashes due to out-of-memory emit something on stderr so we can filter it out easily.

Since this is a regular MOZ_CRASH in non-JS code, there are two problems:

  1. In opt builds, the messages are omitted (something I would very much like to change, even if it is only for fuzzing builds, because it makes triaging and signatures a lot easier).

  2. The message itself doesn't have a clear indicator/prefix that this is an OOM situation that can't be easily recovered.

So in the scope of the bug, i.e. issue (2), it's just a matter of changing the MOZ_CRASH() into MOZ_CRASH("OOM: No more mappings available").

By the way, comment 1 says that this happens on the decommit call. In theory we could just ignore the failure there, but then an attacker could disable guard pages by nearly exhausting memory. Probably also not what we want.

glandium, it seems that using the DECOMMIT logic for normal allocations would mean we're liable to hit this max_map_count much faster, because we'll now be DECOMMIT-ing pages in the middle of a normal chunk. Is this an argument for going back to doing guard regions with mprotect instead?

Flags: needinfo?(mh+mozilla)

Is this an argument for going back to doing guard regions with mprotect instead?

That makes no practical difference:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(int argc, char *argv[]) {
  /* Reserve 2GB of address space, then make every other page
     inaccessible, splitting the region into more and more mappings
     until the vm.max_map_count limit is hit. With an argument, the
     holes are punched via mmap(MAP_FIXED) (the decommit approach);
     without one, via mprotect (the guard-region approach). */
  char *ptr = mmap(NULL, 2UL * 1024 * 1024 * 1024, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  int i;
  for (i = 0; ; i++) {
    int fail;
    if (argc > 1) {
      fail = (mmap(ptr + i * 8192, 4096, PROT_NONE,
                   MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0) == MAP_FAILED);
    } else {
      fail = (mprotect(ptr + i * 8192, 4096, PROT_NONE) != 0);
    }
    if (fail) {
      printf("%d\n", i); /* number of splits before hitting the limit */
      break;
    }
  }
  return 0;
}
$ ./test
32754
$ ./test foo
32755
Flags: needinfo?(mh+mozilla)

Hey, is anyone working on this and what is the priority?

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mh+mozilla) → needinfo?(choller)

gcp, are you going to take care of fixing part 2 as you noted in comment 13?

Flags: needinfo?(choller) → needinfo?(gpascutto)
Priority: -- → P2
Assignee: nobody → gpascutto
Flags: needinfo?(gpascutto)
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d65ec56af997
Explicitly say not being able to mmap is likely OOM. r=decoder,glandium
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

While copy-pasting this code to debug another issue, I found out that the log message is missing a trailing newline.
