Closed Bug 1008613 Opened 10 years ago Closed 10 years ago

Assertion failure: (0 == munmap(end, regionEnd - uintptr_t(end))), at gc/Memory.cpp

Categories

(Core :: JavaScript: GC, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla32
Tracking Status
firefox29 --- unaffected
firefox30 --- unaffected
firefox31 --- wontfix
firefox32 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- wontfix

People

(Reporter: gkw, Unassigned)

References

Details

(Keywords: assertion, regression, testcase)

Attachments

(2 files)

Attached file stack
The upcoming testcase asserts js debug shell on m-c changeset 739a1d2861aa with --no-baseline at Assertion failure: (0 == munmap(end, regionEnd - uintptr_t(end))), at gc/Memory.cpp

Sometimes this needs ASLR disabled, sometimes this doesn't, to make this more reliable. See http://stackoverflow.com/a/5194709 for how to do so.

Setting s-s and sec-critical because this has been a pain to reduce, is intermittent, and is a gc-related assertion. (Also cc'ing our gc gurus)

My configure flags are:

AR=ar sh /home/fuzz2lin/trees/mozilla-central/js/src/configure --enable-optimize --enable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --disable-tests --enable-more-deterministic --with-ccache --enable-threadsafe <other NSPR options>

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/5f964391ed13
user:        Nicholas Nethercote
date:        Thu Apr 03 23:31:20 2014 -0700
summary:     Bug 991998 (part 2) - Limit JSFatInlineString's length to 11 on both 32-bit and 64-bit platforms. r=luke.

Nick, is bug 991998 a likely regressor?
Flags: needinfo?
(so I need to set :njn in the cc list instead of needinfo?-ing directly as he's not part of the sec group or something... which means I'll email him the testcase)

Also, thanks for :sunfish for suggesting disabling ASLR to make the issue more reliable, this saved me lots of time. :)
Flags: needinfo?
Flags: needinfo?(n.nethercote)
I'm hitting this too sometimes, it looked like an OOM issue to me.
> Nick, is bug 991998 a likely regressor?

It seems unlikely. That changed how big JSFatInlineString is on 64-bit, which does involve memory, but it was an almost-trivial change that just reduced the number of chars in a JSFatInlineString to match the number that we get on 32-bit.

This assertion here is in mapAlignedPages(), which is responsible for allocating large chunks of memory for the JS heap. It needs a particular alignment (most commonly a 1 MiB chunk with a 1 MiB alignment) in which case it over-allocates (asks for 3 MiB) and then unmaps the front and rear to give a chunk of the required size and alignment. It's the "unmap the rear" step that's failing here, and for munmap() to fail usually requires that one of its arguments is bogus. I can't see an obvious way that could happen.

All this appears to be entirely unrelated to bug 991998. It also doesn't look like an OOM issue to me -- if that were the case, I'd expect the call to MapMemory() earlier in the function to fail.

I tried and failed to reproduce, even after disabling ASLR. I get this output on every invocation:

> undefined
> js_ReportOverRecursed called
> a.js:1093:31 InternalError: too much recursion

Gary, how many times do you have to run it to reproduce? I tried a few hundred times. If you can printf the values of all the local variables -- size, alignment, reqSize, region, regionEnd, offset, front, end -- in a failing run I'd be very interested to see the output.
Flags: needinfo?(n.nethercote)
Attached patch Diagnostic patchSplinter Review
Gary, can you try this patch and report its output on a failure? Thanks!
Attachment #8420732 - Flags: feedback?(gary)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment on attachment 8420732 [details] [diff] [review]
Diagnostic patch

Without patch:

$ ~/Desktop/js-dbg-64-dm-ts-linux-mozilla-central-20ca234fd62b-182415-nW8KTo/js-dbg-64-dm-ts-linux-20ca234fd62b --no-baseline 1008613.js
undefined
Assertion failure: (0 == munmap(end, regionEnd - uintptr_t(end))), at /home/fuzz2lin/trees/mozilla-central/js/src/gc/Memory.cpp:277
Segmentation fault (core dumped)

With patch:

$ /home/fuzz2lin/Desktop/js-dbg-64-dm-ts-linux-mozilla-central-20ca234fd62b-182415-1008613-c5-de61117c4df8-jm8b_M/js-dbg-64-dm-ts-linux-20ca234fd62b-1008613-c5-de61117c4df8 --no-baseline 1008613.js
undefined
size = 1048576
alignment = 1048576
reqSize = 2097152
region = 0x2e09fee000
regionEnd = 0x2e0a1ee000
offset = 0xee000
front = 0x2e0a000000
end = 0x2e0a100000
regionEnd - uintptr_t(end) = 0xee000
ret = -1
Assertion failure: 0, at /home/fuzz2lin/trees/mozilla-central/js/src/gc/Memory.cpp:290
Segmentation fault (core dumped)
Attachment #8420732 - Flags: feedback?(gary) → feedback+
Flags: needinfo?(n.nethercote)
> $
> /home/fuzz2lin/Desktop/js-dbg-64-dm-ts-linux-mozilla-central-20ca234fd62b-
> 182415-1008613-c5-de61117c4df8-jm8b_M/js-dbg-64-dm-ts-linux-20ca234fd62b-
> 1008613-c5-de61117c4df8 --no-baseline 1008613.js
> undefined
> size = 1048576
> alignment = 1048576
> reqSize = 2097152
> region = 0x2e09fee000
> regionEnd = 0x2e0a1ee000
> offset = 0xee000
> front = 0x2e0a000000
> end = 0x2e0a100000
> regionEnd - uintptr_t(end) = 0xee000
> ret = -1
> Assertion failure: 0, at
> /home/fuzz2lin/trees/mozilla-central/js/src/gc/Memory.cpp:290
> Segmentation fault (core dumped)

I don't see anything wrong with this. We have 0x12000 bytes at the front that is successfully unmapped, then 0x100000 bytes in the middle, then 0xee000 bytes at the end which looks like it should be unmappable but the munmap() call fails.

Gary has arranged ssh access for me on a machine that reproduces this. It's later here now; I'll investigate some more tomorrow.
Flags: needinfo?(n.nethercote)
I managed to reproduce locally -- I had been omitting the --no-baseline flag.

Christian's right: it's an OOM. The kernel is returning ENOMEM for the munmap call. Why it's doing this for the munmap call rather than the mmap call, I don't know.

The test calls |new SharedArrayBuffer(4096)| thousands of times. On Linux64, each call to SharedArrayRawBuffer::New() mmaps 4 GiB of address space. Do that a few thousand times and the address space fills up -- e.g. RSS is 190 MiB and a VSIZE of 127.8 GiB. On my machine, the failing munmap call is for a region of memory that's quite close to the stack, though not quite there.

I'm now certain that bug 991998 is unrelated to this.

I also don't think this is actually an s-s bug. When the munmap fails we'll end up with an extra bit of memory mapped that we don't want, and that's not a security problem, AFAICT.

As for how to fix it... mmapping 4 GiB for a |new SharedArrayBuffer(4096)| is silly. Except SharedArrayBuffer is designed for asm.js, where typically you would only have one of them. (sstangl, any thoughts about this?)

One possibility is this: if the munmap fails, instead of asserting, check the error value. If it's ENOMEM, just let mapAlignedPages fail by returning false. If it's something other than ENOMEM, then assert.
No longer blocks: 991998
Flags: needinfo?(sstangl)
I admit I don't know why --no-baseline affects this test. It seems like it shouldn't.

Relatedly, the following program terminates without any kind of error message after printing up to 32722:

  var a = [];
  var i = 0;
  while (true) {
      a.push(new SharedArrayBuffer(4096));
      print(++i);
  }
Group: core-security, javascript-core-security
Ah, I completely forgot: Part 2b in bug 1005849 implemented the suggestion in comment #8. Nicholas, can you confirm that this no longer reproduces?
Flags: needinfo?(n.nethercote)
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #10)
> Ah, I completely forgot: Part 2b in bug 1005849 implemented the suggestion
> in comment #8. Nicholas, can you confirm that this no longer reproduces?

Correct -- it no longer reproduces. I guess we could mark this as WORKSFORME though I'd still like feedback from sstangl...
Flags: needinfo?(n.nethercote)
As it seems to be fixed, no longer tracking it. Please resubmit to tracking if it is not actually fixed.
Assignee: n.nethercote → nobody
SharedArrayBuffer is being redesigned and will likely not require the 4GB mapping unless bound to AsmJS. Lars is doing the redesign.
Flags: needinfo?(sstangl)
The case here was fixed as part of bug 1005849, so I think we can resolve this. jemalloc might run into the same thing, but that shouldn't affect the GC allocations unless we switch to using jemalloc directly at some point.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Depends on: 1005849
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: