Closed Bug 457189 Opened 16 years ago Closed 15 years ago

infinite loop in chunk_alloc_mmap() on Solaris 10

Categories

(Core :: Memory Allocator, defect)

Sun
Solaris
defect
Not set
blocker

Tracking

()

RESOLVED FIXED

People

(Reporter: ginnchen+exoracle, Assigned: wes)

Details

(Keywords: fixed1.9.1)

Attachments

(1 file)

It's not reproducible on Solaris Nevada x86 and SPARC.
I didn't try it on S10 x86 yet.

The loop is,
2362         while (ret == NULL) {
2363             /*
2364              * Over-allocate in order to map a memory region that
2365              * is definitely large enough.
2366              */
2367             ret = pages_map(NULL, size + chunksize, -1);
2368             if (ret == NULL)
2369                 goto RETURN;
2370             /*
2371              * Deallocate, then allocate the correct size, within
2372              * the over-sized mapping.
2373              */
2374             offset = CHUNK_ADDR2OFFSET(ret);
2375             pages_unmap(ret, size + chunksize);
2376             if (offset == 0)
2377                 ret = pages_map(ret, size, pfd);
2378             else {
2379                 ret = pages_map((void *)((uintptr_t)ret +
2380                     chunksize - offset), size, pfd);
2381             }
2382             /*
2383              * Failure here indicates a race with another thread, so
2384              * try again.
2385              */
2386         }

at line 2367, size = 0x100000, chunksize = 0x100000, ret = 0xfa580000
at line 2375, page was unmapped successfully
at line 2380, offset = 0x80000, ret + (chunksize - offset) = 0xfa600000
but mmap result is 0xfa680000, so pages_map returns nil.

it loops forever, but there is no other threads.

Stack is:
=>[2] pages_map(addr = 0xfa600000, size = 1048576U, pfd = 3), line 2172 in "jemalloc.c"
  [3] chunk_alloc_mmap(size = 1048576U, pagefile = 1U), line 2380 in "jemalloc.c"
  [4] chunk_recycle_reserve(size = 1048576U, zero = 1U), line 2524 in "jemalloc.c"
  [5] chunk_alloc(size = 1048576U, zero = 1U, pagefile = 1U), line 2573 in "jemalloc.c"
  [6] arena_run_alloc(arena = 0xfaa80040, bin = 0xfaa801c8, size = 8192U, large = 0, zero = 0), line 3330 in "jemalloc.c"
  [7] arena_bin_nonfull_run_get(arena = 0xfaa80040, bin = 0xfaa801c8), line 3630 in "jemalloc.c"
  [8] arena_bin_malloc_hard(arena = 0xfaa80040, bin = 0xfaa801c8), line 3694 in "jemalloc.c"
  [9] arena_malloc_small(arena = 0xfaa80040, size = 16U, zero = 1U), line 3885 in "jemalloc.c"
  [10] arena_malloc(arena = 0xfaa80040, size = 12U, zero = 1U), line 3959 in "jemalloc.c"
  [11] icalloc(size = 12U), line 3981 in "jemalloc.c"
  [12] calloc(num = 1U, size = 12U), line 6116 in "jemalloc.c"

It seems mmap didn't take the suggested addr since we don't have MAP_FIXED.

But jemalloc in Firefox 3.0 works fine on S10.
just tested on Solaris 10 x86, also hangs.
This is related to bug 470217, which affects Linux with grsecurity PAX kernels.

This loop appears to try to talk mmap into allocating memory on a chunksize boundary.

It does this by first over-allocating, then calculating a new pointer address, unmapping the memory, and then asking for an allocation at the new address of the correct size -- however, the MAP_FIXED parameter is not passed to mmap in pages_map.

The comments in pages_map() (non-Darwin, non-Windows version) says 
        /*
         * We don't use MAP_FIXED here, because it can cause the *replacement*
         * of existing mappings, and we only want to create new mappings.
         */

...and that is probably true most of the time; however, based on the math in this loop, it should not be possible to accidentally replace a mapping int he pages_map() call (line 2379), except in the case where another thread calls mmap while this loop is executing.

I see a couple of solutions here:
 1. Extend the pages_map() interface to indicate when it is safe to use MAP_FIXED
 2. Use MAP_ALIGN when available and just ask for memory which is already aligned to "chunksize". 

They both have drawbacks.

The first one is still not an absolute fix, as the mmap spec doesn't guarantee that MAP_FIXED will return the passed address, even if it is available.  It also has the drawback of requiring thread-synchronization overhead.

The second one is a good fix, but is not available everywhere. MAP_FIXED did not appear in Solaris until version 9, and my local Linux boxen don't have it. It does have the advantage of eliminating two system calls from each chunk allocation.

Additionally, using MAP_ALIGN has the drawback that chunksize must be a power-of-two multiple of the system page size. Whether or not this is true depends on CHUNK_2POW_DEFAULT.  Currently, that happens to be true  (chunksize == 128 * 8192).
Attaching a patch which implements the second solution described in comment 2.

It is guarded by a MOZ_MEMORY_SOLARIS define at the moment, although its possible that other platforms could see benefits from this patch due to reduced number of system calls during memory allocation.  There are also assertions to hold chunksize to the conditions for MAP_ALIGN described in the Solaris 10 mmap man page.

I have tested this patch with the js shell on Solaris 10, sun4u. Both the js shell and the jemalloc library came from tracemonkey-73bfcdaa1a51. I built jemalloc as a DSO and spidermonkey with threads.
Attachment #359443 - Flags: review?(jasone)
Attachment #359443 - Flags: review?(jasone) → review+
Assignee: nobody → wes
Status: NEW → ASSIGNED
Keywords: checkin-needed
Cool!

It works for me on S10 SPARC.
http://hg.mozilla.org/mozilla-central/rev/afd4626d77c9

Sorry, I forgot to switch the user name.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Attachment #359443 - Flags: approval1.9.1?
Comment on attachment 359443 [details] [diff] [review]
Patch to use MAP_ALIGN to request chunksize-aligned blocks from mmap

a191=beltzner
Attachment #359443 - Flags: approval1.9.1? → approval1.9.1+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: