Closed
Bug 571209
Opened 14 years ago
Closed 13 years ago
Make chunksize in jemalloc a compile-time constant under MOZ_MEMORY
Categories
(Core :: Memory Allocator, defect)
Core
Memory Allocator
Tracking
()
RESOLVED
FIXED
mozilla11
People
(Reporter: igor, Assigned: igor)
References
Details
Attachments
(1 file, 4 obsolete files)
9.71 KB,
patch
|
justin.lebar+bug
:
review+
|
Details | Diff | Splinter Review |
jemalloc during initialization allows to change the default chunk size using a configuration file or an environment variable MALLOC_OPTIONS. This may trigger rather inefficient allocation of chunks for JS GC. With the bug 560818 fixed the code assumes that the size of GC chunk (a compile-time constant) matches chunksize so posix_memalign comming from jemalloc would do efficient chunk allocation. Thus I suggest to make chunksize a compile-time constant when building under MOZ_MEMORY so JS GC may assume the optimal chunk size. This may also speedup things a bit as, for example, the free call in jemalloc starts with CHUNK_ADDR2BASE(ptr) or ((void *)((uintptr_t)(ptr) & ~chunksize_mask)). Replacing chunksize_mask with compile-time constant would avoid an extra memory load per each free call.
Comment 1•14 years ago
|
||
I'd rather leave it runtime configurable so that we can play with different settings and find optimal ones more easily.
The point here is that there are other parts of the code that create relationships to the chunk size, and that you won't find optimal ones by simply changing one part of that chain. What sort of experiments do you find that you're currently doing with the runtime configurability? It might be that we can support those more directly, if changing the relations in the source and rebuilding is too much work.
Assignee | ||
Comment 3•13 years ago
|
||
The patch makes chunksize and other related parameters in jemalloc a compile-time constant. The effect of the patch is visible in the following program that allocated and frees arrays of various sizes: int main(int argc, char **argv, char **envp) { for (size_t loop = 0; loop != 10; ++loop) { const size_t N = 1024 * 1024; void **array = static_cast<void **>(malloc(N * sizeof(void *))); for (size_t i = 0; i != N; ++i) { array[i] = malloc(16 * (N % 128)); } for (size_t i = 0; i != N; ++i) { free(array[i]); } free(array); } return 0; } Without the patch the program compiled with GCC -O3 for 32 bit on a 3.0 GHz Linux box runs like (the stats for the fastest run among 10) : 1043.840247 task-clock # 1.000 CPUs utilized 1 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 15,720 page-faults # 0.015 M/sec 3,228,595,566 cycles # 3.093 GHz 952,114,163 stalled-cycles-frontend # 29.49% frontend cycles idle 157,464,458 stalled-cycles-backend # 4.88% backend cycles idle 5,756,110,782 instructions # 1.78 insns per cycle # 0.17 stalled cycles per insn 1,252,664,517 branches # 1200.054 M/sec 376,755 branch-misses # 0.03% of all branches 1.044286911 seconds time elapsed With the patch that changes into: 992.405996 task-clock # 1.000 CPUs utilized 1 context-switches # 0.000 M/sec 1 CPU-migrations # 0.000 M/sec 10,611 page-faults # 0.011 M/sec 3,069,489,210 cycles # 3.093 GHz 879,431,589 stalled-cycles-frontend # 28.65% frontend cycles idle 136,361,360 stalled-cycles-backend # 4.44% backend cycles idle 5,515,353,611 instructions # 1.80 insns per cycle # 0.16 stalled cycles per insn 1,185,909,581 branches # 1194.984 M/sec 372,580 branch-misses # 0.03% of all branches 0.992851027 seconds time elapsed Which gives a 5% improvement speed-wise. This shows that an option for a compile-time constant configuration is useful from performance point of view alone.
Comment 4•13 years ago
|
||
> Which gives a 5% improvement speed-wise. This shows that an option for a compile-time constant
> configuration is useful from performance point of view alone.
This seems like a good idea to me. We've done very little experimentation tuning jemalloc's parameters, and I think our time at this point would probably be better spent getting jemalloc 2 working, since we know that's a large speedup. But if we did want to play with the parameters, we still could with this patch.
igor, I'm happy to review whenever you're ready.
Assignee | ||
Comment 5•13 years ago
|
||
The patch adds JEMALLOC_STATIC_SIZES that, when defined to 1, turns various size-related parameters into compile-time constants and disables their runtime configuration. If JEMALLOC_STATIC_SIZES is not defined but MOZ_MEMERY is set, JEMALLOC_STATIC_SIZES is defined to 1. Under JEMALLOC_STATIC_SIZES I define some parameters not via preprocessor defines, but using less portable static const declarations that the C compiler is not required to treat as compile-time constants. This is done when the parameter name coincides with a name in some jemalloc struct. In this case the define would require first to rename the parameter greatly increasing the patch size. So instead I trust that any optimizing compiler would at least inline the const value.
Attachment #577242 -
Attachment is obsolete: true
Attachment #577334 -
Flags: review?(justin.lebar+bug)
Comment 6•13 years ago
|
||
>+/* >+ * JEMALLOC_STATIC_SIZES controls whether we want to be able to fully >+ * configure memory allocator parameters at runtime (undefined or 0) or >+ * for maximal performance we want to define most of the parameters as >+ * compile-time constants (value 1). >+ */ >+#ifndef JEMALLOC_STATIC_SIZES >+# ifdef MOZ_MEMORY >+# define JEMALLOC_STATIC_SIZES 1 >+# else >+# define JEMALLOC_STATIC_SIZES 0 >+# endif >+#elif JEMALLOC_STATIC_SIZES != 0 && JEMALLOC_STATIC_SIZES != 1 >+# error "When defined JEMALLOC_STATIC_SIZES must be 0 or 1" >+#endif Why not just harcode JEMALLOC_STATIC_SIZES to 1 and avoid these checks? >+#if JEMALLOC_STATIC_SIZES >+ > /* VM page size. */ >+#define pagesize_2pow ((size_t) 12) Please comment that this must divide the hardware page size, otherwise we'll abort on startup. >+#define PAGESIZE_DEFAULT ((size_t) 1 << pagesize_2pow) >+static const size_t pagesize = PAGESIZE_DEFAULT; Can we just use |pagesize| instead of PAGESIZE_DEFAULT? I'd rather not introduce a global define which is only available when JEMALLOC_STATIC_SIZES is defined. >+static const size_t pagesize_mask = ((size_t) 1 << pagesize_2pow) - 1; >+ >+/* Various quantum-related settings. */ >+ >+#define QUANTUM_DEFAULT ((size_t) 1 << QUANTUM_2POW_MIN) Nit: One too many tabs. >+static const size_t quantum = QUANTUM_DEFAULT; Similarly, can we just use |quantum| instead of QUANTUM_DEFAULT? >+/* Number of (2^n)-spaced sub-page bins. */ >+static const unsigned nsbins = (unsigned) >+ (pagesize_2pow - >+ SMALL_MAX_2POW_DEFAULT - >+ 1); Nit: "1);" doesn't need its own line. >+ >+#else /* !JEMALLOC_STATIC_SIZES */ >+ > static size_t pagesize; > static size_t pagesize_mask; > static size_t pagesize_2pow; > >-/* Various bin-related settings. */ Why was this comment removed? It's a little silly, I agree, but better to be consistent with the other comments that are still here in this section. >-static size_t bin_maxclass; /* Max size class for bins. */ >-static unsigned ntbins; /* Number of (2^n)-spaced tiny bins. */ >-static unsigned nqbins; /* Number of quantum-spaced bins. */ >-static unsigned nsbins; /* Number of (2^n)-spaced sub-page bins. */ >+static size_t quantum; >+static size_t quantum_mask; >+ > static size_t small_min; > static size_t small_max; > >-/* Various quantum-related settings. */ >-static size_t quantum; >-static size_t quantum_mask; /* (quantum - 1). */ >+static size_t bin_maxclass; >+static unsigned ntbins; >+static unsigned nqbins; >+static unsigned nsbins; Not sure why you changed this, but could you please preserve the comments on n{t,q,s}bins? >+/* >+ * Compute the header size such that it is large enough to contain the page map >+ * and enough nodes for the worst case: one node per non-header page plus one >+ * extra for situations where we briefly have one more node allocated than we >+ * will need. >+ */ >+#define calculate_arena_header_size() \ >+ (sizeof(arena_chunk_t) + sizeof(arena_chunk_map_t) * (chunk_npages - 1)) >+ >+#define calculate_arena_header_pages() \ >+ ((calculate_arena_header_size() >> pagesize_2pow) + \ >+ (calculate_arena_header_size() & pagesize_mask) ? 1 : 0) >+ >+/* Max size class for arenas. */ >+#define calculate_arena_maxclass() \ >+ (chunksize - (arena_chunk_header_npages << pagesize_2pow)) >+ >+#if JEMALLOC_STATIC_SIZES >+#define CHUNKSIZE_DEFAULT ((size_t) 1 << CHUNK_2POW_DEFAULT) >+static const size_t chunksize = CHUNKSIZE_DEFAULT; Just use |chunksize| instead of CHUNKSIZE_DEFAULT? >+static const size_t chunksize_mask =CHUNKSIZE_DEFAULT - 1; >+static const size_t chunk_npages = CHUNKSIZE_DEFAULT >> pagesize_2pow; >+#define arena_chunk_header_npages calculate_arena_header_pages() >+#define arena_maxclass calculate_arena_maxclass() Can we make these const size_t's? I've got to imagine the compiler can fold these expressions; do you have evidence to the contrary? >+#else > static size_t chunksize; >-static size_t chunksize_mask; /* (chunksize - 1). */ >+static size_t chunksize_mask; > static size_t chunk_npages; > static size_t arena_chunk_header_npages; >-static size_t arena_maxclass; /* Max size class for arenas. */ >+static size_t arena_maxclass; >+#endif > > /********/ > /* > * Chunks. > */ > > #ifdef MALLOC_VALIDATE > static malloc_rtree_t *chunk_rtree; > #endif >@@ -1212,21 +1294,27 @@ static bool opt_abort = false; > #ifdef MALLOC_FILL > static bool opt_junk = false; > #endif > #endif > static size_t opt_dirty_max = DIRTY_MAX_DEFAULT; > #ifdef MALLOC_BALANCE > static uint64_t opt_balance_threshold = BALANCE_THRESHOLD_DEFAULT; > #endif > static bool opt_print_stats = false; >+#if JEMALLOC_STATIC_SIZES >+#define opt_quantum_2pow QUANTUM_2POW_MIN >+#define opt_small_max_2pow SMALL_MAX_2POW_DEFAULT >+#define opt_chunk_2pow CHUNK_2POW_DEFAULT >+#else > static size_t opt_quantum_2pow = QUANTUM_2POW_MIN; > static size_t opt_small_max_2pow = SMALL_MAX_2POW_DEFAULT; > static size_t opt_chunk_2pow = CHUNK_2POW_DEFAULT; >+#endif > #ifdef MALLOC_PAGEFILE > static bool opt_pagefile = false; > #endif > #ifdef MALLOC_UTRACE > static bool opt_utrace = false; > #endif > #ifdef MALLOC_SYSV > static bool opt_sysv = false; > #endif >@@ -5552,44 +5640,47 @@ malloc_init_hard(void) > #endif > > /* Get page size and number of CPUs */ > #ifdef MOZ_MEMORY_WINDOWS > { > SYSTEM_INFO info; > > GetSystemInfo(&info); > result = info.dwPageSize; >- >- pagesize = (unsigned) result; >- >+ > #ifndef MOZ_MEMORY_NARENAS_DEFAULT_ONE > ncpus = info.dwNumberOfProcessors; > #endif > } > #else > #ifndef MOZ_MEMORY_NARENAS_DEFAULT_ONE > ncpus = malloc_ncpus(); > #endif > > result = sysconf(_SC_PAGESIZE); > assert(result != -1); >- >- pagesize = (unsigned) result; >-#endif >- >- /* >- * We assume that pagesize is a power of 2 when calculating >- * pagesize_mask and pagesize_2pow. >- */ >+#endif >+ >+ /* We assume that the page size is a power of 2. */ > assert(((result - 1) & result) == 0); >- pagesize_mask = result - 1; >+#if JEMALLOC_STATIC_SIZES >+ if (pagesize % (size_t) result) { >+ _malloc_message(_getprogname(), >+ "Compile-time page size does not divide the runtime one.\n", >+ "", ""); >+ abort(); >+ } Thanks for this check. >+#else >+ pagesize = (size_t) result; >+ pagesize_mask = (size_t) result - 1; > pagesize_2pow = ffs((int)result) - 1; >- >+#endif >+#if !JEMALLOC_STATIC_SIZES > case 'k': > /* > * Chunks always require at least one > * header page, so chunks can never be > * smaller than two pages. > */ > if (opt_chunk_2pow > pagesize_2pow + 1) > opt_chunk_2pow--; > break; > case 'K': > if (opt_chunk_2pow + 1 < > (sizeof(size_t) << 3)) > opt_chunk_2pow++; > break; >+#endif If you wanted to be awesome, you could print a warning when one of these options is specified and STATIC_SIZES is defined. But since basically nobody uses this, it's not a big deal.
Updated•13 years ago
|
Attachment #577334 -
Flags: review?(justin.lebar+bug) → review+
Assignee | ||
Comment 7•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #6) > >+/* > >+ * JEMALLOC_STATIC_SIZES controls whether we want to be able to fully > >+ * configure memory allocator parameters at runtime (undefined or 0) or > >+ * for maximal performance we want to define most of the parameters as > >+ * compile-time constants (value 1). > >+ */ > >+#ifndef JEMALLOC_STATIC_SIZES > >+# ifdef MOZ_MEMORY > >+# define JEMALLOC_STATIC_SIZES 1 > >+# else > >+# define JEMALLOC_STATIC_SIZES 0 > >+# endif > >+#elif JEMALLOC_STATIC_SIZES != 0 && JEMALLOC_STATIC_SIZES != 1 > >+# error "When defined JEMALLOC_STATIC_SIZES must be 0 or 1" > >+#endif > > Why not just harcode JEMALLOC_STATIC_SIZES to 1 and avoid these checks? I will simply drop elif check - it is not necessary given that the rest of the patch uses #if JEMALLOC_STATIC_SIZES so the compiler will reject any-non-integer value and any integer non-zero integer is treated as true. > >+#define PAGESIZE_DEFAULT ((size_t) 1 << pagesize_2pow) > >+static const size_t pagesize = PAGESIZE_DEFAULT; > > Can we just use |pagesize| instead of PAGESIZE_DEFAULT? ... > Similarly, can we just use |quantum| instead of QUANTUM_DEFAULT? QUANTUM_DEFAULT is necessary as in C one cannot use a const when defining another const. That is, the following fragment is not a valid C code: static const size_t quantum = ((size_t) 1 << QUANTUM_2POW_MIN); static const size_t quantum_mask = quantum - 1; So I added QUANTUM_DEFAULT. But pagesize is not used in that position, so PAGESIZE_DEFAULT can be indeed removed. > >+static const size_t chunksize = CHUNKSIZE_DEFAULT; > > Just use |chunksize| instead of CHUNKSIZE_DEFAULT? > >+static const size_t chunksize_mask =CHUNKSIZE_DEFAULT - 1; Using chunksize in chunksize_mask does not work. > > >+static const size_t chunk_npages = CHUNKSIZE_DEFAULT >> pagesize_2pow; > >+#define arena_chunk_header_npages calculate_arena_header_pages() > >+#define arena_maxclass calculate_arena_maxclass() > > Can we make these const size_t's? This is not possible again for that const-not-a-real-const-in-C problem.
Comment 8•13 years ago
|
||
> I will simply drop elif check What's the benefit of having this at all, as compared to a blanket |#define JEMALLOC_STATIC_SIZES 1|? We never compile jemalloc without MOZ_MALLOC, right? > This is not possible again for that const-not-a-real-const-in-C problem. I'm not versed in this C gotcha. Would you mind elaborating? The following code #include <stddef.h> static const size_t pagesize_2pow = (size_t) 12; static const size_t pagesize = (size_t) 1 << pagesize_2pow; static const size_t pagesize_mask = pagesize - 1; int foo() { return pagesize_mask; } compiles to 0000000000000000 <foo()>: 0: b8 ff 0f 00 00 mov eax,0xfff 5: c3 ret Isn't that what we want?
Assignee | ||
Comment 9•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #8) > > I will simply drop elif check > > What's the benefit of having this at all, as compared to a blanket |#define > JEMALLOC_STATIC_SIZES 1|? We never compile jemalloc without MOZ_MALLOC, > right? This allows to compile with the runtime configuration enabled while having MOZ_MEMORY. This is how I tested the patch. > I'm not versed in this C gotcha. Would you mind elaborating? The following > code > > #include <stddef.h> > static const size_t pagesize_2pow = (size_t) 12; > static const size_t pagesize = (size_t) 1 << pagesize_2pow; > static const size_t pagesize_mask = pagesize - 1; > > int foo() > { > return pagesize_mask; > } > > compiles to With GCC 4.6 I got: x.c:3:1: error: initializer element is not constant x.c:4:1: error: initializer element is not constant IIRC GCC is right here - this is not a valid C code.
Assignee | ||
Comment 10•13 years ago
|
||
patch with comments addressed
Attachment #577334 -
Attachment is obsolete: true
Comment 11•13 years ago
|
||
> This allows to compile with the runtime configuration enabled while having MOZ_MEMORY. This is how I > tested the patch. But in order to compile with runtime configuration enabled, you had to modify some source code, or otherwise send a #define into jemalloc.c, right? My suggestion is just that, when you want to compile with runtime configuration enabled, you should modify the #define, just like you have to do with, for example, MALLOC_DECOMMIT (and, indeed, with every other #define'd option in jemalloc.c). > With GCC 4.6 I got: > > x.c:3:1: error: initializer element is not constant > x.c:4:1: error: initializer element is not constant Huh. I'm also on gcc 4.6, and I've tried -O0 and -O2, with and without -pedantic. clang also doesn't complain.
Comment 12•13 years ago
|
||
Ah, I need -std=gnu99 in order for gcc to flag it as an error.
Assignee | ||
Comment 13•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #11) > > This allows to compile with the runtime configuration enabled while having MOZ_MEMORY. This is how I > > tested the patch. > > But in order to compile with runtime configuration enabled, you had to > modify some source code, or otherwise send a #define into jemalloc.c, right? No - I just added -DJEMALLOC_STATIC_SIZES=0 to CFLAGS when compiling. > > With GCC 4.6 I got: > > > > x.c:3:1: error: initializer element is not constant > > x.c:4:1: error: initializer element is not constant > > Huh. I'm also on gcc 4.6, and I've tried -O0 and -O2, with and without > -pedantic. clang also doesn't complain. Do you compile with GCC or G++ and the file extension is .c?
Assignee | ||
Comment 14•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #12) > Ah, I need -std=gnu99 in order for gcc to flag it as an error. Hm, For me just gcc x.c or gcc -std=<every_c_version_from_the_manual_on_fedora_15> x.c generates the error.
Comment 15•13 years ago
|
||
> No - I just added -DJEMALLOC_STATIC_SIZES=0 to CFLAGS when compiling.
That was what I meant by "or otherwise send a #define into jemalloc.c."
Sorry to nit over this little thing, but I really don't think this one flag is worth privileging in this way over all the other flags which are hardcoded into jemalloc.
Comment 16•13 years ago
|
||
> Hm, For me just gcc x.c or gcc -std=<every_c_version_from_the_manual_on_fedora_15> x.c generates the
> error.
You're right; the problem was that the file extension was cpp. clang really doesn't complain, fwiw.
Assignee | ||
Comment 17•13 years ago
|
||
The new version names the preprocessor define as MALLOC_STATIC_SIZES and its usage should follow the rest of defines.
Attachment #577429 -
Attachment is obsolete: true
Attachment #577443 -
Flags: review?(justin.lebar+bug)
Comment 18•13 years ago
|
||
+#if defined MOZ_MEMORY && !defined MALLOC_STATIC_SIZES +# define MALLOC_STATIC_SIZES 1 +#endif Can it just be |#define MALLOC_STATIC_SIZES 1| without the MOZ_MEMORY and !MALLOC_STATIC_SIZES guards? We never compile jemalloc without MOZ_MEMORY (right?), so this is just a red herring. And the !MALLOC_STATIC_SIZES guard is only useful if we want to disable this feature from configure. Which we might want to do for some weird architecture or something, but YAGNI.
Assignee | ||
Comment 19•13 years ago
|
||
The new version just defines MALLOC_STATIC_SIZES as 1 and fixes a bad typo in calculate_arena_header_pages() when I forgot () around b ? c : d in a + (b ? c : d) so it became (a + b) ? c : d. Surprisingly the tryserver has shown the problem only in very few cases that I initially ignored as the bug lead to an infinite loop that translates into a timeouts on he tryserver.
Attachment #577443 -
Attachment is obsolete: true
Attachment #577536 -
Flags: review?(justin.lebar+bug)
Attachment #577443 -
Flags: review?(justin.lebar+bug)
Comment 20•13 years ago
|
||
Comment on attachment 577536 [details] [diff] [review] v5 r=me. Thanks, Igor.
Attachment #577536 -
Flags: review?(justin.lebar+bug) → review+
Assignee | ||
Comment 21•13 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/67451553bcbb
Comment 22•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/67451553bcbb
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
You need to log in
before you can comment on or make changes to this bug.
Description
•