Closed Bug 396007 Opened 18 years ago Closed 18 years ago

Using posix_memalign for GC arenas when mmap is not available

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.9

People

(Reporter: igor, Assigned: igor)

References

Details

Attachments

(2 files, 8 obsolete files)

v1 (something that just compiles) 18 years ago Igor Bukanov 4.85 KB, patch		Details \| Diff \| Splinter Review
v2 18 years ago Igor Bukanov 12.29 KB, patch		Details \| Diff \| Splinter Review
v3 18 years ago Igor Bukanov 12.61 KB, patch		Details \| Diff \| Splinter Review
v4 18 years ago Igor Bukanov 13.76 KB, patch		Details \| Diff \| Splinter Review
sunspider using psix_memalign 18 years ago Igor Bukanov 3.66 KB, text/plain		Details
v5 18 years ago Igor Bukanov 18.43 KB, patch		Details \| Diff \| Splinter Review
v6 18 years ago Igor Bukanov 16.63 KB, patch	brendan : review+ brendan : approval1.9+	Details \| Diff \| Splinter Review
v6b 18 years ago Igor Bukanov 16.63 KB, patch	igor : review+ igor : approval1.9+	Details \| Diff \| Splinter Review
v7 18 years ago Igor Bukanov 20.77 KB, patch	brendan : review+ brendan : review+	Details \| Diff \| Splinter Review
v8 18 years ago Igor Bukanov 20.90 KB, patch	igor : review+ mtschrep : approval1.9b4+ igor : approval1.9+	Details \| Diff \| Splinter Review

Igor Bukanov

Assignee

Description

•

18 years ago

Bug 392263 uses mmap/VirtualAlloc to implement the allocation of aligned GC arenas using non-executable CPU pages. When the paged allocation is not available or can not be used due to an unsuitable size of the CPU page, the code switches to over sized malloc calls to implement aligned allocation. It would be nice to use posix_memalign instead when available and avoid malloc overhead.

Stuart Parmenter

Comment 1

•

18 years ago

With jemalloc Windows we support posix_memalign. Linux has it. Doesn't seem to be on Mac, but would be nice to use it where possible.

Igor Bukanov

Assignee

Comment 2

•

18 years ago

Attached patch v1 (something that just compiles) (obsolete) — Details — Splinter Review

I will test/document this tomorrow.

Stuart Parmenter

Comment 3

•

18 years ago

fwiw this patch seems to work OK on Windows with jemalloc (with some forced #ifdefs)

Igor Bukanov

Assignee

Comment 4

•

18 years ago

(In reply to comment #3) > fwiw this patch seems to work OK on Windows with jemalloc (with some forced > #ifdefs) > what is the overhead of memalign in jemalloc? In particular, does it need any headers/trailers? if so, calling the function several times like in memalign(&p, 2^n, 2^n); would not be able to allocate the blocks in a sequence but would make inssted a holes of size 2^n at least between blocks. A proper way to use the function would be to decrease the allocation size: memalign(&p, 2^n, 2^n - implementation_overhead);

Jason Evans

Comment 5

•

18 years ago

jemalloc does not use object headers, so the objects would all be adjacent, assuming no pre-existing memory fragmentation.

Stuart Parmenter

Comment 6

•

18 years ago

I think we're going to want to do larger allocations than 1k for these. With your patch I see 328200 NewGCChunk()s vs 1500 with the defaults on Windows

Igor Bukanov

Assignee

Comment 7

•

18 years ago

(In reply to comment #6) > I think we're going to want to do larger allocations than 1k for these. With > your patch I see 328200 NewGCChunk()s vs 1500 with the defaults on Windows > why is it necessary bad? The amount of allocated memory for GC heap is the same in both cases. But in the memalign case the system can use for its purposes the space that is otherwise wasted due to internal fragmentation.

Igor Bukanov

Assignee

Comment 8

•

18 years ago

(In reply to comment #5) > jemalloc does not use object headers, so the objects would all be adjacent, > assuming no pre-existing memory fragmentation. This is nice. Still, since posix_memalign from GLIBC does use the header for posix_memalign, I would like to support that in the patch automatically detecting header's size during compilations. Does a build where posix_memalign comes from jemalloc defines any special preprocessor macros?

Igor Bukanov

Assignee

Comment 9

•

18 years ago

(In reply to comment #3) > fwiw this patch seems to work OK on Windows with jemalloc (with some forced > #ifdefs) But how MSVC can find a definition of posix_memalign??? The patch does not try to include any extra headers assuming that posix_memalign comes from stdlib.c. Or does MSVC comes with posix_memalign defined in that header?

Igor Bukanov

Assignee

Comment 10

•

18 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

The new version passes the tests when compiled with -DJS_GC_USE_MMAP=0 on Linux. To enable posix_memalign on a Windows build with jemalloc, make sure that the following defines are passed to the compiler when it compiles jsgc.c: -DJS_GC_USE_MMAP=0 -DJS_HAS_POSIX_MEMALIGN=1 -DJS_GC_AREANA_PAD=0 To finish the patch I need to set JS_HAS_POSIX_MEMALIGN and JS_GC_AREANA_PAD automatically on Windows when jemalloc and its posix_memalign is available. Does jsemalloc provides any preprocessor macros for that?

Attachment #304305 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 11

•

18 years ago

Attached patch v3 (obsolete) — Details — Splinter Review

The new patch detects jemalloc build via checking MOZ_MEMORY and enables posix_memalign support if so. Since a default posix_memalign from GLIBC suffers from the problem in comment 4, the patch introduces JS_GC_ARENA_PAD to ask for less memory than the alignment requirements. With jemalloc it is set to 0, otherwise JS_GC_ARENA_PAD it is 2 words, the size that GLIBC needs. The patch also uses 4K arenas with all the allocation modes. Small arenas apparently hurt the performance with posix_memalign from jemalloc. To enable posix_memalign allocations with the patch, make sure that JS_GC_USE_MMAP is set to 0 when compiling SpiderMonkey. I.e. like passing -DJS_GC_USE_MMAP=0 to the compiler.

Attachment #304446 - Attachment is obsolete: true

Attachment #304559 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 12

•

18 years ago

If testing shows that jemalloc is better for the browser, then code can be changed to favor posix_memalign over mmap when MOZ_MEMORY is defined.

Igor Bukanov

Assignee

Comment 13

•

18 years ago

Attached patch v4 (obsolete) — Details — Splinter Review

The new version optimizes more for the case when GC arena == GC chunk and excludes unnecessary code in this case.

Attachment #304559 - Attachment is obsolete: true

Attachment #304565 - Flags: review?(brendan)

Attachment #304559 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 14

•

18 years ago

Attached file sunspider using psix_memalign — Details

With a default allocator from GLIBC enabling posix_memalign on Linux leads to 2.5% improvement in SunSpider benchmark with optimized thread-safe build of js shell. This is for Fedora-9 alpha on Pentium-M 1.1GHz laptop.

Igor Bukanov

Assignee

Comment 15

•

18 years ago

Attached patch v5 (obsolete) — Details — Splinter Review

The new version prioritize posix_memalign over mmap when both are available. This is based on the benchmark results from Linux. It would be interesting to know what happens on Windows.

Attachment #304565 - Attachment is obsolete: true

Attachment #304742 - Flags: review?(brendan)

Attachment #304565 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 16

•

18 years ago

Attached patch v6 (obsolete) — Details — Splinter Review

The previous version was over-engineered. The new version always uses posix_memalign when it is available and the user does not insist on mmap with explicit JS_GC_USE_MMAP set to 1.

Attachment #304742 - Attachment is obsolete: true

Attachment #304758 - Flags: review?(brendan)

Attachment #304742 - Flags: review?(brendan)

Brendan Eich [:brendan]

Updated

•

18 years ago

Flags: blocking1.9+

Target Milestone: --- → mozilla1.9beta4

Brendan Eich [:brendan]

Comment 17

•

18 years ago

What is the win (SunSpider or any other test we use) on Windows with jemalloc and this patch? /be

Brendan Eich [:brendan]

Comment 18

•

18 years ago

Comment on attachment 304758 [details] [diff] [review] v6 >+ * When it is is true, a platform-dependant function like mmap is used to get >+ * memory aligned on CPU page boundary. If the macro is false or not defined, >+ * posix_memalign is used when available. Otherwise the code uses over- >+ * sized malloc to allocate a chunk with js_gcArenasPerChunk aligned arenas. >+ * The approximate space overhead of this is 1/js_gcArenasPerChunk. For >+ * details, see NewGCChunk/DestroyGCChunk below. Please rewrap to avoid ragged right, oversized (one word) fits on one line. >+ * Implement the chunk allocation using over sized malloc if mmap or s/over sized/oversized/ and any others like these. r/a=me, looks good -- anticipate Windows win too (may have heard about it, didn't see it reported here). /be

Attachment #304758 - Flags: review?(brendan)

Attachment #304758 - Flags: review+

Attachment #304758 - Flags: approval1.9+

Igor Bukanov

Assignee

Comment 19

•

18 years ago

(In reply to comment #17) > What is the win (SunSpider or any other test we use) on Windows with jemalloc > and this patch? Robert has mentioned on IRC that with the initial version of the patch that has used 1 K GC arenas things were slower by couple percent, with bigger arenas the speedup was 2%. According to this the last version of the patch uses 4K arenas in all arena allocation modes.

Igor Bukanov

Assignee

Comment 20

•

18 years ago

Attached patch v6b (obsolete) — Details — Splinter Review

The new version addresses the nits from comment 18.

Attachment #304758 - Attachment is obsolete: true

Attachment #305345 - Flags: review+

Attachment #305345 - Flags: approval1.9+

Igor Bukanov

Assignee

Comment 21

•

18 years ago

Attached patch v7 (obsolete) — Details — Splinter Review

The landed patch from bug 400902 added expected heavy conflicts with this patch. While resolving them I took opportunity to improve the patch when it is known at the compile time that GC arena == chunk. Now the code uses bitfields in JSGCArenaInfo only if a chunked allocation is necessary.

Attachment #305345 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 22

•

18 years ago

Comment on attachment 305852 [details] [diff] [review] v7 The last version has passed both the test suite and mochi tests.

Attachment #305852 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 23

•

18 years ago

The last patch compiles on Mac and probably on Windows - the try server http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry would have already stopped in the case of compilation problems on Windows as SpiderMonkey compiles early during the compilation stage.

Brendan Eich [:brendan]

Comment 24

•

18 years ago

Comment on attachment 305852 [details] [diff] [review] v7 >+int >+posix_memalign(void **memptr, size_t alignment, size_t size); Minimal declaration to get Windows happy, but house style favors extern. >+ * Declare the headers for mmap unless we have posix_memalign and do not insist s/Declare/Include/ >+ * memory aligned on CPU page boundary. If the macro is false or not defined, s/boundary/boundaries/ s/not defined/undefined/ This should still fit in the given tw=78. >+ * posix_memalign is used when available. Otherwise the code uses oversized >+ * malloc to allocate a chunk with js_gcArenasPerChunk aligned arenas. Suggest "... uses malloc to over-allocate a chunk ...". >+ * The code also allocates arenas in chunks when JS_GC_USE_MMAP is true to s/true/1/ >+ * the runtime. Strike "the". >+ * JS_GC_ARENA_PAD defines the number of bytes to pad JSGCArenaInfo structure. >+ * It is used to improve allocation efficiency when using posix_memalign. If >+ * its implementation uses internal headers, then calling s/its/malloc's/ >+ * due to the need to fit headers. JS_GC_ARENA_PAD allows to mitigate that s/allows to mitigate/allows us to mitigate/ or (better) just s/allows to mitigate/mitigates/ >+ * The structure is stored in one of chunk's free arenas. GET_CHUNK_INFO_INDEX "Structure stored in one of a chunk's free arenas." >+ * macro gives the index of this arena. When all arenas in the chunk are used, strike "macro" as the lack of definite article "The" before "GET_CHUNK_INFO_INDEX" makes it awkward, and it's arguably deadwood in any case -- the rest of the para should rewrap well. >+ * Implement the chunk allocation using oversized malloc if mmap or >+ * posix_memalign are not available. s/the chunk/chunk/ s/or/and/ > * > * Since malloc allocates pointers aligned on the word boundary, to get > * js_gcArenasPerChunk aligned arenas, we need to malloc only > * ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) - sizeof(size_t) > * bytes. But since we stores the gap between the malloced pointer and the > * first arena in the chunk after the chunk, we need to ask for > * ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) > * bytes to ensure that we always have room to store the gap. > */ Again blank comment lines on either side of excerpted code, relations, and formulae would be nice. > p = malloc((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT); >- if (!p) >- return 0; >- chunk = ((jsuword) p + GC_ARENA_MASK) & ~GC_ARENA_MASK; >- *GetMallocedChunkGapPtr(chunk) = (uint32) (chunk - (jsuword) p); >- return chunk; >+ if (p) { >+ jsuword chunk; >+ >+ chunk = ((jsuword) p + GC_ARENA_MASK) & ~GC_ARENA_MASK; >+ *GetMallocedChunkGapPtr(chunk) = (uint32) (chunk - (jsuword) p); >+ return chunk; >+ } Previous code followed prevailing style by casting out (unlikely) error with early return and well-predicted (IIRC) branch around then clause. Just wondering why this change. Is it better for prefetching? Looks good, r/a+ with nits picked as appropriate. /be

Attachment #305852 - Flags: review?(brendan)

Attachment #305852 - Flags: review+

Stuart Parmenter

Comment 25

•

18 years ago

Latest patch looks good on Windows. I tested it and don't see any real perf change on Windows but should be a little better long term.

Igor Bukanov

Assignee

Comment 26

•

18 years ago

Attached patch v8 — Details — Splinter Review

The new version of the patch addresses the nits from the comment 24. > Previous code followed prevailing style by casting out (unlikely) error with early return and well-predicted (IIRC) branch around then clause. I wanted to avoid declaring "chunk" at the top of the function as it would require an ifdef around it as only oversized malloc needs it. In the new version I have changed the code to follow the early-return-on-error pattern and used an explicit block to declare "chunk": p = malloc((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT); - if (p) { + if (!p) + return 0; + + { jsuword chunk; chunk = ((jsuword) p + GC_ARENA_MASK) & ~GC_ARENA_MASK; *GetMallocedChunkGapPtr(chunk) = (uint32) (chunk - (jsuword) p); return chunk; } #endif - return 0;

Attachment #305852 - Attachment is obsolete: true

Attachment #305957 - Flags: review+

Attachment #305957 - Flags: approval1.9+

Mike Beltzner [:beltzner, not reading bugmail]

Comment 27

•

18 years ago

Moving bugs that aren't beta 4 blockers to target final release.

Target Milestone: mozilla1.9beta4 → mozilla1.9

Damon Sicore (:damons)

Updated

•

18 years ago

Attachment #305957 - Flags: approval1.9b4?

Damon Sicore (:damons)

Comment 28

•

18 years ago

Please land this and close it. Requesting approval for b4.

Flags: tracking1.9+ → blocking1.9+

Priority: -- → P2

Mike Schroepfer

Updated

•

18 years ago

Attachment #305957 - Flags: approval1.9b4? → approval1.9b4+

Igor Bukanov

Assignee

Comment 29

•

18 years ago

I checked in the patch from comment 26 to the trunk: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1204474380&maxdate=1204474414&who=igor%25mir2.org

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

18 years ago

Version: unspecified → Trunk

timeless

Comment 30

•

18 years ago

(In reply to comment #18) > (From update of attachment 304758 [details] [diff] [review]) > >+ * When it is is true, a platform-dependant function like mmap is used to get "it is true", "dependent". I'm shocked that brendan didn't complain.

Igor Bukanov

Assignee

Comment 31

•

18 years ago

(In reply to comment #30) > (In reply to comment #18) > > (From update of attachment 304758 [details] [diff] [review] [details]) > > >+ * When it is is true, a platform-dependant function like mmap is used to get > > "it is true", "dependent". I'm shocked that brendan didn't complain. Thanks for spotting this, I will file a "regression" bug about this".

Igor Bukanov

Assignee

Updated

•

18 years ago

Blocks: 420593

Igor Bukanov

Assignee

Updated

•

18 years ago

No longer blocks: 420593

Depends on: 420593

Brian Crowder

Updated

•

18 years ago

Depends on: 421236

Bob Clary [:bc] (inactive)

Updated

•

18 years ago

Flags: in-testsuite-

Flags: in-litmus-

You need to log in before you can comment on or make changes to this bug.