Closed Bug 392263 Opened 18 years ago Closed 18 years ago

Using CPU pages as GC arenas

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

Attachments

(1 file, 30 obsolete files)

v0.1 18 years ago Igor Bukanov 41.37 KB, patch		Details \| Diff \| Splinter Review
v0.2 18 years ago Igor Bukanov 41.66 KB, patch		Details \| Diff \| Splinter Review
v0.3 18 years ago Igor Bukanov 53.61 KB, patch		Details \| Diff \| Splinter Review
v0.4 18 years ago Igor Bukanov 57.78 KB, patch		Details \| Diff \| Splinter Review
v0.5 18 years ago Igor Bukanov 58.96 KB, patch		Details \| Diff \| Splinter Review
v0.6 18 years ago Igor Bukanov 59.08 KB, patch		Details \| Diff \| Splinter Review
v1 18 years ago Igor Bukanov 90 bytes, patch		Details \| Diff \| Splinter Review
v2 18 years ago Igor Bukanov 60.06 KB, patch		Details \| Diff \| Splinter Review
v3 18 years ago Igor Bukanov 60.04 KB, patch		Details \| Diff \| Splinter Review
v4 18 years ago Igor Bukanov 60.14 KB, patch		Details \| Diff \| Splinter Review
v4b 18 years ago Igor Bukanov 60.23 KB, patch		Details \| Diff \| Splinter Review
v5 18 years ago Igor Bukanov 61.12 KB, patch		Details \| Diff \| Splinter Review
v5b 18 years ago Igor Bukanov 61.88 KB, patch		Details \| Diff \| Splinter Review
v6 18 years ago Igor Bukanov 61.93 KB, patch		Details \| Diff \| Splinter Review
v7 18 years ago Igor Bukanov 62.00 KB, patch		Details \| Diff \| Splinter Review
v8 18 years ago Igor Bukanov 61.81 KB, patch		Details \| Diff \| Splinter Review
v9 (using bit fields) 18 years ago Igor Bukanov 62.32 KB, patch		Details \| Diff \| Splinter Review
v10 (using bit fields) 18 years ago Igor Bukanov 62.22 KB, patch		Details \| Diff \| Splinter Review
v11 (with bit fields) 18 years ago Igor Bukanov 67.23 KB, patch		Details \| Diff \| Splinter Review
v12 (with bit fields) 18 years ago Igor Bukanov 67.33 KB, patch		Details \| Diff \| Splinter Review
v13 (with bit fields) 18 years ago Igor Bukanov 66.50 KB, patch		Details \| Diff \| Splinter Review
v14 (with bit fields) 18 years ago Igor Bukanov 68.90 KB, patch		Details \| Diff \| Splinter Review
v15 (with bit fields) 18 years ago Igor Bukanov 75.51 KB, patch		Details \| Diff \| Splinter Review
diff -b for v15 18 years ago Igor Bukanov 74.24 KB, patch		Details \| Diff \| Splinter Review
v16 18 years ago Igor Bukanov 78.56 KB, patch		Details \| Diff \| Splinter Review
v17 18 years ago Igor Bukanov 81.56 KB, patch	brendan : review+ brendan : approval1.9+	Details \| Diff \| Splinter Review
v18 18 years ago Igor Bukanov 81.38 KB, patch	brendan : review+ brendan : approval1.9+	Details \| Diff \| Splinter Review
v18b 18 years ago Igor Bukanov 81.54 KB, patch	igor : review+ igor : approval1.9+	Details \| Diff \| Splinter Review
v18c 18 years ago Igor Bukanov 81.57 KB, patch	igor : review+ igor : approval1.9+	Details \| Diff \| Splinter Review
v18d 18 years ago Igor Bukanov 81.57 KB, patch	igor : review+ igor : approval1.9+	Details \| Diff \| Splinter Review
v19 18 years ago Igor Bukanov 81.71 KB, patch	brendan : review+ brendan : approval1.9+	Details \| Diff \| Splinter Review

Igor Bukanov

Assignee

Description

•

18 years ago

[This is a spin-off of bug 51 comment 157334] It is worth to try to use CPU pages allocated via mmap or VirtualAlloc as GC arenas. It should allow for very fast js_GetGCThingFlags and would remove all the alignment complexity and overhead in the current jsgc.c.

Igor Bukanov

Assignee

Comment 1

•

18 years ago

Attached patch v0.1 (obsolete) — Details — Splinter Review

This is backup of the initial implementation that does not use any mmap or VirtualAlloc calls. Instead it just allocs 9 * page_size bytes and uses 8 aligned pages inside this chunk with the overhead of 12.5%. I will test the code and later use it as a default implementation when mmap is not available on the target platform.

Brendan Eich [:brendan]

Comment 2

•

18 years ago

It would also allow us to map PROT_READ|PROT_WRITE, no PROT_EXEC -- defeating a class of security exploits that contrive to put the pc in a JS string or similar and execute its characters as instructions. /be

Igor Bukanov

Assignee

Comment 3

•

18 years ago

(In reply to comment #2) > It would also allow us to map PROT_READ|PROT_WRITE, no PROT_EXEC -- defeating a > class of security exploits that contrive to put the pc in a JS string or > similar and execute its characters as instructions. AFAICS !PROT_EXEC for GC thinsg does not help since malloc would continue to allocate charaters for strings. Thus spraying the heap with strings with various forms of NOP instructions chunks before the bad payload would still work.

Igor Bukanov

Assignee

Comment 4

•

18 years ago

Attached patch v0.2 (obsolete) — Details — Splinter Review

Various fixes

Attachment #276722 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 5

•

18 years ago

I was hoping we would get to the point where string character vectors would be allocated from ~PROT_EXEC memory. /be

Igor Bukanov

Assignee

Comment 6

•

18 years ago

Attached patch v0.3 (obsolete) — Details — Splinter Review

New versions with less overhead when mmap is not available plus infrastructure to use several arenas per mmap call. The latter is necessary to minimize overhead of mmap call.

Attachment #276841 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 7

•

18 years ago

Attached patch v0.4 (obsolete) — Details — Splinter Review

The check point before adding mmap support on POSIX. Note that even without mmap support the patch shows faster allocation / GC on simple benchmarks.

Igor Bukanov

Assignee

Updated

•

18 years ago

Attachment #277636 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 8

•

18 years ago

Attached patch v0.5 (obsolete) — Details — Splinter Review

The new version uses mmap on POSIX systems when available.

Attachment #277949 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 9

•

18 years ago

Attached patch v0.6 (obsolete) — Details — Splinter Review

The new version fixes warnings with the last one when compiling FF.

Attachment #277973 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 10

•

18 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

The new version adds VirtualAlloc calls under Windows. This is based on info from http://msdn2.microsoft.com/en-us/library/aa366887.aspxbut I have no means to test it.

Attachment #277979 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 11

•

18 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

This is optimized version of v1. Notes on the patch: 1. Patch uses 4K arenas aligned on 4K boundaries to allocate GC things. This is hard coded even with 64-bit CPU as x64 uses 4K CPU pages. If mmap is not available, then to get properly alignment arenas the code allocates 32K and uses seven arenas in the block that are guaranteed to be aligned on 4K boundary. 2. Even with mmap available the patch allocates arenas in blocks to minimize the overhead of mmap. 3. As the page size is not a compile-time constant, for extra portability the code checks for the page size at startup and switches to the fall back schema if the size exceeds some sanity limit. Since the arena size is a compile-time constant, this runtime checks adds no visible overhead. 4. The XP_WIN part of the code is not tested beyond parsing by C compiler and based on http://msdn2.microsoft.com/en-us/library/aa366887.aspx. 5. Various synthetic benchmarks like: function test(n) { var now = Date.now; var startTime = now(); var o = null; var array = []; // Interleave garbage with reachable things while (n-- != 0) { var x = 1.1 + n; if (n % 2) array.push(x); x = null; } for (var i = 0; i != 20; ++i) gc(); } test(100 * 1000); shows a slight speedup of about 1% with the patch.

Attachment #278024 - Attachment is obsolete: true

Attachment #278061 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 12

•

18 years ago

I will review by Monday, probably by tomorrow. /be

Igor Bukanov

Assignee

Comment 13

•

18 years ago

Attached patch v3 (obsolete) — Details — Splinter Review

The new version fixes a bug in OOM detection with mmap. When memory is not available, mmap returns MAP_FAILED or (void *) -1, not NULL.

Attachment #278061 - Attachment is obsolete: true

Attachment #278240 - Flags: review?(brendan)

Attachment #278061 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 14

•

18 years ago

Attached patch v4 (obsolete) — Details — Splinter Review

The new version allows checks allows to explicitly define JS_GC_USE_MMAP during compilation to either 0 or 1 to disable mmap completely or to forces the inclusion of proper system headers.

Attachment #278240 - Attachment is obsolete: true

Attachment #278241 - Flags: review?(brendan)

Attachment #278240 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 15

•

18 years ago

Comment on attachment 278241 [details] [diff] [review] v4 The major comment starting: * Encodes a flag indicating if the arena is the first in the block, the is too ragged-right -- reformat as if by Q commands in vim with tw=78. Also: * the index of arena is either the index of a free arena holding should say "the index of the arena" or "the index of this arena in its block" or something like that. In: * All blocks that have at least one free arena are put to the doubly-linked s/put to/put in/ I saw another "put" abusage, or three (two pre-existing): " * All blocks that have at least one free arena are put to the doubly-linked for these, the simplest fix is "put on". * the head of linked list of free arenas in the block together with previous how about "the head of a linked free arena list for the block" or "the head of the block's free arena list"? * it is removed from the list and BLOCK_ALL_USED_FLAG is set indicating that BLOCK_ALL_USED_FLAG is not defined, so this comment seems stale. One question: what bounds the number of mmap calls, if anything? These are not as cheap as one might like, and may consume enough OS overhead that we would prefer bigger chunks than 7 or 8 arenas' worth. More comments in a bit, feel free to put up a new patch. /be

Brendan Eich [:brendan]

Comment 16

•

18 years ago

Sorry, I meant to suggest "put on" for * All blocks that have at least one free arena are put to the doubly-linked not "put in". "Inserted in" but "put on". Micro-nit on my own comment, but it was inconsistent with itself (later recommending "put on" over "put to"). /be

Igor Bukanov

Assignee

Comment 17

•

18 years ago

(In reply to comment #15) > One question: what bounds the number of mmap calls, if anything? In the current patch the limit on the number of arenas in the block is limited by 2^ARENA_INDEX_BITS - 1 or 2^(JS_GC_ARENA_SHIFT - 1) - 1 which gives the maximum 2047 arenas per block limiting the block size by 8MB. > These are not > as cheap as one might like, and may consume enough OS overhead that we would > prefer bigger chunks than 7 or 8 arenas' worth. With synthetic benchmark like for (var i = 0; i != 1000*1000; ++i) var tmp = i+0.1; I have not noticed a difference with 4 or more arenas on Linux, so the patch uses 8 under assumption that on Windows, where system calls are more expensive, they are not that much worse than on Linux. But clearly the final value should be exposed by tests on Windows. With no mmap available the patch defaults to 7 and not 8 arenas so the total size of malloc block would be the same as with mmap. In principle for less overhead for string and double GC things the default may be set to 15, but then that has to be balanced with heap fragmentation.

Igor Bukanov

Assignee

Comment 18

•

18 years ago

Attached patch v4b (obsolete) — Details — Splinter Review

Fixing comments as suggested.

Attachment #278241 - Attachment is obsolete: true

Attachment #278473 - Flags: review?(brendan)

Attachment #278241 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 19

•

18 years ago

Comment on attachment 278473 [details] [diff] [review] v4b More comments, sorry for incremental review. One more cycle after this, new patch optional (if you are up still). "Block" is typically an i/o unit, atomic to some part of a storage hierarchy, not a cluster of arenas or pages. Suggest Cluster or Chunk (latter is as short as Block and less specific). If we really wanted the right words for a hierarchy of Arena/Block, we might prefer Page and Cluster or Segment. Segment implies contiguous span of memory. But Chunk would suffice and be better than Block in my book. s/gcUnscannedArenaStackTop/gcUnscannedStackTop/ or even gcUnscannedArenaTop -- Top implies Stack. Please remove pre-existing FINALIZE_LEN deadwood. Need Windows (and Mac?) build buddies -- I will help test Mac tonight if needed. Crowder, can you try Windows? In: * Effectively things are allocated from the start of things' area and flags * are allocated from the end of flags' area. Such arrangement avoids please use "their" instead of "things'" and "flags'". Also a comma after "Effectively", but I think you could just lose "Effectively" altogether. Just wondering why the JS_GC_ prefixing of macros -- GC_ might not be enough in light of Windows.h? /be

Igor Bukanov

Assignee

Comment 20

•

18 years ago

Attached patch v5 (obsolete) — Details — Splinter Review

Besides addressing nits and doing renames as suggested above, the patch fixes js_InitGC to set js_gcUseMmap to false with too small/too big pages. In the new patch only JS_GC_USE_MMAP uses JS_GC_ prefix as the macro can be set from the command line. The rest of macros just use GC_ when necessary.

Attachment #278473 - Attachment is obsolete: true

Attachment #278555 - Flags: review?(brendan)

Attachment #278473 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 21

•

18 years ago

(In reply to comment #19) > If we really wanted the right words for a > hierarchy of Arena/Block, we might prefer Page and Cluster or Segment. In an initial prototype of the patch I replaced arena by page and used the former to mean "chunk" in the new patch. But then I realized that size(gc arena) != size(cpu page) in general and, to avoid confusion with CPU pages and extra renames in js_GC and other places explicitly accessing gc arenas, I kept the arena. On the other hand, given the alignment requirements, "page" plays nicely with "chunk" and comments can easily clarify that gc page is not a cpu page. So should I rename JSGCArenaInfo a -> JSGCPageInfo page and update the comments accordingly?

Igor Bukanov

Assignee

Comment 22

•

18 years ago

Attached patch v5b (obsolete) — Details — Splinter Review

jsgc.c currently uses the word "chunk" to mean "Number of things covered by a single bit of JSGCArena.unscannedThings." The new patch changes that usage into "segment" as "chunk" now means a sequence of GC arenas.

Attachment #278555 - Attachment is obsolete: true

Attachment #278557 - Flags: review?

Attachment #278555 - Flags: review?(brendan)

Brian Crowder

Comment 23

•

18 years ago

v5b patch introduces the following new warnings: jsgc.c(2615) : warning C4018: '<' : signed/unsigned mismatch jsgc.c(2678) : warning C4018: '<' : signed/unsigned mismatch The test-suite isn't working for me on Windows right now, I'll play with it in the morning.

Igor Bukanov

Assignee

Comment 24

•

18 years ago

Attached patch v6 (obsolete) — Details — Splinter Review

Here is an attempt to fix the warnings reported by MSVC. I think it comes uint16->int32 promotion rule in arena->list->thingSize + 1 where thingSize is uint16 as 1 is int, not uint, literal. Thus the new patch uses explicit (uint32) 1.

Attachment #278557 - Attachment is obsolete: true

Attachment #278560 - Flags: review?(brendan)

Attachment #278557 - Flags: review?

Igor Bukanov

Assignee

Comment 25

•

18 years ago

When testing on Windows, it would be nice to know how the timing with an optimized build of jsshell for following test is affected when one change 8 to 1-2-4-16 in lines 928-929, js_InitGC, js_gcArenasPerChunk = (arenasPerPage < 8) ? 8 : (size_t) arenasPerPage; The test: function test() { var startTime = Date.now(); var i = 1000*1000; var tmp = 0; do { tmp += 0.1; } while (--i != 0); tmp = null; gc(); var time = Date.now() - startTime; print("Test time: "+time+"ms"); } test(); This will measure the overhead of mmap/VirtualAlloc on Posix/Windows. On my 1.1Ghz Pentium M laptom the best timing from 20 runs was: js_gcArenasPerChunk test-timing sys-timing 1 454 12 2 431 9 4 428 4 8 432 5 16 428 5 where sys-timing is the best system time as reported by time ./Linux_All_OPT.OBJ/js ~/m/y.js > /dev/null So I will change the default from 8 to 4 under Linux since it seems that very slight but increase when going from 4 to 8 pages is real.

Brian Crowder

Comment 26

•

18 years ago

I was getting results all over the map with i == 1,000,000, so I made it 10,000,000 instead (best result of ten runs of test()): 1: 2203 2: 2140 4: 2094 8: 2094 16: 2091 Alas, still getting the same compiler warnings (new line numbers): jsgc.c(2616) : warning C4018: '<' : signed/unsigned mismatch jsgc.c(2679) : warning C4018: '<' : signed/unsigned mismatch

Igor Bukanov

Assignee

Comment 27

•

18 years ago

(In reply to comment #25) > js_gcArenasPerChunk test-timing sys-timing > 1 454 12 > 2 431 9 > 4 428 4 > 8 432 5 > 16 428 5 Here is the results without the patch: > without-the-patch 474 5 > patch with no mmap 470 6 Where "patch with no mmap" are results of the patch compiled with -DJS_GC_USE_MMAP=0.

Igor Bukanov

Assignee

Comment 28

•

18 years ago

(In reply to comment #26) > I was getting results all over the map with i == 1,000,000, so I made it > 10,000,000 instead (best result of ten runs of test()): > > 1: 2203 > 2: 2140 > 4: 2094 > 8: 2094 > 16: 2091 What about 24 and 32 and without the patch? I just want to confirm that 8 reaches plateau on Windows as well.

Igor Bukanov

Assignee

Comment 29

•

18 years ago

Attached patch v7 (obsolete) — Details — Splinter Review

This should fix all the warnings.

Attachment #278560 - Attachment is obsolete: true

Attachment #278569 - Flags: review?(brendan)

Attachment #278560 - Flags: review?(brendan)

Brian Crowder

Comment 30

•

18 years ago

> What about 24 and 32 and without the patch? I just want to confirm that 8 > reaches plateau on Windows as well. > 24: 2150 32: 2134 Without the patch: 2088! And hovering in the mid-to-low 2090's for most runs. The new version does fix all warnings, however.

Igor Bukanov

Assignee

Comment 31

•

18 years ago

(In reply to comment #30) > > What about 24 and 32 and without the patch? I just want to confirm that 8 > > reaches plateau on Windows as well. > > > > 24: 2150 > 32: 2134 So this means that VirtualAlloc(4 CPU pages) brings the lowest overall overhead on Windows as well. > > Without the patch: > > 2088! Interesting. It seems that allocating of 9K is extremely optimized on Windows. Could you repeat the test with 4, 8 and 16 gc arenas per chunk and modified test case where double is replaced by object: function test() { var startTime = Date.now(); var i = 10*1000*1000 / 4; // sizeof(object) = 4*sizeof(double) var tmp = 0; do { tmp = {}; } while (--i != 0); tmp = null; gc(); var time = Date.now() - startTime; print("Test time: "+time+"ms"); } test(); With the patch the total memory allocation for the test should be 10% less so it would be interesting to compare with/without patch cases.

Igor Bukanov

Assignee

Comment 32

•

18 years ago

Timing with the test case from comment 31 on a 1.1 GHz Pentium M Laptop under Fedora 7: js_gcArenasPerChunk test-timing 1 2477 2 2450 4 2433 8 2431 16 2413 24 2414 32* 2426 64* 2424 patch with no mmap 2424 no patch 2435 Now it seems 16 arenas is best when allocating objects. * GC_ARENAS_PER_CPU_PAGE_LIMIT was altered to be 64 during the run

Brendan Eich [:brendan]

Comment 33

•

18 years ago

From comment 26, 16 arenas looks best on Windows too, right? How about Mac OS X? /be

Brian Crowder

Comment 34

•

18 years ago

(In reply to comment #33) > From comment 26, 16 arenas looks best on Windows too, right? How about Mac OS > X? It seems so, though I get equivalent or very, very slightly better results without the patch on Win32.

Igor Bukanov

Assignee

Comment 35

•

18 years ago

(In reply to comment #34) > It seems so, though I get equivalent or very, very slightly better results > without the patch on Win32. One more question about Windows performance. What happens if GC_ARENAS_PER_CPU_PAGE_LIMIT is altered to 1000 and js_gcArenasPerChunk is forced to be 128/256 with double/object test cases?

Brendan Eich [:brendan]

Comment 36

•

18 years ago

Comment on attachment 278569 [details] [diff] [review] v7 * from the end of their area. Such arrangement avoids calculating the Nit: could just say "This avoids ..." and wrap better (maybe). * Macros to decode/encode previous arena on the GC list and chunk gap Nit: "the previous arena ..." Couldn't you use bit-fields instead of word1 and word2 and BIG_UGLY_MACROS (I know, I started that trend long ago ;-)? s/\<bi\>/ci/g (block=>chunk renaming fallout) Nit: 1U is slightly easier on the eyes than (uint32) 1, and just as good. Suggest avoiding SEGMENT and just using SPAN (spanIndex, etc.) in context of UNSCANNED_ and the like -- just as clear or moreso because shorter, less associated with large (multi-"page") runs of memory, and (yeah) shorter. I'm ok with arena/chunk instead of page/chunk or similar. It's hard to get these names right, but we're close and arena is traditional. Need clarity on perf at large number of OS pages per chunk values on top three platforms, and nitpicks if you can, and I'll r/a+. /be

Igor Bukanov

Assignee

Comment 37

•

18 years ago

Here is another test that makes everything allocated reachable until the explicit GC call not to depend on the effects of the last ditch GC: function test() { var startTime = Date.now(); var i = 1000*1000; var o = null; do { o = {o: o}; } while (--i != 0); o = null; gc(); var time = Date.now() - startTime; print("Test time: "+time+"ms"); } test(); When I run it in Linux console with no X11 running, I got the following results on 1.1 GHz Pentium M laptop with Fedora 7 for optimized js shell arenas per chunk time of the best among 10 runs 1 1882 2 1871 3 1873 4 1859 6 1864 8 1850 12 1858 16 1859 24 1846 32 1854 48 1864 64 1852 96 1871 128 1856 192 1855 256 1850 no mmap 1860 no patch 1881

Igor Bukanov

Assignee

Comment 38

•

18 years ago

Attached patch v8 (obsolete) — Details — Splinter Review

The new patch follows suggestions from comment 36 except the bit field ides. I will try that in the next patch.

Attachment #278569 - Attachment is obsolete: true

Attachment #278569 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 39

•

18 years ago

Attached patch v9 (using bit fields) (obsolete) — Details — Splinter Review

The version of the patch that uses bit fields in JSGCArenaInfo instead of explicit bit manipulations. This indeeds reduces the number of macros, adds clarity and decreases the number of ifs as the previous patch contained code like: a = GET_PREV_ARENA(a); if (!a) jump; where GET_PREV_ARENA already contained the check for NULL address. But to proceed with bit fields, I need to verify that on Windows/Mac compilers implements the bit fields properly. So the patch contains: JS_STATIC_ASSERT(sizeof(JSGCArenaInfo) == sizeof(jsuword) * 4); So does it compile on Windows/Mac?

Brian Crowder

Comment 40

•

18 years ago

Running with the current patch under windows, I hit the following assertion: Assertion failure: arenaList->lastCount > 0, at jsgc.c:2622

Igor Bukanov

Assignee

Comment 41

•

18 years ago

Attached patch v10 (using bit fields) (obsolete) — Details — Splinter Review

The new patch fixes the asserts in the previous bit-field patch.

Attachment #278703 - Attachment is obsolete: true

Attachment #278747 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 42

•

18 years ago

Comment on attachment 279069 [details] [diff] [review] v10 (using bit fields) The patch passes shell's test suite and fF smoke test.

Attachment #279069 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 43

•

18 years ago

For performance tuning I would like to know the results for the test from comment 37 on Windows/Mac with the following values of js_gcArenasPerChunk: 1 2 4 8 16 32 64 For that just add in js_InitGC from jsgc.c after the line: js_gcArenasPerChunk = JS_MAX((uint32) arenasPerPage, 8); a line like js_gcArenasPerChunk = <the desired value>; For completeness it would be interesting to know the results without the patch and the results with mmap/VirtaulAlloc disabled. For the later change the line js_InitGC that reads: if (arenasPerPage - 1 <= (size_t) (GC_ARENAS_PER_CPU_PAGE_LIMIT - 1)) { into something like if (0 && arenasPerPage - 1 <= (size_t) (GC_ARENAS_PER_CPU_PAGE_LIMIT - 1)) {

Igor Bukanov

Assignee

Comment 44

•

18 years ago

Attached patch v11 (with bit fields) (obsolete) — Details — Splinter Review

The new version fixes JS_GCMETER and uses 1K-sized arenas when mmap is not available to minimize the overhead of aligned allocation implemented through oversized malloc.

Attachment #279069 - Attachment is obsolete: true

Attachment #279160 - Flags: review?(brendan)

Attachment #279069 - Flags: review?(brendan)

Brian Crowder

Comment 45

•

18 years ago

Using the test from comment #37, 1: 1485 2: 1422 4: 1375 8: 1390 16: 1390 32: 1375 64: 1406 With MMAP disabled: 1486 Without the patch: 1515ms (with most other results very nearby)

Status: NEW → ASSIGNED

Igor Bukanov

Assignee

Comment 46

•

18 years ago

(In reply to comment #45) > Using the test from comment #37, > > 1: 1485 > 2: 1422 > 4: 1375 > 8: 1390 > 16: 1390 > 32: 1375 > 64: 1406 Since the test from comment 37 makes sure that 32MB of memory is constantly allocated until the explicit GC call, the results should be unhindered by CPU cache effects. So I will switch the patch to use 4 CPU pages per GC chunk.

Igor Bukanov

Assignee

Comment 47

•

18 years ago

Attached patch v12 (with bit fields) (obsolete) — Details — Splinter Review

The new version uses 16K chunks with 4 4K-sized arenas.

Attachment #279160 - Attachment is obsolete: true

Attachment #279715 - Flags: review?(brendan)

Attachment #279160 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 48

•

18 years ago

Attached patch v13 (with bit fields) (obsolete) — Details — Splinter Review

In the new version I renamed gcUnscannedArenaTop back to gcUnscannedArenaStackTop to minimize the patch.

Attachment #279715 - Attachment is obsolete: true

Attachment #279716 - Flags: review?(brendan)

Attachment #279715 - Flags: review?(brendan)

Igor Bukanov

Assignee

Updated

•

18 years ago

Blocks: 333236

Igor Bukanov

Assignee

Comment 49

•

18 years ago

Attached patch v14 (with bit fields) (obsolete) — Details — Splinter Review

The new version fixes the wrong assert in FLAGP_TO_THING macro and optimizes JS_CallTracer for string and double things. There the code uses THING_TO_FLAGP(thing, thingSize) with a compile-time constant as thingSize so a compiler has more options for the optimization.

Attachment #279716 - Attachment is obsolete: true

Attachment #280260 - Flags: review?(brendan)

Attachment #279716 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 50

•

18 years ago

Attached patch v15 (with bit fields) (obsolete) — Details — Splinter Review

Cleanups and optimizations to reduce the number of ifs taken on various code paths.

Attachment #280260 - Attachment is obsolete: true

Attachment #280265 - Flags: review?(brendan)

Attachment #280260 - Flags: review?(brendan)

Igor Bukanov

Assignee

Comment 51

•

18 years ago

Attached patch diff -b for v15 (obsolete) — Details — Splinter Review

Brendan Eich [:brendan]

Comment 52

•

18 years ago

Comment on attachment 280265 [details] [diff] [review] v15 (with bit fields) struct JSGCArenaInfo may be the first use of bit-fields in SpiderMonkey. Slight style preference for space before the : as well as after (aligning field sizes in a column is cool still). Need "these" before "indexes" here: * access either of indexes. nfreeArenas => numFreeArenas Nit: overflow line should underhang right operand of = here: chunkGap = (GC_ARENA_SIZE - (chunk & GC_ARENA_MASK)) & GC_ARENA_MASK; Non-nit: the overflow line and final & on first line are not needed since GC_ARENA_MASK = GC_ARENA_SIZE - 1. Too bad we can't use the extra arena allocated and wasted for alignment in the malloc-based chunk. RemoveChunkFromList would be shorter if you used a double-indirect "prevp" -- see bug 391211 comment 18. Suggest early return in NewGCArena for the js_gcArenasPerChunk == 1 case, to match DestroyGCArena. Nit: "If it is fully used, ..." (missing "is") in: * Try to allocate things from the last arena. If it fully used, check Old nit: does "Bag" mean something useful in AddToUnscannedBag etc.? It can't mean "set of values allowing redundant elements, i.e. a multiset". More comments in a bit, almost done. /be

Igor Bukanov

Assignee

Comment 53

•

18 years ago

(In reply to comment #52) > Too bad we can't use the extra arena allocated and wasted for alignment in the > malloc-based chunk. Well, malloc code is the last resort code when if mmap or some other alignment primitives is not available. For example, on POSIX systems (including Linux) there is posix_memalign that can be used as mmap replacement. So if on some platform that cannot use mmap one would like to avoid 3% overhead of overallocated chunk, then one can try that. > Old nit: does "Bag" mean something useful in AddToUnscannedBag etc.? It can't > mean "set of values allowing redundant elements, i.e. a multiset". A "bag" here means a container where one can add an element and from which one can retrive elements one by one. So this is not a set since the set allows to test for membership and such operation is not required here. It is not even a queue since the queue assumes a particular retrieval order and the code does not need to impose any particular ordering on things in the container. On the other hand such container resembles pretty much a real bag or sack.

Brendan Eich [:brendan]

Comment 54

•

18 years ago

memalign, right -- I was messing with trace-malloc lately, I should have thought of that. Bag => multiset in my book, so "sack" is better for me. But it's a very minor nit. /be

Igor Bukanov

Assignee

Comment 55

•

18 years ago

Attached patch v16 (obsolete) — Details — Splinter Review

The new version brings the following changes besides addressing the nits: 1. When the code uses malloc for the arena allocation, the gap between the malloc area and the first GC arena is stored outside GC chunk. It allowed to use a pointer for JSGCArenaInfo.prev field avoiding the bitfield and simplifying the code. 2. The code uses prepv method to implement the doubly-linked list of JSGCChunkInfo. My only excuse for not using it in the first place is that some initial version of the patch have used paged addresses for JSGCChunkInfo.(prev|next). With that setup it was not possible to implement the method. 3. For consistency with the rest of code AddThingToUnscannedSack and ScanDelayedChildren uses "a", not "arena" as name for a local variable holding JSGCArenaInfo pointer. 4. The new version fixed bad bug in the free phase loop in JS_GC. The code there unconditionally reset arenaList->lastCount even if the first arena arena were not freed. Too bad that the test suite does not covers that.

Attachment #280265 - Attachment is obsolete: true

Attachment #280266 - Attachment is obsolete: true

Attachment #280491 - Flags: review?(brendan)

Attachment #280265 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 56

•

18 years ago

Comment on attachment 280491 [details] [diff] [review] v16 Rest of v15 comments. I will re-review via interdiff a final patch and stamp it. /be ----- Note "bellow" misspelling: * NewGCChunk/DestroyGCChunk bellow for details. Backslash not indented enough in: #define GET_ARENA_CHUNK(arena, index) \ Nit: blank line before: rt->gcBytes += GC_ARENA_SIZE; in js_NewGCThing. Non-nit/question: why not do the gcBytes +/-= GC_ARENA_SIZE in New/DestroyGCArena? Nit: rename UNSCANNED_SPAN to THINGS_PER_UNSCANNED_BIT? Not sure why I urged "SPAN" before. Let's keep the name concrete in terms of a bitmap. Sanity check: replace "span" with "bit" in: * span as the real limit can be inside the span. to get: * bit as the real limit can be "inside" the bit. (I added scare-quotes ;-) and it works for me. How about you? Further check: * Skip free or already scanned things that share the span becomes: * Skip free or already scanned things that share the bit also reads nicely. In this more concrete light, perhaps s/Bag/Bitmap/. Could use a "so" before "push" here: * The thing is the first unscanned thing in the whole arena, push the Probably want "[the] unscanned stack", not "[the] unscan stack", in: * unscan stack since AddThingToUnscannedBag ensures that even for and a comma after "stack" would be nice. Nit: is it worthwhile to do explicit c.s.e. in: } else if (endIndex > THINGS_PER_ARENA(thingSize)) { endIndex = THINGS_PER_ARENA(thingSize); There's a divide hiding in the macro, but modern compilers should common for us. Typo or wrong name in this comment: * When JS_TraceChildren from the above calls JS_Trace that in turn on s/JS_Trace /JS_CallTracer /. Actually, grep on current source finds two such misnomers: $ grep -w JS_Trace *.c jsgc.c: * When JS_TraceChildren from the above calls JS_Trace that in turn jsgc.c: * after it calls JS_Trace or JS_MarkGCThing for the last time, the Nit: the change_list label and goto seem like a candidate for a do-while loop. But this requires a goto downward to exit the for (;;) loop. And it requires more indentation. Still, house style tries to use downward gotos but not upward ones that easily rewrite to canonical loop statements. s/if/is/ in: * Optimize for string and double as their size if known and their tracing Canonical comment for: + /* unreachable */ uses "NOTREACHED", not "unreachable". -END-

Brendan Eich [:brendan]

Comment 57

•

18 years ago

(In reply to comment #55) > 4. The new version fixed bad bug in the free phase loop in JS_GC. The code > there unconditionally reset arenaList->lastCount even if the first arena arena > were not freed. Too bad that the test suite does not covers that. Can you contrive a testcase? /be

Igor Bukanov

Assignee

Comment 58

•

18 years ago

(In reply to comment #56) > Note "bellow" misspelling: I must learn the proper spelling by now :( > Non-nit/question: why not do the gcBytes +/-= GC_ARENA_SIZE in > New/DestroyGCArena? gcBytes is irrelevant to arena allocation code so I moved its checks and updates to js_NewGCThing/js_GC/js_DestroyArenaList, the places that define GC memory pressure heuristics. > In this more concrete light, perhaps s/Bag/Bitmap/. Do you mean to s/Sack/Bitmap/ in the latest patch. IMO it does not read good. Compare: static void AddThingToUnscannedSack(JSRuntime *rt, uint8 *flagp) ... if (RECURSION_TOO_DEEP()) AddThingToUnscannedSack(rt, flagp); versus static void AddThingToUnscannedBitmap(JSRuntime *rt, uint8 *flagp) ... if (RECURSION_TOO_DEEP()) AddThingToUnscannedBitmap(rt, flagp); Perhaps it is better to call the function DelayChildrenScanning. It matches ScanDelayedChildren and even reads better: static void DelayChildrenScanning(JSRuntime *rt, uint8 *flagp) ... if (RECURSION_TOO_DEEP()) DelayChildrenScanning(rt, flagp); After typing this i realized that it using DelayChildrenTracing/TraceDelayedChildren would be even better: static void DelayChildrenScanning(JSRuntime *rt, uint8 *flagp) ... if (RECURSION_TOO_DEEP()) DelayChildrenTracing(rt, flagp); else JS_TraceChildren(trc, thing, kind); ... JS_TraceChildren(trc, thing, kind); TraceDelayedChildren(trc); But that would require comment rewrite.

Brendan Eich [:brendan]

Comment 59

•

18 years ago

DelayTracingChildren is the better order among the words in the identifier. I like it! Looking forward to a new (last) patch. /be

Igor Bukanov

Assignee

Comment 60

•

18 years ago

Attached patch v17 (obsolete) — Details — Splinter Review

Addressing nits and replacing (un)scanned -> (un)traced. Now there is no words "bag" or "sack" in GC sources ;)

Attachment #280491 - Attachment is obsolete: true

Attachment #280521 - Flags: review?(brendan)

Attachment #280491 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 61

•

18 years ago

Comment on attachment 280521 [details] [diff] [review] v17 >+ goto change_list; > for (;;) { >+ if (a->list != arenaList) { >+ change_list: >+ arenaList = a->list; That's a downward goto, all right, but still a bit tricky. If a few cycles are not an issue here (I claim they are not), I'd prefer setting arenaList = NULL; before the for loop and letting the if do its thing. r+a=me with that. Thanks, terrific patch. /be

Attachment #280521 - Flags: review?(brendan)

Attachment #280521 - Flags: review+

Attachment #280521 - Flags: approval1.9+

Igor Bukanov

Assignee

Comment 62

•

18 years ago

Attached patch v18 (obsolete) — Details — Splinter Review

The new version skips goto and ifs in TraceDelayedChildren and unconditionally calls THINGS_PER_UNTRACED_BIT(thingSize) per each arena. This makes this rarely executed code simpler and trades storage of few temporaries for calculations.

Attachment #280521 - Attachment is obsolete: true

Attachment #280648 - Flags: review?(brendan)

Brendan Eich [:brendan]

Updated

•

18 years ago

Attachment #280648 - Flags: review?(brendan)

Attachment #280648 - Flags: review+

Attachment #280648 - Flags: approval1.9+

Brendan Eich [:brendan]

Comment 63

•

18 years ago

Forgot to ask that memalign comment would be nice (about how the malloc-only case could use memalign if available). /be

Igor Bukanov

Assignee

Updated

•

18 years ago

Blocks: 396007

Igor Bukanov

Assignee

Comment 64

•

18 years ago

Attached patch v18b (obsolete) — Details — Splinter Review

The new version contains comments suggestion to consider posix_memalign. Here is the inter-diff: @@ -555,16 +555,19 @@ NewGCChunk() return (p == MAP_FAILED) ? 0 : (jsuword) p; # else # error "Not implemented" # endif } #endif /* + * Implement the chunk allocation using over sized malloc if mmap can not + * be used. XXX Consider using posix_memalign here, see bug 396007. + * * Since malloc allocates pointers aligned on the word boundary, to get * js_gcArenasPerChunk aligned arenas, we need to malloc only * ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) - sizeof(size_t) * bytes. But since we stores the gap between the malloced pointer and the * first arena in the chunk after the chunk, we need to ask for * ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) * bytes to ensure that we always have room to store the gap. */

Attachment #280648 - Attachment is obsolete: true

Attachment #280710 - Flags: review+

Attachment #280710 - Flags: approval1.9+

Igor Bukanov

Assignee

Comment 65

•

18 years ago

Darmon, is it not too late for M8 check in?

Brendan Eich [:brendan]

Comment 66

•

18 years ago

New way is FIXME not XXX in such comments. Please change to that before committing. I hear M8 is done and this can go in when the tree is open for M9. /be

Igor Bukanov

Assignee

Comment 67

•

18 years ago

Attached patch v18c (obsolete) — Details — Splinter Review

Patch to check in with comments fixed. I

Attachment #280710 - Attachment is obsolete: true

Attachment #280991 - Flags: review+

Attachment #280991 - Flags: approval1.9+

Igor Bukanov

Assignee

Updated

•

18 years ago

Attachment #280991 - Attachment description: v1bc → v1c

Igor Bukanov

Assignee

Updated

•

18 years ago

Attachment #280991 - Attachment description: v1c → v18c

Igor Bukanov

Assignee

Comment 68

•

18 years ago

Attached patch v18d (obsolete) — Details — Splinter Review

Fixing one more typo.

Attachment #280991 - Attachment is obsolete: true

Attachment #280993 - Flags: review+

Attachment #280993 - Flags: approval1.9+

Igor Bukanov

Assignee

Comment 69

•

18 years ago

I checked in the patch from the comment 68 to the trunk: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189868942&maxdate=1189869125&who=igor%mir2.org

Status: ASSIGNED → RESOLVED

Closed: 18 years ago

Resolution: --- → FIXED

Igor Bukanov

Assignee

Comment 70

•

18 years ago

The patch busted Mac tinderboxes. Apparently on Mac MAP_ANONYMOUS is not defined.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Igor Bukanov

Assignee

Comment 71

•

18 years ago

Attached patch v19 — Details — Splinter Review

The new version uses a workaround for platforms where MAP_ANONYMOUS is not defined and only deprecated MAP_ANON is available. Here is the inter-diff: +++ js/src/jsgc.c 2007-09-15 17:37:01.000000000 +0200 @@ -96,16 +96,22 @@ # define JS_GC_USE_MMAP 0 #endif #if JS_GC_USE_MMAP # if defined(XP_WIN) # include <windows.h> # elif defined(XP_UNIX) || defined(XP_BEOS) # include <sys/mman.h> + +/* On Mac OS X MAP_ANONYMOUS is not defined. */ +# if !defined(MAP_ANONYMOUS) && defined(MAP_ANON) +# define MAP_ANONYMOUS MAP_ANON +# endif + # endif #endif I would appreciate if somebody can check that the patch compiles on Mac.

Attachment #280993 - Attachment is obsolete: true

Attachment #281009 - Flags: review?(brendan)

Attachment #281009 - Flags: approval1.9?

Brendan Eich [:brendan]

Comment 72

•

18 years ago

Comment on attachment 281009 [details] [diff] [review] v19 That should do it. /be

Attachment #281009 - Flags: review?(brendan)

Attachment #281009 - Flags: review+

Attachment #281009 - Flags: approval1.9?

Attachment #281009 - Flags: approval1.9+

Brendan Eich [:brendan]

Comment 73

•

18 years ago

I tested v19 on my Macbook Pro and it builds fine. /be

Igor Bukanov

Assignee

Comment 74

•

18 years ago

(In reply to comment #73) > I tested v19 on my Macbook Pro and it builds fine. Thanks, I checked in the patch from comment 71 to the trunk: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189876705&maxdate=1189876980&who=igor%mir2.org

Status: REOPENED → RESOLVED

Closed: 18 years ago → 18 years ago

Resolution: --- → FIXED

Igor Bukanov

Assignee

Comment 75

•

18 years ago

I will be away until 23:00 UTC / 16:00 PDT so if tinderbox testing reveals troubles, then the link in comments 74 should allow to take the patch away.

Ryan VanderMeulen [:RyanVM]

Comment 76

•

18 years ago

It appears this patch bumped up Lk on fxdbug-linux-tbox by about .25MB

Steve England [:stevee]

Comment 77

•

18 years ago

The jump's filed as bug 396299

Igor Bukanov

Assignee

Comment 78

•

18 years ago

I took out the patch again to investigate the increased leak.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Igor Bukanov

Assignee

Comment 79

•

18 years ago

I rechecked-in the patch from comment 71 to the trunk as taking it out had no influence on the leak counters: http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189947780&maxdate=1189947840&who=igor%mir2.org

Status: REOPENED → RESOLVED

Closed: 18 years ago → 18 years ago

Resolution: --- → FIXED

Benjamin Smedberg

Updated

•

18 years ago

Depends on: 396936

Igor Bukanov

Assignee

Updated

•

18 years ago

Depends on: 408409

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.