Using CPU pages as GC arenas

RESOLVED FIXED

Status

()

Core
DOM
--
enhancement
RESOLVED FIXED
11 years ago
11 years ago

People

(Reporter: Igor Bukanov, Assigned: Igor Bukanov)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 30 obsolete attachments)

v19
81.71 KB, patch
brendan
: review+
brendan
: approval1.9+
Details | Diff | Splinter Review
(Assignee)

Description

11 years ago
[This is a spin-off of bug 51 comment 157334]

It is worth to try to use CPU pages allocated via mmap or VirtualAlloc as GC arenas. It should allow for very fast js_GetGCThingFlags and would remove all the alignment complexity and overhead in the current jsgc.c.
(Assignee)

Comment 1

11 years ago
Created attachment 276722 [details] [diff] [review]
v0.1

This is backup of the initial implementation that does not use any mmap or VirtualAlloc calls. Instead it just allocs 9 * page_size bytes and uses 8 aligned pages inside this chunk with the overhead of 12.5%. I will test the code and later use it as a default implementation when mmap is not available on the target platform.
It would also allow us to map PROT_READ|PROT_WRITE, no PROT_EXEC -- defeating a class of security exploits that contrive to put the pc in a JS string or similar and execute its characters as instructions.

/be
(Assignee)

Comment 3

11 years ago
(In reply to comment #2)
> It would also allow us to map PROT_READ|PROT_WRITE, no PROT_EXEC -- defeating a
> class of security exploits that contrive to put the pc in a JS string or
> similar and execute its characters as instructions.

AFAICS !PROT_EXEC for GC thinsg does not help since malloc would continue to allocate charaters for strings. Thus spraying the heap with strings with various forms of NOP instructions chunks before the bad payload would still work.
(Assignee)

Comment 4

11 years ago
Created attachment 276841 [details] [diff] [review]
v0.2

Various fixes
Attachment #276722 - Attachment is obsolete: true
I was hoping we would get to the point where string character vectors would be allocated from ~PROT_EXEC memory.

/be
(Assignee)

Comment 6

11 years ago
Created attachment 277636 [details] [diff] [review]
v0.3

New versions with less overhead when mmap is not available plus infrastructure to  use several arenas per mmap call. The latter is necessary to minimize overhead of mmap call.
Attachment #276841 - Attachment is obsolete: true
(Assignee)

Comment 7

11 years ago
Created attachment 277949 [details] [diff] [review]
v0.4

The check point before adding mmap support on POSIX. Note that even without mmap support the patch shows faster allocation / GC on simple benchmarks.
(Assignee)

Updated

11 years ago
Attachment #277636 - Attachment is obsolete: true
(Assignee)

Comment 8

11 years ago
Created attachment 277973 [details] [diff] [review]
v0.5

The new version uses mmap on POSIX systems when available.
Attachment #277949 - Attachment is obsolete: true
(Assignee)

Comment 9

11 years ago
Created attachment 277979 [details] [diff] [review]
v0.6

The new version fixes warnings with the last one when compiling FF.
Attachment #277973 - Attachment is obsolete: true
(Assignee)

Comment 10

11 years ago
Created attachment 278024 [details] [diff] [review]
v1

The new version adds VirtualAlloc calls under Windows. This is based on info from http://msdn2.microsoft.com/en-us/library/aa366887.aspxbut I have no means to test it.
Attachment #277979 - Attachment is obsolete: true
(Assignee)

Comment 11

11 years ago
Created attachment 278061 [details] [diff] [review]
v2

This is optimized version of v1. Notes on the patch:

1. Patch uses 4K arenas aligned on 4K boundaries to allocate GC things. This is hard coded even with 64-bit CPU as x64 uses 4K CPU pages. If mmap is not available, then to get properly alignment arenas the code allocates 32K and uses seven arenas in the block that are guaranteed to be aligned on 4K boundary.

2. Even with mmap available the patch allocates arenas in blocks to minimize the overhead of mmap.

3. As the page size is not a compile-time constant, for extra portability the code checks for the page size at startup and switches to the fall back schema if the size exceeds some sanity limit. Since the arena size is a compile-time constant, this runtime checks adds no visible overhead.

4. The XP_WIN part of the code is not tested beyond parsing by C compiler and based on http://msdn2.microsoft.com/en-us/library/aa366887.aspx.

5. Various synthetic benchmarks like:

function test(n)
{
    var now = Date.now;
    var startTime = now();

    var o = null;
    var array = [];
    // Interleave garbage with reachable things
    while (n-- != 0) {
        var x = 1.1 + n;
        if (n % 2)
            array.push(x);
        x = null;
    }
    for (var i = 0; i != 20; ++i) 
        gc();
}

test(100 * 1000);

shows a slight speedup of about 1% with the patch.
Attachment #278024 - Attachment is obsolete: true
Attachment #278061 - Flags: review?(brendan)
I will review by Monday, probably by tomorrow.

/be
(Assignee)

Comment 13

11 years ago
Created attachment 278240 [details] [diff] [review]
v3

The new version fixes a bug in OOM detection with mmap. When memory is not available, mmap returns MAP_FAILED or (void *) -1, not NULL.
Attachment #278061 - Attachment is obsolete: true
Attachment #278240 - Flags: review?(brendan)
Attachment #278061 - Flags: review?(brendan)
(Assignee)

Comment 14

11 years ago
Created attachment 278241 [details] [diff] [review]
v4

The new version allows checks allows to explicitly define JS_GC_USE_MMAP during compilation to either 0 or 1 to disable mmap completely or to forces the inclusion of proper system headers.
Attachment #278240 - Attachment is obsolete: true
Attachment #278241 - Flags: review?(brendan)
Attachment #278240 - Flags: review?(brendan)
Comment on attachment 278241 [details] [diff] [review]
v4

The major comment starting:

     * Encodes a flag indicating if the arena is the first in the block, the

is too ragged-right -- reformat as if by Q commands in vim with tw=78. Also:

     * the index of arena is either the index of a free arena holding

should say "the index of the arena" or "the index of this arena in its block" or something like that.

In:

 * All blocks that have at least one free arena are put to the doubly-linked

s/put to/put in/

I saw another "put" abusage, or three (two pre-existing): "

 * All blocks that have at least one free arena are put to the doubly-linked

for these, the simplest fix is "put on".

 * the head of linked list of free arenas in the block together with previous

how about "the head of a linked free arena list for the block" or "the head of the block's free arena list"?

 * it is removed from the list and BLOCK_ALL_USED_FLAG is set indicating that

BLOCK_ALL_USED_FLAG is not defined, so this comment seems stale.

One question: what bounds the number of mmap calls, if anything? These are not as cheap as one might like, and may consume enough OS overhead that we would prefer bigger chunks than 7 or 8 arenas' worth.

More comments in a bit, feel free to put up a new patch.

/be
Sorry, I meant to suggest "put on" for

 * All blocks that have at least one free arena are put to the doubly-linked

not "put in". "Inserted in" but "put on". Micro-nit on my own comment, but it was inconsistent with itself (later recommending "put on" over "put to").

/be
(Assignee)

Comment 17

11 years ago
(In reply to comment #15)
> One question: what bounds the number of mmap calls, if anything?

In the current patch the limit on the number of arenas in the block is limited by 2^ARENA_INDEX_BITS - 1 or 2^(JS_GC_ARENA_SHIFT - 1) - 1 which gives the maximum 2047 arenas per block limiting the block size by 8MB.

> These are not
> as cheap as one might like, and may consume enough OS overhead that we would
> prefer bigger chunks than 7 or 8 arenas' worth.

With synthetic benchmark like 
  for (var i = 0; i != 1000*1000; ++i) var tmp = i+0.1;

I have not noticed a difference with 4 or more arenas on Linux, so the patch uses 8 under assumption that on Windows, where system calls are more expensive, they are not that much worse than on Linux. But clearly the final value should be exposed by tests on Windows.

With no mmap available the patch defaults to 7 and not 8 arenas so the total size of malloc block would be the same as with mmap. In principle for less overhead for string and double GC things the default may be set to 15, but then that has to be balanced with heap fragmentation.
(Assignee)

Comment 18

11 years ago
Created attachment 278473 [details] [diff] [review]
v4b

Fixing comments as suggested.
Attachment #278241 - Attachment is obsolete: true
Attachment #278473 - Flags: review?(brendan)
Attachment #278241 - Flags: review?(brendan)
Comment on attachment 278473 [details] [diff] [review]
v4b

More comments, sorry for incremental review. One more cycle after this, new patch optional (if you are up still).

"Block" is typically an i/o unit, atomic to some part of a storage hierarchy, not a cluster of arenas or pages. Suggest Cluster or Chunk (latter is as short as Block and less specific). If we really wanted the right words for a hierarchy of Arena/Block, we might prefer Page and Cluster or Segment. Segment implies contiguous span of memory. But Chunk would suffice and be better than Block in my book.

s/gcUnscannedArenaStackTop/gcUnscannedStackTop/ or even gcUnscannedArenaTop -- Top implies Stack.

Please remove pre-existing FINALIZE_LEN deadwood.

Need Windows (and Mac?) build buddies -- I will help test Mac tonight if needed. Crowder, can you try Windows?

In:

 * Effectively things are allocated from the start of things' area and flags
 * are allocated from the end of flags' area. Such arrangement avoids

please use "their" instead of "things'" and "flags'". Also a comma after "Effectively", but I think you could just lose "Effectively" altogether.

Just wondering why the JS_GC_ prefixing of macros -- GC_ might not be enough in light of Windows.h?

/be
(Assignee)

Comment 20

11 years ago
Created attachment 278555 [details] [diff] [review]
v5

Besides addressing nits and doing renames as suggested above, the patch fixes js_InitGC to set js_gcUseMmap to false with too small/too big pages.

In the new patch only JS_GC_USE_MMAP uses JS_GC_ prefix as the macro can be set from the command line. The rest of macros just use GC_ when necessary.
Attachment #278473 - Attachment is obsolete: true
Attachment #278555 - Flags: review?(brendan)
Attachment #278473 - Flags: review?(brendan)
(Assignee)

Comment 21

11 years ago
(In reply to comment #19)
> If we really wanted the right words for a
> hierarchy of Arena/Block, we might prefer Page and Cluster or Segment.

In an initial prototype of the patch I replaced arena by page and used the former to mean "chunk" in the new patch. But then I realized that size(gc arena) != size(cpu page) in general and, to avoid confusion with CPU pages and extra renames in js_GC and other places explicitly accessing gc arenas, I kept the arena.

On the other hand, given the alignment requirements, "page" plays nicely with "chunk" and comments can easily clarify that gc page is not a cpu page. So should I rename JSGCArenaInfo a -> JSGCPageInfo page and update the comments accordingly?
(Assignee)

Comment 22

11 years ago
Created attachment 278557 [details] [diff] [review]
v5b

jsgc.c currently uses the word "chunk" to mean "Number of things covered by a single bit of JSGCArena.unscannedThings." The new patch changes that usage into "segment" as "chunk" now means a sequence of GC arenas.
Attachment #278555 - Attachment is obsolete: true
Attachment #278557 - Flags: review?
Attachment #278555 - Flags: review?(brendan)

Comment 23

11 years ago
v5b patch introduces the following new warnings:

jsgc.c(2615) : warning C4018: '<' : signed/unsigned mismatch
jsgc.c(2678) : warning C4018: '<' : signed/unsigned mismatch

The test-suite isn't working for me on Windows right now, I'll play with it in the morning.
(Assignee)

Comment 24

11 years ago
Created attachment 278560 [details] [diff] [review]
v6

Here is an attempt to fix the warnings reported by MSVC. I think it comes uint16->int32 promotion rule in arena->list->thingSize + 1 where thingSize is uint16 as 1 is int, not uint, literal. Thus the new patch uses explicit (uint32) 1.
Attachment #278557 - Attachment is obsolete: true
Attachment #278560 - Flags: review?(brendan)
Attachment #278557 - Flags: review?
(Assignee)

Comment 25

11 years ago
When testing on Windows, it would be nice to know how the timing with an optimized build of jsshell for following test is affected when one change 8 to 1-2-4-16 in lines 928-929, js_InitGC, 
            js_gcArenasPerChunk = (arenasPerPage < 8)
                                  ? 8
                                  : (size_t) arenasPerPage;

The test:

function test()
{
    var startTime = Date.now();
    var i = 1000*1000;
    var tmp = 0;
    do {
        tmp += 0.1;
    } while (--i != 0);
    tmp = null;
    gc();
    var time = Date.now() - startTime; 
    print("Test time: "+time+"ms");
}

test();

This will measure the overhead of mmap/VirtualAlloc on Posix/Windows.

On my 1.1Ghz Pentium M laptom the best timing from 20 runs was:

js_gcArenasPerChunk test-timing   sys-timing
1                    454            12
2                    431             9
4                    428             4
8                    432             5
16                   428             5

where sys-timing is the best system time as reported by
time ./Linux_All_OPT.OBJ/js ~/m/y.js > /dev/null

So I will change the default from 8 to 4 under Linux since it seems that very slight but increase when going from 4 to 8 pages is real.

Comment 26

11 years ago
I was getting results all over the map with i == 1,000,000, so I made it 10,000,000 instead (best result of ten runs of test()):

1: 2203
2: 2140
4: 2094
8: 2094
16: 2091

Alas, still getting the same compiler warnings (new line numbers):

jsgc.c(2616) : warning C4018: '<' : signed/unsigned mismatch
jsgc.c(2679) : warning C4018: '<' : signed/unsigned mismatch
(Assignee)

Comment 27

11 years ago
(In reply to comment #25)
> js_gcArenasPerChunk test-timing   sys-timing
> 1                    454            12
> 2                    431             9
> 4                    428             4
> 8                    432             5
> 16                   428             5

Here is the results without the patch:
> without-the-patch    474             5
> patch with no mmap   470             6

Where "patch with no mmap" are results of the patch compiled with -DJS_GC_USE_MMAP=0.
(Assignee)

Comment 28

11 years ago
(In reply to comment #26)
> I was getting results all over the map with i == 1,000,000, so I made it
> 10,000,000 instead (best result of ten runs of test()):
> 
> 1: 2203
> 2: 2140
> 4: 2094
> 8: 2094
> 16: 2091

What about 24 and 32 and without the patch? I just want to confirm that 8 reaches plateau on Windows as well. 
(Assignee)

Comment 29

11 years ago
Created attachment 278569 [details] [diff] [review]
v7

This should fix all the warnings.
Attachment #278560 - Attachment is obsolete: true
Attachment #278569 - Flags: review?(brendan)
Attachment #278560 - Flags: review?(brendan)

Comment 30

11 years ago
> What about 24 and 32 and without the patch? I just want to confirm that 8
> reaches plateau on Windows as well. 
> 

24: 2150
32: 2134

Without the patch:

2088!  And hovering in the mid-to-low 2090's for most runs.  The new version does fix all warnings, however.

(Assignee)

Comment 31

11 years ago
(In reply to comment #30)
> > What about 24 and 32 and without the patch? I just want to confirm that 8
> > reaches plateau on Windows as well. 
> > 
> 
> 24: 2150
> 32: 2134

So this means that VirtualAlloc(4 CPU pages) brings the lowest overall overhead on Windows as well.

> 
> Without the patch:
> 
> 2088!

Interesting. It seems that allocating of 9K is extremely optimized on Windows.

Could you repeat the test with 4, 8 and 16 gc arenas per chunk and modified test case where double is replaced by object:

function test()
{
    var startTime = Date.now();
    var i = 10*1000*1000 / 4; // sizeof(object) = 4*sizeof(double)
    var tmp = 0;
    do {
        tmp = {};
    } while (--i != 0);
    tmp = null;
    gc();
    var time = Date.now() - startTime; 
    print("Test time: "+time+"ms");
}

test();

With the patch the total memory allocation for the test should be 10% less so it would be interesting to compare with/without patch cases.
(Assignee)

Comment 32

11 years ago
Timing with the test case from comment 31 on a 1.1 GHz Pentium M Laptop under Fedora 7:

js_gcArenasPerChunk test-timing
1                    2477            
2                    2450            
4                    2433            
8                    2431            
16                   2413
24                   2414            
32*                  2426
64*                  2424
patch with no mmap   2424
no patch             2435   

Now it seems 16 arenas is best when allocating objects.

* GC_ARENAS_PER_CPU_PAGE_LIMIT was altered to be 64 during the run
From comment 26, 16 arenas looks best on Windows too, right? How about Mac OS X?

/be

Comment 34

11 years ago
(In reply to comment #33)
> From comment 26, 16 arenas looks best on Windows too, right? How about Mac OS
> X?

It seems so, though I get equivalent or very, very slightly better results without the patch on Win32.
(Assignee)

Comment 35

11 years ago
(In reply to comment #34)
> It seems so, though I get equivalent or very, very slightly better results
> without the patch on Win32.

One more question about Windows performance. What happens if GC_ARENAS_PER_CPU_PAGE_LIMIT is altered to 1000 and js_gcArenasPerChunk is forced to be 128/256 with double/object test cases?


Comment on attachment 278569 [details] [diff] [review]
v7

 * from the end of their area. Such arrangement avoids calculating the

Nit: could just say "This avoids ..." and wrap better (maybe).

 * Macros to decode/encode previous arena on the GC list and chunk gap

Nit: "the previous arena ..."

Couldn't you use bit-fields instead of word1 and word2 and BIG_UGLY_MACROS (I know, I started that trend long ago ;-)?

s/\<bi\>/ci/g (block=>chunk renaming fallout)

Nit: 1U is slightly easier on the eyes than (uint32) 1, and just as good.

Suggest avoiding SEGMENT and just using SPAN (spanIndex, etc.) in context of UNSCANNED_ and the like -- just as clear or moreso because shorter, less associated with large (multi-"page") runs of memory, and (yeah) shorter.

I'm ok with arena/chunk instead of page/chunk or similar. It's hard to get these names right, but we're close and arena is traditional.

Need clarity on perf at large number of OS pages per chunk values on top three platforms, and nitpicks if you can, and I'll r/a+.

/be
(Assignee)

Comment 37

11 years ago
Here is another test that makes everything allocated reachable until the explicit GC call not to depend on the effects of the last ditch GC:

function test()
{
    var startTime = Date.now();
    var i = 1000*1000;
    var o = null;
    do {
        o = {o: o};
    } while (--i != 0);
    o = null;
    gc();
    var time = Date.now() - startTime; 
    print("Test time: "+time+"ms");
}

test();

When I run it in Linux console with no X11 running, I got the following results on 1.1 GHz Pentium M laptop with Fedora 7 for optimized js shell

arenas per chunk    time of the best among 10 runs
1                   1882
2                   1871
3                   1873
4                   1859   
6                   1864
8                   1850
12                  1858
16                  1859
24                  1846
32                  1854
48                  1864
64                  1852
96                  1871
128                 1856
192                 1855
256                 1850
no mmap             1860
no patch            1881
(Assignee)

Comment 38

11 years ago
Created attachment 278703 [details] [diff] [review]
v8

The new patch follows suggestions from comment 36 except the bit field ides. I will try that in the next patch.
Attachment #278569 - Attachment is obsolete: true
Attachment #278569 - Flags: review?(brendan)
(Assignee)

Comment 39

11 years ago
Created attachment 278747 [details] [diff] [review]
v9 (using bit fields)

The version of the patch that uses bit fields in JSGCArenaInfo instead of explicit bit manipulations. This indeeds reduces the number of macros, adds clarity and decreases the number of ifs as the previous patch contained code like:

a = GET_PREV_ARENA(a);
if (!a)
    jump;

where GET_PREV_ARENA already contained the check for NULL address.

But to proceed with bit fields, I need to verify that on Windows/Mac compilers implements the bit fields properly. So the patch contains:

JS_STATIC_ASSERT(sizeof(JSGCArenaInfo) == sizeof(jsuword) * 4);

So does it compile on Windows/Mac?

Comment 40

11 years ago
Running with the current patch under windows, I hit the following assertion:
Assertion failure: arenaList->lastCount > 0, at jsgc.c:2622
(Assignee)

Comment 41

11 years ago
Created attachment 279069 [details] [diff] [review]
v10 (using bit fields)

The new patch fixes the asserts in the previous bit-field patch.
Attachment #278703 - Attachment is obsolete: true
Attachment #278747 - Attachment is obsolete: true
(Assignee)

Comment 42

11 years ago
Comment on attachment 279069 [details] [diff] [review]
v10 (using bit fields)

The patch passes shell's test suite and fF smoke test.
Attachment #279069 - Flags: review?(brendan)
(Assignee)

Comment 43

11 years ago
For performance tuning I would like to know the results for the test from comment 37 on Windows/Mac with the following values of js_gcArenasPerChunk:

1 2 4 8 16 32 64

For that just add in js_InitGC from jsgc.c after the line:
js_gcArenasPerChunk = JS_MAX((uint32) arenasPerPage, 8);

a line like
js_gcArenasPerChunk = <the desired value>;

For completeness it would be interesting to know the results without the patch and the results with mmap/VirtaulAlloc disabled. For the later change the line js_InitGC that reads:
if (arenasPerPage - 1 <= (size_t) (GC_ARENAS_PER_CPU_PAGE_LIMIT - 1)) { 

into something like
if (0 && arenasPerPage - 1 <= (size_t) (GC_ARENAS_PER_CPU_PAGE_LIMIT - 1)) { 
(Assignee)

Comment 44

11 years ago
Created attachment 279160 [details] [diff] [review]
v11 (with bit fields)

The new version fixes JS_GCMETER and uses 1K-sized arenas when mmap is not available  to minimize the overhead of aligned allocation implemented through oversized malloc.
Attachment #279069 - Attachment is obsolete: true
Attachment #279160 - Flags: review?(brendan)
Attachment #279069 - Flags: review?(brendan)

Comment 45

11 years ago
Using the test from comment #37,

1:  1485
2:  1422
4:  1375
8:  1390
16: 1390
32: 1375
64: 1406

With MMAP disabled:
1486

Without the patch:
1515ms (with most other results very nearby)
Status: NEW → ASSIGNED
(Assignee)

Comment 46

11 years ago
(In reply to comment #45)
> Using the test from comment #37,
> 
> 1:  1485
> 2:  1422
> 4:  1375
> 8:  1390
> 16: 1390
> 32: 1375
> 64: 1406

Since the test from comment 37 makes sure that 32MB of memory is constantly allocated until the explicit GC call, the results should be unhindered by CPU cache effects. 

So I will switch the patch to use 4 CPU pages per GC chunk.
(Assignee)

Comment 47

11 years ago
Created attachment 279715 [details] [diff] [review]
v12 (with bit fields)

The new version uses 16K chunks with 4 4K-sized arenas.
Attachment #279160 - Attachment is obsolete: true
Attachment #279715 - Flags: review?(brendan)
Attachment #279160 - Flags: review?(brendan)
(Assignee)

Comment 48

11 years ago
Created attachment 279716 [details] [diff] [review]
v13 (with bit fields)

In the new version I renamed gcUnscannedArenaTop back to gcUnscannedArenaStackTop to minimize the patch.
Attachment #279715 - Attachment is obsolete: true
Attachment #279716 - Flags: review?(brendan)
Attachment #279715 - Flags: review?(brendan)
(Assignee)

Updated

11 years ago
Blocks: 333236
(Assignee)

Comment 49

11 years ago
Created attachment 280260 [details] [diff] [review]
v14 (with bit fields)

The new version fixes the wrong assert in FLAGP_TO_THING macro and optimizes JS_CallTracer for string and double things. There the code uses THING_TO_FLAGP(thing, thingSize) with a compile-time constant as thingSize so a compiler has more options for the optimization.
Attachment #279716 - Attachment is obsolete: true
Attachment #280260 - Flags: review?(brendan)
Attachment #279716 - Flags: review?(brendan)
(Assignee)

Comment 50

11 years ago
Created attachment 280265 [details] [diff] [review]
v15 (with bit fields)

Cleanups and optimizations to reduce the number of ifs taken on various code paths.
Attachment #280260 - Attachment is obsolete: true
Attachment #280265 - Flags: review?(brendan)
Attachment #280260 - Flags: review?(brendan)
(Assignee)

Comment 51

11 years ago
Created attachment 280266 [details] [diff] [review]
diff -b for v15
Comment on attachment 280265 [details] [diff] [review]
v15 (with bit fields)

struct JSGCArenaInfo may be the first use of bit-fields in SpiderMonkey. Slight style preference for space before the : as well as after (aligning field sizes in a column is cool still).

Need "these" before "indexes" here:

     * access either of indexes.

nfreeArenas => numFreeArenas

Nit: overflow line should underhang right operand of = here:

    chunkGap = (GC_ARENA_SIZE - (chunk & GC_ARENA_MASK)) &
        GC_ARENA_MASK;

Non-nit: the overflow line and final & on first line are not needed since GC_ARENA_MASK = GC_ARENA_SIZE - 1.

Too bad we can't use the extra arena allocated and wasted for alignment in the malloc-based chunk.

RemoveChunkFromList would be shorter if you used a double-indirect "prevp" -- see bug 391211 comment 18.

Suggest early return in NewGCArena for the js_gcArenasPerChunk == 1 case, to match DestroyGCArena.

Nit: "If it is fully used, ..." (missing "is") in:

         * Try to allocate things from the last arena. If it fully used, check

Old nit: does "Bag" mean something useful in AddToUnscannedBag etc.? It can't mean "set of values allowing redundant elements, i.e. a multiset".

More comments in a bit, almost done.

/be
(Assignee)

Comment 53

11 years ago
(In reply to comment #52)
> Too bad we can't use the extra arena allocated and wasted for alignment in the
> malloc-based chunk.

Well, malloc code is the last resort code when if mmap or some other alignment primitives is not available. For example, on POSIX systems (including Linux) there is posix_memalign that can be used as mmap replacement. So if on some platform that cannot use mmap one would like to avoid 3% overhead of overallocated chunk, then one can try that. 

> Old nit: does "Bag" mean something useful in AddToUnscannedBag etc.? It can't
> mean "set of values allowing redundant elements, i.e. a multiset".

A "bag" here means a container where one can add an element and from which one can retrive elements one by one. So this is not a set since the set allows to test for membership and such operation is not required here. It is not even a queue since the queue assumes a particular retrieval order and the code does not need to impose any particular ordering on things in the container. On the other hand such container resembles pretty much a real bag or sack.
memalign, right -- I was messing with trace-malloc lately, I should have thought of that.

Bag => multiset in my book, so "sack" is better for me. But it's a very minor nit.

/be
(Assignee)

Comment 55

11 years ago
Created attachment 280491 [details] [diff] [review]
v16

The new version brings the following changes besides addressing the nits:

1. When the code uses malloc for the arena allocation, the gap between the malloc area and the first GC arena is stored outside GC chunk. It allowed to use a pointer for JSGCArenaInfo.prev field avoiding the bitfield and simplifying the code.

2. The code uses prepv method to implement the doubly-linked list of JSGCChunkInfo. My only excuse for not using it in the first place is that some initial version of the patch have used paged addresses for JSGCChunkInfo.(prev|next). With that setup it was not possible to implement the method.

3. For consistency with the rest of code AddThingToUnscannedSack and ScanDelayedChildren uses "a", not "arena" as name for a local variable holding JSGCArenaInfo pointer.

4. The new version fixed bad bug in the free phase loop in JS_GC. The code there  unconditionally reset arenaList->lastCount even if the first arena arena were not freed. Too bad that the test suite does not covers that.
Attachment #280265 - Attachment is obsolete: true
Attachment #280266 - Attachment is obsolete: true
Attachment #280491 - Flags: review?(brendan)
Attachment #280265 - Flags: review?(brendan)
Comment on attachment 280491 [details] [diff] [review]
v16

Rest of v15 comments. I will re-review via interdiff a final patch and stamp it.

/be

-----

Note "bellow" misspelling:

 * NewGCChunk/DestroyGCChunk bellow for details.

Backslash not indented enough in:

#define GET_ARENA_CHUNK(arena, index)                                   \

Nit: blank line before:

            rt->gcBytes += GC_ARENA_SIZE;

in js_NewGCThing.

Non-nit/question: why not do the gcBytes +/-= GC_ARENA_SIZE in New/DestroyGCArena?

Nit: rename UNSCANNED_SPAN to THINGS_PER_UNSCANNED_BIT? Not sure why I urged "SPAN" before. Let's keep the name concrete in terms of a bitmap. Sanity check: replace "span" with "bit" in:

             * span as the real limit can be inside the span.

to get:

             * bit as the real limit can be "inside" the bit.

(I added scare-quotes ;-) and it works for me. How about you? Further check:

                 * Skip free or already scanned things that share the span

becomes:

                 * Skip free or already scanned things that share the bit

also reads nicely.

In this more concrete light, perhaps s/Bag/Bitmap/.

Could use a "so" before "push" here:

         * The thing is the first unscanned thing in the whole arena, push the

Probably want "[the] unscanned stack", not "[the] unscan stack", in:

         * unscan stack since AddThingToUnscannedBag ensures that even for

and a comma after "stack" would be nice.

Nit: is it worthwhile to do explicit c.s.e. in:

            } else if (endIndex > THINGS_PER_ARENA(thingSize)) {
                endIndex = THINGS_PER_ARENA(thingSize);

There's a divide hiding in the macro, but modern compilers should common for us.

Typo or wrong name in this comment:

         * When JS_TraceChildren from the above calls JS_Trace that in turn on

s/JS_Trace /JS_CallTracer /. Actually, grep on current source finds two such misnomers:

$ grep -w JS_Trace *.c
jsgc.c:         * When JS_TraceChildren from the above calls JS_Trace that in turn
jsgc.c:         * after it calls JS_Trace or JS_MarkGCThing for the last time, the

Nit: the change_list label and goto seem like a candidate for a do-while loop. But this requires a goto downward to exit the for (;;) loop. And it requires more indentation. Still, house style tries to use downward gotos but not upward ones that easily rewrite to canonical loop statements.

s/if/is/ in:

     * Optimize for string and double as their size if known and their tracing

Canonical comment for:

+        /* unreachable */

uses "NOTREACHED", not "unreachable".

-END-
(In reply to comment #55)
> 4. The new version fixed bad bug in the free phase loop in JS_GC. The code
> there  unconditionally reset arenaList->lastCount even if the first arena arena
> were not freed. Too bad that the test suite does not covers that.

Can you contrive a testcase?

/be
(Assignee)

Comment 58

11 years ago
(In reply to comment #56)
> Note "bellow" misspelling:

I must learn the proper spelling by now :(

> Non-nit/question: why not do the gcBytes +/-= GC_ARENA_SIZE in
> New/DestroyGCArena?

gcBytes is irrelevant to arena allocation code so I moved its checks and updates to js_NewGCThing/js_GC/js_DestroyArenaList, the places that define GC memory pressure heuristics. 

> In this more concrete light, perhaps s/Bag/Bitmap/.

Do you mean to s/Sack/Bitmap/ in the latest patch. IMO it does not read good. Compare:

static void
AddThingToUnscannedSack(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
    AddThingToUnscannedSack(rt, flagp);

versus

static void
AddThingToUnscannedBitmap(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
    AddThingToUnscannedBitmap(rt, flagp);


Perhaps it is better to call the function DelayChildrenScanning. It matches ScanDelayedChildren and even reads better:

static void
DelayChildrenScanning(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
    DelayChildrenScanning(rt, flagp);


After typing this i realized that it using DelayChildrenTracing/TraceDelayedChildren would be even better:

static void
DelayChildrenScanning(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
    DelayChildrenTracing(rt, flagp);
else
    JS_TraceChildren(trc, thing, kind);

...
JS_TraceChildren(trc, thing, kind);
TraceDelayedChildren(trc);

But that would require comment rewrite.
DelayTracingChildren is the better order among the words in the identifier. I like it! Looking forward to a new (last) patch.

/be
(Assignee)

Comment 60

11 years ago
Created attachment 280521 [details] [diff] [review]
v17

Addressing nits and replacing (un)scanned -> (un)traced. 

Now there is no words "bag" or "sack" in GC sources ;)
Attachment #280491 - Attachment is obsolete: true
Attachment #280521 - Flags: review?(brendan)
Attachment #280491 - Flags: review?(brendan)
Comment on attachment 280521 [details] [diff] [review]
v17

>+    goto change_list;
>     for (;;) {
>+        if (a->list != arenaList) {
>+          change_list:
>+            arenaList = a->list;

That's a downward goto, all right, but still a bit tricky. If a few cycles are not an issue here (I claim they are not), I'd prefer setting arenaList = NULL; before the for loop and letting the if do its thing.

r+a=me with that. Thanks, terrific patch.

/be
Attachment #280521 - Flags: review?(brendan)
Attachment #280521 - Flags: review+
Attachment #280521 - Flags: approval1.9+
(Assignee)

Comment 62

11 years ago
Created attachment 280648 [details] [diff] [review]
v18

The new version skips goto and ifs in TraceDelayedChildren and unconditionally calls THINGS_PER_UNTRACED_BIT(thingSize) per each arena. This makes this rarely executed code simpler and trades storage of few temporaries for calculations.
Attachment #280521 - Attachment is obsolete: true
Attachment #280648 - Flags: review?(brendan)

Updated

11 years ago
Attachment #280648 - Flags: review?(brendan)
Attachment #280648 - Flags: review+
Attachment #280648 - Flags: approval1.9+
Forgot to ask that memalign comment would be nice (about how the malloc-only case could use memalign if available).

/be
(Assignee)

Updated

11 years ago
Blocks: 396007
(Assignee)

Comment 64

11 years ago
Created attachment 280710 [details] [diff] [review]
v18b

The new version contains comments suggestion to consider posix_memalign. Here is the inter-diff:

@@ -555,16 +555,19 @@ NewGCChunk()
         return (p == MAP_FAILED) ? 0 : (jsuword) p;
 # else
 #  error "Not implemented"
 # endif
     }
 #endif
 
     /*
+     * Implement the chunk allocation using over sized malloc if mmap can not
+     * be used. XXX Consider using posix_memalign here, see bug 396007. 
+     * 
      * Since malloc allocates pointers aligned on the word boundary, to get
      * js_gcArenasPerChunk aligned arenas, we need to malloc only
      *   ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) - sizeof(size_t)
      * bytes. But since we stores the gap between the malloced pointer and the
      * first arena in the chunk after the chunk, we need to ask for
      *   ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT)
      * bytes to ensure that we always have room to store the gap.
      */
Attachment #280648 - Attachment is obsolete: true
Attachment #280710 - Flags: review+
Attachment #280710 - Flags: approval1.9+
(Assignee)

Comment 65

11 years ago
Darmon, is it not too late for M8 check in? 
New way is FIXME not XXX in such comments. Please change to that before committing. I hear M8 is done and this can go in when the tree is open for M9.

/be
(Assignee)

Comment 67

11 years ago
Created attachment 280991 [details] [diff] [review]
v18c

Patch to check in with comments  fixed. I
Attachment #280710 - Attachment is obsolete: true
Attachment #280991 - Flags: review+
Attachment #280991 - Flags: approval1.9+
(Assignee)

Updated

11 years ago
Attachment #280991 - Attachment description: v1bc → v1c
(Assignee)

Updated

11 years ago
Attachment #280991 - Attachment description: v1c → v18c
(Assignee)

Comment 68

11 years ago
Created attachment 280993 [details] [diff] [review]
v18d

Fixing one more typo.
Attachment #280991 - Attachment is obsolete: true
Attachment #280993 - Flags: review+
Attachment #280993 - Flags: approval1.9+
(Assignee)

Comment 69

11 years ago
I checked in the patch from the comment 68 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189868942&maxdate=1189869125&who=igor%mir2.org
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
(Assignee)

Comment 70

11 years ago
The patch busted Mac tinderboxes. Apparently on Mac MAP_ANONYMOUS is not defined.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 71

11 years ago
Created attachment 281009 [details] [diff] [review]
v19

The new version uses a workaround for platforms where MAP_ANONYMOUS is not defined and only deprecated MAP_ANON is available. Here is the inter-diff: 

+++ js/src/jsgc.c	2007-09-15 17:37:01.000000000 +0200
@@ -96,16 +96,22 @@
 # define JS_GC_USE_MMAP 0
 #endif
 
 #if JS_GC_USE_MMAP
 # if defined(XP_WIN)
 #  include <windows.h>
 # elif defined(XP_UNIX) || defined(XP_BEOS)
 #  include <sys/mman.h>
+
+/* On Mac OS X MAP_ANONYMOUS is not defined. */
+#  if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
+#   define MAP_ANONYMOUS MAP_ANON
+#  endif
+
 # endif
 #endif
 

I would appreciate if somebody can check that the patch compiles on Mac.
Attachment #280993 - Attachment is obsolete: true
Attachment #281009 - Flags: review?(brendan)
Attachment #281009 - Flags: approval1.9?
Comment on attachment 281009 [details] [diff] [review]
v19

That should do it.

/be
Attachment #281009 - Flags: review?(brendan)
Attachment #281009 - Flags: review+
Attachment #281009 - Flags: approval1.9?
Attachment #281009 - Flags: approval1.9+
I tested v19 on my Macbook Pro and it builds fine.

/be
(Assignee)

Comment 74

11 years ago
(In reply to comment #73)
> I tested v19 on my Macbook Pro and it builds fine.

Thanks, I checked in the patch from comment 71 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189876705&maxdate=1189876980&who=igor%mir2.org
Status: REOPENED → RESOLVED
Last Resolved: 11 years ago11 years ago
Resolution: --- → FIXED
(Assignee)

Comment 75

11 years ago
I will be away until 23:00 UTC / 16:00 PDT so if tinderbox testing reveals troubles, then the link in comments 74 should allow to take the patch away.
It appears this patch bumped up Lk on fxdbug-linux-tbox by about .25MB
The jump's filed as bug 396299
(Assignee)

Comment 78

11 years ago
I took out the patch again to investigate the increased leak.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 79

11 years ago
I rechecked-in the patch from comment 71 to the trunk as taking it out had no influence on the leak counters:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189947780&maxdate=1189947840&who=igor%mir2.org
Status: REOPENED → RESOLVED
Last Resolved: 11 years ago11 years ago
Resolution: --- → FIXED

Updated

11 years ago
Depends on: 396936
(Assignee)

Updated

11 years ago
Depends on: 408409
You need to log in before you can comment on or make changes to this bug.