Closed Bug 671702 Opened 8 years ago Closed 8 years ago

Investigate small GC chunks

Categories

(Core :: JavaScript Engine, defect)

Tracking

RESOLVED WONTFIX

People

(Reporter: igor, Assigned: igor)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Keywords: memory-footprint, Whiteboard: [MemShrink:P2])

Attachments

(1 file, 7 obsolete files)

+++ This bug was initially created as a clone of Bug #669245 comment 13+++

To improve our heap fragmentation problem when we have many GC chunks with just a few arenas allocated, we should investigate small 64K-sized chunks (64K is the smallest amount of memory that Windows allocates via VirtualAlloc). With such chunks it should also be possible to make them per-compartment, with each chunk holding arenas from only a single compartment.
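
For orientation, here is a rough sketch of the chunk-size constants involved; GC_CHUNK_SHIFT and GC_CHUNK_SIZE are the real names in jsgc.h, but the 64K value below is the hypothetical small-chunk variant (not the current 1MB one) and GC_CHUNK_MASK is just illustrative:

#include <stddef.h>

static const size_t GC_CHUNK_SHIFT = 16;                          /* 64K = 2^16; currently 20 (1MB) */
static const size_t GC_CHUNK_SIZE  = size_t(1) << GC_CHUNK_SHIFT;
static const size_t GC_CHUNK_MASK  = GC_CHUNK_SIZE - 1;           /* chunks stay aligned on their size */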
Whiteboard: [MemShrink]
Assignee: general → igor
You can just move the compartment pointer into each arena. Want to try that first? It's probably easier (at the end of the arena, for example). Thanks for working on this, Igor. Awesome stuff.
(In reply to comment #1)
> You can just move the compartment pointer into each arena.

I am not sure what you mean here. Currently the compartment pointer is already per arena (it is stored in the arena header).
Attached patch v1 (obsolete) — Splinter Review
The patch makes chunks per-compartment. 

In a jemalloc-enabled browser (64-bit Linux build) running http://v8.googlecode.com/svn/data/benchmarks/v6/run.html I clearly see up to a 20% regression on the first run after a browser restart. However, for the second and later runs the situation looks rather OK. Here is the data for different chunk sizes:

             Base     32K    64K     128K    512K

Score:       3917     3440   3748    3923    4020

Richards:    6227     6146   6196    5998    6047
DeltaBlue:   3811     2290   2698    3848    4172
Crypto:      6353     6268   6264    6369    6101
RayTrace:    2863     2682   2804    2765    2928
EarleyBoyer: 3811     3210   4335    4085    4273
RegExp:      1647     1443   1542    1586    1647
Splay:       5215     5207   5300    5430    5354

I.e., with 128K compartment-private chunks the score matches the score without the patch.

With SunSpider the situation is similar, but the differences are smaller than with V8.

Now I need to see memory utilization with the patch.
I'm a little confused. Are the numbers in the table from the first run after a restart or from a later run?
(In reply to comment #4)
> I'm a little confused. Are the numbers in the table from the first run after
> a restart or from a later run?

The numbers are for later runs. The first run after browser startup with 64K chunks scores about 15% lower than the numbers in the table; with 128K chunks the regression is about 10%.

For references, here are the JS shell numbers for V8 with 128K chunks allocated via mmap:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      *1.058x as slow*  1528.2ms +/- 0.1%   1617.4ms +/- 0.3%     significant

=============================================================================

  v8:             *1.058x as slow*  1528.2ms +/- 0.1%   1617.4ms +/- 0.3%     significant
    crypto:       ??                 203.1ms +/- 0.3%    203.6ms +/- 0.3%     not conclusive: might be *1.003x as slow*
    deltablue:    *1.015x as slow*   277.2ms +/- 0.4%    281.4ms +/- 0.2%     significant
    earley-boyer: *1.112x as slow*   250.9ms +/- 0.3%    279.0ms +/- 0.6%     significant
    raytrace:     *1.021x as slow*   196.2ms +/- 0.2%    200.4ms +/- 0.2%     significant
    regexp:       *1.007x as slow*   193.4ms +/- 0.4%    194.8ms +/- 0.4%     significant
    richards:     *1.013x as slow*   206.4ms +/- 0.6%    209.1ms +/- 0.7%     significant
    splay:        *1.24x as slow*    201.0ms +/- 0.3%    249.0ms +/- 1.2%     significant




With 64K chunks the numbers are:

=============================================================================

** TOTAL **:      *1.090x as slow*  1528.2ms +/- 0.1%   1666.1ms +/- 0.5%     significant

=============================================================================

  v8:             *1.090x as slow*  1528.2ms +/- 0.1%   1666.1ms +/- 0.5%     significant
    crypto:       *1.008x as slow*   203.1ms +/- 0.3%    204.8ms +/- 0.4%     significant
    deltablue:    *1.21x as slow*    277.2ms +/- 0.4%    336.2ms +/- 0.4%     significant
    earley-boyer: *1.101x as slow*   250.9ms +/- 0.3%    276.2ms +/- 0.3%     significant
    raytrace:     *1.022x as slow*   196.2ms +/- 0.2%    200.5ms +/- 0.2%     significant
    regexp:       *1.025x as slow*   193.4ms +/- 0.4%    198.2ms +/- 0.6%     significant
    richards:     *1.009x as slow*   206.4ms +/- 0.6%    208.4ms +/- 0.7%     significant
    splay:        *1.20x as slow*    201.0ms +/- 0.3%    241.9ms +/- 2.6%     significant


The try server builds with the patch: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ibukanov@mozilla.com-1e65f648fca3
about:memory stats after opening gmail, livejournal, gmaps, and bbc.co.uk, closing gmail, and running GC and CC. Without the patch:


167.76 MB (100.0%) -- explicit
├───68.02 MB (40.54%) -- js
│   ├──19.78 MB (11.79%) -- gc-heap-chunk-unused
│   ├──13.44 MB (08.01%) -- compartment([System Principal])
│   │  ├───8.52 MB (05.08%) -- gc-heap
│   │  │   ├──4.38 MB (02.61%) -- objects
│   │  │   ├──3.30 MB (01.97%) -- shapes
│   │  │   └──0.83 MB (00.50%) -- (5 omitted)
│   │  ├───2.01 MB (01.20%) -- scripts
│   │  ├───1.50 MB (00.89%) -- mjit-code
│   │  └───1.42 MB (00.84%) -- (5 omitted)
│   ├──10.42 MB (06.21%) -- compartment(http://maps.google.com/)
│   │  ├───4.54 MB (02.71%) -- gc-heap
│   │  │   ├──1.94 MB (01.16%) -- objects
│   │  │   ├──1.63 MB (00.97%) -- arena-unused
│   │  │   ├──0.85 MB (00.51%) -- shapes
│   │  │   └──0.12 MB (00.07%) -- (4 omitted)
│   │  ├───2.73 MB (01.63%) -- mjit-code
│   │  ├───1.63 MB (00.97%) -- scripts
│   │  └───1.52 MB (00.91%) -- (5 omitted)
│   ├───8.00 MB (04.77%) -- stack
│   ├───7.31 MB (04.36%) -- compartment(http://www.bbc.co.uk/news/)
│   │   ├──3.34 MB (01.99%) -- gc-heap
│   │   │  ├──1.15 MB (00.68%) -- arena-unused
│   │   │  ├──1.12 MB (00.67%) -- objects
│   │   │  ├──0.94 MB (00.56%) -- shapes
│   │   │  └──0.14 MB (00.08%) -- (4 omitted)
│   │   ├──1.82 MB (01.09%) -- mjit-code
│   │   ├──1.09 MB (00.65%) -- (5 omitted)
│   │   └──1.05 MB (00.63%) -- scripts
│   ├───4.81 MB (02.87%) -- compartment(http://fpoling.livejournal.com/friends)
│   │   ├──2.49 MB (01.48%) -- gc-heap
│   │   │  ├──0.97 MB (00.58%) -- objects
│   │   │  ├──0.85 MB (00.51%) -- shapes
│   │   │  └──0.67 MB (00.40%) -- (5 omitted)
│   │   ├──1.38 MB (00.82%) -- (6 omitted)
│   │   └──0.95 MB (00.56%) -- scripts
│   ├───3.30 MB (01.97%) -- compartment(atoms)
│   │   ├──2.40 MB (01.43%) -- gc-heap
│   │   │  ├──1.59 MB (00.95%) -- strings
│   │   │  └──0.81 MB (00.48%) -- (6 omitted)
│   │   ├──0.91 MB (00.54%) -- string-chars
│   │   └──0.00 MB (00.00%) -- (6 omitted)
│   └───0.96 MB (00.57%) -- (4 omitted)
├───63.90 MB (38.09%) -- heap-unclassified
├───24.14 MB (14.39%) -- images
│   ├──24.03 MB (14.32%) -- content
│   │  ├──24.03 MB (14.32%) -- used
│   │  │  ├──22.33 MB (13.31%) -- uncompressed
│   │  │  └───1.71 MB (01.02%) -- raw
│   │  └───0.00 MB (00.00%) -- (1 omitted)
│   └───0.10 MB (00.06%) -- (1 omitted)
├────5.59 MB (03.33%) -- storage
│    └──5.59 MB (03.33%) -- sqlite
│       ├──2.75 MB (01.64%) -- places.sqlite
│       │  ├──2.49 MB (01.48%) -- cache-used
│       │  └──0.26 MB (00.16%) -- (2 omitted)
│       ├──1.97 MB (01.17%) -- (10 omitted)
│       └──0.87 MB (00.52%) -- other
├────4.65 MB (02.77%) -- layout
│    └──4.65 MB (02.77%) -- all
├────1.31 MB (00.78%) -- xpti-working-set
└────0.16 MB (00.09%) -- (2 omitted)


With the patch:

158.01 MB (100.0%) -- explicit
├───66.49 MB (42.08%) -- heap-unclassified
├───54.07 MB (34.22%) -- js
│   ├──13.58 MB (08.59%) -- compartment([System Principal])
│   │  ├───8.55 MB (05.41%) -- gc-heap
│   │  │   ├──4.36 MB (02.76%) -- objects
│   │  │   ├──3.25 MB (02.06%) -- shapes
│   │  │   └──0.95 MB (00.60%) -- (5 omitted)
│   │  ├───2.04 MB (01.29%) -- scripts
│   │  ├───1.56 MB (00.99%) -- mjit-code
│   │  └───1.42 MB (00.90%) -- (5 omitted)
│   ├──10.20 MB (06.45%) -- compartment(http://maps.google.com/)
│   │  ├───4.52 MB (02.86%) -- gc-heap
│   │  │   ├──1.93 MB (01.22%) -- objects
│   │  │   ├──1.62 MB (01.02%) -- arena-unused
│   │  │   ├──0.85 MB (00.54%) -- shapes
│   │  │   └──0.12 MB (00.07%) -- (4 omitted)
│   │  ├───2.61 MB (01.65%) -- mjit-code
│   │  ├───1.63 MB (01.03%) -- scripts
│   │  └───1.45 MB (00.92%) -- (5 omitted)
│   ├───8.00 MB (05.06%) -- stack
│   ├───6.87 MB (04.35%) -- compartment(http://www.bbc.co.uk/news/)
│   │   ├──3.04 MB (01.92%) -- gc-heap
│   │   │  ├──1.11 MB (00.70%) -- objects
│   │   │  ├──0.93 MB (00.59%) -- shapes
│   │   │  ├──0.87 MB (00.55%) -- arena-unused
│   │   │  └──0.13 MB (00.08%) -- (4 omitted)
│   │   ├──1.76 MB (01.11%) -- mjit-code
│   │   ├──1.05 MB (00.67%) -- scripts
│   │   └──1.02 MB (00.64%) -- (5 omitted)
│   ├───6.53 MB (04.13%) -- gc-heap-chunk-unused
│   ├───4.93 MB (03.12%) -- compartment(http://fpoling.livejournal.com/friends)
│   │   ├──2.58 MB (01.63%) -- gc-heap
│   │   │  ├──1.01 MB (00.64%) -- objects
│   │   │  ├──0.90 MB (00.57%) -- shapes
│   │   │  └──0.67 MB (00.42%) -- (5 omitted)
│   │   ├──1.37 MB (00.87%) -- (6 omitted)
│   │   └──0.98 MB (00.62%) -- scripts
│   ├───3.29 MB (02.08%) -- compartment(atoms)
│   │   ├──2.38 MB (01.51%) -- gc-heap
│   │   │  ├──1.61 MB (01.02%) -- strings
│   │   │  └──0.78 MB (00.49%) -- (6 omitted)
│   │   ├──0.91 MB (00.58%) -- string-chars
│   │   └──0.00 MB (00.00%) -- (6 omitted)
│   └───0.67 MB (00.42%) -- (4 omitted)
├───24.13 MB (15.27%) -- images
│   ├──24.02 MB (15.20%) -- content
│   │  ├──24.02 MB (15.20%) -- used
│   │  │  ├──22.32 MB (14.13%) -- uncompressed
│   │  │  └───1.70 MB (01.08%) -- raw
│   │  └───0.00 MB (00.00%) -- (1 omitted)
│   └───0.10 MB (00.07%) -- (1 omitted)
├────7.23 MB (04.57%) -- storage
│    └──7.23 MB (04.57%) -- sqlite
│       ├──2.72 MB (01.72%) -- places.sqlite
│       │  ├──2.46 MB (01.56%) -- cache-used
│       │  └──0.26 MB (00.17%) -- (2 omitted)
│       ├──2.15 MB (01.36%) -- (10 omitted)
│       ├──1.42 MB (00.90%) -- urlclassifier3.sqlite
│       │  ├──1.32 MB (00.84%) -- cache-used
│       │  └──0.10 MB (00.06%) -- (2 omitted)
│       └──0.93 MB (00.59%) -- other
├────4.62 MB (02.92%) -- layout
│    └──4.62 MB (02.92%) -- all
├────1.31 MB (00.83%) -- xpti-working-set
└────0.16 MB (00.10%) -- (2 omitted)

The numbers vary between runs, but a 15%-20% reduction in the size of the JS heap is pretty consistent.

With per-compartment chunks it should be possible to improve the stats and accurately calculate the per-compartment heap size, but it is for another bug.
Attached patch v2 (obsolete) — Splinter Review
The reason for v1 regressions is that currently we have:

GC_ARENA_ALLOCATION_TRIGGER = 30 * js::GC_CHUNK_SIZE

This constant defines the minimal threshold to run the last-ditch GC. Since it scales with the chunk size, smaller chunks mean a smaller threshold, and we ended up doing more GCs during the benchmarks.

The new patch fixes that by using an explicit 30 MB value for the constant. With this change, even with 64K chunks, V8 in the js shell shows no difference and V8 in the browser does not show a regression for the first run after start-up. The overall score in the browser is at worst 5% lower than the base numbers, but that is within the noise. With 128K chunks I see no regressions.
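
For concreteness, a minimal sketch of the change (the exact spelling in jsgc.h may differ):

#include <stddef.h>

/* Before (scaled with the chunk size, so smaller chunks meant more last-ditch GCs):
 *     GC_ARENA_ALLOCATION_TRIGGER = 30 * js::GC_CHUNK_SIZE
 * After: an absolute threshold, independent of the chunk size. */
static const size_t GC_ARENA_ALLOCATION_TRIGGER = 30 * 1024 * 1024;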
Attachment #546940 - Attachment is obsolete: true
Awesome! It might be useful to try a shell build with --enable-gctimer and see how the GC times differ when running on V8. It seems like there's a slight chance that conservative stack scanning could get slower because the chunk table is larger.
This might be beyond the scope of this bug, but it also might be easy to fix at the same time: How do you decide which of a compartment's chunks to allocate an object into?  jemalloc's heuristic is to always allocate into the chunk with the lowest address, but any stable ordering would do.
(In reply to comment #9)
> This might be beyond the scope of this bug, but it also might be easy to fix
> at the same time: How do you decide which of a compartment's chunks to
> allocate an object into? 

The patch puts all chunks with at least one available arena on a doubly-linked list whose head is stored in JSCompartment. When all arenas in a chunk are used, the chunk is removed from the list. Also, after the GC, empty chunks are removed from the list and added to the global pool of empty chunks. As before the patch, the empty chunks are returned to the system if they survive 3 GC cycles or when the browser is idle.

> jemalloc's heuristic is to always allocate into
> the chunk with the lowest address, but any stable ordering would do.

I guess instead of linking the chunks, the compartment could put them into an array that is sorted after the GC. But that is for another bug.
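
For illustration, a hypothetical sketch of that idea (std::vector and the bare Chunk pointer stand in for the real types):

#include <algorithm>
#include <stdint.h>
#include <vector>

struct Chunk;

static bool
ChunkAddressLess(Chunk *a, Chunk *b)
{
    return uintptr_t(a) < uintptr_t(b);
}

/* Sort the compartment's non-full chunks by address after a GC so that new
 * arenas get packed into the lowest-address chunks first, jemalloc-style. */
static void
SortChunksByAddress(std::vector<Chunk *> &chunks)
{
    std::sort(chunks.begin(), chunks.end(), ChunkAddressLess);
}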
> But this is for another bug.

Okay.  For those following along at home, we've been in bug 669245.
Bug 669611 will help with the evaluation of this bug in the browser.  Hopefully I'll land it on mozilla-inbound today.
Attached patch v3 (obsolete) — Splinter Review
In v3 I replaced the linked list of empty chunks with an array that also stores the age. This way the empty chunks are not dereferenced, which helped to remove a spike of cache/TLB misses in ExpireGCChunks.

With this patch I see no differences in V8 in the browser or the shell. Also, the GCTIMER output during a V8 run shows no substantial difference.
Attachment #547086 - Attachment is obsolete: true
(In reply to comment #12)
> Bug 669611 will help with the evaluation of this bug in the browser. 
> Hopefully I'll land it on mozilla-inbound today.

The patch shows fragmentation decreasing from 50% to 35% for the simple test from comment 6.
Comment on attachment 547465 [details] [diff] [review]
v3

The patch passes try server.

It could be made smaller, as it contains some cleanups I did while trying to identify the regressions in the initial versions, but I suppose those should not complicate the review.
Attachment #547465 - Flags: review?(wmccloskey)
Very nice results! Could you summarize the changes in a single comment?

Did you also benchmark this patch on a small device like your netbook? It might hurt more on such platforms.
So you got rid of the system/user split from bug 666058?  This needs some careful measurement.  And yes, a summary would be nice!
If each chunk holds objects from only one compartment, then it's still the case that a chunk holding objects from a user compartment doesn't hold objects from a system compartment, right?
Attached patch v4 (obsolete) — Splinter Review
The new version adds comments and removes unrelated changes. Here is a summary of all the changes:

1. Chunks are made 64K. Each chunk contains GC things from only one compartment. This eliminates the need for the system/user chunk separation and allows reclaiming all of a compartment's GC memory after the compartment is finished.

2. Each compartment maintains a doubly-linked list of all its chunks that have at least one free arena. Arenas are allocated from the list head. When all of a chunk's arenas are allocated, the chunk is removed from the list. If the GC frees at least one arena in a previously full chunk, the chunk is added back to the list. Also, any chunk that becomes empty during the GC is removed from the list and added to the global pool of empty chunks. (See the sketch after this list.)

3. The pool of empty chunks is implemented as a vector to avoid dereferencing chunks in the pool when aging them. Also chunks in the pool are removed from the global hash of all chunks. This helps the conservative GC as it minimizes the hash size. It also avoids clearing the mark bitmap in the empty chunks before the start of the GC. 

4. The compartment pointer is moved from the arena header to the chunk info descriptor. This frees one word in the arena header, and the patch uses it to remove the separate array of delayed-marking pointers and move them into the header.
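
A hedged sketch of the bookkeeping described above; the type and field names are illustrative, not the exact ones from the patch:

#include <stddef.h>
#include <vector>

struct Chunk;
struct JSCompartment;

struct ChunkInfo {
    JSCompartment *compartment;     /* every arena in this chunk belongs here */
    Chunk         *prev, *next;     /* doubly-linked list of non-full chunks  */
    size_t        numFreeArenas;
};

/* Per-compartment list head: arenas are handed out from the first chunk; a
 * chunk is unlinked when it becomes full and re-linked when the GC frees an
 * arena in it again. */
struct CompartmentChunkList {
    Chunk *availableChunks;
};

/* Global pool of empty chunks: a vector of (chunk, age) pairs, so aging the
 * pool never dereferences the chunks themselves; pooled chunks are also
 * removed from the global chunk hash used by the conservative GC. */
struct EmptyChunk {
    Chunk    *chunk;
    unsigned gcCyclesSurvived;
};
typedef std::vector<EmptyChunk> EmptyChunkPool;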
Attachment #547465 - Attachment is obsolete: true
Attachment #547663 - Flags: review?(wmccloskey)
Attachment #547465 - Flags: review?(wmccloskey)
(In reply to comment #16)
> Did you also benchmark this patch on a small device like your netbook? It
> might hurt more on such platforms.

I do not have my old single-core Atom netbook set up for development testing right now, but on a netbook with a dual-core 1.3 GHz AMD Neo K325 I see no difference in the total V8 scores under Linux, neither in the js shell nor when testing http://v8.googlecode.com/svn/data/benchmarks/v6/run.html in a jemalloc-enabled browser. Also, the GC timing in the browser stays approximately the same, but it looks like the GC runs at different moments. That may explain why V8 shows a noticeable variation between individual benchmarks. Here are the numbers for V8 in the shell before and after the patch:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.005x as fast    3232.5ms +/- 0.1%   3216.2ms +/- 0.2%     significant

=============================================================================

  v8:             1.005x as fast    3232.5ms +/- 0.1%   3216.2ms +/- 0.2%     significant
    crypto:       -                  419.5ms +/- 0.1%    419.9ms +/- 0.1% 
    deltablue:    1.010x as fast     598.2ms +/- 0.1%    592.1ms +/- 0.2%     significant
    earley-boyer: *1.034x as slow*   526.3ms +/- 0.3%    544.3ms +/- 0.3%     significant
    raytrace:     1.006x as fast     361.9ms +/- 0.3%    359.7ms +/- 0.2%     significant
    regexp:       1.010x as fast     432.0ms +/- 0.2%    427.6ms +/- 0.2%     significant
    richards:     -                  506.9ms +/- 0.8%    503.8ms +/- 0.3% 
    splay:        1.051x as fast     387.7ms +/- 0.2%    368.8ms +/- 1.0%     significant
Attached patch v5 (obsolete) — Splinter Review
The new version fixes a bug in the empty chunk management code. Previously it cleared ChunkInfo::info.compartment when adding a chunk to the pool of empty chunks during the GC. But when the GC is compartment-local this leads to wrong results in IsAboutToBeFinalized, which uses the compartment pointer to check for dead things in the current compartment.
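
Roughly, the check in question has this shape (a simplified, self-contained sketch, not the patch's actual code; the real engine types and the gcCurrentCompartment field are only approximated here):

#include <stdint.h>

struct JSCompartment;
struct ChunkInfo { JSCompartment *compartment; };
struct Chunk { ChunkInfo info; };

static const uintptr_t GC_CHUNK_MASK = (uintptr_t(1) << 16) - 1;   /* 64K chunks */

/* During a compartment-local GC, a thing whose chunk belongs to another
 * compartment must be treated as live, so the chunk's compartment pointer has
 * to remain valid even for chunks sitting in the empty-chunk pool. */
static bool
ThingIsDeadInCurrentGC(JSCompartment *gcCurrentCompartment, void *thing, bool marked)
{
    Chunk *chunk = reinterpret_cast<Chunk *>(uintptr_t(thing) & ~GC_CHUNK_MASK);
    if (gcCurrentCompartment && chunk->info.compartment != gcCurrentCompartment)
        return false;           /* outside the collected compartment: keep it */
    return !marked;
}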
Attachment #547663 - Attachment is obsolete: true
Attachment #547675 - Flags: review?(wmccloskey)
Attachment #547663 - Flags: review?(wmccloskey)
(In reply to comment #20)
>     splay:        1.051x as fast     387.7ms +/- 0.2%    368.8ms +/- 1.0%   

That's very surprising! Fragmentation shouldn't have any effect on the splay benchmark. Here we have to increase the heap size quickly, and it seems that 64K allocations are much faster than 1MB allocations.
I wanted to try it myself but your version seems to be outdated. Is this patch based on mozilla-central or still tracemonkey?
(In reply to comment #23)
> I wanted to try it myself but your version seems to be outdated. Is this
> patch based on mozilla-central or still tracemonkey?

Oh never mind. This was some mercurial mixup on my side!
That's what I get for V8 in the browser:

trunk:
Score: 5070
Richards: 7915
DeltaBlue: 4893
Crypto: 8740
RayTrace: 3952
EarleyBoyer: 4956
RegExp: 2086
Splay: 6226

with patch:
Score: 4804
Richards: 8247
DeltaBlue: 4959
Crypto: 8692
RayTrace: 4006
EarleyBoyer: 4853
RegExp: 2085
Splay: 4099

I see a big regression for splay.
(In reply to comment #25)

> I see a big regression for splay.

This is on Mac, right? We do not have jemalloc there yet, and mmap is relatively slow there. Can you change GC_CHUNK_SHIFT to 17 and try with 128K chunks? Another option would be to try posix_memalign, which is available on Mac. For that, just change AllocateGCChunk in jsgc.cpp to:

inline Chunk *
AllocateGCChunk(JSRuntime *rt)
{
    void *p;
    if (posix_memalign(&p, GC_CHUNK_SIZE, GC_CHUNK_SIZE - 4 * sizeof(uintptr_t)))
        return NULL;
#ifdef MOZ_GCTIMER
    if (p)
        JS_ATOMIC_INCREMENT(&newChunkCount);
#endif
    return (Chunk *) p;
}

The 4 * sizeof(uintptr_t) is a hack to support allocators that use allocation headers, like the glibc one.

and ReleaseGCChunk(JSRuntime *rt, Chunk *p) to

inline void
ReleaseGCChunk(JSRuntime *rt, Chunk *p)
{
    JS_ASSERT(p);
#ifdef MOZ_GCTIMER
    JS_ATOMIC_INCREMENT(&destroyChunkCount);
#endif
    Foreground::free_(p);
}
Depends on: 673760
Comment on attachment 547675 [details] [diff] [review]
v5

I split the bug into two to investigate the effect of not hashing empty chunks separately.
Attachment #547675 - Attachment is obsolete: true
Attachment #547675 - Flags: review?(wmccloskey)
Depends on: 673795
No longer depends on: 669245
Attached patch v6 (obsolete) — Splinter Review
The patch is based on the patches from bug 673760 and bug 673795, which should better expose the effect of smaller per-compartment chunks. As before, I see no difference with it on Linux in a jemalloc build of the browser running V8:

             without
              patch     patch

Score:         3952     3948

Richards:      6128     6192
DeltaBlue:     3930     3785
Crypto:        6033     6430
RayTrace:      2942     2849
EarleyBoyer:   3859     3873
RegExp:        1753     1711
Splay:         5207     5256

But in the js shell (no jemalloc, so mmap is used instead) the numbers for V8 are:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      *1.010x as slow*  1516.7ms +/- 0.1%   1532.1ms +/- 0.1%     significant

=============================================================================

  v8:             *1.010x as slow*  1516.7ms +/- 0.1%   1532.1ms +/- 0.1%     significant
    crypto:       1.004x as fast     202.7ms +/- 0.3%    201.9ms +/- 0.2%     significant
    deltablue:    *1.005x as slow*   281.8ms +/- 0.3%    283.3ms +/- 0.2%     significant
    earley-boyer: *1.017x as slow*   258.6ms +/- 0.4%    263.1ms +/- 0.3%     significant
    raytrace:     *1.004x as slow*   195.9ms +/- 0.1%    196.6ms +/- 0.1%     significant
    regexp:       1.003x as fast     195.6ms +/- 0.2%    195.0ms +/- 0.2%     significant
    richards:     *1.008x as slow*   205.4ms +/- 0.4%    207.1ms +/- 0.4%     significant
    splay:        *1.048x as slow*   176.7ms +/- 0.4%    185.2ms +/- 0.2%     significant

The regression can be fully offset by increasing the chunk size from the 64K used in the patch to 256K and by increasing the number of GC cycles an empty chunk can survive before being released to the system:

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      1.005x as fast    1516.7ms +/- 0.1%   1508.7ms +/- 0.1%     significant

=============================================================================

  v8:             1.005x as fast    1516.7ms +/- 0.1%   1508.7ms +/- 0.1%     significant
    crypto:       ??                 202.7ms +/- 0.3%    203.0ms +/- 0.3%     not conclusive: might be *1.001x as slow*
    deltablue:    -                  281.8ms +/- 0.3%    282.0ms +/- 0.2% 
    earley-boyer: -                  258.6ms +/- 0.4%    258.5ms +/- 0.4% 
    raytrace:     -                  195.9ms +/- 0.1%    196.0ms +/- 0.1% 
    regexp:       1.003x as fast     195.6ms +/- 0.2%    195.0ms +/- 0.2%     significant
    richards:     -                  205.4ms +/- 0.4%    204.7ms +/- 0.5% 
    splay:        1.043x as fast     176.7ms +/- 0.4%    169.5ms +/- 0.2%     significant


I suppose on Mac, where we do not yet have jemalloc available, we may need to make those changes.
Whiteboard: [MemShrink] → [MemShrink:P1]
We also need numbers here that measure the JS heap size for a browsing session with 200+ tabs. 
I did a similar patch about a year ago and realized that the heap size increases linearly with more tabs. This would lead to much more fragmentation! With the current system we get a curve that becomes very flat after about 30 tabs.
The smaller chunks might help a lot but we have to study the behavior with big workloads.
(In reply to comment #29)
> The smaller chunks might help a lot but we have to study the behavior with
> big workloads.

My hope is that with per-compartment chunks the consequences of the fragmentation would be less noticeable, since after closing a tab all of its chunks would be returned to jemalloc by the shrinking GC and become available for non-JS allocations.

Now, to fully fight fragmentation we would need to impose a stable ordering of chunks as proposed in bug 669245.

In any case, do you have a bookmark with those 200 tabs for testing?
(In reply to comment #30)
> 
> In any case, do you have a bookmark with those 200 tabs for testing?

I modified a page from njn to open 150 sites:
http://gregor-wagner.com/tmp/mem

You have to adjust dom.popup_maximum in about:config for it to work.
Also, bug 674074 means that the FOTN tab in that test is causing about:memory to show blank.
(In reply to comment #32)
> Also, bug 674074 means that the FOTN tab in that test is causing
> about:memory to show blank.

I removed FOTN. It works now for me.
Stats for opening and closing 150 tabs from http://gregor-wagner.com/tmp/mem . The columns in the table give the situation after opening all the tabs, after closing the tabs via the button in the test, and finally after closing all windows except about:memory. Each time I captured the stats after pressing GC/GC+CC several times until the number of empty GC chunks became close to zero.

Effectively the third column gives a picture of the sort-of permanent heap fragmentation. The numbers in the first column vary a lot and mostly give the scale of what is going on.

Base case of jemalloc-enabled browser on Linux:


                                   all tabs       keeping just    closing everything
                                                  membench tab     but about:memory
-- canvas-2d-pixel-bytes            2.61 MB          0.00 MB            0.00 MB
-- gfx-surface-image		   23.78 MB 	     2.36 MB	        0.31 MB
-- heap-allocated		1,792.13 MB 	   818.08 MB	      116.63 MB
-- heap-committed		1,852.00 MB 	 1,857.00 MB	    1,431.00 MB
-- heap-dirty			    1.25 MB 	     3.97 MB	        2.71 MB
-- heap-unallocated		   59.87 MB 	 1,038.91 MB	    1,314.37 MB
-- js-compartments-system	          2 	           2	              2
-- js-compartments-user		        249 	         150	              1
-- js-gc-heap			  439.00 MB 	   452.00 MB	       48.00 MB
-- js-gc-heap-arena-unused	   83.87 MB 	    70.04 MB	        3.37 MB
-- js-gc-heap-chunk-empty	    0.00 MB 	     0.00 MB	        0.00 MB
-- js-gc-heap-chunk-unused	    9.71 MB 	   297.35 MB	       37.62 MB
-- js-gc-heap-unused-fraction	     21.31% 	      81.28%	         85.38%
-- page-faults-hard		         10 	          10	             10
-- page-faults-soft		  1,664,150 	   2,286,239	      2,343,457
-- resident			2,009.42 MB 	 1,342.29 MB	      295.39 MB
-- shmem-allocated		    0.00 MB 	     0.00 MB	        0.00 MB
-- shmem-mapped			    0.00 MB 	     0.00 MB	        0.00 MB
-- vsize			2,639.37 MB 	 2,556.45 MB	    1,966.75 MB


64K per-compartment chunks (the patch and its dependent patches):

                                   all tabs       keeping just    closing everything
                                                  membench tab     but about:memory
-- canvas-2d-pixel-bytes            2.57 MB          0.00 MB            0.00 MB
-- gfx-surface-image		   20.66 MB	     3.16 MB	        0.31 MB
-- heap-allocated		1,873.28 MB	   665.51 MB	       93.29 MB
-- heap-committed		1,931.00 MB	 1,942.00 MB	    1,890.00 MB
-- heap-dirty			    1.39 MB	     2.68 MB	        3.38 MB
-- heap-unallocated		   57.71 MB	 1,276.49 MB	    1,796.71 MB
-- js-compartments-system	          2	           2	              2
-- js-compartments-user		        242	         148	              1
-- js-gc-heap			  526.31 MB	   303.19 MB	       27.38 MB
-- js-gc-heap-arena-unused	   79.57 MB	    67.30 MB	        3.49 MB
-- js-gc-heap-chunk-empty	    0.31 MB	     0.00 MB	        0.00 MB
-- js-gc-heap-chunk-unused	  102.68 MB	   154.80 MB	       17.07 MB
-- js-gc-heap-unused-fraction	     34.63%	      73.25%	         75.08%
-- page-faults-hard		          4	           4	              4
-- page-faults-soft		  1,696,230	   2,466,237	      2,509,942
-- resident			2,089.09 MB	 1,197.11 MB	      282.27 MB
-- shmem-allocated		    0.00 MB	     0.00 MB	        0.00 MB
-- shmem-mapped			    0.00 MB	     0.00 MB	        0.00 MB
-- vsize			2,732.18 MB	 2,633.98 MB	    2,443.60 MB



128K per-compartment chunks and 128K jemalloc chunks (enabled by setting the MALLOC_OPTIONS environment variable to the value kkk, that is, halving the default 1MB jemalloc chunk size three times):


                                   all tabs       keeping just    closing everything
                                                  membench tab     but about:memory
-- canvas-2d-pixel-bytes            2.74 MB          0.00 MB          0.00 MB
-- gfx-surface-image		   19.77 MB	     2.50 MB	      0.27 MB
-- heap-allocated		1,946.76 MB	   721.17 MB	    103.52 MB
-- heap-committed		2,035.88 MB	 1,755.25 MB	    974.25 MB
-- heap-dirty			    3.18 MB	     3.48 MB	      2.81 MB
-- heap-unallocated		   89.11 MB	 1,034.08 MB	    870.73 MB
-- js-compartments-system	          2	           2	            2
-- js-compartments-user		        243	         149	            1
-- js-gc-heap			  568.63 MB	   351.50 MB	     35.88 MB
-- js-gc-heap-arena-unused	   85.71 MB	    68.42 MB	      3.95 MB
-- js-gc-heap-chunk-empty	    0.13 MB	     0.00 MB	      0.00 MB
-- js-gc-heap-chunk-unused	  126.05 MB	   198.17 MB	     24.78 MB
-- js-gc-heap-unused-fraction	     37.24%	      75.84%	       80.09%
-- page-faults-hard		          4	           4	            4
-- page-faults-soft		  2,895,448	   3,158,265	    3,199,487
-- resident			2,199.02 MB	 1,288.83 MB	    308.22 MB
-- shmem-allocated		    0.00 MB	     0.00 MB	      0.00 MB
-- shmem-mapped			    0.00 MB	     0.00 MB	      0.00 MB
-- vsize			2,805.53 MB	 2,399.51 MB	  1,514.34 MB


The conclusion is that small chunks on their own, while benefiting JS (the final JS heap size shrinks from 48MB to 25MB), in fact hurt the browser. jemalloc is forced to manage those small chunks within its own bigger 1MB chunks. Since JS (with or without the patch) allocates the chunks randomly, that leads to the same fragmentation that we currently observe with JS arenas inside JS chunks.

For the patch to benefit memory usage, the jemalloc chunks should be shrunk to match the size of the JS chunks. Then everything is allocated using same-size mmap calls (plus a small number of bigger allocations), and the random nature of JS chunk allocation does not hurt at all. As the numbers above indicate, the patch then shrinks the committed heap size after closing 150 tabs from 1.4 GB down to 1 GB when JS and jemalloc both allocate using 128K chunks.
Thanks for the detailed measurements, that's interesting.  My understanding is that 1MB is the minimum allocation size that jemalloc always returns to the OS immediately when free() is called.  So with our current 1MB chunks, we're effectively bypassing jemalloc and handling all the allocations ourselves.  But by reducing the chunk size to 64KB or 128KB, the responsibility for chunk management is divided between jemalloc and the JS engine.

My gut feeling is that sharing the responsibility is not a good idea.  Either (a) the JS engine should have full control over management of its heap (as it does currently), or (b) jemalloc should have full control (as it would if we got rid of chunks altogether and just allocated arenas).  The advantage of (a) is that we can customize the management exactly to the behaviour of the JS engine (e.g. we know about compartments).  The advantage of (b) is that jemalloc is pretty sophisticated.  The question is whether we can utilize our app-specific knowledge to do better than jemalloc without having to get as complex as jemalloc.

(This makes me wonder:  why are we even using jemalloc to allocate our 1MB chunks currently?  Why not just use mmap/VirtualAlloc directly?  In fact, that's what happens on platforms like Mac that don't currently have jemalloc.)
I'm not sure heap-committed is the right number to look at here.  If it actually represents committed memory, then in all three cases there's about 600MB paged out to disk (heap-committed - resident).  But surely you'd have noticed if 2/3 of FF was paged out.

In terms of RSS, 64KB chunks with no changes to jemalloc do better than the other two options after closing all tabs.  (I don't mean to suggest that 64KB chunks are obviously better; they're much worse in terms of vsize, which is certainly meaningful.)

I agree with Nick that these results suggest that maybe we shouldn't be using jemalloc to manage the chunks.  But I think we should also try to get a better handle on what these numbers actually mean.
(In reply to comment #35)
> (This makes me wonder:  why are we even using jemalloc to allocate our 1MB
> chunks currently?  Why not just use mmap/VirtualAlloc directly?  In fact,
> that's what happens on platforms like Mac that don't currently have
> jemalloc.)

Presumably because mmap doesn't give us the alignment we need, so we can end up allocating double.
(In reply to comment #35)
> Thanks for the detailed measurements, that's interesting.  My understanding
> is the 1MB is the minimum allocation size that jemalloc always returns to
> the OS immediately when free() is called.

To be precise, 1 MB is the default jemalloc chunk size. It can be changed at runtime via the MALLOC_OPTIONS environment variable (that is how the stats in the third table above were collected). For allocations that are chunk-sized or bigger, jemalloc calls mmap/munmap directly.

> My gut feeling is that sharing the responsibility is not a good idea. 
> Either (a) the JS engine should have full control over management of its
> heap (as it does currently), or jemalloc should have full control (as it
> would if we got rid of chunks altogether and just allocated arenas). 

The second option may still require some management on the JS side. Benchmarks clearly indicate that the engine must pool free arenas so it can get a new one very quickly. But such pooling defeats the anti-fragmentation heuristics built into jemalloc. So either we patch jemalloc so that GC arenas can be allocated really quickly, or we try to match jemalloc's heuristics when allocating GC arenas from the pool.

> (This makes me wonder:  why are we even using jemalloc to allocate our 1MB
> chunks currently?  Why not just use mmap/VirtualAlloc directly?  In fact,
> that's what happens on platforms like Mac that don't currently have
> jemalloc.)

If I remember correctly, that was done for unified accounting and better sharing of chunks, so that an empty GC chunk can quickly be turned into a malloc one or vice versa.
> Presumably because mmap doesn't give us the alignment we need, so we can end up 
> allocating double.

jemalloc actually does this itself, amusingly enough.  See chunk_alloc_mmap.

It tries to allocate a chunk using mmap, but if it's not properly aligned, it allocates something much bigger, saves the address, deallocates it, then tries to allocate something of the right size at the right point in the middle of the old allocation.  If that makes any sense.
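
For illustration, here is a rough, self-contained sketch of that trick (this is not jemalloc's actual chunk_alloc_mmap; it just assumes POSIX mmap and a power-of-two size):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map once and hope for alignment; if unaligned, map a larger region to learn
 * where free address space is, unmap it, and re-request exactly the aligned
 * slice.  Inherently racy, so a real allocator retries or falls back. */
static void *
MapAlignedPages(size_t size)           /* size must be a power of two */
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANON;

    void *p = mmap(NULL, size, prot, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t(p) & (size - 1)) == 0)
        return p;                       /* lucky: already aligned */
    munmap(p, size);

    void *probe = mmap(NULL, 2 * size, prot, flags, -1, 0);
    if (probe == MAP_FAILED)
        return NULL;
    uintptr_t aligned = (uintptr_t(probe) + size - 1) & ~uintptr_t(size - 1);
    munmap(probe, 2 * size);

    p = mmap(reinterpret_cast<void *>(aligned), size, prot, flags, -1, 0);
    if (p == MAP_FAILED || uintptr_t(p) != aligned) {
        if (p != MAP_FAILED)
            munmap(p, size);
        return NULL;                    /* caller retries or falls back */
    }
    return p;
}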
(In reply to comment #37)

> Presumably because mmap doesn't give us the alignment we need, so we can end
> up allocating double.

When jemalloc is not enabled, the GC uses the same strategy for allocating its aligned chunks as jemalloc would (I lifted the relevant parts of the jemalloc code into JS for that). So jemalloc+mmap does not provide any win over plain mmap.
(In reply to comment #39)
> but if it's not properly aligned,
> it allocates something much bigger, saves the address, deallocates it, then
> tries to allocate something of the right size at the right point in the
> middle of the old allocation.  If that makes any sense.

This is what happens on Windows. On Linux/Mac the code takes advantage of the fact that one can call munmap on part of an allocated mapping, resulting in fewer allocation calls.
By "the code" you mean the JS engine's allocator, not jemalloc, right?  I don't see what you describe in jemalloc's chunk_alloc_mmap.
(In reply to comment #42)
> By "the code" you mean the JS engine's allocator, not jemalloc, right?  I
> don't see what you describe in jemalloc's chunk_alloc_mmap.

Sorry, I was wrong here; neither jemalloc nor the JS GC uses munmap to cut over-sized allocations. I forgot that this was not included in the implementation.
How important is it that chunks are 1MB-aligned?
(In reply to comment #34)
> 
> 
> The conclusion is that small chunks on itself, while benefiting the JS (the
> final js heap size shrinks from 48MB to 25MB), in fact hurts the browser.
> jemalloc is forced to manage those small chunks within its own bigger 1MB
> chunks. As the JS (with or without the patch) allocates the chunks randomly,
> that lead to the same fragmentation that we currently observe with JS arenas
> inside JS chunks.
> 

Thx for the numbers!
I had the same problems when I made per-compartment arenas. Back then the JS heap size increase for 50 tabs was 13%, but the curve was very flat for additional tabs.

Maybe we could use a different chunk size for the chrome compartment to get the advantages there?
(In reply to comment #44)
> How important is it that chunks are 1MB-aligned?

The alignment of chunks on their size is important for fast access to the mark bits, which are stored separately from the arenas. If the bitmap were stored in the arena itself, the alignment would not be necessary, but that hurts GC marking and finalization due to an increased number of TLB misses.
(In reply to comment #46)
> The alignment of chunks on its size is important for fast access to the mark
> bits that are stored separately from the arenas. If the bitmap is stored in
> the arena itself, then the alignment would not be necessary. But that hurts
> the GC marking and finalization due to increasing number of TLB misses.

The alignment also simplifies the conservative GC. With it, it is very easy to check whether the chunk corresponding to a potential pointer is registered with the GC.
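
As a concrete illustration (the names are approximate, not the exact SpiderMonkey ones), size-aligned chunks mean the chunk header is one mask away from any GC-thing address:

#include <stdint.h>

struct Chunk;

static const uintptr_t CHUNK_SIZE = uintptr_t(1) << 16;   /* 64K in this sketch */
static const uintptr_t CHUNK_MASK = CHUNK_SIZE - 1;

/* One mask recovers the chunk header (mark bitmap, compartment info) from any
 * GC-thing address; the conservative scanner then only has to check whether
 * that chunk address is present in the global hash of live chunks. */
inline Chunk *
ChunkFromAddress(uintptr_t addr)
{
    return reinterpret_cast<Chunk *>(addr & ~CHUNK_MASK);
}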
I realized that even with per-compartment GC chunks we still get extra fragmentation after closing a compartment. The problem is that our jemalloc chunks would still be global, so after GC/CC we would have a lot of holes in them.

Have we considered per-compartment private heaps along the lines of HeapAlloc and friends on Windows, http://msdn.microsoft.com/en-us/library/aa366599%28v=vs.85%29.aspx? It does not look too hard to modify jemalloc so that private instances of it can be constructed.

Then after destroying a compartment we surely could release all its memory back to the system, and at worst in a long-running system we would have only fragmentation of the virtual address space.
Consider yet another data set without the patch, where I included the stats showing the initial memory configuration after browser startup as the first column, then the situation after opening a separate window with http://gregor-wagner.com/tmp/mem and its tabs, after closing the tabs, and after closing that window:


                                 initial     all tabs       no tabs     no window

 -- heap-allocated		 38.28 MB   1,896.28 MB     847.76 MB    119.06 MB
 -- heap-committed		 43.00 MB   2,002.00 MB	  1,975.00 MB  1,521.00 MB
 -- heap-dirty			  1.41 MB       3.84 MB	      2.46 MB      3.13 MB
 -- heap-unallocated		  4.72 MB     105.72 MB	  1,127.24 MB  1,401.94 MB
 -- js-compartments-system	        2             2	            2            2
 -- js-compartments-user	        3           242	          150            1
 -- js-gc-heap			  9.00 MB     478.00 MB	    473.00 MB     49.00 MB
 -- js-gc-heap-chunk-unused	  2.17 MB      27.39 MB	    318.01 MB     38.96 MB
 -- js-gc-heap-unused-fraction     34.30%        24.60%	       81.43%       85.52%
 -- page-faults-hard		       48            67	           67           67
 -- page-faults-soft		   20,773     1,827,300	    2,112,344    2,148,457
 -- resident			 74.46 MB   2,138.32 MB	  1,380.27 MB    289.98 MB
 -- vsize			476.63 MB   2,803.77 MB	  2,675.67 MB  2,054.79 MB


In an ideal world the first column would match the fourth. But due to fragmentation the JS heap wastes 86% of its memory, while the heap as a whole (which includes the JS heap) wastes heap-unallocated/heap-committed, or 1402/1521, or 92%.

This tells me that fragmentation in jemalloc chunks is noticeably worse than in the JS world, and perhaps this bug is the wrong target. Time to really consider per-compartment malloc heaps?
Again (see comment 36), I really don't think heap-committed means what we think it does here.  It is very clearly *not* committed, non-shared, non-copy-on-write pages, because heap-committed can be many times larger than RSS.

Until we actually understand what that number means, I don't think we should use it to make decisions here or elsewhere.
My preliminary thoughts about heap-committed (see also bug 675216): On Linux, jemalloc madvise(DONT_NEED)'s most kinds of blocks instead of explicitly decommitting them.  So the committed number has very little to do with how much memory the allocator is using.
I think RSS is your best bet here.
(In reply to comment #50)
> Again (see comment 36), I really don't think heap-committed means what we
> think it does here.  It is very clearly *not* committed, non-shared,
> non-copy-on-write pages, because heap-committed can be many times larger
> than RSS.

heap-committed is the number of pages touched by jemalloc and directly reflects the content of jemalloc_stats_t::committed.

heap-allocated comes from jemalloc_stats_t::allocated and gives the total size of all allocations currently in use by malloc/realloc callers.

heap-allocated + heap-unallocated gives jemalloc_stats_t::mapped, that is, the total number of bytes in all mmapped chunks managed by jemalloc.

So these numbers are useful for seeing how the jemalloc heap is fragmented. And for the above test the GC allocator behaves noticeably better than jemalloc from a fragmentation point of view.
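
For reference, a hedged sketch of how the about:memory numbers above map onto that stats structure (the field names follow the text above; the exact jemalloc_stats() entry point and header path are assumptions):

#include <stddef.h>
#include "jemalloc.h"   /* mozilla's jemalloc header; exact path is an assumption */

static void
ReportJemallocHeapNumbers()
{
    jemalloc_stats_t stats;
    jemalloc_stats(&stats);

    size_t heapAllocated   = stats.allocated;                 /* "heap-allocated" */
    size_t heapCommitted   = stats.committed;                 /* "heap-committed" */
    size_t heapUnallocated = stats.mapped - stats.allocated;  /* "heap-unallocated" */

    /* heap-allocated + heap-unallocated == stats.mapped, i.e. the total size of
     * all mmapped chunks that jemalloc manages. */
    (void) heapAllocated;
    (void) heapCommitted;
    (void) heapUnallocated;
}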
(In reply to comment #51)
> My preliminary thoughts about heap-committed (see also bug 675216): On
> Linux, jemalloc madvise(DONT_NEED)'s most kinds of blocks on Linux, instead
> of explicitly decommitting them.  So therefore the committed number has very
> little to do with how much memory the allocator is using.

Right, I forgot that the decommit is not enabled on Linux. I will patch that and report the results again.
Why can't you just look at RSS?  That's the only thing that matters.
(In reply to comment #51)
> My preliminary thoughts about heap-committed (see also bug 675216): On
> Linux, jemalloc madvise(DONT_NEED)'s most kinds of blocks on Linux, instead
> of explicitly decommitting them.

But jemalloc does explicitly release chunks that become free, so heap-allocated + heap-unallocated gives the number of mmapped bytes that jemalloc wants to keep, and those chunks cannot be used for GC chunks, right?
I'm not sure if jemalloc explicitly decommits or unmaps its arena allocations. (In jemalloc parlance, an arena is the big thing we map in from the OS.)  It doesn't happen in arena_chunk_dealloc, and I don't think it happens in arena_purge.  It might just be that the whole thing is madvise(DONT_NEED)'ed.  Or maybe it gets freed somewhere else.

But if I understand you correctly, yes, heap-allocated + heap-unallocated = heap-mapped, which is the amount of virtual address space that jemalloc is holding onto.  This address space can't be used for allocating GC chunks (or anything else).

If this is a problem, I think the solution is probably to make jemalloc unmap memory more aggressively, rather than to choose one GC chunk size or another.  I kind of doubt it's a problem, though, because it seems unlikely that you'd use gigabytes of JS heap without using a similar amount of jemalloc heap.
(In reply to comment #55)
> In jemalloc parlance, an arena is the big thing we map in from the OS.

Hm, jemalloc.c source starts with:

/*
 * Size and alignment of memory chunks that are allocated by the OS's virtual
 * memory system.
 */
#define	CHUNK_2POW_DEFAULT	20

As I understand it, a jemalloc arena is a structure for managing a set of chunks from which it allocates its things. By default jemalloc uses 4 * number_of_cpus arenas.

(In reply to comment #54)
> Why can't you just look at RSS?  That's the only thing that matters.

This bug started from bug 669245. The suggestion there is to change the way GC things are allocated from GC chunks to follow the jemalloc algorithm, to minimize the amount of unused space in GC chunks. But before doing that it would be interesting to know whether jemalloc in fact does better than the naive GC approach.

From the jemalloc source I see that at http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#3442 it calls arena_chunk_dealloc. That does not release the chunk immediately but rather puts it into a single-element per-arena cache (arena_t::spare). For the previously cached empty chunk the code does call chunk_dealloc, which in turn calls munmap.

That means that  jemalloc_stats_t::mapped == bytes_in_jemalloc_chunks_with_at_least_one_allocation + number_of_bytes_in_cached_empty_chunks

But number_of_bytes_in_cached_empty_chunks <= number_of_arenas * chunk_size == 16MB by default on a 4-core system. Thus jemalloc_stats_t::mapped is a good approximation of the jemalloc chunks with at least one allocation.

So jemalloc does not do better than the GC from a fragmentation point of view. But the RSS stats indicate that the GC may benefit from madvise/decommit-like calls.
(In reply to comment #57)

Thanks for clarifying, Igor.  I misunderstood the purposes of your experiments
-- I thought the question was "does switching to smaller GC chunks help?", but
it was actually the much more nuanced "does switching to smaller GC chunks
managed by jemalloc help more than applying jemalloc's heuristics to our own GC
chunks would?".

I agree with you much more now that I understand what you're trying to
determine.  :)

> From jemalloc source I see that at
> http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.
> c#3442 it calls arena_chunk_dealloc. That does not release the chunk
> immediately but rather put it into a single-element per-arena cache
> (arena_t::spare). For the previously cached empty chunk the code does call
> chunk_dealloc which in turn calls munmap.
>
> That means that  jemalloc_stats_t::mapped ==
> bytes_in_jemalloc_chunks_with_at_least_one_allocation +
> number_of_bytes_in_cached_empty_chunks
> 
> But number_of_bytes_in_cached_empty_chunks <= number_of_arenas * chunk_size
> == 16MB by default on 4-core system. Thus jemalloc_stats_t::mapped is a good
> approximation of jemalloc chunks with at least one allocation. 

This is true if jemalloc never keeps more than one empty chunk around.  It's
true as far as I can tell from my reading, but I don't know if it's actually
the case.

> So jemalloc does not deal better than the GC from fragmentation point of
> view. But the RSS stats indicates that the GC may benefit from
> madvise/decommit - like calls.

I agree with the second part -- madvise/decommit may help, and if we have that, it doesn't appear so necessary to use jemalloc to manage smaller js gc chunks.

However, I don't think that we can yet conclude that changing the chunk choice
heuristic to be more similar to jemalloc's (bug 669245) would not also help.

First, let's be clear about exactly what kind of fragmentation we're talking
about here.  There are in fact two kinds.  One is fragmentation within mmap'ed
chunks.  The other is fragmentation within pages.  We've only measured the
first one -- how many chunks are there which have at least one active
allocation?  But as we've seen, this isn't necessarily a big deal; you can take
care of it with madvise/decommit.  I don't know, but it could be that jemalloc
doesn't make much of an attempt to avoid this kind of fragmentation, since it
doesn't impact RSS.

It's the second one which can be really bad, because you can't decommit half a
page.  I don't know that we have a good way of measuring the intra-page
fragmentation in jemalloc, but AIUI we measure it in the js engine as
js-arena-unused.

Choosing chunks more cleverly might help us pack js-arenas (pages) more
tightly, thus reducing the second kind of fragmentation.  This would be a win
even with madvise/decommit.
(In reply to comment #58)
> it was actually the much more nuanced "does switching to smaller GC chunks
> managed by jemalloc help more than applying jemalloc's heuristics to our own
> GC
> chunks would?".

Yes, this is exactly what I wanted to find out. And the experiments so far indicate that relying on jemalloc for GC chunks does not help. 

So now lets see if smaller GC chunks allocated using straight mmap call would help.

No patches (the same data and test setup as in comment 49):

                             initial     all tabs       no tabs     no window

heap-allocated               38.28 MB   1,896.28 MB     847.76 MB    119.06 MB
heap-committed               43.00 MB   2,002.00 MB   1,975.00 MB  1,521.00 MB
heap-dirty                    1.41 MB       3.84 MB       2.46 MB      3.13 MB
heap-unallocated              4.72 MB     105.72 MB   1,127.24 MB  1,401.94 MB
js-compartments-system              2             2             2            2
js-compartments-user                3           242           150            1
js-gc-heap                    9.00 MB     478.00 MB     473.00 MB     49.00 MB
js-gc-heap-arena-unused       0.91 MB      90.22 MB      67.18 MB      2.95 MB
js-gc-heap-chunk-empty        1.00 MB       0.00 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-unused       2.17 MB      27.39 MB     318.01 MB     38.96 MB
js-gc-heap-unused-fraction     34.30%        24.60%        81.43%       85.52%
page-faults-hard                   48            67            67           67
page-faults-soft               20,773     1,827,300     2,112,344    2,148,457
resident                     74.46 MB   2,138.32 MB   1,380.27 MB    289.98 MB
vsize                       476.63 MB   2,803.77 MB   2,675.67 MB  2,054.79 MB


64K GC chunks using straight mmap (I used the patch here, plus I commented out setCustomGCChunkAllocator in XPCJSRuntime::XPCJSRuntime). Note that now heap-(allocated|committed|dirty) does not include the GC heap stats, and one has to add the GC chunk numbers to these to get totals for the heap:

                             initial     all tabs       no tabs     no window

heap-allocated               28.98 MB   1,398.42 MB     362.80 MB     60.11 MB
heap-committed               35.00 MB	1,454.00 MB   1,449.00 MB  1,432.00 MB
heap-dirty                    1.97 MB	    2.85 MB       1.96 MB      3.35 MB
heap-unallocated              6.02 MB	   55.58 MB   1,086.20 MB  1,371.89 MB
js-compartments-system              2	          2             2            2
js-compartments-user                1	        244           149            1
js-gc-heap                    7.44 MB	  555.31 MB     301.19 MB     24.75 MB
js-gc-heap-arena-unused       0.90 MB	   81.52 MB      65.87 MB      2.66 MB
js-gc-heap-chunk-empty        0.00 MB	    2.06 MB       1.19 MB      0.00 MB
js-gc-heap-chunk-unused       1.02 MB	  105.80 MB     151.02 MB     15.60 MB
js-gc-heap-unused-fraction     25.83%	     34.10%        72.40%       73.76%
page-faults-hard                    1	         73            73           73
page-faults-soft               21,443	  1,717,754     2,104,741    2,169,409
resident                     72.92 MB	2,187.43 MB   1,192.75 MB    258.55 MB
vsize                       478.41 MB	2,801.93 MB   2,355.81 MB  1,971.01 MB


Here the last two columns show a clear win according to RSS, total heap size, and vsize. Moreover, there is no regression in the number of page faults. So it looks like smaller, per-compartment GC chunks managed outside jemalloc are a good way to minimize memory usage.

> > So jemalloc does not deal better than the GC from fragmentation point of
> > view. But the RSS stats indicates that the GC may benefit from
> > madvise/decommit - like calls.
> 
> I agree with the second part -- madvise/decommit may help, and if we have
> that, it doesn't appear so necessary to use jemalloc to manage smaller js gc
> chunks.

madvise/decommit has been tried, but the initial implementation showed a performance regression. So it looks like smaller chunks provide a better win from a memory point of view without much of a performance regression (see comment 7).
(In reply to comment #59)
> 
> Here the last two columns show a clear win according to RSS, total heap size
> and vsize. 

The 1st column also shows a win.  The 2nd column has worse numbers for js-gc-heap:  478MB vs 555MB, which is entirely because js-gc-heap-chunk-unused rose from 27MB to 106MB.  And resident increased by almost 50MB.

I guess this is because the minimum heap size per compartment is 64KB (as opposed to 4KB previously), which results in increases in peak GC size when many new compartments are created -- the "open many tabs in a row" case is probably the worst case for this new strategy.

I definitely think it's worth considering allowing worse peak size if it results in less fragmentation over time.  But I also want to be cautious, because we're moving towards one compartment per global.  An interesting thing would be to try this experiment with the Bugzilla Tweaks add-on installed.  It adds *lots* of extra compartments (bug 672443).


> So it looks like that smaller, per-compartment GC chunks managed outside
> jemalloc is a good way to minimize the memory usage.

From a fragmentation point of view, definitely.  I was afraid that this bug was heading in a "lots of discussion and experiments but nothing ever lands" direction, but this new approach is quite promising! :)
Depends on: 672443
Blocks: 676205
Hmm, compartment-per-global may have problems with this.  On startup I found ~150 globals, and it's not uncommon for a single tab to have 5, 10, or 20 globals (due to iframes).  So I assume 320K to 1.2MB per tab is unacceptable, yes?  Yes.

One idea is, once compartments are per-global, to have a per-domain "thing" (with bug 650411, the word "zone" is up for grabs again ;-) that owns the chunks.  I don't think it would be hard, from xpconnect's perspective, to get this working; what about from a GC-internal perspective?

To be clear: I don't want to impede this bug -- it looks pretty righteous -- I just want to get advice on how to proceed with compartment-per-global.
(In reply to Luke Wagner [:luke] from comment #61)
> Hmm, compartment-per-global may have problems with this.  On startup I found
> ~150 globals and its not uncommon for a single tab to have 5, 10, or 20
> globals (due to iframes).  So I assume 320K to 1.2MB per tab is
> unacceptable, yes?  Yes.

Gregor has a nice test case, http://gregor-wagner.com/tmp/mem , that opens 150+ tabs. Just run it and then check about:memory. For me it shows a JS heap size of 460 MB, so we already have about 3 MB per tab. Also, a page that contains just <script>alert(1)</script> shows a heap size of 40-70K on 32/64-bit systems. So even with a compartment per global the patch would not make things worse.

Clearly these data indicate that we have a lot of bloat in JS that we should address. On the other hand, even if we optimize the memory usage, hopefully we will soon be able to allocate variable-length GC things so that string data, slots, scripts, etc. also end up on the JS heap; then 64K chunks still would not be that bad.

> One idea is that, once compartments are per-global, to have a per-domain
> "thing" (with bug 650411, the word "zone" is up for grabs again ;-) that
> owns the chunks.  I don't think it would be hard, from xpconnect's
> perspective, to get this working; what about from a GC-internal perspective?

The main reason behind per-compartment chunks is to avoid long-term fragmentation where we end up with a lot of chunks that have only a few GC things allocated. Compartment-private chunks group things naturally, so when we close a tab the memory is released and can be used for other things besides JS. I guess zones could be used for that purpose as well with compartment-per-global.
(In reply to Igor Bukanov from comment #62)

I'm going to wait to see what njn says regarding your first comment ;-)

> On the other hand, even if we optimize the memory usage, hopefully soon we
> will be able to allocate variable-length GC things, so string data, slots,
> scripts etc. would also end up on the JS heap; then 64K chunks still would
> not be that bad.

Hmm, interesting point; once c-p-g doesn't compartment-assert on startup, I'll be sure to measure that.
Depends on: 681884
This one's been quiet for a while. Do we have a decision about what to do here, or at least a summary of the current state?
(In reply to David Mandelin from comment #64)
> This one's been quiet for a while. Do we have a decision about what to do
> here, or at least a summary of the current state?

To minimize performance regressions I need bug 681884. Also, some adjustments and new measurements are necessary in view of the type inference landing and bug 674251.
Depends on: 684569
Depends on: 684581
Depends on: 684583
Attached patch v7 (obsolete) — Splinter Review
Here is an updated patch that applies to the MC tip. It uses 2K arenas so a page with just <script>a</script> uses only one 64K chunk.
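
For a rough sense of the layout, here is a small back-of-envelope calculation; the 2K of header/bitmap overhead is an assumption for illustration, not the real chunk header size:

#include <cstddef>
#include <cstdio>

// Back-of-envelope sketch: how many arenas fit into a small chunk once some
// chunk header/bitmap overhead is paid for. The header size is assumed.
int main() {
    const size_t chunkSize   = 64 * 1024; // 64K chunk
    const size_t arenaSize   = 2 * 1024;  // 2K arenas as in this patch
    const size_t chunkHeader = 2 * 1024;  // assumed header/bitmap overhead

    size_t arenas = (chunkSize - chunkHeader) / arenaSize;
    std::printf("%zu arenas of %zu bytes per %zu-byte chunk (%.1f%% usable)\n",
                arenas, arenaSize, chunkSize,
                100.0 * (double)(arenas * arenaSize) / chunkSize);
    return 0;
}

The point is simply that a nearly empty page touching only a handful of these arenas still commits just a single 64K chunk.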
Attachment #548436 - Attachment is obsolete: true
Here is a short version of the results from bug 669245 comment 44, for convenience:

about:memory measurement points on 32-bit Linux 

1. After starting the browser, opening about:memory and gmail in another tab (gmail1)
2. After closing gmail, opening all windows from gregor-wagner.com/tmp/mem and opening a new tab with a new gmail instance (all+gmail2).
3. After closing all windows except about:memory and gmail (gmail2)
4. After closing gmail and opening gmail again (gmail3)


MC tip from 2011-09-07
                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               39.80 MB    792.34 MB     65.34 MB     64.58 MB
heap-committed               78.00 MB    878.00 MB    873.00 MB	   871.00 MB
heap-dirty                    3.80 MB      2.49 MB      3.71 MB	     3.81 MB
heap-unallocated             38.20 MB     85.66 MB    807.66 MB	   806.42 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          244            2	           2
js-gc-heap                   17.00 MB    376.00 MB     60.00 MB	    52.00 MB
js-gc-heap-arena-unused       1.85 MB     73.19 MB      5.92 MB	     3.81 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      2.00 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  1.30 MB      4.42 MB     39.14 MB	    34.04 MB
js-gc-heap-unused-fraction     18.58%       21.17%       75.09%	      72.78%
page-faults-hard                    2            3            3	           3
page-faults-soft               48,837    1,558,121    1,739,117	   1,769,136
resident                    102.47 MB  1,337.43 MB    318.79 MB	   266.40 MB
vsize                       338.70 MB  1,642.93 MB  1,220.86 MB	 1,195.21 MB


64K per-compartment chunks

                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               42.66 MB    809.43 MB     66.18 MB     66.37 MB
heap-committed               89.00 MB    884.00 MB    874.00 MB	   872.00 MB
heap-dirty                    3.27 MB      3.46 MB      3.16 MB	     3.91 MB
heap-unallocated             46.33 MB     74.57 MB    807.81 MB	   805.63 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          246            2	           2
js-gc-heap                   18.31 MB    439.25 MB     32.44 MB	    31.81 MB
js-gc-heap-arena-unused       2.08 MB     55.60 MB      4.57 MB	     2.89 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      0.00 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  1.62 MB     71.40 MB     12.32 MB	    13.60 MB
js-gc-heap-unused-fraction     20.17%       28.91%       52.08%	      51.85%
page-faults-hard                   86          197          199	         199
page-faults-soft               47,380    1,602,425    1,773,069	   1,808,302
resident                     97.02 MB  1,417.81 MB    292.57 MB	   249.53 MB
vsize                       366.77 MB  1,773.68 MB  1,256.30 MB	 1,187.14 MB


As the patch only affects the JS allocation and all memory in the JS heap is committed, variations in heap-allocated give a useful reference for the data noise. 

With this data I see that when we just open new tabs, the patch increases the JS memory usage by 439.25/376.00, or 17%, due to chunk under-utilization. But when we start to close the tabs the patch behaves much better at reclaiming memory. Bug 669245 comment 44 also shows how various allocation strategies influence the data.

The question is: is this a reasonable tradeoff?
> The question is: is this a reasonable tradeoff?

Not to make you do more tests, but it seems to me that the increase in RSS with 64KB per-compartment chunks is due to "per-compartment", not "64KB".  Have we considered 64KB multi-compartment chunks?
Having chunks not be attached to compartments is certainly attractive from the perspective of comment 61.
(In reply to Luke Wagner [:luke] from comment #69)
> Having chunks not be attached to compartments is certainly attractive from
> the perspective of comment 61.

After type info and scripts became GC things, a page with just var a=1 consumes over 32K of allocated things on 32 bit with 2K arenas, plus an extra 19K coming from under-utilized arenas (not empty arenas in chunks), giving 51K in total. And with strings and other data allocated on the JS heap, 64K per-compartment chunks would not lose any memory. 

Now, per-compartment chunks do harm if we switch to 1K arenas, but that loses over 10% in V8...
Boy, that's a lot of GC-things.  It makes me think that we will be doing work to cut it down.  But, of course, iframes tend to contain more than var a=1, so perhaps it'll even out in real measurements.  We'll see, but my worries are quieted for the moment, thanks!
> With this data I see that when we just open new tabs, the patch increases the
> JS memory usage by 439.25/376.00, or 17%, due to chunk under-utilization. But
> when we start to close the tabs the patch behaves much better at reclaiming
> memory. Bug 669245 comment 44 also shows how various allocation strategies
> influence the data.
> 
> The question is: is this a reasonable tradeoff?

My gut feeling is that long-term fragmentation is a bigger problem than short-term peak usage, but I could be wrong.  It'd be really nice if this change could be combined with some other change(s) (smaller objects, better GC, something) that kept the peak size the same while reducing the fragmentation.
(In reply to Justin Lebar [:jlebar] from comment #68)
> Not to make you do more tests, but it seems to me that the increase in RSS
> with 64KB per-compartment chunks is due to "per-compartment", not "64KB".  Have
> we considered 64KB multi-compartment chunks?

I considered that initially, but its win was much smaller than per-compartment chunks, plus it also showed a bigger usage with all 150+ tabs opened. However, this increased usage was related to the layout inefficiency. That can be fixed, see the last patch from bug 600234. With that patch, on 32 bit the best utilization of chunks is achieved either with 128K chunks and default 4K arenas or with 64K chunks and 2K arenas. On 64 bit it requires some extra packing in data structures, but that is straightforward to fix. So here is the result with the setup from comment 67:

MC tip 2011-09-09 164976bffd31

                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               40.94 MB    793.89 MB     67.67 MB     66.28 MB
heap-dirty                    3.22 MB      3.03 MB      2.95 MB	     3.33 MB
heap-unallocated             28.06 MB     82.10 MB    800.32 MB	   798.72 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          239            2	           2
js-gc-heap                   18.00 MB    370.00 MB     63.00 MB	    41.00 MB
js-gc-heap-arena-unused       2.34 MB     68.65 MB      7.40 MB	     2.72 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      3.00 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  1.75 MB      6.94 MB     40.25 MB	    23.51 MB
js-gc-heap-unused-fraction     22.71%       21.24%       75.63%	      63.97%
page-faults-hard                    2           96           96	          96
page-faults-soft               46,997    2,127,284    2,275,358	   2,308,339
resident                    111.28 MB  1,343.04 MB    347.37 MB	   262.89 MB
vsize                       322.27 MB  1,655.34 MB  1,239.41 MB	 1,206.09 MB

128K chunks

                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               40.96 MB    808.30 MB     68.52 MB     65.63 MB
heap-dirty                    3.67 MB      3.07 MB      2.86 MB	     3.65 MB
heap-unallocated             27.03 MB     81.70 MB    813.48 MB	   815.36 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          244            2	           2
js-gc-heap                   16.88 MB    368.63 MB     38.38 MB	    33.50 MB
js-gc-heap-arena-unused       2.26 MB     62.70 MB      6.11 MB	     3.94 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      2.75 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  0.50 MB      2.45 MB     17.02 MB	    15.30 MB
js-gc-heap-unused-fraction     16.38%       18.41%       60.25%	      57.41%
page-faults-hard                    2           12           12	          12
page-faults-soft               47,151    1,927,948    2,095,422	   2,124,649
resident                    107.02 MB  1,352.80 MB    316.08 MB	   256.57 MB
vsize                       320.38 MB  1,692.59 MB  1,232.55 MB	 1,218.67 MB


64K chunks + 2K arenas
                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               40.48 MB    789.26 MB     66.34 MB     66.28 MB
heap-dirty                    3.86 MB      2.88 MB      3.41 MB	     3.61 MB
heap-unallocated             38.51 MB     97.73 MB    810.66 MB	   808.72 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          240            2	           2
js-gc-heap                   16.38 MB    368.75 MB     44.44 MB	    29.19 MB
js-gc-heap-arena-unused       1.50 MB     61.14 MB      4.16 MB	     1.86 MB
js-gc-heap-chunk-clean-unuse  0.38 MB      3.38 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  0.28 MB      2.24 MB     24.63 MB	    12.17 MB
js-gc-heap-unused-fraction     13.11%       18.10%       64.79%	      48.06%
page-faults-hard                    2           21           21	          21
page-faults-soft               46,419    1,905,134    2,072,656	   2,104,159
resident                    103.18 MB  1,332.70 MB    332.43 MB	   272.40 MB
vsize                       338.91 MB  1,670.26 MB  1,221.29 MB	 1,195.61 MB


As the changes only affect the JS heap, the variation in heap-allocated gives a useful reference for the test noise.

So it looks like the case of 128K chunks is a winner. It does not regress when opening new tabs and shows a nice win after closing them. Now I will test how it affects the performance.
I dropped the per-compartment part from the title. As the previous comment shows, we can get similar wins with smaller compartment-shared chunks without the regression.
Summary: Investigate small per-compartment GC chunks → Investigate small GC chunks
Depends on: 686017
Depends on: 686144
With 64K/128K chunks I observed a V8 benchmark regression of over 2%. However, with 256K chunks and with bug 686017 and bug 686144 addressed, the regression is less than 0.4%, with no regression in SunSpider or GC pause times. Yet such chunks still provide the benefit of reduced fragmentation. For reference I also include the stats showing the effect of shrinking jemalloc chunks to 256K (via setting the environment variable MALLOC_OPTIONS to kk).

The stats with the setup from comment 67:

MC tip 2011-09-09 164976bffd31

1MB JS and jemalloc chunks
                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               40.29 MB    804.34 MB     66.94 MB     66.32 MB
heap-dirty                    3.34 MB      1.63 MB      3.78 MB	     3.46 MB
heap-unallocated             41.70 MB     85.65 MB    818.06 MB	   817.68 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          253            2	           2
js-gc-heap                   18.00 MB    387.00 MB     49.00 MB	    46.00 MB
js-gc-heap-arena-unused       2.74 MB     80.42 MB      6.07 MB	     3.47 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      0.00 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  1.49 MB      7.00 MB     28.04 MB	    27.73 MB
js-gc-heap-unused-fraction     23.47%       22.58%       69.62%	      67.81%
page-faults-hard                    2            2            2	           2
page-faults-soft               45,297    1,811,178    1,948,807	   1,979,877
resident                    110.52 MB  1,368.55 MB    328.41 MB	   294.05 MB
vsize                       343.57 MB  1,684.74 MB  1,238.32 MB	 1,226.08 MB

256KB JS and 1MB jemalloc chunks

                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               43.86 MB    803.62 MB     67.34 MB     68.33 MB
heap-dirty                    2.58 MB      2.97 MB      2.70 MB	     3.65 MB
heap-unallocated             41.13 MB     87.38 MB    818.66 MB	   816.66 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          249            2	           2
js-gc-heap                   17.75 MB    380.00 MB     43.25 MB	    38.25 MB
js-gc-heap-arena-unused       2.33 MB     74.74 MB      6.63 MB	     3.25 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      3.25 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  1.31 MB      4.29 MB     21.52 MB	    20.03 MB
js-gc-heap-unused-fraction     20.51%       21.65%       65.09%	      60.87%
page-faults-hard                   89          296          296	         296
page-faults-soft               56,141    1,811,176    1,992,485	   2,022,654
resident                    101.70 MB  1,354.27 MB    335.80 MB	   313.31 MB
vsize                       362.25 MB  1,737.36 MB  1,295.41 MB	 1,231.13 MB

1MB JS and 256K jemalloc chunks

                            gmail      all+gmail2   gmail2       gmail3

heap-allocated               43.97 MB    819.78 MB     66.69 MB     66.88 MB
heap-dirty                    2.77 MB      3.79 MB      2.92 MB	     2.09 MB
heap-unallocated             32.03 MB     94.47 MB    785.05 MB	   759.36 MB
js-compartments-system              2            2            2	           2
js-compartments-user                2          251            2	           2
js-gc-heap                   18.00 MB    383.00 MB     53.00 MB	    47.00 MB
js-gc-heap-arena-unused       1.95 MB     75.96 MB      5.86 MB	     3.48 MB
js-gc-heap-chunk-clean-unuse  0.00 MB      0.00 MB      0.00 MB	     0.00 MB
js-gc-heap-chunk-dirty-unuse  2.17 MB      6.80 MB     31.90 MB	    28.42 MB
js-gc-heap-unused-fraction     22.87%       21.60%       71.24%	      67.86%
page-faults-hard                    2            4            4	           4
page-faults-soft               54,432    2,030,972    2,155,899	   2,187,639
resident                     99.73 MB  1,375.48 MB    315.77 MB	   257.83 MB
vsize                       344.99 MB  1,714.50 MB  1,221.54 MB	 1,156.74 MB


So to proceed further I will ask for a review of a patch that just changes the JS chunk size to 256K after fixing bug 686017 and bug 686144. Even smaller JS chunks or arenas, or compartment-owned chunks, are for another bug.
A note about the RSS data in those tests: it is pretty much useless as it varies between runs by up to 40 MB depending on how many times I press the Minimize Memory Usage button. In particular, sometimes just a single click on Minimize Memory Usage increases that number by 20 MB.
Attached patch 256K chunksSplinter Review
The patch shrinks GC chunks to 256K. Even smaller chunks or per-compartment chunks are for another bug.
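
For illustration, the change is essentially a one-constant tweak; assuming the chunk size is derived from a shift constant, the diff would look something like this (the actual identifier and header file in the tree may differ):

 // Hypothetical sketch: the real constant name and location may differ.
-const size_t ChunkShift = 20;                     /* 1 MB chunks   */
+const size_t ChunkShift = 18;                     /* 256 KB chunks */
 const size_t ChunkSize = size_t(1) << ChunkShift;
 const size_t ChunkMask = ChunkSize - 1;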
Attachment #559244 - Attachment is obsolete: true
Attachment #561283 - Flags: review?(anygregor)
The mac allocator doesn't like smaller chunks and this patch causes a big regression on splay:

trunk:
pv135218:v8 idefix2$ ../OPT.OBJ/js -m -n run.js
Richards: 7625
DeltaBlue: 8827
Crypto: 12366
RayTrace: 3635
EarleyBoyer: 6939
RegExp: 1723
Splay: 8296
----
Score (version 6): 6060

with this patch:
pv135218:v8 idefix2$ ../OPT.OBJ/js -m -n run.js
Richards: 7682
DeltaBlue: 8853
Crypto: 12382
RayTrace: 3642
EarleyBoyer: 7043
RegExp: 1741
Splay: 6462
----
Score (version 6): 5880
(In reply to Gregor Wagner from comment #78)
> The mac allocator doesn't like smaller chunks and this patch causes a big
> regression on splay:

If you change MAX_EMPTY_CHUNK_AGE from 4 to 5, would it fix the regression on MAC?
(In reply to Igor Bukanov from comment #79)
> (In reply to Gregor Wagner from comment #78)
> > The mac allocator doesn't like smaller chunks and this patch causes a big
> > regression on splay:
> 
> If you change MAX_EMPTY_CHUNK_AGE from 4 to 5, would it fix the regression
> on MAC?

That has no influence. Splay uses way more memory than all the other benchmarks.
I am wondering how much speedup we could get on Mac by putting the allocation on the helper thread, similar to how the de-allocation of the chunks is handled.
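
For what it's worth, here is a minimal sketch of that idea. It is purely hypothetical - it is not wired into the actual GCHelperThread code, and std::thread/std::malloc stand in for the real platform threading and chunk-mapping calls:

#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch of moving chunk allocation to a helper thread, mirroring
// how chunk release already happens off the main thread. Not the actual
// GCHelperThread code; std::malloc stands in for the real chunk mapping.
class BackgroundChunkPool {
    static const size_t ChunkSize = 256 * 1024;
    static const size_t Target = 4;     // keep a few chunks ready at all times

    std::vector<void *> pool_;
    std::mutex lock_;
    std::condition_variable wake_;
    bool shutdown_ = false;
    std::thread helper_;

    void helperLoop() {
        std::unique_lock<std::mutex> guard(lock_);
        for (;;) {
            // Sleep until the pool needs a refill or we are shutting down.
            wake_.wait(guard, [this] { return shutdown_ || pool_.size() < Target; });
            if (shutdown_)
                return;
            guard.unlock();
            void *chunk = std::malloc(ChunkSize);   // real code: mmap/VirtualAlloc
            guard.lock();
            pool_.push_back(chunk);
        }
    }

  public:
    BackgroundChunkPool() : helper_(&BackgroundChunkPool::helperLoop, this) {}

    ~BackgroundChunkPool() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            shutdown_ = true;
        }
        wake_.notify_one();
        helper_.join();
        for (size_t i = 0; i < pool_.size(); i++)
            std::free(pool_[i]);
    }

    // Main-thread fast path: usually just pops a chunk the helper prepared,
    // and only falls back to a synchronous allocation if the pool is empty.
    void *getChunk() {
        std::lock_guard<std::mutex> guard(lock_);
        if (!pool_.empty()) {
            void *chunk = pool_.back();
            pool_.pop_back();
            wake_.notify_one();             // wake the helper to refill
            return chunk;
        }
        return std::malloc(ChunkSize);
    }
};

Whether this would actually recover the Splay score on Mac would need measuring, of course.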
Depends on: 688641
(In reply to Gregor Wagner from comment #78)
> The mac allocator doesn't like smaller chunks and this patch causes a big
> regression on splay:
> 
> Splay: 8296
-----
> Splay: 6462

Gregor, on which OS X version have you seen the regression? On a Mac Mini with OS X 10.7 I do not see it. For example, without the patch my scores for Splay during 4 consecutive runs of http://v8.googlecode.com/svn/data/benchmarks/v6/run.html in the browser are:

8615 8695 8703 8646

With 256K chunks that becomes:

8623 8671 8703 8752

The same holds for the other benchmarks: all the differences are within the noise.
Depends on: 691731
This has stalled and I'm tempted to mark it WONTFIX for two reasons:

- The results in comment 75 aren't very convincing.  js-gc-heap is better with 256KB JS chunks, but resident has mixed results, vsize is worse, page-faults-hard is much worse.

- More generally, the real solution to our JS heap fragmentation problems is to introduce a compacting collector.  I'd rather we work on that than muck about with constants to eke out 1% wins.

I'm downgrading this to MemShrink:P2.
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
(In reply to Nicholas Nethercote [:njn] from comment #82)
> This has stalled and I'm tempted to mark it WONTFIX for two reasons:

I agree with WONTFIX. With bug 670596 fixed we discard the memory used for individual GC arenas, which makes the chunk size no longer relevant for memory performance.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Attachment #561283 - Flags: review?(anygregor)