Bug 671702 - Investigate small GC chunks
Status: RESOLVED WONTFIX (Closed)
Opened 10 years ago · Closed 9 years ago
Categories: Core :: JavaScript Engine, defect
People: (Reporter: igor, Assigned: igor)
References: (Depends on 2 open bugs, Blocks 1 open bug)
Details: (Keywords: memory-footprint, Whiteboard: [MemShrink:P2])
Attachments: (1 file, 7 obsolete files) - 512 bytes, patch
+++ This bug was initially created as a clone of Bug #669245 comment 13 +++

To improve our heap fragmentation problem when we have many GC chunks with just a few arenas allocated, we should investigate small 64K-sized chunks (64K is the smallest amount of memory that Windows allocates using VirtualAlloc). With such chunks it should also be possible to make them per-compartment, with each chunk holding arenas from only a single compartment.
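As a rough illustration of what the proposal amounts to, here is a minimal sketch of the chunk constants and the per-chunk ownership field. GC_CHUNK_SHIFT is a real constant mentioned later (comment 26); the ChunkInfo layout below is an assumption for illustration, not the actual SpiderMonkey definition.

    #include <cstddef>

    struct JSCompartment;   // SpiderMonkey compartment type (forward declaration for this sketch)

    static const size_t GC_CHUNK_SHIFT = 16;                         // 64K, the smallest VirtualAlloc granularity on Windows
    static const size_t GC_CHUNK_SIZE  = size_t(1) << GC_CHUNK_SHIFT;
    static const size_t GC_CHUNK_MASK  = GC_CHUNK_SIZE - 1;

    struct ChunkInfo {
        JSCompartment *compartment;   // every arena in this chunk belongs to this one compartment
        // ... free-arena bookkeeping, empty-chunk age, etc.
    };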
Updated•10 years ago
Whiteboard: [MemShrink]

Updated•10 years ago
Assignee: general → igor
Comment 1•10 years ago
You can just move the compartment pointer into each arena. Want to try that first? It's probably easier (at the end of the arena, for example). Thanks for working on this, Igor. Awesome stuff.
Comment 2•10 years ago (Assignee)
(In reply to comment #1)
> You can just move the compartment pointer into each arena.

I am not sure what you mean here. The compartment pointer is currently already per-arena (it is stored in the arena header).
Comment 3•10 years ago (Assignee)
The patch makes chunks per-compartment. In a jemalloc-enabled browser on a 64-bit Linux build, http://v8.googlecode.com/svn/data/benchmarks/v6/run.html clearly shows up to a 20% regression the first time it is run after a browser restart. For the second and later runs the situation looks rather OK. Here is the data for different chunk sizes:

                 Base   32K    64K    128K   512K
    Score:       3917   3440   3748   3923   4020
    Richards:    6227   6146   6196   5998   6047
    DeltaBlue:   3811   2290   2698   3848   4172
    Crypto:      6353   6268   6264   6369   6101
    RayTrace:    2863   2682   2804   2765   2928
    EarleyBoyer: 3811   3210   4335   4085   4273
    RegExp:      1647   1443   1542   1586   1647
    Splay:       5215   5207   5300   5430   5354

That is, with 128K compartment-private chunks the scores match the scores without the patch. With SunSpider the situation is similar, but the differences are smaller than with V8. Now I need to look at memory utilization with the patch.
Comment 4•10 years ago

I'm a little confused. Are the numbers in the table from the first run after a restart or from a later run?
Comment 5•10 years ago (Assignee)
(In reply to comment #4)
> I'm a little confused. Are the numbers in the table from the first run after
> a restart or from a later run?

The numbers are for later runs. The first run after browser startup with 64K chunks scores about 15% lower than the numbers in the table; with 128K chunks the regression is about 10%. For reference, here are the JS shell numbers for V8 with 128K chunks allocated via mmap:

    TEST              COMPARISON         FROM                TO                 DETAILS
    =============================================================================
    ** TOTAL **:      *1.058x as slow*   1528.2ms +/- 0.1%   1617.4ms +/- 0.3%  significant
    =============================================================================
    v8:               *1.058x as slow*   1528.2ms +/- 0.1%   1617.4ms +/- 0.3%  significant
      crypto:         ??                 203.1ms +/- 0.3%    203.6ms +/- 0.3%   not conclusive: might be *1.003x as slow*
      deltablue:      *1.015x as slow*   277.2ms +/- 0.4%    281.4ms +/- 0.2%   significant
      earley-boyer:   *1.112x as slow*   250.9ms +/- 0.3%    279.0ms +/- 0.6%   significant
      raytrace:       *1.021x as slow*   196.2ms +/- 0.2%    200.4ms +/- 0.2%   significant
      regexp:         *1.007x as slow*   193.4ms +/- 0.4%    194.8ms +/- 0.4%   significant
      richards:       *1.013x as slow*   206.4ms +/- 0.6%    209.1ms +/- 0.7%   significant
      splay:          *1.24x as slow*    201.0ms +/- 0.3%    249.0ms +/- 1.2%   significant

With 64K chunks the numbers are:

    TEST              COMPARISON         FROM                TO                 DETAILS
    =============================================================================
    ** TOTAL **:      *1.090x as slow*   1528.2ms +/- 0.1%   1666.1ms +/- 0.5%  significant
    =============================================================================
    v8:               *1.090x as slow*   1528.2ms +/- 0.1%   1666.1ms +/- 0.5%  significant
      crypto:         *1.008x as slow*   203.1ms +/- 0.3%    204.8ms +/- 0.4%   significant
      deltablue:      *1.21x as slow*    277.2ms +/- 0.4%    336.2ms +/- 0.4%   significant
      earley-boyer:   *1.101x as slow*   250.9ms +/- 0.3%    276.2ms +/- 0.3%   significant
      raytrace:       *1.022x as slow*   196.2ms +/- 0.2%    200.5ms +/- 0.2%   significant
      regexp:         *1.025x as slow*   193.4ms +/- 0.4%    198.2ms +/- 0.6%   significant
      richards:       *1.009x as slow*   206.4ms +/- 0.6%    208.4ms +/- 0.7%   significant
      splay:          *1.20x as slow*    201.0ms +/- 0.3%    241.9ms +/- 2.6%   significant

The try server builds with the patch: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/ibukanov@mozilla.com-1e65f648fca3
Comment 6•10 years ago (Assignee)
about:memory stats after opening gmail, livejournal, gmaps, and bbc.co.uk, then closing gmail and running GC and CC.

Without the patch:

    167.76 MB (100.0%) -- explicit
    ├───68.02 MB (40.54%) -- js
    │   ├──19.78 MB (11.79%) -- gc-heap-chunk-unused
    │   ├──13.44 MB (08.01%) -- compartment([System Principal])
    │   │  ├───8.52 MB (05.08%) -- gc-heap
    │   │  │   ├──4.38 MB (02.61%) -- objects
    │   │  │   ├──3.30 MB (01.97%) -- shapes
    │   │  │   └──0.83 MB (00.50%) -- (5 omitted)
    │   │  ├───2.01 MB (01.20%) -- scripts
    │   │  ├───1.50 MB (00.89%) -- mjit-code
    │   │  └───1.42 MB (00.84%) -- (5 omitted)
    │   ├──10.42 MB (06.21%) -- compartment(http://maps.google.com/)
    │   │  ├───4.54 MB (02.71%) -- gc-heap
    │   │  │   ├──1.94 MB (01.16%) -- objects
    │   │  │   ├──1.63 MB (00.97%) -- arena-unused
    │   │  │   ├──0.85 MB (00.51%) -- shapes
    │   │  │   └──0.12 MB (00.07%) -- (4 omitted)
    │   │  ├───2.73 MB (01.63%) -- mjit-code
    │   │  ├───1.63 MB (00.97%) -- scripts
    │   │  └───1.52 MB (00.91%) -- (5 omitted)
    │   ├───8.00 MB (04.77%) -- stack
    │   ├───7.31 MB (04.36%) -- compartment(http://www.bbc.co.uk/news/)
    │   │   ├──3.34 MB (01.99%) -- gc-heap
    │   │   │  ├──1.15 MB (00.68%) -- arena-unused
    │   │   │  ├──1.12 MB (00.67%) -- objects
    │   │   │  ├──0.94 MB (00.56%) -- shapes
    │   │   │  └──0.14 MB (00.08%) -- (4 omitted)
    │   │   ├──1.82 MB (01.09%) -- mjit-code
    │   │   ├──1.09 MB (00.65%) -- (5 omitted)
    │   │   └──1.05 MB (00.63%) -- scripts
    │   ├───4.81 MB (02.87%) -- compartment(http://fpoling.livejournal.com/friends)
    │   │   ├──2.49 MB (01.48%) -- gc-heap
    │   │   │  ├──0.97 MB (00.58%) -- objects
    │   │   │  ├──0.85 MB (00.51%) -- shapes
    │   │   │  └──0.67 MB (00.40%) -- (5 omitted)
    │   │   ├──1.38 MB (00.82%) -- (6 omitted)
    │   │   └──0.95 MB (00.56%) -- scripts
    │   ├───3.30 MB (01.97%) -- compartment(atoms)
    │   │   ├──2.40 MB (01.43%) -- gc-heap
    │   │   │  ├──1.59 MB (00.95%) -- strings
    │   │   │  └──0.81 MB (00.48%) -- (6 omitted)
    │   │   ├──0.91 MB (00.54%) -- string-chars
    │   │   └──0.00 MB (00.00%) -- (6 omitted)
    │   └───0.96 MB (00.57%) -- (4 omitted)
    ├───63.90 MB (38.09%) -- heap-unclassified
    ├───24.14 MB (14.39%) -- images
    │   ├──24.03 MB (14.32%) -- content
    │   │  ├──24.03 MB (14.32%) -- used
    │   │  │  ├──22.33 MB (13.31%) -- uncompressed
    │   │  │  └───1.71 MB (01.02%) -- raw
    │   │  └───0.00 MB (00.00%) -- (1 omitted)
    │   └───0.10 MB (00.06%) -- (1 omitted)
    ├────5.59 MB (03.33%) -- storage
    │    └──5.59 MB (03.33%) -- sqlite
    │       ├──2.75 MB (01.64%) -- places.sqlite
    │       │  ├──2.49 MB (01.48%) -- cache-used
    │       │  └──0.26 MB (00.16%) -- (2 omitted)
    │       ├──1.97 MB (01.17%) -- (10 omitted)
    │       └──0.87 MB (00.52%) -- other
    ├────4.65 MB (02.77%) -- layout
    │    └──4.65 MB (02.77%) -- all
    ├────1.31 MB (00.78%) -- xpti-working-set
    └────0.16 MB (00.09%) -- (2 omitted)

With the patch:

    158.01 MB (100.0%) -- explicit
    ├───66.49 MB (42.08%) -- heap-unclassified
    ├───54.07 MB (34.22%) -- js
    │   ├──13.58 MB (08.59%) -- compartment([System Principal])
    │   │  ├───8.55 MB (05.41%) -- gc-heap
    │   │  │   ├──4.36 MB (02.76%) -- objects
    │   │  │   ├──3.25 MB (02.06%) -- shapes
    │   │  │   └──0.95 MB (00.60%) -- (5 omitted)
    │   │  ├───2.04 MB (01.29%) -- scripts
    │   │  ├───1.56 MB (00.99%) -- mjit-code
    │   │  └───1.42 MB (00.90%) -- (5 omitted)
    │   ├──10.20 MB (06.45%) -- compartment(http://maps.google.com/)
    │   │  ├───4.52 MB (02.86%) -- gc-heap
    │   │  │   ├──1.93 MB (01.22%) -- objects
    │   │  │   ├──1.62 MB (01.02%) -- arena-unused
    │   │  │   ├──0.85 MB (00.54%) -- shapes
    │   │  │   └──0.12 MB (00.07%) -- (4 omitted)
    │   │  ├───2.61 MB (01.65%) -- mjit-code
    │   │  ├───1.63 MB (01.03%) -- scripts
    │   │  └───1.45 MB (00.92%) -- (5 omitted)
    │   ├───8.00 MB (05.06%) -- stack
    │   ├───6.87 MB (04.35%) -- compartment(http://www.bbc.co.uk/news/)
    │   │   ├──3.04 MB (01.92%) -- gc-heap
    │   │   │  ├──1.11 MB (00.70%) -- objects
    │   │   │  ├──0.93 MB (00.59%) -- shapes
    │   │   │  ├──0.87 MB (00.55%) -- arena-unused
    │   │   │  └──0.13 MB (00.08%) -- (4 omitted)
    │   │   ├──1.76 MB (01.11%) -- mjit-code
    │   │   ├──1.05 MB (00.67%) -- scripts
    │   │   └──1.02 MB (00.64%) -- (5 omitted)
    │   ├───6.53 MB (04.13%) -- gc-heap-chunk-unused
    │   ├───4.93 MB (03.12%) -- compartment(http://fpoling.livejournal.com/friends)
    │   │   ├──2.58 MB (01.63%) -- gc-heap
    │   │   │  ├──1.01 MB (00.64%) -- objects
    │   │   │  ├──0.90 MB (00.57%) -- shapes
    │   │   │  └──0.67 MB (00.42%) -- (5 omitted)
    │   │   ├──1.37 MB (00.87%) -- (6 omitted)
    │   │   └──0.98 MB (00.62%) -- scripts
    │   ├───3.29 MB (02.08%) -- compartment(atoms)
    │   │   ├──2.38 MB (01.51%) -- gc-heap
    │   │   │  ├──1.61 MB (01.02%) -- strings
    │   │   │  └──0.78 MB (00.49%) -- (6 omitted)
    │   │   ├──0.91 MB (00.58%) -- string-chars
    │   │   └──0.00 MB (00.00%) -- (6 omitted)
    │   └───0.67 MB (00.42%) -- (4 omitted)
    ├───24.13 MB (15.27%) -- images
    │   ├──24.02 MB (15.20%) -- content
    │   │  ├──24.02 MB (15.20%) -- used
    │   │  │  ├──22.32 MB (14.13%) -- uncompressed
    │   │  │  └───1.70 MB (01.08%) -- raw
    │   │  └───0.00 MB (00.00%) -- (1 omitted)
    │   └───0.10 MB (00.07%) -- (1 omitted)
    ├────7.23 MB (04.57%) -- storage
    │    └──7.23 MB (04.57%) -- sqlite
    │       ├──2.72 MB (01.72%) -- places.sqlite
    │       │  ├──2.46 MB (01.56%) -- cache-used
    │       │  └──0.26 MB (00.17%) -- (2 omitted)
    │       ├──2.15 MB (01.36%) -- (10 omitted)
    │       ├──1.42 MB (00.90%) -- urlclassifier3.sqlite
    │       │  ├──1.32 MB (00.84%) -- cache-used
    │       │  └──0.10 MB (00.06%) -- (2 omitted)
    │       └──0.93 MB (00.59%) -- other
    ├────4.62 MB (02.92%) -- layout
    │    └──4.62 MB (02.92%) -- all
    ├────1.31 MB (00.83%) -- xpti-working-set
    └────0.16 MB (00.10%) -- (2 omitted)

The numbers vary between runs, but a 15%-20% reduction in the size of the JS heap is pretty consistent. With per-compartment chunks it should be possible to improve the stats and accurately calculate the per-compartment heap size, but that is for another bug.
Comment 7•10 years ago (Assignee)
The reason for the v1 regressions is that currently we have:

    GC_ARENA_ALLOCATION_TRIGGER = 30 * js::GC_CHUNK_SIZE

This constant defines the minimal threshold for running the last-ditch GC. With a smaller chunk size the threshold shrank and we ended up doing more GCs during the benchmarks. The new patch fixes that by using an explicit 30 MB number for the constant. With this change, even with 64K chunks V8 in the js shell shows no difference, and V8 in the browser does not show a regression for the first run after startup. The overall score in the browser is at worst 5% lower than the base numbers, but this is within the noise. With 128K chunks I see no regressions.
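For illustration only, the change amounts to something like this; the constant name comes from the comment above, while the exact literal used for the 30 MB value is an assumption:

    #include <cstddef>

    // Before (scales with the chunk size, so 64K chunks shrank the last-ditch
    // GC threshold ~16x and caused extra GCs during benchmarks):
    //   static const size_t GC_ARENA_ALLOCATION_TRIGGER = 30 * js::GC_CHUNK_SIZE;

    // After (an explicit byte count, independent of the chunk size):
    static const size_t GC_ARENA_ALLOCATION_TRIGGER = 30 * 1024 * 1024;   // 30 MB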
Attachment #546940 - Attachment is obsolete: true
Comment 8•10 years ago

Awesome! It might be useful to try a shell build with --enable-gctimer and see how the GC times differ when running V8. It seems like there's a slight chance that conservative stack scanning could get slower because the chunk table is larger.
Comment 9•10 years ago
This might be beyond the scope of this bug, but it also might be easy to fix at the same time: How do you decide which of a compartment's chunks to allocate an object into? jemalloc's heuristic is to always allocate into the chunk with the lowest address, but any stable ordering would do.
Comment 10•10 years ago (Assignee)
(In reply to comment #9)
> This might be beyond the scope of this bug, but it also might be easy to fix
> at the same time: How do you decide which of a compartment's chunks to
> allocate an object into?

The patch puts all chunks with at least one available arena on a doubly-linked list whose head is stored in JSCompartment. When all arenas in a chunk are used, the chunk is removed from the list. Also, after the GC, empty chunks are removed from the list and added to the global pool of empty chunks. As before the patch, the empty chunks are returned to the system if they survive 3 GC cycles or when the browser is idle.

> jemalloc's heuristic is to always allocate into
> the chunk with the lowest address, but any stable ordering would do.

I guess instead of linking the chunks, the compartment could put them into an array that is sorted after the GC. But that is for another bug.
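A minimal sketch of the bookkeeping just described; the field and method names are illustrative, not the patch's actual identifiers:

    #include <cstddef>

    struct Chunk;

    struct JSCompartment {
        Chunk *availableChunks;   // chunks of this compartment with at least one free arena
    };

    struct Chunk {
        JSCompartment *compartment;
        Chunk *prevAvailable;
        Chunk *nextAvailable;

        // Called when the last free arena is handed out: a full chunk leaves
        // the list so allocation never has to scan it.
        void removeFromAvailableList(JSCompartment *comp) {
            if (prevAvailable)
                prevAvailable->nextAvailable = nextAvailable;
            else
                comp->availableChunks = nextAvailable;
            if (nextAvailable)
                nextAvailable->prevAvailable = prevAvailable;
            prevAvailable = nextAvailable = nullptr;
        }

        // Called when the GC frees an arena in a previously full chunk.
        void addToAvailableList(JSCompartment *comp) {
            prevAvailable = nullptr;
            nextAvailable = comp->availableChunks;
            if (nextAvailable)
                nextAvailable->prevAvailable = this;
            comp->availableChunks = this;
        }
    };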
Comment 11•10 years ago
> But this is for another bug. Okay. For those following along at home, we've been in bug 669245.
Comment 12•10 years ago
Bug 669611 will help with the evaluation of this bug in the browser. Hopefully I'll land it on mozilla-inbound today.
Comment 13•10 years ago (Assignee)
In v3 I replaced the linked list of empty chunks with an array that also stores the age. This way the empty chunks are not dereferenced, which helped to remove a spike of cache/TLB misses in ExpireGCChunks. With this patch I see no differences in V8 in the browser or the shell. Also, GCTIMER output during a V8 run shows no substantial difference.
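A hedged sketch of such an empty-chunk pool; std::vector and the names below are illustrative stand-ins, and the 3-GC-cycle limit comes from comment 10:

    #include <vector>
    #include <cstddef>

    struct Chunk;

    // Storing the age alongside the pointer means expiring never touches the
    // (cold) chunk memory itself.
    struct EmptyChunkPool {
        struct Entry {
            Chunk   *chunk;
            unsigned age;   // number of GC cycles the chunk has stayed empty
        };
        std::vector<Entry> entries;

        static const unsigned MAX_AGE = 3;   // chunks surviving 3 GCs are released

        template <typename ReleaseFn>
        void expire(ReleaseFn releaseChunk) {
            size_t kept = 0;
            for (size_t i = 0; i < entries.size(); i++) {
                if (++entries[i].age >= MAX_AGE)
                    releaseChunk(entries[i].chunk);   // return the chunk to the system
                else
                    entries[kept++] = entries[i];
            }
            entries.resize(kept);
        }
    };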
Attachment #547086 - Attachment is obsolete: true
Comment 14•10 years ago (Assignee)
(In reply to comment #12)
> Bug 669611 will help with the evaluation of this bug in the browser.
> Hopefully I'll land it on mozilla-inbound today.

The patch shows a fragmentation decrease from 50% to 35% for the simple test from comment 6.
Comment 15•10 years ago (Assignee)
Comment on attachment 547465 [details] [diff] [review] - v3

The patch passes the try server. It could be made smaller, as it contains some cleanups that I did while trying to identify regressions in the initial versions, but I suppose those do not complicate the review.
Attachment #547465 - Flags: review?(wmccloskey)
Comment 16•10 years ago
Very nice results! Could you summarize the changes in a single comment? Did you also benchmark this patch on a small device like your netbook? It might hurt more on such platforms.
Comment 17•10 years ago
So you got rid of the system/user split from bug 666058? This needs some careful measurement. And yes, a summary would be nice!
Comment 18•10 years ago
If each chunk holds objects from only one compartment, then it's still the case that a chunk holding objects from a user compartment doesn't hold objects from a system compartment, right?
Comment 19•10 years ago (Assignee)
The new version adds comments and removes unrelated changes. Here is a summary of all the changes:

1. Chunks are made 64K. Each chunk contains GC things from only one compartment. This eliminates the need for the system/user chunk separation and allows all of a compartment's GC memory to be reclaimed after the compartment is finished.

2. Each compartment maintains a doubly-linked list of all its chunks with at least one free arena. Arenas are allocated from the list head. When all of a chunk's arenas are allocated, the chunk is removed from the list. If the GC frees at least one arena in a previously full chunk, it is added back to the list. Also, during the GC any chunk that becomes empty is removed from the list and added to the global pool of empty chunks.

3. The pool of empty chunks is implemented as a vector to avoid dereferencing chunks in the pool when aging them. Chunks in the pool are also removed from the global hash of all chunks. This helps the conservative GC as it minimizes the hash size, and it avoids clearing the mark bitmap of empty chunks before the start of the GC.

4. The compartment pointer is moved from the arena header to the chunk info descriptor. This frees one word in the arena header, and the patch uses that word to remove the separate array of delayed-marking pointers and move them into the header (see the sketch below).
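A rough before/after of the layout change in point 4; the field names are illustrative and the other header fields are omitted:

    struct JSCompartment;
    struct ArenaHeader;

    // Before: the compartment lived in every arena header, and the
    // delayed-marking links lived in a separate side array.
    struct ArenaHeaderBefore {
        JSCompartment *compartment;
        // ... free-list and thing-kind fields
    };

    // After: the compartment is recorded once per chunk, and the word freed in
    // the arena header holds the delayed-marking link instead.
    struct ChunkInfoAfter {
        JSCompartment *compartment;   // shared by every arena in the chunk
        // ... per-chunk free-arena bookkeeping
    };

    struct ArenaHeaderAfter {
        ArenaHeader *nextDelayedMarking;
        // ... free-list and thing-kind fields
    };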
Attachment #547465 - Attachment is obsolete: true
Attachment #547663 - Flags: review?(wmccloskey)
Attachment #547465 - Flags: review?(wmccloskey)
Comment 20•10 years ago (Assignee)
(In reply to comment #16)
> Did you also benchmark this patch on a small device like your netbook? It
> might hurt more on such platforms.

I do not have the old single-core Atom netbook set up for development testing right now, but on a netbook with a dual-core 1.3 GHz AMD Neo K325 I see no differences in the total V8 scores under Linux, neither in the js shell nor when testing http://v8.googlecode.com/svn/data/benchmarks/v6/run.html in a jemalloc-enabled browser. Also, the GC timing in the browser stays approximately the same, but it looks like the GC runs at different moments. That may explain why V8 shows noticeable variation between individual benchmarks. Here are the numbers for V8 in the shell before and after the patch:

    TEST              COMPARISON         FROM                TO                 DETAILS
    =============================================================================
    ** TOTAL **:      1.005x as fast     3232.5ms +/- 0.1%   3216.2ms +/- 0.2%  significant
    =============================================================================
    v8:               1.005x as fast     3232.5ms +/- 0.1%   3216.2ms +/- 0.2%  significant
      crypto:         -                  419.5ms +/- 0.1%    419.9ms +/- 0.1%
      deltablue:      1.010x as fast     598.2ms +/- 0.1%    592.1ms +/- 0.2%   significant
      earley-boyer:   *1.034x as slow*   526.3ms +/- 0.3%    544.3ms +/- 0.3%   significant
      raytrace:       1.006x as fast     361.9ms +/- 0.3%    359.7ms +/- 0.2%   significant
      regexp:         1.010x as fast     432.0ms +/- 0.2%    427.6ms +/- 0.2%   significant
      richards:       -                  506.9ms +/- 0.8%    503.8ms +/- 0.3%
      splay:          1.051x as fast     387.7ms +/- 0.2%    368.8ms +/- 1.0%   significant
Comment 21•10 years ago (Assignee)
The new version fixes a bug in the empty-chunk management code. Previously it cleared ChunkInfo::info.compartment when adding a chunk to the pool of empty chunks during the GC. But when the GC is compartment-local, that leads to wrong results in IsAboutToBeFinalized, which uses the compartment pointer to check for dead things in the current compartment.
Attachment #547663 - Attachment is obsolete: true
Attachment #547675 - Flags: review?(wmccloskey)
Attachment #547663 - Flags: review?(wmccloskey)
Comment 22•10 years ago
(In reply to comment #20)
> splay: 1.051x as fast 387.7ms +/- 0.2% 368.8ms +/- 1.0%

That's very surprising! Fragmentation shouldn't have any effect on the splay benchmark. Here we have to increase the heap size quickly, and it seems that 64K allocations are much faster than 1MB allocations.
Comment 23•10 years ago
I wanted to try it myself but your version seems to be outdated. Is this patch based on mozilla-central or still tracemonkey?
Comment 24•10 years ago
(In reply to comment #23)
> I wanted to try it myself but your version seems to be outdated. Is this
> patch based on mozilla-central or still tracemonkey?

Oh, never mind. This was some mercurial mixup on my side!
Comment 25•10 years ago
Here is what I get for V8 in the browser:

                 trunk   with patch
    Score:       5070    4804
    Richards:    7915    8247
    DeltaBlue:   4893    4959
    Crypto:      8740    8692
    RayTrace:    3952    4006
    EarleyBoyer: 4956    4853
    RegExp:      2086    2085
    Splay:       6226    4099

I see a big regression for splay.
Comment 26•10 years ago (Assignee)
(In reply to comment #25)
> I see a big regression for splay.

This is on Mac, right? We do not have jemalloc there yet, and mmap is relatively slow there. Can you try changing GC_CHUNK_SHIFT to 17 and testing with 128K chunks? Another option would be to try posix_memalign, which is available on Mac. For that, just change AllocateGCChunk in jsgc.cpp to:

    inline Chunk *
    AllocateGCChunk(JSRuntime *rt)
    {
        void *p;
        if (posix_memalign(&p, GC_CHUNK_SIZE, GC_CHUNK_SIZE - 4 * sizeof(uintptr_t)))
            return NULL;
    #ifdef MOZ_GCTIMER
        if (p)
            JS_ATOMIC_INCREMENT(&newChunkCount);
    #endif
        return (Chunk *) p;
    }

That 4 * sizeof(uintptr_t) is a hack to support allocators that use allocation headers, like the glibc one. And change ReleaseGCChunk(JSRuntime *rt, Chunk *p) to:

    inline void
    ReleaseGCChunk(JSRuntime *rt, Chunk *p)
    {
        JS_ASSERT(p);
    #ifdef MOZ_GCTIMER
        JS_ATOMIC_INCREMENT(&destroyChunkCount);
    #endif
        Foreground::free_(p);
    }
Comment 27•9 years ago (Assignee)
Comment on attachment 547675 [details] [diff] [review] - v5

I split the bug into two to investigate the effect of not hashing empty chunks separately.
Attachment #547675 - Attachment is obsolete: true
Attachment #547675 - Flags: review?(wmccloskey)
Comment 28•9 years ago (Assignee)
The patch is based on the patches from bug 673760 and bug 673795, which should better expose the effect of smaller per-compartment chunks. As before, I see no difference with it under Linux in a jemalloc build of the browser in V8:

                 without patch   with patch
    Score:       3952            3948
    Richards:    6128            6192
    DeltaBlue:   3930            3785
    Crypto:      6033            6430
    RayTrace:    2942            2849
    EarleyBoyer: 3859            3873
    RegExp:      1753            1711
    Splay:       5207            5256

But in the js shell (no jemalloc, so mmap is used instead) the numbers for V8 are:

    TEST              COMPARISON         FROM                TO                 DETAILS
    =============================================================================
    ** TOTAL **:      *1.010x as slow*   1516.7ms +/- 0.1%   1532.1ms +/- 0.1%  significant
    =============================================================================
    v8:               *1.010x as slow*   1516.7ms +/- 0.1%   1532.1ms +/- 0.1%  significant
      crypto:         1.004x as fast     202.7ms +/- 0.3%    201.9ms +/- 0.2%   significant
      deltablue:      *1.005x as slow*   281.8ms +/- 0.3%    283.3ms +/- 0.2%   significant
      earley-boyer:   *1.017x as slow*   258.6ms +/- 0.4%    263.1ms +/- 0.3%   significant
      raytrace:       *1.004x as slow*   195.9ms +/- 0.1%    196.6ms +/- 0.1%   significant
      regexp:         1.003x as fast     195.6ms +/- 0.2%    195.0ms +/- 0.2%   significant
      richards:       *1.008x as slow*   205.4ms +/- 0.4%    207.1ms +/- 0.4%   significant
      splay:          *1.048x as slow*   176.7ms +/- 0.4%    185.2ms +/- 0.2%   significant

The regression can be fully offset by increasing the chunks from the 64K used in the patch to 256K and by increasing the number of GC cycles an empty chunk can stay around before being released to the system:

    TEST              COMPARISON         FROM                TO                 DETAILS
    =============================================================================
    ** TOTAL **:      1.005x as fast     1516.7ms +/- 0.1%   1508.7ms +/- 0.1%  significant
    =============================================================================
    v8:               1.005x as fast     1516.7ms +/- 0.1%   1508.7ms +/- 0.1%  significant
      crypto:         ??                 202.7ms +/- 0.3%    203.0ms +/- 0.3%   not conclusive: might be *1.001x as slow*
      deltablue:      -                  281.8ms +/- 0.3%    282.0ms +/- 0.2%
      earley-boyer:   -                  258.6ms +/- 0.4%    258.5ms +/- 0.4%
      raytrace:       -                  195.9ms +/- 0.1%    196.0ms +/- 0.1%
      regexp:         1.003x as fast     195.6ms +/- 0.2%    195.0ms +/- 0.2%   significant
      richards:       -                  205.4ms +/- 0.4%    204.7ms +/- 0.5%
      splay:          1.043x as fast     176.7ms +/- 0.4%    169.5ms +/- 0.2%   significant

I suppose on Mac, where we do not yet have jemalloc available, we may need to make those changes.
Updated•9 years ago
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment 29•9 years ago
We also need numbers here that measure the JS heap size for a browsing session with 200+ tabs. I did a similar patch about a year ago and realized that the heap size increases linearly with more tabs. This would lead to much more fragmentation! With the current system we get a curve that becomes very flat after about 30 tabs. The smaller chunks might help a lot, but we have to study the behavior with big workloads.
Comment 30•9 years ago (Assignee)
(In reply to comment #29)
> The smaller chunks might help a lot but we have to study the behavior with
> big workloads.

My hope is that with per-compartment chunks the consequences of fragmentation would be less noticeable, since after closing a tab all of its chunks would be returned to jemalloc by the shrinking GC and become available for non-JS allocations. To fully fight fragmentation we would also need to impose a stable ordering on chunks, as proposed in bug 669245.

In any case, do you have a bookmark with those 200 tabs for testing?
Comment 31•9 years ago
(In reply to comment #30)
> In any case, do you have a bookmark with those 200 tabs for testing?

I modified a page from njn to open 150 sites: http://gregor-wagner.com/tmp/mem. You have to adjust dom.popup_maximum in about:config for it to work.
Comment 32•9 years ago
Also, bug 674074 means that the FOTN tab in that test is causing about:memory to show blank.
Comment 33•9 years ago
(In reply to comment #32)
> Also, bug 674074 means that the FOTN tab in that test is causing
> about:memory to show blank.

I removed FOTN. It works now for me.
Comment 34•9 years ago (Assignee)
Stats for opening/closing the 150 tabs from http://gregor-wagner.com/tmp/mem. The columns in the tables give the situation after opening all the tabs, after closing the tabs via the button in the test, and finally after closing all the windows except about:memory. Each time I captured the stats after pressing GC/GC+CC several times until the number of empty GC chunks became close to zero. Effectively, the third column gives a picture of the sort-of permanent heap fragmentation. The numbers in the first column vary a lot and mostly give the scale of what is going on.

Base case of a jemalloc-enabled browser on Linux:

                                  all tabs        tabs closed     everything closed
                                  (membench)      (tab kept)      but about:memory
    canvas-2d-pixel-bytes         2.61 MB         0.00 MB         0.00 MB
    gfx-surface-image             23.78 MB        2.36 MB         0.31 MB
    heap-allocated                1,792.13 MB     818.08 MB       116.63 MB
    heap-committed                1,852.00 MB     1,857.00 MB     1,431.00 MB
    heap-dirty                    1.25 MB         3.97 MB         2.71 MB
    heap-unallocated              59.87 MB        1,038.91 MB     1,314.37 MB
    js-compartments-system        2               2               2
    js-compartments-user          249             150             1
    js-gc-heap                    439.00 MB       452.00 MB       48.00 MB
    js-gc-heap-arena-unused       83.87 MB        70.04 MB        3.37 MB
    js-gc-heap-chunk-empty        0.00 MB         0.00 MB         0.00 MB
    js-gc-heap-chunk-unused       9.71 MB         297.35 MB       37.62 MB
    js-gc-heap-unused-fraction    21.31%          81.28%          85.38%
    page-faults-hard              10              10              10
    page-faults-soft              1,664,150       2,286,239       2,343,457
    resident                      2,009.42 MB     1,342.29 MB     295.39 MB
    shmem-allocated               0.00 MB         0.00 MB         0.00 MB
    shmem-mapped                  0.00 MB         0.00 MB         0.00 MB
    vsize                         2,639.37 MB     2,556.45 MB     1,966.75 MB

64K per-compartment chunks (the patch and its dependent patches):

                                  all tabs        tabs closed     everything closed
                                  (membench)      (tab kept)      but about:memory
    canvas-2d-pixel-bytes         2.57 MB         0.00 MB         0.00 MB
    gfx-surface-image             20.66 MB        3.16 MB         0.31 MB
    heap-allocated                1,873.28 MB     665.51 MB       93.29 MB
    heap-committed                1,931.00 MB     1,942.00 MB     1,890.00 MB
    heap-dirty                    1.39 MB         2.68 MB         3.38 MB
    heap-unallocated              57.71 MB        1,276.49 MB     1,796.71 MB
    js-compartments-system        2               2               2
    js-compartments-user          242             148             1
    js-gc-heap                    526.31 MB       303.19 MB       27.38 MB
    js-gc-heap-arena-unused       79.57 MB        67.30 MB        3.49 MB
    js-gc-heap-chunk-empty        0.31 MB         0.00 MB         0.00 MB
    js-gc-heap-chunk-unused       102.68 MB       154.80 MB       17.07 MB
    js-gc-heap-unused-fraction    34.63%          73.25%          75.08%
    page-faults-hard              4               4               4
    page-faults-soft              1,696,230       2,466,237       2,509,942
    resident                      2,089.09 MB     1,197.11 MB     282.27 MB
    shmem-allocated               0.00 MB         0.00 MB         0.00 MB
    shmem-mapped                  0.00 MB         0.00 MB         0.00 MB
    vsize                         2,732.18 MB     2,633.98 MB     2,443.60 MB

128K per-compartment chunks and 128K jemalloc chunks (enabled by setting the MALLOC_OPTIONS environment variable to the value kkk, that is, decreasing the default chunk size by 2 three times):

                                  all tabs        tabs closed     everything closed
                                  (membench)      (tab kept)      but about:memory
    canvas-2d-pixel-bytes         2.74 MB         0.00 MB         0.00 MB
    gfx-surface-image             19.77 MB        2.50 MB         0.27 MB
    heap-allocated                1,946.76 MB     721.17 MB       103.52 MB
    heap-committed                2,035.88 MB     1,755.25 MB     974.25 MB
    heap-dirty                    3.18 MB         3.48 MB         2.81 MB
    heap-unallocated              89.11 MB        1,034.08 MB     870.73 MB
    js-compartments-system        2               2               2
    js-compartments-user          243             149             1
    js-gc-heap                    568.63 MB       351.50 MB       35.88 MB
    js-gc-heap-arena-unused       85.71 MB        68.42 MB        3.95 MB
    js-gc-heap-chunk-empty        0.13 MB         0.00 MB         0.00 MB
    js-gc-heap-chunk-unused       126.05 MB       198.17 MB       24.78 MB
    js-gc-heap-unused-fraction    37.24%          75.84%          80.09%
    page-faults-hard              4               4               4
    page-faults-soft              2,895,448       3,158,265       3,199,487
    resident                      2,199.02 MB     1,288.83 MB     308.22 MB
    shmem-allocated               0.00 MB         0.00 MB         0.00 MB
    shmem-mapped                  0.00 MB         0.00 MB         0.00 MB
    vsize                         2,805.53 MB     2,399.51 MB     1,514.34 MB

The conclusion is that small chunks on their own, while benefiting the JS heap (the final JS heap size shrinks from 48MB to 25MB), in fact hurt the browser. jemalloc is forced to manage those small chunks within its own bigger 1MB chunks. As the JS engine (with or without the patch) allocates its chunks randomly, that leads to the same fragmentation that we currently observe with JS arenas inside JS chunks. To benefit memory usage with the patch, the jemalloc chunk size should be decreased to match the size of the JS chunks. Then everything is allocated using same-size mmap calls (plus a small number of bigger allocations), and the random nature of JS chunk allocation does not hurt at all. Then, as the numbers above indicate, the patch shrinks the committed heap size after closing 150 tabs from 1.4 GB down to 1 GB when JS and jemalloc both allocate using 128K chunks.
Comment 35•9 years ago
Thanks for the detailed measurements, that's interesting. My understanding is that 1MB is the minimum allocation size that jemalloc always returns to the OS immediately when free() is called. So with our current 1MB chunks, we're effectively bypassing jemalloc and handling all the allocations ourselves. But by reducing the chunk size to 64KB or 128KB, the responsibility for chunk management is divided between jemalloc and the JS engine.

My gut feeling is that sharing the responsibility is not a good idea. Either (a) the JS engine should have full control over management of its heap (as it does currently), or (b) jemalloc should have full control (as it would if we got rid of chunks altogether and just allocated arenas). The advantage of (a) is that we can customize the management exactly to the behaviour of the JS engine (e.g. we know about compartments). The advantage of (b) is that jemalloc is pretty sophisticated. The question is whether we can use our app-specific knowledge to do better than jemalloc without having to get as complex as jemalloc.

(This makes me wonder: why are we even using jemalloc to allocate our 1MB chunks currently? Why not just use mmap/VirtualAlloc directly? In fact, that's what happens on platforms like Mac that don't currently have jemalloc.)
Comment 36•9 years ago
I'm not sure heap-committed is the right number to look at here. If it actually represented committed memory, then in all three cases there would be about 600MB paged out to disk (heap-committed - resident). But surely you'd have noticed if 2/3 of Firefox was paged out.

In terms of RSS, 64KB chunks with no changes to jemalloc do better than the other two options after closing all tabs. (I don't mean to suggest that 64KB chunks are obviously better; they're much worse in terms of vsize, which is certainly meaningful.)

I agree with Nick that these results suggest that maybe we shouldn't be using jemalloc to manage the chunks. But I think we should also try to get a better handle on what these numbers actually mean.
Comment 37•9 years ago
(In reply to comment #35)
> (This makes me wonder: why are we even using jemalloc to allocate our 1MB
> chunks currently? Why not just use mmap/VirtualAlloc directly? In fact,
> that's what happens on platforms like Mac that don't currently have
> jemalloc.)

Presumably because mmap doesn't give us the alignment we need, so we can end up allocating double.
Comment 38•9 years ago (Assignee)
(In reply to comment #35)
> Thanks for the detailed measurements, that's interesting. My understanding
> is the 1MB is the minimum allocation size that jemalloc always returns to
> the OS immediately when free() is called.

To be precise, 1 MB is the default jemalloc chunk size. It can be changed at runtime via the MALLOC_OPTIONS environment variable (that is how the stats in the third table above were collected). For allocations that are chunk-sized or bigger, jemalloc calls mmap/munmap.

> My gut feeling is that sharing the responsibility is not a good idea.
> Either (a) the JS engine should have full control over management of its
> heap (as it does currently), or jemalloc should have full control (as it
> would if we got rid of chunks altogether and just allocated arenas).

The second option would still require some management on the JS side. Benchmarks clearly indicate that the engine must pool free arenas so it can get a new one very quickly. But such pooling defeats the anti-fragmentation heuristics built into jemalloc. So either we patch jemalloc so GC arenas can be allocated really quickly, or we try to match jemalloc's heuristics when allocating GC arenas from the pool.

> (This makes me wonder: why are we even using jemalloc to allocate our 1MB
> chunks currently? Why not just use mmap/VirtualAlloc directly? In fact,
> that's what happens on platforms like Mac that don't currently have
> jemalloc.)

If I remember correctly, that was done for unified accounting and better sharing of chunks, so an empty GC chunk can be quickly turned into a malloc one or vice versa.
Comment 39•9 years ago
> Presumably because mmap doesn't give us the alignment we need, so we can end up
> allocating double.
jemalloc actually does this itself, amusingly enough; see chunk_alloc_mmap.
It tries to allocate a chunk using mmap, but if the result is not properly aligned, it allocates something much bigger, saves the address, deallocates it, and then tries to allocate something of the right size at the right point in the middle of the old allocation. If that makes any sense.
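For readers following along, here is a generic sketch of the over-allocate-and-trim idea on POSIX systems. This is not jemalloc's or SpiderMonkey's actual code (comment 43 below notes that neither trims over-sized mappings this way); it only illustrates how a size-aligned block can be cut out of an unaligned mmap:

    #include <sys/mman.h>
    #include <cstdint>
    #include <cstddef>

    // Allocate `size` bytes aligned on `alignment` (a power of two) by mapping
    // extra space and unmapping the unaligned head and unused tail.
    static void *MapAlignedPages(size_t size, size_t alignment) {
        size_t mapSize = size + alignment;
        uint8_t *base = (uint8_t *) mmap(nullptr, mapSize, PROT_READ | PROT_WRITE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return nullptr;

        uintptr_t addr = (uintptr_t) base;
        uintptr_t aligned = (addr + alignment - 1) & ~(alignment - 1);

        if (aligned != addr)
            munmap(base, aligned - addr);                  // trim the unaligned head
        size_t tail = mapSize - (aligned - addr) - size;
        if (tail != 0)
            munmap((void *) (aligned + size), tail);       // trim the unused tail

        return (void *) aligned;
    }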
Comment 40•9 years ago (Assignee)
(In reply to comment #37)
> Presumably because mmap doesn't give us the alignment we need, so we can end
> up allocating double.

When jemalloc is not enabled, the GC uses the same strategy for allocating its aligned chunks as jemalloc would (I lifted the relevant parts of the jemalloc code into JS for that). So jemalloc+mmap does not provide any win over plain mmap.
Comment 41•9 years ago (Assignee)
(In reply to comment #39)
> but if it's not properly aligned,
> it allocates something much bigger, saves the address, deallocates it, then
> tries to allocate something of the right size at the right point in the
> middle of the old allocation. If that makes any sense.

This is what happens on Windows. On Linux/Mac the code takes advantage of the fact that one can call munmap on part of an allocated chunk, resulting in fewer allocation calls.
Comment 42•9 years ago
By "the code" you mean the JS engine's allocator, not jemalloc, right? I don't see what you describe in jemalloc's chunk_alloc_mmap.
Comment 43•9 years ago (Assignee)
(In reply to comment #42)
> By "the code" you mean the JS engine's allocator, not jemalloc, right? I
> don't see what you describe in jemalloc's chunk_alloc_mmap.

Sorry, I was wrong here: neither jemalloc nor the JS GC uses munmap to cut over-sized allocations. I forgot that that was not included in the implementation.
Comment 44•9 years ago
How important is it that chunks are 1MB-aligned?
Comment 45•9 years ago
(In reply to comment #34)
> The conclusion is that small chunks on their own, while benefiting the JS
> heap (the final JS heap size shrinks from 48MB to 25MB), in fact hurt the
> browser. jemalloc is forced to manage those small chunks within its own
> bigger 1MB chunks. As the JS engine (with or without the patch) allocates
> its chunks randomly, that leads to the same fragmentation that we currently
> observe with JS arenas inside JS chunks.

Thanks for the numbers! I had the same problems when I made per-compartment arenas. Back then the JS heap size increase for 50 tabs was 13%, but the curve was very flat for additional tabs. Maybe we could use a different chunk size for the chrome compartment to get the advantages there?
Comment 46•9 years ago (Assignee)
(In reply to comment #44)
> How important is it that chunks are 1MB-aligned?

The alignment of chunks on their size is important for fast access to the mark bits, which are stored separately from the arenas. If the bitmap were stored in the arena itself, the alignment would not be necessary, but that hurts GC marking and finalization due to an increased number of TLB misses.
Comment 47•9 years ago (Assignee)
(In reply to comment #46)
> The alignment of chunks on their size is important for fast access to the
> mark bits, which are stored separately from the arenas.

The alignment also simplifies the conservative GC. With it, it is very easy to check whether the chunk corresponding to a potential pointer is registered with the GC.
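Both properties boil down to a single mask operation; a hedged sketch (the helper and constant names below are illustrative, and IsRegisteredGCChunk stands in for the runtime's chunk hash-table lookup):

    #include <cstdint>

    struct Chunk;
    struct JSRuntime;

    static const uintptr_t GC_CHUNK_MASK = (uintptr_t(1) << 16) - 1;   // 64K-sized, 64K-aligned chunks

    // Assumed helper: a lookup in the runtime's hash table of live chunks.
    bool IsRegisteredGCChunk(JSRuntime *rt, Chunk *chunk);

    // Any address inside the heap maps to its chunk (and from there to the
    // mark bitmap stored in the chunk header) with a single mask.
    static inline Chunk *ChunkFromAddress(const void *p) {
        return reinterpret_cast<Chunk *>(uintptr_t(p) & ~GC_CHUNK_MASK);
    }

    // Conservative stack scanning: a word is only a candidate GC pointer if
    // the chunk computed above is registered with the GC.
    static inline bool MightPointIntoGCHeap(JSRuntime *rt, const void *p) {
        return IsRegisteredGCChunk(rt, ChunkFromAddress(p));
    }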
Comment 48•9 years ago (Assignee)
I realized that even with per-compartment GC chunks we still get extra fragmentation after closing a compartment. The problem is that our jemalloc chunks are still global, so after GC/CC we end up with a lot of holes in them. Have we considered per-compartment private heaps along the lines of HeapAlloc and friends on Windows (http://msdn.microsoft.com/en-us/library/aa366599%28v=vs.85%29.aspx)? It does not look too hard to modify jemalloc so that private instances of it can be constructed. Then, after destroying a compartment, we could surely release all of its memory back to the system, and at worst in a long-running session we would only have fragmentation of the virtual address space.
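For reference, the Win32 API linked above supports exactly that pattern; a minimal sketch of a per-compartment private heap (how jemalloc itself would be restructured to offer private instances is a separate question):

    #ifdef _WIN32
    #include <windows.h>
    #include <cstddef>

    // Destroying the heap returns every allocation made from it in one call,
    // which is the property wanted when a compartment goes away.
    struct CompartmentHeap {
        HANDLE heap;

        CompartmentHeap()  { heap = HeapCreate(0, 0, 0); }    // growable private heap
        ~CompartmentHeap() { if (heap) HeapDestroy(heap); }   // releases all blocks at once

        void *alloc(size_t bytes) { return HeapAlloc(heap, 0, bytes); }
        void release(void *p)     { HeapFree(heap, 0, p); }
    };
    #endif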
Comment 49•9 years ago (Assignee)
Consider yet another data set, without the patch, where the first column shows the initial memory configuration after browser startup, followed by the situation after opening a separate window with http://gregor-wagner.com/tmp/mem and its tabs, after closing the tabs, and after closing that window:

                                  initial       all tabs       no tabs        no window
    heap-allocated                38.28 MB      1,896.28 MB    847.76 MB      119.06 MB
    heap-committed                43.00 MB      2,002.00 MB    1,975.00 MB    1,521.00 MB
    heap-dirty                    1.41 MB       3.84 MB        2.46 MB        3.13 MB
    heap-unallocated              4.72 MB       105.72 MB      1,127.24 MB    1,401.94 MB
    js-compartments-system        2             2              2              2
    js-compartments-user          3             242            150            1
    js-gc-heap                    9.00 MB       478.00 MB      473.00 MB      49.00 MB
    js-gc-heap-chunk-unused       2.17 MB       27.39 MB       318.01 MB      38.96 MB
    js-gc-heap-unused-fraction    34.30%        24.60%         81.43%         85.52%
    page-faults-hard              48            67             67             67
    page-faults-soft              20,773        1,827,300      2,112,344      2,148,457
    resident                      74.46 MB      2,138.32 MB    1,380.27 MB    289.98 MB
    vsize                         476.63 MB     2,803.77 MB    2,675.67 MB    2,054.79 MB

In an ideal world the first column would match the fourth. But due to fragmentation the JS heap wastes 86% of its memory, while the heap as a whole (which includes the JS heap) wastes heap-unallocated/heap-committed, or 1402/1521, or 92%. This tells me that fragmentation in jemalloc chunks is noticeably worse than in the JS world, and perhaps this bug is the wrong target. Time to really consider per-compartment malloc heaps?
Comment 50•9 years ago
Again (see comment 36), I really don't think heap-committed means what we think it does here. It is very clearly *not* committed, non-shared, non-copy-on-write pages, because heap-committed can be many times larger than RSS. Until we actually understand what that number means, I don't think we should use it to make decisions here or elsewhere.
Comment 51•9 years ago
My preliminary thoughts about heap-committed (see also bug 675216): on Linux, jemalloc madvise(DONT_NEED)'s most kinds of blocks instead of explicitly decommitting them. Therefore the committed number has very little to do with how much memory the allocator is using. I think RSS is your best bet here.
Comment 52•9 years ago (Assignee)
(In reply to comment #50)
> Again (see comment 36), I really don't think heap-committed means what we
> think it does here. It is very clearly *not* committed, non-shared,
> non-copy-on-write pages, because heap-committed can be many times larger
> than RSS.

heap-committed is the number of pages touched by jemalloc and directly reflects the content of jemalloc_stats_t::committed. heap-allocated comes from jemalloc_stats_t::allocated and is the total size of all allocations currently in use by malloc/realloc callers. heap-allocated + heap-unallocated gives jemalloc_stats_t::mapped, that is, the total number of bytes in all mmapped chunks managed by jemalloc. So these numbers are useful for seeing how fragmented the jemalloc heap is. And for the above test, the GC allocator behaves noticeably better than jemalloc from the fragmentation point of view.
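As a worked instance of that relation, taking the "no window" column from comment 49 (purely arithmetic on the numbers already reported there):

    jemalloc_stats_t::mapped = heap-allocated + heap-unallocated
                             = 119.06 MB + 1,401.94 MB
                             = 1,521.00 MB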
Comment 53•9 years ago (Assignee)
(In reply to comment #51)
> My preliminary thoughts about heap-committed (see also bug 675216): On
> Linux, jemalloc madvise(DONT_NEED)'s most kinds of blocks, instead
> of explicitly decommitting them. So therefore the committed number has very
> little to do with how much memory the allocator is using.

Right, I forgot that decommit is not enabled on Linux. I will patch that and report the results again.
Comment 54•9 years ago
Why can't you just look at RSS? That's the only thing that matters.
Comment 55•9 years ago (Assignee)
(In reply to comment #51)
> My preliminary thoughts about heap-committed (see also bug 675216): On
> Linux, jemalloc madvise(DONT_NEED)'s most kinds of blocks, instead
> of explicitly decommitting them.

But jemalloc does explicitly release chunks that become free, so heap-allocated + heap-unallocated gives the number of mmapped bytes that jemalloc wants to keep, and those chunks cannot be used for GC chunks, right?
Comment 56•9 years ago
I'm not sure if jemalloc explicitly decommits or unmaps its arena allocations. (In jemalloc parlance, an arena is the big thing we map in from the OS.) It doesn't happen in arena_chunk_dealloc, and I don't think it happens in arena_purge. It might just be that the whole thing is madvise(DONT_NEED)'ed. Or maybe it gets freed somewhere else. But if I understand you correctly, yes, heap-allocated + heap-unallocated = heap-mapped, which is the amount of virtual address space that jemalloc is holding onto. This address space can't be used for allocating GC chunks (or anything else). If this is a problem, I think the solution is probably to make jemalloc unmap memory more aggressively, rather than to choose one GC chunk size or another. I kind of doubt it's a problem, though, because it seems unlikely that you'd use gigabytes of JS heap without using a similar amount of jemalloc heap.
Comment 57•9 years ago (Assignee)
(In reply to comment #56)
> In jemalloc parlance, an arena is the big thing we map in from the OS.

Hm, jemalloc.c starts with:

    /*
     * Size and alignment of memory chunks that are allocated by the OS's virtual
     * memory system.
     */
    #define CHUNK_2POW_DEFAULT 20

As I understand it, a jemalloc arena is a structure for managing a set of chunks from which it allocates its things. By default jemalloc uses 4 * number_of_cpus arenas.

(In reply to comment #54)
> Why can't you just look at RSS? That's the only thing that matters.

This bug started from bug 669245. The suggestion there is to change the way GC things are allocated from GC chunks to follow jemalloc's algorithm, in order to minimize the amount of unused space in GC chunks. But before doing that it would be interesting to know whether jemalloc in fact does better than the naive GC approach.

From the jemalloc source I see that at http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#3442 it calls arena_chunk_dealloc. That does not release the chunk immediately, but rather puts it into a single-element per-arena cache (arena_t::spare). For the previously cached empty chunk the code does call chunk_dealloc, which in turn calls munmap. That means that

    jemalloc_stats_t::mapped == bytes_in_jemalloc_chunks_with_at_least_one_allocation +
                                number_of_bytes_in_cached_empty_chunks

But number_of_bytes_in_cached_empty_chunks <= number_of_arenas * chunk_size == 16MB by default on a 4-core system. Thus jemalloc_stats_t::mapped is a good approximation of the jemalloc chunks with at least one allocation.

So jemalloc does not deal with fragmentation better than the GC. But the RSS stats indicate that the GC may benefit from madvise/decommit-like calls.
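Spelling out that bound with the defaults quoted above (4 arenas per CPU, the 1MB default chunk size, a 4-core machine):

    number_of_bytes_in_cached_empty_chunks <= number_of_arenas * chunk_size
                                            = (4 arenas/CPU * 4 CPUs) * 1 MB
                                            = 16 MB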
Comment 58•9 years ago
(In reply to comment #57)

Thanks for clarifying, Igor. I misunderstood the purpose of your experiments -- I thought the question was "does switching to smaller GC chunks help?", but it was actually the much more nuanced "does switching to smaller GC chunks managed by jemalloc help more than applying jemalloc's heuristics to our own GC chunks would?". I agree with you much more now that I understand what you're trying to determine. :)

> From the jemalloc source I see that at
> http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/jemalloc.c#3442
> it calls arena_chunk_dealloc. That does not release the chunk immediately,
> but rather puts it into a single-element per-arena cache (arena_t::spare).
> For the previously cached empty chunk the code does call chunk_dealloc,
> which in turn calls munmap.
>
> That means that jemalloc_stats_t::mapped ==
> bytes_in_jemalloc_chunks_with_at_least_one_allocation +
> number_of_bytes_in_cached_empty_chunks
>
> But number_of_bytes_in_cached_empty_chunks <= number_of_arenas * chunk_size
> == 16MB by default on a 4-core system. Thus jemalloc_stats_t::mapped is a
> good approximation of the jemalloc chunks with at least one allocation.

This is true if jemalloc never keeps more than one empty chunk around. It's true as far as I can tell from my reading, but I don't know if it's actually the case.

> So jemalloc does not deal with fragmentation better than the GC. But the
> RSS stats indicate that the GC may benefit from madvise/decommit-like calls.

I agree with the second part -- madvise/decommit may help, and if we have that, it doesn't appear so necessary to use jemalloc to manage smaller JS GC chunks. However, I don't think we can yet conclude that changing the chunk-choice heuristic to be more similar to jemalloc's (bug 669245) would not also help.

First, let's be clear about exactly what kind of fragmentation we're talking about here. There are in fact two kinds. One is fragmentation within mmap'ed chunks. The other is fragmentation within pages. We've only measured the first one -- how many chunks are there which have at least one active allocation? But as we've seen, this isn't necessarily a big deal; you can take care of it with madvise/decommit. I don't know, but it could be that jemalloc doesn't make much of an attempt to avoid this kind of fragmentation, since it doesn't impact RSS.

It's the second kind which can be really bad, because you can't decommit half a page. I don't know that we have a good way of measuring the intra-page fragmentation in jemalloc, but AIUI we measure it in the JS engine as js-arena-unused. Choosing chunks more cleverly might help us pack JS arenas (pages) more tightly, thus reducing the second kind of fragmentation. This would be a win even with madvise/decommit.
Comment 59•9 years ago (Assignee)
(In reply to comment #58)
> it was actually the much more nuanced "does switching to smaller GC chunks
> managed by jemalloc help more than applying jemalloc's heuristics to our own
> GC chunks would?".

Yes, this is exactly what I wanted to find out. And the experiments so far indicate that relying on jemalloc for GC chunks does not help. So now let's see whether smaller GC chunks allocated using straight mmap calls would help.

No patches (the same data and test setup as in comment 49):

                                  initial       all tabs       no tabs        no window
    heap-allocated                38.28 MB      1,896.28 MB    847.76 MB      119.06 MB
    heap-committed                43.00 MB      2,002.00 MB    1,975.00 MB    1,521.00 MB
    heap-dirty                    1.41 MB       3.84 MB        2.46 MB        3.13 MB
    heap-unallocated              4.72 MB       105.72 MB      1,127.24 MB    1,401.94 MB
    js-compartments-system        2             2              2              2
    js-compartments-user          3             242            150            1
    js-gc-heap                    9.00 MB       478.00 MB      473.00 MB      49.00 MB
    js-gc-heap-arena-unused       0.91 MB       90.22 MB       67.18 MB       2.95 MB
    js-gc-heap-chunk-empty        1.00 MB       0.00 MB        0.00 MB        0.00 MB
    js-gc-heap-chunk-unused       2.17 MB       27.39 MB       318.01 MB      38.96 MB
    js-gc-heap-unused-fraction    34.30%        24.60%         81.43%         85.52%
    page-faults-hard              48            67             67             67
    page-faults-soft              20,773        1,827,300      2,112,344      2,148,457
    resident                      74.46 MB      2,138.32 MB    1,380.27 MB    289.98 MB
    vsize                         476.63 MB     2,803.77 MB    2,675.67 MB    2,054.79 MB

64K GC chunks using straight mmap (I used the patch here, plus I commented out setCustomGCChunkAllocator in XPCJSRuntime::XPCJSRuntime). Note that now heap-(allocated|committed|dirty) does not include the GC heap stats, and one has to add the GC chunk numbers to get totals for the heap:

                                  initial       all tabs       no tabs        no window
    heap-allocated                28.98 MB      1,398.42 MB    362.80 MB      60.11 MB
    heap-committed                35.00 MB      1,454.00 MB    1,449.00 MB    1,432.00 MB
    heap-dirty                    1.97 MB       2.85 MB        1.96 MB        3.35 MB
    heap-unallocated              6.02 MB       55.58 MB       1,086.20 MB    1,371.89 MB
    js-compartments-system        2             2              2              2
    js-compartments-user          1             244            149            1
    js-gc-heap                    7.44 MB       555.31 MB      301.19 MB      24.75 MB
    js-gc-heap-arena-unused       0.90 MB       81.52 MB       65.87 MB       2.66 MB
    js-gc-heap-chunk-empty        0.00 MB       2.06 MB        1.19 MB        0.00 MB
    js-gc-heap-chunk-unused       1.02 MB       105.80 MB      151.02 MB      15.60 MB
    js-gc-heap-unused-fraction    25.83%        34.10%         72.40%         73.76%
    page-faults-hard              1             73             73             73
    page-faults-soft              21,443        1,717,754      2,104,741      2,169,409
    resident                      72.92 MB      2,187.43 MB    1,192.75 MB    258.55 MB
    vsize                         478.41 MB     2,801.93 MB    2,355.81 MB    1,971.01 MB

Here the last two columns show a clear win according to RSS, total heap size and vsize. Moreover, there is no regression in the number of page faults. So it looks like smaller, per-compartment GC chunks managed outside jemalloc are a good way to minimize memory usage.

> I agree with the second part -- madvise/decommit may help, and if we have
> that, it doesn't appear so necessary to use jemalloc to manage smaller js gc
> chunks.

madvise/decommit has been tried, but the initial implementation showed a performance regression. So it looks like smaller chunks provide a better win from a memory point of view without much performance regression (see comment 7).
Comment 60•9 years ago
(In reply to comment #59)
> Here the last two columns show a clear win according to RSS, total heap size
> and vsize.

The first column also shows a win. The second column has worse numbers for js-gc-heap: 478MB vs 555MB, which is entirely because js-gc-heap-chunk-unused rose from 27MB to 106MB. And resident increased by almost 50MB. I guess this is because the minimum heap size per compartment is now 64KB (as opposed to 4KB previously), which results in increases in the peak GC heap size when many new compartments are created -- the "open many tabs in a row" case is probably the worst case for this new strategy.

I definitely think it's worth considering allowing a worse peak size if it results in less fragmentation over time. But I also want to be cautious, because we're moving towards one compartment per global. An interesting thing would be to try this experiment with the Bugzilla Tweaks add-on installed. It adds *lots* of extra compartments (bug 672443).

> So it looks like that smaller, per-compartment GC chunks managed outside
> jemalloc is a good way to minimize the memory usage.

From a fragmentation point of view, definitely. I was afraid that this bug was heading in a "lots of discussion and experiments but nothing ever lands" direction, but this new approach is quite promising! :)
Updated•9 years ago
Blocks: MatchStartupMem
Comment 61•9 years ago
Hmm, compartment-per-global may have problems with this. On startup I found ~150 globals, and it's not uncommon for a single tab to have 5, 10, or 20 globals (due to iframes). So I assume 320K to 1.2MB per tab is unacceptable, yes? Yes.

One idea is, once compartments are per-global, to have a per-domain "thing" (with bug 650411, the word "zone" is up for grabs again ;-) that owns the chunks. I don't think it would be hard, from xpconnect's perspective, to get this working; what about from a GC-internal perspective?

To be clear: I don't want to impede this bug -- it looks pretty righteous -- I just want to get advice on how to proceed with compartment-per-global.
Comment 62•9 years ago (Assignee)
(In reply to Luke Wagner [:luke] from comment #61)
> Hmm, compartment-per-global may have problems with this. On startup I found
> ~150 globals and it's not uncommon for a single tab to have 5, 10, or 20
> globals (due to iframes). So I assume 320K to 1.2MB per tab is
> unacceptable, yes? Yes.

Gregor has a nice test case, http://gregor-wagner.com/tmp/mem, that opens 150+ tabs. Just run it and then check about:memory. For me it shows a JS heap size of 460 MB, so we already have 3 MB per tab. Also, a page that contains just <script>alert(1)</script> shows a heap size of 40-70K on 32/64-bit systems. So even with a compartment per global the patch would not make things worse. Clearly these data indicate that we have a lot of bloat in JS that we should address. On the other hand, even if we optimize the memory usage, hopefully soon we will be able to allocate variable-length GC things, so that string data, slots, scripts etc. also end up on the JS heap; then 64K chunks still would not be that bad.

> One idea is that, once compartments are per-global, to have a per-domain
> "thing" (with bug 650411, the word "zone" is up for grabs again ;-) that
> owns the chunks. I don't think it would be hard, from xpconnect's
> perspective, to get this working; what about from a GC-internal perspective?

The main reason behind per-compartment chunks is to avoid long-term fragmentation where we end up with a lot of chunks that have only a few GC things allocated. Compartment-private chunks group things naturally, so when we close a tab the memory is released and can be used for other things besides JS. I guess zones could be used for that purpose as well with compartment-per-global.
Comment 63•9 years ago
(In reply to Igor Bukanov from comment #62) I'm going to wait to see what njn says regarding your first comment ;-) > On the other hand, even if we optimize the memory usage, hopefully > soon we could allocate variable-length GC things so string data, slots, > scripts etc. would also end up on the JS heap, then 64K chunks still would > not be that bad. Hmm, interesting point; once c-p-g doesn't compartment-assert on startup, I'll be sure to measure that.
Comment 64•9 years ago
This one's been quiet for a while. Do we have a decision about what to do here, or at least a summary of the current state?
Comment 65•9 years ago (Assignee)
(In reply to David Mandelin from comment #64) > This one's been quiet for a while. Do we have a decision about what to do > here, or at least a summary of the current state? To minimize performance regressions I need bug 681884. Also some adjustments and new measurements are necessary in view on the type inference landing and the bug 674251.
Comment 66•9 years ago (Assignee)
Here is an updated patch that applies to the MC tip. It uses 2K arenas so that a page with just <script>a</script> uses only one 64K chunk.
Attachment #548436 - Attachment is obsolete: true
Comment 67•9 years ago (Assignee)
Here is a short version of the results from bug 669245 comment 44, for convenience. about:memory measurement points on 32-bit Linux:

1. After starting the browser, opening about:memory and gmail in another tab (gmail1).
2. After closing gmail, opening all windows from gregor-wagner.com/tmp/mem and opening a new tab with a new gmail instance (all+gmail2).
3. After closing all windows except about:memory and gmail (gmail2).
4. After closing gmail and opening gmail again (gmail3).

MC tip from 2011-09-07:

                                  gmail       all+gmail2    gmail2        gmail3
    heap-allocated                39.80 MB    792.34 MB     65.34 MB      64.58 MB
    heap-committed                78.00 MB    878.00 MB     873.00 MB     871.00 MB
    heap-dirty                    3.80 MB     2.49 MB       3.71 MB       3.81 MB
    heap-unallocated              38.20 MB    85.66 MB      807.66 MB     806.42 MB
    js-compartments-system        2           2             2             2
    js-compartments-user          2           244           2             2
    js-gc-heap                    17.00 MB    376.00 MB     60.00 MB      52.00 MB
    js-gc-heap-arena-unused       1.85 MB     73.19 MB      5.92 MB       3.81 MB
    js-gc-heap-chunk-clean-unuse  0.00 MB     2.00 MB       0.00 MB       0.00 MB
    js-gc-heap-chunk-dirty-unuse  1.30 MB     4.42 MB       39.14 MB      34.04 MB
    js-gc-heap-unused-fraction    18.58%      21.17%        75.09%        72.78%
    page-faults-hard              2           3             3             3
    page-faults-soft              48,837      1,558,121     1,739,117     1,769,136
    resident                      102.47 MB   1,337.43 MB   318.79 MB     266.40 MB
    vsize                         338.70 MB   1,642.93 MB   1,220.86 MB   1,195.21 MB

64K per-compartment chunks:

                                  gmail       all+gmail2    gmail2        gmail3
    heap-allocated                42.66 MB    809.43 MB     66.18 MB      66.37 MB
    heap-committed                89.00 MB    884.00 MB     874.00 MB     872.00 MB
    heap-dirty                    3.27 MB     3.46 MB       3.16 MB       3.91 MB
    heap-unallocated              46.33 MB    74.57 MB      807.81 MB     805.63 MB
    js-compartments-system        2           2             2             2
    js-compartments-user          2           246           2             2
    js-gc-heap                    18.31 MB    439.25 MB     32.44 MB      31.81 MB
    js-gc-heap-arena-unused       2.08 MB     55.60 MB      4.57 MB       2.89 MB
    js-gc-heap-chunk-clean-unuse  0.00 MB     0.00 MB       0.00 MB       0.00 MB
    js-gc-heap-chunk-dirty-unuse  1.62 MB     71.40 MB      12.32 MB      13.60 MB
    js-gc-heap-unused-fraction    20.17%      28.91%        52.08%        51.85%
    page-faults-hard              86          197           199           199
    page-faults-soft              47,380      1,602,425     1,773,069     1,808,302
    resident                      97.02 MB    1,417.81 MB   292.57 MB     249.53 MB
    vsize                         366.77 MB   1,773.68 MB   1,256.30 MB   1,187.14 MB

As the patch only affects JS allocation, and all memory in the JS heap is committed, the variation in the heap size gives a useful reference for the noise in the data. With this data I see that when one just opens new tabs, the patch increases JS memory usage by 439.25/376.00, or 17%, due to chunk under-utilization. But when we start to close tabs, the patch behaves much better at reclaiming memory. Bug 669245 comment 44 also shows how various allocation strategies influence the data.

The question is: is this a reasonable tradeoff?
Comment 68•9 years ago
|
||
> The question is: is this a reasonable tradeoff?
Not to make you do more tests, but it seems to me that the increase in RSS with 64KB per-compartment chunks is due to "per-compartment", not "64KB". Have we considered 64KB multi-compartment chunks?
Comment 69•9 years ago
|
||
Having chunks not be attached to compartments is certainly attractive from the perspective of comment 61.
| Assignee | ||
Comment 70•9 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #69)
> Having chunks not be attached to compartments is certainly attractive from
> the perspective of comment 61.

After type info and scripts became GC things, a page with just var a=1 consumes over 32K of allocated things on 32 bit with 2K arenas, plus an extra 19K coming from under-utilized arenas (not empty arenas in chunks), giving 51K in total. And with strings and other data allocated in the JS heap, 64K per-compartment chunks would not lose any memory. Now, per-compartment chunks do harm if we switch to 1K arenas, but that loses over 10% in V8...
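Put differently, using the numbers above: a trivial page already accounts for roughly 32K + 19K ≈ 51K of a 64K per-compartment chunk, i.e. about 80% utilization before any real page content is allocated.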
Comment 71•9 years ago
|
||
Boy, that's a lot of GC-things. It makes me think that we will be doing work to cut it down. But, of course, iframes tend to contain more than var a=1, so perhaps it'll even out in real measurements. We'll see, but my worries are quieted for the moment, thanks!
Comment 72•9 years ago
|
||
> With this data I see that when we just open new tabs, the patch increases the
> JS memory usage by 439.25/376.00 or 17% due to chunk under-utilization. But
> when we start to close the tabs the patch behaves much better at reclaiming
> memory. Bug 669245 comment 44 also shows how various allocation strategies
> influence the data.
>
> The question is: is this a reasonable tradeoff?
My gut feeling is that long-term fragmentation is a bigger problem than short-term peak usage, but I could be wrong. It'd be really nice if this change could be combined with some other change(s) (smaller objects, better GC, something) that kept the peak size the same while reducing the fragmentation.
| Assignee | ||
Comment 73•9 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #68)
> Not to make you do more tests, but it seems to me that the increase in RSS
> with 64KB per-compartment chunks is due to "per-compartment", not "64KB". Have
> we considered 64KB multi-compartment chunks?

I considered that initially, but its win was much smaller than with per-compartment chunks, plus it also showed a bigger usage with all 150+ tabs opened. However, this increased usage was related to the layout inefficiency. That can be fixed, see the last patch from bug 600234. With that patch on 32 bit the best utilization of chunks is achieved either with 128K chunks and default 4K arenas or with 64K chunks and 2K arenas. On 64 bit it requires some extra packing in data structures, but that is straightforward to fix. So here are the results with the setup from comment 67:

MC tip 2011-09-09 164976bffd31

                               gmail      all+gmail2    gmail2       gmail3
heap-allocated                 40.94 MB   793.89 MB     67.67 MB     66.28 MB
heap-dirty                     3.22 MB    3.03 MB       2.95 MB      3.33 MB
heap-unallocated               28.06 MB   82.10 MB      800.32 MB    798.72 MB
js-compartments-system         2          2             2            2
js-compartments-user           2          239           2            2
js-gc-heap                     18.00 MB   370.00 MB     63.00 MB     41.00 MB
js-gc-heap-arena-unused        2.34 MB    68.65 MB      7.40 MB      2.72 MB
js-gc-heap-chunk-clean-unused  0.00 MB    3.00 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused  1.75 MB    6.94 MB       40.25 MB     23.51 MB
js-gc-heap-unused-fraction     22.71%     21.24%        75.63%       63.97%
page-faults-hard               2          96            96           96
page-faults-soft               46,997     2,127,284     2,275,358    2,308,339
resident                       111.28 MB  1,343.04 MB   347.37 MB    262.89 MB
vsize                          322.27 MB  1,655.34 MB   1,239.41 MB  1,206.09 MB

128K chunks

                               gmail      all+gmail2    gmail2       gmail3
heap-allocated                 40.96 MB   808.30 MB     68.52 MB     65.63 MB
heap-dirty                     3.67 MB    3.07 MB       2.86 MB      3.65 MB
heap-unallocated               27.03 MB   81.70 MB      813.48 MB    815.36 MB
js-compartments-system         2          2             2            2
js-compartments-user           2          244           2            2
js-gc-heap                     16.88 MB   368.63 MB     38.38 MB     33.50 MB
js-gc-heap-arena-unused        2.26 MB    62.70 MB      6.11 MB      3.94 MB
js-gc-heap-chunk-clean-unused  0.00 MB    2.75 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused  0.50 MB    2.45 MB       17.02 MB     15.30 MB
js-gc-heap-unused-fraction     16.38%     18.41%        60.25%       57.41%
page-faults-hard               2          12            12           12
page-faults-soft               47,151     1,927,948     2,095,422    2,124,649
resident                       107.02 MB  1,352.80 MB   316.08 MB    256.57 MB
vsize                          320.38 MB  1,692.59 MB   1,232.55 MB  1,218.67 MB

64K chunks + 2K arenas

                               gmail      all+gmail2    gmail2       gmail3
heap-allocated                 40.48 MB   789.26 MB     66.34 MB     66.28 MB
heap-dirty                     3.86 MB    2.88 MB       3.41 MB      3.61 MB
heap-unallocated               38.51 MB   97.73 MB      810.66 MB    808.72 MB
js-compartments-system         2          2             2            2
js-compartments-user           2          240           2            2
js-gc-heap                     16.38 MB   368.75 MB     44.44 MB     29.19 MB
js-gc-heap-arena-unused        1.50 MB    61.14 MB      4.16 MB      1.86 MB
js-gc-heap-chunk-clean-unused  0.38 MB    3.38 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused  0.28 MB    2.24 MB       24.63 MB     12.17 MB
js-gc-heap-unused-fraction     13.11%     18.10%        64.79%       48.06%
page-faults-hard               2          21            21           21
page-faults-soft               46,419     1,905,134     2,072,656    2,104,159
resident                       103.18 MB  1,332.70 MB   332.43 MB    272.40 MB
vsize                          338.91 MB  1,670.26 MB   1,221.29 MB  1,195.61 MB

As the changes only affect the JS heap, the variation in heap-allocated gives a useful reference for the test noise.

So it looks like the case of 128K chunks is the winner. It does not regress when opening new tabs and shows a nice win after closing them. Now I will test how it affects the performance.
| Assignee | ||
Comment 74•9 years ago
|
||
I dropped the per-compartment part from the title. As the previous comment shows, we can get similar wins with smaller compartment-shared chunks without a regression.
Summary: Investigate small per-compartment GC chunks → Investigate small GC chunks
| Assignee | ||
Comment 75•9 years ago
|
||
With 64K/128K chunks I observed a V8 benchmark regression of over 2%. However, with 256K chunks and with bug 686017 and bug 686144 addressed the regression is less than 0.4%, with no regression in SunSpider or GC pause times. Yet those chunks still provide the benefit of lessened fragmentation. For reference I also include the stats showing the effect of shrinking jemalloc chunks to 256K (via setting the environment variable MALLOC_OPTIONS to kk).

The stats with the setup from comment 67:

MC tip 2011-09-09 164976bffd31

1MB JS and jemalloc chunks

                               gmail      all+gmail2    gmail2       gmail3
heap-allocated                 40.29 MB   804.34 MB     66.94 MB     66.32 MB
heap-dirty                     3.34 MB    1.63 MB       3.78 MB      3.46 MB
heap-unallocated               41.70 MB   85.65 MB      818.06 MB    817.68 MB
js-compartments-system         2          2             2            2
js-compartments-user           2          253           2            2
js-gc-heap                     18.00 MB   387.00 MB     49.00 MB     46.00 MB
js-gc-heap-arena-unused        2.74 MB    80.42 MB      6.07 MB      3.47 MB
js-gc-heap-chunk-clean-unused  0.00 MB    0.00 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused  1.49 MB    7.00 MB       28.04 MB     27.73 MB
js-gc-heap-unused-fraction     23.47%     22.58%        69.62%       67.81%
page-faults-hard               2          2             2            2
page-faults-soft               45,297     1,811,178     1,948,807    1,979,877
resident                       110.52 MB  1,368.55 MB   328.41 MB    294.05 MB
vsize                          343.57 MB  1,684.74 MB   1,238.32 MB  1,226.08 MB

256KB JS and 1MB jemalloc chunks

                               gmail      all+gmail2    gmail2       gmail3
heap-allocated                 43.86 MB   803.62 MB     67.34 MB     68.33 MB
heap-dirty                     2.58 MB    2.97 MB       2.70 MB      3.65 MB
heap-unallocated               41.13 MB   87.38 MB      818.66 MB    816.66 MB
js-compartments-system         2          2             2            2
js-compartments-user           2          249           2            2
js-gc-heap                     17.75 MB   380.00 MB     43.25 MB     38.25 MB
js-gc-heap-arena-unused        2.33 MB    74.74 MB      6.63 MB      3.25 MB
js-gc-heap-chunk-clean-unused  0.00 MB    3.25 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused  1.31 MB    4.29 MB       21.52 MB     20.03 MB
js-gc-heap-unused-fraction     20.51%     21.65%        65.09%       60.87%
page-faults-hard               89         296           296          296
page-faults-soft               56,141     1,811,176     1,992,485    2,022,654
resident                       101.70 MB  1,354.27 MB   335.80 MB    313.31 MB
vsize                          362.25 MB  1,737.36 MB   1,295.41 MB  1,231.13 MB

1MB JS and 256K jemalloc chunks

                               gmail      all+gmail2    gmail2       gmail3
heap-allocated                 43.97 MB   819.78 MB     66.69 MB     66.88 MB
heap-dirty                     2.77 MB    3.79 MB       2.92 MB      2.09 MB
heap-unallocated               32.03 MB   94.47 MB      785.05 MB    759.36 MB
js-compartments-system         2          2             2            2
js-compartments-user           2          251           2            2
js-gc-heap                     18.00 MB   383.00 MB     53.00 MB     47.00 MB
js-gc-heap-arena-unused        1.95 MB    75.96 MB      5.86 MB      3.48 MB
js-gc-heap-chunk-clean-unused  0.00 MB    0.00 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused  2.17 MB    6.80 MB       31.90 MB     28.42 MB
js-gc-heap-unused-fraction     22.87%     21.60%        71.24%       67.86%
page-faults-hard               2          4             4            4
page-faults-soft               54,432     2,030,972     2,155,899    2,187,639
resident                       99.73 MB   1,375.48 MB   315.77 MB    257.83 MB
vsize                          344.99 MB  1,714.50 MB   1,221.54 MB  1,156.74 MB

So to proceed further I will ask for a review of a patch that just changes the JS chunk size to 256K after fixing bug 686017 and bug 686144. Even smaller JS chunks or arenas or compartment-owned chunks are for another bug.
| Assignee | ||
Comment 76•9 years ago
|
||
A note about the RSS data in these tests: it is pretty much useless, as it varies between runs by up to 40 MB depending on how many times I press the Minimize Memory Usage button. In particular, sometimes a single click on Minimize Memory Usage increases that number by 20 MB.
| Assignee | ||
Comment 77•9 years ago
|
||
The patch shrinks GC chunks to 256K. Even smaller chunks or per-compartment chunks are for another bug.
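For illustration, the core of such a patch is essentially a one-constant change; a minimal sketch, assuming the chunk size is derived from a single shift constant (the identifier names below are illustrative, not necessarily those used in jsgc.h):

/* Illustrative sketch only; the actual constant names in jsgc.h may differ. */
#include <stddef.h>

static const size_t GC_CHUNK_SHIFT = 18;                          /* was 20, i.e. 1 MB chunks */
static const size_t GC_CHUNK_SIZE  = size_t(1) << GC_CHUNK_SHIFT; /* now 256 KB */
static const size_t GC_CHUNK_MASK  = GC_CHUNK_SIZE - 1;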
Attachment #559244 -
Attachment is obsolete: true
Attachment #561283 -
Flags: review?(anygregor)
Comment 78•9 years ago
|
||
The Mac allocator doesn't like smaller chunks and this patch causes a big regression on Splay:

trunk:
pv135218:v8 idefix2$ ../OPT.OBJ/js -m -n run.js
Richards: 7625
DeltaBlue: 8827
Crypto: 12366
RayTrace: 3635
EarleyBoyer: 6939
RegExp: 1723
Splay: 8296
----
Score (version 6): 6060

with this patch:
pv135218:v8 idefix2$ ../OPT.OBJ/js -m -n run.js
Richards: 7682
DeltaBlue: 8853
Crypto: 12382
RayTrace: 3642
EarleyBoyer: 7043
RegExp: 1741
Splay: 6462
----
Score (version 6): 5880
| Assignee | ||
Comment 79•9 years ago
|
||
(In reply to Gregor Wagner from comment #78)
> The Mac allocator doesn't like smaller chunks and this patch causes a big
> regression on Splay:

If you change MAX_EMPTY_CHUNK_AGE from 4 to 5, would it fix the regression on Mac?
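For context, MAX_EMPTY_CHUNK_AGE presumably controls how many GCs an empty chunk stays cached before being released back to the OS. A rough, hypothetical sketch of that aging logic (not the actual jsgc.cpp code; ReleaseChunkToOS is a made-up helper):

// Rough, hypothetical sketch of the empty-chunk aging that MAX_EMPTY_CHUNK_AGE
// presumably controls; the real logic in jsgc.cpp differs.
#include <cstddef>

static const size_t MAX_EMPTY_CHUNK_AGE = 5;   // the proposed value (was 4)

struct EmptyChunk {
    size_t age;          // GC cycles this chunk has sat empty
    EmptyChunk *next;
};

void ReleaseChunkToOS(EmptyChunk *chunk);      // hypothetical: unmap the chunk

// Called once per GC: age the cached empty chunks and release the stale ones.
// A larger age limit keeps empty chunks cached longer, so allocation-heavy
// workloads such as Splay can reuse them instead of hitting the OS allocator.
EmptyChunk *ExpireEmptyChunks(EmptyChunk *pool) {
    EmptyChunk **link = &pool;
    while (EmptyChunk *chunk = *link) {
        if (++chunk->age > MAX_EMPTY_CHUNK_AGE) {
            *link = chunk->next;
            ReleaseChunkToOS(chunk);
        } else {
            link = &chunk->next;
        }
    }
    return pool;
}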
Comment 80•9 years ago
|
||
(In reply to Igor Bukanov from comment #79)
> (In reply to Gregor Wagner from comment #78)
> > The Mac allocator doesn't like smaller chunks and this patch causes a big
> > regression on Splay:
>
> If you change MAX_EMPTY_CHUNK_AGE from 4 to 5, would it fix the regression
> on Mac?

That has no influence. Splay uses way more memory than all the other benchmarks. I am wondering how much speedup we could get on Mac by putting the allocation on the helper thread, similar to the de-allocation of the chunks.
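A rough sketch of that idea: a helper thread keeps a small pool of pre-allocated chunks topped up, so the main thread normally just pops one. Everything below is hypothetical, not existing SpiderMonkey code, and malloc stands in for VirtualAlloc/mmap:

// Hypothetical sketch of background chunk allocation; not SpiderMonkey API.
#include <condition_variable>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

class BackgroundChunkPool {
    static const size_t TARGET = 4;              // chunks to keep pre-allocated
    static const size_t CHUNK_SIZE = 256 * 1024;

    std::mutex lock;
    std::condition_variable needMore;
    std::vector<void *> chunks;
    bool shuttingDown = false;
    std::thread filler{[this] { fillLoop(); }};

    void fillLoop() {
        std::unique_lock<std::mutex> guard(lock);
        for (;;) {
            needMore.wait(guard, [this] { return shuttingDown || chunks.size() < TARGET; });
            if (shuttingDown)
                return;
            while (chunks.size() < TARGET) {
                guard.unlock();                  // allocate outside the lock
                void *chunk = std::malloc(CHUNK_SIZE);
                guard.lock();
                chunks.push_back(chunk);
            }
        }
    }

  public:
    ~BackgroundChunkPool() {
        {
            std::lock_guard<std::mutex> guard(lock);
            shuttingDown = true;
        }
        needMore.notify_one();
        filler.join();
        for (void *chunk : chunks)
            std::free(chunk);
    }

    // Fast path for the GC: usually just pops a pre-allocated chunk.
    void *getChunk() {
        std::lock_guard<std::mutex> guard(lock);
        if (chunks.empty())
            return std::malloc(CHUNK_SIZE);      // slow path on the requesting thread
        void *chunk = chunks.back();
        chunks.pop_back();
        needMore.notify_one();                   // ask the helper to refill
        return chunk;
    }
};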
| Assignee | ||
Comment 81•9 years ago
|
||
(In reply to Gregor Wagner from comment #78)
> The Mac allocator doesn't like smaller chunks and this patch causes a big
> regression on Splay:
>
> Splay: 8296
> ----
> Splay: 6462

Gregor, on which OS X version have you seen the regression? On a Mac Mini with OS X 10.7 I do not see it. For example, without the patch my scores for Splay during 4 consecutive runs of http://v8.googlecode.com/svn/data/benchmarks/v6/run.html in the browser are:

8615 8695 8703 8646

With 256K chunks that becomes:

8623 8671 8703 8752

The same holds for the other benchmarks - all the differences are within noise.
Comment 82•9 years ago
|
||
This has stalled and I'm tempted to mark it WONTFIX for two reasons:

- The results in comment 75 aren't very convincing. js-gc-heap is better with 256KB JS chunks, but resident has mixed results, vsize is worse, page-faults-hard is much worse.

- More generally, the real solution to our JS heap fragmentation problems is to introduce a compacting collector. I'd rather we work on that than muck about with constants to eke out 1% wins.

I'm downgrading this to MemShrink:P2.
Whiteboard: [MemShrink:P1] → [MemShrink:P2]
| Assignee | ||
Comment 83•9 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #82)
> This has stalled and I'm tempted to mark it WONTFIX for two reasons:

I agree with WONTFIX. With bug 670596 fixed we discard the memory used for individual GC arenas. That makes the chunk size no longer relevant for memory performance.
| Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Updated•9 years ago
|
Attachment #561283 -
Flags: review?(anygregor)