541140 - TM: don't return GCChunks immediately

Assignee

Description

16 years ago

The measurements in bug 539532 show that we spend a lot of time returning GC pages to the OS during each GC. These pages have to be allocated again. The idea is to keep the pages around. This will reduce the GC pause time and object allocation time in general.

Gregor Wagner [:gwagner]

Assignee

16 years ago

Attached patch: patch (obsolete)

Is there a better way to get the chunk address from the chunkinfo? The code in DestroyEmptyGCChunks is a bit complicated.

Attachment #423624 - Attachment is obsolete: true

Attachment #423842 - Flags: review?(igor)

Igor Bukanov

Comment 3

16 years ago

(In reply to comment #2)
> Created an attachment (id=423842) [details]
> patch

Hm, I thought the idea was to offload chunk destruction to another thread and see how that would affect performance on different platforms. For example, OS X is known for a slow VM; on a faster VM, the overhead of keeping up to 50 MB of unused memory may slow things down.

Also, if munmap is bad even when it is done on another thread, an alternative is to use posix_memalign(big_chunk). That at least would return unused memory to the malloc heap.

> Is there a better way to get the chunk address from the chunkinfo? The code in
> DestroyEmptyGCChunks is a little bit complicated.

Unused chunks can be linked through a link in the first chunk word. They do not need chunkinfo.
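The idea of linking unused chunks through their first word could look roughly like the sketch below. It is only an illustration, not code from any patch in this bug: EmptyChunkList, push and pop are hypothetical names, and the caller is assumed to hand in chunks that are at least pointer-sized and pointer-aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical intrusive list of unused chunks. The "next" pointer lives in
// the first word of each empty chunk, so no separate chunkinfo is needed.
class EmptyChunkList {
    void *head_ = nullptr;
    std::size_t count_ = 0;

  public:
    // Link an empty chunk in. Its old contents are dead, so the first word
    // can be overwritten with the link.
    void push(void *chunk) {
        assert(chunk != nullptr);
        *static_cast<void **>(chunk) = head_;
        head_ = chunk;
        ++count_;
    }

    // Unlink and return the most recently pushed chunk, or nullptr if empty.
    void *pop() {
        if (!head_)
            return nullptr;
        void *chunk = head_;
        head_ = *static_cast<void **>(chunk);
        --count_;
        return chunk;
    }

    std::size_t size() const { return count_; }
};
```

The list hands chunks back in LIFO order, so the most recently emptied (and most likely still cached) chunk is reused first.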

Andreas Gal :gal

Comment 4

Comment 18

16 years ago

How about delaying the free? We could count how many GC cycles the chunk is empty and once it reaches a certain threshold we give it back to the OS.
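A minimal sketch of such a delayed free, assuming a per-chunk age counter that is bumped once per GC cycle. The names (AgingChunkPool, onGCEnd, maxAge) are made up for illustration, and std::free stands in for whatever actually returns the chunk to the OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical record for a chunk that currently holds no live GC things.
struct EmptyChunk {
    void *memory;
    unsigned age;  // number of GC cycles this chunk has stayed empty
};

class AgingChunkPool {
    std::vector<EmptyChunk> empty_;
    unsigned maxAge_;

  public:
    explicit AgingChunkPool(unsigned maxAge) : maxAge_(maxAge) {}
    ~AgingChunkPool() {
        for (EmptyChunk &c : empty_)
            std::free(c.memory);
    }

    // A chunk became empty during this GC: keep it instead of freeing it.
    void add(void *memory) { empty_.push_back({memory, 0}); }

    // Call once per GC cycle. Ages every pooled chunk and releases those that
    // have stayed empty for more than maxAge cycles. Returns how many chunks
    // were released.
    std::size_t onGCEnd() {
        std::size_t kept = 0, released = 0;
        for (std::size_t i = 0; i < empty_.size(); ++i) {
            EmptyChunk c = empty_[i];
            if (++c.age > maxAge_) {
                std::free(c.memory);
                ++released;
            } else {
                empty_[kept++] = c;  // compact the survivors in place
            }
        }
        empty_.resize(kept);
        return released;
    }

    std::size_t size() const { return empty_.size(); }
};
```

With maxAge = 2, a chunk that stays empty survives two GC cycles in the pool and is released on the third.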

Mike Shaver (:shaver emeritus)

Comment 19

16 years ago

Yeah, I'm more comfortable with some sort of decay model here. I wanted to try one with arrays when I did the dense array work, so that I could amortize the growth over fewer allocations without having long-term memory impact, but never got to it. We don't get reliable memory pressure signals, and some of our memory pressure is from the market rather than the OS.

Igor Bukanov

Comment 20

16 years ago

(In reply to comment #18)
> How about delaying the free?

Well, my tests indicate that doing the free promptly makes things faster on Linux, both with the GLIBC allocator and with jemalloc.

Igor Bukanov

Comment 21

16 years ago

Attached patch: hack to restore posix_memalign (obsolete)

Gregor, could you try this on Mac to see how it would affect sunspider?

Gregor Wagner [:gwagner]

Assignee

Updated

16 years ago

Attachment #424051 - Attachment mime type: application/octet-stream → text/plain

Igor Bukanov

Comment 22

16 years ago

I measured GC pause times on the Canopy Chrome experiment. My procedure:

- Measure time for each call to js_GC with rdtsc() and a class that takes timestamps in ctor/dtor
- Run Canopy for several seconds
- To analyze data, throw away pause times much shorter than usual (less than 20M cycles).
- Report average and max pause time in ms

Results:

                        mean (ms)   max (ms)
TM tip:                 21.5        26.9
first patch:            17.5        25.7
posix_memalign patch:   22.2        26.1
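The measuring code itself is not posted here, so the following is only a guess at its shape: a class that takes one timestamp in its constructor and one in its destructor, plus a helper that drops the unusually short pauses and reports mean and max. It uses std::chrono::steady_clock instead of rdtsc() to stay portable, and ScopedPauseTimer, PauseStats and summarize are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Records the elapsed time of a scope into a caller-owned vector: the
// constructor takes the first timestamp, the destructor the second.
class ScopedPauseTimer {
    using Clock = std::chrono::steady_clock;
    std::vector<double> &samplesMs_;
    Clock::time_point start_;

  public:
    explicit ScopedPauseTimer(std::vector<double> &samplesMs)
        : samplesMs_(samplesMs), start_(Clock::now()) {}

    ~ScopedPauseTimer() {
        std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        samplesMs_.push_back(elapsed.count());
    }
};

struct PauseStats {
    double meanMs;
    double maxMs;
    std::size_t used;  // number of samples that passed the filter
};

// Drop samples shorter than minMs (the "much shorter than usual" pauses),
// then report mean and max of the rest.
PauseStats summarize(const std::vector<double> &samplesMs, double minMs) {
    PauseStats s{0.0, 0.0, 0};
    double sum = 0.0;
    for (double ms : samplesMs) {
        if (ms < minMs)
            continue;
        sum += ms;
        if (ms > s.maxMs)
            s.maxMs = ms;
        ++s.used;
    }
    if (s.used)
        s.meanMs = sum / s.used;
    return s;
}
```

In use, one ScopedPauseTimer would be declared at the top of the function being measured, so every return path is covered by the destructor.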

Gregor Wagner [:gwagner]

Assignee

Comment 38

16 years ago

To get a better understanding of Linux, I measured clock cycles (with rdtsc) for the clock benchmark Alex mentioned. I take the first 10 GC runs after starting the benchmark. All numbers are clock cycles in units of 1E6:

- create chunk and destroy are the clock cycles spent in the createChunk and destroyChunk functions until the next GC.
- GC total means the clock cycles spent in js_GC.
- never return means the approach where I never return chunks to the OS.

It looks like a combination of memalign and delayed return of pages is the way to go for Linux, but not for Windows, as David's numbers show.

Tip:
GC total     create chunk   destroy
59.984250    1.283946       5.357551
65.148690    1.702142       8.668120
65.102940    1.668936       8.571407
66.635302    1.785963       8.642302
67.303733    1.823751       9.096835
65.943045    1.647631       8.743144
71.483768    1.833246       11.906154
69.165578    1.959102       9.234659
62.773845    1.602108       8.140037
69.744675    1.851726       9.220635

memalign:
GC total     create chunk   destroy
56.798535    1.078626       4.140181
67.185240    2.071883       7.399031
66.639907    2.088105       7.072140
66.712673    2.148167       7.076758
65.022098    1.877782       7.274674
62.263410    3.300468       6.702548
62.094457    3.134920       6.511204
62.282565    3.174125       6.583060
62.116913    3.093809       6.525590
60.745748    1.510684       3.266253

Tip + never return:
GC total     create chunk   destroy
53.453002    1.550957       0.000000
56.476117    0.585223       0.000000
62.370593    0.256883       0.000000
55.096665    0.000000       0.000000
55.641825    0.000000       0.000000
55.118168    0.000000       0.000000
55.079535    0.000000       0.000000
55.285597    0.000000       0.000000
58.120110    0.000000       0.000000
58.995465    0.000000       0.000000

memalign + never return:
GC total     create chunk   destroy
34.332472    0.463917       0.000000
43.483253    0.956575       0.000000
44.499712    0.051564       0.000000
44.070758    0.000000       0.000000
44.366948    0.000000       0.000000
44.334638    0.000000       0.000000
44.319345    0.000000       0.000000
50.329492    0.302865       0.000000
43.602270    0.000000       0.000000
45.702323    0.000000       0.000000

Igor Bukanov

Comment 39

16 years ago

Attached patch: memalign + free on background thread (obsolete)

The new patch moves the free call to the background thread.
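For readers who have not opened the patch, the general shape of moving free to a background thread is sketched below. This is not the patch's code: BackgroundFreer, release and finish are hypothetical names, and the sketch uses the C++ standard library threads rather than the primitives the engine actually uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: the GC thread hands chunks over, a worker frees them.
class BackgroundFreer {
    std::mutex lock_;
    std::condition_variable wake_;
    std::vector<void *> pending_;
    bool shutdown_ = false;
    std::size_t freed_ = 0;
    std::thread worker_;  // declared last so the other members exist first

    void run() {
        std::unique_lock<std::mutex> guard(lock_);
        for (;;) {
            wake_.wait(guard, [this] { return shutdown_ || !pending_.empty(); });
            if (pending_.empty())
                return;  // only reached on shutdown, after the queue is drained
            std::vector<void *> batch;
            batch.swap(pending_);
            guard.unlock();  // do the slow frees without holding the lock
            for (void *p : batch)
                std::free(p);
            guard.lock();
            freed_ += batch.size();
        }
    }

  public:
    BackgroundFreer() : worker_([this] { run(); }) {}
    ~BackgroundFreer() { finish(); }

    // Called from the GC: cheap, it only queues the pointer.
    void release(void *chunk) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            pending_.push_back(chunk);
        }
        wake_.notify_one();
    }

    // Drain the queue and stop the worker. Returns the number of chunks freed.
    std::size_t finish() {
        {
            std::lock_guard<std::mutex> guard(lock_);
            shutdown_ = true;
        }
        wake_.notify_one();
        if (worker_.joinable())
            worker_.join();
        return freed_;
    }
};
```

The GC pause then only pays for a push under a short lock; the cost of free itself moves off the GC thread, although it can still compete with allocation on other threads.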

Igor Bukanov

Comment 40

16 years ago

(In reply to comment #38)
> In order to get a better understanding for Linux I was measuring clock cycles
> (measured with rdtsc) for the clock benchmark Alex mentioned.

Could you post that measuring code?

Igor Bukanov

Comment 41

16 years ago

Attached patch: memalign + free on background thread v2 (obsolete)

The new patch increases the chunk size to 128K and, as before, uses memalign + free on the background thread. For the benchmark from comment 29, which exercises only the GC allocator with no malloc calls, I get the following with the patch in jemalloc+xpcshell:

                             Alloc time   GC time all used   GC time all free
base:                        209          58                 15
memalign + background free:  214          59                 6
mmap + never return memory:  200          59                 7

This shows that on Linux mmap/munmap is faster than memalign allocation, but that is completely offset by slower finalization. Still, mmap/no-release wins on all counts.

If I change the benchmark so that it uses the same number of GC and malloc allocations:

function test(N)
{
    var N = 6e6;
    var a = [];
    var time0 = Date.now();
    for (var i = 0; i != N; ++i) {
        a[i] = "aa".toUpperCase();
    }
    time0 = Date.now() - time0;
    gc();
    var time1 = Date.now();
    gc();
    time1 = Date.now() - time1;
    a = null;
    var time2 = Date.now();
    gc();
    time2 = Date.now() - time2;
    return [time0, time1, time2];
}

// warmup
test();

var min_times = [Infinity, Infinity, Infinity];
for (var i = 0; i != 100; ++i) {
    var times = test();
    min_times[0] = Math.min(min_times[0], times[0]);
    min_times[1] = Math.min(min_times[1], times[1]);
    min_times[2] = Math.min(min_times[2], times[2]);
}

print("Alloc time: "+min_times[0]);
print("GC time all used: "+min_times[1]);
print("GC time all free: "+min_times[2]);

The results:

                             Alloc time   GC time all used   GC time all free
base:                        760          97                 227
memalign + background free:  745          97                 197
mmap + never return memory:  722          97                 196

That is, memalign now beats the base on allocation, while mmap/no-free is still the fastest. My theory regarding the memalign allocation speedup is that memalign somehow warms up the jemalloc code and data caches. This speedup is not enough to offset the wins from mmap/no-free in this benchmark, but in sunspider it is enough to make the allocation faster.

Still, if one focuses purely on GC mark and finalize timing, then memalign + free on the background thread on Linux is at least as fast as mmap/no-free and has the advantage that it does not leave unused memory around.

Gregor/David, could you repeat the tests with this patch?

Attachment #424051 - Attachment is obsolete: true

Attachment #424198 - Attachment is obsolete: true

Igor Bukanov

Comment 42

16 years ago

Attached patch: background munmap (obsolete)

The patch moves the munmap calls to the background thread. On Linux it is worse than the other proposals: there are no changes in sunspider, the allocation tests show a slowdown, and the mark/sweep phases are not faster than with the other patches.

Gregor Wagner [:gwagner]

Assignee

Comment 43

16 years ago

MacPro, OS X 10.6.2

I calculated the (arithmetic) mean of 10 GC calls once I started the benchmarks. All numbers (except Count) are (rdtsc) cycles in units of 1E6.

GC Total: accumulated cycles in js_GC (mean)
Alloc since last GC: accumulated cycles in NewGCChunk (mean)
Count: number of NewGCChunk calls between the last GC call and the current call (no mean value)
per alloc: current accumulated NewGCChunk cycles / Count (no mean value)
Mark, Sweep, Finalize Obj, Doubles: cycles in mark, sweep... (mean)
Destroy: cycles in DestroyGCArenas (mean)
Count: number of DestroyGCArenas calls between the last GC call and the current one (no mean value)
per Dest: current accumulated DestroyGCArenas cycles / Count (no mean value)

Tip: mmap + 16 arenas per chunk

Clock
GC Total: 86.277183
Alloc since last GC: 11.231231, Count: 490, per alloc: 0.022659
Mark: 20.773317, Sweep: 63.651558
Finalize Obj: 40.383799, Doubles: 0.002311
Destroy: 21.227316, Count: 491, per Dest: 0.037003

Canopy
GC Total: 108.236756
Alloc since last GC: 97.377330, Count: 985, per alloc: 0.100792
Mark: 35.407740, Sweep: 69.874921
Finalize Obj: 19.989628, Doubles: 6.034316
Destroy: 41.343824, Count: 984, per Dest: 0.03649

It is very interesting that the time per allocation increases dramatically if we call mmap over 600 times.

Attachment 424209: 32 arenas per chunk, posix_memalign, free on background thread

Clock
GC Total: 84.069372
Alloc since last GC: 5.045846, Count: 254, per alloc: 0.020286
Mark: 20.815956, Sweep: 61.436992
Finalize Obj: 40.309536, Doubles: 0.002079
Destroy: 19.095992, Count: 254, per Dest: 0.065833

Canopy
GC Total: 99.394880
Alloc since last GC: 49.907864, Count: 495, per alloc: 0.115329
Mark: 35.076904, Sweep: 61.967926
Finalize Obj: 18.955368, Doubles: 5.883493
Destroy: 34.705608, Count: 497, per Dest: 0.064539

Attachment 424219: background munmap

Clock
GC Total: 71.826731
Alloc since last GC: 11.787724, Count: 490, per alloc: 0.022092
Mark: 20.867233, Sweep: 48.612205
Finalize Obj: 43.314811, Doubles: 0.002473
Destroy: 3.230002, Count: 486, per Dest: 0.054199

Canopy
GC Total: 71.387655
Alloc since last GC: 95.360449, Count: 1048, per alloc: 0.103057
Mark: 35.334600, Sweep: 33.680048
Finalize Obj: 18.939819, Doubles: 6.190844
Destroy: 6.139078, Count: 1012, per Dest: 0.059629

Allocation time kills here!!!!

Never release chunks

Clock
GC Total: 73.801036
Alloc since last GC: 0.000000, Count: 0, per alloc: 0.000000
Mark: 23.834874, Sweep: 47.563324
Finalize Obj: 42.371644, Doubles: 0.003180
Destroy: 3.149600, Count: 0, per Dest: 0.000000

Canopy
GC Total: 58.777888
Alloc since last GC: 1.598225, Count: 0, per alloc: 0.000000
Mark: 27.906838, Sweep: 28.843181
Finalize Obj: 16.121037, Doubles: 5.332098
Destroy: 5.215941, Count: 0, per Dest: 0.000000

Gregor Wagner [:gwagner]

Assignee

Comment 44

16 years ago

And the same for Ubuntu 9.10, 2.6.30

Tip: mmap + 16 arenas per chunk

Clock:
GC Total: 70.850034
Alloc since last GC: 2.257380, Count: 466, per alloc: 0.004596
Mark: 19.527767, Sweep: 50.059779
Finalize Obj: 35.943663, Doubles: 0.001456
Destroy: 12.141015, Count: 467, per Dest: 0.020657

Canopy:
GC Total: 72.883884
Alloc since last GC: 8.125352, Count: 779, per alloc: 0.011639
Mark: 26.552469, Sweep: 44.960441
Finalize Obj: 8.169489, Doubles: 3.656163
Destroy: 19.895777, Count: 776, per Dest: 0.021073
GC Total: 71.099658

Attachment 424209: 32 arenas per chunk, posix_memalign, free on background thread

Clock:
GC Total: 71.766398
Alloc since last GC: 1.622382, Count: 235, per alloc: 0.007643
Mark: 18.974574, Sweep: 45.849586
Finalize Obj: 34.197816, Doubles: 0.002181
Destroy: 2.356967, Count: 234, per Dest: 0.000039

Canopy:
GC Total: 78.486300
Alloc since last GC: 2.612945, Count: 388, per alloc: 0.006740
Mark: 25.261387, Sweep: 49.146618
Finalize Obj: 8.961127, Doubles: 4.070297
Destroy: 4.361608, Count: 393, per Dest: 0.000038

Sweep is unstable between 30 and 60!

Attachment 424219: background munmap

Clock:
GC Total: 70.966204
Alloc since last GC: 2.115593, Count: 474, per alloc: 0.004514
Mark: 19.397501, Sweep: 50.368943
Finalize Obj: 35.884453, Doubles: 0.001558
Destroy: 2.411190, Count: 473, per Dest: 0.021089

Canopy:
GC Total: 71.389255
Alloc since last GC: 9.345766, Count: 752, per alloc: 0.011788
Mark: 26.901378, Sweep: 43.113534
Finalize Obj: 7.819234, Doubles: 3.657568
Destroy: 3.910578, Count: 0, per Dest: 0.000000

Never release chunks

Clock:
GC Total: 61.238915
Alloc since last GC: 0.000000, Count: 0, per alloc: 0.000000
Mark: 19.263335, Sweep: 40.673392
Finalize Obj: 36.356200, Doubles: 0.001970
Destroy: 2.414596, Count: 0, per Dest: 0.000000

Canopy:
GC Total: 53.569553
Alloc since last GC: 0.000000, Count: 0, per alloc: 0.000000
Mark: 27.715081, Sweep: 24.418610
Finalize Obj: 8.086296, Doubles: 3.615567
Destroy: 3.709132, Count: 0, per Dest: 0.000000

Gregor Wagner [:gwagner]

Assignee

Comment 45

16 years ago

Attached patch: background mmap/munmap

This patch implements background allocation and release of GC chunks using mmap/munmap.

Attachment #424209 - Attachment is obsolete: true

Attachment #424219 - Attachment is obsolete: true

Igor Bukanov

Comment 52

16 years ago

Attached patch: background oversized malloc and free

This patch replaces mmap/munmap for chunks with oversized malloc. This wastes about one arena per chunk, but the upside is that any released chunk is immediately available to other callers of malloc. As with the previous patch, both the allocation and the release of chunks are done in the background.
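The "oversized malloc" trick can be illustrated as follows: ask malloc for one arena more than needed and round the start address up to an arena boundary, which is where the roughly one arena of waste comes from. The constants and names below (kArenaSize, kChunkSize, AlignedChunk, allocOversized) are assumptions for the sketch, not values taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sizes, for illustration only. kArenaSize must be a power of 2.
const std::size_t kArenaSize = 4096;
const std::size_t kChunkSize = 32 * kArenaSize;

struct AlignedChunk {
    void *base;     // pointer returned by malloc, needed later for free()
    void *aligned;  // first arena-aligned address inside the block
};

// Get kChunkSize usable bytes aligned to kArenaSize by over-allocating by one
// arena and rounding the start address up to the next arena boundary.
AlignedChunk allocOversized() {
    AlignedChunk c{nullptr, nullptr};
    c.base = std::malloc(kChunkSize + kArenaSize);
    if (!c.base)
        return c;
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(c.base);
    std::uintptr_t mask = static_cast<std::uintptr_t>(kArenaSize) - 1;
    c.aligned = reinterpret_cast<void *>((addr + mask) & ~mask);
    return c;
}

// The original malloc pointer, not the aligned one, must be passed to free.
void freeOversized(AlignedChunk c) { std::free(c.base); }
```

Because the block comes from malloc, freeing it makes the memory available to any other malloc caller right away, which is the advantage described above.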

Igor Bukanov

Comment 53

16 years ago

Attached patch: foreground oversized malloc + background free

The new patch effectively disables the background allocation from the previous patch, so all chunk allocations are done with a plain oversized malloc call.

Igor Bukanov

Comment 54

16 years ago

Here are the benchmark results on Linux on a 4-core Xeon X3353 2.66 GHz CPU for the last 3 patches compared with TM tip (revision ce654228dabe). I used a jemalloc-enabled xpcshell to stay as close to a browser as possible. The table lists the speed-up factors compared with the base, so a number less than 1 is a slowdown.

                          sunspider  alloc1  free1  alloc2  free2
background mmap/munmap    1.000      0.84    1.14   1.00    2.14
background malloc/free    1.016      0.91    1.14   1.01    2.14
malloc/background free    1.011      1.02    1.14   0.98    2.14

Here alloc1 and free1 are the numbers for the benchmark from comment 41, and alloc2 and free2 are for the benchmark from comment 29. In the first benchmark there is a 50% mix of GC and malloc allocations. In the second benchmark only GC allocations are exercised.

The numbers indicate that background munmap/free makes things better across all patches. The pure allocation measurements show a slowdown when mmap/malloc is done on a separate thread. One possible cause is the extra memory bandwidth required to populate caches on two threads.

To explain the sunspider results, which show a win for malloc over mmap, one has to remember that after each test the sunspider driver runs gc(). When that returns, the background thread continues to call free. It seems that such a free improves malloc performance on another thread. I do not have an explanation for that, but I have observed the effect with jemalloc and GLIBC malloc in other benchmarks and patches.

From a code simplicity point of view, the winner on Linux is oversized malloc done in the foreground with background free. It wastes some space, but some of that can be recovered by moving JSGCArenaInfo there.

Gregor Wagner [:gwagner]

Assignee

Comment 55

Comment 61

15 years ago

Igor, Do you have a bug# for the background finalization you are speaking about? If so are you in active development or is this an idea for future. ETA? Thanks.

Gregor Wagner [:gwagner]

Assignee

Comment 68

15 years ago

(In reply to comment #66)
> (In reply to comment #65)
> > In an embedding with many threads (where GC runs way more often) this patch
> > will probably make a big difference. I'm not sure if background finalization
> > will have the same impact.
>
> I am not against proposed changes. I just would like to do them after
> implementing the background finalization as that may require to rewrite any
> proposed patch here.

When will this patch be ready?

So if I understand correctly you want to move the finalization and chunk release to the background thread. This means for the canopy benchmark for example that we free and alloc the same amount of memory in parallel. I don't think this will be very fast. Didn't we see the impact on the GC benchmark we had to remove from the test-suite?

Igor Bukanov

Comment 69

15 years ago

(In reply to comment #67)
> Igor,
>
> Do you have a bug# for the background finalization you are speaking about?
> If so are you in active development or is this an idea for future.
> ETA?

Bug 543036

Igor Bukanov

Comment 70

15 years ago

(In reply to comment #68)
> When will this patch be ready?

For doubles (bug 543036) I should have the patch before Monday. Strings should be ready during the next week.

> This means for the canopy benchmark for example that we free and alloc the same
> amount of memory in parallel.

We already do the free/malloc in parallel. But this is still faster than free from one thread, as it minimizes the GC pause and spreads any slowdown over all allocations. Still, for the GC chunks we may need to do something special. But then again, let's consider this after the background changes.

Gregor Wagner [:gwagner]

Assignee

Comment 71

15 years ago

Attached image: Tip: Canopy

Added GCChunk count.

Gregor Wagner [:gwagner]

Assignee

Comment 72

15 years ago

Attached image: Canopy with this patch.

Added GCChunk count.

Igor Bukanov

Comment 73

15 years ago

In bug 553812 I am going to bump the chunk size to 2 MB among other refactorings. The patch will also use vm_allocate on Mac. That will alter the stats here.

Depends on: 553812

Igor Bukanov

15 years ago

(In reply to comment #97)
> (In reply to comment #96)
> >
> > So as a way forward I would like first to implement the bug 557538
>
> How complex is that bug? Do you have an estimate of when you'll get to it?

I have a patch, but it is built on top of bug 553812. I guess I can rebase it and ask for a review tomorrow.

Gregor Wagner [:gwagner]

Assignee

Comment 99

15 years ago

(In reply to comment #98)
> (In reply to comment #97)
> > (In reply to comment #96)
> > >
> > > So as a way forward I would like first to implement the bug 557538
> >
> > How complex is that bug? Do you have an estimate of when you'll get to it?
>
> I have a patch but that is built on top of the bug 553812. I guess I can rebase
> it and ask for a review tomorrow.

How are these two bugs related? How does the deallocation and reallocation of chunks for the Mac browser get optimized? Is the idea to reimplement this idea with delayed chunk returns in an outside defined callback?

Igor Bukanov

Comment 100

15 years ago

(In reply to comment #99)
> How are these two bugs related? How does the deallocation and reallocation of
> chunks for the Mac browser get optimized? Is the idea to reimplement this idea
> with delayed chunk returns in an outside defined callback?

The current patch needs integration with the allocation code to release pooled GC chunks on memory pressure, especially given that the browser has switched to a malloc that never returns null. One way or another this has to be implemented, and implementing the chunk management together with the code that implements infallible malloc makes more sense. So yes, I would like to move the pooling outside the engine via a callback.
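As a rough illustration of what pooling outside the engine via a callback might mean, here is a sketch in which the engine only sees an allocate/release pair and the embedder owns the pool and empties it on memory pressure. None of this is an existing API: ChunkCallbacks, EmbedderChunkPool and onMemoryPressure are hypothetical, and all chunks are assumed to have the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical callback pair through which an engine would obtain and return
// chunks. The embedder decides whether released chunks are pooled.
struct ChunkCallbacks {
    std::function<void *(std::size_t)> allocate;
    std::function<void(void *)> release;
};

// Embedder-side pool: keeps released chunks for reuse and drops all of them
// when the embedder's allocator reports memory pressure.
class EmbedderChunkPool {
    std::vector<void *> pooled_;

  public:
    ~EmbedderChunkPool() { onMemoryPressure(); }

    ChunkCallbacks callbacks() {
        return ChunkCallbacks{
            [this](std::size_t size) -> void * {
                if (!pooled_.empty()) {
                    void *p = pooled_.back();
                    pooled_.pop_back();
                    return p;  // reuse a pooled chunk, no system call
                }
                return std::malloc(size);
            },
            [this](void *p) { pooled_.push_back(p); }};
    }

    // Free every pooled chunk. Returns how many chunks were given back.
    std::size_t onMemoryPressure() {
        std::size_t n = pooled_.size();
        for (void *p : pooled_)
            std::free(p);
        pooled_.clear();
        return n;
    }

    std::size_t pooledCount() const { return pooled_.size(); }
};
```

The point of the split is that the code that knows about memory pressure (the embedder's allocator) is also the code that decides when pooled chunks go away.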

Mike Shaver (:shaver emeritus)

Comment 101

15 years ago

Does it get harder to do that if this patch has landed? It seems like it will change our GC timing quite a bit, in pleasant ways, and that will let us start to get data on the next hurdles to tackle. If landing this patch doesn't represent a regression for the browser, and it doesn't make moving to a callback model massively more difficult, I would be very much in favour of landing it, getting the baseline win, and gathering more data while the refactoring and jemalloc chunk-sharing patches wend their way through the system. We have a patch here with a lot of good data, and it seems like we've stalled on it mostly because we think there's something even *better* in the future. I think we should try to avoid that, especially where we're talking about a major win on a major pain point!

Gregor Wagner [:gwagner]

Assignee

Comment 102

15 years ago

Attached patch: patch (obsolete)

bugfix.

Attachment #423842 - Attachment is obsolete: true

Comment 108

15 years ago

Igor, I would like to see profile data for #107. Especially after your change to use 1MB chunks, we have very few chunks (8-16 in a browser session, maybe). I have a hard time believing that can measurably impact TLB performance. Let's optimize real performance problems (like the one the patch addresses), not imaginary ones. Even if this were a measurable problem (I am confident it's not), let's take a follow-up patch. I think we have given Gregor enough run-around.

Brendan Eich [:brendan]

Comment 109

15 years ago

The data for comment 107 came from bug 550373 comment 18, a synthetic benchmark. Indeed we need to study real Firefox GC chunk populations including with hundreds of tabs open (8-16 in a browser session? not with that many tabs, I suspect -- but let's measure). A bug over 100 comments with a patch that wins good performance, with synthetic benchmarks wanting a data structure change, suggests doing the data structure change in a followup bug. And doing the real Firefox population measurements. Usually you can do such measurements by hand. For this and many other studies, we also have TestPilot. We should consider using it. /be

Gregor Wagner [:gwagner]

Assignee

Comment 110

15 years ago

Attached patch: patch (obsolete)

new version due to changes in GC code. Added vector for empty chunks.

Attachment #437774 - Attachment is obsolete: true

Andreas Gal :gal

Comment 111

15 years ago

Comment on attachment 438668
patch

>+
>+    js::Vector<JSGCChunkInfo*, 0, js::SystemAllocPolicy> gcEmptyChunkVector;

Nit: I suggest gcEmptyChunks. We don't use Hungarian notation around here :)

>+static const jsuword GC_EMPTY_CHUNK_SURVIVES = 3;
>+

Nit: I don't much like "survives". GC_EMPTY_CHUNK_RESERVE = 3 maybe? It's also overloaded. RESERVOIR maybe? Pick something.

>+    size_t gcSurvived;

Nit: Same here. Name sucks. I suggest something more expressive.

>     JSGCChunkInfo *ci = rt->gcChunkList;
>+
>+    if (!ci && !rt->gcEmptyChunkVector.empty()) {
>+        ci = rt->gcEmptyChunkVector.back();
>+        rt->gcEmptyChunkVector.popBack();
>+        JS_ASSERT(ci);
>+        ci->gcSurvived = 0;
>+        ci->init(rt);
>+    }
>+
>     jsuword chunk;
>     if (!ci) {

Merge these conditions somehow. This seems redundant if you hit the fast path above.

>             if (ci->numFreeArenas == GC_ARENAS_PER_CHUNK) {
>                 ci->removeFromList(rt);
>-                ci->next = emptyChunkList;
>-                emptyChunkList = ci;
>+                rt->gcEmptyChunkVector.append(ci);

Not setting gcSurvived = 0 here?
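One way to read the "merge these conditions" remark is a single if/else that picks the chunk source, so the slow path is never re-tested after the fast path has succeeded. The sketch below shows that shape only: ChunkInfo, Runtime and getChunk are simplified stand-ins invented for the example, the list of partially used chunks is left out, and operator new stands in for the real chunk allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for the chunk header discussed in the review.
struct ChunkInfo {
    unsigned age = 0;
    bool initialized = false;
    void init() {
        initialized = true;
        age = 0;  // a chunk that is handed out again starts over
    }
};

struct Runtime {
    std::vector<ChunkInfo *> emptyChunks;  // pooled, currently unused chunks
    std::size_t systemAllocations = 0;     // how often the pool was bypassed
};

// Return a chunk, preferring the pool. One if/else covers both sources, and
// the common initialization happens exactly once afterwards.
ChunkInfo *getChunk(Runtime &rt) {
    ChunkInfo *ci;
    if (!rt.emptyChunks.empty()) {
        ci = rt.emptyChunks.back();
        rt.emptyChunks.pop_back();
    } else {
        ci = new (std::nothrow) ChunkInfo;
        if (!ci)
            return nullptr;
        ++rt.systemAllocations;
    }
    ci->init();
    return ci;
}
```

Resetting the age inside init() also avoids having to remember it at every place where a chunk enters or leaves the pool.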

Andreas Gal :gal

Comment 112

15 years ago

The RESERVOIR comment above is nonsense, of course. So it looks like that's an age counter? Or generations?

Gregor Wagner [:gwagner]

Assignee

Comment 113

15 years ago

Attached patch: patch (obsolete)

Age is better. Now with append failure handling.

Attachment #438668 - Attachment is obsolete: true

Igor Bukanov

Comment 114

15 years ago

(In reply to comment #113)
> Igor, I would like to see profile data for #107. Especially after your change
> to use 1MB chunks, we have very few chunks (8-16 in a browser session, maybe).

I was referring to the pre-big-chunk state of things.

> I have a hard time believing that can measurably impact TLB performance.

There is also a code simplicity argument. A vector is simply a more suitable data structure for appending and bulk removal than a list. I should have seen that a long time ago. See bug 559141, which eliminates the doubly-linked list of chunks based on your suggestion. As further proof, consider that doing the delayed chunk release on top of that patch would mean adding a few lines of code to the chunk scanning loop, without the need to add any helper methods.

Andreas Gal :gal

15 years ago

Comment on attachment 438919
patch

>+    if (!rt->gcEmptyChunks.empty()) {
>+        ci = rt->gcEmptyChunks.back();
>+        rt->gcEmptyChunks.popBack();
>+        JS_ASSERT(ci);
>+        chunk = ci->getChunk();

Get a reference straight to gcEmptyChunks if you talk to it repeatedly to avoid the rt-> dereference every time. It's also easier to read.

>+        ci->gcChunkAge = 0;
>+        if (!rt->gcEmptyChunks.append(ci)) {
>+            jsuword chunk = ci->getChunk();
>+            PutGCChunk(rt, (void *)chunk);
>+        }

PutGCChunk is the worst name in the world. Please make that something else. Free or release. Anything.

>+void DestroyEmptyGCChunks(JSRuntime *rt, bool releaseAll)
>+{
>+    size_t newLength = 0;
>+    JSGCChunkInfo **array = rt->gcEmptyChunks.begin();
>+    size_t length = rt->gcEmptyChunks.length();
>+
>+    for(size_t i = 0; i < length; ++i){
>+        JSGCChunkInfo *ci = array[i];

The proper pattern is probably

for (JSGCChunkInfo *ci = ->begin(); ci != ->end(); ++ci)
   ^ also space here

>+        ci->gcChunkAge++;
>+        array[newLength++] = ci;

You can keep this one I guess.

r=me with nits picked at your leisure.

Attachment #438919 - Flags: review+

Gregor Wagner [:gwagner]

Assignee

Comment 119

15 years ago

Attached patch: patch (obsolete)

Attachment #438919 - Attachment is obsolete: true

Gregor Wagner [:gwagner]

Assignee

Comment 120

15 years ago

Attached patch: patch

white space fix

Attachment #438929 - Attachment is obsolete: true

Gregor Wagner [:gwagner]

Assignee

Comment 121

15 years ago

http://hg.mozilla.org/tracemonkey/rev/15955e3513e0 r=gal

Gregor Wagner [:gwagner]

Assignee

Updated

15 years ago

Whiteboard: fixed-in-tracemonkey

Robert Sayre

Comment 122

15 years ago

http://hg.mozilla.org/mozilla-central/rev/15955e3513e0

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

David Mandelin [:dmandelin]

Updated

15 years ago

Depends on: 604676

Gregor Wagner [:gwagner]

Assignee

Updated

15 years ago

No longer depends on: 604676

WiP | 16 years ago | Gregor Wagner [:gwagner] | 4.05 KB, patch
patch | 16 years ago | Gregor Wagner [:gwagner] | 4.07 KB, patch
hack to restore posix_memalign | 16 years ago | Igor Bukanov | 1.43 KB, patch
memalign + free on background thread | 16 years ago | Igor Bukanov | 5.05 KB, patch
memalign + free on background thread v2 | 16 years ago | Igor Bukanov | 5.82 KB, patch
background munmap | 16 years ago | Igor Bukanov | 5.07 KB, patch
patch | 16 years ago | Gregor Wagner [:gwagner] | 9.37 KB, patch
background mmap/munmap | 16 years ago | Igor Bukanov | 30.08 KB, patch
background oversized malloc and free | 16 years ago | Igor Bukanov | 36.94 KB, patch
foreground oversized malloc + background free | 16 years ago | Igor Bukanov | 36.96 KB, patch
Current Tip Canopy | 15 years ago | Gregor Wagner [:gwagner] | 6.99 KB, image/png
Delayed Return of GC Chunks on Mac | 15 years ago | Gregor Wagner [:gwagner] | 5.95 KB, image/png
Tip: Canopy | 15 years ago | Gregor Wagner [:gwagner] | 8.83 KB, image/png
Canopy with this patch. | 15 years ago | Gregor Wagner [:gwagner] | 9.48 KB, image/png
Canopy with patch from bug 553812 | 15 years ago | Gregor Wagner [:gwagner] | 8.89 KB, image/png
Clock Benchmark Tip | 15 years ago | Gregor Wagner [:gwagner] | 9.60 KB, image/png
Clock Benchmark with patch from bug 553812 | 15 years ago | Gregor Wagner [:gwagner] | 9.60 KB, image/png
Clock Benchmark with delayed chunk deallocation | 15 years ago | Gregor Wagner [:gwagner] | 10.15 KB, image/png
patch | 15 years ago | Gregor Wagner [:gwagner] | 9.52 KB, patch
patch | 15 years ago | Gregor Wagner [:gwagner] | 7.66 KB, patch
patch | 15 years ago | Gregor Wagner [:gwagner] | 8.16 KB, patch
patch | 15 years ago | Gregor Wagner [:gwagner] | 8.24 KB, patch | gal: review+
patch | 15 years ago | Gregor Wagner [:gwagner] | 9.04 KB, patch
patch | 15 years ago | Gregor Wagner [:gwagner] | 9.03 KB, patch