Closed Bug 656261 Opened 11 years ago Closed 11 years ago
better GC arena layout
The current GC arena layout when we align things from the start of the arena implies that we may have two holes in the areana. The first one is between the arena header and the things start when sizeof(thing) % sizeof(ArenaHeader) != 0. The second one is at the end of the arena when ArenaSize % sizeof(thing) != 0. If we align towards the end of the arena with the last thing ending exactly at the end of the arena, then only one hole between ArenaHeader and the first thing would exist. When sizeof(ArenaHeader) <= ArenaSize % sizeof(thing), this would allow to fit an extra thing to the arena. For example, on 32 bit it allows to fit an extra shape and most of the object kinds into the arena providing a 1-4% win in memory utilization. This layout also simplifies implementation for the bug 601075 as there the bump allocator would be able to detect the end of the things to allocate by testing for the arena end.
The patch changes the layout of the arena to move things towards its end. The patch also makes the arena a non-template class to simplify things and using templates only in places where they benefits. In particular, the type specialization is used only during the marking and finalization. The allocation code is specialized on the thing size as it does not depend on the nature of things to be allocated.
Assignee: general → igor
The new version updates the comments and has more preparations for the bug 601075.
Comment on attachment 532188 [details] [diff] [review] v2 Review of attachment 532188 [details] [diff] [review]: ----------------------------------------------------------------- Great patch. I really like the direction it moves us in. It would be nice to remove as much parametrization as possible. It seems unlikely to me that removing the template on MarkDelayedChildren would hurt performance. That would get rid of one big switch in the caller. Given that you updated the template for BuildFreeList, I'm guessing that you tried to remove it and it regressed performance. If so, could you post the benchmarks? Eventually, I think we should try removing the template on conservative marking and see what happens. I tend to agree with you that the template on finalization will have to stay. ::: js/src/jsgc.cpp @@ +186,5 @@ > FreeCell **tailp = &freeList; > bool allClear = true; > > + uintptr_t thing = thingsStart(sizeof(T)); > + uintptr_t end = thingsEnd(); I don't really see much benefit to switching to uintptr_t here. It allows you to remove a few calls to asFreeCell, but I find those cleaner than what's going on here. @@ -1546,5 @@ > } > > template<typename T> > static void > MarkDelayedChilderen(JSTracer *trc, ArenaHeader *aheader) I just happened to notice this: should be MarkDelayedChildren. ::: js/src/jsgc.h @@ +190,1 @@ > * Slight change: An arena is 4K in size and 4K-aligned. It starts with the ArenaHeader descriptor followed by some pad bytes. The remainder of the arena is filled with the array of T things. The pad bytes ensure that the thing array ends exactly at the end of the arena.
Attachment #532188 - Flags: review?(wmccloskey) → review+
(In reply to comment #3) > Given that you updated the template for BuildFreeList, I'm guessing that you > tried to remove it and it regressed performance. If so, could you post the > benchmarks? The new version is effectively the code GCC has been generating before the patch. But for some reason after the switch to the size template in earlier versions of the patch the compiler stopped doing that and the generated code was not optimal. That resulted in a tiny but visible regression in callgrind. I plan to reconsider the need for templates during Refill after the bug 601075. That bug removes BuildFreeList and it would be easier to avoid any performance regression with non-template versions while avoiding the thing kind switch in the Refill. > > Eventually, I think we should try removing the template on conservative > marking and see what happens. I tend to agree with you that the template on > finalization will have to stay. I am not sure about that. Despite all the claims about how fast the modern CPU at calculating stuff, the division and remainder operations are expensive on x86. Yet this is what the conservative GC would need without the thing kind switch to replace. So we need to measure that. > > ::: js/src/jsgc.cpp > @@ +186,5 @@ > > FreeCell **tailp = &freeList; > > bool allClear = true; > > > > + uintptr_t thing = thingsStart(sizeof(T)); > > + uintptr_t end = thingsEnd(); > > I don't really see much benefit to switching to uintptr_t here. It allows > you to remove a few calls to asFreeCell, but I find those cleaner than > what's going on here. This is not necessary here but in bug 601075 I also need to bit-manipulate the pointers. With that the casts becoming even more ugly and helper functions like thing->address() destructs. I suppose using a wrapper class for pointer may help here, but it adds source complexity.
This is v2 + fixed comments + removal of the thing kind switch in MarkedDelayedChidren. I tried a wrapper class for GC thing ptr to avoid using an explicit uintptr_t as a pointer. But that added a lot of code for rather unclear benefits. I also tried to hide casts between uintptr_t and various types behind a helper functions like AsCell(uintptr_t), but it the readability improvements were too small to bother IMO.
The prev attachment had a wrong patch.
Comment on attachment 533291 [details] [diff] [review] v3 for real Review of attachment 533291 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, thanks.
Attachment #533291 - Flags: review?(wmccloskey) → review+
An attempt to land the patch showed a regression in jsreftest in a browser on Windows and in dromaeo benchmark. I could reproduce the latter in debug build with optimizations on Linux. Apparently when the cycle collector navigates the GC graph it see a gray object with an a unmarked, finalized shape in obj->lastProp.
I must be smoking something when I called MarkKind in MarkDelayedChildren. Of cause the thing must be marked at that point and it is necessary to mark its children. So the new version uses JS_TraceChildren for that. But that points out that MarkDelayedChildren is used during dromaeo benchmark. Perhaps we should tune our stack sizes?
Sounds good. However, we should probably also add a test to jstests that actually exercises delayed marking in a way that causes the unfixed patch to crash. Maybe we should also think about a GC zeal-like option that causes us to do delayed marking more often.
(In reply to comment #10) > Sounds good. However, we should probably also add a test to jstests that > actually exercises delayed marking in a way that causes the unfixed patch to > crash. A am not going to land this without such test. But it is involved. > > Maybe we should also think about a GC zeal-like option that causes us to do > delayed marking more often. Something like that would be nice indeed. Even better would be an option to change the size of the mark stacks at runtime in debug builds, but this is "better is an enemy of good" sort of things.
The new version updates IterateCompartmentCells for the layout chnages and removes the switch an template there. In addition it adds a regression test for the problem from the comment 9 and extends the shell with internalConst function to avoid hard-coding the constant in the test.
cdleary-bot mozilla-central merge info: http://hg.mozilla.org/mozilla-central/rev/869479a8d3c8 http://hg.mozilla.org/mozilla-central/rev/e67746e7febd (backout) Note: not marking as fixed because last changeset is a backout.
Comment on attachment 534279 [details] [diff] [review] v5 Review of attachment 534279 [details] [diff] [review]: ----------------------------------------------------------------- Looks great. Sorry for the delay; I was out of town last week.
Attachment #534279 - Flags: review?(wmccloskey) → review+
Looks like this relanded: http://hg.mozilla.org/mozilla-central/rev/b0d93728d58c http://hg.mozilla.org/mozilla-central/rev/278195e0f9f9 (warning fix) Does that make the bug fixed-in-tracemonkey and FIXED?
I forgot to add fixed-in-tracemonkey when landing the patch.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.