Closed
Bug 553812
Opened 15 years ago
Closed 15 years ago
separate GC arena info and mark bits from the arena
Categories: Core :: JavaScript Engine, enhancement
RESOLVED
FIXED
People
(Reporter: igor, Assigned: igor)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 8 obsolete files)
92.32 KB, patch, igor: review+
+++ This bug was initially created as a clone of Bug #550373 comment 18 +++
As the data in bug 550373 comment 18 shows, the sweeping of doubles is dominated by TLB and L2 cache misses, as the CPU has to access the arena info and mark bitmap stored together with the dead double values. If we separate the info and mark bitmap storage from the data itself, we can avoid most TLB misses, as the info would be packed into a dense array. That should also benefit the marking phase, as for already-marked objects there would be no need to populate the TLB cache.
To implement that I suggest following jemalloc and using big (2MB by default) chunks aligned on the chunk size. The arena info and the mark bitmap will be stored at the beginning of the chunk, with the arenas coming after that.
Comment 1•15 years ago
If we can figure out how to obtain a 2MB Jumbo page from the OS, that should significantly reduce TLB pressure. I assume most OSes run with PAE enabled these days.
Comment 2•15 years ago
None of the 32-bit desktop Windows OSes (XP, Vista32, Win7 32) run in PAE mode, unless you hack the server version of a kernel module into them and edit your boot config. True story!
Comment 3•15 years ago
I lied. PAE is supported, though the maximum RAM is still restricted to 4GB in those cases, no doubt for very important consumer protection and security reasons!
http://en.wikipedia.org/wiki/Physical_Address_Extension#Microsoft_Windows
Comment 4•15 years ago
NX is only available in PAE mode, so everyone is or should be using it. Not sure whether everyone exposes jumbo pages to user space though, hence comment #1.
Comment 5•15 years ago
Windows exposes large pages to user space, but the memory is non-pageable and I think it requires some privilege elevation, so perhaps not what we want.
We can get the alignment via _aligned_malloc, I believe, on Windows. Their sample code shows 2^16 alignment, but it would be easy to test if we could do better. That wouldn't help with the TLB issue, of course. :-/
Comment 6•15 years ago
Igor, could we get some data comparing TLB stalls to L2 stalls? Should be pretty easy to get with Shark. If L2 stalls dominate, we don't have to worry about the jumbo page business too much, just provide better spatial locality.
Assignee
Comment 7•15 years ago
(In reply to comment #5)
> We can get the alignment via _aligned_malloc, I believe, on Windows. Their
> sample code shows 2^16 alignment, but it would be easy to test if we could do
> better.
On Windows _aligned_malloc is just a wrapper around malloc: it over-allocates (roughly size + alignment extra bytes) and then aligns the result within that block.
For big chunks aligned on their own size it is necessary to do what jemalloc does, that is, repeated mmap/munmap until the proper alignment is obtained; see http://hg.mozilla.org/tracemonkey/file/1fd3836fc778/memory/jemalloc/jemalloc.c#l2365 .
Assignee
Comment 8•15 years ago
(In reply to comment #6)
> Igor, could we get some data comparing TLB stalls to L2 stalls?
On Linux cachegrind does not collect TLB stats AFAICS; it shows only L1 and L2 misses. Still, even with that it shows that for a benchmark like
function test() {
    var N = 4e6;
    var array = [];
    for (var i = 0; i != N; ++i) {
        array.push(i + 0.1);
    }
    array = null;
    gc();
}
test();
there are 2 sources of L1/L2 misses. First, a->info.hasInfoDoubles misses the cache when it checks for free doubles. Second, the arena release loop in DestroyGCArenas incurs almost the same penalty. Apparently JSGCArenaInfo does not stay even in L2 between the double-sweeping loop and the arena release loop.
> If L2 stalls dominate, we don't have to worry
> about the jumbo page business too much, just provide better spacial locality.
Providing better spatial locality improves both TLB and L2 behavior, so it addresses both.
Assignee
Comment 9•15 years ago
The patch just borrows the code from jemalloc to allocate aligned big chunks and switches the GC to use the big chunks. Separating arena info and mark bits from the arena itself is coming next.
Assignee
Comment 10•15 years ago
The patch implements separation of the GC thing from the mark bitmap and JSGCArenaInfo. It seems to pass the tests, but it completely lacks comments on the new layout. With benchmarks that test GC performance on a heap consisting of mostly garbage double values I observed with the patch up to a 30% reduction in GC time.
Attachment #433794 -
Attachment is obsolete: true
Assignee
Comment 11•15 years ago
The new version updates the comments and uses tighter layout for chunk info. I will submit it for review after some testing.
Attachment #434381 -
Attachment is obsolete: true
Assignee
Comment 12•15 years ago
The patch improves SunSpider timing on Linux by 0.8% with an Atom N270 1.6GHz and by 1.1% with a 4-core Xeon X3353 2.6GHz CPU. For V8 the numbers are zero (no change on Atom) and 1.8% (measured as the best score among 10 runs).
Comment 13•15 years ago
nice work!
Assignee
Comment 14•15 years ago
The new version moves the fields related to delay marking from JSGCArenaInfo into another structure that is allocated separately in the chunk. This makes the size of JSGCArenaInfo a power of 2 and avoids multiplication when locating the info from GC things.
Attachment #435322 -
Attachment is obsolete: true
Attachment #435410 -
Flags: review?(brendan)
Comment 15•15 years ago
Comment on attachment 435410 [details] [diff] [review]
v3
Will review tomorrow (Monday 29-March-2010).
/be
Comment 16•15 years ago
Comment on attachment 435410 [details] [diff] [review]
v3
I'm dealing with movers today -- could Andreas and Gregor do this review? My only nit is that the license comment needs to cite The Mozilla Foundation as "Initial Developer", and Andreas under "Contributor(s)".
/be
Updated•15 years ago
Attachment #435410 -
Flags: review?(gal)
Attachment #435410 -
Flags: review?(brendan)
Attachment #435410 -
Flags: review?(anygregor)
Comment 17•15 years ago
Comment on attachment 435410 [details] [diff] [review]
v3
>+static void *pages_map(void *addr, size_t size);
>+static void pages_unmap(void *addr, size_t size);
Nit: Remove extra whitespace please.
>+
>+#ifdef XP_WIN
>+# ifdef JS_GC_USES_MAP_ALIGN
>+
>+#define ALIGN_ADDR2OFFSET(al, ad) \
>+ ((uintptr_t)ad & (al - 1))
>+
>+static void *
>+pages_map_align(size_t size, size_t alignment)
>+{
>+ JS_ASSERT(size % alignment == 0);
>+ JS_ASSERT(size >= alignment);
Nit: These are tabs. Please de-tab-ify the entire file if you don't mind. I think we usually try to stick to spaces (I might be wrong).
>+ void *ret = pages_map(NULL, size);
>+ int offset = ALIGN_ADDR2OFFSET(alignment, ret);
>+ if (offset) {
>+ /* try to over allocate by the ammount we're offset */
>+ void *tmp;
>+ pages_unmap(ret, size);
>+ tmp = VirtualAlloc(NULL, size + alignment - offset,
>+ MEM_RESERVE, PAGE_NOACCESS);
>+ if (offset == ALIGN_ADDR2OFFSET(alignment, tmp))
>+ ret = VirtualAlloc((void*)((intptr_t)tmp + alignment
>+ - offset), size, MEM_COMMIT,
>+ PAGE_READWRITE);
>+ else
>+ VirtualFree(tmp, 0, MEM_RELEASE);
>+ offset = ALIGN_ADDR2OFFSET(alignment, ret);
>+
>+ if (offset) {
>+ /* over allocate to ensure we have an aligned region */
>+ ret = VirtualAlloc(NULL, size + alignment, MEM_RESERVE,
>+ PAGE_NOACCESS);
>+ offset = ALIGN_ADDR2OFFSET(alignment, ret);
>+ ret = VirtualAlloc((void*)((intptr_t)ret +
>+ alignment - offset),
>+ size, MEM_COMMIT, PAGE_READWRITE);
>+ }
>+ }
>+ return (ret);
>+}
>+# endif /* JS_GC_USES_MAP_ALIGN */
>+
>+static void *
>+pages_map(void *addr, size_t size)
>+{
>+ void *ret = NULL;
>+#if defined(WINCE) && !defined(MOZ_MEMORY_WINCE6)
>+ void *va_ret;
>+ JS_ASSERT(addr == NULL);
>+ va_ret = VirtualAlloc(addr, size, MEM_RESERVE, PAGE_NOACCESS);
>+ if (va_ret)
>+ ret = VirtualAlloc(va_ret, size, MEM_COMMIT, PAGE_READWRITE);
>+ JS_ASSERT(va_ret == ret);
>+#else
>+ ret = VirtualAlloc(addr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
>+#endif
>+ return (ret);
>+}
>+
>+static void
>+pages_unmap(void *addr, size_t size)
>+{
>+ if (VirtualFree(addr, 0, MEM_RELEASE) == 0) {
>+#if defined(WINCE) && !defined(MOZ_MEMORY_WINCE6)
>+ if (GetLastError() == ERROR_INVALID_PARAMETER) {
>+ MEMORY_BASIC_INFORMATION info;
>+ VirtualQuery(addr, &info, sizeof(info));
>+ if (VirtualFree(info.AllocationBase, 0, MEM_RELEASE))
>+ return;
>+ }
>+#endif
>+ abort();
Are we sure we want an abort here? The SM-way seems to be to report an error back to the caller and deal with it there.
>+ kern_return_t err = vm_deallocate((vm_map_t)mach_task_self(),
>+ (vm_address_t)addr,
>+ (vm_size_t)size);
>+ if (err != KERN_SUCCESS)
>+ abort();
Ditto. If this is a totally unexpected error, I would just assert on it, or report an error flag back.
>+#if defined(MOZ_MEMORY_WINCE) && !defined(MOZ_MEMORY_WINCE6)
>+const size_t GC_CHUNK_SHIFT = 21;
>+#else
>+const size_t GC_CHUNK_SHIFT = 20;
>+#endif
Why the distinction? (not objecting, just curious).
>+ * All chunks that have at least one free arena are put on the doubly-linked
>+ * list with the head stored in JSRuntime.gcChunkList. JSGCChunkInfo contains
>+ * the head of the chunk's free arena list together with the link fields for
>+ * gcChunkList.
>+ * arena must be divisible by GC_CELL_SIZE, the minima allocation unit, and
minimal.
>+ * The number of arenas in the chunk is given by GC_ARENAS_PER_CHUNK. We find
>+ * that number as the following. Suppose chunk contains n arenas. Together
as follows.
>+ * with the word-aligned free arena bitmap and JSGCChunkInfo they should fit
fit into the
>-JS_STATIC_ASSERT(sizeof(jsbitmap) == sizeof(jsuword));
>+/* See comments before ThingsPerUnmarkedBit below. */
>+struct JSGCMarkingDelay {
>+ JSGCArena *link;
>+ jsuword unmarkedChildren;
>+};
I am going to have to rebase my patch to remove this.
>+/* Can only be applicable to arena where a->list is not null. */
Can only be called if thing is on an arena where a->list is not null.
>+/* Can only be applicable to arena where a->list is not null. */
>+ * with some delayed things. This should be fine as any GC graph
>+ * marking/traversing hooks must allow repeated calls during the same GC cycle. In particular, xpcom cycle collector relies on
Overlong line.
Assignee
Comment 18•15 years ago
(In reply to comment #17)
> Ditto. If this is a totally unexpected error, I would just assert on it, or
> report an error flag back.
The error is unexpected (currently the GC assumes that munmap always succeeds). I will update the patch with asserts.
>
> >+#if defined(MOZ_MEMORY_WINCE) && !defined(MOZ_MEMORY_WINCE6)
> >+const size_t GC_CHUNK_SHIFT = 21;
> >+#else
> >+const size_t GC_CHUNK_SHIFT = 20;
> >+#endif
>
> Why the distinction? (not objecting, just curious).
This is what jemalloc does and comes from bug 478044. I am not sure why it was necessary.
Assignee
Comment 19•15 years ago
Attachment #435410 -
Attachment is obsolete: true
Attachment #435955 -
Flags: review?
Attachment #435410 -
Flags: review?(gal)
Attachment #435410 -
Flags: review?(anygregor)
Assignee
Comment 20•15 years ago
Comment on attachment 435955 [details] [diff] [review]
v4
The new version fixes the above nits.
Attachment #435955 -
Flags: review?(anygregor)
Comment 21•15 years ago
Comment on attachment 435955 [details] [diff] [review]
v4
>+
>+#ifdef XP_WIN
>+# include <windows.h>
>+
>+# ifdef _MSC_VER
>+# pragma warning( disable: 4267 4996 4146 )
>+# endif
>+
>+# if defined(WINCE) && !defined(MOZ_MEMORY_WINCE6)
>+/* Required for Windows CE < 6 */
>+# define JS_GC_USES_MAP_ALIGN 1
>+#endif
nit: indentation?
>+
>+#elif defined(XP_MACOSX) || defined(DARWIN)
>+
>+# include <libkern/OSAtomic.h>
>+# include <mach/mach_error.h>
>+# include <mach/mach_init.h>
>+# include <mach/vm_map.h>
>+# include <malloc/malloc.h>
>+
>+#elif defined(XP_UNIX) || defined(XP_BEOS)
>+
>+# include <unistd.h>
>+# include <sys/mman.h>
>+
>+# ifndef MAP_NOSYNC
>+# define MAP_NOSYNC 0
>+# endif
>+
>+/* Required on Solaris 10. Might improve performance elsewhere. */
>+# if defined(SOLARIS) && defined(MAP_ALIGN)
>+# define JS_GC_USES_MAP_ALIGN 1
>+# endif
>+
>+#endif
>+
>+#ifndef JS_GC_USES_MAP_ALIGN
>+# define JS_GC_USES_MAP_ALIGN 0
>+#endif
>+
>+static void *pages_map(void *addr, size_t size);
>+static void pages_unmap(void *addr, size_t size);
>+
>+#ifdef XP_WIN
>+# ifdef JS_GC_USES_MAP_ALIGN
>+
>+#define ALIGN_ADDR2OFFSET(al, ad) \
>+ ((uintptr_t)ad & (al - 1))
indentation?
>+
>+static void *
>+pages_map_align(size_t size, size_t alignment)
>+{
>+ JS_ASSERT(size % alignment == 0);
>+ JS_ASSERT(size >= alignment);
>+ void *ret = pages_map(NULL, size);
>+ int offset = ALIGN_ADDR2OFFSET(alignment, ret);
>+ if (offset) {
>+ /* try to over allocate by the ammount we're offset */
nit: amount
>+ void *tmp;
>+ pages_unmap(ret, size);
Please add a comment about the difference between the two VirtualAlloc calls; the reserve vs. commit distinction is hard to see.
What is the problem with using _aligned_malloc?
>+ tmp = VirtualAlloc(NULL, size + alignment - offset,
>+ MEM_RESERVE, PAGE_NOACCESS);
>+ if (offset == ALIGN_ADDR2OFFSET(alignment, tmp))
>+ ret = VirtualAlloc((void*)((intptr_t)tmp + alignment
>+ - offset), size, MEM_COMMIT,
>+ PAGE_READWRITE);
>+ else
>+ VirtualFree(tmp, 0, MEM_RELEASE);
>+ offset = ALIGN_ADDR2OFFSET(alignment, ret);
>+
>+ if (offset) {
>+ /* over allocate to ensure we have an aligned region */
>+ ret = VirtualAlloc(NULL, size + alignment, MEM_RESERVE,
>+ PAGE_NOACCESS);
>+ offset = ALIGN_ADDR2OFFSET(alignment, ret);
>+ ret = VirtualAlloc((void*)((intptr_t)ret +
>+ alignment - offset),
>+ size, MEM_COMMIT, PAGE_READWRITE);
>+ }
>+ }
>+ return (ret);
>+}
>+# endif /* JS_GC_USES_MAP_ALIGN */
>+
>+static void *
>+pages_map(void *addr, size_t size)
>+{
>+ void *ret = NULL;
>+#if defined(WINCE) && !defined(MOZ_MEMORY_WINCE6)
>+ void *va_ret;
>+ JS_ASSERT(addr == NULL);
>+ va_ret = VirtualAlloc(addr, size, MEM_RESERVE, PAGE_NOACCESS);
>+ if (va_ret)
>+ ret = VirtualAlloc(va_ret, size, MEM_COMMIT, PAGE_READWRITE);
>+ JS_ASSERT(va_ret == ret);
>+#else
>+ ret = VirtualAlloc(addr, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
>+#endif
>+ return ret;
>+}
>+
>+static void
>+pages_unmap(void *addr, size_t size)
>+{
>+#if (defined(WINCE) && !defined(MOZ_MEMORY_WINCE6))
>+ if (VirtualFree(addr, 0, MEM_RELEASE))
>+ return;
>+
>+ JS_ASSERT(GetLastError() == ERROR_INVALID_PARAMETER);
>+ MEMORY_BASIC_INFORMATION info;
>+ VirtualQuery(addr, &info, sizeof(info));
>+ addr = info.AllocationBase;
>+#endif
>+
>+#ifdef DEBUG
>+ BOOL status =
>+#endif
>+ VirtualFree(addr, 0, MEM_RELEASE);
>+ JS_ASSERT(status);
>+}
>+
>+#elif defined(XP_MACOSX) || defined(DARWIN)
>+
>+static void *
>+pages_map(void *addr, size_t size)
>+{
>+ void *ret;
>+ kern_return_t err;
>+ int flags;
>+
>+ if (addr != NULL) {
>+ ret = addr;
>+ flags = 0;
>+ } else {
>+ flags = VM_FLAGS_ANYWHERE;
>+ }
>+
>+ err = vm_allocate((vm_map_t)mach_task_self(), (vm_address_t *)&ret,
>+ (vm_size_t)size, flags);
>+ if (err != KERN_SUCCESS)
>+ return NULL;
>+
>+ JS_ASSERT((!addr && ret != addr) || (addr && ret == addr));
>+ return ret;
>+}
>+
>+static void
>+pages_unmap(void *addr, size_t size)
>+{
>+#ifdef DEBUG
>+ kern_return_t status =
>+#endif
>+ vm_deallocate((vm_map_t) mach_task_self(),
>+ (vm_address_t) addr,
>+ (vm_size_t) size);
>+ JS_ASSERT(status == KERN_SUCCESS);
>+}
>+
>+#elif defined(XP_UNIX) || defined(XP_BEOS)
>+
>+# if JS_GC_USES_MAP_ALIGN
>+static void *
>+pages_map_align(size_t size, size_t alignment)
>+{
>+ /*
>+ * We don't use MAP_FIXED here, because it can cause the *replacement*
>+ * of existing mappings, and we only want to create new mappings.
>+ */
>+ void *ret = mmap((void *)alignment, size, PROT_READ | PROT_WRITE,
>+ MAP_PRIVATE | MAP_NOSYNC | MAP_ALIGN | MAP_ANON, -1, 0);
>+ if (ret == MAP_FAILED)
>+ return NULL;
>+ return ret;
>+}
>+# endif /* JS_GC_USES_MAP_ALIGN */
>+
>+static void *
>+pages_map(void *addr, size_t size)
>+{
>+ /*
>+ * We don't use MAP_FIXED here, because it can cause the *replacement*
>+ * of existing mappings, and we only want to create new mappings.
>+ */
>+ void *ret = mmap(addr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON,
>+ -1, 0);
>+
>+ if (ret == MAP_FAILED)
>+ return NULL;
>+ if (addr && ret != addr) {
>+ /*
>+ * We succeeded in mapping memory, but not in the right place.
>+ */
>+#ifdef DEBUG
>+ int status =
>+#endif
>+ munmap(ret, size);
>+ JS_ASSERT(status == 0);
>+ return NULL;
>+ }
>+
>+ return ret;
>+}
>+
>+static void
>+pages_unmap(void *addr, size_t size)
>+{
>+#ifdef DEBUG
>+ int ret =
>+#endif
>+ munmap(addr, size);
>+ JS_ASSERT(ret == 0);
>+}
>+
>+#endif
nit: linebreak
>+
>+
>+static void *
>+chunk_alloc_mmap(size_t size)
>+{
>+ JS_ASSERT((size & (size - 1)) == 0);
>+
>+ /*
>+ * Windows requires that there be a 1:1 mapping between VM
nit: "there is" or just "requires a 1:1"
>+ * allocation/deallocation operations. Therefore, take care here to
>+ * acquire the final result via one mapping operation. This means
>+ * unmapping any preliminary result that is not correctly aligned.
>+ */
>+
>+#if JS_GC_USES_MAP_ALIGN
>+ return pages_map_align(size, size);
>+#else
>+ void* ret = pages_map(NULL, size);
>+ if (ret == NULL)
>+ return NULL;
>+
>+ size_t mask = size - 1;
>+ size_t offset = reinterpret_cast<jsuword>(ret) & mask;
>+ if (offset != 0) {
>+ /* Deallocate, then try to allocate at (ret + size - offset). */
>+ pages_unmap(ret, size);
>+ ret = pages_map((void *)((uintptr_t)ret + size - offset), size);
>+ while (!ret) {
>+ /*
>+ * Over-allocate in order to map a memory region that
>+ * is definitely large enough.
>+ */
>+ ret = pages_map(NULL, size + size);
>+ if (!ret)
>+ return NULL;
>+ /*
>+ * Deallocate, then allocate the correct size, within
>+ * the over-sized mapping.
>+ */
>+ pages_unmap(ret, size + size);
>+ offset = reinterpret_cast<jsuword>(ret) & mask;
>+ if (offset == 0) {
>+ ret = pages_map(ret, size);
>+ } else {
>+ jsuword addr = reinterpret_cast<jsuword>(ret) + size - offset;
>+ ret = pages_map(reinterpret_cast<void *>(addr), size);
>+ }
>+ /*
>+ * Failure here indicates a race with another thread, so
>+ * try again.
>+ */
>+ }
>+ }
>+ return ret;
>+#endif
>+}
>+
>+namespace js {
>+
>+jsuword
>+NewGCChunk()
>+{
>+ void *p = chunk_alloc_mmap(GC_CHUNK_SIZE);
>+ if (!p)
>+ return 0;
>+
>+ jsuword chunk = reinterpret_cast<jsuword>(p);
>+ JS_ASSERT(!(chunk & GC_CHUNK_MASK));
>+ return chunk;
>+}
>+
>+void
>+DestroyGCChunk(jsuword chunk)
>+{
>+ JS_ASSERT(chunk);
>+ JS_ASSERT(!(chunk & GC_CHUNK_MASK));
>+
>+ pages_unmap(reinterpret_cast<void *>(chunk), GC_CHUNK_SIZE);
>+}
>+
>+}
>+
>Index: tm/js/src/jsgcchunk.h
>- *
>- * +------------------------------+---------------+-------------+
>- * | allocation area for GC thing | JSGCArenaInfo | mark bitmap |
>- * +------------------------------+---------------+-------------+
>- *
>- * The allocation area contains GC_CELLS_PER_ARENA. We find that number as the
>- * following. Let n be the number of cells in the arena. Together with the
>- * word-aligned mark bitmap and JSGCArenaInfo they should fit the arena. Hence
>- * GC_CELLS_PER_ARENA or n_max is the maximum value of n for which the
>- * following holds:
>+ * GC memory is allocated in chunks. The size of each chunk is GC_CHUNK_SIZE.
>+ * The chunk contains an array of GC arena holding GC things, an array of
nit: arenas
>+ * the mark bitmaps for each arena, an array of JSGCArenaInfo arena
>+ * descriptors, an array of JSGCMarkingDelay descriptors, the JSGCChunkInfo
>+ * chunk descriptor and a bitmap indicating free arenas in the chunk. The
>+ * following picture demonstrates the layout:
>+ *
>+ * +--------+--------------+-------------+----------------+------------+-------------------+
>+ * | arenas | mark bitmaps | arena infos | marking delays | chunk info | free arena bitmap |
>+ * +--------+--------------+-------------+----------------+------------+-------------------+
more spaces in the picture or is this just in the diff?
>+
>+/* Can only be applicable to arena where a->list is not null. */
>+inline uint8 *
>+GCArenaIndexToThing(JSGCArena *a, JSGCArenaInfo *ainfo, size_t index)
>+{
>+ JS_ASSERT(a->getInfo() == ainfo);
>+
>+ /*
>+ * We use "<=" and not "<" in the assert so index can mean the limit.
>+ * For the same reason we use "+", not "|" when finding the thing address
>+ * as the limit address can start at the next arena.
>+ */
>+ JS_ASSERT(index <= ThingsPerArena(ainfo->list->thingSize));
>+ jsuword offsetInArena = index * ainfo->list->thingSize;
>+ return reinterpret_cast<uint8 *>(a->toPageStart() + offsetInArena);
>+}
>+
>+inline JSGCThing *
>+NextThing(JSGCThing *thing, size_t thingSize)
nit: The name of the function is not very precise and the function could be used in the wrong way.
Another assert ! isStatic?
I will continue tomorrow.
Assignee
Comment 22•15 years ago
(In reply to comment #21)
> What is the problem with using _aligned_malloc?
On Windows _aligned_malloc(size, alignment) is implemented via malloc(size + alignment + sizeof(void *)). Thus _aligned_malloc(alignment, alignment) would waste more than twice the memory.
Comment 23•15 years ago
Does it really? It uses a bit of extra virtual memory, but that's basically free. Only the pages we actually use get mapped, which is exactly the aligned part of the allocated block.
Assignee
Comment 24•15 years ago
(In reply to comment #23)
> Does it really?
Well, somebody with MSVC installed should look at \Program Files\Microsoft Visual Studio 8\VC\crt\src\align.c, but my googling shows that this is still the case.
Comment 25•15 years ago
What I meant is "does it really waste memory?". I think malloc(size + alignment + sizeof(void *)) is a reasonable way to implement this from a physical memory consumption perspective.
Assignee
Comment 26•15 years ago
Attachment #435955 -
Attachment is obsolete: true
Attachment #436205 -
Flags: review?
Attachment #435955 -
Flags: review?(anygregor)
Attachment #435955 -
Flags: review?
Assignee
Updated•15 years ago
Attachment #436205 -
Flags: review?(gal)
Attachment #436205 -
Flags: review?(anygregor)
Attachment #436205 -
Flags: review?
Comment 27•15 years ago
(In reply to comment #25)
> What I meant is "does it really waste memory?". I think malloc(size + alignment
> + sizeof(void *)) is a reasonable way to implement this from a physical memory
> consumption perspective.
It depends on the values of size and alignment. What are they here?
/be
Comment 28•15 years ago
Of course only measurements will tell, but any sane malloc will not touch the mapped pages needlessly, so I would say the answer is "nothing", and it will become 0 the moment the process first touches the page (which, for the alignment-overflow part, will be "never").
Assignee
Comment 29•15 years ago
(In reply to comment #27)
> (In reply to comment #25)
> > What I meant is "does it really waste memory?". I think malloc(size + alignment
> > + sizeof(void *)) is a reasonable way to implement this from a physical memory
> > consumption perspective.
>
> It depends on the values of size and alignment. What are they here?
size == alignment == 1MB, the same values that jemalloc uses.
Comment 30•15 years ago
Comment on attachment 436205 [details] [diff] [review]
v5
>- JSGCArenaInfo info;
>- jsbitmap markBitmap[GC_ARENA_MARK_BITMAP_WORDS];
>+ uint8 data[GC_ARENA_SIZE];
I was always wondering what the indentation rules for local vars are.
Should data be aligned to the old markBitmap?
>
> inline void
>-SetChunkInfoIndex(jsuword chunk, unsigned index)
>+CheckValidThingPtr(void *thing)
Maybe rename this function to CheckValidGCThingPtr since static strings are valid "things"?
>
>+inline size_t
>+ThingsPerArena(size_t thingSize)
>+{
>+ JS_ASSERT(!(thingSize & GC_CELL_MASK));
>+ JS_ASSERT(thingSize <= GC_ARENA_SIZE);
>+ return GC_ARENA_SIZE / thingSize;
>+}
>+
>+/* Can only be called if thing is on an arena where a->list is not null. */
nit "is in an arena"
>
> while ((a = arenaList->cursor) != NULL) {
>- arenaList->cursor = a->info.prev;
>- JSGCThing *freeList = a->info.freeList;
>+ JSGCArenaInfo * ainfo = a->getInfo();
nit: remove Whitespace
>- while (a->info.unmarkedChildren != 0) {
>- unmarkedBitIndex = JS_FLOOR_LOG2W(a->info.unmarkedChildren);
>- a->info.unmarkedChildren &= ~((jsuword)1 << unmarkedBitIndex);
>+ while (markingDelay->unmarkedChildren != 0) {
>+ unmarkedBitIndex = JS_FLOOR_LOG2W(markingDelay->unmarkedChildren);
>+ markingDelay->unmarkedChildren &= ~((jsuword)1 << unmarkedBitIndex);
shouldn't it be jsuword(1) ?
Attachment #436205 -
Flags: review?(anygregor) → review+
Comment 31•15 years ago
(In reply to comment #30)
> (From update of attachment 436205 [details] [diff] [review])
>
> >- JSGCArenaInfo info;
> >- jsbitmap markBitmap[GC_ARENA_MARK_BITMAP_WORDS];
> >+ uint8 data[GC_ARENA_SIZE];
>
> I was always wondering what the indentation rules are for local vars are.
> Should data be aligned to the old markBitmap?
Not sure, but C++ allows us to move declarations to just before needed, which means fewer piled up in a file (possibly with same-typed ones along ranks) at the top of the function -- which means less desire to line up declarators.
Does that help here?
/be
Assignee
Comment 32•15 years ago
(In reply to comment #30)
> (From update of attachment 436205 [details] [diff] [review])
>
> >- JSGCArenaInfo info;
> >- jsbitmap markBitmap[GC_ARENA_MARK_BITMAP_WORDS];
> >+ uint8 data[GC_ARENA_SIZE];
>
> I was always wondering what the indentation rules are for local vars are.
> Should data be aligned to the old markBitmap?
The rule is to align all field names on a tabstop. But it is not applicable here as the patch makes data a single field in JSGCArena.
Assignee
Comment 33•15 years ago
The new version fixes the latest nits.
Attachment #436205 -
Attachment is obsolete: true
Attachment #436542 -
Flags: review?
Attachment #436205 -
Flags: review?(gal)
Assignee
Updated•15 years ago
Attachment #436542 -
Flags: review? → review?(gal)
Assignee
Comment 34•15 years ago
gal: review ping
Comment 35•15 years ago
This is probably a terrible idea, for other reasons, but for what it's worth:
If one has separated the mark bits from the objects, then that means that only the allocator and GC touch them. Thus, once the roots have been placed in a queue somewhere, the rest of the traversal can easily be done in a separate thread, if we "allocate black": new objects are marked as "visited" by the ongoing GC phase.
Comment 36•15 years ago
... oh, right. The algorithm I had in mind also requires a write barrier, which we don't have.
Comment 37•15 years ago
(In reply to comment #36)
> ... oh, right. The algorithm I had in mind also requires a write barrier,
> which we don't have.
We do, of course -- at a slightly higher level. JS requires read and write barriers for getters and setters. We have a write barrier for branding and a read barrier for lazy method cloning as well.
/be
Comment 38•15 years ago
Comment on attachment 436542 [details] [diff] [review]
v6
>+ jsuword getChunk() { return toPageStart() & ~GC_CHUNK_MASK; }
Nit: No same-line function implementation.
>+ void addToList(JSRuntime *rt) {
>+ prevp = &rt->gcChunkList;
>+ next = rt->gcChunkList;
>+ if (rt->gcChunkList) {
>+ JS_ASSERT(rt->gcChunkList->prevp == &rt->gcChunkList);
>+ rt->gcChunkList->prevp = &next;
>+ }
>+ rt->gcChunkList = this;
Do we need our own list implementation here or could we use js::Vector?
>- if (a->info.unmarkedChildren != 0) {
>+ if (markingDelay->unmarkedChildren != 0) {
> JS_ASSERT(rt->gcUnmarkedArenaStackTop);
>- if (a->info.unmarkedChildren & bit) {
>+ if (markingDelay->unmarkedChildren & bit) {
> /* bit already covers things with children to mark later. */
> return;
> }
>- a->info.unmarkedChildren |= bit;
>+ markingDelay->unmarkedChildren |= bit;
I still really dislike this code. Another patch to remove it I guess.
Looks good. Sorry for the slow review.
r=me if you can prove to me that on windows your allocation code is faster than the builtin aligned malloc that over-allocates, or that it uses significantly less _physical_ memory.
Attachment #436542 -
Flags: review?(gal) → review+
Comment 39•15 years ago
One-line inline method definition for very short functions (like one-line macros) is fine in C++ house style.
/be
Comment 40•15 years ago
(In reply to comment #37)
> (In reply to comment #36)
> > ... oh, right. The algorithm I had in mind also requires a write barrier,
> > which we don't have.
>
> We do, of course -- at a slightly higher level. JS requires read and write
> barriers for getters and setters. We have a write barrier for branding and a
> read barrier for lazy method cloning as well.
What parallel GC would need is a barrier protecting every reference that can keep an object alive, not just object slots. We have huge amounts of miscellanea.
Comment 42•15 years ago
(In reply to comment #40)
> (In reply to comment #37)
> > (In reply to comment #36)
> > > ... oh, right. The algorithm I had in mind also requires a write barrier,
> > > which we don't have.
> >
> > We do, of course -- at a slightly higher level. JS requires read and write
> > barriers for getters and setters. We have a write barrier for branding and a
> > read barrier for lazy method cloning as well.
>
> What parallel GC would need is a barrier protecting every reference that can
> keep an object alive, not just object slots. We have huge amounts of
> miscellanea.
Yes, indeed. I didn't mean to make it sound like we were "done" -- all the "native" writes are the problem.
/be
Assignee
Comment 43•15 years ago
(In reply to comment #38)
> >+ void addToList(JSRuntime *rt) {
>
> Do we need our own list implementation here or could we use js::Vector?
A data structure that allows fast insertion and deletion is necessary here. Vector does not fit that pattern, so the patch continues to use a custom doubly-linked list implementation.
> r=me if you can prove to me that on windows your allocation code is faster than
> the builtin aligned malloc that over-allocates, or that it uses significantly
> less _physical_ memory.
The chunk allocation code comes directly from jemalloc. If it can be done better, we need a common bug to improve it there as well. Also consider that with jemalloc and the GC allocating the same 1MB chunks, the initial attempt to map 1MB will most likely return a chunk that is already aligned on a sane OS.
Assignee
Comment 44•15 years ago
(In reply to comment #43)
> (In reply to comment #38)
> > >+ void addToList(JSRuntime *rt) {
> >
> > Do we need our own list implementation here or could we use js::Vector?
>
> A data structure that allows fast insertion and deletion is necessary here.
This is not true. We need a structure that allows fast insertion and *bulk* release. So a vector can do the job and would result in simpler code. I will do it in another bug.
Assignee
Comment 45•15 years ago
The new version fixes the nits and adds chunk stats to the GC stats.
Attachment #436542 -
Attachment is obsolete: true
Attachment #438490 -
Flags: review+
Comment 46•15 years ago
looks like there's a crash or assertion in xpcshell testing:
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1271097483.1271099064.14971.gz&fulltext=1
find stacks there.
Comment 47•15 years ago
0 ld-2.5.so + 0x7f2
eip = 0x0097f7f2 esp = 0xbfb8c23c ebp = 0xbfb8c248 ebx = 0x00002171
esi = 0xffffffff edi = 0x00b22ff4 eax = 0x00000000 ecx = 0x00002171
edx = 0x00000006 efl = 0x00000206
Found by: given as instruction pointer in context
1 libmozjs.so!JS_Assert [jsutil.cpp:7c910d181330 : 78 + 0xb]
eip = 0x0035314b esp = 0xbfb8c250 ebp = 0xbfb8c268
Found by: previous frame's frame pointer
2 libmozjs.so!js::GCArenaReleaser::release [jsgc.cpp:7c910d181330 : 754 + 0x50]
eip = 0x0029063b esp = 0xbfb8c270 ebp = 0xbfb8c298 ebx = 0x0043bb74
Found by: call frame info
3 libmozjs.so!js::GCArenaReleaser::releaseList [jsgc.cpp:7c910d181330 : 771 + 0x18]
eip = 0x00290707 esp = 0xbfb8c2a0 ebp = 0xbfb8c2b8 ebx = 0x0043bb74
esi = 0x09ad7ac0 edi = 0x00000014
Found by: call frame info
4 libmozjs.so!FinishGCArenaLists [jsgc.cpp:7c910d181330 : 879 + 0x1a]
eip = 0x00290025 esp = 0xbfb8c2c0 ebp = 0xbfb8c2e8 ebx = 0x0043bb74
esi = 0x09ad7ac0 edi = 0x00000014
Found by: call frame info
5 libmozjs.so!js_FinishGC [jsgc.cpp:7c910d181330 : 1198 + 0xa]
eip = 0x002900d8 esp = 0xbfb8c2f0 ebp = 0xbfb8c2f8 ebx = 0x0043bb74
esi = 0x09ad7ac0 edi = 0x00000014
Found by: call frame info
6 libmozjs.so!JSRuntime::~JSRuntime [jsapi.cpp:7c910d181330 : 630 + 0xa]
Assignee
Comment 48•15 years ago
My attempt to land this revealed a bug where, in debug builds only, the arena release method would change the arena link field before the arena list enumerator read the list. The new patch fixes this.
Attachment #438490 -
Attachment is obsolete: true
Attachment #438560 -
Flags: review+
Assignee
Comment 49•15 years ago
Whiteboard: fixed-in-tracemonkey
Comment 50•15 years ago
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED