Closed Bug 550413 Opened 14 years ago Closed 14 years ago

TM: simplify delayed marking algorithm

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

RESOLVED DUPLICATE of bug 558861

People

(Reporter: gal, Assigned: gwagner)

References

Details

Attachments

(1 file, 3 obsolete files)

To deal with deeply nested data structures during GC we use a bitmask and a stack pointer. The attached patch reduces this to a stack pointer. We trade performance in this extremely rare case for less code and a few free bits in the arena header (for which we have other needs).
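As a rough illustration of what the simplification amounts to, here is a minimal C++ sketch (not the actual jsgc.cpp code; Arena, GCThing, TraceChildren and delayedArenas are invented stand-ins): each arena carries a single link onto a list of arenas with delayed things, and draining that list rescans every marked thing in those arenas, so things whose children were already traced may be visited again.

// Minimal sketch of the simplified delayed-marking scheme; all names are
// hypothetical stand-ins, not the real SpiderMonkey ones.
struct GCThing { GCThing *child = nullptr; };   // a thing with one traceable edge

// Stand-in tracer; the real GC marks the children and may recurse or delay.
void TraceChildren(GCThing *thing) { (void) thing->child; }

struct Arena {
    static const unsigned ThingsPerArena = 128;
    Arena    *nextDelayed = nullptr;            // single link, replacing the bitmask
    bool      onDelayedList = false;
    bool      marked[ThingsPerArena] = {};      // stand-in for the mark bitmap
    GCThing  *things[ThingsPerArena] = {};
};

static Arena *delayedArenas = nullptr;

// Called instead of recursing into a thing's children when too little C
// stack is left: remember only the arena, not which thing was delayed.
void DelayMarkingChildren(Arena *arena) {
    if (!arena->onDelayedList) {
        arena->onDelayedList = true;
        arena->nextDelayed = delayedArenas;
        delayedArenas = arena;
    }
}

// Drain the delayed list by rescanning every marked thing in each queued
// arena; the redundant rescans are the performance cost measured below.
void MarkDelayedChildren() {
    while (Arena *arena = delayedArenas) {
        delayedArenas = arena->nextDelayed;
        arena->onDelayedList = false;
        for (unsigned i = 0; i < Arena::ThingsPerArena; ++i) {
            if (arena->marked[i] && arena->things[i])
                TraceChildren(arena->things[i]);
        }
    }
}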
Attached patch patch (obsolete) — Splinter Review
Assignee: general → gal
Attached patch patch (obsolete) — Splinter Review
Attachment #430536 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #430538 - Attachment is obsolete: true
Performance measurements:

- Without the patch, regular stack budget: 45ms
- With the patch, regular stack budget: 45ms (expected, we don't touch the regular code path)

- Without the patch, always mark delayed: 368ms
- With the patch, always mark delayed: 1158ms

The patch is thus a 3x slowdown in the following case:

- We are completely out of stack memory (less than 20k of C stack available; with more than 20k of C stack the performance difference is negligible).
- The user has created a deeply nested data structure whose elements are laid out in the heap in exactly the opposite order (backwards) to the order in which the marking algorithm walks them (forwards).
- The entire heap is full of these objects.

This is the test case I used:

var a = [];

// Allocate 512K empty objects.
for (n = 0; n < 512 * 1024; ++n)
    a.push({});

var root = {};
var ptr = root;
var x;

// Pop the objects (reverse of allocation order) and chain them into one
// deeply nested linked list hanging off root.
while ((x = a.pop()) != undefined) {
    ptr.next = x;
    ptr = x;
}

var t = Date.now();
gc();
print(Date.now() - t);
Attachment #430540 - Flags: review?(igor)
Attached patch patch — Splinter Review
Make sure JSGCArenaInfo ends at an 8-byte boundary.
Attachment #430540 - Attachment is obsolete: true
Attachment #430544 - Flags: review?(igor)
Attachment #430540 - Flags: review?(igor)
(In reply to comment #0)
> To deal with deeply nested data structures during GC we use a bitmask and a
> stack pointer. The attached patch reduces this to a stack pointer. We trade
> performance in this extremely rare case for less code and a few free bits in
> the arena header (for which we have other needs).

For the record: currently the marking bitmap contains 3 unused bits per object. Even if the cycle collector were to take 2 (in the current proposal it would take only one, due to clever sharing with the GC marking), that still leaves one unused bit. In principle that bit could be used to denote the delayed objects. It would allow scanning for children only those objects that have not been scanned before, which is better than scanning all marked objects in the arena as the patch proposes.

But that means complex code again, and we could have better uses for those unused bits, such as an efficient implementation of ephemeron tables.
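For contrast, a compact C++ sketch of what using that spare bit could look like (again with invented names rather than the real mark-bitmap code): the drain loop then touches only things whose children were actually delayed.

// Sketch of the alternative: one otherwise-unused bit per thing records
// "children not yet scanned", so draining skips already-scanned things.
struct GCThing { GCThing *child = nullptr; };

// Stand-in tracer, as in the earlier sketch.
void TraceChildren(GCThing *thing) { (void) thing->child; }

struct Arena {
    static const unsigned ThingsPerArena = 128;
    bool      marked[ThingsPerArena]  = {};     // existing mark bit
    bool      delayed[ThingsPerArena] = {};     // the spare per-thing bit
    GCThing  *things[ThingsPerArena]  = {};
};

// Record exactly which thing had its children delayed, not just its arena.
void DelayMarkingChildren(Arena *arena, unsigned index) {
    arena->delayed[index] = true;
}

// Rescan only the delayed things instead of every marked thing in the arena.
void ScanDelayedThings(Arena *arena) {
    for (unsigned i = 0; i < Arena::ThingsPerArena; ++i) {
        if (arena->delayed[i] && arena->things[i]) {
            arena->delayed[i] = false;
            TraceChildren(arena->things[i]);
        }
    }
}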
Comment on attachment 430544 [details] [diff] [review]
patch

>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
>--- a/js/src/jsgc.cpp
>+++ b/js/src/jsgc.cpp
>@@ -258,48 +258,43 @@ struct JSGCArenaInfo {
>      * Pointer to the previous arena in a linked list. The arena can either
>      * belong to one of JSContext.gcArenaList lists or, when it does not have
>      * any allocated GC things, to the list of free arenas in the chunk with
>      * head stored in JSGCChunkInfo.lastFreeArena.
>      */
>     JSGCArena       *prev;
> 
>     /*
>-     * A link field for the list of arenas with marked things that haven't yet
>-     * been scanned for live children. The field is encoded as arena's page to
>-     * to hold only the high-order arena-counting bits to share the space with
>-     * firstArena and arenaIndex fields. For details see comments before
>-     * DelayMarkingChildren.
>-     */
>-    jsuword         prevUnmarkedPage :  JS_BITS_PER_WORD - GC_ARENA_SHIFT;
>-
>-    /*
>      * When firstArena is false, the index of arena in the chunk. When
>      * firstArena is true, the index of a free arena holding JSGCChunkInfo or
>      * NO_FREE_ARENAS if there are no free arenas in the chunk.
>      *
>      * GetArenaIndex() and GetChunkInfoIndex() below are convenience functions
>      * to access either of indexes.
>      */
>     jsuword         arenaIndex :        GC_ARENA_SHIFT - 1;
> 
>     /* Flag indicating if the arena is the first in the chunk. */
>     jsuword         firstArena :        1;
> 
>     JSGCThing       *freeList;
> 
>+    jsuword          dummy;

I do not see why this is necessary. The mark bitmap takes the last 64 bytes in the arena, so JSGCArenaInfo already ends on that boundary. What exactly is the dummy supposed to align?
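A tiny sketch of the arithmetic behind this, using the sizes mentioned in this bug (4K arenas, a 64-byte mark bitmap at the arena's end); the constant names are invented for the illustration:

// If the mark bitmap occupies the last 64 bytes of a 4K arena, whatever sits
// directly below it ends at offset 4096 - 64 = 4032, which is already a
// multiple of 8, so a dummy padding word buys nothing.
static const unsigned ArenaSize      = 4096;
static const unsigned MarkBitmapSize = 64;
static const unsigned ArenaInfoEnd   = ArenaSize - MarkBitmapSize;   // 4032
static_assert(ArenaInfoEnd % 8 == 0, "arena info already ends 8-byte aligned");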

>  *
>- * Note that such repeated scanning may slow down the GC. In particular, it is
>- * possible to construct an object graph where the GC calls JS_TraceChildren
>- * ThingsPerUnmarkedBit(thingSize) for almost all things in the graph. We
>- * tolerate this as the max value for ThingsPerUnmarkedBit(thingSize) is 4.
>- * This is archived for JSObject on 32 bit system as it is exactly JSObject
>- * that has the smallest size among the GC things that can be delayed. On 32
>- * bit CPU we have less than 128 objects per 4K GC arena so each bit in
>- * unmarkedChildren covers 4 objects.
>  */

Keep the comment about the repeated marking of already-marked things and the potential performance problem with that, including the estimate of the worst-case scenario.

r+ with this addressed.
Attachment #430544 - Flags: review?(igor)
Blocks: 550373
Assignee: gal → anygregor
Incorporated in the new GC
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE