Closed Bug 627200 Opened 13 years ago Closed 13 years ago

Background Finalization for Strings and Objects

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: gwagner, Assigned: gwagner)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(7 files, 4 obsolete files)

In order to make the GC pause time shorter we could finalize objects and strings outside the main GC cycle.
I want to start with strings and short strings.
Assignee: general → anygregor
Attached patch WiP (obsolete) — Splinter Review
Thats a first try of the implementation for finalizing JSStrings and ShortStrings on the background thread. It's WiP and not ready yet. There are still dirty hacks in there...

One problem that comes with background finalization is that after a GC we don't know if the GC successfully freed memory.
We only get this information after we finalized all arenas.
Another problem is that the next allocation shouldn't have to wait until finalization has finished. We would only move the pause time to the first allocation.
So I tried to allocate new arenas if we are still finalizing an arenalist and not take the currently finalized arenas for allocation.
I had to add a lock per chunk in order to insert into and remove from the chunklist that holds free arenas per chunk.

The results look very promising. The v8-splay benchmark shows some string finalization time and this is completely gone with this patch.
Sunspider in a threadsafe-enabled shell doesn't show any slowdown which shows that the added overhead doesn't have a major performance impact during the allocation path.
Another problem comes from the fact that we don't know the WSS right after the GC. We assume that all arenas are full and calculate the next GC trigger with it.

We have to update the GC trigger during or after the finalization process.
Currently the GC pause time is around 50 mill cycles and about 15 mill cycles are spent in the string finalization.
With this patch the average GC pause time reduces to around 30 mill cycles. The finalization of the strings is now completely in the background.
We should really transition away from the stupid triggerGC stuff and measure working sets.
Summary: Lazy Arena Finalization for Strings → Background Finalization for Strings
This patch finalizes strings and certain objects (mostly where clasp == NULL) on the background thread. Again there are dirty hacks in there and I disabled the  sweeping of shapes for now because sweeping shapes before objects is not a good idea...
An average GC pause for the FOTN reduces now from around 350 mill instructions to 230 mill instructions.
(In reply to comment #6)
> Created attachment 508060 [details] [diff] [review]
> Proof of concept: finalize obj + str on background thread
> 
> This patch finalizes strings and certain objects (mostly where clasp == NULL)

ups I mean clasp->finalize == null
Patrick could you try the patch from Comment 6 with your frame tool and tell me if you see any improvement for the FOTN?
Attached patch patch (obsolete) — Splinter Review
This patch moves JSString and ShortString finalization to the gcHelperThread.
The main idea is that the whole arenaList is locked until we finalize it. This means no allocation until the whole arenalist is finalized. If a new string should be allocated, we have to allocate a new arena. The trick is now that we add the new arena at the beginning of the arenaList and we don't delete the head of the old arenaList even if it's empty. So we don't have to wait during allocation until the whole arenalist gets finalized. Otherwise we would move the pause to the first allocation.

Another problem is that we don't know how many arenas actually are empty at the end of a GC. This is a problem since we can no longer calculate an exact gcTrigger. We have to adjust the gcTrigger during background finalization.
Attachment #521969 - Flags: feedback?(gal)
Summary: Background Finalization for Strings → Background Finalization for Strings and Objects
Attached patch patch (obsolete) — Splinter Review
GC Pause time reduces from 200ms to 100ms for the realtime raytracer and 30% for FOTN.
Attachment #506035 - Attachment is obsolete: true
Attachment #508060 - Attachment is obsolete: true
Attachment #521969 - Attachment is obsolete: true
Attachment #521969 - Flags: feedback?(gal)
Attached image V8 Benchmark Trunk
Current situation for the V8 Benchmark. Almost all Time in the GC is spent in finalization.
Most of the finalization cost is gone. The average GC pause time reduces from 20 ms to 5-10ms. Even for the splay benchmark we reduce the pause time from 140ms to 60ms.
blizzard: you wanted nice marketable features, here is one for FF6 (see comment 13 and comment 14)
Attached patch patchSplinter Review
This patch changes the GC behavior significantly. 50% pause time reduction for the realtime raytracer, 30% for FOTN. More results will follow. I left the GC heuristics parameter unchanged but we might need to tweak them a little bit.

I can see a possible regression with this patch if we call JS_GC in a loop because we have to wait for the background thread to be done in order to start a new GC.

I introduced a new lock per chunk since the background thread can release arenas to the chunk and the normal JS thread can allocate arenas from the same chunk at the same time.
Every ArenaList has now a status variable that indicates if the arenalist was finalized. For the allocation path this means now that we allocate new arenas and insert them at the beginning of the list until we are done finalizing the current arenalist.

I am open for naming suggestions (finalize types, function names...)
We might want to clean up the gcHelperThread code with followups.
Attachment #525226 - Attachment is obsolete: true
Attachment #525723 - Flags: review?(gal)
Comment on attachment 525723 [details] [diff] [review]
patch

> #define JSCLASS_HAS_PRIVATE             (1<<0)  /* objects have private slot */
> #define JSCLASS_NEW_ENUMERATE           (1<<1)  /* has JSNewEnumerateOp hook */
> #define JSCLASS_NEW_RESOLVE             (1<<2)  /* has JSNewResolveOp hook */
> #define JSCLASS_PRIVATE_IS_NSISUPPORTS  (1<<3)  /* private is (nsISupports *) */
>+#define JSCLASS_HAS_INTERNAL_FINALIZER  (1<<4)  /* finalize is called on background thread */

JSCLASS_CONCURRENT_FINALIZER ?

>     void setGCTriggerFactor(uint32 factor);
>     void setGCLastBytes(size_t lastBytes);
>+    void decreaseGCTriggerBytes(uint32 amount);

reduce? decrement?

>-    size_t                       gcBytes;
>-    size_t                       gcTriggerBytes;
>+    uint32                       gcBytes;
>+    uint32                       gcTriggerBytes;

Shouldn't we use size_t everywhere instead?

>     if (comp->arenas[FINALIZE_OBJECT0].arenasContainThing<JSObject>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT0_BACKGROUND].arenasContainThing<JSObject>(thing) ||
>         comp->arenas[FINALIZE_OBJECT2].arenasContainThing<JSObject_Slots2>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT2_BACKGROUND].arenasContainThing<JSObject_Slots2>(thing) ||
>         comp->arenas[FINALIZE_OBJECT4].arenasContainThing<JSObject_Slots4>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT4_BACKGROUND].arenasContainThing<JSObject_Slots4>(thing) ||
>         comp->arenas[FINALIZE_OBJECT8].arenasContainThing<JSObject_Slots8>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT8_BACKGROUND].arenasContainThing<JSObject_Slots8>(thing) ||
>         comp->arenas[FINALIZE_OBJECT12].arenasContainThing<JSObject_Slots12>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT12_BACKGROUND].arenasContainThing<JSObject_Slots12>(thing) ||
>         comp->arenas[FINALIZE_OBJECT16].arenasContainThing<JSObject_Slots16>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT16_BACKGROUND].arenasContainThing<JSObject_Slots16>(thing) ||
>         comp->arenas[FINALIZE_FUNCTION].arenasContainThing<JSFunction>(thing) ||
>         comp->arenas[FINALIZE_SHAPE].arenasContainThing<Shape>(thing) ||
> #if JS_HAS_XML_SUPPORT
>         comp->arenas[FINALIZE_XML].arenasContainThing<JSXML>(thing) ||
> #endif
>         comp->arenas[FINALIZE_STRING].arenasContainThing<JSString>(thing) ||
>         comp->arenas[FINALIZE_EXTERNAL_STRING].arenasContainThing<JSExternalString>(thing) ||
>         comp->arenas[FINALIZE_SHORT_STRING].arenasContainThing<JSShortString>(thing))

Use a loop?

>+    JS_ATOMIC_ADD(&rt->gcBytes, sizeof(Arena<T>));
>+    JS_ATOMIC_ADD(&comp->gcBytes, sizeof(Arena<T>));

Oh right this is why you want uint32. Lame.

> Class js_IteratorClass = {
>     "Iterator",
>-    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_CACHED_PROTO(JSProto_Iterator),
>+    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_INTERNAL_FINALIZER | 
>+        JSCLASS_HAS_CACHED_PROTO(JSProto_Iterator),

indent?

>+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+        kind = (gc::FinalizeKind)(kind + 1);
>+

For the love of god make this nicer and comment it.

>+
>+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+        kind = (gc::FinalizeKind)(kind + 1);
>+

Dito.

>+    if (!isFunction) {
>+        if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+            kind = (gc::FinalizeKind)(kind + 1);
>+    }

Dito.

> Class ArrayBuffer::jsclass = {
>     "ArrayBuffer",
>-    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_CACHED_PROTO(JSProto_ArrayBuffer),
>+    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_INTERNAL_FINALIZER |
>+        JSCLASS_HAS_CACHED_PROTO(JSProto_ArrayBuffer),

Indent.
Attachment #525723 - Flags: review?(gal) → review+
(In reply to comment #17)
> Comment on attachment 525723 [details] [diff] [review]
> patch
> 
> > #define JSCLASS_HAS_PRIVATE             (1<<0)  /* objects have private slot */
> > #define JSCLASS_NEW_ENUMERATE           (1<<1)  /* has JSNewEnumerateOp hook */
> > #define JSCLASS_NEW_RESOLVE             (1<<2)  /* has JSNewResolveOp hook */
> > #define JSCLASS_PRIVATE_IS_NSISUPPORTS  (1<<3)  /* private is (nsISupports *) */
> >+#define JSCLASS_HAS_INTERNAL_FINALIZER  (1<<4)  /* finalize is called on background thread */
> 
> JSCLASS_CONCURRENT_FINALIZER ?

done. also the other name changes.

> 
> >     if (comp->arenas[FINALIZE_OBJECT0].arenasContainThing<JSObject>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT0_BACKGROUND].arenasContainThing<JSObject>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT2].arenasContainThing<JSObject_Slots2>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT2_BACKGROUND].arenasContainThing<JSObject_Slots2>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT4].arenasContainThing<JSObject_Slots4>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT4_BACKGROUND].arenasContainThing<JSObject_Slots4>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT8].arenasContainThing<JSObject_Slots8>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT8_BACKGROUND].arenasContainThing<JSObject_Slots8>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT12].arenasContainThing<JSObject_Slots12>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT12_BACKGROUND].arenasContainThing<JSObject_Slots12>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT16].arenasContainThing<JSObject_Slots16>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT16_BACKGROUND].arenasContainThing<JSObject_Slots16>(thing) ||
> >         comp->arenas[FINALIZE_FUNCTION].arenasContainThing<JSFunction>(thing) ||
> >         comp->arenas[FINALIZE_SHAPE].arenasContainThing<Shape>(thing) ||
> > #if JS_HAS_XML_SUPPORT
> >         comp->arenas[FINALIZE_XML].arenasContainThing<JSXML>(thing) ||
> > #endif
> >         comp->arenas[FINALIZE_STRING].arenasContainThing<JSString>(thing) ||
> >         comp->arenas[FINALIZE_EXTERNAL_STRING].arenasContainThing<JSExternalString>(thing) ||
> >         comp->arenas[FINALIZE_SHORT_STRING].arenasContainThing<JSShortString>(thing))
> 
> Use a loop?

Types like <JSObject> are the problem...


> 
> >+
> >+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
> >+        kind = (gc::FinalizeKind)(kind + 1);
> >+

With the new name we are below 100.
http://hg.mozilla.org/tracemonkey/rev/79bb6e40bc61
Whiteboard: fixed-in-tracemonkey
Great work!

/be
Gregor: what ensures that cx that is used for background finalization stays alive?
(In reply to comment #21)
> Gregor: what ensures that cx that is used for background finalization stays
> alive?

I wait in destroyContext until the background thread is done.
V8 benchmark numbers with this patch:
Score: 4885
Richards: 8059
DeltaBlue: 4861
Crypto: 8501
RayTrace: 4168
EarleyBoyer: 4802
RegExp: 1702
Splay: 5851

without:
Score: 4682
Richards: 7498
DeltaBlue: 4899
Crypto: 8523
RayTrace: 3819
EarleyBoyer: 4632
RegExp: 1686
Splay: 5281

I can see improvements all over the place but especially a great win on splay. The current chrome score for Splay on my machine is: 2909!
(We should generally expect that a non-copying GC beats a copying GC, mutatis mutandis, since it's designed to have a large live set and small garbage set during collection.)

Great numbers, though!
(In reply to comment #24)
> (We should generally expect that a non-copying GC beats a copying GC, mutatis
> mutandis, since it's designed to have a large live set and small garbage set
> during collection.)

Isn't the conventional wisdom that v8bench benefits hugely from a generational GC?
Yes, shorter scavenges help where they can, though I think intentionally not in splay, IIRC from a conversation with the V8 guys, because the random linkage causes a lot of cross-generational references.  Splay was written as a hard-case tuning target for their GC.
Depends on: 651193
Depends on: 652416
http://hg.mozilla.org/mozilla-central/rev/79bb6e40bc61
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 654028
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: