Last Comment Bug 627200 - Background Finalization for Strings and Objects
: Background Finalization for Strings and Objects
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal with 2 votes (vote)
: ---
Assigned To: Gregor Wagner [:gwagner]
:
Mentors:
: 584688 (view as bug list)
Depends on: 651193 652416 654028
Blocks:
  Show dependency treegraph
 
Reported: 2011-01-19 14:32 PST by Gregor Wagner [:gwagner]
Modified: 2011-06-15 08:51 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WiP (46.44 KB, patch)
2011-01-21 17:41 PST, Gregor Wagner [:gwagner]
no flags Details | Diff | Splinter Review
spinning world map current trunk (13.13 KB, image/png)
2011-01-21 21:20 PST, Gregor Wagner [:gwagner]
no flags Details
spinning world map benchmark with this patch (15.51 KB, image/png)
2011-01-21 21:22 PST, Gregor Wagner [:gwagner]
no flags Details
Proof of concept: finalize obj + str on background thread (48.85 KB, patch)
2011-01-28 17:54 PST, Gregor Wagner [:gwagner]
no flags Details | Diff | Splinter Review
GC events for FOTN: trunk (8.92 KB, image/png)
2011-01-28 17:56 PST, Gregor Wagner [:gwagner]
no flags Details
GC events for FOTN: with background finalization of objects and strings (8.58 KB, image/png)
2011-01-28 17:57 PST, Gregor Wagner [:gwagner]
no flags Details
patch (44.17 KB, patch)
2011-03-25 15:07 PDT, Gregor Wagner [:gwagner]
no flags Details | Diff | Splinter Review
patch (67.27 KB, patch)
2011-04-11 17:19 PDT, Gregor Wagner [:gwagner]
no flags Details | Diff | Splinter Review
V8 Benchmark Trunk (6.92 KB, image/png)
2011-04-12 10:52 PDT, Gregor Wagner [:gwagner]
no flags Details
V8 Benchmark with background finalization (8.39 KB, image/png)
2011-04-12 10:54 PDT, Gregor Wagner [:gwagner]
no flags Details
patch (73.31 KB, patch)
2011-04-13 10:49 PDT, Gregor Wagner [:gwagner]
gal: review+
Details | Diff | Splinter Review

Description Gregor Wagner [:gwagner] 2011-01-19 14:32:03 PST
In order to make the GC pause time shorter we could finalize objects and strings outside the main GC cycle.
I want to start with strings and short strings.
Comment 1 Gregor Wagner [:gwagner] 2011-01-21 17:41:52 PST
Created attachment 506035 [details] [diff] [review]
WiP

Thats a first try of the implementation for finalizing JSStrings and ShortStrings on the background thread. It's WiP and not ready yet. There are still dirty hacks in there...

One problem that comes with background finalization is that after a GC we don't know if the GC successfully freed memory.
We only get this information after we finalized all arenas.
Another problem is that the next allocation shouldn't have to wait until finalization has finished. We would only move the pause time to the first allocation.
So I tried to allocate new arenas if we are still finalizing an arenalist and not take the currently finalized arenas for allocation.
I had to add a lock per chunk in order to insert into and remove from the chunklist that holds free arenas per chunk.

The results look very promising. The v8-splay benchmark shows some string finalization time and this is completely gone with this patch.
Sunspider in a threadsafe-enabled shell doesn't show any slowdown which shows that the added overhead doesn't have a major performance impact during the allocation path.
Comment 2 Gregor Wagner [:gwagner] 2011-01-21 19:54:07 PST
Another problem comes from the fact that we don't know the WSS right after the GC. We assume that all arenas are full and calculate the next GC trigger with it.

We have to update the GC trigger during or after the finalization process.
Comment 3 Gregor Wagner [:gwagner] 2011-01-21 21:20:38 PST
Created attachment 506060 [details]
spinning world map current trunk

Currently the GC pause time is around 50 mill cycles and about 15 mill cycles are spent in the string finalization.
Comment 4 Gregor Wagner [:gwagner] 2011-01-21 21:22:45 PST
Created attachment 506061 [details]
spinning world map benchmark with this patch

With this patch the average GC pause time reduces to around 30 mill cycles. The finalization of the strings is now completely in the background.
Comment 5 Andreas Gal :gal 2011-01-21 21:31:11 PST
We should really transition away from the stupid triggerGC stuff and measure working sets.
Comment 6 Gregor Wagner [:gwagner] 2011-01-28 17:54:55 PST
Created attachment 508060 [details] [diff] [review]
Proof of concept: finalize obj + str on background thread

This patch finalizes strings and certain objects (mostly where clasp == NULL) on the background thread. Again there are dirty hacks in there and I disabled the  sweeping of shapes for now because sweeping shapes before objects is not a good idea...
An average GC pause for the FOTN reduces now from around 350 mill instructions to 230 mill instructions.
Comment 7 Gregor Wagner [:gwagner] 2011-01-28 17:56:36 PST
Created attachment 508062 [details]
GC events for FOTN: trunk
Comment 8 Gregor Wagner [:gwagner] 2011-01-28 17:57:18 PST
Created attachment 508063 [details]
GC events for FOTN: with background finalization of objects and strings
Comment 9 Gregor Wagner [:gwagner] 2011-01-28 17:59:48 PST
(In reply to comment #6)
> Created attachment 508060 [details] [diff] [review]
> Proof of concept: finalize obj + str on background thread
> 
> This patch finalizes strings and certain objects (mostly where clasp == NULL)

ups I mean clasp->finalize == null
Comment 10 Gregor Wagner [:gwagner] 2011-01-28 18:12:28 PST
Patrick could you try the patch from Comment 6 with your frame tool and tell me if you see any improvement for the FOTN?
Comment 11 Gregor Wagner [:gwagner] 2011-03-25 15:07:10 PDT
Created attachment 521969 [details] [diff] [review]
patch

This patch moves JSString and ShortString finalization to the gcHelperThread.
The main idea is that the whole arenaList is locked until we finalize it. This means no allocation until the whole arenalist is finalized. If a new string should be allocated, we have to allocate a new arena. The trick is now that we add the new arena at the beginning of the arenaList and we don't delete the head of the old arenaList even if it's empty. So we don't have to wait during allocation until the whole arenalist gets finalized. Otherwise we would move the pause to the first allocation.

Another problem is that we don't know how many arenas actually are empty at the end of a GC. This is a problem since we can no longer calculate an exact gcTrigger. We have to adjust the gcTrigger during background finalization.
Comment 12 Gregor Wagner [:gwagner] 2011-04-11 17:19:33 PDT
Created attachment 525226 [details] [diff] [review]
patch

GC Pause time reduces from 200ms to 100ms for the realtime raytracer and 30% for FOTN.
Comment 13 Gregor Wagner [:gwagner] 2011-04-12 10:52:54 PDT
Created attachment 525436 [details]
V8 Benchmark Trunk

Current situation for the V8 Benchmark. Almost all Time in the GC is spent in finalization.
Comment 14 Gregor Wagner [:gwagner] 2011-04-12 10:54:53 PDT
Created attachment 525439 [details]
V8 Benchmark with background finalization

Most of the finalization cost is gone. The average GC pause time reduces from 20 ms to 5-10ms. Even for the splay benchmark we reduce the pause time from 140ms to 60ms.
Comment 15 Andreas Gal :gal 2011-04-12 11:18:19 PDT
blizzard: you wanted nice marketable features, here is one for FF6 (see comment 13 and comment 14)
Comment 16 Gregor Wagner [:gwagner] 2011-04-13 10:49:50 PDT
Created attachment 525723 [details] [diff] [review]
patch

This patch changes the GC behavior significantly. 50% pause time reduction for the realtime raytracer, 30% for FOTN. More results will follow. I left the GC heuristics parameter unchanged but we might need to tweak them a little bit.

I can see a possible regression with this patch if we call JS_GC in a loop because we have to wait for the background thread to be done in order to start a new GC.

I introduced a new lock per chunk since the background thread can release arenas to the chunk and the normal JS thread can allocate arenas from the same chunk at the same time.
Every ArenaList has now a status variable that indicates if the arenalist was finalized. For the allocation path this means now that we allocate new arenas and insert them at the beginning of the list until we are done finalizing the current arenalist.

I am open for naming suggestions (finalize types, function names...)
We might want to clean up the gcHelperThread code with followups.
Comment 17 Andreas Gal :gal 2011-04-13 11:01:14 PDT
Comment on attachment 525723 [details] [diff] [review]
patch

> #define JSCLASS_HAS_PRIVATE             (1<<0)  /* objects have private slot */
> #define JSCLASS_NEW_ENUMERATE           (1<<1)  /* has JSNewEnumerateOp hook */
> #define JSCLASS_NEW_RESOLVE             (1<<2)  /* has JSNewResolveOp hook */
> #define JSCLASS_PRIVATE_IS_NSISUPPORTS  (1<<3)  /* private is (nsISupports *) */
>+#define JSCLASS_HAS_INTERNAL_FINALIZER  (1<<4)  /* finalize is called on background thread */

JSCLASS_CONCURRENT_FINALIZER ?

>     void setGCTriggerFactor(uint32 factor);
>     void setGCLastBytes(size_t lastBytes);
>+    void decreaseGCTriggerBytes(uint32 amount);

reduce? decrement?

>-    size_t                       gcBytes;
>-    size_t                       gcTriggerBytes;
>+    uint32                       gcBytes;
>+    uint32                       gcTriggerBytes;

Shouldn't we use size_t everywhere instead?

>     if (comp->arenas[FINALIZE_OBJECT0].arenasContainThing<JSObject>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT0_BACKGROUND].arenasContainThing<JSObject>(thing) ||
>         comp->arenas[FINALIZE_OBJECT2].arenasContainThing<JSObject_Slots2>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT2_BACKGROUND].arenasContainThing<JSObject_Slots2>(thing) ||
>         comp->arenas[FINALIZE_OBJECT4].arenasContainThing<JSObject_Slots4>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT4_BACKGROUND].arenasContainThing<JSObject_Slots4>(thing) ||
>         comp->arenas[FINALIZE_OBJECT8].arenasContainThing<JSObject_Slots8>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT8_BACKGROUND].arenasContainThing<JSObject_Slots8>(thing) ||
>         comp->arenas[FINALIZE_OBJECT12].arenasContainThing<JSObject_Slots12>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT12_BACKGROUND].arenasContainThing<JSObject_Slots12>(thing) ||
>         comp->arenas[FINALIZE_OBJECT16].arenasContainThing<JSObject_Slots16>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT16_BACKGROUND].arenasContainThing<JSObject_Slots16>(thing) ||
>         comp->arenas[FINALIZE_FUNCTION].arenasContainThing<JSFunction>(thing) ||
>         comp->arenas[FINALIZE_SHAPE].arenasContainThing<Shape>(thing) ||
> #if JS_HAS_XML_SUPPORT
>         comp->arenas[FINALIZE_XML].arenasContainThing<JSXML>(thing) ||
> #endif
>         comp->arenas[FINALIZE_STRING].arenasContainThing<JSString>(thing) ||
>         comp->arenas[FINALIZE_EXTERNAL_STRING].arenasContainThing<JSExternalString>(thing) ||
>         comp->arenas[FINALIZE_SHORT_STRING].arenasContainThing<JSShortString>(thing))

Use a loop?

>+    JS_ATOMIC_ADD(&rt->gcBytes, sizeof(Arena<T>));
>+    JS_ATOMIC_ADD(&comp->gcBytes, sizeof(Arena<T>));

Oh right this is why you want uint32. Lame.

> Class js_IteratorClass = {
>     "Iterator",
>-    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_CACHED_PROTO(JSProto_Iterator),
>+    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_INTERNAL_FINALIZER | 
>+        JSCLASS_HAS_CACHED_PROTO(JSProto_Iterator),

indent?

>+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+        kind = (gc::FinalizeKind)(kind + 1);
>+

For the love of god make this nicer and comment it.

>+
>+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+        kind = (gc::FinalizeKind)(kind + 1);
>+

Dito.

>+    if (!isFunction) {
>+        if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+            kind = (gc::FinalizeKind)(kind + 1);
>+    }

Dito.

> Class ArrayBuffer::jsclass = {
>     "ArrayBuffer",
>-    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_CACHED_PROTO(JSProto_ArrayBuffer),
>+    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_INTERNAL_FINALIZER |
>+        JSCLASS_HAS_CACHED_PROTO(JSProto_ArrayBuffer),

Indent.
Comment 18 Gregor Wagner [:gwagner] 2011-04-13 11:51:09 PDT
(In reply to comment #17)
> Comment on attachment 525723 [details] [diff] [review]
> patch
> 
> > #define JSCLASS_HAS_PRIVATE             (1<<0)  /* objects have private slot */
> > #define JSCLASS_NEW_ENUMERATE           (1<<1)  /* has JSNewEnumerateOp hook */
> > #define JSCLASS_NEW_RESOLVE             (1<<2)  /* has JSNewResolveOp hook */
> > #define JSCLASS_PRIVATE_IS_NSISUPPORTS  (1<<3)  /* private is (nsISupports *) */
> >+#define JSCLASS_HAS_INTERNAL_FINALIZER  (1<<4)  /* finalize is called on background thread */
> 
> JSCLASS_CONCURRENT_FINALIZER ?

done. also the other name changes.

> 
> >     if (comp->arenas[FINALIZE_OBJECT0].arenasContainThing<JSObject>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT0_BACKGROUND].arenasContainThing<JSObject>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT2].arenasContainThing<JSObject_Slots2>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT2_BACKGROUND].arenasContainThing<JSObject_Slots2>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT4].arenasContainThing<JSObject_Slots4>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT4_BACKGROUND].arenasContainThing<JSObject_Slots4>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT8].arenasContainThing<JSObject_Slots8>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT8_BACKGROUND].arenasContainThing<JSObject_Slots8>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT12].arenasContainThing<JSObject_Slots12>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT12_BACKGROUND].arenasContainThing<JSObject_Slots12>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT16].arenasContainThing<JSObject_Slots16>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT16_BACKGROUND].arenasContainThing<JSObject_Slots16>(thing) ||
> >         comp->arenas[FINALIZE_FUNCTION].arenasContainThing<JSFunction>(thing) ||
> >         comp->arenas[FINALIZE_SHAPE].arenasContainThing<Shape>(thing) ||
> > #if JS_HAS_XML_SUPPORT
> >         comp->arenas[FINALIZE_XML].arenasContainThing<JSXML>(thing) ||
> > #endif
> >         comp->arenas[FINALIZE_STRING].arenasContainThing<JSString>(thing) ||
> >         comp->arenas[FINALIZE_EXTERNAL_STRING].arenasContainThing<JSExternalString>(thing) ||
> >         comp->arenas[FINALIZE_SHORT_STRING].arenasContainThing<JSShortString>(thing))
> 
> Use a loop?

Types like <JSObject> are the problem...


> 
> >+
> >+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
> >+        kind = (gc::FinalizeKind)(kind + 1);
> >+

With the new name we are below 100.
Comment 19 Gregor Wagner [:gwagner] 2011-04-13 13:45:05 PDT
http://hg.mozilla.org/tracemonkey/rev/79bb6e40bc61
Comment 20 Brendan Eich [:brendan] 2011-04-13 19:59:54 PDT
Great work!

/be
Comment 21 Igor Bukanov 2011-04-14 05:09:30 PDT
Gregor: what ensures that cx that is used for background finalization stays alive?
Comment 22 Gregor Wagner [:gwagner] 2011-04-14 08:11:20 PDT
(In reply to comment #21)
> Gregor: what ensures that cx that is used for background finalization stays
> alive?

I wait in destroyContext until the background thread is done.
Comment 23 Gregor Wagner [:gwagner] 2011-04-14 10:24:42 PDT
V8 benchmark numbers with this patch:
Score: 4885
Richards: 8059
DeltaBlue: 4861
Crypto: 8501
RayTrace: 4168
EarleyBoyer: 4802
RegExp: 1702
Splay: 5851

without:
Score: 4682
Richards: 7498
DeltaBlue: 4899
Crypto: 8523
RayTrace: 3819
EarleyBoyer: 4632
RegExp: 1686
Splay: 5281

I can see improvements all over the place but especially a great win on splay. The current chrome score for Splay on my machine is: 2909!
Comment 24 Mike Shaver (:shaver -- probably not reading bugmail closely) 2011-04-14 11:05:36 PDT
(We should generally expect that a non-copying GC beats a copying GC, mutatis mutandis, since it's designed to have a large live set and small garbage set during collection.)

Great numbers, though!
Comment 25 Nicholas Nethercote [:njn] 2011-04-18 21:43:49 PDT
(In reply to comment #24)
> (We should generally expect that a non-copying GC beats a copying GC, mutatis
> mutandis, since it's designed to have a large live set and small garbage set
> during collection.)

Isn't the conventional wisdom that v8bench benefits hugely from a generational GC?
Comment 26 Mike Shaver (:shaver -- probably not reading bugmail closely) 2011-04-18 21:57:01 PDT
Yes, shorter scavenges help where they can, though I think intentionally not in splay, IIRC from a conversation with the V8 guys, because the random linkage causes a lot of cross-generational references.  Splay was written as a hard-case tuning target for their GC.
Comment 27 Chris Leary [:cdleary] (not checking bugmail) 2011-04-26 15:40:11 PDT
http://hg.mozilla.org/mozilla-central/rev/79bb6e40bc61
Comment 28 Igor Bukanov 2011-06-15 08:51:55 PDT
*** Bug 584688 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.