The default bug view has changed. See this FAQ.

Background Finalization for Strings and Objects

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: gwagner, Assigned: gwagner)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(7 attachments, 4 obsolete attachments)

(Assignee)

Description

6 years ago
In order to make the GC pause time shorter we could finalize objects and strings outside the main GC cycle.
I want to start with strings and short strings.
(Assignee)

Updated

6 years ago
Assignee: general → anygregor
(Assignee)

Comment 1

6 years ago
Created attachment 506035 [details] [diff] [review]
WiP

Thats a first try of the implementation for finalizing JSStrings and ShortStrings on the background thread. It's WiP and not ready yet. There are still dirty hacks in there...

One problem that comes with background finalization is that after a GC we don't know if the GC successfully freed memory.
We only get this information after we finalized all arenas.
Another problem is that the next allocation shouldn't have to wait until finalization has finished. We would only move the pause time to the first allocation.
So I tried to allocate new arenas if we are still finalizing an arenalist and not take the currently finalized arenas for allocation.
I had to add a lock per chunk in order to insert into and remove from the chunklist that holds free arenas per chunk.

The results look very promising. The v8-splay benchmark shows some string finalization time and this is completely gone with this patch.
Sunspider in a threadsafe-enabled shell doesn't show any slowdown which shows that the added overhead doesn't have a major performance impact during the allocation path.
(Assignee)

Comment 2

6 years ago
Another problem comes from the fact that we don't know the WSS right after the GC. We assume that all arenas are full and calculate the next GC trigger with it.

We have to update the GC trigger during or after the finalization process.
(Assignee)

Comment 3

6 years ago
Created attachment 506060 [details]
spinning world map current trunk

Currently the GC pause time is around 50 mill cycles and about 15 mill cycles are spent in the string finalization.
(Assignee)

Comment 4

6 years ago
Created attachment 506061 [details]
spinning world map benchmark with this patch

With this patch the average GC pause time reduces to around 30 mill cycles. The finalization of the strings is now completely in the background.

Comment 5

6 years ago
We should really transition away from the stupid triggerGC stuff and measure working sets.
(Assignee)

Updated

6 years ago
Summary: Lazy Arena Finalization for Strings → Background Finalization for Strings
(Assignee)

Comment 6

6 years ago
Created attachment 508060 [details] [diff] [review]
Proof of concept: finalize obj + str on background thread

This patch finalizes strings and certain objects (mostly where clasp == NULL) on the background thread. Again there are dirty hacks in there and I disabled the  sweeping of shapes for now because sweeping shapes before objects is not a good idea...
An average GC pause for the FOTN reduces now from around 350 mill instructions to 230 mill instructions.
(Assignee)

Comment 7

6 years ago
Created attachment 508062 [details]
GC events for FOTN: trunk
(Assignee)

Comment 8

6 years ago
Created attachment 508063 [details]
GC events for FOTN: with background finalization of objects and strings
(Assignee)

Comment 9

6 years ago
(In reply to comment #6)
> Created attachment 508060 [details] [diff] [review]
> Proof of concept: finalize obj + str on background thread
> 
> This patch finalizes strings and certain objects (mostly where clasp == NULL)

ups I mean clasp->finalize == null
(Assignee)

Comment 10

6 years ago
Patrick could you try the patch from Comment 6 with your frame tool and tell me if you see any improvement for the FOTN?
(Assignee)

Comment 11

6 years ago
Created attachment 521969 [details] [diff] [review]
patch

This patch moves JSString and ShortString finalization to the gcHelperThread.
The main idea is that the whole arenaList is locked until we finalize it. This means no allocation until the whole arenalist is finalized. If a new string should be allocated, we have to allocate a new arena. The trick is now that we add the new arena at the beginning of the arenaList and we don't delete the head of the old arenaList even if it's empty. So we don't have to wait during allocation until the whole arenalist gets finalized. Otherwise we would move the pause to the first allocation.

Another problem is that we don't know how many arenas actually are empty at the end of a GC. This is a problem since we can no longer calculate an exact gcTrigger. We have to adjust the gcTrigger during background finalization.
Attachment #521969 - Flags: feedback?(gal)
(Assignee)

Updated

6 years ago
Summary: Background Finalization for Strings → Background Finalization for Strings and Objects
(Assignee)

Comment 12

6 years ago
Created attachment 525226 [details] [diff] [review]
patch

GC Pause time reduces from 200ms to 100ms for the realtime raytracer and 30% for FOTN.
Attachment #506035 - Attachment is obsolete: true
Attachment #508060 - Attachment is obsolete: true
Attachment #521969 - Attachment is obsolete: true
Attachment #521969 - Flags: feedback?(gal)
(Assignee)

Comment 13

6 years ago
Created attachment 525436 [details]
V8 Benchmark Trunk

Current situation for the V8 Benchmark. Almost all Time in the GC is spent in finalization.
(Assignee)

Comment 14

6 years ago
Created attachment 525439 [details]
V8 Benchmark with background finalization

Most of the finalization cost is gone. The average GC pause time reduces from 20 ms to 5-10ms. Even for the splay benchmark we reduce the pause time from 140ms to 60ms.

Comment 15

6 years ago
blizzard: you wanted nice marketable features, here is one for FF6 (see comment 13 and comment 14)
(Assignee)

Comment 16

6 years ago
Created attachment 525723 [details] [diff] [review]
patch

This patch changes the GC behavior significantly. 50% pause time reduction for the realtime raytracer, 30% for FOTN. More results will follow. I left the GC heuristics parameter unchanged but we might need to tweak them a little bit.

I can see a possible regression with this patch if we call JS_GC in a loop because we have to wait for the background thread to be done in order to start a new GC.

I introduced a new lock per chunk since the background thread can release arenas to the chunk and the normal JS thread can allocate arenas from the same chunk at the same time.
Every ArenaList has now a status variable that indicates if the arenalist was finalized. For the allocation path this means now that we allocate new arenas and insert them at the beginning of the list until we are done finalizing the current arenalist.

I am open for naming suggestions (finalize types, function names...)
We might want to clean up the gcHelperThread code with followups.
Attachment #525226 - Attachment is obsolete: true
Attachment #525723 - Flags: review?(gal)

Comment 17

6 years ago
Comment on attachment 525723 [details] [diff] [review]
patch

> #define JSCLASS_HAS_PRIVATE             (1<<0)  /* objects have private slot */
> #define JSCLASS_NEW_ENUMERATE           (1<<1)  /* has JSNewEnumerateOp hook */
> #define JSCLASS_NEW_RESOLVE             (1<<2)  /* has JSNewResolveOp hook */
> #define JSCLASS_PRIVATE_IS_NSISUPPORTS  (1<<3)  /* private is (nsISupports *) */
>+#define JSCLASS_HAS_INTERNAL_FINALIZER  (1<<4)  /* finalize is called on background thread */

JSCLASS_CONCURRENT_FINALIZER ?

>     void setGCTriggerFactor(uint32 factor);
>     void setGCLastBytes(size_t lastBytes);
>+    void decreaseGCTriggerBytes(uint32 amount);

reduce? decrement?

>-    size_t                       gcBytes;
>-    size_t                       gcTriggerBytes;
>+    uint32                       gcBytes;
>+    uint32                       gcTriggerBytes;

Shouldn't we use size_t everywhere instead?

>     if (comp->arenas[FINALIZE_OBJECT0].arenasContainThing<JSObject>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT0_BACKGROUND].arenasContainThing<JSObject>(thing) ||
>         comp->arenas[FINALIZE_OBJECT2].arenasContainThing<JSObject_Slots2>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT2_BACKGROUND].arenasContainThing<JSObject_Slots2>(thing) ||
>         comp->arenas[FINALIZE_OBJECT4].arenasContainThing<JSObject_Slots4>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT4_BACKGROUND].arenasContainThing<JSObject_Slots4>(thing) ||
>         comp->arenas[FINALIZE_OBJECT8].arenasContainThing<JSObject_Slots8>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT8_BACKGROUND].arenasContainThing<JSObject_Slots8>(thing) ||
>         comp->arenas[FINALIZE_OBJECT12].arenasContainThing<JSObject_Slots12>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT12_BACKGROUND].arenasContainThing<JSObject_Slots12>(thing) ||
>         comp->arenas[FINALIZE_OBJECT16].arenasContainThing<JSObject_Slots16>(thing) ||
>+        comp->arenas[FINALIZE_OBJECT16_BACKGROUND].arenasContainThing<JSObject_Slots16>(thing) ||
>         comp->arenas[FINALIZE_FUNCTION].arenasContainThing<JSFunction>(thing) ||
>         comp->arenas[FINALIZE_SHAPE].arenasContainThing<Shape>(thing) ||
> #if JS_HAS_XML_SUPPORT
>         comp->arenas[FINALIZE_XML].arenasContainThing<JSXML>(thing) ||
> #endif
>         comp->arenas[FINALIZE_STRING].arenasContainThing<JSString>(thing) ||
>         comp->arenas[FINALIZE_EXTERNAL_STRING].arenasContainThing<JSExternalString>(thing) ||
>         comp->arenas[FINALIZE_SHORT_STRING].arenasContainThing<JSShortString>(thing))

Use a loop?

>+    JS_ATOMIC_ADD(&rt->gcBytes, sizeof(Arena<T>));
>+    JS_ATOMIC_ADD(&comp->gcBytes, sizeof(Arena<T>));

Oh right this is why you want uint32. Lame.

> Class js_IteratorClass = {
>     "Iterator",
>-    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_CACHED_PROTO(JSProto_Iterator),
>+    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_INTERNAL_FINALIZER | 
>+        JSCLASS_HAS_CACHED_PROTO(JSProto_Iterator),

indent?

>+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+        kind = (gc::FinalizeKind)(kind + 1);
>+

For the love of god make this nicer and comment it.

>+
>+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+        kind = (gc::FinalizeKind)(kind + 1);
>+

Dito.

>+    if (!isFunction) {
>+        if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
>+            kind = (gc::FinalizeKind)(kind + 1);
>+    }

Dito.

> Class ArrayBuffer::jsclass = {
>     "ArrayBuffer",
>-    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_CACHED_PROTO(JSProto_ArrayBuffer),
>+    JSCLASS_HAS_PRIVATE | JSCLASS_HAS_INTERNAL_FINALIZER |
>+        JSCLASS_HAS_CACHED_PROTO(JSProto_ArrayBuffer),

Indent.
Attachment #525723 - Flags: review?(gal) → review+
(Assignee)

Comment 18

6 years ago
(In reply to comment #17)
> Comment on attachment 525723 [details] [diff] [review]
> patch
> 
> > #define JSCLASS_HAS_PRIVATE             (1<<0)  /* objects have private slot */
> > #define JSCLASS_NEW_ENUMERATE           (1<<1)  /* has JSNewEnumerateOp hook */
> > #define JSCLASS_NEW_RESOLVE             (1<<2)  /* has JSNewResolveOp hook */
> > #define JSCLASS_PRIVATE_IS_NSISUPPORTS  (1<<3)  /* private is (nsISupports *) */
> >+#define JSCLASS_HAS_INTERNAL_FINALIZER  (1<<4)  /* finalize is called on background thread */
> 
> JSCLASS_CONCURRENT_FINALIZER ?

done. also the other name changes.

> 
> >     if (comp->arenas[FINALIZE_OBJECT0].arenasContainThing<JSObject>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT0_BACKGROUND].arenasContainThing<JSObject>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT2].arenasContainThing<JSObject_Slots2>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT2_BACKGROUND].arenasContainThing<JSObject_Slots2>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT4].arenasContainThing<JSObject_Slots4>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT4_BACKGROUND].arenasContainThing<JSObject_Slots4>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT8].arenasContainThing<JSObject_Slots8>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT8_BACKGROUND].arenasContainThing<JSObject_Slots8>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT12].arenasContainThing<JSObject_Slots12>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT12_BACKGROUND].arenasContainThing<JSObject_Slots12>(thing) ||
> >         comp->arenas[FINALIZE_OBJECT16].arenasContainThing<JSObject_Slots16>(thing) ||
> >+        comp->arenas[FINALIZE_OBJECT16_BACKGROUND].arenasContainThing<JSObject_Slots16>(thing) ||
> >         comp->arenas[FINALIZE_FUNCTION].arenasContainThing<JSFunction>(thing) ||
> >         comp->arenas[FINALIZE_SHAPE].arenasContainThing<Shape>(thing) ||
> > #if JS_HAS_XML_SUPPORT
> >         comp->arenas[FINALIZE_XML].arenasContainThing<JSXML>(thing) ||
> > #endif
> >         comp->arenas[FINALIZE_STRING].arenasContainThing<JSString>(thing) ||
> >         comp->arenas[FINALIZE_EXTERNAL_STRING].arenasContainThing<JSExternalString>(thing) ||
> >         comp->arenas[FINALIZE_SHORT_STRING].arenasContainThing<JSShortString>(thing))
> 
> Use a loop?

Types like <JSObject> are the problem...


> 
> >+
> >+    if (kind % 2 == 0 && (!clasp->finalize || clasp->flags & JSCLASS_HAS_INTERNAL_FINALIZER))
> >+        kind = (gc::FinalizeKind)(kind + 1);
> >+

With the new name we are below 100.
(Assignee)

Comment 19

6 years ago
http://hg.mozilla.org/tracemonkey/rev/79bb6e40bc61
Whiteboard: fixed-in-tracemonkey
Great work!

/be

Comment 21

6 years ago
Gregor: what ensures that cx that is used for background finalization stays alive?
(Assignee)

Comment 22

6 years ago
(In reply to comment #21)
> Gregor: what ensures that cx that is used for background finalization stays
> alive?

I wait in destroyContext until the background thread is done.
(Assignee)

Comment 23

6 years ago
V8 benchmark numbers with this patch:
Score: 4885
Richards: 8059
DeltaBlue: 4861
Crypto: 8501
RayTrace: 4168
EarleyBoyer: 4802
RegExp: 1702
Splay: 5851

without:
Score: 4682
Richards: 7498
DeltaBlue: 4899
Crypto: 8523
RayTrace: 3819
EarleyBoyer: 4632
RegExp: 1686
Splay: 5281

I can see improvements all over the place but especially a great win on splay. The current chrome score for Splay on my machine is: 2909!
(We should generally expect that a non-copying GC beats a copying GC, mutatis mutandis, since it's designed to have a large live set and small garbage set during collection.)

Great numbers, though!
(In reply to comment #24)
> (We should generally expect that a non-copying GC beats a copying GC, mutatis
> mutandis, since it's designed to have a large live set and small garbage set
> during collection.)

Isn't the conventional wisdom that v8bench benefits hugely from a generational GC?
Yes, shorter scavenges help where they can, though I think intentionally not in splay, IIRC from a conversation with the V8 guys, because the random linkage causes a lot of cross-generational references.  Splay was written as a hard-case tuning target for their GC.
(Assignee)

Updated

6 years ago
Depends on: 651193

Updated

6 years ago
Depends on: 652416
http://hg.mozilla.org/mozilla-central/rev/79bb6e40bc61
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Depends on: 654028

Updated

6 years ago
Duplicate of this bug: 584688
You need to log in before you can comment on or make changes to this bug.