Closed Bug 602534 Opened 14 years ago Closed 14 years ago

simpler scheduling of the last ditch GC

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 5 obsolete files)

Currently the scheduling of the last ditch is spreed through few allocation functions in jsgc.cpp. That makes it harder to follow. We should simplify that.
Attached patch v1 (obsolete) — Splinter Review
The patch moves all calculations about the need to run the last-ditch GC into RefillFinalizableFreeList. It also treats waiveGCQuota (renamed into disableLastDitchGC) in exactly the same way as the been on trace. That is, the last ditch GC is not run but just scheduled and the allocator is allowed to breach quotas.

It also merges gcNewArenaTriggerBytes into gcTriggerBytes. The value of the latter is calculated after the GC based on both gcTriggerFactor and arena heuristics. It seems we should be able to replace all these heuristics with simple:

gcTriggerBytes = Max(gcLastBytes, some_threshold) * gcTriggerFactor

but this is for another bug as here I want to preserve the current logic.
Attachment #481561 - Flags: review?(anygregor)
Attached patch v1 for real (obsolete) — Splinter Review
Attachment #481561 - Attachment is obsolete: true
Attachment #481566 - Flags: review?(anygregor)
Attachment #481561 - Flags: review?(anygregor)
(In reply to comment #0)
> Currently the scheduling of the last ditch is spreed through few allocation
> functions in jsgc.cpp. That makes it harder to follow. We should simplify that.

Yeah we should definitely do this!

I can't compile it:
jstracer.cpp
../jstracer.cpp: In function ‘void js::LeaveTree(js::TraceMonitor*, js::TracerState&, js::VMSideExit*)’:
../jstracer.cpp:6709: error: ‘AutoDisableLastDitchGC’ was not declared in this scope
../jstracer.cpp:6709: error: expected `;' before ‘disable’
../nanojit/NativeX64.h: At global scope:
../nanojit/NativeX64.h:339: warning: ‘nanojit::SavedRegs’ defined but not used
../nanojit/NativeX64.h:347: warning: ‘nanojit::SingleByteStoreRegs’ defined but not used
make[1]: *** [jstracer.o] Error 1
make[1]: *** Waiting for unfinished jobs....
Attached patch v2 (obsolete) — Splinter Review
In v1 I have put AutoDisableLastDitchGC under debug-only section.
Attachment #481566 - Attachment is obsolete: true
Attachment #481595 - Flags: review?(anygregor)
Attachment #481566 - Flags: review?(anygregor)
I see some small but steady slowdown for the v8 benchmark shell. Especially the splay benchmark. There is no extra GC run involved. 

TEST              COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:      ??                2643.0ms +/- 1.0%   2667.1ms +/- 0.5%     not conclusive: might be *1.009x as slow*

=============================================================================

  v8:             ??                2643.0ms +/- 1.0%   2667.1ms +/- 0.5%     not conclusive: might be *1.009x as slow*
    crypto:       -                  249.1ms +/- 3.9%    247.9ms +/- 5.1% 
    deltablue:    -                  322.3ms +/- 1.2%    321.6ms +/- 1.0% 
    earley-boyer: ??                 944.9ms +/- 2.4%    955.8ms +/- 1.0%     not conclusive: might be *1.012x as slow*
    raytrace:     -                  446.5ms +/- 1.3%    445.3ms +/- 1.3% 
    regexp:       -                  169.4ms +/- 0.3%    169.2ms +/- 0.4% 
    richards:     ??                 165.6ms +/- 0.3%    166.6ms +/- 1.3%     not conclusive: might be *1.006x as slow*
    splay:        *1.045x as slow*   345.2ms +/- 1.0%    360.7ms +/- 1.0%     significant
(In reply to comment #5)
> I see some small but steady slowdown for the v8 benchmark shell. Especially the
> splay benchmark. There is no extra GC run involved. 

There are two possibilities. 

First it could be the effect of doing the IsGCThresholdReached(rt) check in RefillFinalizableFreeList before considering the free list of arenas. The patch will do that even on trace to trigger the GC asap. On 64-bit system we have about 40 objects per arena before RefillFinalizableFreeList is called and the extra checks do not contribute significantly to the code complexity as at worst they increases the number of checks from 5 to 7 on the fast path. Assuming that code only doing repeated call to NewGCThing without doing anything else that would increase the number if checks from 40*2 + 5 to 40*2 + 7 or 2.3%. That cannot explain the 4.5% slowdown at all especially given that for the benchmarks the free list is exhausted very quickly and AllocateArena is called much more often and there the patch provide a win by removing the extra checks.

So I suppose a bigger factor could come from cache effects from removal of one member in JSRuntime. If you add something like "size_t  dummy;" right after gcMaxMallocBytes; in JSRuntime on top of the patch, would it affect the numbers?
I added a dummy value right after gcMaxMAllocBytes but it doesn't help. I still see the persistent 25-30 ms slowdown for splay.
Attached patch v3 (obsolete) — Splinter Review
Finally I managed to disable Intel's Turbo Boost on my laptop and reproduce the v8 slowdown. It looks like a cache line miss when accessing JSRuntime:: counters. What is puzzling is why access JSRuntime->isRunning at the very beginning of Refill does not cost that match. Perhaps CPU predicts never taken branches really well?

In any case, the new patch avoids all those slowdowns by calling TriggerGC right after increasing gcBytes after arena allocation and discovering that gcBytes is too big. The patch also adds the TriggerGC call right after failed chunk allocation.

This way on trace the code does not need to check any counters in Refill (it stays the same as before the patch) - the GC will be scheduled in any case at that point so the last-ditch related checks can be skipped immediately after discovering the on-trace condition. On the other hand all the memory-pressure related checks in PickChunk are replaced with a single check for gcBytes in Arena::alocateArena right after increasing gcBytes.
Attachment #481595 - Attachment is obsolete: true
Attachment #482037 - Flags: review?(anygregor)
Attachment #481595 - Flags: review?(anygregor)
Attached patch v4 (obsolete) — Splinter Review
The new version moves the gcTriggerBytes field right after gcBytes so the check gcBytes >= gcTriggerBytes will most likely access things from one cache line.
Attachment #482037 - Attachment is obsolete: true
Attachment #482037 - Flags: review?(anygregor)
Comment on attachment 482039 [details] [diff] [review]
v4

>diff --git a/js/src/jscntxt.h b/js/src/jscntxt.h
>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>@@ -1118,20 +1118,19 @@ struct JSThreadData {
>      * towards rt->interruptCounter.
>      */
>     volatile int32      interruptFlags;
> 
>     /* Keeper of the contiguous stack used by all contexts in this thread. */
>     js::StackSpace      stackSpace;
> 
>     /*
>      * Flag indicating that we are waiving any soft limits on the GC heap
>-     * because we want allocations to be infallible (except when we hit
>-     * a hard quota).
>+     * because we want allocations to be infallible (except when we hit OOM).
>      */
>     bool                waiveGCQuota;
> 
>     /*
>      * The GSN cache is per thread since even multi-cx-per-thread embeddings
>      * do not interleave js_GetSrcNote calls.
>      */
>     JSGSNCache          gsnCache;
> 
>@@ -1332,33 +1331,35 @@ struct JSRuntime {
>     uint32              protoHazardShape;
> 
>     /* Garbage collector state, used by jsgc.c. */
>     js::GCChunkSet      gcChunkSet;
> 
>     js::RootedValueMap  gcRootsHash;
>     js::GCLocks         gcLocksHash;
>     jsrefcount          gcKeepAtoms;
>     size_t              gcBytes;
>+    size_t              gcTriggerBytes;
>     size_t              gcLastBytes;
>     size_t              gcMaxBytes;
>     size_t              gcMaxMallocBytes;
>-    size_t              gcNewArenaTriggerBytes;
>     uint32              gcEmptyArenaPoolLifespan;
>     uint32              gcNumber;
>     js::GCMarker        *gcMarkingTracer;
>     uint32              gcTriggerFactor;
>-    size_t              gcTriggerBytes;
>     volatile JSBool     gcIsNeeded;
> 
>     /*
>-     * NB: do not pack another flag here by claiming gcPadding unless the new
>-     * flag is written only by the GC thread.  Atomic updates to packed bytes
>-     * are not guaranteed, so stores issued by one thread may be lost due to
>+     * We can pack these flags as only the GC thread writes to them. Atomic
>+     * updates to packed bytes are not guaranteed, so stores issued by one
>+     * thread may be lost due to unsynchronized read-modify-write cycles on
>+     * other threads. NB: do not pack another flag here unless the new flag is
>+     * written only by the GC thread.  Atomic updates to packed bytes are not
>+     * guaranteed, so stores issued by one thread may be lost due to
>      * unsynchronized read-modify-write cycles on other threads.

nit: a little bit redundant?


> 
> void
> JSRuntime::setGCLastBytes(size_t lastBytes)
> {
>     gcLastBytes = lastBytes;
>-    uint64 triggerBytes = uint64(lastBytes) * uint64(gcTriggerFactor / 100);
>-    if (triggerBytes != size_t(triggerBytes))
>-        triggerBytes = size_t(-1);
>-    gcTriggerBytes = size_t(triggerBytes);
>+    float trigger1 = float(lastBytes) * float(gcTriggerFactor) / 100.0f;
>+    float trigger2 = float(Max(lastBytes, GC_ARENA_ALLOCATION_TRIGGER)) *
>+                     GC_HEAP_GROWTH_FACTOR;
>+    float maxtriger = Max(trigger1, trigger2);

nit: maxTrigger
Maybe more meaningful names for trigger1, trigger2?
Attachment #482039 - Flags: review+
Blocks: 603916
Attached patch v5Splinter Review
Here is a rebased patch with nits addressed.
Attachment #482039 - Attachment is obsolete: true
Attachment #482814 - Flags: review+
http://hg.mozilla.org/tracemonkey/rev/f976b72fafaf
Whiteboard: fixed-in-tracemonkey
Depends on: 604831
http://hg.mozilla.org/mozilla-central/rev/f976b72fafaf
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
>+    float maxtriger = Max(trigger1, trigger2);

nit: maxTrigger

There is still the maxtriger in the code.
(In reply to comment #14)
> >+    float maxtriger = Max(trigger1, trigger2);
> 
> nit: maxTrigger
> 
> There is still the maxtriger in the code.

Sorry about that, I will fix that in the bug 603916.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: