Bug 602534 - simpler scheduling of the last ditch GC
Status: RESOLVED FIXED (Closed)
Opened 14 years ago · Closed 14 years ago
Categories: Core :: JavaScript Engine, defect
People: (Reporter: igor, Assigned: igor)
Whiteboard: fixed-in-tracemonkey
Attachments: (1 file, 5 obsolete files)
  22.48 KB, patch, igor: review+ (Details | Diff | Splinter Review)
Currently the scheduling of the last-ditch GC is spread across a few allocation functions in jsgc.cpp. That makes it harder to follow. We should simplify that.
Assignee
Comment 1 • 14 years ago
The patch moves all calculations about the need to run the last-ditch GC into RefillFinalizableFreeList. It also treats waiveGCQuota (renamed to disableLastDitchGC) in exactly the same way as the on-trace case: the last-ditch GC is not run but just scheduled, and the allocator is allowed to breach quotas. It also merges gcNewArenaTriggerBytes into gcTriggerBytes. The value of the latter is calculated after the GC based on both gcTriggerFactor and arena heuristics.

It seems we should be able to replace all these heuristics with simply:

  gcTriggerBytes = Max(gcLastBytes, some_threshold) * gcTriggerFactor

but that is for another bug, as here I want to preserve the current logic.
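The simplified heuristic proposed above could look roughly like the sketch below. The threshold value, the percent convention for the factor, and all names other than gcLastBytes/gcTriggerFactor are illustrative assumptions, not the actual SpiderMonkey constants:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumed floor and factor; the real values live in jsgc.cpp and may differ.
static const size_t SOME_THRESHOLD = 8 * 1024 * 1024;  // hypothetical
static const uint64_t GC_TRIGGER_FACTOR = 300;         // percent, hypothetical

// Sketch of: gcTriggerBytes = Max(gcLastBytes, some_threshold) * gcTriggerFactor
size_t ComputeTriggerBytes(size_t gcLastBytes) {
    uint64_t base = std::max(gcLastBytes, SOME_THRESHOLD);
    uint64_t trigger = base * GC_TRIGGER_FACTOR / 100;
    // Clamp when the result does not fit into size_t (32-bit builds),
    // the same way the existing setGCLastBytes clamps its uint64 result.
    if (trigger != uint64_t(size_t(trigger)))
        trigger = size_t(-1);
    return size_t(trigger);
}
```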
Attachment #481561 - Flags: review?(anygregor)
Assignee
Comment 2 • 14 years ago
Attachment #481561 - Attachment is obsolete: true
Attachment #481566 - Flags: review?(anygregor)
Attachment #481561 - Flags: review?(anygregor)
Comment 3 • 14 years ago
(In reply to comment #0)
> Currently the scheduling of the last-ditch GC is spread across a few allocation
> functions in jsgc.cpp. That makes it harder to follow. We should simplify that.

Yeah, we should definitely do this! I can't compile it:

jstracer.cpp
../jstracer.cpp: In function ‘void js::LeaveTree(js::TraceMonitor*, js::TracerState&, js::VMSideExit*)’:
../jstracer.cpp:6709: error: ‘AutoDisableLastDitchGC’ was not declared in this scope
../jstracer.cpp:6709: error: expected `;' before ‘disable’
../nanojit/NativeX64.h: At global scope:
../nanojit/NativeX64.h:339: warning: ‘nanojit::SavedRegs’ defined but not used
../nanojit/NativeX64.h:347: warning: ‘nanojit::SingleByteStoreRegs’ defined but not used
make[1]: *** [jstracer.o] Error 1
make[1]: *** Waiting for unfinished jobs....
Assignee
Comment 4 • 14 years ago
In v1 I have put AutoDisableLastDitchGC under a debug-only section.
Attachment #481566 - Attachment is obsolete: true
Attachment #481595 - Flags: review?(anygregor)
Attachment #481566 - Flags: review?(anygregor)
Comment 5 • 14 years ago
I see some small but steady slowdown for the v8 benchmark shell, especially the splay benchmark. There is no extra GC run involved.

TEST            COMPARISON          FROM                TO                  DETAILS
=============================================================================
** TOTAL **:    ??                  2643.0ms +/- 1.0%   2667.1ms +/- 0.5%   not conclusive: might be *1.009x as slow*
=============================================================================
v8:             ??                  2643.0ms +/- 1.0%   2667.1ms +/- 0.5%   not conclusive: might be *1.009x as slow*
  crypto:       -                   249.1ms +/- 3.9%    247.9ms +/- 5.1%
  deltablue:    -                   322.3ms +/- 1.2%    321.6ms +/- 1.0%
  earley-boyer: ??                  944.9ms +/- 2.4%    955.8ms +/- 1.0%    not conclusive: might be *1.012x as slow*
  raytrace:     -                   446.5ms +/- 1.3%    445.3ms +/- 1.3%
  regexp:       -                   169.4ms +/- 0.3%    169.2ms +/- 0.4%
  richards:     ??                  165.6ms +/- 0.3%    166.6ms +/- 1.3%    not conclusive: might be *1.006x as slow*
  splay:        *1.045x as slow*    345.2ms +/- 1.0%    360.7ms +/- 1.0%    significant
Assignee
Comment 6 • 14 years ago
(In reply to comment #5)
> I see some small but steady slowdown for the v8 benchmark shell. Especially
> the splay benchmark. There is no extra GC run involved.

There are two possibilities. First, it could be the effect of doing the IsGCThresholdReached(rt) check in RefillFinalizableFreeList before considering the free list of arenas. The patch will do that even on trace to trigger the GC asap. On a 64-bit system we have about 40 objects per arena before RefillFinalizableFreeList is called, and the extra checks do not contribute significantly to the code complexity, as at worst they increase the number of checks from 5 to 7 on the fast path. Assuming code that only makes repeated calls to NewGCThing without doing anything else, that would increase the number of checks from 40*2 + 5 to 40*2 + 7, or about 2.3%. That cannot explain the 4.5% slowdown at all, especially given that for the benchmarks the free list is exhausted very quickly and AllocateArena is called much more often, and there the patch provides a win by removing the extra checks.

So I suppose a bigger factor could come from cache effects from the removal of one member in JSRuntime. If you add something like "size_t dummy;" right after gcMaxMallocBytes in JSRuntime on top of the patch, would it affect the numbers?
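The check-count arithmetic above can be verified directly (the 40 objects per arena and 2 checks per allocation are the figures quoted in the comment):

```python
# Fast-path checks per arena's worth of NewGCThing calls, per the comment:
# ~40 objects per arena on 64-bit, 2 checks per allocation, plus the
# per-refill checks growing from 5 to 7 with the patch.
objects_per_arena = 40
before = objects_per_arena * 2 + 5  # 85 checks
after = objects_per_arena * 2 + 7   # 87 checks
increase = (after - before) / before
print(f"{increase:.2%}")  # ~2.35%, in line with the roughly 2.3% quoted above
```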
Comment 7 • 14 years ago
I added a dummy value right after gcMaxMallocBytes but it doesn't help. I still see the persistent 25-30 ms slowdown for splay.
Assignee
Comment 8 • 14 years ago
Finally I managed to disable Intel's Turbo Boost on my laptop and reproduce the v8 slowdown. It looks like a cache-line miss when accessing JSRuntime counters. What is puzzling is why accessing JSRuntime->isRunning at the very beginning of Refill does not cost that much. Perhaps the CPU predicts never-taken branches really well?

In any case, the new patch avoids all those slowdowns by calling TriggerGC right after increasing gcBytes on arena allocation and discovering that gcBytes is too big. The patch also adds a TriggerGC call right after a failed chunk allocation. This way, on trace the code does not need to check any counters in Refill (it stays the same as before the patch): the GC will be scheduled in any case at that point, so the last-ditch-related checks can be skipped immediately after discovering the on-trace condition. On the other hand, all the memory-pressure-related checks in PickChunk are replaced with a single check for gcBytes in Arena::allocateArena right after increasing gcBytes.
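A minimal sketch of the scheme described here, assuming a heavily simplified runtime structure (TriggerGC and the field names follow the comment; everything else, including the struct itself, is illustrative):

```cpp
#include <cstddef>
#include <new>

// Simplified stand-in for JSRuntime; only the fields relevant here.
struct Runtime {
    size_t gcBytes;         // bytes currently allocated to the GC heap
    size_t gcTriggerBytes;  // threshold at which a GC should be scheduled
    bool gcIsNeeded;
};

// In the real code this schedules an asynchronous GC; here we just record it.
static void TriggerGC(Runtime* rt) {
    rt->gcIsNeeded = true;
}

// After bumping gcBytes, one comparison against gcTriggerBytes replaces the
// memory-pressure checks that used to be scattered through PickChunk.
void* AllocateArena(Runtime* rt, size_t arenaSize) {
    void* arena = ::operator new(arenaSize, std::nothrow);
    if (!arena) {
        TriggerGC(rt);  // a failed allocation also schedules a GC
        return nullptr;
    }
    rt->gcBytes += arenaSize;
    if (rt->gcBytes >= rt->gcTriggerBytes)
        TriggerGC(rt);
    return arena;
}
```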
Attachment #481595 - Attachment is obsolete: true
Attachment #482037 - Flags: review?(anygregor)
Attachment #481595 - Flags: review?(anygregor)
Assignee
Comment 9 • 14 years ago
The new version moves the gcTriggerBytes field right after gcBytes, so the check gcBytes >= gcTriggerBytes will most likely access both from one cache line.
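The layout idea can be illustrated with a sketch; the field names match the bug, while the struct itself and the 64-byte line size are assumptions:

```cpp
#include <cassert>
#include <cstddef>

// With gcTriggerBytes placed immediately after gcBytes, the hot comparison
// gcBytes >= gcTriggerBytes reads two adjacent 8-byte fields, which on a
// typical 64-byte cache line means touching a single line.
struct RuntimeLayout {
    size_t gcBytes;         // updated on every arena allocation
    size_t gcTriggerBytes;  // read in the same comparison
    size_t gcLastBytes;
    size_t gcMaxBytes;
};

static_assert(offsetof(RuntimeLayout, gcTriggerBytes) ==
              offsetof(RuntimeLayout, gcBytes) + sizeof(size_t),
              "gcBytes and gcTriggerBytes must be adjacent");
```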
Attachment #482037 - Attachment is obsolete: true
Attachment #482037 - Flags: review?(anygregor)
Comment 10 • 14 years ago
Comment on attachment 482039 [details] [diff] [review]
v4

>diff --git a/js/src/jscntxt.h b/js/src/jscntxt.h
>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>@@ -1118,20 +1118,19 @@ struct JSThreadData {
>      * towards rt->interruptCounter.
>      */
>     volatile int32 interruptFlags;
>
>     /* Keeper of the contiguous stack used by all contexts in this thread. */
>     js::StackSpace stackSpace;
>
>     /*
>      * Flag indicating that we are waiving any soft limits on the GC heap
>-     * because we want allocations to be infallible (except when we hit
>-     * a hard quota).
>+     * because we want allocations to be infallible (except when we hit OOM).
>      */
>     bool waiveGCQuota;
>
>     /*
>      * The GSN cache is per thread since even multi-cx-per-thread embeddings
>      * do not interleave js_GetSrcNote calls.
>      */
>     JSGSNCache gsnCache;
>
>@@ -1332,33 +1331,35 @@ struct JSRuntime {
>     uint32              protoHazardShape;
>
>     /* Garbage collector state, used by jsgc.c. */
>     js::GCChunkSet      gcChunkSet;
>
>     js::RootedValueMap  gcRootsHash;
>     js::GCLocks         gcLocksHash;
>     jsrefcount          gcKeepAtoms;
>     size_t              gcBytes;
>+    size_t              gcTriggerBytes;
>     size_t              gcLastBytes;
>     size_t              gcMaxBytes;
>     size_t              gcMaxMallocBytes;
>-    size_t              gcNewArenaTriggerBytes;
>     uint32              gcEmptyArenaPoolLifespan;
>     uint32              gcNumber;
>     js::GCMarker        *gcMarkingTracer;
>     uint32              gcTriggerFactor;
>-    size_t              gcTriggerBytes;
>     volatile JSBool     gcIsNeeded;
>
>     /*
>-     * NB: do not pack another flag here by claiming gcPadding unless the new
>-     * flag is written only by the GC thread. Atomic updates to packed bytes
>-     * are not guaranteed, so stores issued by one thread may be lost due to
>+     * We can pack these flags as only the GC thread writes to them. Atomic
>+     * updates to packed bytes are not guaranteed, so stores issued by one
>+     * thread may be lost due to unsynchronized read-modify-write cycles on
>+     * other threads. NB: do not pack another flag here unless the new flag is
>+     * written only by the GC thread. Atomic updates to packed bytes are not
>+     * guaranteed, so stores issued by one thread may be lost due to
>      * unsynchronized read-modify-write cycles on other threads.

nit: a little bit redundant?

> void
> JSRuntime::setGCLastBytes(size_t lastBytes)
> {
>     gcLastBytes = lastBytes;
>-    uint64 triggerBytes = uint64(lastBytes) * uint64(gcTriggerFactor / 100);
>-    if (triggerBytes != size_t(triggerBytes))
>-        triggerBytes = size_t(-1);
>-    gcTriggerBytes = size_t(triggerBytes);
>+    float trigger1 = float(lastBytes) * float(gcTriggerFactor) / 100.0f;
>+    float trigger2 = float(Max(lastBytes, GC_ARENA_ALLOCATION_TRIGGER)) *
>+                     GC_HEAP_GROWTH_FACTOR;
>+    float maxtriger = Max(trigger1, trigger2);

nit: maxTrigger
Maybe more meaningful names for trigger1, trigger2?
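For illustration, the reviewed function with both nits applied (maxTrigger spelling, and descriptive names in place of trigger1/trigger2) might read as below. The constant values and the RuntimeState struct are placeholders, not the real SpiderMonkey definitions:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Placeholder values; the real constants are defined in the GC sources.
static const size_t GC_ARENA_ALLOCATION_TRIGGER = 30 * 1024 * 1024;  // hypothetical
static const float GC_HEAP_GROWTH_FACTOR = 3.0f;                     // hypothetical

struct RuntimeState {
    size_t gcLastBytes;
    size_t gcTriggerBytes;
    uint32_t gcTriggerFactor;  // percent
};

void SetGCLastBytes(RuntimeState* rt, size_t lastBytes) {
    rt->gcLastBytes = lastBytes;
    // "trigger1" in the patch: scale the last heap size by gcTriggerFactor.
    float factorTrigger = float(lastBytes) * float(rt->gcTriggerFactor) / 100.0f;
    // "trigger2" in the patch: grow from the arena-allocation floor.
    float arenaTrigger = float(std::max(lastBytes, GC_ARENA_ALLOCATION_TRIGGER)) *
                         GC_HEAP_GROWTH_FACTOR;
    float maxTrigger = std::max(factorTrigger, arenaTrigger);  // was "maxtriger"
    rt->gcTriggerBytes = maxTrigger >= float(SIZE_MAX) ? SIZE_MAX
                                                       : size_t(maxTrigger);
}
```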
Attachment #482039 - Flags: review+
Assignee
Comment 11 • 14 years ago
Here is a rebased patch with nits addressed.
Attachment #482039 - Attachment is obsolete: true
Attachment #482814 - Flags: review+
Assignee
Comment 12 • 14 years ago
http://hg.mozilla.org/tracemonkey/rev/f976b72fafaf
Whiteboard: fixed-in-tracemonkey
Comment 13 • 14 years ago
http://hg.mozilla.org/mozilla-central/rev/f976b72fafaf
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 14 • 14 years ago
> >+ float maxtriger = Max(trigger1, trigger2);
>
> nit: maxTrigger

There is still the maxtriger in the code.
Assignee
Comment 15 • 14 years ago
(In reply to comment #14)
> > >+ float maxtriger = Max(trigger1, trigger2);
> >
> > nit: maxTrigger
>
> There is still the maxtriger in the code.

Sorry about that, I will fix it in bug 603916.