Closed Bug 486504 Opened 15 years ago Closed 15 years ago

Collection policy changes for efficiency and incrementality

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P2)

defect

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: lhansen, Assigned: lhansen)

References

Details

Attachments

(1 file, 19 obsolete files)

11.22 KB, patch
Details | Diff | Splinter Review
The GC policy isn't always right.  I will track those issues here.

So far I have identified at least four significant issues.

The first is that GC::AllocBlockIncremental may run a full collection because it panics when it discovers that a block can't be allocated without expanding the heap.  It's easy to construct examples where this full collection locks up the system for more than a second on slowish (400MHz) devices, not counting finalization.

The second is that GC::Finalize scans the entire heap and runs finalizers on all finalizable dead objects, which is to say, almost all dead objects - because almost all objects are RCObjects.  The example I mentioned above can spend over 300ms in finalization on my 400MHz PowerBook G3.

The third is that ZCT reaping time is not bounded.  When large data structures are dropped the time to delete them depends on the size of the data structure and the time to delete each object in it.

The fourth is that incremental collection work is controlled by two time parameters: the quantum of marking and the minimum time between increments.  But in order to guarantee that heap consumption stays reasonably within the parameters we've selected marking must be driven by allocation work.  In particular, I suspect that even when the allocator picks an object off the free list we should consider running the collector.  Why?  Because otherwise we're going to allocate everything that's on the free lists, and /then/ we'll allocate a fresh block before the collector can even consider starting to run, and /then/ it will only run once we've allocated heapsize/4 memory.  For a program in the steady state this means that the heap may expand without limit just in order to keep the collector running.

(There's a related bug, which is that marking time is bounded by a time quantum but there's no guarantee that it will finish within that time.  I've seen mark increments take 250ms when the quantum was 5ms.  The recursive marker is probably to blame for this, as the "time" to mark one item varies significantly depending on whether that item references many on-page objects.  It's not hard to construct programs that result in those situations.)
This generalization makes the policy manager capture the elapsed time for each phase.  This can be used for accounting and profiling (the GC has some of the same code and we can replace the GC code with a call to the policy manager by and by, if desired) and I'm introducing it so that I can track pause times for each phase by means of simple instrumentation in the policy manager.
Attachment #370677 - Flags: review?(treilly)
Comment on attachment 370677 [details] [diff] [review]
Slight generalization of policy manager

Coolio, it'd be nice if the max pause time could be tracked and included in the DumpMemoryInfo output, including what event caused it

Also how useful is a running total of time w/o the number of times that event happened, I guess if you are comparing to total program runtime these sums could be useful
Attachment #370677 - Flags: review?(treilly) → review+
(In reply to comment #2)
> (From update of attachment 370677 [details] [diff] [review])
> Coolio, it'd be nice if the max pause time could be tracked and included in the
> DumpMemoryInfo output, including what event caused it
> 
> Also how useful is a running total of time w/o the number of times that event
> happened, I guess if you are comparing to total program runtime these sums
> could be useful

Noted.  I will add whatever it takes to make these useful / supplant whatever code is currently in the GC that does very similar things.
Comment on attachment 370677 [details] [diff] [review]
Slight generalization of policy manager

redux changeset: 1663:761e61b0d309
Attached file The test case "ofib" (obsolete) —
"ofib" stresses GC and is impervious to RC, see comments in the file for more details.

When I run this on my 400MHz PowerBook G3, the "panic" mode of MMgc (where it pushes a collection through to finalization instead of allocating a fresh block from the heap) causes FinalRootAndStackScan to take about 1100ms once the program reaches a comfortable working set, with FinalizeAndSweep taking another 390ms, for a total stop-the-world pause of 1500ms, give or take.  Almost no incremental mark steps are run during the ramp-up to the comfort zone, lots of full collections are run instead, and even when the comfort zone is reached not many mark steps are run.  GC accounts for nearly 60% of the running time.

The comfortable heap size - very informal measurement here, more to come later! - is about 1.8 times the maximum live size of 23MB, or about 42MB.

(I have pending logging code that exhibits these stats, will be posted here in a bit.)
Attached file The test case "ofib-rc" (obsolete) —
This test case demonstrates the unboundedness of RC reaping times.  Turns out it's a little tricky to demonstrate this because of the GC's twitchiness ("panic"), and ofib-rc will fail to show anything if the GC happens to be running while the ZCT is supposed to be filling up, but in practice it works well enough: On the venerable PowerBook, the ZCT reap that occurs in response to all the assignments takes about 1050ms.  The live data volume is about the same as for "ofib".
A commentary on the two test programs: If we change ofib so that its nodes have no parent links then in principle the reference counting should be able to reclaim some of the storage, but in practice this does not happen.  (I tried.)  The direct reason is that the GC runs very aggressively, but I suspect that even if it didn't we'd still see reference counting not performing reclamation.  The reason I say that is that RC reaping is driven only by the fullness of the ZCT, not by the need for memory (which is what drives the GC).  But fullness is in terms of the number of pointers, not in terms of the amount of storage that might be reclaimed (which is not easily knowable of course).

So a possible refinement to the overall policy is to treat the ZCT reaping as a true minor collection, and when we would normally trigger a collection we should /first/ try to reap the ZCT to see if that frees any memory - which is more approximating what a simple generational collector would do.
(In reply to comment #0)
> For a
> program in the steady state this means that the heap may expand without limit
> just in order to keep the collector running.

I no longer believe that, because heap expansion triggers an immediate synchronous full collection, but I will revisit the issue once the policy has been changed so that we no longer block for 1s or more on slower machines - as doing so will almost certainly mean disabling that full collection.
Another parameter that could/should drive GC work is the write barrier: if the mark stack is filling up very quickly due to write barrier work, incremental marking should try to keep up with it.
Comment on attachment 370677 [details] [diff] [review]
Slight generalization of policy manager

370677 committed, hence obsolete.
Attachment #370677 - Attachment is obsolete: true
Three main things going on here:

Introduces a GCManager which is a member of GCHeap, used to track all the GCs in the system.  For now a gentle generalization of the code that was already in GCHeap but more will probably show up here - the intent is to centralize the access to all GCs from everywhere, so that policy decisions can be made based on system-wide behavior, for example.  A couple of stubs (signaling start and end of collection from one GC to the others) is here and illustrates this.

Moves more accounting code out of the GC and into GCPolicyManager, with the GC then getting the code from GCPolicyManager.

Profiling facilities for latency.  For now the logging output is enabled by a special ifdef (MMGC_PROFILE_LATENCY) but profiling is virtually free and is always enabled.  GCPolicyManager exposes variables that holds profiling info.  Some further generalizations are obvious, for example propagating this info up to GCManager.  TBD.
Attachment #370864 - Flags: review?(treilly)
Attachment #370864 - Flags: review?(treilly) → review+
Target Milestone: --- → flash10.x
Depends on: 412370
Flags: flashplayer-qrb+
Priority: -- → P2
See also bug #412370 for related issues.
Depends on: 493555
Assignee: lhansen → rishah
Assignee: rishah → lhansen
Comment on attachment 370864 [details] [diff] [review]
Further refactoring / elaboration, v1

Landed some time ago.
Attachment #370864 - Attachment is obsolete: true
Attachment #384593 - Flags: review?(treilly)
Attachment #384087 - Attachment is obsolete: true
Comment on attachment 384593 [details] [diff] [review]
More elaborate accounting code, in preparation for a change in policy

I would be concerned about the logToFile throwing off the profile timing information opening and closing the file every time.

Is this stuff for winmo where there's no cli?
Yeah, it's for winmo.  Note the logging volume is tiny - typically a couple of kilobytes per run for the benchmarks we're considering, scattered over a few dozen lines - and there's really no evidence that it's a problem in practice.  (I suppose that could be tested fairly easily but writing a test that just opens and closes the file a few thousand times.)
Attachment #384608 - Attachment is obsolete: true
Blocks: 499965
No longer depends on: 493555
Summary: MMgc collection policy does not guarantee even soft incrementality → Collection policy changes for efficiency and incrementality
Comment on attachment 384593 [details] [diff] [review]
More elaborate accounting code, in preparation for a change in policy

GCPolicyManager::ticksToMs
 and GC::ticksToMillis are redundant no?   Or use ticksToMicros to enable higher res full non-FP math.
(In reply to comment #21)
> (From update of attachment 384593 [details] [diff] [review])
> GCPolicyManager::ticksToMs
>  and GC::ticksToMillis are redundant no?   Or use ticksToMicros to enable
> higher res full non-FP math.

FP doesn't matter in the case of the policy code, but yes, I'm aware of the redundancy.  Duplicating the function may have been a revolt against a function like that being inside the GC class in the first place (rather than global :-).  I'll tidy it up.
(In reply to comment #18)
> (From update of attachment 384593 [details] [diff] [review])
> I would be concerned about the logToFile throwing off the profile timing
> information opening and closing the file every time.

Indeed, the open/write/close cycle takes about 19ms on the device.  This is a little too much for comfort.
Calling fflush after every write cuts the time down to the .4ms range per write, which will be adequate in practice I think.  New patch coming up.
Changes since last patch:

Avoids the fopen/fclose cost by using fflush after fprintf; this has been measured and has acceptable cost.  GCHeap constructor redirects output and destructor undoes the redirect.

I decided to keep GCPolicyManager::ticksToMs() because it returns a double, which I need (since the WinMo libraries don't work with %llu in printf strings, and since I actually want floating-point formatting), but since it's only used for policy profiling I made it #ifdef MMGC_POLICY_PROFILING for now.  The floating-point math and the printfs are not performance critical for this device, the output volume is one per major GC cycle, not one per incremental mark.
Attachment #384593 - Attachment is obsolete: true
Attachment #385162 - Flags: review?(treilly)
Attachment #384593 - Flags: review?(treilly)
Attachment #384625 - Attachment is obsolete: true
Attachment #385181 - Flags: review?(treilly)
Attachment #385162 - Flags: review?(treilly) → review+
Comment on attachment 385162 [details] [diff] [review]
More elaborate accounting code, v2

I review +'d the previous version so you can assume I +'d minor tweaks to it as well
Blocks: 493555
Attached patch Final draft of accounting code (obsolete) — Splinter Review
Incorporates some refinements from the policy code patch, and also the (very minor) changes for bug #493555.
Attachment #385162 - Attachment is obsolete: true
Comment on attachment 385743 [details] [diff] [review]
Final draft of accounting code

Landed with redux changeset:   2068:48fed9328c8c
Attachment #385743 - Attachment is obsolete: true
Attachment #385181 - Attachment is obsolete: true
Attachment #385765 - Flags: review?(treilly)
Attachment #385181 - Flags: review?(treilly)
Not ready for landing; based on slightly older version of policy code; not tested.
Attached the wrong file the last time...
Attachment #385765 - Attachment is obsolete: true
Attachment #385974 - Flags: review?(treilly)
Attachment #385765 - Flags: review?(treilly)
OS: Mac OS X → All
Hardware: x86 → All
Additional accounting code was pushed as part of fix for bug #501318; this patch is clean.
Attachment #385974 - Attachment is obsolete: true
Attachment #386093 - Flags: review?(treilly)
Attachment #385974 - Flags: review?(treilly)
(In reply to comment #33)
> Created an attachment (id=386093) [details]
> Policy code, rebased to redux 2077
> 
> Additional accounting code was pushed as part of fix for bug #501318; this
> patch is clean.

Memo to self: hitZeroObjects is now redundant, should remove it from everywhere.
Comment on attachment 370842 [details]
The test case "ofib"

test/performance/mmgc/ofib.as, redux changeset 2079
Attachment #370842 - Attachment is obsolete: true
Comment on attachment 370847 [details]
The test case "ofib-rc"

test/performance/mmgc/ofib-rc.as, redux changeset 2079
Attachment #370847 - Attachment is obsolete: true
Clean, ready for final testing.
Attachment #386093 - Attachment is obsolete: true
Attachment #386093 - Flags: review?(treilly)
With compile fixes for Win64.
Attachment #386540 - Attachment is obsolete: true
Attachment #386543 - Flags: review?(treilly)
Comment on attachment 386543 [details] [diff] [review]
Policy code, rebased to redux 2085

redux changeset: 2086:4c40f2edffbf; review still desired.
Attachment #386543 - Flags: review?(treilly) → review+
Attachment #386543 - Attachment is obsolete: true
Attached patch More logging code (obsolete) — Splinter Review
Cleans up some of the write barrier logging; adds logging for ZCT reap data and a couple other things.  Minor cleanup.
Attachment #387886 - Flags: review?(treilly)
Attached patch Expose GC policy to the avmshell (obsolete) — Splinter Review
Expose central policy parameters (L, G, X) as command line arguments to the shell; add them to the GCHeap's config structure; pick them up in the GC constructor.  Facilitates experimentation.
Attachment #387889 - Flags: review?(treilly)
Small fry here, but beneficial.  The GC trigger is now dependent on the load factor, and the initial "guessed" scanning rate is chosen to be not laughably low but in tune with what we can expect on a netbook, say (decent average between smartphone and desktop).  This avoids some pathological startup behavior.
Attachment #387897 - Flags: review?(treilly)
Comment on attachment 387886 [details] [diff] [review]
More logging code

+1 I think the gcstats stuff should be consolidated with the MMGC_POLICY_PROFILING, I was trying to make gcstats work in all builds but forcing a re-compile to get detailed stats is probably better than slowing down release builds.
Attachment #387886 - Flags: review?(treilly) → review+
Comment on attachment 387889 [details] [diff] [review]
Expose GC policy to the avmshell

at some point replacing the single letter identifiers with real names would be nice to make things more readable.
Attachment #387889 - Flags: review?(treilly) → review+
Comment on attachment 387897 [details] [diff] [review]
Tweaks to policy settings; comments; tuning

minor nit: I think setting something to zero is clearer than subtracting itself
Attachment #387897 - Flags: review?(treilly) → review+
(In reply to comment #45)
> (From update of attachment 387897 [details] [diff] [review])
> minor nit: I think setting something to zero is clearer than subtracting itself

I agree, but if you're thinking of this line:

  remainingMajorAllocationBudget -= remainingMinorAllocationBudget

then that's not what's happening.  (If you're not thinking of that line I'd like to know which one you're thinking of so I can fix it.)
Comment on attachment 387886 [details] [diff] [review]
More logging code

redux changeset:   2127:527643bd2af9
Attachment #387886 - Attachment is obsolete: true
Comment on attachment 387889 [details] [diff] [review]
Expose GC policy to the avmshell

redux changeset:   2128:0e7e6606d079
Attachment #387889 - Attachment is obsolete: true
Comment on attachment 387897 [details] [diff] [review]
Tweaks to policy settings; comments; tuning

redux changeset:   2129:ee312d424029
Attachment #387897 - Attachment is obsolete: true
is the feature completed?  mark bug resolved/fixed?
Flags: flashplayer-triage+
(In reply to comment #50)
> is the feature completed?  mark bug resolved/fixed?

I don't think so, but the bug report is getting unmanageable.  I'll deal with it.
Remaining work here:

* Global heap pressure has yet to be incorporated into the policy code
* ZCT reap pauses are not bounded
* Sweep+Finalize pauses are not bounded (though I think that's out of scope until
  we fix how finalization is done)

I will create separate bugs for those and close this one.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Resolved fixed engineering / work item that has been pushed.  Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: