A bug to serve as an anchor for ongoing work that improves memory management efficiency through opportunistic, non-disruptive changes.
Created attachment 400441 [details] [diff] [review] New mark bits structure Work in progress but tested and clean. (See next patch for performance numbers.) This does two things: - changes the number of mark etc bits per object from 4 to 8 - changes the layout of the bits vector so that computing the index into the vector requires many fewer instructions (GetIndex has been specialized into GetBitsIndex to compute that index, and GetObjectIndex to compute the offset of the object in the page). See comment in GCAlloc::GCAlloc. The change in the number of mark bits is required because I need more bits for ongoing work on generational collection / a static area. That change could have been made independent of the other fix. The second fix was made in order to reduce the cost it takes to set the mark on an object; that does not move the needle much currently but will probably make more of a difference when we move to partially exact marking, because then the cost of setting the mark bit will be more important than it is now.
Created attachment 400442 [details] Preliminary performance numbers for the change to bit representation MacBook Pro only. I need numbers from Windows for sure (different compiler) and from some ARM device. See the file header for some preliminary analysis.
(In reply to comment #1) > Created an attachment (id=400441) [details] > New mark bits structure BTW this incorporates the fixes in bug #516161 and bug #516332.
Priority: -- → P3
Summary: Quiet revolution → [tracking] Quiet revolution
Target Milestone: --- → flash10.1
Summary: [tracking] Quiet revolution → Quiet revolution
Created attachment 408820 [details] [diff] [review] New mark bits structure, v2 Adapted to redux 2904.
Created attachment 408837 [details] [diff] [review] Mutator changes for partly-exact tracing This is more a proof of concept than something finished, but it shows how ScriptObject can be traced exactly (or in a type-aware way, for '*' slots). There are a couple of shortcomings here: dictionaries are not handled yet, and dynamic properties could be handled smarter. There are some unclear things about the representation of native 'double' types that need to be worked out. And of course we want support for many more data types. And after that we need integration with and testing in the Player. All that said, this code does not break the VM when running the acceptance tests or performance tests. (Only tested on my MacPro.)
How to do partly-exact tracing in MMgc: It's a standard trick: mark objects that can be traced exactly with a special flag, and look for this flag in the tracer (GC::MarkItem). If the flag is set, invoke a custom tracer on the object (based on the value of the flag). In the simplest version, assume that if the flag is set then the object is a subtype of GCFinalizedObject; so call that class's virtual GCTrace method to perform the tracing. In a more complicated version, one flag value specifies a virtual method for tracing, others specify specific object types that don't have a VTable (pure GCObject subclasses). For now we have these parts that build one upon the other: - "new mark bits structure" patch: - provide space for the flag on a per-object basis, so we don't have to segregate allocators even more. The patch could be simpler, but the consequence of going to a byte per object means a lot of complexity drops away and the patch contains the resulting cleanup. - "gc support for partly-exact tracing" patch: - a facility to set the "exact" flag during object allocation, for qualifying object types - a check for the flag in the GC, with dispatch to a virtual method - support code that the custom GCTrace methods can call to trace known and unknown pointers - some refactoring to make it all hang together - "mutator changes for partly-exact tracing" patch: - changes to the ScriptObject constructor to allow a flag to be passed and set - custom allocator for ScriptObject that makes sure to set the exact flag by passing it to the constructor; this was not strictly necessary perhaps but simplified the client code - a custom tracer for ScriptObject, slot arrays, atom arrays While a host application probably would want to implement more custom tracers to take full benefit of the facility, and while we want to provide a custom tracer for StringObject, at least, and maybe other VM objects, handling just ScriptObject should take care of every AS3 class that doesn't have a native aspect that contains pointers that must be traced. In terms of benchmark results, some of our standard benchmarks are slightly faster with this fix, but the big win in the short term is getting away from conservative marking with the risks of leaks that come with that.
Sweet this is what we need to get rid of false positives and speed up marking. Just wanted to note that its valuable to distinguish non-pointer pages from pointer pages to make the marker more efficient on systems with significant caches (ie desktop). flatter memory models probably don't benefit as much, should measure. Taking this logical to the extreme we might also look at moving the per object bits completely off the page for non-pointer objects and have a way to look up the bits for an address without touching the page.
Created attachment 408849 [details] [diff] [review] More mutator changes: exact tracing for StringObject
Nice! Be aware that the slots patch (https://bugzilla.mozilla.org/show_bug.cgi?id=512485) changes ScriptObject slot layout, but I don't think it will affect your approach in any meaningful way (it still relies on the slotdestroyinfo bit array) BTW, you are correct, the slotdestroyinfo bit array doesn't currently take doubles into account, only things known to be RCObjects or Atoms.
Created attachment 408938 [details] [diff] [review] Simple generational collection This patch is relative to the older redux 82b8860b31df (ca 2585), and is attached here as work-in-progress that - so far as I remember - works more or less according to spec but needs rebasing. To understand this one, observe that a simple generational collector is a collector that does not clear the mark bits at the end of a collection cycle and which runs the write barrier all the time. Occasionally it must collect the entire heap, which it does by clearing the mark bits and triggering a normal incremental collection. This is controlled (if that's the word for it) by policy.
One more thought on exact tracing. A substantial number of object types in the VM are derived from GCObject. On the one hand we'd like to trace these exactly without making them any larger, because there are many of them and making them larger is not a good thing. On the other hand, absent a virtual method or a function pointer in each of them, we must have many bits available to create a type discriminator for all the important types, and we must switch on this discriminator when we trace. Suppose we go with one byte per object for the various bits; four bits are already used, at most that gives us 14 different types (plus virtual method and conservatively traced). That's not a lot, so that means moving to two bytes per object. At that point, it becomes interesting to ask whether it is not simply better to make the interesting classes derive from GCFinalizedObject, so that they have a vtable and can use the virtual trace method. If the vtable introduces some sort of alignment issue in the object we risk paying two words for it - and that could be bad. In that case a new base class, GCObjectTraced could be used, which introduces an explicit function pointer - but that too could have some alignment issues. So measurements will be important. But it seems likely that we'll have to pay some memory here to get exact tracing, given the large number of different types we have.
Since ScriptObject and all AS3 user types already derive from GCFinalizedObject->RCObject, what are the chances that 14 of the remaining (many) GCObject-derived types represent the bulk of the non-GCFO population? if its any consolation, we could save a pointer per ScriptObject by exploiting the fact that user defined ScriptObject subclasses always share a common value for __proto__ (ScriptObject.delegate) -- we could move that to VTable. only vanilla instances of Object constructed with a user defined function have varying __proto__
(In reply to comment #13) > Since ScriptObject and all AS3 user types already derive from > GCFinalizedObject->RCObject, what are the chances that 14 of the remaining > (many) GCObject-derived types represent the bulk of the non-GCFO population? I don't know yet. I observe: class VTable : public MMgc::GCObject class Traits : public MMgc::GCObject class MultinameHashtable : public MMgc::GCObject class ImtEntry: public MMgc::GCObject class NamespaceSet : public MMgc::GCObject class QCachedItem : public MMgc::GCObject class ScopeTypeChain : public MMgc::GCObject class ScopeChain : public MMgc::GCObject class ExceptionHandlerTable : public MMgc::GCObject class E4XNodeAux : public MMgc::GCObject class E4XNode : public MMgc::GCObject class LineNumberRecord : public MMgc::GCObject class CodeContext : public MMgc::GCObject as well as class TextE4XNode : public E4XNode class CommentE4XNode : public E4XNode class AttributeE4XNode : public E4XNode class CDATAE4XNode : public E4XNode class PIE4XNode : public E4XNode class ElementE4XNode : public E4XNode It's my fault for not static the case precisely previously. Observe that we derive two benefits from exact tracing, namely precision (thus less false retention) and performance (when we're exactly tracing an object that has few pointers relative to its size). Performance improves when we don't have to guess at what a field is, and if we have many objects with a significant fraction of non-pointer fields. Precision improves with the fraction of all object types that have /any/ non-pointer fields that we trace exactly, because exact tracing precludes misidentifying int/double data as pointer data. So for performance we mainly care about GCObject-derived types that have many non-pointer fields, and of which there's a significant volume. Some profiling needs to happen to find this out. But for precision we care about many more types, probably.
Makes sense. for the sake of precision, one would think we'd end up gaining back (some-of|more-than) the memory we used, by reducing false positives. Having fast paths for 14 types seems mostly a perf win (maybe). and there's still the sort-of-goal of changing some of these to fixed mem. i'm rambling now.
Priority: P5 → --
Target Milestone: flash10.2 → Future
Comment on attachment 408820 [details] [diff] [review] New mark bits structure, v2 Moved to bug 575316.
Attachment #408820 - Attachment is obsolete: true
Comment on attachment 400442 [details] Preliminary performance numbers for the change to bit representation Moved to bug 575316.
Attachment #400442 - Attachment is obsolete: true
Comment on attachment 408837 [details] [diff] [review] Mutator changes for partly-exact tracing Moved to bug #576307.
Attachment #408837 - Attachment is obsolete: true
Comment on attachment 408836 [details] [diff] [review] GC support for partly-exact tracing Moved to bug #576307.
Attachment #408836 - Attachment is obsolete: true
Comment on attachment 408849 [details] [diff] [review] More mutator changes: exact tracing for StringObject Moved to bug #576307.
Attachment #408849 - Attachment is obsolete: true
Comment on attachment 408938 [details] [diff] [review] Simple generational collection Moved to bug #576896.
Attachment #408938 - Attachment is obsolete: true
Created attachment 456023 [details] [diff] [review] New write barrier (card marking) This is old work, relative to TR 2577:637e59b7a595 (September 2009), almost certainly incomplete and not yet working, attached here for safekeeping.
Current patch queue: markBitsRewrite (bug 575316) fastBitsTweak (bug 575316) exactMarking (bug 576307) mutatorChanges (bug 576307) rcEctomy (bug 538943) generationalGC (bug 576896) rcGone (bug 538943)
You need to log in before you can comment on or make changes to this bug.