Status

9 years ago
6 years ago

People

(Reporter: lhansen, Unassigned)

Tracking

(Depends on: 6 bugs)

unspecified
Future
x86
Mac OS X
Dependency tree / graph
Bug Flags:
flashplayer-qrb +
flashplayer-triage +

Details

(Whiteboard: Tracking)

Attachments

(1 attachment, 7 obsolete attachments)

(Reporter)

Description

9 years ago
A bug to serve as an anchor for ongoing work that improves memory management efficiency through opportunistic, non-disruptive changes.
(Reporter)

Comment 1

9 years ago
Created attachment 400441 [details] [diff] [review]
New mark bits structure

Work in progress but tested and clean.  (See next patch for performance numbers.)  This does two things:

 - changes the number of mark etc bits per object from 4 to 8

 - changes the layout of the bits vector so that computing the index into
   the vector requires many fewer instructions (GetIndex has been specialized
   into GetBitsIndex to compute that index, and GetObjectIndex to compute
   the offset of the object in the page).  See comment in GCAlloc::GCAlloc.

The change in the number of mark bits is required because I need more bits for ongoing work on generational collection / a static area.  That change could have been made independent of the other fix.  The second fix was made in order to reduce the cost it takes to set the mark on an object; that does not move the needle much currently but will probably make more of a difference when we move to partially exact marking, because then the cost of setting the mark bit will be more important than it is now.
(Reporter)

Comment 2

9 years ago
Created attachment 400442 [details]
Preliminary performance numbers for the change to bit representation

MacBook Pro only.  I need numbers from Windows for sure (different compiler) and from some ARM device.  See the file header for some preliminary analysis.
(Reporter)

Comment 3

9 years ago
(In reply to comment #1)
> Created an attachment (id=400441) [details]
> New mark bits structure

BTW this incorporates the fixes in bug #516161 and bug #516332.
(Reporter)

Updated

9 years ago
Priority: -- → P3
Summary: Quiet revolution → [tracking] Quiet revolution
Target Milestone: --- → flash10.1
(Reporter)

Updated

9 years ago
Summary: [tracking] Quiet revolution → Quiet revolution
Whiteboard: Tracking
(Reporter)

Updated

9 years ago
Priority: P3 → P5

Updated

9 years ago
Flags: flashplayer-triage+
(Reporter)

Comment 4

9 years ago
Created attachment 408820 [details] [diff] [review]
New mark bits structure, v2

Adapted to redux 2904.
(Reporter)

Updated

9 years ago
Attachment #400441 - Attachment is obsolete: true
(Reporter)

Comment 5

9 years ago
Created attachment 408836 [details] [diff] [review]
GC support for partly-exact tracing
(Reporter)

Comment 6

9 years ago
Created attachment 408837 [details] [diff] [review]
Mutator changes for partly-exact tracing

This is more a proof of concept than something finished, but it shows how ScriptObject can be traced exactly (or in a type-aware way, for '*' slots).  There are a couple of shortcomings here: dictionaries are not handled yet, and dynamic properties could be handled smarter.  There are some unclear things about the representation of native 'double' types that need to be worked out.  And of course we want support for many more data types.  And after that we need integration with and testing in the Player.

All that said, this code does not break the VM when running the acceptance tests or performance tests.

(Only tested on my MacPro.)
(Reporter)

Comment 7

9 years ago
How to do partly-exact tracing in MMgc:

It's a standard trick: mark objects that can be traced exactly with a special flag, and look for this flag in the tracer (GC::MarkItem).  If the flag is set, invoke a custom tracer on the object (based on the value of the flag).

In the simplest version, assume that if the flag is set then the object is a subtype of GCFinalizedObject; so call that class's virtual GCTrace method to perform the tracing.

In a more complicated version, one flag value specifies a virtual method for tracing, others specify specific object types that don't have a VTable (pure GCObject subclasses).

For now we have these parts that build one upon the other:

- "new mark bits structure" patch:

  - provide space for the flag on a per-object 
    basis, so we don't have to segregate allocators 
    even more.  The patch could be simpler, but
    the consequence of going to a byte per object
    means a lot of complexity drops away and the
    patch contains the resulting cleanup.

- "gc support for partly-exact tracing" patch:

  - a facility to set the "exact" flag during object 
    allocation, for qualifying object types

  - a check for the flag in the GC, with dispatch to 
    a virtual method

  - support code that the custom GCTrace methods can 
    call to trace known and unknown pointers

  - some refactoring to make it all hang together

- "mutator changes for partly-exact tracing" patch:

  - changes to the ScriptObject constructor to allow
    a flag to be passed and set

  - custom allocator for ScriptObject that makes sure
    to set the exact flag by passing it to the
    constructor; this was not strictly necessary
    perhaps but simplified the client code

  - a custom tracer for ScriptObject, slot arrays,
    atom arrays

While a host application probably would want to implement more custom tracers to take full benefit of the facility, and while we want to provide a custom tracer for StringObject, at least, and maybe other VM objects, handling just ScriptObject should take care of every AS3 class that doesn't have a native aspect that contains pointers that must be traced.

In terms of benchmark results, some of our standard benchmarks are slightly faster with this fix, but the big win in the short term is getting away from
conservative marking with the risks of leaks that come with that.

Comment 8

9 years ago
Sweet this is what we need to get rid of false positives and speed up marking.  Just wanted to note that its valuable to distinguish non-pointer pages from pointer pages to make the marker more efficient on systems with significant caches (ie desktop).  flatter memory models probably don't benefit as much, should measure.  Taking this logical to the extreme we might also look at moving the per object bits completely off the page for non-pointer objects and have a way to look up the bits for an address without touching the page.
(Reporter)

Comment 9

9 years ago
Created attachment 408849 [details] [diff] [review]
More mutator changes: exact tracing for StringObject

Comment 10

9 years ago
Nice! Be aware that the slots patch (https://bugzilla.mozilla.org/show_bug.cgi?id=512485) changes ScriptObject slot layout, but I don't think it will affect your approach in any meaningful way (it still relies on the slotdestroyinfo bit array)

BTW, you are correct, the slotdestroyinfo bit array doesn't currently take doubles into account, only things known to be RCObjects or Atoms.
(Reporter)

Comment 11

9 years ago
Created attachment 408938 [details] [diff] [review]
Simple generational collection

This patch is relative to the older redux 82b8860b31df (ca 2585), and is attached here as work-in-progress that - so far as I remember - works more or less according to spec but needs rebasing.

To understand this one, observe that a simple generational collector is a collector that does not clear the mark bits at the end of a collection cycle and which runs the write barrier all the time.  Occasionally it must collect the entire heap, which it does by clearing the mark bits and triggering a normal incremental collection.  This is controlled (if that's the word for it) by policy.
(Reporter)

Comment 12

9 years ago
One more thought on exact tracing.

A substantial number of object types in the VM are derived from GCObject.  On the one hand we'd like to trace these exactly without making them any larger, because there are many of them and making them larger is not a good thing.  On the other hand, absent a virtual method or a function pointer in each of them, we must have many bits available to create a type discriminator for all the important types, and we must switch on this discriminator when we trace.  Suppose we go with one byte per object for the various bits; four bits are already used, at most that gives us 14 different types (plus virtual method and conservatively traced).  That's not a lot, so that means moving to two bytes per object.  At that point, it becomes interesting to ask whether it is not simply better to make the interesting classes derive from GCFinalizedObject, so that they have a vtable and can use the virtual trace method.  If the vtable introduces some sort of alignment issue in the object we risk paying two words for it - and that could be bad.  In that case a new base class, GCObjectTraced could be used, which introduces an explicit function pointer - but that too could have some alignment issues.  So measurements will be important.  But it seems likely that we'll have to pay some memory here to get exact tracing, given the large number of different types we have.

Comment 13

9 years ago
Since ScriptObject and all AS3 user types already derive from GCFinalizedObject->RCObject, what are the chances that 14 of the remaining (many) GCObject-derived types represent the bulk of the non-GCFO population?

if its any consolation, we could save a pointer per ScriptObject by exploiting the fact that user defined ScriptObject subclasses always share a common value for __proto__ (ScriptObject.delegate) -- we could move that to VTable.  only vanilla instances of Object constructed with a user defined function have varying __proto__
(Reporter)

Comment 14

9 years ago
(In reply to comment #13)
> Since ScriptObject and all AS3 user types already derive from
> GCFinalizedObject->RCObject, what are the chances that 14 of the remaining
> (many) GCObject-derived types represent the bulk of the non-GCFO population?

I don't know yet.  I observe:

class VTable : public MMgc::GCObject
class Traits : public MMgc::GCObject 
class MultinameHashtable : public MMgc::GCObject
class ImtEntry: public MMgc::GCObject
class NamespaceSet : public MMgc::GCObject
class QCachedItem : public MMgc::GCObject
class ScopeTypeChain : public MMgc::GCObject
class ScopeChain : public MMgc::GCObject
class ExceptionHandlerTable : public MMgc::GCObject
class E4XNodeAux : public MMgc::GCObject
class E4XNode : public MMgc::GCObject
class LineNumberRecord : public MMgc::GCObject
class CodeContext : public MMgc::GCObject

as well as

class TextE4XNode : public E4XNode
class CommentE4XNode : public E4XNode
class AttributeE4XNode : public E4XNode
class CDATAE4XNode : public E4XNode
class PIE4XNode : public E4XNode
class ElementE4XNode : public E4XNode

It's my fault for not static the case precisely previously.  Observe that we derive two benefits from exact tracing, namely precision (thus less false retention) and performance (when we're exactly tracing an object that has few pointers relative to its size).

Performance improves when we don't have to guess at what a field is, and if we have many objects with a significant fraction of non-pointer fields.

Precision improves with the fraction of all object types that have /any/ non-pointer fields that we trace exactly, because exact tracing precludes misidentifying int/double data as pointer data.

So for performance we mainly care about GCObject-derived types that have many non-pointer fields, and of which there's a significant volume.  Some profiling needs to happen to find this out.

But for precision we care about many more types, probably.

Comment 15

9 years ago
Makes sense.  for the sake of precision, one would think we'd end up gaining back (some-of|more-than) the memory we used, by reducing false positives.  Having fast paths for 14 types seems mostly a perf win (maybe).  and there's still the sort-of-goal of changing some of these to fixed mem.  i'm rambling now.

Updated

9 years ago
Target Milestone: flash10.1 → flash10.2
(Reporter)

Updated

9 years ago
Depends on: 538943
(Reporter)

Updated

9 years ago
Assignee: lhansen → nobody
Status: ASSIGNED → NEW
(Reporter)

Updated

8 years ago
Priority: P5 → --
Target Milestone: flash10.2 → Future
(Reporter)

Updated

8 years ago
Depends on: 575316
(Reporter)

Comment 16

8 years ago
Comment on attachment 408820 [details] [diff] [review]
New mark bits structure, v2

Moved to bug 575316.
Attachment #408820 - Attachment is obsolete: true
(Reporter)

Comment 17

8 years ago
Comment on attachment 400442 [details]
Preliminary performance numbers for the change to bit representation

Moved to bug 575316.
Attachment #400442 - Attachment is obsolete: true
(Reporter)

Comment 18

8 years ago
Comment on attachment 408837 [details] [diff] [review]
Mutator changes for partly-exact tracing

Moved to bug #576307.
Attachment #408837 - Attachment is obsolete: true
(Reporter)

Comment 19

8 years ago
Comment on attachment 408836 [details] [diff] [review]
GC support for partly-exact tracing

Moved to bug #576307.
Attachment #408836 - Attachment is obsolete: true
(Reporter)

Comment 20

8 years ago
Comment on attachment 408849 [details] [diff] [review]
More mutator changes: exact tracing for StringObject

Moved to bug #576307.
Attachment #408849 - Attachment is obsolete: true
(Reporter)

Updated

8 years ago
Depends on: 576307
(Reporter)

Comment 21

8 years ago
Comment on attachment 408938 [details] [diff] [review]
Simple generational collection

Moved to bug #576896.
Attachment #408938 - Attachment is obsolete: true
(Reporter)

Updated

8 years ago
Depends on: 576896
(Reporter)

Comment 22

8 years ago
Created attachment 456023 [details] [diff] [review]
New write barrier (card marking)

This is old work, relative to TR 2577:637e59b7a595 (September 2009), almost certainly incomplete and not yet working, attached here for safekeeping.
(Reporter)

Updated

8 years ago
Depends on: 577118
(Reporter)

Comment 23

8 years ago
Current patch queue:

markBitsRewrite  (bug 575316)
fastBitsTweak    (bug 575316)
exactMarking     (bug 576307)
mutatorChanges   (bug 576307)
rcEctomy         (bug 538943)
generationalGC   (bug 576896)
rcGone           (bug 538943)
(Reporter)

Updated

8 years ago
Depends on: 588364

Updated

8 years ago
Assignee: nobody → lhansen
(Reporter)

Updated

8 years ago
Depends on: 600564
(Reporter)

Updated

8 years ago
Depends on: 603606
(Reporter)

Updated

8 years ago
Depends on: 604333
(Reporter)

Updated

8 years ago
Depends on: 581016
(Reporter)

Updated

8 years ago
Depends on: 594756
(Reporter)

Updated

8 years ago
Depends on: 611484
(Reporter)

Updated

8 years ago
Depends on: 627025
(Reporter)

Updated

8 years ago
No longer depends on: 538943, 576896, 604333
(Reporter)

Updated

8 years ago
No longer depends on: 581016
(Reporter)

Updated

7 years ago
Depends on: 654943
(Reporter)

Updated

7 years ago
Assignee: lhansen → nobody

Updated

7 years ago
Flags: flashplayer-qrb+
Assignee: nobody → fklockii
Assignee: fklockii → nobody
You need to log in before you can comment on or make changes to this bug.