Open Bug 634503 Opened 9 years ago Updated 5 years ago

Investigate V8's GC

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

People

(Reporter: dvander, Unassigned)

References

(Blocks 1 open bug)

Details

Fact-finding bug for information about V8's GC. The first comment will be general VM stuff and then I'll do a second one about JIT integration.
V8's has a single heap per process, divided into the following components:
 * NewSpace: New object allocations go here. Capacity is 8MB. It's divided into two contiguous semispaces, and periodically collected using Cheney's algorithm. Uses bump-allocation.
 * OldSpaces: These allocate memory in 64-128KB chunks of 8KB pages. They come in the following flavors:
   * MapSpace:     Exclusively for Maps (similar to js::Shape), and Maps *always* go here.
   * CodeSpace:    Exclusively for Code.
   * DataSpace:    For flat strings and numbers; can't reference objects, except for maps.
   * PointerSpace: For anything not a Map or Code that can reference other objects.
   * CellSpace:    Seems to store global properties or something.
 * LOSpace:  For large objects, where "large" means it needs more than ~8KB of storage.

OldSpace allocation keeps a linked list of available pages, and tries to fit on either the current or next page. If it can't, it either uses the free list or tries to grab a new page.

All GC-things derive from HeapObject. A HeapObject has a header containing a "map word". During normal operation, this is a pointer to that object's shape. During compaction, this field becomes a forwarding address.

(Maps themselves are HeapObjects (there is a sentinel Map for Maps), and are analogous to a js::Class combined with js::Shape. They appear to be fixed size, containing just four fields (prototype, IC cache!, property and transition information. Also worth noting is that the GC-thing hierarchy is used far beyond only JS objects. A good example is "FixedArray," a GC-thing which represents a fixed-length vector of GC-things. This is used as a component of all sorts of structures, like hash tables, which are themselves GC-things.)

All GC pages have the following notable fields at their base:
 * A pointer to the next page.
 * Allocation watermark.
 * 32-bit field of dirty bits. Each page, no matter how large it is, is divided into 32 regions.
   This allows for a fast write barrier:
>     WriteBarrier(obj, offset) {
>        if (obj not in NewSpace) {  // This is only a mask of |obj| and compare to a precomputed address.
>           page = obj & PageMask
>           region = ((obj + offset) & ~PageMask) >> 8
>           page->dirty |= 1 << region
>        }
>     }
 * A spare word that spaces can use for indexing.
 * A special word used during mark compaction, which specifies a base forward address.

The GC can perform either a "scavenging" collection, which reclaims only gargabe in the NewSpace, or it can perform a full, mark-sweep collection. The full collection may or may not compact.

For a scavenging GC:
 (0) The inactive ("to") semispace is completely free, and the active
     ("from") semispace contains all newborn objects since the last GC.
 (1) During this process, the "map" field specifies whether an object
     has been moved or not. If it doesn't have a mark bit, then it hasn't
     been moved. If it does, then the map is actually the object's new
     address.
 (2) The first pass looks at roots, and copies objects out of the active
     "from" semispace:
   (a) If the object is already marked, the reference is updated.
       Otherwise,
   (b) If the inactive semispace is 25% full, OR if the object survived
       the last scavenge, copy it to an old space. A watermark from the
       last scavenge is kept, any object below this is old.
   (c) Promoted objects (those copied to old space) are added to a
       "promotion queue". The promotion queue is a vector at the top of
       the "to" space, guaranteed to not overflow.
   (c) If the object is not promoted, it is copied to the "to" space.
   (d) The old object's map pointer becomes a forwarding address with a
       mark bit set.
 (3) The first pass also iterates over all pages in spaces other than New
     and Data, and updates/copies any references from dirty regions to
     the NewSpace. In LO space, a dirty region can be quite large.
     Otherwise, they are 256 bytes, which is 64 addresses on x86. All
     fields in dirty regions are GC-things, so no special work is needed.
 (4) After all roots and possible intergenerational references have been
     scanned, the following algorithm is used.
   (a) Inspect the "to" semispace for any references to the "from" space.
       For each such reference, the object is copied if not yet done,
       then the reference updated.
   (b) Do the same thing for each item in the promotion queue until the
       promotion queue is empty.
   (b) If the size of the "to" semispace has changed, go back to step
       (a), scanning the unscanned portions of the "to" semispace.
 (5) The semispaces are flipped.

At the end of a GC, some statistics are collected. It determines, based on the current survival rate and last known survival rate, whether the rate is stable, increasing, or fluctuating (with some margin of error). If the rate is increasing, and the survival rate itself is high, the next full GC is scheduled sooner.

A full GC is much more involved. First, a full marking phase occurs:
 (1) For each strong root, add objects to a marking stack. The marking
     stack uses the inactive semispace, and can overflow. If it overflows,
     objects are marked with an "overflowed" bit. The whole process is
     iterative. Mark bits overlay the "map" pointer.
 (2) For each object in the marking stack:
   (a) Remove the object from the marking stack.
   (b) For each unmarked object immediately referenced from that object,
       add it to the marking stack.
 (3) As long as the marking stack is overflowed, the entire heap is
     scanned for overflowed objects. The overflow bit is only removed
     once its contents have been queued or marked as overflowed.
 (4) There's a notable concept of "object groups", special collections
     of roots where if any one object is live in the group, the entire
     group is live. A comment implies it's for DOM-like structures.
 (5) Once the marking stack is empty and not overflowed...
 (6) The large object space is swept. It is never compacted. Free objects
     are unlinked, live objects have their mark bit cleared.

The next phase can either be compaction, or sweeping. There's a heuristic for this that can be toggled by shell flags. If compaction is desired, the MapSpace builds a vector of all its pages, such that (vector[page->index] == page). This is used for a very efficient address-forwarding scheme that overlays map pointers. If the size of this list is larger than 12 bits (~32MB of pages), then the scheme doesn't work and compaction *must* be forfeited for sweeping.

For compaction:
 (1) For each GC page |p| in all spaces,
   (a) For each span of dead objects, the first word of this region
       becomes a placeholder for future passes. The placeholder says
       "skip X many bytes to reach the next live object".
   (b) For each live object, a new address is chosen starting from the
       base of that space. It is not yet copied.
     (1) If the object is the first live object on this page, set
         p->first_live = new_address
     (2) Replace the map word with a new word, encoded as such:
       * Bits 31-21 are the number of live words encountered since setting
         p->first_live.
       * Bits 20-13 are the distance between the object's map and the
         start of the object's map's page.
       * Bits 12-0 are the object's map's page index, in MapSpace.
       * Using this scheme, the map is overwritten, but both the
         map and the object's new address can be recovered.
 (2) For each object in new space, try to reserve a new address for it
     in old space. If this fails, reserve one in the other semispace.
     The forwarding address scheme is much simpler for NewSpace objects:
        offset = (obj - from_space)
        to_space[offset] = new_address
 (3) At this point, all live objects have a known forwarding pointer.
 (4) For all live objects, starting with MapSpace,
   (a) Decompose the map pointer into (new_address, map).
   (b) Find the map object's new address.
   (c) Update the object's map pointer, using the same encoding as
       earlier, to be (new_address, new_map_address).
 (5) Relocate internal object references.
 (6) Finally, perform actual object copies, taking care to preserve
     inter-generational dirty bits if necessary.
 (7) If the NewSpace is under max capacity, and a large amount of
     objects survived last GC, then double the size of both semispaces.

If instead, only a sweep is performed:
 (1) For every page, look for dead objects. If all objects are dead,
     put the page back in its space's free page list. If only some objects
     are dead, put spans of dead objects into a free list, which is
     threaded through such regions.
 (2) Try to promote all objects out of the active semispace, or copy them
     to the other semispace.
 (3) Update pointers into the new space.

Worth noting is the high amount of abstraction in the GC. Everything tends to be expressed in terms of Address, Page, *Space objects, et cetera.

It seems like our two main engine-wide difficulties in adapting this sort of model would be rooting (both in C++ code and in JIT code) and in adding write barriers. Aside from that, there's a lot to like here. Sharing IC stubs in shapes (probably a Crankshaft change), GC'ing shapes, GC'ing code (haven't investigated this yet), storing prototypes in shapes, is all good stuff.
Nice write-up. As you say, a lot to like. We could copy quite a bit in the way of design, incrementally or patch-wise to boot.

>  (4) There's a notable concept of "object groups", special collections
>      of roots where if any one object is live in the group, the entire
>      group is live. A comment implies it's for DOM-like structures.

This is like what we had pre-CycleCollector, dbaron's SCC (strongly connected components) algorithm. It handled DOM/JS cycles but not all XPCOM cycles. See bug 283129, recommended reading for all who want to take on DOM memory management.

/be
Brendan pointed out a mistake in IRC, card marking computation is (obj + addr) & PageMask, not ~PageMask. The region number is based on the offset, but the dirty bit is stored in the base page. When marking an object in the LO space, the base dirty bits are treated as the dirty bits for each individual page in that object.
Another design aspect of V8's GC is that you can't garbage collect in between every two lines of code in the engine. Internal functions tend to leave locals unrooted, performing anything fallible or impure operations at the very end of the function, after all allocation has succeeded.

Allocation failure never immediately triggers a GC. Instead a magic "retry after GC" cookie propagates back up to top-level API wrappers. These wrappers invoke VM machinery through a CALL_HEAP_FUNCTION macro. This macro invokes the VM function, and if a RetryAfterGC cookie is returned, performs collection. Then it re-runs the internal function, which is safe as long that function has been written properly.
Some grab-bag questions that came up, and their answers:

Q: Do objects ever have inline slots?
A: Yes. A Map contains a count of how many slots must be inline. When fetching a value from a slot number, the number is decremented by this count, sort of like what we used to do for JS_INITIAL_NSLOTS, except it's not constant. An object has both inline slots and dynamic slots, and dslots never points to inline slots. (Note: there is a global "empty array" value.)

Q: Do objects have flags?
A: I don't see anything like our JSObject::flags.

Q: Do objects have resolve hooks?
A: It looks like there are various ways of installing interceptors on objects, and these look kind of like our JSClass hooks.

Q: Are there host objects?
A: A Map encodes a byte that describes various kinds of internal object types. These mostly seem to assist in determining how to GC an object. I don't see anything like JSClass other than that.

Q: What exactly is a Map?
A: Our js::Shape is a fairly different beast from JSC's Structure, or v8's Map. In these two systems, a "shape" is a collection of property descriptors stored in a dictionary. In SpiderMonkey, a Shape is a single property descriptor and a link pointer to the next property in enumeration order. For small property chains, lookup traverses a linked list. For large property property sets, or dictionary-mode objects, all properties along the ancestor line are cloned, and placed into a hashtable stored in the last shape. In V8's model, there is no link pointer, but there are more tables and less shapes for dictionaries.

The main item of interest in a Map is its "descriptor" field. This is a pointer to a 2+N element FixedArray:
  [0]   - A FixedArray that lists properties and transitions.
  [1]   - An enumeration-helper that can contain an enumeration cache.
  [2+N] - Property key list. This uses linear search for < 8 keys, otherwise, binary search with hashing.
          There is a DescriptorLookupCache, globally, that keys on (DescriptorArray, PropertyName) and
          maps to an index into the DescriptorArray. It is filled after slow-searching for a property.
          (There are hash tables and they FixedArrays+probing, but it looks like descriptors are not
           stored in hash tables).

Transitions are encoded as descriptors, so a lookup can return a descriptor, and that descriptor may be a data field, or a transition, etc.

Enumeration order is encoded in a descriptor's details, but it's not threaded through or anything. Enumeration is multi-step:
 * Build a vector of all enumerable properties.
 * Sort it in enumeration order.
 * Maybe, cache the result inside the Map.
(In reply to comment #5)
> Q: What exactly is a Map?
> A: Our js::Shape is a fairly different beast from JSC's Structure, or v8's Map.

No, JSC's Structure is close to our Shape.

V8 Map (IIRC) is based on Ungar et al.'s property maps work for Self:

http://portal.acm.org/citation.cfm?id=74884

> In these two systems, a "shape" is a collection of property descriptors stored
> in a dictionary.

That's not true of JSC Structures, AFAIK -- Structures like Shapes are in a tree until one of several conditions causes a switch to dictionary mode. In either mode, the VM has one Structure or Shape per property.

> In SpiderMonkey, a Shape is a single property descriptor and a
> link pointer to the next property in enumeration order.

JSC Structures are in enumeration order until in dictionary mode, then they need extra bookkeeping for enumeration.

Our Shapes stay in enumeration order even after going to dictionary mode.

/be
(In reply to comment #6)
> (In reply to comment #5)
> > Q: What exactly is a Map?
> > A: Our js::Shape is a fairly different beast from JSC's Structure, or v8's Map.
> 
> No, JSC's Structure is close to our Shape.

Are you sure? Looking at Structure:

        union {
            TransitionTable* m_table;
            Structure* m_singleTransition;
        } m_transitions;
        PropertyMapHashTable* m_propertyTable;
 ---------^
This looks like a string -> attribute map. That seems more like V8.
Okay, I think I'm wrong, Structure does have a parent link and it's used to materialize that table sometimes.
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > Q: What exactly is a Map?
> > > A: Our js::Shape is a fairly different beast from JSC's Structure, or v8's Map.
> > 
> > No, JSC's Structure is close to our Shape.
> 
> Are you sure? Looking at Structure:
> 
>         union {
>             TransitionTable* m_table;
>             Structure* m_singleTransition;
>         } m_transitions;
>         PropertyMapHashTable* m_propertyTable;
>  ---------^
> This looks like a string -> attribute map. That seems more like V8.

No, that is like our js::KidsPointer from jspropertytree.h. Except that the JSC tree of Structures transitions are keyed by (id, attrs) only, with a mutable specificValue option. We specialize for more of the members of Shape, which can overspecialize sometimes. We both have to despecialize when things go south.

Again the key to understanding this is not just the parent linkage but the fact that in no case is there a "Structure" or "Shape" that maps, by itself and ignoring parent, *multiple* property identifiers to property metadata. These grow by induction from empty objects, one prop at a time.

/be
(In reply to comment #6)
Some general thoughts about type mapss/shapes/descriptor and how they relate to a GC:

The structure of a "type descriptor" for non-memory management concerns should have no relevance to the GC. It's a separation of concerns issues and a matter of design hygiene.  You really want to be able to change map/shape/descriptor design without having to also make substantial changes to the GC.

The only thing the GC really cares about is locating the pointers it needs to trace. This typically only requires knowing the size of the object and location of each such pointer.  Everything is simpler (in the GC) if you only have all-pointers or all-nonpointer objects. This GC-level simplification can even justify the expense of using multiple GC-level objects to represent a single complex higher level object. It you can't have such homogeneous objects, then the next best thing is to keep pointers and non-pointers in separate  contiguous regions of the object. In the worst case where you must have arbitrary layout of both pointer and non-pointer data then the GC is going to need some sort of map and it makes sense of keep this map in the same "type descriptor" as is used to support non-GC semantics.  However, it is still best to minimize any dependencies of the GC map upon any other semantics encoded in the descriptor. One good approach is to dynamically generate a tracer function for each unique GC object shape and keep a pointer to it in the type descriptor.

There is always a tension between how much metadata to keep "per object" in an object header and how much metadata to push into a shared descriptor.  Smaller headers are good for an overall footprint perspective and mean fewer bytes to copy.  However, access to shared meta state may incur the overhead of multiple levels of indirection and cache misses.

For example, the size of an object might be in the object header or it might be in the type descriptor.  Not only pointer tracing but any walk over contiguous regions of objects usually require access to the size of individual objects.  Placing that size in the type descriptor is likely to cause cache misses (at various levels of the storage hierarchy). It's usually worth having a slightly larger object header containing a size field to avoid those cache misses. The best scenario is being able to walk an entire segment of objects without having to refer to any separately located description metadata. That is why simple homogeneous object structures are so desirable.

Finally, in a highly dynamic language like JS you are likely to be creating and discarding a lot of type descriptors.  In such situations it is usually best to simply use the GC to manage them rather than to have some sort of specialized memory manager for them. Specialized mangers add overall complexity and usually require additional special cases in the main GC. You really want to built a GC that is good enough that it can manage both regular objects and metadata.
As a contrast to the V8 GC design you might want to take a look at the design described in the "JOVE Runtime Design" document available at http://www.wirfs-brock.com/allen/things/jove .

This describes a full multi-generational GC that was built for a Java compiler.
Some info on how the moving GC interacts with the JIT. Starting with the 2.x branch, first observation (new to me, perhaps not to anyone else) is that V8 actually has two full code generators for each platform. Both walk the AST, but emit different code.

The "full code generator" is intended for run-once scripts. It's not invoked on anything with loops. It always pushes values directly onto the stack, and pops them off to consume them. There is no register allocation. Registers are hardcoded. Aside from that it's similar to JM in eval/debug mode. It can call directly into the C++ runtime without any extra work, and because the stack is always reified, nothing is held in registers, and thus a moving GC poses no problem.

The other code generator has an abstraction similar to our FrameState (called a VirtualFrame), except it can nest, and two frames can be merged. Stub calls sync every frame slot, and do not preserve non-volatile registers. All registers are killed, because a GC could move items on the stack.

The other big difference from JM is how out-of-line code is generated on the main compilation path. JM1 captured the register state of slow paths, then deferred their generation until the end of compilation. The slow paths were then generated in a completely separate pass. JM2 instead emits a sync sequence in a separate, out-of-line code buffer. Stack operands and all registers are synced and killed. This is followed by a C++ call, then register state is restored before rejoining to the inline path.

V8 seems to do exactly what JM1 did. It has "DeferredCode" objects which save register state and operation-specific variables. These are processed near the end of compilation. Stub calls in the deferred path don't need the complex and expensive syncing process that JM2 has, nor V8's heavyweight splitting of frames that is used for control flow. Instead, it just spills all in-flight registers, calls into C++, then restores them. Spills go to the normal frame locations so re-loading the registers afterwards is GC safe.

One thing I don't understand is why stub calls on the inline path sync the entire frame, but stub calls on the out-of-line path don't. I don't know if 2.x did any on-stack replacement for debug mode.

The only area where JM2 is really not ready for moving GC is how it bakes in object pointers everywhere. V8 does this too, but its assembler tracks when GC things are passed as operands, and adds the position to a relocation table.

Tomorrow I'll look at Crankshaft.
Blocks: GC
Assignee: general → nobody
You need to log in before you can comment on or make changes to this bug.