Closed Bug 687788 Opened 8 years ago Closed 8 years ago

Object/shape shrinking goal

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: bhackett, Unassigned)

References

Details

Rough proposal for the target state of object and shape shrinking.  Filing this separately from bug 637931 as this is for discussion and bug 637931 is for tracking.

The working observation here is that while benchmarks make few shapes, websites make many, and little will be accomplished if we shrink objects by fattening shapes (borrow from Peter to pay Paul etc.).  Eventual goal, spanning various bugs (several already in progress):

JSObject:

Shape *lastProp;
TypeObject *type;
Value *properties;
Value *elements;

Shape:

BaseShape *base;
jsid propid;
Shape *parent;
kids union;
64 bits of assorted flags

BaseShape:

Class *clasp;
getter union;
setter union;
PropertyTable *table;
flags, slotspan, ...;

On 32 bit platforms this shrinks JSObject from 10 words to 4 words and Shape from 10 words to 6 words.  It also adds a BaseShape structure, which has two roles.  BaseShapes can be shared, in which case they consolidate information common to many objects and shapes.  Few of these (e.g. one for each combination of clasp/getter/setter).  Alternatively, they can be owned by a specific Shape, in which case they store extra information which did not fit in the shape or its object.  Owned BaseShapes are allocated for exactly those shapes which have a property table; these are random (but rare) shapes in the property tree, and lastProp shapes for dictionary objects.  PropertyTables/BaseShapes for shapes in the property tree can be purged on GC if desired.

So, what happens to the removed state?

----- JSObject:

objShape (bug 684505):

This bug will remove shape numbers entirely.  All caching will be done on shape pointers rather than numbers, and places where we GenerateOwnShape will (at least sometimes) fabricate a new shape pointer (forcing objects into dictionary mode).

clasp (not filed):

Store at lastProp->base->clasp.

flags (not filed):

Hand wavy-est field here.  Some flags can be removed entirely, others will move to (owned or unowned) BaseShape, Shape or TypeObject.  I don't see any flags which absolutely have to stay on the object itself.  It would be much better to remove the flags entirely than to remove TypeObject, as the type would need to move to Shape, both fattening the shape and causing us to allocate a lot more shapes.

newType (bug 684410):

This is only set for prototype JSObjects, and will move to a hashtable.  So making a generic object within the VM will require a hash lookup (still pretty cheap), but hot places which allocate objects can cache the newType pointers either in jitcode or not (e.g. JSON).

initializedLength (in a union with newType) (bug 586842 or a followup):

Only used for dense arrays, should be kept track of for all objects with dense element arrays but can be stored at a negative offset from elements (resurrect dslots[-1], except this time with abstractions in place).

parent (bug 650353):

Needs compartment-per-global to go away.

privateData:

Can use the storage for the last fixed slot in classes which JSCLASS_HAS_PRIVATE I think.  Bug 586842 / a followup can store array lengths in a similar fashion to initializedLength (it's good for regalloc to have length/initializedLength/capacity all at offsets from the same pointer), call object private pointers can (hopefully) go away with bug 659577, the main remaining pure JS use of private I know of is FunctionClass, need to understand that case better.

capacity (bug 586842 / a followup):

The capacity for dynamic property arrays can be determined with some algorithm based on slotSpan, the capacity for dense element arrays can be stored in a similar fashion to initializedLength / array length.

----- Shape:

shapeid (bug 684505):

Removed with JSObject::objShape.

slotSpan (bug 684505):

For non-dictionary shapes, determined from the slot (slots must steadily increase for property tree shapes).  Slot spans for dictionary objects are tracked in the object's owned BaseShape.

getter/setter (bug 684505):

Moved to BaseShape.  From measuring on membench50, ~2/3 of created shapes have neither a getter nor setter, and the lion's share of the rest have specific combinations of C++ getters/setters (mainly for call objects and XPCOM).  Method shapes store the function object in both the getter and in a slot on the object, the former seems relatively easy to remove.  BaseShapes would need to be made for each combination of JSObject getter/setter, presumably rare in web code.  Internally we make getter/setter JSObjects based on Natives passed in via JS_DefineProperty, not sure how many we make and if hot I don't know why these couldn't be JSPropertyOps.
(In reply to Brian Hackett from comment #0)
> Rough proposal for the target state of object and shape shrinking.  Filing
> this separately from bug 637931 as this is for discussion and bug 637931 is
> for tracking.
> 
> The working observation here is that while benchmarks make few shapes,
> websites make many, and little will be accomplished if we shrink objects by
> fattening shapes (borrow from Peter to pay Paul etc.).  Eventual goal,
> spanning various bugs (several already in progress):

I like it. And good point about shape size vs object size. I do want to push this a little further.

> JSObject:
> 
> Shape *lastProp;
> TypeObject *type;
> Value *properties;
> Value *elements;

JSObject looks good. IIRC, TypeObjects are distinct if the prototype differs or the prototype is object and the allocation site differs, so TypeObject is neither a coarsening nor a refinement of Shape. It sounds like a question for another day, but I do wonder if there's any chance we can reduce those to one pointer here someday.

Also, can you remind me of the rationale for having properties and elements in separate arrays? Is it to simplify addressing, simplify other code, or avoid copying a long elements array when a small property set grows?

> Shape:
> 
> BaseShape *base;
> jsid propid;
> Shape *parent;
> kids union;
> 64 bits of assorted flags
> 
> BaseShape:
> 
> Class *clasp;
> getter union;
> setter union;
> PropertyTable *table;
> flags, slotspan, ...;

I'd like to get this cleaned up further. The main issue is that a "property descriptor" is a logically separate concept from "shape" (which is more or less "property descriptor list"), and I'd really like to get that reflected in the data types. 

One question there is whether the normal shape list should be a vector or a linked list. That probably needs to be decided based on some observations about shape populations.

> On 32 bit platforms this shrinks JSObject from 10 words to 4 words and Shape
> from 10 words to 6 words.  It also adds a BaseShape structure, which has two
> roles.  BaseShapes can be shared, in which case they consolidate information
> common to many objects and shapes.  Few of these (e.g. one for each
> combination of clasp/getter/setter).  Alternatively, they can be owned by a
> specific Shape, in which case they store extra information which did not fit
> in the shape or its object.  Owned BaseShapes are allocated for exactly
> those shapes which have a property table; these are random (but rare) shapes
> in the property tree, and lastProp shapes for dictionary objects. 
> PropertyTables/BaseShapes for shapes in the property tree can be purged on
> GC if desired.
(In reply to David Mandelin from comment #1)
> JSObject looks good. IIRC, TypeObjects are distinct if the prototype differs
> or the prototype is object and the allocation site differs, so TypeObject is
> neither a coarsening nor a refinement of Shape. It sounds like a question
> for another day, but I do wonder if there's any chance we can reduce those
> to one pointer here someday.

It would be possible to flatten these to one pointer by requiring that the shape of an object determines its type.  We used to do this, but dropped this requirement as it was causing the number of shapes to bloat.  The main issue is for functions, where we will try to use many different types for the different functions (incurring little overhead, since the TypeObjects themselves are allocated lazily).  Functions normally have the same shape as each other, so making shape determine type causes a lot of extra shapes to get allocated.  Lazily allocating shapes doesn't seem reasonable, as it would probably mean an extra is-this-lazy check every time we do a shape check.

> Also, can you remind me of the rationale for having properties and elements
> in separate arrays? Is it to simplify addressing, simplify other code, or
> avoid copying a long elements array when a small property set grows?

It's mostly simplification, but I think it will be more space efficient than trying to combine the two together.  If the properties and elements are in the same array, for efficient indexing the elements will have to come first, and objects will need to be reshaped every time the element capacity changes.  Moreover, the elements need several words of metadata (capacity, initialized length, array length?) which would need to be allocated for all objects with dynamic properties, not just those with indexed properties.  By keeping the two separate we can use a shared singleton array for objects with no indexed properties.

> I'd like to get this cleaned up further. The main issue is that a "property
> descriptor" is a logically separate concept from "shape" (which is more or
> less "property descriptor list"), and I'd really like to get that reflected
> in the data types. 

Most property descriptor stuff could live in the BaseShape (taking suggestions on name changes!), though in the second 'owned' role the BaseShape is more than that and has information about both the property lineage and object itself.

I think it's most important to optimize for space and efficient access here (e.g. minimize branching needed to get X bit of info), and as long as the invariants are simple and clear it should be ok for separate concepts to be split across multiple structures or combined in one structure.

> One question there is whether the normal shape list should be a vector or a
> linked list. That probably needs to be decided based on some observations
> about shape populations.

I've wondered about this myself but think this should be followup work.  Going to leave property trees and tables alone for now.
(In reply to Brian Hackett from comment #2)
> (In reply to David Mandelin from comment #1)
> > JSObject looks good. IIRC, TypeObjects are distinct if the prototype differs
> > or the prototype is object and the allocation site differs, so TypeObject is
> > neither a coarsening nor a refinement of Shape. It sounds like a question
> > for another day, but I do wonder if there's any chance we can reduce those
> > to one pointer here someday.
> 
> It would be possible to flatten these to one pointer by requiring that the
> shape of an object determines its type.  We used to do this, but dropped
> this requirement as it was causing the number of shapes to bloat.  The main
> issue is for functions, where we will try to use many different types for
> the different functions (incurring little overhead, since the TypeObjects
> themselves are allocated lazily).  Functions normally have the same shape as
> each other, so making shape determine type causes a lot of extra shapes to
> get allocated.  Lazily allocating shapes doesn't seem reasonable, as it
> would probably mean an extra is-this-lazy check every time we do a shape
> check.

Makes sense.

> > Also, can you remind me of the rationale for having properties and elements
> > in separate arrays? Is it to simplify addressing, simplify other code, or
> > avoid copying a long elements array when a small property set grows?
> 
> It's mostly simplification, but I think it will be more space efficient than
> trying to combine the two together.  If the properties and elements are in
> the same array, for efficient indexing the elements will have to come first,
> and objects will need to be reshaped every time the element capacity
> changes.  

We could put the properties at negative indices.

> Moreover, the elements need several words of metadata (capacity,
> initialized length, array length?) which would need to be allocated for all
> objects with dynamic properties, not just those with indexed properties.  

Negative indices might still be possible, but they seem rather less appealing if we have to put those things in there somewhere.

> By keeping the two separate we can use a shared singleton array for objects
> with no indexed properties.

OK, thanks for the explanation. I agree with keeping them separate.

> > I'd like to get this cleaned up further. The main issue is that a "property
> > descriptor" is a logically separate concept from "shape" (which is more or
> > less "property descriptor list"), and I'd really like to get that reflected
> > in the data types. 
> 
> Most property descriptor stuff could live in the BaseShape (taking
> suggestions on name changes!), though in the second 'owned' role the
> BaseShape is more than that and has information about both the property
> lineage and object itself.
> 
> I think it's most important to optimize for space and efficient access here
> (e.g. minimize branching needed to get X bit of info), and as long as the
> invariants are simple and clear it should be ok for separate concepts to be
> split across multiple structures or combined in one structure.

I agree.

> > One question there is whether the normal shape list should be a vector or a
> > linked list. That probably needs to be decided based on some observations
> > about shape populations.
> 
> I've wondered about this myself but think this should be followup work. 
> Going to leave property trees and tables alone for now.

OK.
> 
> On 32 bit platforms this shrinks JSObject from 10 words to 4 words and Shape
> from 10 words to 6 words.

/me swoons
(In reply to David Mandelin from comment #1)
> 
> I'd like to get this cleaned up further. The main issue is that a "property
> descriptor" is a logically separate concept from "shape" (which is more or
> less "property descriptor list"), and I'd really like to get that reflected
> in the data types. 

In my patch for bug 631138 I used "shape lineage" for the latter, which has some precedent in existing comments.  

Having two types, e.g. Property and Shape, would be good, assuming it can be done without increasing sizes of things.
 
> One question there is whether the normal shape list should be a vector or a
> linked list. That probably needs to be decided based on some observations
> about shape populations.

By "normal shape list" I guess you mean dictionary mode lists?  If you had a specialized dictionary mode version of Shape (e.g. "DictShape") it could omit the |parent| and |kids| fields, saving two words if you stored it in a Vector.  But Vectors can waste space due to their doubling growth strategy.  So profiling is needed, yes.
(In reply to Nicholas Nethercote [:njn] from comment #5)
> If you had a
> specialized dictionary mode version of Shape (e.g. "DictShape") it could
> omit the |parent| and |kids| fields, saving two words if you stored it in a
> Vector.

That would complicated GC, presumably.  Maybe DictShape could point to a vector... that would reduce the amount of marking done by GC?
(In reply to Nicholas Nethercote [:njn] from comment #5)
> By "normal shape list" I guess you mean dictionary mode lists?  If you had a
> specialized dictionary mode version of Shape (e.g. "DictShape") it could
> omit the |parent| and |kids| fields, saving two words if you stored it in a
> Vector.  But Vectors can waste space due to their doubling growth strategy. 
> So profiling is needed, yes.

I've wondered about this but think it would be tricky when dealing with property deletion.  Deleting a property would leave holes in the vector, and those holes can't be filled with new properties because that would change enumeration order (enumeration order isn't specified by the ES spec but apparently the internet gets angry when you try to change it).  So an object that has repeated insertions and deletions could grow without bound.  This could be resolved by detecting such growth on property insertion and occasionally compacting objects, but doing so would change shape slots and mess with things like active and cached iterators.  Not impossible, but hard and needs numbers on how much it would save.
Can this bug be closed now that objshrink has landed on m-c?
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.