1038917 - (shypes) SpiderMonkey: Unify Shapes and ObjectGroups

Reporter

Description

•

10 years ago

We have talked at various points about different approaches to unifying shapes and types in a more harmonious way than our current bifurcated system does.

The main approach that we had talked about before was the notion of rooting shape trees under TypeObjects, and moving the typeset mappings into the shape tree itself.

Under this scheme every TypeObject would associate with an empty root shape, and all objects with a given TypeObject will have a shape that is contained within the tree rooted at its that TypeObject.

The mappings that a TypeObject would hold could be spread out across the shape tree - every shape object would own the TypeSet for that property.

This approach promised a number of nice wins, and prompted some concerns. The concerns centered around potential shape redundancy for objects with different types but the same shape. The benefits included 1) storing unboxed values in objects (since the shape’s typeset can specialized value formattings), and 2) better information from shape guards (since guarding on a shape can tell you type-information too).

I’ve been pondering this topic off and on and I think I came up with a very interesting unification of shapes and types that is simpler, more precise, and more dynamic than both the current approach and the talked-about “root shape trees in TypeObjects” approach.

I propose that we can eliminate TypeObjects entirely, and move TI over to an implementation where shape trees directly model families of related types. In this approach, Shapes would be associated with types, and Shapes would be recorded as the type-of-object token in TypeSets. The approach would also eliminate the need for definite property analysis - instead arriving at more robust optimizations for TI-based property accesses using no static analysis whatsoever. Lastly, the approach should free us from other places where there exists ad-hoc pattern matching mechanisms that advise TypeObject assignment (e.g. that weird thing we do to pattern-match constructor-constructors in byte code to decide TypeObject assignment).

The basic idea is this: the current system is limited by the fact that we cannot change the type of an object once the object has been observed by any heap typeset. This is because heap typesets model persistent slots that exist in memory, and they are expected to conservatively cover all values that may exist within that slot. If an object could change types, then it could be stored into a slot and its original type recorded in the typeset for the slot, and then changed to a different type, rendering the typeset inconsistent with the slot.

Now, while the current TI implementation doesn’t allow for objects to change their types easily, in theory we should be able to allow objects to vary their types, as long as we can assert that the new type of the object is a subtype of its previous type.

Consider an object X with type A that is stored into slot S. The store will add the type A to the typeset for slot S. If later X’s type changes to a subtype A’ of A, then the typeset for slot S is still valid without any changes.

With that intuition in place - it becomes reasonably clear that as long as we treat the shape tree as a tree of types where child shapes are subtypes of parent shapes, we can treat shapes the same way as we do types: recording them in typesets, and using constraint-based optimization with them.

So the above is the broad sketch of the intuition behind this. Below I’ll try to give a slightly more in-depth treatment. So if we’re taking Shapes-as-Types, then our basic unit of optimization is the shape tree, diagrammed below:

* <- root shape R
/|\
/ | \
/ | \
/ / \
/ \ \
/ \ \
/ \ \
/ / \
/ / \
/ * <- shape S
/ / \ \
/ / \ <- subtree S+
/ / \ \
/ - - - - \
- - - - - - - - - - - - - - -

As with the current shape tree, each of the vertices describe some structural change to the object. The path from the root to the shape, likewise, represents the set of all changes from the empty object to the target object, summarizing the structure of the object.

We can associate with every shape in the tree two implicit types: the specific and general type at that shape.

The specific type at the shape is simply the type described by the path from the root to the shape. A shape-guard on an object will reveal its specific type.

The general type at a shape is the lowest-common-denominator of the set of all specific types occurring in the subtree for the shape (including the shape itself).

Both of these types are represented implicitly in the shape tree: one can ask questions of the specific type by asking questions of the path from the root to the shape, and one can ask questions about the generic type by looking at all the specific types in the subtree.

As the runtime proceeds, and objects get stored into slots, we will record the shape of the object in the heap typesets. As long as a heap typeset treats the presence of a shape as indicating the presence of the general type associated with that shape, type sets can stay consistent while objects are free to change their shape/type “down the tree” as they mutate.

For now, let’s keep things simple and just talk about AddProp operations on objects. The AddProp shape might look like this:

struct AddPropShape:
Shape *parent
Shape *children
Shape *siblings
PropertyName propName
HeapTypeSet propTypes

Now let’s build a simple test case and analyze its behaviour. Consider the constructor function:

function Tree(size) {
this.size = size;
if (arguments.length == 2) {
this.value = arguments[1];
} else {
this.left = arguments[1];
this.right = arguments[2];
}
}

This constructor defines a basic tree structure, where each vertex may be a leaf containing a ‘value’ property, or a branch containing ‘left’ and ‘right’ properties. All instances contain a ‘size’ property.

Let’s consider the behaviour of the shape tree under the following sequence of operations:

var t1 = new Tree(1, “foo”);
var t2 = new Tree(1, “bar”);
var t3 = new Tree(t1.size + t2.size, t1, t2);

In the first construction (t1), the following shape lineage will be created:

S1:TreeRoot
|
|- - -> S2: AddProperty “size”
| Slot 0
| TypeSet { Int32 }
|
|- - -> S3: AddProperty “value” <- - - - Shape(t1)
Slot 1
TypeSet { String }

In the second construction (t2), the same shape lineage will be re-used.
In the third construction (t3), the shape tree will branch at “size”:

S1:TreeRoot
|
|- - -> S2: AddProperty “size”
| Slot 0
| TypeSet { Int32 }
|
|- - -> S3: AddProperty “value” <- - - - Shape(t1), Shape(t2)
| Slot 1
| TypeSet { String }
|
|- - -> S4: AddProperty “left”
| Slot 1
| TypeSet { S3 }
|
|- - -> S5: AddProperty “right” <- - - - Shape(t3)
Slot 2
TypeSet { S3 }

At the time the “left” and “right” properties are assigned in t3, the shapes of t1 and t2 are both S3, so S3 gets recorded in the typeset for the “left” and “right” properties.

The key thing to note here is that the heap typeset captures the type of the object AT THE TIME OF THE STORE. At any given future point in time, the engine can be sure that the object will either have that recorded type, or a subtype.

We can imagine that as Trees are constructed, the shape tree will eventually stabilize to the following:

S1:TreeRoot
|
|- - -> S2: AddProperty “size”
| Slot 0
| TypeSet { Int32 }
|
|- - -> S3: AddProperty “value” <- - - shape of leaves
| Slot 1
| TypeSet { . . . }
|
|- - -> S4: AddProperty “left”
| Slot 1
| TypeSet { S3, S5 }
|
|- - -> S5: AddProperty “right” <- - - shape of branches
Slot 2
TypeSet { S3, S5 }

Namely, branch objects will have “size”, “left”, and “right”, with size being an int32, and left and right referring to instances of S3 or S5 (other leaves or branches).

Let’s say we are trying to optimize a property access “t.left.size”, and we know that “t” has concrete shape S5 (let’s say we know that from a guard check). The optimizer has to answer the following series of questions:

1. Where does “t.left” reside within “t”
2. What can the result of “t.left” be?
3. Where does “t.left.size” reside within “t.left”?
4. What can the result of “t.left.size” be?

To answer question 1:
The shape chain [S5, S4, S2, S1] contains a definition for “left” at S4.
The slot location for it is defined to be 2.

Since “t” is known to have exactly shape S5, this implication is absolute. Slot location information doesn’t change for an individual shape. The optimizer can generate code to read “t.left” without attaching any constraints or adding any guards.

To answer question 2:
The TypeSet for property “left” on S4 indicates it contains instances of S3 and S5.
The TypeSet may change in the future, so the optimizer attaches a freeze constraint on it.

The optimizer now knows that the result of “t.left” is an object that has a shape that is a sub-shape of either S3 or S5. Since it can only be an object, the optimizer knows that it is stored unboxed, and can retrieve the pointer directly.

To answer question 3:
The optimizer has the TypeSet describing the value just obtained.
It queries the typeset directly, and discovers that all the types in the subtrees rooted at S5 and S3, have a binding for “size” at a common shape S2, which places the “size” property at slot 0.

The optimizer attaches a constraint on the subtrees rooted at S3 and S5, so that if the subtree under either of them changes, the constraint has an opportunity to invalidate code. It is now free to retrieve “t.left.size” from slot 0 of “t.left”.

To answer question 4:
The TypeSet for “size” indicates it contains an Int32, stored unboxed in the slot. The optimizer attaches a constraint and generates the read.

Ultimately, from this we can optimize the property access down to:

loadPtr tReg[slot2], tmp0
loadInt32 tmp0[slot0], tmp1

One thing to notice here is that no static analysis was needed to be able to generate this code. In fact, this approach handles far more situations than our current definite property analysis. Consider the following two cases:

function A() {
this.x = 3;
this.y = eval(. . .);
this.z = 9;
}

function B() {
this.q = 3;
}
function makeB() {
// B is _ALWAYS_ constructed using makeB
var b = new B();
addMorePropertiesTo(b);
return b;
}

In the case of A, definite property analysis cannot look past the eval() in the constructor. Thus, only “x” will be a definite property in the TypeObject for A’s instances. All other property accesses will require guards.

In the case of B, definite property analysis cannot look past the constructor to see that other properties are added to B. All property accesses except for “q” will require shape guards.

Using shapes-as-types, when the constructed object flows into heap slots and through code, TI can dynamically capture specific information about an object _at the time it got stored_. Intermediate types will only pollute typesets that half-constructed objects flow into. Every site that only sees fully-constructed objects will be cleanly optimizable. Every site that sees a partially-constructed object will still be able to optimize operations that depend on the partially constructed structure only.

I’ll wrap this up because it’s getting ridiculously long. There are a lot of “add-on” ideas that I think we can experiment with on top of this infrastructure. But I’ll try to give what I think is a high-level summary of the proposal:

Currently, we have shapes (which extend), and types (which expand). We can combine these into one construct: shapes-as-types. Types become implicit within the shape tree, and every shape becomes associated with two types: its concrete type, and its general type. The specific type is the type of a particular object with that shape, and the general type represents the meet of all the specific types in a subtree.

Heap typesets track general types (by just storing the shape pointer), and objects are now allowed to “move down the shape tree” (i.e. change their shape and type).

The optimizer can generate optimized code by attaching constraints to various parts of the shape/type tree, such as freezing typesets in the tree, or freezing subtrees within the tree.

All this leads to a dynamic optimization approach that lets us eliminate static property analysis, and delivers better results to boot.

Brian Hackett [Laid off!]

Comment 1

•

10 years ago

I think this proposal generally looks fine, though it would be good I think to get a better understanding of the trade/offs involved here.

Generally with analyses, making things more dynamic will increase precision, at the cost of higher runtime overhead. The main cost with this approach seems to be in making type sets larger (due to the additional shapes in them) and making barriers on those type sets more expensive to execute. In your tree example, instead of a single TypeObject for the trees there will be two leaf shapes, S3 and S5, and any barrier on trees will need to test each of these shapes instead of the single type. If there were any more shapes which the trees could acquire then the size of the type sets would continue increasing, requiring invalidation, making barriers even more expensive and eventually leading us to deoptimize the type set entirely and lose all information about the trees which could be seen there.

In contrast, with the current architecture we would be able to answer the four questions in the tree example with the same degree of precision: the definite properties analysis can figure out that |size| is definite, but doing anything more requires shape checks. If more properties were added to the tree objects later on, this information will not degrade and type barriers will not be affected.

So, I think this is to a large extent trading cheaper type barriers and smaller type sets for better precision in definite property information. It's hard to say which is more valuable. Over time, we've reduced the maximum size of type sets to avoid expensive barriers, and I think the current limit of 32 is still too high (h4writer or bz know more I think). And in wild web code the proportion of property accesses on definite properties is pretty low, I think. We measured that for the PLDI paper way back when but not for a while. My impression has been that the analysis does poorly in large part because there are fewer properties that actually are definite in this code (such as in your tree example) rather than because the analysis is too stupid to pick them up.

I guess my big concern is that this proposal is a gigantic change and needs a very strong justification, which I don't really see here. The issues you're addressing seem like they could also be handled by smaller and more incremental changes. The definite property analysis could be improved to (optionally) be more dynamic when the static analysis gets confused, in a way which wouldn't get confused by your makeB() example, but anything done in this direction should be motivated by real code which the analysis breaks down on.

Jan de Mooij [:jandem]

Comment 2

•

10 years ago

Thanks for filing this, very interesting! The current system works pretty well, but I agree the definite properties analysis is fragile and when it doesn't work you know a property's types and everything but not its slot number; that's just not ideal. Brian does raise some valid concerns though; maybe we can gather some data to see how this will likely behave on benchmarks etc.

(In reply to Kannan Vijayan [:djvj] from comment #0)
>   The optimizer now knows that the result of “t.left” is an object that has
> a shape that is a sub-shape of either S3 or S5.  Since it can only be an
> object, the optimizer knows that it is stored unboxed, and can retrieve the
> pointer directly.

What happens if we then do:

  t.left = 3

Will S4's TypeSet change from { S3, S5 } to { S3, S5, int32 }? Then what will we do with the unboxed objects stored in that slot in the other Tree objects, do we have to walk the heap and box them (very expensive)? Or will we split the shape tree or something?

Nicolas B. Pierron [:nbp]

Comment 3

•

10 years ago

I have multiple concerns, mostly directed towards the memory consumption of such dynamic analysis.  Remember that we are extremely constraint on some devices (128 MB of zRAM), and that we might aim at device with even lower amount of memory.

In your suggestion, you seems to suggest that the TypeSet should always contain a precise upper bound of types which have been discovered using the setters of the property.  Keeping such information has a cost in terms of memory.

We need such upper bound in order to avoid guards in the generated code, but do we really need to maintain a precise upper bound before looking into compiling code?  Could we refine our typesets later, such as when we are in baseline, or when we are running the GC?

  S1:TreeRoot
  |
  |- - -> S2: AddValueProperty “size”
          |   Slot 0
          |
          |- - -> S3: AddValueProperty “value”
          |           Slot 1
          |
          |- - -> S4: AddValueProperty “left”
                  |   Slot 1
                  |
                  |- - -> S5: AddValueProperty “right”
                          Slot 2

Then, when we reach baseline, we can start building the Shape tree that you mentioned in comment 0, by both looking at the result of getters / setters, and reshape on the fly these objects with the type-specialized shapes.

If there is still some object keeping the untyped shape alive, we can guard against untyped objects in the generated code.  The GC can then discard / reshape the remaining objects to use a type-specialized shape, and remove the shape guard from Ion code.

If the type is an additional information that we refine later, do we really want to have this information directly on the shape, or could that be annotating the shape?

  S1:TreeRoot
  |
  |- - -> S2: AddValueProperty “size”
          |   Slot 0
          |
          |- - -> S3: AddValueProperty “value”
          |       |   Slot 1
          |       |
          |       |- - -> S6: AnnotateProperty “size” Type { Int32 }
          |
          |- - -> S4: AddValueProperty “left”
                  |   Slot 1
                  |
                  |- - -> S5: AddValueProperty “right”
                          |   Slot 1
                          |
                          |- - -> S7: AnnotateProperty “size”  Type { Int32 }
                                      AnnotateProperty “left”  Type { S6, S7 }
                                      AnnotateProperty “right” Type { S6, S7 }

(In reply to Kannan Vijayan [:djvj] from comment #0)
> We can imagine that as Trees are constructed, the shape tree will eventually
> stabilize to the following:
> 
>   S1:TreeRoot
>   |
>   |- - -> S2: AddProperty “size”
>           |   Slot 0
>           |   TypeSet { Int32 }
>           |
>           |- - -> S3: AddProperty “value”  <- - - shape of leaves
>           |           Slot 1
>           |           TypeSet { . . . }
>           |
>           |- - -> S4: AddProperty “left”
>                   |   Slot 1
>                   |   TypeSet { S3, S5 }
>                   |
>                   |- - -> S5: AddProperty “right”  <- - - shape of branches
>                           Slot 2
>                           TypeSet { S3, S5 }

A slightly different question, which also leads to the idea of re-shape is the following code.  Is there any way to optimize this code by moving the property "size" to Slot 0 even if it is at the bottom of the shape tree?

  function Tree(size) {
    if (arguments.length == 2) {
      this.value = arguments[1];
    } else {
      this.left = arguments[1];
      this.right = arguments[2];
    }
    this.size = size;
  }

Kannan Vijayan [:djvj]

Reporter

Comment 4

•

10 years ago

Thanks for the feedback guys. There are a couple of shared concerns brought up so I'll just address by point instead of quote-replying.

1. bhackett & nbp - overhead of more precise typesets being larger

The subtyping discipline imposed on general types provides a mechanism for addressing this seamlessly. It is already true in TI that typesets can generalize over time. This happens currently, for example, when we generalize typesets from having a list of TypeObjects to a single AnyObject member.

This ability is enhanced by the implementation. For example, the heap typeset containing {S5, S3} can be generalized to {S2} as a policy or heuristic decision, and this policy can be made tunable - so for example on lowmem B2G devices we can tell TypeSets to be more aggressive in generalizing to LCD types, and on highmem desktop builds we can tell them to be more precise.

Additionally, by eliminating the TypeObject data structure and moving it into the shape tree, there is a bunch of memory implicitly saved. TypeObjects and Shapes both model property maps, and the overhead of the TypeObject's map (redundant key storage, hashtable overhead) is eliminated, as well as the allocation overhead of the TypeObject.

I also wanted to address in particular the issues with definite property analysis. When I was analyzing Swooop (a web game based on PlayCanvas) for perf issues, I found that it was extremely easy for developers to trip up DPA. Defining properties too late, being too fancy in the constructor, constructor being too large, making object changes outside the constructor, etc.. We can make DPA better, but ultimately it's always going to be an ad-hoc solution littered with invisible cliffs that developers can unwittingly fall off of.

2. jandem - unboxed object issues when typesets change to allow varied representations.

You're right, if we implement unboxed value storage, then this change would require either scanning and modifying the objects (not practical), or extending the shape tree.

You mention "splitting" the shape tree, but we would not actually split the tree. Instead, we'd extend the shape with a new "ExtendTypeSet" shape:

S1:TreeRoot
|
|- - -> S2: AddProperty “size”
| Slot 0
| TypeSet { Int32 }
|
|- - -> S3: AddProperty “value”
| Slot 1
| TypeSet { . . . }
|
|- - -> S4: AddProperty “left”
| Slot 1
| TypeSet { S3, S5 }
|
|- - -> S5: AddProperty “right”
| Slot 2
| TypeSet { S3, S5 }
|
|- - -> S6: ExtendTypeSet S4
TypeSet { Int32 }

This would keep typesets containing S5, S4, S2, or S1 consistent. When this new shape gets created, the "constraints on the general type" (as opposed to constraints on the specific type) registered with shapes S5, S4, S2, and S1 would be invalidated since conceptually the general type at each of those shapes has become looser.

Ideally, we'd want to have the ability to avoid this sort of shape outgrowths, and there ways to address this. One of the ways of avoiding this is lazy shape-tree creation, which dovetails with my response to nbp's comments, so please read point 3 for further details.

3. nbp - on trying to be lazy about collecting some type information

I think what you're suggesting is workable. There are a few wrinkles to address, but it should be doable.

In your example, we'd type-specialize objects once we got down to baseline, but in the meantime, the objects would have already been stored into slot locations when they had the old (non-type-specialized) shape. The non-specialized shapes would have already gotten a chance to pollute heap typesets, and we can't fix that after the fact.

The main issue is the fact that heap typesets have to model a slot from the time of its creation, and slots can be created in cold code and then used in hot code. It seems hard to get around collecting information from the start.

The problem we're running into is that when we're collecting information on "cold types", we end up storing one of their early "less specific" types in typesets, and becoming over-conservative by the time we get to generating accesses. Making typesets more precise is very hard, but making them general is easy.

So let's say when a new root Shape/Type is created, it is marked "cold" at start (when it has zero instances). A cold root shape doesn't actually have a shape tree under it - it rather keeps a small list of all of its instances. When the cold shape becomes hot, it inspects the list of all of its instances, and constructs the most appropriate shape tree for them.

The issue to resolve is how these "cold objects" get tracked in TypeSets, since they don't get a true shape until their type becomes hot. To handle this, when cold objects get tracked by TypeSets, they are tracked directly like singletons (but their semantics differ from singletons).

The presence of a cold object reference is basically a "lazy type addition" to the typeset, and it stays lazy until the typeset needs to operate on the cold object's type (e.g. the optimizer is running and wants to ask questions of the typeset, or the typeset grows too large and needs to generalize itself to save memory). At this point, the typeset does the following:

* It looks at all the type-pointers of cold objects in the set.
* For each of those types, it converts them to "hot types" if they have not already been converted (ensuring that all the objects now have their specialized type).
* It replaces the cold objects in its set with the objects' current shape.

This operation is fine because it happens before anybody "looks" at the typeset. The nice thing about this approach is that it allows us to delay shape-tree building until we have a number of objects and a better probability that we'll know more about how they will behave in the future.

Incidentally, this also addresses shape-proliferation caused by extending shapes for typeset mutations that would imply value representation changes. Under this scheme, the typeset for "t.left" will be initialized to {S5, S3, Int32} from the point of lazy shape tree instantiation, and no shape extensions required.

Kannan Vijayan [:djvj]

Reporter

Comment 5

•

10 years ago

(In reply to Brian Hackett (:bhackett) from comment #1)
> I think this proposal generally looks fine, though it would be good I think
> to get a better understanding of the trade/offs involved here.
> 
> Generally with analyses, making things more dynamic will increase precision,
> at the cost of higher runtime overhead.  The main cost with this approach
> seems to be in making type sets larger (due to the additional shapes in
> them) and making barriers on those type sets more expensive to execute.  In
> your tree example, instead of a single TypeObject for the trees there will
> be two leaf shapes, S3 and S5, and any barrier on trees will need to test
> each of these shapes instead of the single type.  If there were any more
> shapes which the trees could acquire then the size of the type sets would
> continue increasing, requiring invalidation, making barriers even more
> expensive and eventually leading us to deoptimize the type set entirely and
> lose all information about the trees which could be seen there.
> 
> In contrast, with the current architecture we would be able to answer the
> four questions in the tree example with the same degree of precision: the
> definite properties analysis can figure out that |size| is definite, but
> doing anything more requires shape checks.  If more properties were added to
> the tree objects later on, this information will not degrade and type
> barriers will not be affected.
> 
> So, I think this is to a large extent trading cheaper type barriers and
> smaller type sets for better precision in definite property information. 
> It's hard to say which is more valuable.  Over time, we've reduced the
> maximum size of type sets to avoid expensive barriers, and I think the
> current limit of 32 is still too high (h4writer or bz know more I think). 
> And in wild web code the proportion of property accesses on definite
> properties is pretty low, I think.  We measured that for the PLDI paper way
> back when but not for a while.  My impression has been that the analysis
> does poorly in large part because there are fewer properties that actually
> are definite in this code (such as in your tree example) rather than because
> the analysis is too stupid to pick them up.
> 
> I guess my big concern is that this proposal is a gigantic change and needs
> a very strong justification, which I don't really see here.  The issues
> you're addressing seem like they could also be handled by smaller and more
> incremental changes.  The definite property analysis could be improved to
> (optionally) be more dynamic when the static analysis gets confused, in a
> way which wouldn't get confused by your makeB() example, but anything done
> in this direction should be motivated by real code which the analysis breaks
> down on.

You're right.  Any move in this direction requires a solid justification and case behind it, and a pretty broad consensus from the team.

We can always keep DPA around, but it rubs me the wrong way (that's not my "strong justification, mind you ;)").  It's this massive static analysis that we are forced to run on cold code (constructors being called for the first time).  It's the definition of up-front heavyweight work with little payoff.  We're going to do DPA on cold code just as well as hot code.  Everything about it runs against the whole ethos of dynamic optimization.

Boris Zbarsky [:bzbarsky]

Comment 6

•

10 years ago

I had a question about interaction with proto and class.  When njn posted a link to the "Improving JavaScript Performance by Deconstructing the Type System" in j-e-internals, our conclusion was:

  So type-fragmentation due to protos would exist, but the bigger issue of shape-tree
  fragmentation due to protos is not an issue. 

In this new proposal, I assume shape would in fact cover proto (and Class?) and hence we would get shape-tree fragmentation due to protos.  How much of an issue is that in practice?

Brian Hackett [Laid off!]

Comment 7

•

10 years ago

(In reply to Boris Zbarsky [:bz] from comment #6)
> I had a question about interaction with proto and class.  When njn posted a
> link to the "Improving JavaScript Performance by Deconstructing the Type
> System" in j-e-internals, our conclusion was:
> 
>   So type-fragmentation due to protos would exist, but the bigger issue of
> shape-tree
>   fragmentation due to protos is not an issue. 
> 
> In this new proposal, I assume shape would in fact cover proto (and Class?)
> and hence we would get shape-tree fragmentation due to protos.  How much of
> an issue is that in practice?

Shapes already determine an object's proto and clasp, except for the hasUncacheableProto() hack, which is a workaround related to bug 707717 and can maybe be removed.

Brian Hackett [Laid off!]

Comment 8

•

10 years ago

(In reply to Kannan Vijayan [:djvj] from comment #4)
> This ability is enhanced by the implementation.  For example, the heap
> typeset containing {S5, S3} can be generalized to {S2} as a policy or
> heuristic decision, and this policy can be made tunable - so for example on
> lowmem B2G devices we can tell TypeSets to be more aggressive in
> generalizing to LCD types, and on highmem desktop builds we can tell them to
> be more precise.

OK, but what would the generated code look like then?  I figured the leaf shapes should go into the type sets because then set membership can be tested via pointer equality, but if shapes higher up in the tree are there then the set membership needs to chase the shape's parent pointers and that can get expensive with large objects.

> Additionally, by eliminating the TypeObject data structure and moving it
> into the shape tree, there is a bunch of memory implicitly saved. 
> TypeObjects and Shapes both model property maps, and the overhead of the
> TypeObject's map (redundant key storage, hashtable overhead) is eliminated,
> as well as the allocation overhead of the TypeObject.

Most of what a TypeObject is doing is storing the HeapTypeSets for the properties.  With your proposal these type sets would still be stored, but they would be on the Shapes instead, and there are a lot more shapes than type objects (7x as much GC heap memory for my current browser session).  Would you try to share type sets between shapes?

> I also wanted to address in particular the issues with definite property
> analysis.  When I was analyzing Swooop (a web game based on PlayCanvas) for
> perf issues, I found that it was extremely easy for developers to trip up
> DPA.  Defining properties too late, being too fancy in the constructor,
> constructor being too large, making object changes outside the constructor,
> etc..  We can make DPA better, but ultimately it's always going to be an
> ad-hoc solution littered with invisible cliffs that developers can
> unwittingly fall off of.

Unfortunately this issue is pretty complicated.  Even easier than tripping up the definite properties analysis is just adding properties inconsistently, or in an inconsistent order.  Code like this will continue to run slowly even with your proposal.  It just seems like a more general problem with JavaScript as a performance language, which could be addressed with tooling / developer outreach or with new constructs like TypedObjects (not to be confused with TypeObjects) but trying to completely fix it with engine changes seems intractable.

Do you know how often, on Swooop or normal websites or whatever, the definite properties analysis fails due to its brittleness vs. due to inconsistent property addition?

(In reply to Kannan Vijayan [:djvj] from comment #5)
> We can always keep DPA around, but it rubs me the wrong way (that's not my
> "strong justification, mind you ;)").  It's this massive static analysis
> that we are forced to run on cold code (constructors being called for the
> first time).  It's the definition of up-front heavyweight work with little
> payoff.  We're going to do DPA on cold code just as well as hot code. 
> Everything about it runs against the whole ethos of dynamic optimization.

Yeah, the analysis isn't a great fit (though it's not the only one; see the arguments usage analysis) with the rest of the compilation strategy.  Do you know how much time we spend running the definite properties analysis during normal browsing or on Swooop?  I think we could make the analysis less eager if we had a way of keeping track of the objects allocated before the analysis kicks in, as the main sticking point with the analysis is it needs to capture the state of all objects allocated with the type.  Maybe this wouldn't be too hard to do, and it would also give us some more information to use (i.e. when an object can be considered fully initialized) if we wanted to make the analysis more dynamic / robust.

Kannan Vijayan [:djvj]

Reporter

Comment 9

•

10 years ago

(In reply to Brian Hackett (:bhackett) from comment #8)
> (In reply to Kannan Vijayan [:djvj] from comment #4)
> > This ability is enhanced by the implementation.  For example, the heap
> > typeset containing {S5, S3} can be generalized to {S2} as a policy or
> > heuristic decision, and this policy can be made tunable - so for example on
> > lowmem B2G devices we can tell TypeSets to be more aggressive in
> > generalizing to LCD types, and on highmem desktop builds we can tell them to
> > be more precise.
> 
> OK, but what would the generated code look like then?  I figured the leaf
> shapes should go into the type sets because then set membership can be
> tested via pointer equality, but if shapes higher up in the tree are there
> then the set membership needs to chase the shape's parent pointers and that
> can get expensive with large objects.

The generated code ends up becoming worse for those properties that are indeterminate on the meet type, but code for accessing properties from structure shared between the two will remain the same.

There's no way around storing non-leaf shapes in typesets - it can happen without trying (object gets stored, then changes structure, moves into new leaf shape, and now the shape in the typeset is not a leaf anymore).

A huge part of the membership test cost will be walking the shape chain up two objects that aren't in the same tree, just to determine that they have no meet shape already present in the set.  If every shape caches its root shape, then these cases can be eliminated with a simple compare.  If every shape caches its depth in the shape tree, that too can be used to eliminate some cases.

> Most of what a TypeObject is doing is storing the HeapTypeSets for the
> properties.  With your proposal these type sets would still be stored, but
> they would be on the Shapes instead, and there are a lot more shapes than
> type objects (7x as much GC heap memory for my current browser session). 
> Would you try to share type sets between shapes?

I can't think of an effective, reasonable way to share type sets between shapes.  There might be a way, but it seems like a hairy problem.

It's true that there are a lot more shapes, but the number of those shapes that flow into various slots is hardly going to be 7x the number of typeobjects that flow into those same slots.  I can admit, though, that there will be more shape polymorphism than type-polymorphism.  However, the fact that TypeSets can arbitrarily take any subset of the shapes in its sets, and replace them all with their meet shape.. would suggest that this is a question of how aggressively we want TypeSets to generalize.  If we make them generalize maximally (and save as much memory as possible), then we get back to the same sort of behaviour we have now (loose types), but otherwise lose little.

> Unfortunately this issue is pretty complicated.  Even easier than tripping
> up the definite properties analysis is just adding properties
> inconsistently, or in an inconsistent order.  Code like this will continue
> to run slowly even with your proposal.  It just seems like a more general
> problem with JavaScript as a performance language, which could be addressed
> with tooling / developer outreach or with new constructs like TypedObjects
> (not to be confused with TypeObjects) but trying to completely fix it with
> engine changes seems intractable.

yeah the inconsistent property adds are something that are still going to be an issue.  But there are many other ways things break and a lot of those are simply due to static visibility issues.  Yes there will remain code that defeats this, but there is currently a large amount of code that defeats the current analysis, and a lot of it would be covered by this implementation.

> Do you know how often, on Swooop or normal websites or whatever, the
> definite properties analysis fails due to its brittleness vs. due to
> inconsistent property addition?

Only anecdotally, and only with Swooop.  We'd need to collect numbers for definitive answers on this.  So just for Swooop, I never ran into inconsistent property addition being an issue in hot reads.  A huge number of issues were due to code like this:

function Thing(a) {
  this.x = 9;
  this.y = 10;
  this.z = SomeDeepCallTreeThatGeneratesAValue();
  this.a = ...
  this.b = ...
}

So all the props after 'z' (including 'z') would not be optimizable for GETPROP using TI.  Some of this code is actually hard to reorder so that the property adds are all visible to the analyzer.  Sometimes, rewriting to be friendly to the analyzer destroys the order and flow of the constructor body.

On a number of occasions I advised changing code to look something like this:

function Thing() {
  this.str = "";
  ... stuff that's opaque to DPA ...
  this.str = actualStringValue();
}

Basically to initialize properties to dummy values of the same type at the top, and then re-initialize them after the opaque part.  I did NOT feel cool telling people to do that.

> Yeah, the analysis isn't a great fit (though it's not the only one; see the
> arguments usage analysis) with the rest of the compilation strategy.  Do you
> know how much time we spend running the definite properties analysis during
> normal browsing or on Swooop?

Nope.  I need to collect numbers.  I guess part of this discussion is to figure out what information would need to be collected to be able to build a case for this.  We can summarize some of these quantitative questions and address them down the line.

> I think we could make the analysis less eager
> if we had a way of keeping track of the objects allocated before the
> analysis kicks in, as the main sticking point with the analysis is it needs
> to capture the state of all objects allocated with the type.  Maybe this
> wouldn't be too hard to do, and it would also give us some more information
> to use (i.e. when an object can be considered fully initialized) if we
> wanted to make the analysis more dynamic / robust.

See my response to nbp in point 3 of Comment 4.  It outlines a way of doing this that seems reasonable (and is implementable on our current system as is).

Steve Fink [:sfink] [:s:]

Comment 10

•

10 years ago

(In reply to Kannan Vijayan [:djvj] from comment #9)
> (In reply to Brian Hackett (:bhackett) from comment #8)
> > (In reply to Kannan Vijayan [:djvj] from comment #4)
> > > This ability is enhanced by the implementation.  For example, the heap
> > > typeset containing {S5, S3} can be generalized to {S2} as a policy or
> > > heuristic decision, and this policy can be made tunable - so for example on
> > > lowmem B2G devices we can tell TypeSets to be more aggressive in
> > > generalizing to LCD types, and on highmem desktop builds we can tell them to
> > > be more precise.
> > 
> > OK, but what would the generated code look like then?  I figured the leaf
> > shapes should go into the type sets because then set membership can be
> > tested via pointer equality, but if shapes higher up in the tree are there
> > then the set membership needs to chase the shape's parent pointers and that
> > can get expensive with large objects.
> 
> The generated code ends up becoming worse for those properties that are
> indeterminate on the meet type, but code for accessing properties from
> structure shared between the two will remain the same.
> 
> There's no way around storing non-leaf shapes in typesets - it can happen
> without trying (object gets stored, then changes structure, moves into new
> leaf shape, and now the shape in the typeset is not a leaf anymore).
> 
> A huge part of the membership test cost will be walking the shape chain up
> two objects that aren't in the same tree, just to determine that they have
> no meet shape already present in the set.  If every shape caches its root
> shape, then these cases can be eliminated with a simple compare.  If every
> shape caches its depth in the shape tree, that too can be used to eliminate
> some cases.

Maybe I don't understand the queries you're interested in, but if you're willing to cache stuff, then cache the DFS number of each node. Then testing whether a Shype is within the S2 subtree is a constant-time range check of S2.enter <= shype.enter <= shype.exit. The numbering would get stale when you add/remove nodes from the tree, but it wouldn't invalidate it for the nodes that have it assigned, and you could refresh the numbering periodically. If S2 is not numbered, you'd fall back to a linear parent scan. If your Shype is not numbered, you could fall back to climbing up to the nearest numbered parent. (Or when adding a node, inherit the parent's enter number and set the exit number to zero to mark it as unnumbered. Or immediately trigger a global renumbering.)

Sorry if I'm missing the point somehow, but at least conceptually it makes sense to me that if you want to access some slot described by a base shype/model, then you test whether your actual model is in its subtree.

Kannan Vijayan [:djvj]

Reporter

Comment 11

•

10 years ago

The subtree range check is very clever and novel to me :)  It makes perfect sense.  Also, "Shype"... nice :)  It makes it easy to distinguish when you're talking about the current implementation versus the proposed implementation.  I think I'll adopt that as well.

Kannan Vijayan [:djvj]

Reporter

Comment 12

•

10 years ago

(In reply to Brian Hackett (:bhackett) from comment #8)
> OK, but what would the generated code look like then?  I figured the leaf
> shapes should go into the type sets because then set membership can be
> tested via pointer equality, but if shapes higher up in the tree are there
> then the set membership needs to chase the shape's parent pointers and that
> can get expensive with large objects.

I realized that I misunderstood this question in my earlier response to it.  You're referring to the duplication of typesets on divergent shape tree branches.  I responded with an answer to the higher shape-polymorphism than typeobject-polymorphism question.

I thought some more about this issue and it seems possible to share typesets, using one of a couple of mechanisms.

The first would be to "search the tree for compatible typesets" when adding a new shape.  This would involve walking up the tree to the root, keeping track of the names defined along that path so far, and at each intermediate shype searching for a different path down that covers that set of names.  The shype would then share the other shype's property typeset.

If all property-defining shypes did this as policy, then all representational variants of the same set of properties would end up sharing the same typeset.

However, the search up the tree is somewhat expensive.  In some pathological case, one can imagine a shype tree that contains a single property chain of length N, and all other leaf paths in the tree are length N-1, defining all-but-one of the properties defined on the first chain.

So the worst case for this kind of search is linear on the size of the tree, but we can easily impose a bound on the number of backtracks and still cover most of the non-pathological cases.  So in reality we can make this operation as cheap as a walk up the shype chain + some tunable constant amount of work at each vertex.



The second way would be to use the lazy shype-tree idea in comment 4 point 3.  If cold root shypes simply collected a few instances before instantiating a shape tree for them, it could directly compute not only equivalent shypes to share typesets between, but it could presumably form a DAG where common suffix paths were shared.

However, I'm less keen on the DAG idea because it complicates things a lot.  Order info would have to be stored separately.  Every shype would need to support having multiple parents.  Using sibling-links for lists-of-children is no longer viable, since each shype could have multiple sibling groups.  All in all the data structure becomes a lot more complicated and hard to work with.

Nicolas B. Pierron [:nbp]

Comment 13

•

10 years ago

For people like me, who might need some time to figure it out:
  Shype = Shape + Type (- Tape)

Brian Hackett [Laid off!]

Comment 14

•

10 years ago

(In reply to Kannan Vijayan [:djvj] from comment #9)
> Only anecdotally, and only with Swooop.  We'd need to collect numbers for
> definitive answers on this.  So just for Swooop, I never ran into
> inconsistent property addition being an issue in hot reads.  A huge number
> of issues were due to code like this:
> 
> function Thing(a) {
>   this.x = 9;
>   this.y = 10;
>   this.z = SomeDeepCallTreeThatGeneratesAValue();
>   this.a = ...
>   this.b = ...
> }
> 
> So all the props after 'z' (including 'z') would not be optimizable for
> GETPROP using TI.  Some of this code is actually hard to reorder so that the
> property adds are all visible to the analyzer.  Sometimes, rewriting to be
> friendly to the analyzer destroys the order and flow of the constructor body.

This example should work fine with the definite properties analysis.  The analysis mainly breaks down (a) when |this| escapes somewhere to the heap or into a call which isn't able to be inlined, and (b) when properties are written to |this| in code that doesn't always execute.

> On a number of occasions I advised changing code to look something like this:
> 
> function Thing() {
>   this.str = "";
>   ... stuff that's opaque to DPA ...
>   this.str = actualStringValue();
> }
> 
> Basically to initialize properties to dummy values of the same type at the
> top, and then re-initialize them after the opaque part.  I did NOT feel cool
> telling people to do that.

OK, but why not?  Eagerly specifying the properties an object has is something that is done in pretty much every other performance language, and for a reason.  This is also the main thing being added by TypedObjects, so far as I can tell.  If these codebases started their non-trivial constructors with a block like "this.foo = 0; this.bar = null; ..." then the code will probably run faster and (arguably) be easier to read, even if the types of the initial values written don't quite line up with the final types.

Kannan Vijayan [:djvj]

Reporter

Comment 15

•

10 years ago

(In reply to Brian Hackett (:bhackett) from comment #14)
> This example should work fine with the definite properties analysis.  The
> analysis mainly breaks down (a) when |this| escapes somewhere to the heap or
> into a call which isn't able to be inlined, and (b) when properties are
> written to |this| in code that doesn't always execute.

The latter problem was not something I saw so much.  It seems developers these days are by in large aware that divergent property definitions on objects are a bad thing.  However, conditional writes can incidentally show up in a lot of places.  An extremely common thing was this sort of pattern:

function Constructor() {
  this.a = ...;
  this.b = ...;

  extend(this, DefaultProperties);
}

function extend(obj, other) {
  for (var name in other)
    obj[name] = other[name];
}

Now, the specific "extend" method varies.  Sometimes there are explicit conditionals in them:

function extend(obj) {
  if (!obj.prop)
    obj.prop = something;
  if (!obj.prop2)
    obj.prop2 = something else;
}

The former was what I was referring to.  This occurred in two game engines actually.  There are simply a lot of things that the analyzer cannot see through, even if it inlines.

These are not bad programming idioms.  They are in fact characteristic of the kind of things that dynamic languages offer as features: allowing procedural construction of objects by composing calls to helpers.

Another pattern from the codebase:

    var Player = function (entity) {
        this.entity = entity;
    };

    Player.prototype = {
        initialize: function () {
            this.game = ...;
            this.game.on('reset', this.reset, this);
            this.game.on('gameover', this.gameover, this);
            ...
            this.model = ...;
        }
    }

First off, chances are high that findByName is complex and not completely inlineable.  There are two issues here:

one, the initialize method doesn't run in the constructor.  Two, even if we were to look into it, the object would trivially escape at the callback registrations.

We have a bit of a skewed perspective on this because we've tuned DPA to cover many of the hot cases in the benchmarks we track, but it misses a LOT of cases in the wild - and not just in "ad-hoc web code", but application frameworks and game engines.


> OK, but why not?  Eagerly specifying the properties an object has is
> something that is done in pretty much every other performance language, and
> for a reason.  This is also the main thing being added by TypedObjects, so
> far as I can tell.  If these codebases started their non-trivial
> constructors with a block like "this.foo = 0; this.bar = null; ..." then the
> code will probably run faster and (arguably) be easier to read, even if the
> types of the initial values written don't quite line up with the final types.

I don't think this is a reasonable thing to ask the programmer to do.  It also doesn't solve the problem that well.  For fields that will be initialized to objects that will not be known until construction time, we now have to pre-initialize the field (polluting the typeset with null), and then re-initialize it to the object.

So now every property add that needs to be "pre-initialized" will emit two writes instead of one (and the first write probably not optimizable away because the whole necessity for the first write is some opaque code in between).  We improperly pollute the typeset with null when it may not be necessary, we force the developer to re-order his property assignments in a way that they may not find intuitive or reflective of the flow of logic, and we force the developer to refrain from using perfectly sensible tools like object extenders and compositional object construction tools.

I simply don't see the justifications for those severe and deep compromises.  These are common and desirable patterns, which have at least anecdotally shown up in the wild in major game engines (and probably a lot of other code), that we can't optimize because our approach is not good enough.  We shouldn't tell the developer to "be more static" simply because the optimizer can't handle it.

Shu-yu Guo [:shu]

Comment 16

•

10 years ago

Kannan's approach sounds more and more like a boil-the-ocean type approach, so given that the new proposed architecture has tradeoffs compared to the current architecture, my main concern is that if we work on this, do we go into another longish period of fixing regressions? Will the new architecture (once ironed out) raise the performance bar sufficiently high?

FWIW, since all the discussion has been surrounding definite property analysis, as Brian points out, TypeObjects shouldn't need any of that. It seems perfectly reasonable to me to tell the performance-minded programmers to just use TypedObjects.

A broader point is that analyses all have corner cases or patterns that they are ill-suited to optimize. My opinion over the years have shifted from making increasingly clever analyses to just making the engine more transparent with tooling.

(In reply to Kannan Vijayan [:djvj] from comment #15)
> I simply don't see the justifications for those severe and deep compromises.
> These are common and desirable patterns, which have at least anecdotally
> shown up in the wild in major game engines (and probably a lot of other
> code), that we can't optimize because our approach is not good enough.  We
> shouldn't tell the developer to "be more static" simply because the
> optimizer can't handle it.

I disagree here. I think telling the developer to "be more static" is fine. There will be cases that optimizers can't handle. It's a difference of opinion to be sure. TypeObjects perhaps seem to the purist JIT engineer as "giving up", and manual SIMD intrinsics perhaps seem as "giving up" too, instead of doing automatic vectorization. But to the audience of programmers who care about performance, I think they want harder guarantees for harder-to-optimize things (like object layout), and a cleverer dynamic optimization, more likely than not, is not that.

To play devil's advocate against myself, on paper, I do very much still like the idea of unifying the notion of object layout into one concept. Having shapes and types in the same place will open the door to dynamically adapting object representation (like Truffle does), which has been a point I harp on from time to time. But again, TypedObjects might be the worse-is-better answer to that.

Nicolas B. Pierron [:nbp]

Comment 17

•

10 years ago

This is not because we are in the process of making TypeObjects that we should not optimize the rest of the Web.  I think we should aim at reaching the same level of optimizations as-if we had TypeObjects.

From the point of view of adding analysis, maybe this is something we should prototype in JavaScript first, by collecting all the information and have an estimated size, and an estimated type-accuracy.  I know Kannan did some work to measure the Shape vs. TypeObject usage about a year ago, maybe it is time for making a dirty prototype in order to motivate such approach.

I agree with the concern that you raise about fixing Type inference issues.  One of the thing that was awful, after landing of TI and during the development of IonMonkey, was the flow security issues.  We should make sure that we provide enough testing surface such that fuzzers are well capable of finding those quickly.

Personally, I think that using Shype would reduce our risk of having security issues overall, as it would be only a unique source of information for making optimization choices, and it would be easy to check against.  This means that if we have any doubts about the inferred type, we can still guard the shype.  Adding unneeded shype guards would also be a nice way to ensure that our type system invalidation is correct (almost as we do with Range Analysis).

Kannan Vijayan [:djvj]

Reporter

Comment 18

•

10 years ago

(In reply to Shu-yu Guo [:shu] from comment #16)
> Kannan's approach sounds more and more like a boil-the-ocean type approach,
> so given that the new proposed architecture has tradeoffs compared to the
> current architecture, my main concern is that if we work on this, do we go
> into another longish period of fixing regressions? Will the new architecture
> (once ironed out) raise the performance bar sufficiently high?

I thought about this for a while last night, and I think some of the concerns can be addressed.

The regression issue doesn't weigh on me that much, for the following reasons:  If we look at the implementation path to this feature, it's possible to structure it in terms of small deltas that deliver benefits at each turn:

Step 1. Root shape trees inside TypeObjects.

This is something we already talked about, and it provides benefits.  A shape would imply a TypeObject, which means that shape checks + type constraints on the property typesets defined on the typeobject.. could provide the same transitive optimization opportunities that type-guards currently provide: a shape guard can tell us what the types of slots are.

Step 2. Move typesets into the the shape tree.

This would mostly be a memory optimization.  We'd still only track TypeObjects in TI, but the property->typeset mapping a TypeObject provides would be spread out across the tree, eliminating redundant key storage and the hashtable overhead for the property typeset map we currently use.

Typesets could be shared between distinct paths that lead to structurally equivalent objects (i.e. same property key set).

Step 3. Start tracking Shypes in TI

This would be the biggest change.. but it's not actually as big of a change as when TI first went in.  The interface between the compiler and the type engine has largely been worked out, and operates on very simple terms:  Compiler asks TI questions, provides it a token to identify the compile by.. and TI answers those questions, attaching whatever constraints are necessary to invalidate those answers when info changes.

That fundamental mechanic is not going to change, and will still largely be structured as it is now.  The compiler will still ask questions of TI in IonBuilder.  But instead of traversing TypeObjects and TypeSets, TI will traverse Shypes and TypeSets.

The significant difference/enhancement is that we enlarge the domain of questions which come back with good optimization answers, and IonBuilder gets more opportunities to generate good code.  Also important, we acquire a framework with which to understand those questions:

Each shype provides two target types to analyze: the specific and general type.  If an object's actual shype has been discovered through some guard, then the questions asked of that shype can refer to the specific type it describes.  If an object's actual shype is being analyzed through its presence in a heap typeset, then the questions asked of it refer to the general type it describes.  The difference between these two types of questions is not explicit, but rather implicit in the constraints that get attached when they are answered.

Overall, the change affects TI and Shapes, but the interface and conceptual flow of the rest of the engine remains the same.  This is not as big of a risk as it might seem at first glance.


Secondly, I wasn't around when TI-based jitting landed, but I'd speculate that part of the issue was landing and integrating not only a brand new dynamic type model, but writing the jitter infrastructure around that to generate good code.  These are two complicated things with very fine and subtle effects on each other, and they were both being done for the first time, at a time when the techniques being employed were all pretty brand new.

The codegen infrastructure is solid now, and it won't require much in the way of changes.  Most of these changes will force the adaption of some restricted subsystems: how we generate MIR in IonBuilder's frontend, how we generate Ion IC stubs, and how we generate Baseline IC stubs.  Things that will not require changes include the core JIT subsystem (all the optimization passes on MIR, Lowering, and Codegen), the core Baseline JIT, the IC stub design for both Ion and Baseline, any changes to our Bailout/Invalidation mechanisms, our OSR mechanisms, our stack representations.. any of that.

Yes, this is a big change, but it is nowhere near the complexity of designing and implementing a production dynamic type inference system and a jit to go along with it, implementing and developing a production optimizing JIT on top of it, and integrating it all into an in-the-wild production engine that had never seen anything like it before.


> FWIW, since all the discussion has been surrounding definite property
> analysis, as Brian points out, TypeObjects shouldn't need any of that. It
> seems perfectly reasonable to me to tell the performance-minded programmers
> to just use TypedObjects.
> 
> A broader point is that analyses all have corner cases or patterns that they
> are ill-suited to optimize. My opinion over the years have shifted from
> making increasingly clever analyses to just making the engine more
> transparent with tooling.

We discussed this on IRC and seem to have a difference in perspective (of degrees, as you mentioned there) on this matter.

My perspective derives from the view that dynamic languages, and the dynamic features they offer for specifying the behaviour of programs in powerful ways, are things with inherent value... just as static languages and the features they offer provide inherent value.

Static languages allow for nice tools, and assisted code transforms, and better AOT error checking, and more predictable performance and memory guarantees.

Dynamic languages allow for procedural type and interface generation, procedural object construction, and easy procedural generation of logic at runtime.  If I'm using the word procedural a lot here, it's to contrast with the more declarative behaviour of the typesystems in static languages.  A lot of the power and flexibility of these languages comes from the fact that they expose an interface to almost all of their semantic structures at a procedural level, allowing the developer to write functions and libraries and systems that interact with and modify those structures in powerful ways.

TypedObjects are nice, but they're a hypothetical tool that we're discussing implementing to help developers write good programs.  They are by no means a declaration that programs that don't use them are bad programs, or that they are examples of bad code.

So to me, this stance feels like telling developers: You wrote a good program, and that's great, but it's not going to go fast because we care more about optimizing these other types of good programs.

If that's a reality that we cannot change, for existential, practical, or resource reasons, it's a reasonable answer.. but that may not be the case (if this stuff works out).  If it's not the case, I don't think that answer is reasonable anymore.

Nicholas Nethercote [inactive]

Comment 19

•

10 years ago

Unsurprisingly, I'm interested in the effect on memory usage this change would have, for the following reasons.

- Shapes already take up a *lot* of space -- typically almost as much as objects. I've been trying to reduce their size just this week, which led to some tinkering around the edges (bug 1038038, bug 1038601) but no notable design changes. Type objects tend to take up much less space. E.g. in my current session the main runtime has 80 MiB of objects, 74 MiB of shapes, and 6 MiB of type objects.

- Historically, big changes of this sort have led to memory usage blow-outs, at least for some workloads. When TI first landed this happened. Also, Brendan did a big shape refactoring a few years ago and the same thing happened. In both cases it was fixable with some effort, but in both cases the blow-up wasn't detected immediately.

So: you may already be doing this, but if not, I encourage you to consider memory usage a first-order design constraint on any changes to this system. Please don't regress it, and if it can be shrunk, that would be wonderful.

Brian Hackett [Laid off!]

Updated

•

10 years ago

Comment 20

•

10 years ago

(In reply to Nicholas Nethercote [:njn] from comment #19)
> Unsurprisingly, I'm interested in the effect on memory usage this change
> would have, for the following reasons.
> 
> - Shapes already take up a *lot* of space -- typically almost as much as
> objects. I've been trying to reduce their size just this week, which led to
> some tinkering around the edges (bug 1038038, bug 1038601) but no notable
> design changes. Type objects tend to take up much less space. E.g. in my
> current session the main runtime has 80 MiB of objects, 74 MiB of shapes,
> and 6 MiB of type objects.

I think it goes without saying that any serious memory regression would be a strong argument against this.  We should be able to get an estimate of any potential regressions or gains with some instrumentation and analysis on our usual benchmarks and workloads.

There are a couple of things which I feel might bode well for this, which might be worthwhile to confirm.  But a real dissection of this requires a better understanding of how shape memory is distributed and what runtime behaviour induces that.

From a quick inspection using gmail, about 25% of shape memory usage goes to dictionary-related tables and shapes.  We should be able to eliminate most of this by eliminating dictionary shapes entirely.  With shypes, an object change that would currently lead to the creation of a dictionary shape lineage can instead generate a single leaf type representing a hashtable object.  Objects deoptimizing would revert to a slot-less hashtable representation which keeps its own keys.

> - Historically, big changes of this sort have led to memory usage blow-outs,
> at least for some workloads. When TI first landed this happened. Also,
> Brendan did a big shape refactoring a few years ago and the same thing
> happened. In both cases it was fixable with some effort, but in both cases
> the blow-up wasn't detected immediately.
> 
> So: you may already be doing this, but if not, I encourage you to consider
> memory usage a first-order design constraint on any changes to this system.
> Please don't regress it, and if it can be shrunk, that would be wonderful.

This is something to definitely keep an eye on.  There are three metrics that I think this will eventually have to be justified on: memory, performance, and power.

Jan de Mooij [:jandem]

Comment 21

•

10 years ago

Something else we've wanted to do for a while: optimizing (global object or singleton) properties with a single/constant value (bug 894596), and invalidating when it changes. V8 added this a while ago and it increased their Octane score.

This obviously doesn't have to happen immediately, but we should keep it in mind.

Kannan Vijayan [:djvj]

Reporter

Comment 22

•

10 years ago

The inferred constant optimization is also possible to implement directly on the existing architecture in a relatively clean way, if we allow TypeSets to also represent singleton values.

Empty => Single Value TypeSet => Regular TypeSet

The first update to a typeset will store the value directly in the typeset.  Any subsequent update to the typeset will discard the value, and convert the typeset to a regular typeset.

The space required to store the Value in the Single-Value-Typeset and the space for the the current |flags| and |objectSet| fields can be shared, since the Single-Value representation will never be active at the same time as those fields are required.

This will let us attach constraints onto singleton values naturally and with very few changes to our optimization logic.

Given that ObjectGroups haven't existed for years, we can close this.

Status: NEW → RESOLVED

Closed: 5 months ago

Resolution: --- → INVALID