Closed Bug 874567 Opened 11 years ago Closed 9 years ago

SpiderMonkey: Unify Shapes and TypeObjects

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1038917

People

(Reporter: djvj, Unassigned)

References

Details

Attachments

(1 file)

There have been various ideas floated around for unifying shapes and types.  I'm making this bug to bookmark the topic and to serve as a place to start collecting info.


The proposal
------------
We should be able to root shape trees inside TypeObjects.  I.e. All JS objects with that type will have a shape drawn from within the corresponding shape tree.

Once this is done, the |property => TypeSet| map that is held by TypeObjects can be distributed across the shape tree instead of being held directly by the TypeObject:


                      [ Type ]

                         * (Root)
                         |
                   --------------
                   |            |
    .x{Int,String} |            | .a{String}
                   *            *
                   |            |
           .y{Int} |            | .b{ObjType1, ObjType2}
                   *            *

If we treat property typesets as annotating the shape tree, we can choose to split shapes on the basis of property types:


                                    [ Type ]

                                       * (Root)
                                       |
             -------------------------------------------
             |              |                          |
     .x{Int} |    .x{String}|                          | .a{String}
             *              *                          *
             |              |                          |
     .y{Int} |       .y{Int}|                 --------------------
             *              *                 |                  |
                                  .b{ObjType1}|                  |.b{ObjType2}
                                              *                  *

This approach should allow us to eliminate the boxing of many values when storing them into object slots, which would be a nice win.

Furthermore, type information can now be acted on both "statically" (e.g. using freeze constraint on the TypeObjects), as well as dynamically (since shape checks also provide type information).  Now that typeset propogation during Ion compilation is handled directly on the MIR, typeinfo-producing-shapechecks are easily actionable: the type info can simply be incorporated into the typeset of the MIR nodes produced by getprops.

If we're able to pull this off, we should be able to significantly improve the quality of generated code as well as the relevance of type information at various sites.
Brian suggested instrumenting the current engine to get some insight into how amenable most JS code is to these changes.  The potential downsides of this change are as follows:

1. Shape trees will be duplicated.  If there are certain existing shape trees which are used by objects with various different types, these trees will end up getting duplicated under each distinct type object .

2. More shapes.  If we're choosing to split the shape tree on changes in the type of the a value in a slot, this is going to duplicate the shape subtrees under those polymorphic properties.

3. Shape transitions on property assignment.  If having a property with a different type implies a shape change, then assignments which cause a change in a property's type can also demand a shape change for the object owning the property.

To evaluate these costs and benefits, I instrumented the interpreter to record the following information on every GetProp and SetProp:

1. The shape and type of every object participating in the property access.
2. The shape and type of the property's owner object (this could be the same as the first object, for an own property, or it could be some ancestral prototype, if the property is retreived from the proto chain).
3. The shape corresponding to the property being read or written.
4. The depth of this shape (i.e. the distance between the property shape and the lastProperty shape of the holder object)
5. The types of value in the slot, and (for setters) the type of the new value being assigned to it.

From this dump, we can calculate the following metrics:
1. The number of distinct types that, in practice, end up being associated with any given "leaf" shape.
2. The expected number of shape changes caused by changes to the type of a value stored in a particular property slot.
3. The expected amount of duplication implied by splitting the shape tree when values of new types are stored in slots.
Analysis results:
http://pastebin.mozilla.org/2428009

These provide summarized results for all the octane tests, as well as some browser runs on facebook and gmail.  I'll use the facebook one to explain what different numbers mean:

facebook.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0679
    | Mean number of types per shape: 1.5364
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.2376
    | Mean number of leaf shapes per type: 1.7635
    Type-Stability
    | Fraction of propShapes with multiple types: 0.1933
    | Mean number of types per shape: 1.2386
    | Number of types per shape weighted by hits: 1.6886
    | Estimated shape duplication factor: 1.8376
    | Fraction of hits which are transitions: 0.0910

"Fraction of shapes associated with multiple types":
This counts the number of times in the logs where different objects that share a "lastProperty" shape S, but differ in their type.  In this example, 6.8% of shapes seen had a population of objects without a single type.  Alternatively, 93.2% of shapes observed were for objects whose type was consistently the same.

"Mean number of types per shape":
This averages the number of types seen associated with a shape, across all shapes.  In this example,  there are 1.53 types for every leaf shape seen.  We can estimate then, that if we root shape trees under types, we might see as much as a 1.5x duplication of shapes.

Combined with the number above, it means that less than 7% of shapes contribute to the 50% increase in "types per shape".  Basically it implies that there are a handful of shapes which are associated with a _LOT_ of types.


"Fraction of types associated with multiple leaf shapes":
This counts the frequency of objects which share a type T, but have a different leaf shape.  In this example, about 24% of types seen have different leaf shapes associated with them.

"Mean number of shapes per type":
This averages the number of leaf shapes seen per type object, across all type objects.  On average, we have less than 2 shapes associated with most types.  This means that if we were to root shape trees within types, then most of these trees will have 2 or fewer leaves.


"Fraction of propShapes with multiple types: 0.1933"
This metric measures the number of property-shapes (i.e. the shape associated with the actual property being modified) which have more than one type of value assigned to their associated slot in the object.

This measure hints at how much benefit we can expect to receive by splitting shapes on the basis of the types of their properties.  The metrics to follow also help with that measurement.


"Mean number of types per shape: 1.2386"

This averages the number of types seen for values assigned to the property for a given shape.

"Number of types per shape weighted by hits: 1.6886"

This weights the above average by the number of times that property participates in a property access.  The number "1.6886" above suggests that on the facebook code, the majority of properties have either 1 or 2 different types of values assigned to them (i.e. the divergence of shapes based on the type of the property's value will largely be limited to binary splits, if any splits at all).


"Estimated shape duplication factor: 1.8376"

This uses the "depth" measure from a property shape to all the observed leaf shapes for it to estimate the amount of shape duplication implied by splitting shapes on their property's value's type.


"Fraction of hits which are transitions: 0.0910"

This counts the number of SetProperty accesses which cause the type of a value in a property to change.  It provides a rough, pessimistic bound on the number of property writes which will require a shape-update in the suggested system.  In this case, about 91% of writes cause NO change to the type of a stored value.
Full paste of the pastebin link above:
box2d.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0254
    | Mean number of types per shape: 1.1590
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.6812
    | Mean number of leaf shapes per type: 4.7536
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0641
    | Mean number of types per shape: 1.0797
    | Number of types per shape weighted by hits: 1.0061
    | Estimated shape duplication factor: 1.6479
    | Fraction of hits which are transitions: 0.0005
code-load.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.5338
    | Mean number of types per shape: 32.8446
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.3224
    | Mean number of leaf shapes per type: 7.7963
    Type-Stability
    | Fraction of propShapes with multiple types: 0.4932
    | Mean number of types per shape: 1.9726
    | Number of types per shape weighted by hits: 2.2371
    | Estimated shape duplication factor: 2.5145
    | Fraction of hits which are transitions: 0.2837
crypto.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0449
    | Mean number of types per shape: 1.1410
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.3125
    | Mean number of leaf shapes per type: 3.7083
    Type-Stability
    | Fraction of propShapes with multiple types: 0.1081
    | Mean number of types per shape: 1.1419
    | Number of types per shape weighted by hits: 1.0057
    | Estimated shape duplication factor: 1.6082
    | Fraction of hits which are transitions: 0.0001
deltablue.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0355
    | Mean number of types per shape: 1.2057
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.4091
    | Mean number of leaf shapes per type: 2.5758
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0880
    | Mean number of types per shape: 1.1200
    | Number of types per shape weighted by hits: 1.0742
    | Estimated shape duplication factor: 1.4563
    | Fraction of hits which are transitions: 0.0008
earley-boyer.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0550
    | Mean number of types per shape: 1.2018
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.2787
    | Mean number of leaf shapes per type: 2.1475
    Type-Stability
    | Fraction of propShapes with multiple types: 0.1222
    | Mean number of types per shape: 1.1889
    | Number of types per shape weighted by hits: 2.9998
    | Estimated shape duplication factor: 2.0385
    | Fraction of hits which are transitions: 0.0000
facebook.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0679
    | Mean number of types per shape: 1.5364
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.2376
    | Mean number of leaf shapes per type: 1.7635
    Type-Stability
    | Fraction of propShapes with multiple types: 0.1933
    | Mean number of types per shape: 1.2386
    | Number of types per shape weighted by hits: 1.6886
    | Estimated shape duplication factor: 1.8376
    | Fraction of hits which are transitions: 0.0910
gbemu.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0052
    | Mean number of types per shape: 1.0120
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.2600
    | Mean number of leaf shapes per type: 11.7800
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0608
    | Mean number of types per shape: 1.0676
    | Number of types per shape weighted by hits: 1.0012
    | Estimated shape duplication factor: 1.0803
    | Fraction of hits which are transitions: 0.0000
gmail.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0557
    | Mean number of types per shape: 1.2959
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.2766
    | Mean number of leaf shapes per type: 2.8559
    Type-Stability
    | Fraction of propShapes with multiple types: 0.2295
    | Mean number of types per shape: 1.2642
    | Number of types per shape weighted by hits: 1.5871
    | Estimated shape duplication factor: 1.5675
    | Fraction of hits which are transitions: 0.1296
mandreel.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0116
    | Mean number of types per shape: 1.0233
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.3056
    | Mean number of leaf shapes per type: 2.4444
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0282
    | Mean number of types per shape: 1.0282
    | Number of types per shape weighted by hits: 1.0267
    | Estimated shape duplication factor: 1.5238
    | Fraction of hits which are transitions: 0.0267
navier-stokes.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0192
    | Mean number of types per shape: 1.0385
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.1786
    | Mean number of leaf shapes per type: 1.9286
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0408
    | Mean number of types per shape: 1.0408
    | Number of types per shape weighted by hits: 1.0286
    | Estimated shape duplication factor: 1.2558
    | Fraction of hits which are transitions: 0.0286
pdfjs.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0185
    | Mean number of types per shape: 1.4312
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.1461
    | Mean number of leaf shapes per type: 1.9579
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0921
    | Mean number of types per shape: 1.1128
    | Number of types per shape weighted by hits: 1.0466
    | Estimated shape duplication factor: 1.4094
    | Fraction of hits which are transitions: 0.0129
raytrace.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0211
    | Mean number of types per shape: 1.2183
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.1444
    | Mean number of leaf shapes per type: 1.9222
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0882
    | Mean number of types per shape: 1.1176
    | Number of types per shape weighted by hits: 1.0989
    | Estimated shape duplication factor: 1.2843
    | Fraction of hits which are transitions: 0.0183
regexp.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0185
    | Mean number of types per shape: 1.0370
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.1667
    | Mean number of leaf shapes per type: 1.8667
    Type-Stability
    | Fraction of propShapes with multiple types: 0.0571
    | Mean number of types per shape: 1.0571
    | Number of types per shape weighted by hits: 1.0571
    | Estimated shape duplication factor: 1.5946
    | Fraction of hits which are transitions: 0.0571
richards.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0482
    | Mean number of types per shape: 1.1807
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.2766
    | Mean number of leaf shapes per type: 2.0851
    Type-Stability
    | Fraction of propShapes with multiple types: 0.1667
    | Mean number of types per shape: 1.2024
    | Number of types per shape weighted by hits: 1.6206
    | Estimated shape duplication factor: 1.6354
    | Fraction of hits which are transitions: 0.0871
splay.analysis
    Shape-Implies-Type
    | Fraction of shapes associated with multiple types: 0.0260
    | Mean number of types per shape: 1.0649
    Type-Implies-Shape
    | Fraction of types associated with multiple leaf shapes: 0.3030
    | Mean number of leaf shapes per type: 2.4848
    Type-Stability
    | Fraction of propShapes with multiple types: 0.2586
    | Mean number of types per shape: 1.2586
    | Number of types per shape weighted by hits: 1.9472
    | Estimated shape duplication factor: 1.7170
    | Fraction of hits which are transitions: 0.1197
(In reply to Kannan Vijayan [:djvj] from comment #3)
> Full paste of the pastebin link above:
> code-load.analysis
>     Shape-Implies-Type
>     | Fraction of shapes associated with multiple types: 0.5338
>     | Mean number of types per shape: 32.8446
>     Type-Implies-Shape
>     | Fraction of types associated with multiple leaf shapes: 0.3224
>     | Mean number of leaf shapes per type: 7.7963

This is an artifact of the eval, as we are creating identical shapes from different couple (JSScript, pc), which would cause us to have a unique TypeObject for each JSScript, even if the shapes are identical.

CodeLoad is suppose to look at the load-time of a web-page and this side-effect should not be considered as a valid concern against this optimization.

On the other hand, this also highlight that as soon as a code is no longer used and that no objects are using the type object, then we can get rid of the type object and even shrink the Type objects (even if this sounds unsafe for now).

Can you clarify, the Shape-implies-Type correspond to the ascii arts in comment 0, where the object will hold a shape, and we can find the corresponding type object by walking-up the shape linked list, right?

> gbemu.analysis
>     Shape-Implies-Type
>     | Fraction of shapes associated with multiple types: 0.0052
>     | Mean number of types per shape: 1.0120
>     Type-Implies-Shape
>     | Fraction of types associated with multiple leaf shapes: 0.2600
>     | Mean number of leaf shapes per type: 11.7800

Does that mean that gbemu objects are all parts of the same hierarchy, but not all objects have all slots?
(In reply to Nicolas B. Pierron [:nbp] from comment #4)
> Can you clarify, the Shape-implies-Type correspond to the ascii arts in
> comment 0, where the object will hold a shape, and we can find the
> corresponding type object by walking-up the shape linked list, right?
> 
> > gbemu.analysis
> >     Shape-Implies-Type
> >     | Fraction of shapes associated with multiple types: 0.0052
> >     | Mean number of types per shape: 1.0120
> >     Type-Implies-Shape
> >     | Fraction of types associated with multiple leaf shapes: 0.2600
> >     | Mean number of leaf shapes per type: 11.7800
> 
> Does that mean that gbemu objects are all parts of the same hierarchy, but
> not all objects have all slots?

I should post the full analysis dump for these results.  They contain lines mapping every shape to the list of types seen for that shape, and likewise for every type to the shapes seen for that type.

The shape-implies-type results show that by in large, a shape in this benchmark already implies a type.  There are a few shapes with more than one type.

The type-implies-shape results are more interesting.  On average, TypeObjects are being used for objects whose shapes are pretty divergent.  In particular, there's one TypeObject which is used for property accesses involving over 300 shapes.  I'm pretty sure that this TypeObject is the one used for GameBoyCore, and the hundreds of shapes are all along a single shape lineage which is extended by each SetProp.

The "big fat object" in GameBoy basically skews the results for Type-Implies-Shape.
(In reply to Kannan Vijayan [:djvj] from comment #2)
> "Fraction of propShapes with multiple types: 0.1933"
> […]

Does that consider Int & Double being different types?

> "Mean number of types per shape: 1.2386"
> […]
> "Number of types per shape weighted by hits: 1.6886"
> […]
> This weights the above average by the number of times that property
> participates in a property access.  The number "1.6886" above suggests that
> on the facebook code, the majority of properties have either 1 or 2
> different types of values assigned to them (i.e. the divergence of shapes
> based on the type of the property's value will largely be limited to binary
> splits, if any splits at all).

How many of these type-set are in fact related to common patterns of initialization of values, such as  {null, undefined} & object ?
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> (In reply to Kannan Vijayan [:djvj] from comment #2)
> > "Fraction of propShapes with multiple types: 0.1933"
> > […]
> 
> Does that consider Int & Double being different types?
> 
> > "Mean number of types per shape: 1.2386"
> > […]
> > "Number of types per shape weighted by hits: 1.6886"
> > […]
> > This weights the above average by the number of times that property
> > participates in a property access.  The number "1.6886" above suggests that
> > on the facebook code, the majority of properties have either 1 or 2
> > different types of values assigned to them (i.e. the divergence of shapes
> > based on the type of the property's value will largely be limited to binary
> > splits, if any splits at all).
> 
> How many of these type-set are in fact related to common patterns of
> initialization of values, such as  {null, undefined} & object ?

There's quite a few of those.  I discount {Undefined=>Object} transitions when calculating the "fraction of hits which are transitions" measure, but not elsewhere.

I do collapse {Int32,Double} down to just {Double} property-value-types.

I'll go ahead and attach a file with the full analysis results.  That should be more interesting reading.
Attached file Full analysis results
For each of the three kinds of analyses (Shape-Implies-Type, Type-Implies-Shape, and Type-Stability), the script dumps the coalesced data entries it uses to calculate the final numbers.

These entries are printed in the attached file.  Explanations for what they mean follow.

Shape-Implies-Type organizes all collected info into maps of kind:
Shape => Set-of-Types.  They are printed as:

      ...
      LeafShape S0xf585f8e0: T0xf582fc70
      LeafShape S0xf5864490: T0xf58262c0
      LeafShape S0xf58608e0: T0xf5826470,T0xf58261a0,T0xf5826450,T0xf58263f0,T0xf5826340
      LeafShape S0xf58683d0: T0xf5830de0
      LeafShape S0xf586ceb0: T0xf586abc0
      LeafShape S0xf5865040: T0xf5826340
      LeafShape S0xf586dc58: T0xf58702e0
      LeafShape S0xf5861580: T0xf5830e40
      LeafShape S0xf5865f70: T0xf5862430,T0xf5862370
      LeafShape S0xf586db68: T0xf58703a0
      ...

On the left hand side is a shape.  On the right hand side is all the types which have been observed on objects which have this shape as the |lastProperty|.


Type-Implies-Shape

The entries for this analysis are basically the inverse of the above entries.  These map TypeObjects to the set of |lastProperty| shapes seen on instances of that TypeObject:

      ...
      Type T0xf583a080: S0xf5823100
      Type T0xf582fc40: S0xf585f838,S0xf585f898,S0xf585f850,S0xf585f868,S0xf585f820,S0xf585f880,S0xf585f8b0
      Type T0xf586a500: S0xf58314f0
      Type T0xf5838100: S0xf5823100
      Type T0xf58382e0: S0xf5823100
      Type T0xf58362c0: S0xf5865640,S0xf5831820
      ...


Type-Stability

The entries for this look like:
      ...
      Shape S0xf58609a0 {depths=0} {objs=1} {hits=1} {trans=0}: T0xf58387c0
      Shape S0xf5861d60 {depths=0} {objs=1} {hits=1} {trans=0}: T0xf583a260
      Shape S0xf586cf40 {depths=0,1} {objs=104} {hits=206} {trans=0}: T0xf586ae60,Null
      Shape S0xf586d430 {depths=0} {objs=100} {hits=780} {trans=208}: Null,T0xf586af60
      Shape S0xf5864ee0 {depths=0} {objs=1} {hits=1} {trans=0}: T0xf583b9e0
      Shape S0xf586d040 {depths=0} {objs=106} {hits=106} {trans=0}: Int32
      ...


Here, the left hand side key is composed of property-shapes (i.e. shapes corresponding directly to the property being manipulated with GetProp or SetProp).

The {depths=a,b,...} field specifies the different depths at which access to this shape ocurred.  E.g. if {depths=0,1}, it means that the property was accessed on an object where the distance between the |lastProperty| and this shape was 0, and also when the distance between the |lastProperty| and this shape was 1.

The {objs=...} field records the number of distinct objects observed with this property.
The {hits=...} field records the number of actual accesses (GetProps and SetProps executed) which ended up hitting this shape.
The {trans=...} field records the number of SetProps where the type of the value stored in the slot changed as a result of the set.  This number excludes situations where the transition was "Undefined => Object", or "Double => Int32".
Assignee: general → nobody
There is a better/more concrete proposal for this in the bug 1038917.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Depends on: 1285267
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: