Closed Bug 631138 Opened 13 years ago Closed 13 years ago

Write a comment explaining Shapes and the property tree

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla12

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

Attachments

(1 file, 2 obsolete files)

Shapes and the property tree are important, but the only documentation about
them is the big comment at the top of (the now-misnamed) jsscope.h, which is
(a) dives into impenetrable detail without any high-level explanations, and
(b) looks out-of-date -- it talks about "scopes" rather than "shapes".

Here's my attempt at a high-level description.  Thanks to Waldo for
explaining much of this to me;  any mistakes are my own.

- A single Shape represents a property in an object.

- Shapes are stored in a tree structure -- the property tree -- that permits
  (a) full sharing of Shapes between objects that have identical layout and
  (b) partial sharing of Shapes between objects that have partly-identical
  layout.

- A single path from the root of the property tree to a non-root node
  represents the layout of one or more identically-laid-out objects.  Every
  such path has a "shape number" or "shape ID" associated with it, which is
  a number that uniquely identifies the path/layout.  This allows fast tests
  of object layouts.

- Example:  if we have two objects, {a:1, b:2} and {a:1, c:3}, the property
  tree will look like this:

       {}
       |
       a
      / \
     b   c

  This property tree describes four object layouts:  b/a/{}, c/a/{}, a/{},
  {}.  With the two objects above, only two of those layouts are actually
  seen.

  However, that property tree form assumes that the 'a' property was added
  first in both cases.  If the order was reversed for the second object we'd
  get this:

      {}
      /\
     a  c
     |  |
     b  a

- The values of the properties in the objects are irrelevant to the property
  tree.  So two objects {a:1, b:2} and {a:3, b:4} have the same Shape ID.

- All links in the property tree are bi-directional.  So from any non-root
  Shape you can get back to the root, and thus see the whole object layout.

- Each object contains a pointer to the appropriate non-root Shape.  This
  Shape will represent that property most recently added to the object.
 
- An object's path order, starting at the root, dictates the enumeration
  order for the object.

- We often need to find the Shape describing a property of a particular
  object.  This involves a reverse search along a path, ie. starting from
  the Shape that the object has a pointer to, ending at the root.  This is
  linear and so is slow if we do it a lot.  Therefore, for each Shape, if
  the number of searches starting at that Shape exceeds HASH_MIN_SEARCHES,
  we create an auxiliary hash table -- the PropertyTable -- that allows
  faster lookup.  Hashing is relatively rare, because most Shapes serve as
  the starting point of a search fewer than three times.

- Each Shape contains a slot number, which indicates which slot in an object
  the property's value is held.  Each Shape also has various flags and
  attributes.

- The property tree height is bounded by MAX_HEIGHT.  Any object that gets
  more properties than that is effectively removed from the property tree
  and converted to "dictionary mode".  When this happens, a list of Shapes
  is created from the Shapes in the object's path, and a PropertyTable is
  also created for it.

- There are actually multiple property trees... some kinds of objects (eg.
  call objects) have their own tree?  (Not sure about this.)


That's it.  Comments, corrections, etc. are welcome.  Actually, re-reading the comment at the top of jsscope.h, a lots of its details are useful and relevant;  assuming they're still correct, a merging of that comment and my text above would be good.
See also these bug 592534 and bug 593129, which would get rid of "Scope" and "scope" left-over names and of course revise comments to be better.

"A single Shape represents a property in an object." The trick here is the left context being encoded in the shape, so it's not just one property, but the property and its predecessors.

The use of "{}" and tree singular is misleading. There are forests, rooted at empty shapes belonging to prototype objects, plus some compartment-based (when Bill's patch stops bouncing) well-known empty shapes.

Thanks for taking this on. The comment in jsscope.h is a decade old, from a time when fewer people worked on the engine and they knew more about its fewer abstractions. Great to have an overhaul.

/be
I wrote "there are forests" because (with Bill's patch for bug 609104) we have a forest of property trees per compartment. There should be no cross-compartment object-to-shape references.

/be
More corrections and comments:

" The values of the properties in the objects are irrelevant to the property
  tree.  So two objects {a:1, b:2} and {a:3, b:4} have the same Shape ID."

Not true for shapes with the METHOD flag set. Also getter and setter values (whether js::PropertyOp or JSObject * [that is, function object pointers]) are part of the shape's identity. This can over-specialize: see bug 367425.

" All links in the property tree are bi-directional.  So from any non-root
  Shape you can get back to the root, and thus see the whole object layout."

Non-dictionary shapes are linked up via parent and down via kids.

Dictionary shapes are doubly-linked, non-circularly, with double-indirection for the "child" pointer (younger property, later in enumeration order) to unify the basis (listp points to &lastProp) and inductive (listp points to the next-younger shape's parent pointer) cases.

" Each object contains a pointer to the appropriate non-root Shape.  This
  Shape will represent that property most recently added to the object."

An empty object's lastProp member points to a root EmptyShape, whose identity is determined by the object's prototype or (in the case of well-known built-in classes) its class and compartment.

" ... Therefore, for each Shape, if
  the number of searches starting at that Shape exceeds HASH_MIN_SEARCHES,
  we create an auxiliary hash table -- the PropertyTable -- that allows
  faster lookup.  Hashing is relatively rare, because most Shapes serve as
  the starting point of a search fewer than three times."

Three times is a measurement for Firefox, YMMV if You are an embedder.

The PropertyTable for a given non-dictionary shape is shared and immutable. For a dictionary shape, only the shape at the owning object's lastProp has a table, if there have been enough searches to warrant hashing. More below on dictionary mode.

" ... Each Shape contains a slot number, which indicates which slot in an object
  the property's value is held."

The reserved SHAPE_INVALID_SLOT slot number indicates a property with no value stored per object in the object's slots, rather a getter, setter, getter/setter, or joined method, with the shape holding those function (object) pointers.

" The property tree height is bounded by MAX_HEIGHT.  Any object that gets
  more properties than that is effectively removed from the property tree
  and converted to "dictionary mode"."

Say why (I did, in bug 630456 comment 15).

" ...  When this happens, a list of Shapes"

Say something about how these shapes are unshared, private to the object, and mutable (this mutability is exploited, more below). As opposed to shared shapes in the compartment's forest.

" ... is created from the Shapes in the object's path, and a PropertyTable is
  also created for it."

Again (there's more below on when a dictionary-mode object does *not* have a PropertyTable eagerly created to give O(1) growth looking up its shapes) say why: to avoid quadratic growth computing number of properties in the object by walking the parent-linked shape lineage starting at lastProp.

Dictionary-mode is triggered apart from MAX_HEIGHT by the delete operator or API equivalent removing a property from the tree. The original version of the code tried lazy tree forking (hg annotate if you are interested) but this still blows up for delete/add repetitions and the web finds every O(n^2) algorithm and makes its author pay.

Dictionary mode is also triggered by changing attributes of a non-last property in an object of a given id -- the change attributes different values to the other members of the shape that contribute to its identity (but the parent says the same, so enumeration order does not change -- see bug 601399).

In the case of changing attributes of the last property of a non-dictionary-mode object, the shared shape tree can be forked in constant time, so we do not switch to dictionary mode in this case.

" ... Actually, re-reading the
  comment at the top of jsscope.h, a lots of its details are useful and relevant;
  assuming they're still correct, a merging of that comment and my text above
  would be good."

Glad that comment makes more sense now. It was an attempt to avoid an even bigger comment with redundancy and tutorial-style simplifications, such as starting with Shapes all shared in a tree, and contradicting the simplifications later as the student learned more.

We could use such docs but I'm loath to burn them into code comments. MDC seems better. Comments should lay out axioms and derivations without too much deadwood or chattiness, or they become lies even more quickly.

But of course, even the big jsscope.h file comment starts with some definitions and gradually reveals the fullness of the design as it goes.

This comment dates from the C "first age", so it abstracts little. We can hide some details with C++ (e.g., the dictionary shape linkage different from the tree linkage), for sure.

/be
So the other triggers than MAX_HEIGHT for dictionary mode do all hashify. I was misremembering when I wrote "there's more below on when a dictionary-mode object does *not* have a PropertyTable" in the last comment, as Nick knows from doing the deed ;-).

/be
I wrote:

" Dictionary-mode is triggered apart from MAX_HEIGHT by the delete operator or
  API equivalent removing a property from the tree."

That should read "removing a non-last property". If the shape being removed is at obj->lastProp we reset that to obj->lastProp->parent.

/be
Regarding "'A single Shape represents a property in an object.' The trick here is" and "so it's not just", it depends.  In the context of a lookupProperty a Shape* mostly is just a property.  At the highest and simplest level of meaning, the statement is correct and not incomplete, and it contradicts no subsequent elaboration.

Certainly you want to dive below that eventually to introduce the linking that makes the property tree valuable for caching.  Still, equating Shape* to a property is a good place to start to cover the very simplest uses of Shape*, and of answering questions like "how do I find and examine the properties of an object" taken pretty simply.

Regarding avoiding future contradiction: just be careful and precise pulling back the curtain.  I think we can do it without saying wrong things at any given time.  Take the {a:1, b:2} / {a:3, b:4} case.  It's totally right if we narrow the phrasing to make clear that those represent exactly those objects as immediately created by those honest-to-goodness object literals.  Then you could say, "but what about { get x() ... } or { y: function() ... }?" and just keep on elaborating the harder cases.  The trick is to step from simplest case to hardest case, speaking as narrowly as possible while being fully accurate at each step.

I'll try to keep an eye on this as I have time.
(In reply to comment #6)
> Regarding "'A single Shape represents a property in an object.' The trick here
> is" and "so it's not just", it depends.

Hmmm. Now that you mention it, that seems precariously half-right to me. I think a Shape represents a property list. It has two main subcomponents: a property, and the rest of the list. We happen not to have a separate data type to represent the property, so when we want to pass a property as argument, we pass a Shape, but it is really a distinct concept.
(In reply to comment #6)
> Regarding "'A single Shape represents a property in an object.' The trick here
> is" and "so it's not just", it depends. 

Something seems garbled there.

> In the context of a lookupProperty a
> Shape* mostly is just a property.

Sure.

> At the highest and simplest level of
> meaning, the statement is correct and not incomplete, and it contradicts no
> subsequent elaboration.

Then why is the name Shape and not Property? There's something missing.

> Regarding avoiding future contradiction: just be careful and precise pulling
> back the curtain.  I think we can do it without saying wrong things at any
> given time.  Take the {a:1, b:2} / {a:3, b:4} case.  It's totally right if we
> narrow the phrasing to make clear that those represent exactly those objects as
> immediately created by those honest-to-goodness object literals.

Only if they are same-compartment. This is not a tutorial for a beginner or an ES5 reader who thinks there is only one global object. This is a comment for SpiderMonkey engine hackers.

>  Then you
> could say, "but what about { get x() ... } or { y: function() ... }?" and just
> keep on elaborating the harder cases.  The trick is to step from simplest case
> to hardest case, speaking as narrowly as possible while being fully accurate at
> each step.

Have you studied Gödel's work? :-P

/be
+1 on comment 7.

/be
I agree with Waldo, and I was going to change it to say (roughly) "in isolation, a Shape represents a property and contains its id, attrs, flags, etc;  but in the property tree it represents more..."
(In reply to comment #8)
> Something seems garbled there.

It's just two different sub-quotes whose combined sentiment I was subdividing.  Perhaps I underquoted, although I think dmandelin/njn basically understood what I was saying.

> > At the highest and simplest level of meaning, the statement is correct
> > and not incomplete, and it contradicts no subsequent elaboration.
> 
> Then why is the name Shape and not Property? There's something missing.

Property seems like a fine name to me.  Heck, you could have that thing be Property and have it inherit the tree-related functionality from PropertyTree, for slight separation of abstractions.  But the name JSProperty for confusion is a reasonable counter-argument, and it's all bikeshedding regardless, so it's hardly been atop the list of things I've wanted to say anything about.  (Although: Shape versus Shape::shape must eventually be fixed in *some* way.)

> Only if they are same-compartment. This is not a tutorial for a beginner or an
> ES5 reader who thinks there is only one global object. This is a comment for
> SpiderMonkey engine hackers.

Fine, {a:1,b:2} and {a:3,b:4} in the same script then.  It is useful and illustrative to say that two simple object literals will share portions of the property tree, because it's the most "obvious" case where sharing can be done, even without knowing anything about the idea of property trees or whatever, just about the possibility of sharing object layout.

I disagree about the comment not being for beginners, at least as concerns beginning SpiderMonkey engine hackers.  We need more of them!  I'm confident we can write well enough to not sacrifice approachability while serving our needs.

Regarding compartments specifically, I'd say you deal with that in an offhand reference -- at the far end -- that Shapes are located in compartments, or zones, or wherever, or soon will be.  That really says it all as far as compartment-related concerns specifically about the Shape concept go.  (We could use a comment for compartments, too, to address compartments in detail.)
(In reply to comment #11)
> (In reply to comment #8)
> > Something seems garbled there.
> 
> It's just two different sub-quotes whose combined sentiment I was subdividing. 
> Perhaps I underquoted, although I think dmandelin/njn basically understood what
> I was saying.

No doubt (they're so smart :-P); I understood too but it was a speedbump.

> > > At the highest and simplest level of meaning, the statement is correct
> > > and not incomplete, and it contradicts no subsequent elaboration.
> > 
> > Then why is the name Shape and not Property? There's something missing.
> 
> Property seems like a fine name to me.

You're missing something. Hidden-class, Structure, and Shape all connote the derivation of successive property lineages by adding a property to an object. By itself, Property does not.

> Heck, you could have that thing be
> Property and have it inherit the tree-related functionality from PropertyTree,
> for slight separation of abstractions.

It's important not to lay on abstraction to the point that it wastes space via alignment padding (Nick is unioning to unbloat for uint32 numSearches plus its 64-bit padding). And dictionary-mode objects do not have tree linkage at all.

But let me stop playing yin to your yang and agree that some lightweight/inline C++ abstraction could be good. In the end you have to know how it lays out in memory and what its sizeof is, still.

> But the name JSProperty for confusion
> is a reasonable counter-argument, and it's all bikeshedding regardless, so it's
> hardly been atop the list of things I've wanted to say anything about. 

It's ok, you don't have to worry about bike-shedding here.

> (Although: Shape versus Shape::shape must eventually be fixed in *some* way.)

(Agreed!)
 
> > Only if they are same-compartment. This is not a tutorial for a beginner or an
> > ES5 reader who thinks there is only one global object. This is a comment for
> > SpiderMonkey engine hackers.
> 
> Fine, {a:1,b:2} and {a:3,b:4} in the same script then.

That works.

> I disagree about the comment not being for beginners, at least as concerns
> beginning SpiderMonkey engine hackers.  We need more of them!  I'm confident we
> can write well enough to not sacrifice approachability while serving our needs.

I'm pretty sure we need docs on the side for beginners. All who become expert will chafe at jamming books into code comments and the comments will rust faster.

All comments rust (all code rusts, as ken observed in his chapter in "Coders at Work"). It takes nice judgment to strike the right balance. I can't say more here but I'll review patches and then I'm sure we'll arrive at something agreeable in the review comments.

> Regarding compartments specifically, I'd say you deal with that in an offhand
> reference -- at the far end -- that Shapes are located in compartments, or
> zones, or wherever, or soon will be.  That really says it all as far as
> compartment-related concerns specifically about the Shape concept go.  (We
> could use a comment for compartments, too, to address compartments in detail.)

Sure. The existing comment in jsscope.h dates from when the property tree was runtime-wide. In some ways that is the "easy" case, ignoring thread safety. The modern version is per-compartment and it doesn't need to be in the lede.

Part of the challenge for us is mapping the easy, beginner, and still-ECMA-262-eye view of JS as having a single global object onto the big and nasty world of multiple globals (then compartments, zones, runtimes -- and contexts are in a different category, so not on that spectrum).

Here's another thing to think about. SpiderMonkey Shape evolution is specialized by the hash-identity of a Shape, which includes  most members of the Shape. Not just (id, attrs) as in JSC, with an optional mutable specialized-value member. I am not sure what V8 does.

For this bug, documenting the shape hash-identity relation and talking about how it applies to children in the tree is enough. Food for thought: try varying this relation a la JSC and see if there's a win.

/be
One more thought on how much to tune comments for beginners (not just experts who may need refreshing and want tutorial style progressive revealing of complexity): code is complex even when comments aren't. Our code-base is not as beginner-friendly as (to pick a not-random example) C-Python, which rejects optimizations in favor of keeping the code approachable as well as maintainable.

Sounds like a win-lose scenario, but we can play to win-win now with C++. I agree with something Luke has said often: C++ gives us better ways to make airtight abstractions that localize relatively high complexity, with good APIs, so we can tolerate the complexity.

We aren't ever going to be as simple as an interpreter (C-Python or SpiderMonkey in its earlier days). But we aren't stuck with C, either.

Then the comments on the outside of the complexity-hiding abstractions can indeed be more beginner-friendly, although the insides won't be, and the number and kind of abstractions won't be relatively beginner-friendly in total.

So, cause for hope and grounds for further research!

As noted, the C hangover definitely is still impairing parts of SpiderMonkey. "Hair of the dog" won't help (other than get us a liver transplat ;-). Cleaning up after Firefox 4 branches is a pretty high priority (cleaning up never ends in living code, but we've been going fast and deferring more than we ought to IMO).

/be
(In reply to comment #13)
> So, cause for hope and grounds for further research!

I agree with all of that.

> As noted, the C hangover definitely is still impairing parts of SpiderMonkey.
> "Hair of the dog" won't help (other than get us a liver transplat ;-). 
> Cleaning up after Firefox 4 branches is a pretty high priority (cleaning up
> never ends in living code, but we've been going fast and deferring more than
> we ought to IMO).

I think several of our hoped-for post-release projects are also excellent opportunities to clean up important chunks of code they will touch:

- GC, for GC and allocators of course, but also object representation stuff
      relating to write barriers and marking
- scope chains, for scope chains of course, but also maybe context globals
      and things like that
- hybrid arrays, for object slot stuff

I don't know what the competitive landscape is going to be like, but I'm hoping we'll have enough breathing room that we can go methodically and really fix things up, using Luke's awesome fatvals work as example and inspiration.
Attached patch patch (obsolete) — Splinter Review
Here's a patch with a new comment, incorporating suggestions from above.

When faced with the question of what to do with the old comment, I decided to be bold and remove it all, for a net size reduction of 95 lines.  I figured the new comment covered the same ground except for all the calculations about the size of the property tree, which didn't seem necessary.  (But maybe the stuff about thread safety could be salvaged.)
Attachment #569275 - Flags: review?(brendan)
Note to self:  it's worth pointing out that property tables for non-dictionary mode shapes never grow, but property tables for dictionary mode shapes can grow.
Note to self:  entryCount() does not count the EmptyShape at the start of a shape lineage!  And property tables don't include the EmptyShape.  And the "is the shape lineage big enough to turn into a dictionary" test involving MAX_HEIGHT accordingly doesn't count the EmptyShape.
Attached patch patch v2 (obsolete) — Splinter Review
New patch with additional insights.
Attachment #569275 - Attachment is obsolete: true
Attachment #571222 - Flags: review?(brendan)
Attachment #569275 - Flags: review?(brendan)
Four week review ping.
These days, I'd suggest having bhackett review this, given the shrinking work he's been doing for awhile.  (That'll also make him aware of the need to make any changes that work necessitates, too.)
I'll give Brendan a couple more days since (a) he reviewed my earlier version helpfully and (b) I am removing a big comment he wrote.
Comment on attachment 571222 [details] [diff] [review]
patch v2

Ok, enough waiting.
Attachment #571222 - Flags: review?(brendan) → review?(bhackett1024)
Comment on attachment 571222 [details] [diff] [review]
patch v2

Review of attachment 571222 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsscope.h
@@ +67,5 @@
>  #endif
>  
>  /*
> + * In isolation, a Shape represents a property that exists in one or more
> + * objects;  it has an id, flags, etc.  (But it doesn't represent the

Sentences and other separators should be single spaced I think, at least that's what I use in spidermonkey comments.

@@ +86,5 @@
> + *    a property tree is a shape lineage.  Property trees permit full (or
> + *    partial) sharing of Shapes between objects that have fully (or partly)
> + *    identical layouts.  The root is an EmptyShape whose identity is
> + *    determined by the object's prototype, or, in the case of well-known
> + *    built-in classes, its class and compartment.  These Shapes are shared and

With objshrink this should now be 'determined by the object's class, compartment and prototype', with a reference to EmptyShape::getInitialShape.

@@ +92,5 @@
> + * 
> + * 2. Dictionary mode lists.  Shapes in such lists are said to be "in
> + *    dictionary mode", as are objects that point to such Shapes.  These Shapes
> + *    are unshared, private to a single object, and mutable.  These lists do
> + *    not have an EmptyShape.

Remove the last sentence.  Dictionary objects do have an empty shape in their lineage.

@@ +98,5 @@
> + * All shape lineages are bi-directionally linked, via the |parent| and
> + * |kids|/|listp| members.
> + * 
> + * Shape lineages start out life in the property tree.  They can be converted
> + * (by copying, excluding the root EmptyShape) to dictionary mode lists in the

Strike the 'excluding the root EmptyShape'

@@ +122,5 @@
> + * auxiliary hash table -- the PropertyTable -- that allows faster lookup.
> + * Furthermore, a PropertyTable is always created for dictionary mode lists,
> + * and it is attached to the last Shape in the lineage.  Property tables for
> + * property tree Shapes never grow, but property tables for dictionary mode
> + * Shapes can grow.

I would change 'grow' to 'change' in these two instances, as dictionary property tables can also have entries deleted.

@@ +531,5 @@
>          JS_ASSERT_IF(isDataDescriptor(), writable());
>          return hasSlot() || (attrs & JSPROP_SHADOWABLE);
>      }
>  
> +    /* For property table Shapes, this excludes the EmptyShape at the root. */

This function should behave the same for both property tree and dictionary shapes.
Attachment #571222 - Flags: review?(bhackett1024) → review+
Also, while that big comment describing the performance implications of the shape design is helpful, I don't think it belongs in the introductory comment describing how shapes work, and I remember finding it very imposing when I was learning the code base.  If there is a bug containing discussions that led to this strategy that bug could be linked from the comment, otherwise it could be tucked somewhere else (PropertyTree::getChild?) and referred to from the introductory comment.
Attached patch patch v3Splinter Review
Revised patch: 

- Fixes comments from the previous review.

- Removes all traces of the old |lastProp| name.  Actually, there's one 
  still in the comment for |listp|, I'm not sure how to change that one.

- Add a reference to this bug for the performance implication description
  (which is still removed by the patch).

- Adds a brief mention of BaseShapes.
Attachment #571222 - Attachment is obsolete: true
Attachment #585452 - Flags: review?(bhackett1024)
Attachment #585452 - Flags: review?(bhackett1024) → review+
https://hg.mozilla.org/mozilla-central/rev/ef2409ee1e0e
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla12
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: