Open Bug 1500481 Opened 6 years ago Updated 2 years ago

JSVM - Simplify dictionary object representation and sparse arrays with object-held hashtables

Categories

(Core :: JavaScript Engine, enhancement, P2)

People

(Reporter: djvj, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

MOTIVATION
----------

I'm posting this in the middle of a really poor experience trying to land an optimization patch for SETELEMs on sparse arrays.  Capturing all the corner cases is difficult, and has led to two backouts on trunk because some case wasn't handled exactly the right way, or the right flag wasn't set, etc.

Not only that, our optimization for sparse arrays wasn't even that good because it involved a shape allocation for every SETELEM that added a new property, which showed up in profiles.

Theoretically, a "dictionary object" should just be a hashtable directly, and not require the allocation of new shapes simply for adding new mappings.  Additionally, the mess of various flags (PACKED, DENSE, INDEXED, etc.) and their interactions are _really_ hard to reason about, and increase the security hazards around trying to optimize for these cases.

I'd like to propose a sketch for a new approach to dictionary objects, and get feedback from relevant JS engineers on the feasibility of a quick turn-around on implementation.

DESIGN GOALS
------------

Design goals are: performance, simplicity, lower memory.

Performance for dictionary-like objects (including sparse arrays) comes from treating them as much as possible as simple hashtables.  It seems only NativeObjects should be considered as candidates for this (someone correct me if they disagree).

A property-adding SETELEM should not, in the average case, require an allocation (currently it requires a Shape allocation).

A property-updating SETELEM should not, in the average case, require a shape search, which it currently does (optimized via the ShapeTable dictionary once the shape lineage gets too unwieldy).


Simplicity and ease of reasoning come from the fact that we'd eliminate a large number of different corner cases occurring due to different allowable combinations of PACKED, DENSE, and INDEXED flags on various structures.


IMPLEMENTATION
--------------

When an object goes into dictionary mode, it acquires a new _single_ DictionaryShape, and replaces its `slots_` and `elements_` each with a direct hashtable which holds both keys and values (so there would be two hashtables here - one for regular slots, and one for elements).

The hashtable entries are inlined into the `slots_` and `elements_` arrays, and each entry has four components:

{
  uint32_t flags,
  uint32_t nextIter,
  jsid key,
  ValueOrDescriptorPtr data
}

The `flags` word holds meta-information, indicating the type of slot (data or accessor; writable/enumerable/configurable flags).  It can also be used to determine whether a hashtable entry is an empty or deleted sentinel.

The `nextIter` field holds the index of the next entry in iteration order, to support in-order iteration over keys.

The `key` is the property key.

The `data` field depends on the flags.  For data properties, it is simply the property's Value.  For more complex properties (e.g. accessors), it points to a `PropertyDescriptor` object on the heap.
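As a concrete illustration, here is a minimal, self-contained C++ sketch of this entry layout and an open-addressed lookup over it.  All names here (`DictEntry`, the flag bits, `lookup`) are hypothetical stand-ins, not actual SpiderMonkey identifiers, and `jsid` and the value-or-descriptor pointer are modeled as plain `uint64_t`s:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical flag bits; FLAG_EMPTY doubles as the never-used sentinel.
enum : uint32_t {
  FLAG_EMPTY    = 0,        // sentinel: entry never used
  FLAG_DELETED  = 1u << 0,  // sentinel: entry removed
  FLAG_DATA     = 1u << 1,  // plain data property, value stored inline
  FLAG_ACCESSOR = 1u << 2,  // `data` points at a heap PropertyDescriptor
};

// 4 + 4 + 8 + 8 = 24 bytes per entry on 64-bit, as estimated in the bug.
struct DictEntry {
  uint32_t flags;     // slot kind + writable/enumerable/configurable bits
  uint32_t nextIter;  // index of the next entry in iteration order
  uint64_t key;       // stand-in for jsid
  uint64_t data;      // stand-in for a Value or descriptor pointer
};

// Open-addressed lookup with linear probing: deleted entries are probed
// past, and an empty entry terminates the search.
int lookup(const std::vector<DictEntry>& table, uint64_t key) {
  size_t mask = table.size() - 1;  // capacity is a power of two
  for (size_t i = std::hash<uint64_t>{}(key) & mask;; i = (i + 1) & mask) {
    const DictEntry& e = table[i];
    if (e.flags == FLAG_EMPTY) return -1;  // not found
    if (!(e.flags & FLAG_DELETED) && e.key == key) return int(i);
  }
}
```

A property get on a dictionary object would then be one hash plus a short probe sequence, with no Shape involvement beyond the initial guard.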


The `DictionaryShape` singleton shape contains the following information:

1. A "slots_version" count (uint32_t) that is incremented whenever an entry is added or removed from the `slots_` hashtable, or when a property's attributes are modified (e.g. writable => non-writable, data => accessor, new getter defined, etc.).  Guarding on the shape + version would allow us to use ICs to optimize shape lookups on dictionary objects.

2. The index of the first hashtable entry by iteration order - to allow in-order object iteration (two of these indexes would exist: one for slots, one for elements).

3. The capacity and density of the two hashtables.  This is useful for heuristics on enlarging the hashtable, as well as for re-densification checks on arrays.

The hashtable in `elements_` would exclusively store indexed properties, and `slots_` would exclusively store string and symbol properties.

The `length` property on arrays would become a regular shaped property, stored in a fixed slot on the object.  Since `Array.length` is non-configurable, it will never go away and can always be guaranteed to occupy fixed slot 0 on an array object.
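A tiny sketch of how the shape + version guard from point 1 could work.  The names (`DictionaryShape`, `FakeObject`, `icGuardMatches`) are made up for illustration; the real implementation would live in the IC machinery:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical singleton shape for dictionary objects, holding the
// bookkeeping described in points 1-3 above.
struct DictionaryShape {
  uint32_t slotsVersion = 0;   // bumped on add/remove/attribute change
  uint32_t firstSlotIter = 0;  // first slots_ entry in iteration order
  uint32_t firstElemIter = 0;  // first elements_ entry in iteration order
  uint32_t slotsCapacity = 0, slotsCount = 0;  // growth heuristics
  uint32_t elemsCapacity = 0, elemsCount = 0;  // re-densification checks
};

// Stand-in for a NativeObject holding a shape pointer.
struct FakeObject {
  DictionaryShape* shape;
};

// An IC stub would bake in (shape, version) at attach time and take the
// fast path only while both still match.
bool icGuardMatches(const FakeObject& obj,
                    const DictionaryShape* cachedShape,
                    uint32_t cachedVersion) {
  return obj.shape == cachedShape &&
         obj.shape->slotsVersion == cachedVersion;
}
```

Any mutation that could invalidate a cached slot offset bumps `slotsVersion`, so attached stubs fail the guard and re-attach.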

CONSEQUENCES
------------

1. Property lookups on dictionary objects would become a simple, direct hashtable lookup.

2. Adding a new property (indexed or not) on dictionary objects would not require an allocation most of the time.

3. Guarding on the `shape + version` would allow for shape-guarded IC access to slots.

4. For integer-keyed-properties, empty hashtable entries would play the same role as HOLE entries do in current `elements_` arrays - implying that prototypes might need to be checked.

5. Lots of memory saved.  On 64-bit systems, each entry would take 24 bytes.  Currently, a unique shape for every property on a dictionary object takes up 40 bytes by itself, and the ShapeTable entry for it would take another 8 bytes.  We'd move from 6 words per entry to 3 words.

6. We would also reduce pressure on the GC system (due to fewer Shapes existing).

7. Much better memory locality, since all properties would be in a packed array.
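A quick sanity check of the arithmetic in point 5, using only the sizes quoted above for 64-bit builds:

```cpp
#include <cassert>
#include <cstddef>

// Sizes as quoted in the comment: a per-property dictionary Shape plus its
// ShapeTable entry today, versus one inline hashtable entry in the proposal.
constexpr std::size_t currentBytesPerProperty =
    40 /* Shape */ + 8 /* ShapeTable entry */;
constexpr std::size_t proposedBytesPerEntry =
    4 /* flags */ + 4 /* nextIter */ + 8 /* jsid */ + 8 /* data */;

static_assert(currentBytesPerProperty == 6 * 8, "six 64-bit words today");
static_assert(proposedBytesPerEntry == 3 * 8, "three 64-bit words proposed");
```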


FEEDBACK
--------

How does the above sound?  Are there any issues I'm missing here?  Any complications?  What's the implementation difficulty involved in doing this?  As far as I know, we tend to avoid dictionary-ified objects when optimizing in the JITs, so there shouldn't be too many assumptions about dictionary objects currently.
I want a clear accounting of whether this is a feasible idea, and potential implementation timelines if I were to take it on as a priority.  I'd want to target this for 2019 H1.
Flags: needinfo?(jorendorff)
Flags: needinfo?(jdemooij)
(In reply to Kannan Vijayan [:djvj] from comment #0)
> When an object goes into dictionary mode, it acquires a new _single_
> DictionaryShape, and replaces its `slots_` and `elements_` each with a
> direct hashtable which holds both keys and values (so there would be two
> hashtables here - one for regular slots, and one for elements):

This could be a relatively costly operation, however, correct? (That is, an object with a large number of properties and elements that goes dictionary requires moving all of its existing properties and elements into the hashtables.)

A pathology would be a large collection of similar objects all going dictionary mode near the same time in a latency sensitive operation. 

Having said that, because the number of properties and elements is already bounded, this would be a bounded operation.
(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #2)
> A pathology would be a large collection of similar objects all going
> dictionary mode near the same time in a latency sensitive operation. 

Interesting point. This is similar to problems we currently experience with UnboxedObjects that we want to get away from by removing them.
Big +1 on using bugzilla and concrete proposals.

As it stands, there are two kinds of own-property storage on a NativeObject:
- shapes-and-slots
- elements
To me, it looks like the proposed dictionary storage mode would take over some "shapes-and-slots" cases, and never any "elements" cases. So to me the proposal seems orthogonal to the idea of removing something like PACKED, which applies only to elements. Is that right?

The mode you are proposing to *add* is very clearly explained. But that is new complexity, the easy and bad part. :) How exactly existing modes get removed, the hard and good part, is left vague.
I don't think this proposal actually lets us remove (for example) ShapeTable, because I think we still need it for non-dictionary objects too.
Flags: needinfo?(jorendorff)
jorendorff said:
> The mode you are proposing to *add* is very clearly explained. But that is new complexity, the easy and bad part. :) How exactly existing modes get removed, the hard and good part, is left vague.

The removal of complexity here is that "dictionary mode" would go away.  ShapeTables would still exist, but as an optimization for lookups on a regular shape tree.  The whole `union { kids, listp }` business on Shape would go away.  All of the code dealing with dictionary mode would be replaced with a new direct-dictionary mode.

All that said, given some additional conversations, and pointers from :evilpies, I can narrow the scope of this proposal for the sake of making the implementation incremental.  We can still consider the broader ideas proposed in the bug, but here is a good first step to get us there:

1. We do not use the Shape tree for indexed properties - they are _always_ in the `elements_` array, and never described by a Shape.

2. The `elements_` object is one of two modes, `Linear` or `Hashtable`.

In Linear mode, we are guaranteed that all indexed properties are stored linearly on `elements_`, with the array indices being the implicit property indices (as they are now).

In Hashtable mode, we are guaranteed that all indexed properties are stored as explicitly keyed hashtable entries within `elements_`.

So conceptually, we would change `elements_` into `union { elementsLinear_, elementsHashtable_ }`.
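A hedged sketch of this two-mode idea.  The real representation would presumably overlap storage as a true union with a mode bit in the elements header; for clarity this uses a tagged struct with standard containers, and all names and types (`ElementsMode`, `double` as a stand-in for Value) are illustrative only:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

enum class ElementsMode { Linear, Hashtable };

// Tagged stand-in for the proposed union { elementsLinear_, elementsHashtable_ }.
struct ElementsHeader {
  ElementsMode mode;
  std::vector<double> linear;                    // index is implicit
  std::unordered_map<uint32_t, double> hashed;   // index is an explicit key
};

// Reads consult whichever representation is active; a miss in either mode
// means "hole" and implies the prototype chain must be checked.
bool getElement(const ElementsHeader& e, uint32_t index, double* out) {
  if (e.mode == ElementsMode::Linear) {
    if (index >= e.linear.size()) return false;  // hole / out of range
    *out = e.linear[index];
    return true;
  }
  auto it = e.hashed.find(index);
  if (it == e.hashed.end()) return false;        // hole
  *out = it->second;
  return true;
}
```

Transitioning Linear → Hashtable (on sparse writes) would move the dense values into keyed entries; the reverse densification check would use the capacity/count bookkeeping.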

We'd eliminate the Shape allocations associated with adding SETELEMs on sparse arrays, which are a direct contributor to the performance problems we've seen in use cases involving sparse arrays.  This should also decrease GC pressure due to Shape allocation.  Memory would be saved, as a hashtable entry can be stored in two uint64_t-sized words (on 64-bit systems) - 16 bytes instead of 48 bytes per entry.  We should also be able to simplify the handling of indexed properties significantly:

1. Linear Elements - Packed
2. Linear Elements - Holey
3. Hashed Elements

At some future point we can simplify the handling of dictionary objects as well.
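The two-word entry estimate above can be sanity-checked with a hypothetical packing; the field names and bit layout here are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// One Hashed Elements entry in two 64-bit words: index and attribute flags
// packed into the first word, the (boxed) value in the second.
struct HashedElementEntry {
  uint64_t indexAndFlags;  // property index in the low 32 bits, flags above
  uint64_t value;          // stand-in for a boxed js::Value
};
static_assert(sizeof(HashedElementEntry) == 16, "two words on 64-bit");

constexpr uint64_t packIndexAndFlags(uint32_t index, uint32_t flags) {
  return (uint64_t(flags) << 32) | index;
}
constexpr uint32_t indexOf(uint64_t word) { return uint32_t(word); }
constexpr uint32_t flagsOf(uint64_t word) { return uint32_t(word >> 32); }
```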
+1 for moving sparse indexes/elements out of the Shape tree and into a separate elements hash map, see bug 1339265 (it's also what other engines are doing AFAIK). One of the nice things there is that insertion/iteration order doesn't matter (indexes are sorted when enumerated) and I think "indexes are always in nobj->elements_ and never in nobj->slots_" will simplify parts of the code.

As for dictionary objects in general, I think our current system has some advantages: a property get/set/iteration on native objects can always just walk the Shape tree from nobj->lastProperty(), dictionary mode or not (property mutation/deletion is more complicated, but that's also less common). The exception to this are elements (dense and typed array).

IIUC, the new system would "fork" NativeObject property lookup/iteration into either linear or dictionary. I think that will add more complexity than it eliminates: now looking up o.foo will have to deal with both cases and this will also affect ICs etc because most gets/sets on dictionary objects *are* optimized just like on non-dictionary objects.
Flags: needinfo?(jdemooij)
Depends on: 1339265
I think your original problem is a documentation issue about the various object representations and when we change between them.  For example, documentation like what we have for JSString [1] would have addressed the original motivating cases.  So I guess the first thing to address would be to document what our object model and representations are.

One aspect which I found interesting is the co-location aspect of the data structure. I found it interesting not as an object representation, but as an **Object Model** out of which we could derive all other **Object Representations**.

It feels to me that the problem you describe is that we keep the shape out of the dictionary, but I want to generalize this observation to other aspects of our object model, such as flags and values.

 1. Today our shapes capture the information associated with each property, along with its flags. My guess is that these flags are identical across many of the properties, and we might save space by only storing flags which are uncommon.

 2. Another aspect is that you kept the reverse-linked-list aspect of the shape spaghetti stack, but if you are dealing with a singleton, then our shape representation might as well be a vector, which is ordered and unique.

To be clear, I am not criticizing the choice of making a dictionary object representation which aggregates the shape and the values. However, I want to emphasize that we could derive various object representations from the same object model, by specifying how we aggregate aspects of it and add new pointers/indexes to reference related data.

  Properties ordering storage:
    - dictionary.
    - spaghetti stack.
    - spaghetti stack + dictionary for fast lookup.

  Value storage:
    - Fixed slots. (indexed stored in the property ordering)
    - Dynamic slots. (indexed stored in the property ordering)
    - Elements vector.
    - Dictionary.

  Properties flags storage:
    - (always attached to the properties ordering)

My point is, that we should probably first document what we have today, and how these are transitioning from one object representation to another. Later we can see the gaps more clearly to start filling them with better object representations.

[1] https://searchfox.org/mozilla-central/rev/0ec9f5972ef3e4a479720630ad7285a03776cdfc/js/src/vm/StringType.h#52
I've created a simple sparse-array test case. It uses randomly generated indices, so it is not particularly deterministic, but it shows clear patterns. What it does is 'shift' elements in the array by a constant offset (which is something like what Google Docs does). This can be used to compare our performance to Chrome's.

One thing that's immediately noticeable is that for 'fairly dense' arrays our performance is particularly bad. Our performance is somewhat closer to Chrome's when shifted elements have their old element deleted rather than set to null. I'm thinking this might be because it causes elements in the array to go away and come back when another element is shifted into that spot; that'd be fairly easy to test if needed.
One thing I noticed in a profile of that testcase is that we spend a lot of time in NativeObject::maybeDensifySparseElements, but it seems to be outside the code we're benchmarking. Please file a new bug if you see maybeDensifySparseElements (and removeProperty under it) show up in real profiles, we can make that function significantly faster.

Also when I run your test in the JS shell and add a gc() call after each doWork call, we're *much* faster, so it's possible GC is adding significant overhead to the code you're benchmarking.
(In reply to Jan de Mooij [:jandem] from comment #10)
> One thing I noticed in a profile of that testcase is that we spend a lot of
> time in NativeObject::maybeDensifySparseElements, but it seems to be outside
> the code we're benchmarking. Please file a new bug if you see
> maybeDensifySparseElements (and removeProperty under it) show up in real
> profiles, we can make that function significantly faster.
> 
> Also when I run your test in the JS shell and add a gc() call after each
> doWork call, we're *much* faster, so it's possible GC is adding significant
> overhead to the code you're benchmarking.

We already suspected that our sparse array system is GC-heavy, so that's quite believable. Do note that I have a separate test for a 'single run', and there we still very much underperform Chrome on larger index ranges with 10-30% density.
Priority: -- → P2
Severity: normal → S3