Store properties named by uint32_t separately from properties named otherwise

ASSIGNED
Assigned to

Status

()

Core
JavaScript Engine
ASSIGNED
7 years ago
4 years ago

People

(Reporter: luke, Assigned: Waldo)

Tracking

(Depends on: 1 bug, Blocks: 7 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

7 years ago
If a dense array gets a single non-integer property (or becomes sufficiently sparse) the dense array becomes sparse which is way way slower.  One way to avoid that, that keeps being being brought up and has been used by other engines, notably Lua's, is to give an object both dense and sparse slot storage.  Without thinking too hard about the nitty gritty details, it seems like this could make general-case property access slightly slower, so we'd have to demonstrate by measurement that there are enough cases that win to justify this strategy.

One example where this really shows up is in building the result object/array of a RegExp exec (where bug 581595 shows we are 3x slower than JSC on v8-regexp).

This is probably too big to do before FF4; this bug is just to officially record the idea and catch discussion.
Bug 486356 comment 3 from Brendan said this:

> An idea that seems very much worth thinking about is sharing dense array
> goodness with objects having named properties (where access is optimized
> differently), so we are not stuck with either-or.
> 
> Lua does this, and Jason and I have talked about it in terms of extending
> stored 32-bit shapes to have an 8 bit "array tag". This tag would overlay the
> property-cache four-bit scope/proto-chain coordinates (see jsinterp.h
> PCVCAP_TAG*). 8 bits is enough to express "array of shape to the left, or else
> one of the machine-typed primitives, or of JSString or other jsval primitives".
> 
> How would this help? We should also get rid of a map per dense array, which is
> killing us right now (see bug 486321). Instead we would want shape to be in a
> dedicated object field, right after classword in the current world (we might
> get to a different world but let's do it with patches).
> 
> To unify fully, we would have an aslots optional field of JSObject, in addition
> to dslots, and probably void*. It would be scanned by the GC at the discretion
> of the class, as is done today even (see array_trace).
> 
> If you take this slightly longer road, I think you avoid higher-overhead steps
> and dead-end traps along the way. Having more than one *ArrayClass is one trap.
> The ongoing lack of array/object unification is another, bigger than this bug
> and of course not a good precedent -- but let's not have more, and let's try to
> fix it here and now. Or soon, anyway.
Blocks: 594655
Some of my words cited in comment 1 are way out of date (no more map allocation for dense arrays, the map has been a shared singleton for two releases, IIRC). Also I'm not sure we need the shape array-of-primitive trick.

But the idea of a universal aslots vector and zero-storage map for it (just index directly) has a lot of appeal, once you swallow the JSObject size increase.

Not sure this is out of scope for Fx4/Moz2 -- more study needed.

/be
Blocks: 602189
No longer blocks: 581595
I just found out that this is the main cause for the perf gap on string-unpack-code. That benchmark builds an array like this:

    // c and d are args in the original benchmark.
    var c = 1976;
    var d = {};
    while (c--)
        d[e(c)]=k[c]||e(c);

The resulting object's only properties are the integer indices from 0..1975, but it is a vanilla object all the way, and it has too many properties for the standard elem PIC to help.

We're seeing the need for this more and more: regexp-dna from comment 0 (Luke, do you have the data/details on that?), Processing Space Invaders (bug 602189), and now string-unpack-code. Is there any chance we can do this for Fx4?
Blocks: 578133
I made a little mistake in preparing the last comment: as you can see, the keys are not necessarily numeric, since they are generated by calling |e| (and in fact they are not numeric). So this is not a case for hybrid storage.

Anyway, that still seems like an important optimization, but back to post-Fx4.
Blocks: 611423

Updated

7 years ago
Blocks: 581180

Updated

7 years ago
Blocks: 609896
I am currently doing some tests with this approach. It's likelly that array access will get a tiny bit faster, as we lose on class check. What kind of relation dslots/aslots should have, and what kind of usage for aslots should be still acceptable, needs to be investigated. Eg x[10000] would we like to allocate so much, probably not for the first elment.
(Reporter)

Comment 6

7 years ago
I think another use case is the length property on dense arrays: currently, we can't do the same thing for dense arrays as we do on slow arrays (property with array length setter/getter) since dense arrays can't have normal properties.
I'll take this once 4.0 wraps up.
Assignee: general → jwalden+bmo

Comment 8

7 years ago
(In reply to comment #7)
> I'll take this once 4.0 wraps up.

I'm interested in helping, if you need anything.
(Reporter)

Updated

6 years ago
Blocks: 652377
Blocks: 654410
Jeff are you working on this? Maybe we could do this in project branch together?
I think this bug should wait until after we have a generational GC and can do rapid allocation of dense slot arrays for objects.  The problem right now is that an object's fixed slots are used to hold the dense slots of small arrays, and to do this right it should be possible for an object to have both a dense slot vector *and* inline slots which can be accessed in one dereference by JITs.

The only reason small dense arrays use inline slots for storage is that malloc is slow, and with a nursery we can just do a single bump allocation reserving space for an object plus its fixed slots plus initial space for its dense slots.
Maybe, though if someone's ready and willing to do the work now, it's not technically blocked on GC stuff. At least it seems like most of the bug's difficulty is in the VM.
Sure, it could be implemented now, but getting good performance characteristics for small arrays will I think require a very large increase in complexity vs. what would be necessary with a fast allocator.
I started work on this on Friday.  I hadn't thought of this as particularly well-suited to a project branch, and so far I've just been trying to carve out small pieces at a time to work on incrementally.  But maybe I just haven't thought it through hard enough.  I'll try to get back with more in the next few days.
(In reply to comment #12)
> Sure, it could be implemented now, but getting good performance
> characteristics for small arrays will I think require a very large increase
> in complexity vs. what would be necessary with a fast allocator.

I'd like this to start now, in the interest of getting it finished sooner. This is a big performance fault for us, and we can certainly fix that now. All the general VM adaptations should still apply with new GC. 

I'm fine with not fully optimizing and tuning arrays until we get the other pieces in place.
Depends on: 674775
Depends on: 676738

Updated

6 years ago
Blocks: 677079

Updated

6 years ago
Blocks: 679710
Blocks: 687935
No longer blocks: 609896
Blocks: 707686
Depends on: 713183
Depends on: 713965
Depends on: 714079
Depends on: 714218
Blocks: 726423
Depends on: 739380
I'm touching so much of the same code that this seems like the right time to begin implementing the proto-climbing refactoring:

http://wiki.ecmascript.org/doku.php?id=harmony:proto_climbing_refactoring

This should also have benefits for JSAPI users, where we currently hack in workarounds for some of that stuff in awkward ways.  It is scope creep, somewhat (but note "touching so much of the same code"), but it also means I have a clearer understanding of what each bit of functionality I implement corresponds to, because there are straightforward spec algorithms to point to.  These patches are landing piecemeal over time, adjacent to the current code, but not yet used (to enable easy switching back and forth until we're completely comfortable with the new representation).
Status: NEW → ASSIGNED
Summary: consider giving objects both dense and sparse slots → Store properties named by uint32_t separately from properties named otherwise
Depends on: 745944
Depends on: 751377
Blocks: 766847
Depends on: 786880
Blocks: 791699
Blocks: 817446

Updated

5 years ago
No longer blocks: 817446
Depends on: 799122

Updated

5 years ago
Duplicate of this bug: 820820
Blocks: 695438
Depends on: 823283
Depends on: 792108
Depends on: 827449
Depends on: 827490
No longer blocks: 791699
Depends on: 837773

Updated

4 years ago
Blocks: 889783
You need to log in before you can comment on or make changes to this bug.