TM: merge fslots and dslots?

Status: RESOLVED WONTFIX
Component: Core :: JavaScript Engine
Opened: 7 years ago
Last modified: 7 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Firefox Tracking Flags

(Not tracked)


(Assignee)

Description

7 years ago
JSObject currently looks like this:

struct JSObject {
    JSObjectMap *map;                       /* property map, see jsscope.h */
    jsuword     classword;                  /* JSClass ptr | bits, see above */
    jsval       fslots[JS_INITIAL_NSLOTS];  /* small number of fixed slots */
    jsval       *dslots;                    /* dynamically allocated slots */
};

Small objects fit entirely within fslots[].  Bigger objects are augmented by
allocating dslots.  Together, fslots[] and dslots are treated as if they are
a single array;  this requires lots of "fslot or dslot?" tests.  (Dense
arrays are different;  for them dslots holds the array elements.)

I find this horrid.  Also, it appears to be based on the assumptions that:

(a) allocations are expensive;
(b) memory is scarce;
(c) conditional tests are cheap.

Architectural trends have been making (b) and (c) less true over time;  in
particular I think the falseness of (c) is generally underestimated.

Furthermore, this design hurts the code's maintainability.  Concentrated
complexity is ok, but distributed complexity is death.  The "fslots or
dslots?" testing is awful, and it's *everywhere*.  As is the "which index
are we using, the slot number or the dslot number?" mucking about.  It's
hard to read and error-prone.

Although there are some abstracting functions (e.g. STOBJ_NSLOTS, which is a
real piece of work in its own right -- I had to ask three different people
to understand it), they aren't used widely enough to make it all work well.

The code could be made so much nicer if we had a single slots array.  Here's
one possible design:

struct JSObject {
    JSObjectMap *map;                       /* property map, see jsscope.h */
    jsuword     classword;                  /* JSClass ptr | bits, see above */
    jsval       fslots[JS_INITIAL_NSLOTS];  /* small number of fixed slots */
    jsval       *slots;                     /* initially points to fslots[] */
};

'slots' would initially point to fslots[].  If we ran out of slots,
we'd just malloc some memory and point 'slots' at it, copying across the
elements in fslots[].  (You can tell if an object is small or large by doing
a "slots == fslots" test.)

Dense arrays would need some tweaking, perhaps you'd get the start of the
array slots with something like 'denseArraySlots(slots)'.

Performance effects w.r.t. the assumptions above:

(a) The number of allocations is unchanged -- small objects involve one
    allocation, large objects involve two.

(b) Memory usage of small objects is unchanged, memory usage of large
    objects goes up -- five words are wasted per large object.
    (I started on a patch that moved 'proto' and 'parent' out of fslots[]
    into their own fields.  If that were done the number of wasted words
    would drop to three, but it introduces its own complications.)

(c) The number of tests is much lower.


Another possibility would be to always dynamically allocate fslots:

struct JSObject {
    JSObjectMap *map;                       /* property map, see jsscope.h */
    jsuword     classword;                  /* JSClass ptr | bits, see above */
    size_t      nslots;                     /* number of slots */
    jsval       *slots;                     /* slots */
};

Performance effects vs. the current design:

(a) The number of allocations increases;  all objects involve two
    allocations.

(b) Direct memory usage is unchanged or slightly lower -- objects
    could give themselves fewer than five slots if appropriate.  But there'd
    be more mallocs, which carry book-keeping overhead.

(c) The number of tests is much lower.

I'd be interested to hear ideas for other designs that result in a single slots array.

Comment 1

7 years ago
Some comparisons:

* WebKit. Uses a simplified version of the same basic fslots/dslots idea:

    // From runtime/JSObject.h in class JSObject
    union {
        PropertyStorage m_externalStorage;
        EncodedJSValue m_inlineStorage[inlineStorageCapacity];
    };

    PropertyStorage propertyStorage() { 
        return (isUsingInlineStorage() ? m_inlineStorage : m_externalStorage); 
    }

The differences are:

 - they save a word by not needing a separate member for dslots
 - they don't need to subtract when accessing dslots, at the cost of more
   memory usage and more copying when promoting to dslots
 - the fslots/dslots option is fully encapsulated using C++ features

By the way:

 - they store prototype in the Structure (analogous to JSScope), so they don't
   store it directly in the object at all.
 - they don't have a parent. 

The latter two points are particularly interesting and I think they are also worth considering. 

* V8. V8 does things a little differently. They use either an array or a dictionary to store properties, in what seems to be a fake union (i.e., they just cast to the other type):

  // From objects.h in class JSObject
  // [properties]: Backing storage for properties.
  // properties is a FixedArray in the fast case, and a Dictionary in the
  // slow case.
  DECL_ACCESSORS(properties, FixedArray)  // Get and set fast properties.
  inline bool HasFastProperties();
  inline StringDictionary* property_dictionary();  // Gets slow properties.

|properties| starts out as an array of size 0, and is replaced by larger arrays as needed. If more than kMaxFastProperties properties are created, the property array is replaced by a dictionary.

It's probably OK for them to allocate all those out-of-line arrays because of their copying GC.

Comment 2

7 years ago
My own take:

- The test for which array to read probably doesn't matter for perf in the medium term, because in JM we will have PICs, which can skip testing which array to read from; and TM effectively has a PIC, achieving the same effect.

- Always dynamically allocating the base slots would be really rough on object creation perf (because we'd have to use malloc for it right now), and our object creation perf is not very good as it is.

  - But if we could allocate them with a bump allocator, then it becomes a
    pretty nice option.

- Keeping memory usage low for objects is probably a good thing. I'm pretty interested in moving the prototype to the scope (since we make scope unique by proto now anyway) and eliminating the parent and private pointers from vanilla objects. They could be added as normal-ish properties for objects that actually need them. The JSC trick of sharing the space for the dslots is also neat. 

  - We'd need stats on the population of objects with different numbers of
    properties (and what fraction of objects need parent and/or private) to
    make these decisions correctly.
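
The bump-allocation idea above can be sketched as follows; this is a hypothetical arena, not an actual patch -- allocation becomes a pointer increment, so giving every object an out-of-line slots array stops costing a malloc per object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef intptr_t jsval;

// Minimal bump arena for slot arrays (illustrative; a real arena would
// chain chunks and cooperate with the GC).
struct SlotArena {
    static const size_t kSize = 4096;
    jsval  pool[kSize];
    size_t used;

    SlotArena() : used(0) {}

    // Allocation is just a bounds check and a pointer bump.
    jsval *allocSlots(size_t n) {
        if (used + n > kSize)
            return 0;                 // out of arena space
        jsval *p = pool + used;
        used += n;
        return p;
    }
};
```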

Comment 3

7 years ago
Dvander was working on a patch to bump-allocate dslots. I like the idea of always bump-allocating them, dropping inline slots. We could probably wire this up pretty quickly using David's patch for measurement purposes (GC is broken, but the rest we can measure).

Comment 4

7 years ago
(In reply to comment #2)
> - Always dynamically allocating the base slots would be really rough on object
> creation perf (because we'd have to use malloc for it right now), and our
> object creation perf is not very good as it is.

Right, this is why we split the old obj->slots into fslots (inline in the obj header) and dslots. See bug 331966 and (Nick :-/) please do consider checking old CVS history for reasons why "horrid" changes were made.

>   - But if we could allocate them with a bump allocator, then it becomes a
>     pretty nice option.

The extra indirection being unavoidable (unlike with fslots inline) still hurts.

> - Keeping memory usage low for objects is probably a good thing. I'm pretty
> interested in moving the prototype to the scope (since we make scope unique by
> proto now anyway)

We always had a unique scope for each mutated object, and prototypes tend to have own properties.

Rather, we have moved toward unique "empty scopes" per proto, shared by all unmutated kids of the proto (there can be millions; each has a private data slot used by shared proto-homed getters and setters to reflect properties from a peer space such as the DOM, but otherwise each instance has no "own" properties or data of any kind).

We are heading for a world where scopes become shareable in single-threaded hash-cons'ed sense, or really: where the main reason for a scope (hash table for average O(1) access) can be folded into the thread-local property tree.

JSC's Structures can grow hashtables too as needed, so this would match JSC in that regard.

This leaves other reasons for scopes unaddressed, ignoring the title used for locking (once the big threads/GC plan of record moves that to a non-native wrapper layer):

* shape override (see below).

* object -- this is almost unnecessary now, jorendorff has looked into getting rid of it and I defer to him.

* freeslot index into object slots -- this would need to be redone outside of the scope, presumably in the object itself.

* emptyScope -- should be possible to eliminate this with more work collapsing scopes and sprops.

* lastProp -- this would move into JSObject in lieu of the map pointer.

> and eliminating the parent and private pointers from vanilla
> objects. They could be added as normal-ish properties for objects that actually
> need them.

This is a good idea too, although we will need another way to find the principals for an arbitrary object (GC page-or-wider associated metadata). It is necessary to find trust label for any object, and this need won't go away -- we will probably make greater use of it over time. It needs to be reasonably efficient.

> The JSC trick of sharing the space for the dslots is also neat. 

Yes, that's good.

It's also useful for shape override and other purposes where we do not want a full property just to allocate a slot. The challenge is allocating a slot at a fixed offset for all objects that have that slot. Or perhaps we just burn a full (named in some unnameable-in-the-language) property.

>   - We'd need stats on the population of objects with different numbers of
>     properties (and what fraction of objects need parent and/or private) to
>     make these decisions correctly.

Gregor studied object size populations. See bug 502736 and bug 547327.

Bugzilla is full of data, not all stale -- and code to gather and analyze it (ditto on freshness).

/be

Comment 5

7 years ago
Another scope field to eliminate, move to objects, or move to property tree nodes if that makes sense:

* flags:

        DICTIONARY_MODE         = 0x0001,
        SEALED                  = 0x0002,
        BRANDED                 = 0x0004,
        INDEXED_PROPERTIES      = 0x0008,
        OWN_SHAPE               = 0x0010,
        METHOD_BARRIER          = 0x0020,

        /*
         * This flag toggles with each shape-regenerating GC cycle.
         * See JSRuntime::gcRegenShapesScopeFlag.
         */
        SHAPE_REGEN             = 0x0040,

        /* The anti-branded flag, to avoid overspecializing. */
        GENERIC                 = 0x0080

As in JSC's structures, DICTIONARY_MODE would be a bit or two in the property tree node.

SEALED needs to be renamed FROZEN and moved into the object, I think. Waldo?

BRANDED would be a property tree thing, analogous to JSC's specific function tracking.

INDEXED_PROPERTIES -- see TraceRecorder::denseArrayElement and related. This is probably an object flag.

OWN_SHAPE would be the presence of the shape override property or slot.

METHOD_BARRIER is like BRANDED. It would go in the property tree.

SHAPE_REGEN goes away once scopes and sprops collapse.

GENERIC goes with BRANDED.

/be

Comment 6

7 years ago
I've changed SEALED to NOT_EXTENSIBLE in the patch in bug 492849.  FROZEN is just a bad idea, since "frozen" already has its own meaning that's different from [[Extensible]].  The negative name is unfortunate but necessary to avoid having to initialize scopes beyond callocing them.  It doesn't matter much in practice, because it's hidden behind a positive-named accessor.
(Assignee)

Comment 7

7 years ago
(In reply to comment #1)
>
>     // From runtime/JSObject.h in class JSObject
>     union {
>         PropertyStorage m_externalStorage;
>         EncodedJSValue m_inlineStorage[inlineStorageCapacity];
>     };
> 
>     PropertyStorage propertyStorage() { 
>         return (isUsingInlineStorage() ? m_inlineStorage : m_externalStorage); 
>     }
> 
> The differences are:
> 
>  - they save a word by not needing a separate member for dslots

How do they determine if they're using inline storage or not?  There must be a tag somewhere, no?

>  - they don't need to subtract when accessing dslots, at the cost of more
>    memory usage and more copying when promoting to dslots

I'd be surprised if adds/subtracts like that matter on modern machines.

>  - the fslots/dslots option is fully encapsulated using C++ features

Yes!  This is important, IMO.  More below.


(In reply to comment #4)
> 
> Right, this is why we split the old obj->slots into fslots (inline in the obj
> header) and dslots. See bug 331966 and (Nick :-/) please do consider checking
> old CVS history for reasons why "horrid" changes were made.

Thanks for the bug reference.  Split fslots/dslots may help performance, but it is a hack.  The fact that the hack isn't encapsulated is what I find most horrid.

Bug 331966 has some interesting points -- the idea that most objects have 5 or fewer slots, and that increasing object size can be a big loss because GC occurs more frequently.  So that indicates that the stuff about parent and scopes could be worthwhile.

I want to experiment with this, but because of the lack of encapsulation it requires touching a lot of code.  I might first try to encapsulate it, which would make the experimentation easier (as well as making the code less error-prone).

Comment 8

7 years ago
On a somewhat related note, the complex structure of our object header is one of the reasons the GC marking code is so slow. Marking an object is a fairly involved decision tree that fans out over class (per-class mark hooks), so even if we inline those, it's quite a bit of code. The special-casing of potential slots in fslots (reserved slots are not scanned) prevents us from doing a simple loop over all slots (in which case we could inline the interior marking code once). Instead we do:

traceParentAndProto
traceOtherFslots
traceDslots

The end result is so much code that gcc stops inlining and the marking loop incurs a lot of epilog/prolog expense during marking. If we could reduce that to just "traceSlots", marking would get a lot faster.

Comment 9

7 years ago
(In reply to comment #6)
> I've changed SEALED to NOT_EXTENSIBLE in the patch in bug 492849.  FROZEN is
> just a bad idea, since "frozen" already has its own meaning that's different
> from [[Extensible]].

Yeah, I meant NON_EXTENSIBLE -- FROZEN is catchy, or maybe I just have brain freeze.

> The negative name is unfortunate but necessary to avoid
> having to initialize scopes beyond callocing them.  It doesn't matter much in
> practice, because it's hidden behind a positive-named accessor.

Avoid the NON_ via inextensible, if everyone can stand it :-).

(In reply to comment #3)
> Dvander was working on a patch to bump-allocate dslots. I like the idea to
> always bump-allocate them, dropping inline slots.

I don't see how this does not strictly lose due to the indirection required to get common fslots, and get/set the first few fslots for vanilla Objects. The original split was a perf win for an obvious reason: avoiding the dependent slots load from obj to get the first 5 (6 initially -- but one was class ptr so it's really the same as today with different and less useful tagging in the old days) slots.

/be

Comment 10

7 years ago
(In reply to comment #7)
> (In reply to comment #4)
> > 
> > Right, this is why we split the old obj->slots into fslots (inline in the obj
> > header) and dslots. See bug 331966 and (Nick :-/) please do consider checking
> > old CVS history for reasons why "horrid" changes were made.
> 
> Thanks for the bug reference.  Split fslots/dslots may help performance, but it
> is a hack.

No, it was a perf win and a fix -- or, yes: we are hackers here. No hifalutin airs please. Academic computer science has failed to research dynamic objects (objects that grow as JS objects do). So we had a fix many years ago, and now we're looking into a better fix. Also, Gregor is looking at this in an academic research context, which is great.

> The fact that the hack isn't encapsulated is what I find most horrid.

Hey, we were C code for 13 years! Ok, that's water under the bridge and maybe a bad sunk cost.

Also, we had C encapsulation: the many *OBJ_[GS]ET_SLOT* macros. No one in the old days open-coded logic to decide between fslots and dslots, ya know.

> I want to experiment with this, but because of the lack of encapsulation it
> requires touching a lot of code.  I might first try to encapsulate it, which
> would make the experimentation easier (as well as making the code less
> error-prone).

Sounds good if the encapsulation is all inline runtime-cost-free goodness.

/be

Comment 11

7 years ago
(In reply to comment #8)
> On a somewhat related note, the complex structure of our object header is one
> of the reasons the GC marking code is so slow. Marking an object is a fairly
> involved decision tree that fans out over class (per-class mark hooks), so even
> if we inline those, its quite a bit of code. The special casing of potential
> slots in fslots (reserved slots are not scanned) prevents us from doing a
> simple loop over all slots (in which case we could inline once most of the
> interior marking code). Instead we do:
> 
> traceParentAndProto
> traceOtherFslots
> traceDslots
> 
> The end result is so much code that gcc stops inlining and the marking loop
> incurs a lot of epilog/prolog expense during marking. If we could reduce that
> to just "traceSlots", marking would get a lot faster.

Even JSC has to decide where the contiguous slots are.

Let's not overoptimize marking. You see hot mark costs for big live heaps because we have only one stinking heap for all unrelated groups of tabs/windows/frames. Fix that and you'd probably avoid prematurely optimizing marking, or making a bad trade against fslots access.

/be

Comment 12

7 years ago
> I don't see how this does not strictly lose due to the indirection required to
> get common fslots, and get/set the first few fslots for vanilla Objects. The
> original split was a perf win for an obvious reason: avoiding the dependent
> slots load from obj to get the first 5 (6 initially -- but one was class ptr so
> it's really the same as today with different and less useful tagging in the old
> days) slots.

You have to look at the whole story, not at the apparent effects. The only slots where this is an unconditional win are proto and parent, since there we know it's an fslot and we don't incur a conditional branch + comparison. A very rough instrumentation of a SS run shows 80 times more non-proto-parent slot accesses than proto-or-parent accesses (for objects; I excluded arrays). So for everything but the very rare case you introduce a comparison+branch, slowing down the frequent case.

The indirect load is a red herring here. You already depend on an indirect load in the equation: the slot! (*) Having to fetch the base in parallel (and independent) is likely not as big of a hit as the branch/comparison over the load (slot).

I can definitely see obj->slots + non-constant-slotoffset being faster than what we do now. At the very least it's not so clearly inferior that it doesn't warrant investigation.

(*) The only exception are constant slots such as parent and proto, but as we just discussed those are rare.

Comment 13

7 years ago
(In reply to comment #12)
> You have to look at the whole story, not at the apparent effects. The only
> slots where this is an unconditional win are proto and parent since there we
> know its an fslot and we don't incur a conditional branch + comparison. A very
> rough instrumentation of a SS run shows 80 times more non-proto-parent slot
> accesses than proto-or-parent accesses (for objects, I excluded arrays).

This is not the comparison to make. Compare the cost of always loading obj->slots to the cost of the conditional branch (which is short and well-predicted, thanks to PGO).

> So for
> everything but the very rare case you introduce a comparison+branch, slowing
> down the frequent case.

Minor cycle penalty on mispredict if target is in pipe, IIRC.

> The indirect load is a red herring here. You already depend on an indirect load
> in the equation: the slot! (*) Having to fetch the base in parallel (and
> independent) is likely not as big of a hit as the branch/comparison over the
> load (slot).

Prove it. Loads use up resources on chip and dmandelin's past measurements show us saturating some of these. You do not assert "red herring" just because there is already a slot to load from somewhere. Another load costs, period. The only issue is how much, and the answer requires measurement, not hand-waves.

Furthermore, the slot is loaded from the property cache by the interpreter, but on trace it should be a constant given a slot hit (as opposed to sprop or method hit), thanks to the shape guard and basic layout guarantee. That we do not make it a constant is a bug in jstracer.cpp.

/be

Comment 14

7 years ago
Oops, I'm wrong, jstracer.cpp does make the slot a constant on trace:

LIns*
TraceRecorder::stobj_get_slot(LIns* obj_ins, unsigned slot, LIns*& dslots_ins)
{
    if (slot < JS_INITIAL_NSLOTS)
        return stobj_get_fslot(obj_ins, slot);
    return stobj_get_dslot(obj_ins, slot - JS_INITIAL_NSLOTS, dslots_ins);
}

and (above a bit):

LIns*
TraceRecorder::stobj_get_fslot(LIns* obj_ins, unsigned slot)
{
    JS_ASSERT(slot < JS_INITIAL_NSLOTS);
    return lir->insLoad(LIR_ldp, obj_ins, offsetof(JSObject, fslots) + slot * sizeof(jsval),
                        ACC_OTHER);
}

So there is no load of a slot for the first few (three at most, could be more with parent and proto evacuation as discussed here by dmandelin) slots on slot hits. This covers many small object cases, based on Gregor's numbers. It could cover more with proto/parent evac. It nullifies the hand-wave in comment 12 for such small object cases.

/be

Comment 15

7 years ago
The interesting thing about this fslots/dslots issue is that it depends not only on how the "real" workloads create and grow objects, but also on details such as tracing's ability to make variables loaded from somewhere become constants on trace.

I do not think anyone should assert glibly (me included -- load of slots could be a loss we can tolerate now, but I think not based on comment 14). Let's measure, not assert!

/be
(Assignee)

Updated

7 years ago
Depends on: 555429
(Assignee)

Comment 16

7 years ago
(In reply to comment #10)
> > Split fslots/dslots may help performance, but it is a hack.
> 
> No, it was a perf win and a fix -- or, yes: we are hackers here. No hifalutin
> airs please.

Looks like we disagree on what does and does not constitute a hack.  That's ok.  Hopefully we can agree on these steps to move forward (and this summarises various bits from above):

- Encapsulating slot access will improve code maintainability.  Filed as bug 555429.  I'm on it.

- Experimenting with the slots implementation is worth some effort.  Encapsulation will make this easier and thus is a prerequisite.

- The stuff about scopes et al is worth addressing, but beyond the scope of this bug, which is about slots.  Can someone (Brendan, dmandelin?) file one or more follow-up bugs?  I'd do so but I don't understand the details.

Comment 17

7 years ago
When a dense array becomes slow, we do not shuffle back the values to use the remaining fslots.  At one point in the development of dense arrays I wrote the extra code to do that, and then I wasn't able to see any performance gain from it for any of my tests (which at the time didn't include the v8 benchmark suite, so maybe for arrays it would show up there), so I took it back out.  Arrays do use fslots to track length and count, and at least the former was a very hot path in pre-JIT Sunspider.

Does the Structure for a given object in JSC determine whether they use their fslots or dslots?  For the fast-path access case, that would then boil away the conditional check, I imagine.

Comment 18

7 years ago
(In reply to comment #16)
> (In reply to comment #10)
> > > Split fslots/dslots may help performance, but it is a hack.
> > 
> > No, it was a perf win and a fix -- or, yes: we are hackers here. No hifalutin
> > airs please.
> 
> Looks like we disagree on what does and does not constitute a hack.

Maybe we agree on "hack" but not on whether it is pejorative, but the more significant question is: what would you have done differently back in the era of bug 331966, given the C not C++ requirement?

This is mostly academic, but if you see any C-ish patterns still left, please feel free to get rid of them.

/be

Comment 19

7 years ago
(In reply to comment #17)
> Does the Structure for a given object in JSC determine whether they use their
> fslots or dslots?  For the fast-path access case, that would then boil away the
> conditional check, I imagine.

runtime/JSObject.h:        ConstPropertyStorage propertyStorage() const { return (isUsingInlineStorage() ? m_inlineStorage : m_externalStorage); }
runtime/JSObject.h:        PropertyStorage propertyStorage() { return (isUsingInlineStorage() ? m_inlineStorage : m_externalStorage); }

There's no free lunch -- how would structure or shape/proptree avoid having to decide between "here" and "over there"?

You could always load a slots pointer, but that is a strict lose (however small for a naive interpreter) and it prevents constant slot numbers from being folded into address-of-fslots computations on trace.

/be

Comment 20

7 years ago
(In reply to comment #17)
> When a dense array becomes slow, we do not shuffle back the values to use the
> remaining fslots.  At one point in the development of dense arrays I wrote the
> extra code to do that,

I remember that! ;-)

The other issue is the confusion where dslots[-1] changes to have the JS_INITIAL_NSLOTS bias. That is confusing but it has its rationale too. Can it be avoided without costs popping up at runtime?

/be
(Assignee)

Comment 21

7 years ago
(In reply to comment #17)
> When a dense array becomes slow, we do not shuffle back the values to use the
> remaining fslots.  At one point in the development of dense arrays I wrote the
> extra code to do that, and then I wasn't able to see any performance gain from
> it for any of my tests (which at the time didn't include the v8 benchmark
> suite, so maybe for arrays it would show up there), so I took it back out. 
> Arrays do use fslots to track length and count, and at least the former was a
> very hot path in pre-JIT Sunspider.

Only one fslot isn't used by dense arrays -- seems like it would be a small win if you've already allocated a dslots array.


> Maybe we agree on "hack" but not on whether it is pejorative, but the more
> significant question is: what would you have done differently back in the era
> of bug 331966, given the C not C++ requirement?

In C you can't avoid people accessing struct fields directly without putting the struct in a separate module and paying the cost of function calls to access.
But STOBJ_GET_SLOT et al are a good start -- I would have done more of that.  


> The other issue is the confusion where dslots[-1] changes to have the
> JS_INITIAL_NSLOTS bias. That is confusing but it has its rationale too. Can it
> be avoided without costs popping up at runtime?

Yes, it is confusing.  What cost are you worried about -- doing an add each time the number of slots is obtained?  My rule of thumb is that simple arithmetic like that is practically free on modern hardware...

Comment 22

7 years ago
(In reply to comment #21)
> (In reply to comment #17)
> > Maybe we agree on "hack" but not on whether it is pejorative, but the more
> > significant question is: what would you have done differently back in the era
> > of bug 331966, given the C not C++ requirement?
> 
> In C you can't avoid people accessing struct fields directly without putting
> the struct in a separate module and paying the cost of function calls to
> access.
> But STOBJ_GET_SLOT et al are a good start -- I would have done more of that.

We used those macros all over. Are you thinking of jstracer.cpp and jsarray.cpp in the more recent past? Those broke the macro-ized abstraction, sometimes when in a hurry it's the right thing so long as someone re-abstracts after. Which is now, I guess -- glad you are taking an interest.

> > The other issue is the confusion where dslots[-1] changes to have the
> > JS_INITIAL_NSLOTS bias. That is confusing but it has its rationale too. Can it
> > be avoided without costs popping up at runtime?
> 
> Yes, it is confusing.  What cost are you worried about -- doing an add each
> time the number of slots is obtained?  My rule of thumb is that simple
> arithmetic like that is practically free on modern hardware...

I'm not worried about anything, I'm asking the question whether it can be done without net-added runtime costs. Even ALU ops can hurt if you issue too many for the available superscalar units. Just do it and show the instruction counts and we will have an answer :-P.

/be

Comment 23

7 years ago
I meant instruction and cycle counts, of course -- agree an add could slip right into the superscalar schedule without costing extra cycles.

/be

Comment 24

7 years ago
I suggested in another bug storing the size of dslots (&dslots[-1], natch) in the object directly, because doing so would avoid bad space usage for power-of-two-sized arrays: a filled 256-element array has a dslots which can hold 511 elements, because the capacity storage is subtracted from the power-of-two space allocated.  A 255-element array completely uses a 256-sized &dslots[-1].  Are (2**n - 1) or 2**n element arrays more common?  I suspect, at not-actually-small sizes, that the latter are much more prevalent.
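
The arithmetic above can be made concrete, assuming power-of-two allocation sizes with one capacity word carved out of the allocation (helper names are hypothetical):

```cpp
#include <cassert>

// Smallest power of two >= n.
unsigned roundUpPow2(unsigned n) {
    unsigned p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Words allocated for `elements` array slots: the capacity word at
// dslots[-1] takes one word, so a 2**n-element array must round up to
// the next power of two, while a (2**n - 1)-element array fits exactly.
unsigned allocWordsFor(unsigned elements) {
    return roundUpPow2(elements + 1);
}
```

So a 255-element array needs a 256-word allocation, but a 256-element array needs 512 words, wasting 255 of them.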

Comment 25

7 years ago
(In reply to comment #16)
> - The stuff about scopes et al is worth addressing, but beyond the scope of
> this bug, which is about slots.  Can someone (Brendan, dmandelin?) file one or
> more follow-up bugs?  I'd do so but I don't understand the details.

Scopes go away in bug 558451.

/be
(Assignee)

Comment 26

7 years ago
In bug 564522 comment 9 Brendan said:

"We can make objects bigger by default based on more general workloads. There
are some bugs touching on this: bug 555128 (which I claim is mis-summarized, we
do not want to merge fslots and dslots and lose
small-constant-offsets-on-trace)"

The summary is accurate for the bug's original intention, which was misguided but spurred some useful discussion and spin-off bugs.

I think this bug has now exhausted its usefulness and I'm ready to mark it WONTFIX.  Any disagreements?
(Assignee)

Updated

7 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → WONTFIX