Closed Bug 1479900 Opened 6 years ago Closed 6 years ago

Simplify GC relocations

Categories

(Core :: JavaScript: GC, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: tcampbell, Assigned: tcampbell)

References

Details

Attachments

(2 files, 3 obsolete files)

Currently gc::Cell has no fields, but in various places in GC we apply reinterpret_casts and use prose instead of type-systems to argue that the bits we care about are certainly in the location we want and that nothing could go wrong.

1) The StoreBuffer distinguishes between JSObject and JSString by doing reinterpret_casts and relying on NON_ATOM bit and group_ pointer to play nicely.

2) The RelocationOverlay requires it's magic_ field to play nice with any derived class of Cell that is moveable.

Proposal:
  - Add |uintptr_t dataWithTag| field to gc::Cell. Also a comment describing that 'dataWithTag' uses the low 2-bits as a tag, and the rest are for derived classes to use. Current tag values include NORMAL / STRING / RELOCATION.
  - JSString flags should lose two bits of their flags to the tag instead of relying on NON_ATOM trick.
  - On 32-bit builds JSString need to define their length field, but not on 64-bit.
  - RelocationOverlay should extend gc::Cell and use tag of relocation when defining magic.
  - All other types of Cells should assert their first field is an aligned pointer.
  - nurseryCellIsString gets re-written to check the tag instead of current (questionable) cast
  - If we do Bug 1376646, we can remove the special case for STRING. In this case, it still seems preferable to have this explicit field instead of casting things to RelocationOverlay.
  - js::Symbol should reorder it's fields so the 'unused' flag is instead dataWithTag.

Compared to existing code, we would by taking two flag bits from JSString. We currently use 10/16 flag bits so this seems reasonable.
 

Yes, this is not pretty but it is simply describing what we are already relying on. More explicit types gives people a chance of understanding the relations without having to read the entire codebase first.
(In reply to Ted Campbell [:tcampbell] from comment #0)
This is a great idea.  I really like having some space reserved for flags in each cell.

Not for this bug, but if we were to make dataWithTag atomic this could open up other possibilities:
 - using the rest of it to hold the relocation target too could enable parallel nursery eviction
 - storing the mark bits there could enable parallel marking (as a different approach to bug 1424934)
Component: JavaScript Engine → JavaScript: GC
Assignee: nobody → tcampbell
Sounds good to me! Thanks for cleaning this mess up.
Part 1 adds accessors to JSString flags/length so storage can be changed later.
This is the heavy lifting to rewrite the layout of RelocationOverlay and update various Cell-derived types to be compatible.

> The gc::RelocationOverlay relies on gc::Cell to behave in specific ways
> to work correctly. This refactors gc::Cell derived types to start with a
> uintptr_t-sized field with the low bits reserved for the GC.
>
> - JSString now stores flags in a uintptr_t. On 32-bit platforms, a
>   second field is used to hold length.
> - Redefine JSString flag bit positions to avoid cell reserved bits.
> - Forwarded Cells are now indicated by a reserved flag instead of a
>   magic invalid-pointer-like value.
> - Update js::Scope and js::BigInt fields to be compatible.
The next step would be to actually define the uintptr_t in gc::Cell and also to make RelocationOverlay extend gc::Cell (and probably should properly call constructor).
Blocks: 1480772
Comment on attachment 8997261 [details] [diff] [review]
0001-Bug-1479900-Part-1-Use-accessor-methods-for-JSString.patch

This adds accessors for JSString::flags/length so that we can change the storage of them in a later patch.

Adds |setLengthAndFlags| so that they two are updated in one operation which simplifies things later if they are encoded in the same word of memory.

Adds |setTempData| to replace JSString::flattenData enum arm. This enforces you can't read the data back without restoring the flags/length at the same time. I needed to add some helpers for accessing inline data without checking flags and asserting.
Attachment #8997261 - Flags: review?(sphink)
Comment on attachment 8997261 [details] [diff] [review]
0001-Bug-1479900-Part-1-Use-accessor-methods-for-JSString.patch

Review of attachment 8997261 [details] [diff] [review]:
-----------------------------------------------------------------

easier to read this way, too

::: js/src/vm/StringType.h
@@ +385,5 @@
> +
> +    // Temporaraly store a word by clobbering flags. This is used by the
> +    // flattening algorithm and is not GC-safe.
> +    MOZ_ALWAYS_INLINE
> +    void setTempData(uintptr_t data) {

*Temporarily

I wonder if this would be better as setTempTaggedPointer(void* pointer, uint8_t tag)? Just to provide more freedom with the tagging scheme.

Though honestly setFlattenContinuation(uint8_t action, JSString* node) wouldn't be unreasonable, or if we were ok with exposing implementation details, use an enum for the action.

...or keep it like you had it, if you'd prefer not to expose the implementation.
Attachment #8997261 - Flags: review?(sphink) → review+
Carrying r=sfink from Comment 7.

Renamed JSString::setTempData to setFlattenData. Left the arguments an opaque uintptr_t.
Attachment #8997261 - Attachment is obsolete: true
Attachment #8998286 - Flags: review+
Final version. This reworks the Cell::isForwarded / RelocationOverlay mechanism.

Now all types should be compatible with forwarded (even if we disable compacting for other reasons on them). It turned out that a lot of debug asserts were making this assumption anyways and we were open to random assertion failures in existing code.

I intend to land Part 1/2 pending reviews. This is cleaning up some dirty casting were doing, and in future work we can actually move the field definition to gc::Cell. There are some ergonomics I'd like to experiment with before that.

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=910321edc7588710bc484e5b779f00c791352d07&group_state=expanded
Attachment #8997262 - Attachment is obsolete: true
Attachment #8998289 - Flags: review?(sphink)
Comment on attachment 8998289 [details] [diff] [review]
0002-Bug-1479900-Part-2-Refactor-GC-relocation-to-use-a-r.patch

Review of attachment 8998289 [details] [diff] [review]:
-----------------------------------------------------------------

Gorgeous patch! r+++

::: js/src/gc/Cell.h
@@ +71,5 @@
> +    // Indicates if the cell is currently a RelocationOverlay
> +    static constexpr uintptr_t FORWARD_BIT = JS_BIT(0);
> +
> +    // When a Cell is in nursery, this will indicate if it is a JSString (1) or
> +    // JSObject (0). This may be removed by Bug 1376646.

in *the nursery

Hm. From the name of the bit plus the comment here, it feels sort of natural to assume:

1. JSSTRING_BIT=1 implies the Cell is a string (which is true)
2. JSSTRING_BIT=0 implies the Cell is not a string (which is false)
3. JSSTRING_BIT=0 implies the Cell is a JSObject (even more false)

Would it make more sense to call this NURSERY_STRING_BIT?

::: js/src/gc/GC.cpp
@@ +385,5 @@
>      /* 16 */ AllocKind::OBJECT16
>  };
>  
> +// Check that reserved bits of a Cell are compatible with our typical
> +// allocators since most derived classes will store a pointer in first word.

in *the first word.

::: js/src/gc/Marking-inl.h
@@ +43,2 @@
>      if (!MightBeForwarded<T>::value) {
> +        MOZ_ASSERT(!t->isForwarded());

Ouch, you're right, this was bogus before.

@@ +93,5 @@
>  {
>      MOZ_ASSERT(!isForwarded());
> +
> +    // Preserve old flags because nursery may check them before checking
> +    // if this is a forwarded Cell.

you left out "...which is a horrible thing to do, but we do it, so we'd better support it."

Can we not do that? And if we don't do it, then wouldn't it be better to set this to 0xffffffff | Cell::FORWARD_BIT? (yes, I know the `| FORWARD_BIT` is redundant) Or maybe plain FORWARD_BIT; whatever is most likely to crash hard and fail lots of tests.

::: js/src/gc/RelocationOverlay.h
@@ +32,5 @@
>  class RelocationOverlay
>  {
> +    // First word of a Cell has additional requirements from GC. The GC flags
> +    // determine if a Cell is a normal entry or is a RelocationOverlay. The
> +    // NewLocation is location |this| was moved to.

is *the location

Or lose that final sentence entirely; NewLocation is a nicely evocative name already.

@@ +37,5 @@
> +    //                3         0
> +    //  -------------------------
> +    //  | NewLocation | GCFlags |
> +    //  -------------------------
> +    uintptr_t dataWithTag_;

I really like that this shrinks down RelocationOverlay to two pointers.

@@ +52,5 @@
>          return reinterpret_cast<RelocationOverlay*>(cell);
>      }
>  
>      bool isForwarded() const {
> +        return reinterpret_cast<const Cell*>(this)->isForwarded();

I know you have bigger plans already, but would it make more or less sense to have RelocationOverlay inherit from Cell for now, to eliminate this cast?

::: js/src/vm/Scope.h
@@ -311,5 @@
> -    // stale.
> -    union {
> -        ScopeKind kind_;
> -        uintptr_t paddedKind_;
> -    };

Good riddance.

::: js/src/vm/StringType.h
@@ +166,5 @@
>      struct Data
>      {
> +        // First word of a Cell has additional requirements from GC and normally
> +        // would store a pointer. If a single word isn't large enough, the length
> +        // is stored seperately.

*separately

Ooh, so *this* is where some ugly squirts out. But it's really not bad at all.

My first reaction was much worse, because I thought you were going to do something where you store the length in the flags word in a 12-bit value or something, then use a sentinel if the length is >= 4095... but you're not, this is only an architecture word size thing. "Pack the length in a shared value if the word size permits, otherwise store the length separately"? "Patch the length with the flags..."?

@@ +172,5 @@
> +        //  --------------------------
> +        //  | Length | Index | Flags |
> +        //  --------------------------
> +        //
> +        // NOTE: This is also used during flattening a Rope for temporary storage.

How does flattening a Rope give you temporary storage? :-) Maybe "This is also used for temporary storage while flattening a Rope." Though hm... I think we're trying to move away from flat strings, so maybe s/flattening/linearizing/ even though the method name still says flattening. "Linearizing a Rope" just makes more sense to me.

@@ +176,5 @@
> +        // NOTE: This is also used during flattening a Rope for temporary storage.
> +        uintptr_t flags_;                               /* JSString */
> +
> +#if JS_BITS_PER_WORD == 32
> +        // Additional storage for length if |flags_| is too small.

"...if the |flags_| word is too small to fit both."?

@@ +210,4 @@
>       *
> +     * The first word of a JSString stores flags, index, and (on some
> +     * platforms) the length. The flags store both the string's type and its
> +     * character encoding.

This is clear.  I would have been fine had I read it first.

@@ +606,5 @@
> +        return offsetof(JSString, d.flags_) + sizeof(uint32_t);
> +    }
> +    static constexpr size_t offsetOfLength() {
> +        return offsetof(JSString, d.flags_);
> +    }

I can't decide if what you have is better, or if it would be better to eliminate the endian test and have

union FlagsWord {
    uintptr_t fullWord;
    struct {
        uint32_t indexAndFlags;
        uint32_t length;
    }
}

and then have offsetOfFlags() return offsetof(FlagsWord, indexAndFlags) and offsetOfLength return offsetof(FlagsWord, length).
Attachment #8998289 - Flags: review?(sphink) → review+
(In reply to Steve Fink [:sfink] [:s:] from comment #10)
> Would it make more sense to call this NURSERY_STRING_BIT?
The bit is still the old NON_ATOM bit and most names we consider will be misleading. I looked at separating the bit into two, but that requires patching a bunch of manual stuff in CodeGenerator. Probably worth doing, but also maybe we can have two store-buffers instead (or even tag the pointer-to-pointer-to-cell that we store in the storebuffer).

> @@ +606,5 @@
> > +        return offsetof(JSString, d.flags_) + sizeof(uint32_t);
> > +    }
> > +    static constexpr size_t offsetOfLength() {
> > +        return offsetof(JSString, d.flags_);
> > +    }
> 
> I can't decide if what you have is better, or if it would be better to
> eliminate the endian test and have
> 
> union FlagsWord {
>     uintptr_t fullWord;
>     struct {
>         uint32_t indexAndFlags;
>         uint32_t length;
>     }
> }
> 
> and then have offsetOfFlags() return offsetof(FlagsWord, indexAndFlags) and
> offsetOfLength return offsetof(FlagsWord, length).

Unfortunately that does not work. The reason I ended up with the solution I did was that every Cell now uses an endian-independant uintptr_t. The only exception is when accessing length or flags with a 32-bit load instead of 64-bit. We kind of want a general MEM64_32HI_OFF / MEM64_32LO_OFF macros in macroassembler as well as changing all string flags access to use a helper instead of the current stuff. This seems out of scope for this patch, but I can do it as part 3 if desired.
(In reply to Ted Campbell [:tcampbell] from comment #11)
> Unfortunately that does not work. The reason I ended up with the solution I
> did was that every Cell now uses an endian-independant uintptr_t. The only
> exception is when accessing length or flags with a 32-bit load instead of
> 64-bit. We kind of want a general MEM64_32HI_OFF / MEM64_32LO_OFF macros in
> macroassembler as well as changing all string flags access to use a helper

Doh! You are right, of course. Somehow I convinced myself that little-endian would reorder my two uint32_t fields. Me is dumb.

> instead of the current stuff. This seems out of scope for this patch, but I
> can do it as part 3 if desired.

Nah, I'd rather have this landed first.
(In reply to Steve Fink [:sfink] [:s:] from comment #10)
> > +    // Preserve old flags because nursery may check them before checking
> > +    // if this is a forwarded Cell.
> 
> you left out "...which is a horrible thing to do, but we do it, so we'd
> better support it."
> 
> Can we not do that? And if we don't do it, then wouldn't it be better to set
> this to 0xffffffff | Cell::FORWARD_BIT? (yes, I know the `| FORWARD_BIT` is
> redundant) Or maybe plain FORWARD_BIT; whatever is most likely to crash hard
> and fail lots of tests.

I poked at this a bit, and it is non-trivial to fix so we can handle a separate patch. The problem is getTraceKind is called all the time on a RelocationOverlay and asserts start firing left and right. Probably can be fixed, but out of scope for this part.
Carrying r=sfink from Comment 10.

- Fix some comment phrasing
- Make gc::RelocationOverlay extend gc::Cell (and reduce the casts as a result)
Attachment #8998289 - Attachment is obsolete: true
Attachment #8998651 - Flags: review+
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ebac10c8c8b9
Part 1: Use accessor methods for JSString::flags/length. r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/819b92315984
Part 2: Refactor GC relocation to use a reserved flag. r=sfink
Landing the first parts. Leave-open to actually address moving the field to gc::Cell.
Keywords: leave-open
Closing this as this does the bulk of the work. The original goal of moving the field requires a little more scaffolding to be landed first, so I'll do that in another bug.
Status: NEW → RESOLVED
Closed: 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Summary: Define uintptr_t field in gc::Cell → Simplify GC relocations
Depends on: 1482844
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: