Closed Bug 1008590 Opened 10 years ago Closed 10 years ago

Remove chars pointer for inline strings, store JSString length and flags separately

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(1 file, 2 obsolete files)

Attached patch WIP (obsolete) — Splinter Review
With the patch in bug 1008421, JSString's 4th word is only used for inline chars. My initial plan was to store the string length there, and stuff the length for inline strings in the flags word (it must be very small if we use inline chars).

However, this is a bit annoying because we'd have two different length fields and it complicates a lot of code, not just JSString::length().

There's a much nicer alternative: stop storing the chars pointer for inline strings, as it can be derived from the string pointer. This will add a branch to JSLinearString::chars(), but this seems negligible.

Then we can store the length and flags in two separate 32-bit fields and eliminate the right-shift to get the string length. It also means we can shrink JSString on 64-bit from 4 to 3 words.

The attached patch does this and passes jit-tests on x86 and x64. I also refactored the flags a bit: because we have 32 available flags now, we can make some queries (isFlat, isInline, isFatInline etc) more efficient. Benchmarks don't seem affected (on x64 Octane-pdfjs is about 500 points faster, likely because our strings are smaller now on 64-bit).
This is also nice for bug 903519 (allocating strings and chars in the nursery), because it eliminates the getAllocKind() calls in JSString::isFatInline and JSString::isExternal.
Blocks: 903519
Comment on attachment 8420545 [details] [diff] [review]
WIP

Review of attachment 8420545 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/String.h
@@ +197,5 @@
> + 000101 Inline
> + 010101 FatInline
> + 100001 External
> + 001001 Atom
> + 101001 PermAtom

I'd love to have every possible combination documented, even if some of them are invalid.
Attached patch Patch (obsolete) — Splinter Review
Green on Try.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Attachment #8420839 - Flags: review?(luke)
Attachment #8420545 - Attachment is obsolete: true
(In reply to Nicholas Nethercote [:njn] from comment #2)
> I'd love to have every possible combination documented, even if some of them
> are invalid.

I tweaked the comment a bit, is this fine or should I add more info?
Flags: needinfo?(n.nethercote)
Comment on attachment 8420839 [details] [diff] [review]
Patch

Review of attachment 8420839 [details] [diff] [review]:
-----------------------------------------------------------------

I'm hoping to be able to easily answer questions like "what would flags 111111 mean... is that even valid?"
(In reply to Nicholas Nethercote [:njn] from comment #5)
> I'm hoping to be able to easily answer questions like "what would flags
> 111111 mean... is that even valid?"

The comment explains that's a string with the IsFlat, HasBase, IsInline, IsAtom flags (the lowest 4 bits). It also explains HasBase is only used for Dependent and Undepended and these have fixed flags: never used as atoms and don't have inline chars, so this type is not valid :)

FWIW, if we remove the HasBase flag, we get 111101: a FatInlinePermanentAtom. I'm not sure we use Inline/FatInline atm for permanent atoms, but the type itself is valid.
> > I'm hoping to be able to easily answer questions like "what would flags
> > 111111 mean... is that even valid?"
> 
> The comment explains that's a string with the IsFlat, HasBase, IsInline,
> IsAtom flags (the lowest 4 bits). It also explains HasBase is only used for
> Dependent and Undepended and these have fixed flags: never used as atoms and
> don't have inline chars, so this type is not valid :)

Well, yes. It'd be nice if I didn't have to deduce all that for myself. My motivation is that in the past I tried to enumerate all the possibilities exactly like this and I was unable to work out if certain combinations were valid. The new scheme is simpler, so maybe such a comment is not necessary; it's just a suggestion.
Flags: needinfo?(n.nethercote)
Attached patch PatchSplinter Review
Rebased to tip, with some minor tweaks.
Attachment #8420839 - Attachment is obsolete: true
Attachment #8420839 - Flags: review?(luke)
Attachment #8423000 - Flags: review?(luke)
Comment on attachment 8423000 [details] [diff] [review]
Patch

Review of attachment 8423000 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, looks great.  A couple nits and possibly one bug below.

::: js/src/vm/String.h
@@ +175,5 @@
>       * have no such entry. The "subtype predicate" entry for a type specifies
>       * the predicate used to query whether a JSString instance is subtype
>       * (reflexively) of that type.
>       *
> +     *   Rope          000000       000000

Pre-existing, but we seem to have lost the column headings for "instance encoding" and "subtype encoding"; can you add them back in?

@@ +201,5 @@
> +     *
> +     * Some types also use the other 2 bits:
> +     *
> +     *   Bit 4: Extensible, FatInline
> +     *   Bit 5: External, PermanentAtom

This was a bit confusing.  I think what you're saying is that bits 4 and 5 don't imply a particular string, they are part of a denser encoding scheme.  In that case, listing out the cases just restates the above table.

@@ +205,5 @@
> +     *   Bit 5: External, PermanentAtom
> +     *
> +     * The Rope, Dependent, Undepended, Extensible and External types have fixed flags:
> +     * none of the other flags will be set for these types (they are not used as
> +     * atoms and don't have inline chars).

I think this para is mostly implied by the table which specifies the instance encoding.

@@ +223,2 @@
>  
> +    static const uint32_t PERMANENT_BIT         = JS_BIT(5);

Assuming External strings aren't permanent, I don't think you should have a PERMANENT_BIT, just a PERMANENT_ATOM_FLAGS (in the style of EXTERNAL_FLAGS).  Also, there is an isPermanent() below that would return true for an External string.

@@ +383,5 @@
>      }
>  
> +    MOZ_ALWAYS_INLINE
> +    bool isFatInline() const {
> +        return (d.u1.flags & FAT_INLINE_FLAGS) == FAT_INLINE_FLAGS;

I wonder if it would be nicer to distinguish *_FLAGS (which imply that you test with d.u1.flags == *_FLAGS) and cases where you you need to do this mask + equality.  In generality, the mask could be different then the compared value, so you could have a (*_MASK, *_MATCH) pair for each one.  What do you think?

@@ +421,5 @@
>  
>      /* Only called by the GC for dependent or undepended strings. */
>  
>      inline bool hasBase() const {
> +        JS_STATIC_ASSERT((DEPENDENT_FLAGS | FLAT_BIT) == UNDEPENDED_FLAGS);

Seems like you don't need the static assert, given that this is all by design now.

@@ +671,5 @@
>      template <js::AllowGC allowGC>
>      static inline JSInlineString *new_(js::ThreadSafeContext *cx);
>  
>      inline jschar *init(size_t length);
> +    inline jschar *initFat(size_t length);

Could you move this to JSFatInlineString?  I was thinking it would be nice to use the static type to prevent initializing an inline string with initFat or vice versa.
Attachment #8423000 - Flags: review?(luke) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/85e85da78909

(Green on Try a few days ago.)

(In reply to Luke Wagner [:luke] (on partial PTO) from comment #9)
> Overall, looks great.  A couple nits and possibly one bug below.

Thanks for the careful review!

> ::: js/src/vm/String.h
> Pre-existing, but we seem to have lost the column headings for "instance
> encoding" and "subtype encoding"; can you add them back in?

Done.

> This was a bit confusing.  I think what you're saying is that bits 4 and 5
> don't imply a particular string, they are part of a denser encoding scheme. 
> In that case, listing out the cases just restates the above table.

I can see how it could be confusing. Removed that part.

> I think this para is mostly implied by the table which specifies the
> instance encoding.

Same.

> Assuming External strings aren't permanent, I don't think you should have a
> PERMANENT_BIT, just a PERMANENT_ATOM_FLAGS (in the style of EXTERNAL_FLAGS).
> Also, there is an isPermanent() below that would return true for an External
> string.

isPermanent is a method of JSAtom and External strings are not atoms, so there's no bug. I agree PERMANENT_BIT is confusing, removed it. I also changed JSAtom::isPermanent() to just call JSString::isPermanentAtom().

> I wonder if it would be nicer to distinguish *_FLAGS (which imply that you
> test with d.u1.flags == *_FLAGS) and cases where you you need to do this
> mask + equality.  In generality, the mask could be different then the
> compared value, so you could have a (*_MASK, *_MATCH) pair for each one. 
> What do you think?

Nice idea. I think the separate MASK/MATCH constants could make it more confusing though. I changed it to:

    static const uint32_t FAT_INLINE_MASK       = INLINE_CHARS_BIT | JS_BIT(4);
    static const uint32_t PERMANENT_ATOM_MASK   = ATOM_BIT | JS_BIT(5);

    /* Initial flags for inline and fat inline strings. */
    static const uint32_t INIT_INLINE_FLAGS     = FLAT_BIT | INLINE_CHARS_BIT;
    static const uint32_t INIT_FAT_INLINE_FLAGS = FLAT_BIT | FAT_INLINE_MASK;

The *_MASK ones are used for type tests.

> Could you move this to JSFatInlineString?  I was thinking it would be nice
> to use the static type to prevent initializing an inline string with initFat
> or vice versa.

I wasn't sure about overloading init() with the same signature, but it makes sense here; done.
https://hg.mozilla.org/mozilla-central/rev/85e85da78909
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in before you can comment on or make changes to this bug.