Closed Bug 1015917 Opened 6 years ago Closed 6 years ago

Latin 1 strings: support string concatenation

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(3 files, 1 obsolete file)

There are 3 parts to this: (1) Create latin1 ropes (or fat inline strings for small strings) if both sides are latin1, (2) make rope flattening work and (3) tests.
Probably the most complicated part here is the JIT string concat stub:

(1) If both lhs and rhs are Latin1, the allocated rope must have the Latin1 flag.
(2) If both lhs and rhs are Latin1, create a Latin1 fat inline string. Else, we create a TwoByte string and we have to either copy or inflate the lhs and rhs.

The string concat stub is the only place where JIT code does complicated string stuff, so the other patches will be mostly C++ code.
Attachment #8428655 - Flags: review?(luke)
The patch templatizes JSRope::flattenInternal on the character encoding (Latin1 or TwoByte).

It was fairly straight-forward. Note that when we create a TwoByte string, the child nodes can be either Latin1 or TwoByte so we have to handle both cases. When we flatten a Latin1 rope everything must be Latin1.

The next part will add tests.
Attachment #8428661 - Flags: review?(luke)
Attached patch Part 3 - TestsSplinter Review
Attachment #8428676 - Flags: review?(luke)
Comment on attachment 8428655 [details] [diff] [review]
Part 1 - Create ropes or fat inline strings

Review of attachment 8428655 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +5300,5 @@
>      }
>  
> +    masm.ret();
> +
> +    masm.bind(&isFatInlineLatin1);

Do you think you could factor out the following latin1 codegen path with the twoByte path?

::: js/src/jsstr.cpp
@@ +2779,5 @@
>  
>      JSFatInlineString *str = js_NewGCFatInlineString<CanGC>(cx);
>      if (!str)
>          return nullptr;
> +    jschar *buf = str->initTwoByte(outputLen);

Pre-existing, but can you put a \n after the "if (!str) return nullptr;"?

::: js/src/vm/String.cpp
@@ +433,5 @@
> +                InflateStringToBuffer(leftInspector.latin1Chars(), leftLen, buf);
> +            if (rightInspector.hasTwoByteChars())
> +                PodCopy(buf + leftLen, rightInspector.twoByteChars(), rightLen);
> +            else
> +                InflateStringToBuffer(rightInspector.latin1Chars(), rightLen, buf + leftLen);

InflateStringToBuffer is kindof a lame name; can you either rename it or just clone it with a good name (perhaps CopyAndInflateChars)?
Attachment #8428655 - Flags: review?(luke) → review+
Comment on attachment 8428661 [details] [diff] [review]
Part 2 - Make JSRope::flatten work

Review of attachment 8428661 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/String.cpp
@@ +277,2 @@
>  JSFlatString *
> +JSRope::flattenInternal(ExclusiveContext *maybecx, Op &op)

Could this whole 'op' abstraction be avoided by instead having a CharT template parameter?  It seems like most things would just work.  You can test IsTwoByte by:
 SameType<CharT, jschar>::value
where
 template <class T> struct SameType { enum { value = false; } };
 template <class T> struct SameType<T, T> { enum { value = true; } };
and then you could also add some templated char accessor:
 str->chars<CharT>
(which should coexist nicely with the current non-templated chars).
Attachment #8428676 - Flags: review?(luke) → review+
Thanks for the suggestion, this is indeed a bit simpler.
Attachment #8428661 - Attachment is obsolete: true
Attachment #8428661 - Flags: review?(luke)
Attachment #8430133 - Flags: review?(luke)
Comment on attachment 8430133 [details] [diff] [review]
Part 2 - Make JSRope::flatten work (v2)

Review of attachment 8430133 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/vm/String.h
@@ +1249,5 @@
> +}
> +
> +template<>
> +MOZ_ALWAYS_INLINE const char *
> +JSLinearString::nonInlineChars(const JS::AutoAssertNoGC &nogc) const

I'm a bit surprised you were able to write this instead of:
  JSLinearString::nonInlineChars<char>(const JS::AutoAssertNoGC &nogc) const
you might want to try-server compilation on all platforms just in case.
Attachment #8430133 - Flags: review?(luke) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca48add6d154
https://hg.mozilla.org/integration/mozilla-inbound/rev/4dd20813fe3b
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdc1cbc3ce54

(In reply to Luke Wagner [:luke] from comment #4)
> Do you think you could factor out the following latin1 codegen path with the
> twoByte path?

Done.

> Pre-existing, but can you put a \n after the "if (!str) return nullptr;"?

Done.

> InflateStringToBuffer is kindof a lame name; can you either rename it or
> just clone it with a good name (perhaps CopyAndInflateChars)?

Renamed it to CopyAndInflateChars.

(In reply to Luke Wagner [:luke] from comment #7)
> I'm a bit surprised you were able to write this instead of:
>   JSLinearString::nonInlineChars<char>(const JS::AutoAssertNoGC &nogc) const
> you might want to try-server compilation on all platforms just in case.

GCC and MSVC are both happy (tested Clang locally), even the older GCC on b2g.
You need to log in before you can comment on or make changes to this bug.