Avoid malloc calls for small latin1 strings in String.prototype.toLowerCase/toUpperCase

RESOLVED FIXED in Firefox 56

Status

()

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: anba, Assigned: anba)

Tracking

(Blocks: 1 bug)

Trunk
mozilla56
Points:
---

Firefox Tracking Flags

(firefox56 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

2 years ago
String.prototype.toLowerCase:

If the input string fits into an inline string, we can directly allocate an inline string instead of calling malloc and then later transferring the malloc'ed chars into an inline string.

toLowerCase shows up in the Speedometer and general browsing profiles in bug 1365361, so optimizing this function to use fewer mallocs might help in both cases.


String.prototype.toUpperCase:

We can't easily use the same approach for toUpperCase because we don't know the output string size due to the special casing for "ß". (Unless we traverse the complete input string and search for "ß".) So for now only specialize one-element string case, which is quite common when implementing title casing (*). 


(*) Grep'ing for toUpperCase() in Speedometer showed a couple of title-casing methods, generally in the form of `s[0].toUpperCase() + s.substring(1)`.
(Assignee)

Comment 1

2 years ago
Posted patch bug1367779.patch (obsolete) — Splinter Review
function ToLower(strlen, charset) {
    var ss = [];
    for (var i = 0; i <= 0xff; ++i) ss[i] = String.fromCharCode(...Array(strlen).fill(i));

    var mask = charset === "ASCII" ? 0x7f : 0xff;
    var q = 0;
    var t = Date.now();
    for (var i = 0; i < 10000000; ++i) {
        q += ss[i & mask].toLowerCase().charCodeAt(0);
    }
    print(Date.now() - t, q);
}

ToLower(1, "ASCII"): From 540ms to 270ms
ToLower(2, "ASCII"): From 560ms to 380ms
ToLower(3, "ASCII"): From 680ms to 390ms
ToLower(23, "ASCII"): From 1050ms to 785ms

ToLower(1, "Latin"): From 560ms to 285ms
ToLower(1, "Latin"): From 665ms to 400ms
ToLower(1, "Latin"): From 730ms to 430ms



function TitleCase() {
    var words = MakeWords(); // All words from /usr/share/dict/words
    var q = 0;
    var t = Date.now();
    for (var i = 0; i < 1000000; ++i) {
        var w = words[i % words.length];
        q += (w[0].toUpperCase() + w.slice(1)).length;
    }
    return [Date.now() - t, q];
}

Improves from 280ms to 160ms.



The change to str_normalize is just a drive-by improvement, which I tried to sneak into one of my changesets since quite a while:
We don't need to linearize the string to test for Latin1 contents and AutoStableStringChars::initTwoByte() internally calls ensureLinear, too. So we can remove the call to ensureLinear() from str_normalize.
Attachment #8871317 - Flags: review?(jdemooij)
Comment on attachment 8871317 [details] [diff] [review]
bug1367779.patch

Review of attachment 8871317 [details] [diff] [review]:
-----------------------------------------------------------------

Nice results, thanks!

::: js/src/jsstr.cpp
@@ +847,5 @@
>  
>          resultLength = length;
> +
> +        if (IsSame<CharT, Latin1Char>::value) {
> +            // Latin-1 strings don't have special lower case mappings, so we

What do you think about replacing the malloc with a Vector or StringBuffer?

If StringBuffer is too slow we could use a Vector<CharT, 32, SystemAllocPolicy> or something and then factor out the code in StringBuffer::finishString so we can reuse it here. Then this would work transparently for TwoByte strings too, which are not that uncommon in the browser due to external strings for instance (tag names like "DIV" are probably often lower-cased/upper-cased).

@@ +853,5 @@
> +            if (JSInlineString::lengthFits<CharT>(resultLength)) {
> +                CharT* storage;
> +                inlineStr = AllocateInlineString<NoGC>(cx, resultLength, &storage);
> +                if (inlineStr)
> +                    newChars.reset(storage);

If anything after this is fallible, we would try to free() bogus memory (inline chars). Not sure if there's something we should do about that here, maybe add a separate newChars (raw pointer) and newCharsHolder (auto pointer to free it)?

@@ +1159,5 @@
> +                    MOZ_ASSERT(upper <= JSString::MAX_LATIN1_CHAR);
> +                    MOZ_ASSERT(StaticStrings::hasUnit(upper));
> +
> +                    return cx->staticStrings().getUnit(upper);
> +                } else {

Nit: no else after return
Attachment #8871317 - Flags: review?(jdemooij) → review+
(Assignee)

Comment 3

2 years ago
(In reply to Jan de Mooij [:jandem] from comment #2)
> ::: js/src/jsstr.cpp
> @@ +847,5 @@
> >  
> >          resultLength = length;
> > +
> > +        if (IsSame<CharT, Latin1Char>::value) {
> > +            // Latin-1 strings don't have special lower case mappings, so we
> 
> What do you think about replacing the malloc with a Vector or StringBuffer?
> 
> If StringBuffer is too slow we could use a Vector<CharT, 32,
> SystemAllocPolicy> or something and then factor out the code in
> StringBuffer::finishString so we can reuse it here. Then this would work
> transparently for TwoByte strings too, which are not that uncommon in the
> browser due to external strings for instance (tag names like "DIV" are
> probably often lower-cased/upper-cased).

Profiles show that ToLowerCase with two-byte strings is more common in Speedometer, probably because of external strings, just like you said. So it seems more sensible to use the StringBuffer/Vector approach. But when I tried to convert the code to use Vector, I realised that the Vector-API may not be the perfect fit for this use case:

1. Unless the string contains special-casing characters, the string length doesn't change, so we can directly allocate a correctly sized backing storage. Vector::resize/reserve() would allocate too much storage (visible in µ-benchmarks), so I changed it to Vector::initCapacity() (with a check to make sure initCapacity is not called if the inline storage is already sufficiently large). 
2. If we then encounter special-casing characters, the Vector-API doesn't provide any methods to realloc/malloc to the correct final size, so we're stuck with Vector::resize(), which means we over-allocate.

For example, this µ-benchmark regresses from ~500ms to ~1000ms when using Vectors:

    // U+0130 lower cases to the sequence <U+0069 U+0307>.
    var s = ("A".repeat(50) + "\u0130").split("").join("");
    var t = Date.now();
    for (var i = 0; i < 1000000; ++i) {
        s.toLowerCase();
    }
    print(Date.now() - t);

So we may be better off with a custom solution...
(Assignee)

Comment 4

2 years ago
Alternative patch to avoid the Vector limitations described in comment #3.

Does this still look like a sensible solution?
Attachment #8875748 - Flags: feedback?(jdemooij)
Comment on attachment 8875748 [details] [diff] [review]
bug1367779-inline-vector.patch

Review of attachment 8875748 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

For Vector you could do something like ExtractWellSized in StringBuffer.cpp This is pretty straight-forward though and allocating exactly what we need is nice. Later we can probably use InlineCharVector elsewhere.

::: js/src/jsstr.cpp
@@ +630,5 @@
> +    static constexpr size_t value = JSFatInlineString::MAX_LENGTH_TWO_BYTE;
> +};
> +
> +template <typename CharT>
> +class InlineCharVector

It would be good to add a comment explaining why we don't want to use Vector.

We should make this MOZ_NON_PARAM or MOZ_STACK_CLASS probably so we don't end up copying the inline space.

@@ +639,5 @@
> +    CharT inlineStorage[InlineCapacity];
> +    CharPtr heapStorage;
> +
> +#ifdef DEBUG
> +    size_t lastRequestedLength = 0;

Nit: initialize this in the constructor? I don't think we use these initializers elsewhere.

@@ +885,5 @@
>  {
>      // Unlike toUpperCase, toLowerCase has the nice invariant that if the
>      // input is a Latin-1 string, the output is also a Latin-1 string.
> +
> +    InlineCharVector<CharT> newChars;

I know ToUpperCase is a bit more complicated but can we also use this there?

@@ +904,5 @@
>  
> +        // One element Latin-1 strings can be directly retrieved from the
> +        // static strings cache.
> +        if (IsSame<CharT, Latin1Char>::value) {
> +            if (length == 1) {

Does this static-strings case show up anywhere for non-Latin1 strings? It should be easy to fix so it might be worth investigating.
Attachment #8875748 - Flags: feedback?(jdemooij) → feedback+
(Assignee)

Comment 6

2 years ago
(In reply to Jan de Mooij [:jandem] from comment #5)
> @@ +639,5 @@
> > +    CharT inlineStorage[InlineCapacity];
> > +    CharPtr heapStorage;
> > +
> > +#ifdef DEBUG
> > +    size_t lastRequestedLength = 0;
> 
> Nit: initialize this in the constructor? I don't think we use these
> initializers elsewhere.

There are currently only a handful of uses (http://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/js/src/vm/TraceLoggingGraph.h#233-243, http://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/js/src/frontend/TokenStream.h#353-354, http://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/js/src/frontend/Parser.h#1050, ...). FWIW Waldo gave me his permission to use member initialisers in bug 1041341, comment 8. ;-)


> 
> @@ +885,5 @@
> >  {
> >      // Unlike toUpperCase, toLowerCase has the nice invariant that if the
> >      // input is a Latin-1 string, the output is also a Latin-1 string.
> > +
> > +    InlineCharVector<CharT> newChars;
> 
> I know ToUpperCase is a bit more complicated but can we also use this there?

Yes, I think so. But then we need to instantiate InlineCharVector<Latin1> and InlineCharVector<char16_t>, because of µ, ÿ, and ß, don't we? I don't know if the increased stack size has any negative impact on the performance.


> 
> @@ +904,5 @@
> >  
> > +        // One element Latin-1 strings can be directly retrieved from the
> > +        // static strings cache.
> > +        if (IsSame<CharT, Latin1Char>::value) {
> > +            if (length == 1) {
> 
> Does this static-strings case show up anywhere for non-Latin1 strings? It
> should be easy to fix so it might be worth investigating.

Hmm, I don't know. Is it common to have one character two-byte strings where the single character is a Latin-1 character? I think most operations already implicitly turn the two-byte string into a Latin-1 string when the single character is Latin-1, so it may not be common. For example `isLatin1(newExternalString("HTML")[0])` and `isLatin1(newExternalString("HTML").substring(0, 1))` both return true.
(In reply to André Bargull from comment #6)
> FWIW Waldo gave me his permission to use member initialisers in bug
> 1041341, comment 8. ;-)

Oh nice, nevermind then :)

> Yes, I think so. But then we need to instantiate InlineCharVector<Latin1>
> and InlineCharVector<char16_t>, because of µ, ÿ, and ß, don't we? I don't
> know if the increased stack size has any negative impact on the performance.

Can you use MaybeOneOf so they "share" space?

> Hmm, I don't know. Is it common to have one character two-byte strings where
> the single character is a Latin-1 character? I think most operations already
> implicitly turn the two-byte string into a Latin-1 string when the single
> character is Latin-1, so it may not be common. For example
> `isLatin1(newExternalString("HTML")[0])` and
> `isLatin1(newExternalString("HTML").substring(0, 1))` both return true.

Yeah it's probably not worth worrying about.
(Assignee)

Comment 8

2 years ago
Thanks for the MaybeOneOf pointer, I didn't realise I could use it here (despite already being present in the existing ToUpperCase code)!


Additional notes (*):
- InlineCharVector was renamed to InlineCharBuffer, because "...Vector" may give the wrong impression that the class is related to mozilla::Vector.
- ReallocChars is now directly inlined in InlineCharBuffer::maybeRealloc.
- Null-termination was removed from ToLowerCaseImpl/ToUpperCaseImpl because it's now handled in InlineCharBuffer::toString.
- InlineCharBuffer::maybeAlloc/maybeRealloc take string lengths, not allocation sizes, therefore the various |+ 1|'s were removed in ToUpperCase/ToLowerCase.
- Fast paths were added to ToLowerCase/ToUpperCase for one-element latin-1 strings to improve performance when strings are used as single characters.


(*) Some of the changes were already present in the earlier patches. I'm just writing them down for documentation purposes and to make sure I didn't forget anything. :-)
Attachment #8871317 - Attachment is obsolete: true
Attachment #8875748 - Attachment is obsolete: true
Attachment #8876249 - Flags: review?(jdemooij)
Comment on attachment 8876249 [details] [diff] [review]
bug1367779.patch

Review of attachment 8876249 [details] [diff] [review]:
-----------------------------------------------------------------

Great patch.

As a follow-up, maybe we can use this new class also to simplify fromCharCode/fromCodePoint with nargs > 1 (I think we can remove the *_few_args functions)? Escape also calls pod_malloc/NewString right now (and should probably return the input string if no escaping is necessary..)
Attachment #8876249 - Flags: review?(jdemooij) → review+
Can we land this one? :)
Flags: needinfo?(andrebargull)
(Assignee)

Comment 11

2 years ago
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cab9c7d94d21f175ec0b66dcab69cc9aea49b6f3
Flags: needinfo?(andrebargull)
Keywords: checkin-needed

Comment 12

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a90ad4d15a6f
Use stack allocated storage in case conversion operations when the result fits into JSInlineString. r=jandem
Keywords: checkin-needed

Comment 13

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a90ad4d15a6f
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.