Closed Bug 1564347 Opened 6 years ago Closed 3 years ago

Use a lookup table for String.prototype.toLowerCase and inline it

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
106 Branch
Tracking Status
firefox70 --- wontfix
firefox106 --- fixed

People

(Reporter: anba, Assigned: anba)

Details

Attachments

(6 files)

jorendorff recently mentioned toLowerCase is still too slow: https://mozilla.logbot.info/jsapi/20190404#c16174991-c16175030.

At least for Latin-1 strings we can do better by using a lookup table for the lower-case operation.

Note: The upper-case operation can't easily be optimised through a lookup table, because U+00B5, U+00DF, and U+00FF don't map to Latin-1 characters resp. map to a sequence of characters:

  • UpperCase(U+00B5) = U+039C
  • UpperCase(U+00DF) = U+0053 U+0053
  • UpperCase(U+00FF) = U+0178

A lookup table alone leads to a ~15% speed-up in µ-benchmarks, because at least on x86 it can be expressed as two mov instructions.

Based on our current implementation:

#include <cstddef>
#include <cstdint>

class CharacterInfo {
 public:
  uint16_t upperCase;
  uint16_t lowerCase;
  uint8_t flags;
};

extern const uint8_t index1[];
extern const uint8_t index2[];
extern const CharacterInfo js_charinfo[];

inline const CharacterInfo& CharInfo(char16_t code) {
  const size_t shift = 6;
  size_t index = index1[code >> shift];
  index = index2[(index << shift) + (code & ((1 << shift) - 1))];
  return js_charinfo[index];
}

char16_t ToLowerCase(char16_t ch) {
  if (ch < 128) {
    if (ch >= 'A' && ch <= 'Z') {
      return ch + ('a' - 'A');
    }
    return ch;
  }
  const CharacterInfo& info = CharInfo(ch);
  return uint16_t(ch) + info.lowerCase;
}

godbolt returns the following assembly for clang-8 with O3:

ToLowerCase(char16_t):                      # @ToLowerCase(char16_t)
        cmp     di, 127
        ja      .LBB0_2
        lea     ecx, [rdi - 65]
        lea     eax, [rdi + 32]
        cmp     cx, 26
        cmovae  eax, edi
        ret
.LBB0_2:
        movzx   eax, di
        shr     rax, 6
        movzx   eax, byte ptr [rax + index1]
        shl     rax, 6
        mov     ecx, edi
        and     ecx, 63
        movzx   eax, byte ptr [rax + rcx + index2]
        lea     rax, [rax + 2*rax]
        add     di, word ptr [rax + rax + js_charinfo+2]
        mov     eax, edi
        ret

Whereas a lookup table:

using Latin1Char = unsigned char;

extern const Latin1Char latin1ToLowerCaseTable[];

Latin1Char ToLowerCase(Latin1Char ch) { return latin1ToLowerCaseTable[ch]; }

Gives this assembler code:

ToLowerCase(unsigned char):                       # @ToLowerCase(unsigned char)
        mov     eax, edi
        mov     al, byte ptr [rax + latin1ToLowerCaseTable]
        ret

In addition to that, a lookup table also makes it easier to inline the complete lower-case operation in Ion.

Both optimisations, the lookup table and inlining the operation, result in the following improvements:

// Improves from ~495ms to ~245ms.
function test1() {
    var xs = ["a".repeat(20), "b".repeat(20)];
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        q += xs[i&1].toLowerCase().length;
    }
    print(dateNow() - t, q);
}

// Improves from ~1040ms to ~325ms.
function test2() {
    var xs = ["A".repeat(20), "B".repeat(20)];
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        q += xs[i&1].toLowerCase().length;
    }
    print(dateNow() - t, q);
}

Using a lookup table for Latin-1 lower-case conversion allows the compiler to
emit the sequence mov, cmp, setne for ChangesWhenUpperCased resp. two mov
instructions for ToLowerCase. This makes it faster than the current approach
which requires multiple instructions for both operations.
Latin-1 upper-case conversion wasn't change to use a lookup table, because
not all Latin-1 characters have an upper-case representation which is also a
single Latin-1 character, cf. the conversion for U+00B5, U+00DF, and U+00FF.

Priority: -- → P1

I'm a little hesitant to take the second Ion patch. We've been trying to move away from taking more complexity in Ion-only optimizations these days.

Jan, any thoughts here? The micro-benchmark shows nice results but this is a niche function and the CodeGenerator code is non-trivial. On the other hand, the codegen changes are mostly straight assembly and don't rely on TI, etc.

Flags: needinfo?(jdemooij)

FWIW JSC and V8 also both optimise toLowerCase on the assembler level:

toUppercase, on the other hand, is not optimised in their compilers.

Raph Levien recently wrote a crate for short ASCII string upper/lowercasing using bitmasks - https://github.com/zbraniecki/tinystr

Maybe we could use similar optimization assuming vast majority of real word cases are ASCII and <16 chars?

That looks like a different optimisation (converting multiple characters 4/8/16 at once), which additionally needs some pre-processing to know we're in the ASCII-only case. V8 does something similar for their string conversion (in addition to using lookup tables).

Flags: needinfo?(jdemooij)
Attachment #9076738 - Attachment description: Bug 1564347 - Part 2: Inline String.prototype.toLowerCase for small Latin-1 strings. r=tcampbell! → Bug 1564347 - Part 2: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem!
Keywords: leave-open

:anba, this patch failed t oland, can you please take a look?
"We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmph89jog js/src/builtin/String.cpp Hunk #1 FAILED at 706. Hunk #3 succeeded at 793 with fuzz 2 (offset -2 lines). 1 out of 3 hunks FAILED -- saving rejects to file js/src/builtin/String.cpp.rej abort: patch command failed: exited with status 256"

Flags: needinfo?(andrebargull)

(In reply to Cristian Brindusan [:cbrindusan] from comment #7)

:anba, this patch failed t oland, can you please take a look?

Ah, I just forgot to push the rebased changeset to Phabricator.

Flags: needinfo?(andrebargull)
Pushed by nbeleuzu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b99041c4bea3 Part 1: Add a lookup table for Latin-1 lower-case conversion. r=tcampbell

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)

Jason, should we still leave this bug open?

Flags: needinfo?(sdetar) → needinfo?(jorendorff)

(In reply to Steven DeTar [:sdetar] from comment #12)

Jason, should we still leave this bug open?

André still has a patch up for review that we should consider landing.

Flags: needinfo?(jorendorff)

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)
Flags: needinfo?(sdetar)

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)
Flags: needinfo?(sdetar)

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(sdetar)
Severity: normal → S4
Flags: needinfo?(sdetar)
Priority: P1 → P3

I've did some additional testing to see how this change affects benchmarks and websites.

Where Calls Input Can Optimise Note
Speedometer 2 770'000 All lower, length ≤ MAX_LENGTH_LATIN1 Yes
JetStream 2 3'780'000 All lower, length ≤ MAX_LENGTH_LATIN1 Yes
JetStream 2 20'000 Ropes No
Web Tooling 270'000 All lower, length ≤ MAX_LENGTH_LATIN1 Yes
Web Tooling 50'000 Two-byte No
SunSpider 20'000 All lower, length ≤ MAX_LENGTH_LATIN1 Yes
reddit.com (loading start page, no scrolling) 50'000 All lower, length ≤ MAX_LENGTH_LATIN1 Yes
reddit.com (loading start page, no scrolling) 30'000 All lower, length > MAX_LENGTH_LATIN1 Yes
reddit.com (video loading) ~10'000 per sec All lower, length ≤ MAX_LENGTH_LATIN1 Yes
reddit.com (video loading) ~10'000 per sec All lower, length > MAX_LENGTH_LATIN1 Yes Depends on video, also saw 10K only every 3 secs
reddit.com (scrolling down for 1 min) 610'000 All lower, length ≤ MAX_LENGTH_LATIN1 Yes
reddit.com (scrolling down for 1 min) 560'000 All lower, length > MAX_LENGTH_LATIN1 Yes

André, do you have an updated patch for part 2? Given the results in comment 17 and other engines having a similar optimization, we should probably land it.

Move inline string allocation into a separate function.

Attachment #9076738 - Attachment description: Bug 1564347 - Part 2: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem! → Bug 1564347 - Part 3: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem!

Sure, I've uploaded the rebased patches.

I noticed this was missing when I added the cmpb methods in part 4.

Depends on D155608

Pushed by andre.bargull@gmail.com: https://hg.mozilla.org/integration/autoland/rev/bd647f575382 Part 2: Add AllocateThinOrFatInlineString. r=jandem https://hg.mozilla.org/integration/autoland/rev/5805802dc654 Part 3: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem https://hg.mozilla.org/integration/autoland/rev/2710e38f2381 Part 4: Add MacroAssembler::branch8(BaseIndex, Register). r=jandem https://hg.mozilla.org/integration/autoland/rev/48b73980369d Part 5: Use branch8 in CodeGenerator::visitStringToLowerCase(). r=jandem https://hg.mozilla.org/integration/autoland/rev/402644d8b48d Part 6: Use eax encoding for cmpb when possible. r=jandem
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: