Closed Bug 1564347 Opened 6 years ago Closed 3 years ago

Use a lookup table for String.prototype.toLowerCase and inline it

Tracking

()

Status:

RESOLVED FIXED

Milestone:

106 Branch

Tracking Flags:

Tracking

Status

firefox70

---

wontfix

firefox106

---

fixed

People

(Reporter: anba, Assigned: anba)

Details

Attachments

(6 files)

Bug 1564347 - Part 1: Add a lookup table for Latin-1 lower-case conversion. r=tcampbell! 6 years ago André Bargull [:anba] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1564347 - Part 3: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem! 6 years ago André Bargull [:anba] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1564347 - Part 2: Add AllocateThinOrFatInlineString. r=jandem! 3 years ago André Bargull [:anba] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1564347 - Part 4: Add MacroAssembler::branch8(BaseIndex, Register). r=jandem! 3 years ago André Bargull [:anba] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1564347 - Part 5: Use branch8 in CodeGenerator::visitStringToLowerCase(). r=jandem! 3 years ago André Bargull [:anba] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1564347 - Part 6: Use eax encoding for cmpb when possible. r=jandem! 3 years ago André Bargull [:anba] 48 bytes, text/x-phabricator-request		Details \| Review

André Bargull [:anba]

Assignee

Description

•

6 years ago

jorendorff recently mentioned toLowerCase is still too slow: https://mozilla.logbot.info/jsapi/20190404#c16174991-c16175030.

At least for Latin-1 strings we can do better by using a lookup table for the lower-case operation.

Note: The upper-case operation can't easily be optimised through a lookup table, because U+00B5, U+00DF, and U+00FF don't map to Latin-1 characters resp. map to a sequence of characters:

UpperCase(U+00B5) = U+039C
UpperCase(U+00DF) = U+0053 U+0053
UpperCase(U+00FF) = U+0178

A lookup table alone leads to a ~15% speed-up in µ-benchmarks, because at least on x86 it can be expressed as two mov instructions.

Based on our current implementation:

#include <cstddef>
#include <cstdint>

class CharacterInfo {
 public:
  uint16_t upperCase;
  uint16_t lowerCase;
  uint8_t flags;
};

extern const uint8_t index1[];
extern const uint8_t index2[];
extern const CharacterInfo js_charinfo[];

inline const CharacterInfo& CharInfo(char16_t code) {
  const size_t shift = 6;
  size_t index = index1[code >> shift];
  index = index2[(index << shift) + (code & ((1 << shift) - 1))];
  return js_charinfo[index];
}

char16_t ToLowerCase(char16_t ch) {
  if (ch < 128) {
    if (ch >= 'A' && ch <= 'Z') {
      return ch + ('a' - 'A');
    }
    return ch;
  }
  const CharacterInfo& info = CharInfo(ch);
  return uint16_t(ch) + info.lowerCase;
}

godbolt returns the following assembly for clang-8 with O3:

ToLowerCase(char16_t):                      # @ToLowerCase(char16_t)
        cmp     di, 127
        ja      .LBB0_2
        lea     ecx, [rdi - 65]
        lea     eax, [rdi + 32]
        cmp     cx, 26
        cmovae  eax, edi
        ret
.LBB0_2:
        movzx   eax, di
        shr     rax, 6
        movzx   eax, byte ptr [rax + index1]
        shl     rax, 6
        mov     ecx, edi
        and     ecx, 63
        movzx   eax, byte ptr [rax + rcx + index2]
        lea     rax, [rax + 2*rax]
        add     di, word ptr [rax + rax + js_charinfo+2]
        mov     eax, edi
        ret

Whereas a lookup table:

using Latin1Char = unsigned char;

extern const Latin1Char latin1ToLowerCaseTable[];

Latin1Char ToLowerCase(Latin1Char ch) { return latin1ToLowerCaseTable[ch]; }

Gives this assembler code:

ToLowerCase(unsigned char):                       # @ToLowerCase(unsigned char)
        mov     eax, edi
        mov     al, byte ptr [rax + latin1ToLowerCaseTable]
        ret

In addition to that, a lookup table also makes it easier to inline the complete lower-case operation in Ion.

Both optimisations, the lookup table and inlining the operation, result in the following improvements:

// Improves from ~495ms to ~245ms.
function test1() {
    var xs = ["a".repeat(20), "b".repeat(20)];
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        q += xs[i&1].toLowerCase().length;
    }
    print(dateNow() - t, q);
}

// Improves from ~1040ms to ~325ms.
function test2() {
    var xs = ["A".repeat(20), "B".repeat(20)];
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 10000000; ++i) {
        q += xs[i&1].toLowerCase().length;
    }
    print(dateNow() - t, q);
}

André Bargull [:anba]

Assignee

Comment 1

•

6 years ago

Attached file Bug 1564347 - Part 1: Add a lookup table for Latin-1 lower-case conversion. r=tcampbell! — Details

Using a lookup table for Latin-1 lower-case conversion allows the compiler to
emit the sequence mov, cmp, setne for ChangesWhenUpperCased resp. two mov
instructions for ToLowerCase. This makes it faster than the current approach
which requires multiple instructions for both operations.
Latin-1 upper-case conversion wasn't change to use a lookup table, because
not all Latin-1 characters have an upper-case representation which is also a
single Latin-1 character, cf. the conversion for U+00B5, U+00DF, and U+00FF.

André Bargull [:anba]

Assignee

Comment 2

•

6 years ago

Attached file Bug 1564347 - Part 3: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem! — Details

Depends on D37376

Steven DeTar [:sdetar]

Updated

•

6 years ago

Priority: -- → P1

Ted Campbell [:tcampbell]

Comment 3

•

6 years ago

I'm a little hesitant to take the second Ion patch. We've been trying to move away from taking more complexity in Ion-only optimizations these days.

Jan, any thoughts here? The micro-benchmark shows nice results but this is a niche function and the CodeGenerator code is non-trivial. On the other hand, the codegen changes are mostly straight assembly and don't rely on TI, etc.

Flags: needinfo?(jdemooij)

André Bargull [:anba]

Assignee

Comment 4

•

6 years ago

FWIW JSC and V8 also both optimise toLowerCase on the assembler level:

toUppercase, on the other hand, is not optimised in their compilers.

Zibi Braniecki [:zbraniecki][:gandalf]

Comment 5

•

6 years ago

Raph Levien recently wrote a crate for short ASCII string upper/lowercasing using bitmasks - https://github.com/zbraniecki/tinystr

Maybe we could use similar optimization assuming vast majority of real word cases are ASCII and <16 chars?

André Bargull [:anba]

Assignee

Comment 6

•

6 years ago

That looks like a different optimisation (converting multiple characters 4/8/16 at once), which additionally needs some pre-processing to know we're in the ASCII-only case. V8 does something similar for their string conversion (in addition to using lookup tables).

Jan de Mooij [:jandem]

Updated

•

6 years ago

Flags: needinfo?(jdemooij)

Phabricator Automation

Updated

•

6 years ago

Attachment #9076738 - Attachment description: Bug 1564347 - Part 2: Inline String.prototype.toLowerCase for small Latin-1 strings. r=tcampbell! → Bug 1564347 - Part 2: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem!

André Bargull [:anba]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Cristian Brindusan [:cbrindusan]

Comment 7

•

5 years ago

:anba, this patch failed t oland, can you please take a look?
"We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmph89jog js/src/builtin/String.cpp Hunk #1 FAILED at 706. Hunk #3 succeeded at 793 with fuzz 2 (offset -2 lines). 1 out of 3 hunks FAILED -- saving rejects to file js/src/builtin/String.cpp.rej abort: patch command failed: exited with status 256"

Flags: needinfo?(andrebargull)

André Bargull [:anba]

Assignee

Comment 8

•

5 years ago

(In reply to Cristian Brindusan [:cbrindusan] from comment #7)

:anba, this patch failed t oland, can you please take a look?

Ah, I just forgot to push the rebased changeset to Phabricator.

Flags: needinfo?(andrebargull)

Pulsebot

Comment 9

•

5 years ago

Pushed by nbeleuzu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b99041c4bea3 Part 1: Add a lookup table for Latin-1 lower-case conversion. r=tcampbell

Cristian Brindusan [:cbrindusan]

Comment 10

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b99041c4bea3

BugBot [:suhaib / :marco/ :calixte]

Comment 11

•

5 years ago

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Comment 12

•

5 years ago

Jason, should we still leave this bug open?

Flags: needinfo?(sdetar) → needinfo?(jorendorff)

Jan de Mooij [:jandem]

Comment 13

•

5 years ago

(In reply to Steven DeTar [:sdetar] from comment #12)

Jason, should we still leave this bug open?

André still has a patch up for review that we should consider landing.

Flags: needinfo?(jorendorff)

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

4 years ago

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Updated

•

4 years ago

Flags: needinfo?(sdetar)

BugBot [:suhaib / :marco/ :calixte]

Comment 15

•

4 years ago

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Updated

•

4 years ago

Flags: needinfo?(sdetar)

BugBot [:suhaib / :marco/ :calixte]

Comment 16

•

3 years ago

The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(sdetar)

Steven DeTar [:sdetar]

Updated

•

3 years ago

Severity: normal → S4

Flags: needinfo?(sdetar)

Priority: P1 → P3

André Bargull [:anba]

Assignee

Comment 17

•

3 years ago

I've did some additional testing to see how this change affects benchmarks and websites.

Where	Calls	Input	Can Optimise	Note
Speedometer 2	770'000	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
JetStream 2	3'780'000	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
JetStream 2	20'000	Ropes	No
Web Tooling	270'000	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
Web Tooling	50'000	Two-byte	No
SunSpider	20'000	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
reddit.com (loading start page, no scrolling)	50'000	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
reddit.com (loading start page, no scrolling)	30'000	All lower, length > `MAX_LENGTH_LATIN1`	Yes
reddit.com (video loading)	~10'000 per sec	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
reddit.com (video loading)	~10'000 per sec	All lower, length > `MAX_LENGTH_LATIN1`	Yes	Depends on video, also saw 10K only every 3 secs
reddit.com (scrolling down for 1 min)	610'000	All lower, length ≤ `MAX_LENGTH_LATIN1`	Yes
reddit.com (scrolling down for 1 min)	560'000	All lower, length > `MAX_LENGTH_LATIN1`	Yes

Jan de Mooij [:jandem]

Comment 18

•

3 years ago

André, do you have an updated patch for part 2? Given the results in comment 17 and other engines having a similar optimization, we should probably land it.

André Bargull [:anba]

Assignee

Comment 19

•

3 years ago

Attached file Bug 1564347 - Part 2: Add AllocateThinOrFatInlineString. r=jandem! — Details

Move inline string allocation into a separate function.

Phabricator Automation

Updated

•

3 years ago

Attachment #9076738 - Attachment description: Bug 1564347 - Part 2: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem! → Bug 1564347 - Part 3: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem!

André Bargull [:anba]

Assignee

Comment 20

•

3 years ago

Sure, I've uploaded the rebased patches.

André Bargull [:anba]

Assignee

Comment 21

•

3 years ago

Attached file Bug 1564347 - Part 4: Add MacroAssembler::branch8(BaseIndex, Register). r=jandem! — Details

Depends on D37377

André Bargull [:anba]

Assignee

Comment 22

•

3 years ago

Attached file Bug 1564347 - Part 5: Use branch8 in CodeGenerator::visitStringToLowerCase(). r=jandem! — Details

Depends on D155607

André Bargull [:anba]

Assignee

Comment 23

•

3 years ago

Attached file Bug 1564347 - Part 6: Use eax encoding for cmpb when possible. r=jandem! — Details

I noticed this was missing when I added the cmpb methods in part 4.

Depends on D155608

Pulsebot

Comment 24

•

3 years ago

Pushed by andre.bargull@gmail.com: https://hg.mozilla.org/integration/autoland/rev/bd647f575382 Part 2: Add AllocateThinOrFatInlineString. r=jandem https://hg.mozilla.org/integration/autoland/rev/5805802dc654 Part 3: Inline String.prototype.toLowerCase for small Latin-1 strings. r=jandem https://hg.mozilla.org/integration/autoland/rev/2710e38f2381 Part 4: Add MacroAssembler::branch8(BaseIndex, Register). r=jandem https://hg.mozilla.org/integration/autoland/rev/48b73980369d Part 5: Use branch8 in CodeGenerator::visitStringToLowerCase(). r=jandem https://hg.mozilla.org/integration/autoland/rev/402644d8b48d Part 6: Use eax encoding for cmpb when possible. r=jandem

Natalia Csoregi [:nataliaCs]

Comment 25

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bd647f575382
https://hg.mozilla.org/mozilla-central/rev/5805802dc654
https://hg.mozilla.org/mozilla-central/rev/2710e38f2381
https://hg.mozilla.org/mozilla-central/rev/48b73980369d
https://hg.mozilla.org/mozilla-central/rev/402644d8b48d

André Bargull [:anba]

Assignee

Updated

•

3 years ago

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox106: --- → fixed

Keywords: leave-open

Resolution: --- → FIXED

Target Milestone: --- → 106 Branch

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox70: affected → wontfix

You need to log in before you can comment on or make changes to this bug.