Use a lookup table for String.prototype.toLowerCase and inline it
Categories
(Core :: JavaScript Engine: JIT, enhancement, P3)
Tracking
()
People
(Reporter: anba, Assigned: anba)
Details
Attachments
(6 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
jorendorff recently mentioned toLowerCase
is still too slow: https://mozilla.logbot.info/jsapi/20190404#c16174991-c16175030.
At least for Latin-1 strings we can do better by using a lookup table for the lower-case operation.
Note: The upper-case operation can't easily be optimised through a lookup table, because U+00B5, U+00DF, and U+00FF don't map to Latin-1 characters resp. map to a sequence of characters:
- UpperCase(U+00B5) = U+039C
- UpperCase(U+00DF) = U+0053 U+0053
- UpperCase(U+00FF) = U+0178
A lookup table alone leads to a ~15% speed-up in µ-benchmarks, because at least on x86 it can be expressed as two mov
instructions.
Based on our current implementation:
#include <cstddef>
#include <cstdint>
class CharacterInfo {
public:
uint16_t upperCase;
uint16_t lowerCase;
uint8_t flags;
};
extern const uint8_t index1[];
extern const uint8_t index2[];
extern const CharacterInfo js_charinfo[];
inline const CharacterInfo& CharInfo(char16_t code) {
const size_t shift = 6;
size_t index = index1[code >> shift];
index = index2[(index << shift) + (code & ((1 << shift) - 1))];
return js_charinfo[index];
}
char16_t ToLowerCase(char16_t ch) {
if (ch < 128) {
if (ch >= 'A' && ch <= 'Z') {
return ch + ('a' - 'A');
}
return ch;
}
const CharacterInfo& info = CharInfo(ch);
return uint16_t(ch) + info.lowerCase;
}
godbolt returns the following assembly for clang-8 with O3:
ToLowerCase(char16_t): # @ToLowerCase(char16_t)
cmp di, 127
ja .LBB0_2
lea ecx, [rdi - 65]
lea eax, [rdi + 32]
cmp cx, 26
cmovae eax, edi
ret
.LBB0_2:
movzx eax, di
shr rax, 6
movzx eax, byte ptr [rax + index1]
shl rax, 6
mov ecx, edi
and ecx, 63
movzx eax, byte ptr [rax + rcx + index2]
lea rax, [rax + 2*rax]
add di, word ptr [rax + rax + js_charinfo+2]
mov eax, edi
ret
Whereas a lookup table:
using Latin1Char = unsigned char;
extern const Latin1Char latin1ToLowerCaseTable[];
Latin1Char ToLowerCase(Latin1Char ch) { return latin1ToLowerCaseTable[ch]; }
Gives this assembler code:
ToLowerCase(unsigned char): # @ToLowerCase(unsigned char)
mov eax, edi
mov al, byte ptr [rax + latin1ToLowerCaseTable]
ret
In addition to that, a lookup table also makes it easier to inline the complete lower-case operation in Ion.
Both optimisations, the lookup table and inlining the operation, result in the following improvements:
// Improves from ~495ms to ~245ms.
function test1() {
var xs = ["a".repeat(20), "b".repeat(20)];
var q = 0;
var t = dateNow();
for (var i = 0; i < 10000000; ++i) {
q += xs[i&1].toLowerCase().length;
}
print(dateNow() - t, q);
}
// Improves from ~1040ms to ~325ms.
function test2() {
var xs = ["A".repeat(20), "B".repeat(20)];
var q = 0;
var t = dateNow();
for (var i = 0; i < 10000000; ++i) {
q += xs[i&1].toLowerCase().length;
}
print(dateNow() - t, q);
}
Assignee | ||
Comment 1•6 years ago
|
||
Using a lookup table for Latin-1 lower-case conversion allows the compiler to
emit the sequence mov, cmp, setne
for ChangesWhenUpperCased resp. two mov
instructions for ToLowerCase. This makes it faster than the current approach
which requires multiple instructions for both operations.
Latin-1 upper-case conversion wasn't change to use a lookup table, because
not all Latin-1 characters have an upper-case representation which is also a
single Latin-1 character, cf. the conversion for U+00B5, U+00DF, and U+00FF.
Assignee | ||
Comment 2•6 years ago
|
||
Depends on D37376
Updated•6 years ago
|
Comment 3•6 years ago
|
||
I'm a little hesitant to take the second Ion patch. We've been trying to move away from taking more complexity in Ion-only optimizations these days.
Jan, any thoughts here? The micro-benchmark shows nice results but this is a niche function and the CodeGenerator code is non-trivial. On the other hand, the codegen changes are mostly straight assembly and don't rely on TI, etc.
Assignee | ||
Comment 4•6 years ago
|
||
FWIW JSC and V8 also both optimise toLowerCase
on the assembler level:
toUppercase
, on the other hand, is not optimised in their compilers.
Comment 5•6 years ago
|
||
Raph Levien recently wrote a crate for short ASCII string upper/lowercasing using bitmasks - https://github.com/zbraniecki/tinystr
Maybe we could use similar optimization assuming vast majority of real word cases are ASCII and <16 chars?
Assignee | ||
Comment 6•6 years ago
|
||
That looks like a different optimisation (converting multiple characters 4/8/16 at once), which additionally needs some pre-processing to know we're in the ASCII-only case. V8 does something similar for their string conversion (in addition to using lookup tables).
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Updated•5 years ago
|
Comment 7•5 years ago
|
||
:anba, this patch failed t oland, can you please take a look?
"We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmph89jog js/src/builtin/String.cpp Hunk #1 FAILED at 706. Hunk #3 succeeded at 793 with fuzz 2 (offset -2 lines). 1 out of 3 hunks FAILED -- saving rejects to file js/src/builtin/String.cpp.rej abort: patch command failed: exited with status 256"
Assignee | ||
Comment 8•5 years ago
|
||
(In reply to Cristian Brindusan [:cbrindusan] from comment #7)
:anba, this patch failed t oland, can you please take a look?
Ah, I just forgot to push the rebased changeset to Phabricator.
Comment 10•5 years ago
|
||
bugherder |
Comment 11•5 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
Comment 12•5 years ago
|
||
Jason, should we still leave this bug open?
Comment 13•5 years ago
|
||
(In reply to Steven DeTar [:sdetar] from comment #12)
Jason, should we still leave this bug open?
André still has a patch up for review that we should consider landing.
Comment 14•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
Updated•4 years ago
|
Comment 15•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
Updated•4 years ago
|
Comment 16•3 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:sdetar, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
|
||
I've did some additional testing to see how this change affects benchmarks and websites.
Where | Calls | Input | Can Optimise | Note |
---|---|---|---|---|
Speedometer 2 | 770'000 | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
JetStream 2 | 3'780'000 | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
JetStream 2 | 20'000 | Ropes | No | |
Web Tooling | 270'000 | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
Web Tooling | 50'000 | Two-byte | No | |
SunSpider | 20'000 | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
reddit.com (loading start page, no scrolling) | 50'000 | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
reddit.com (loading start page, no scrolling) | 30'000 | All lower, length > MAX_LENGTH_LATIN1 |
Yes | |
reddit.com (video loading) | ~10'000 per sec | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
reddit.com (video loading) | ~10'000 per sec | All lower, length > MAX_LENGTH_LATIN1 |
Yes | Depends on video, also saw 10K only every 3 secs |
reddit.com (scrolling down for 1 min) | 610'000 | All lower, length ≤ MAX_LENGTH_LATIN1 |
Yes | |
reddit.com (scrolling down for 1 min) | 560'000 | All lower, length > MAX_LENGTH_LATIN1 |
Yes |
Comment 18•3 years ago
|
||
André, do you have an updated patch for part 2? Given the results in comment 17 and other engines having a similar optimization, we should probably land it.
Assignee | ||
Comment 19•3 years ago
|
||
Move inline string allocation into a separate function.
Updated•3 years ago
|
Assignee | ||
Comment 20•3 years ago
|
||
Sure, I've uploaded the rebased patches.
Assignee | ||
Comment 21•3 years ago
|
||
Depends on D37377
Assignee | ||
Comment 22•3 years ago
|
||
Depends on D155607
Assignee | ||
Comment 23•3 years ago
|
||
I noticed this was missing when I added the cmpb
methods in part 4.
Depends on D155608
Comment 24•3 years ago
|
||
Comment 25•3 years ago
|
||
bugherder |
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Description
•