Closed
Bug 1383647
Opened 8 years ago
Closed 8 years ago
See if there's more to optimize in str_toLowerCase for small, Latin-1 strings which are already lower-cased
Categories
(Core :: JavaScript: Standard Library, defect)
Core
JavaScript: Standard Library
Tracking
()
RESOLVED
FIXED
mozilla57
| Tracking | Status | |
|---|---|---|
| firefox57 | --- | fixed |
People
(Reporter: anba, Assigned: anba)
References
Details
Attachments
(2 files, 2 obsolete files)
|
21.42 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
|
5.34 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
Speedometer call counts per gdb breakpoints (*) (filtered with |if (str->d.u1.flags&63)==32| because I wanted to check for the presence of external strings).
Name Non-External External
Vanilla: 800 200
React: 3184 2011
React-Redux: 2724 1710
Ember: 4641 10
Backbone: 1135 302
AngularJS: 43485 1370
Angular2: 117 2
Vue: 1277 2
jQuery: 1027 200
Preact: 18 0
Inferno: 32407 0
Elm: 0 0
Flight: 2928 1301
So most calls are from AngularJS and Inferno
AngularJS:
Num Type Disp Enb Address What
15 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
stop only if result==linear.ptr
breakpoint already hit 41960 times
16 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
breakpoint already hit 43746 times
Inferno:
Num Type Disp Enb Address What
15 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
stop only if result==linear.ptr
breakpoint already hit 21124 times
16 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
breakpoint already hit 32590 times
Most AngularJS calls come from here [1] with |"false".toLowerCase()| and |"true".toLowerCase()|, so we'd need to optimize traversing a small, Latin-1 string which is already lower-cased. :-/
It still could be useful to search for optimization possibilities given that str_toLowerCase if the fourth most called built-in per bug 1365361.
(*) All frameworks run once from InteractiveRunner.html
[1] https://github.com/WebKit/webkit/blob/80b1e93d42aac34fddd652bb6f7749d26a265d36/PerformanceTests/Speedometer/resources/todomvc/architecture-examples/angularjs/node_modules/angular/angular.js#L18303-L18305
Comment 1•8 years ago
|
||
For the angular case, maybe we can fold toLowerCase in Ion?
| Assignee | ||
Comment 2•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #1)
> For the angular case, maybe we can fold toLowerCase in Ion?
Unfortunately the input strings aren't constant, so I don't think we can just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in Ion"?)
The AngularJS code looks like:
actual = lowercase('' + actual);
expected = lowercase('' + expected);
return actual.indexOf(expected) !== -1;
where the "lowercase" function is defined as |var lowercase = function(string) {return isString(string) ? string.toLowerCase() : string;};|.
I was able to improve the performance it a bit by doing more or less useful things like:
- Manually inlining unicode::CanLowerCase and then adding MOZ_LIKELY around the ASCII case.
- And removing the two roots in str_toLowerCase. (Rooting analysis even seems to be okay with that change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d1821b2be1c116181b2667c25ec93c3a6a2a91f)
These two changes improved this test case from ~170ms to ~140ms for me. For comparison: V8 needs ~115ms, Chakra ~120ms, and JSC 55ms (https://bugs.webkit.org/show_bug.cgi?id=162887).
function f() {
var ss = ["true", "false"];
var q = 0;
var t = dateNow();
for (var i = 0; i < 5000000; ++i) {
q += ss[i & 1].toLowerCase().length;
}
return [dateNow() - t, q];
}
for (var i = 0; i < 10; ++i) print(f());
Comment 3•8 years ago
|
||
(In reply to André Bargull from comment #2)
> Unfortunately the input strings aren't constant, so I don't think we can
> just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in
> Ion"?)
I was thinking maybe we know more about the types/strings if Ion is able to inline the "comparator" function and then inlines the "lowercase" function as well, but we probably won't have good enough type information anyway.
I wonder what perf would look like with a MIR instruction that just calls into the VM. In your micro-benchmark it would also make it easier for Ion to optimize the array accesses because then it knows the call cannot affect the array length.
| Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #3)
> I wonder what perf would look like with a MIR instruction that just calls
> into the VM. In your micro-benchmark it would also make it easier for Ion to
> optimize the array accesses because then it knows the call cannot affect the
> array length.
Paired with MOZ_LIKELY for the ASCII case, it improves the µ-benchmark to ~95ms!
| Assignee | ||
Updated•8 years ago
|
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
| Assignee | ||
Comment 5•8 years ago
|
||
I've changed StringToLowerCase/StringToUpperCase to take a HandleString instead of a HandleLinearString, so we can directly call the function from the generated code. Apart from that, the patch simply adds a vm-call for str_toLowerCase/str_toUpperCase.
Attachment #8892433 -
Flags: review?(jdemooij)
| Assignee | ||
Comment 6•8 years ago
|
||
If we want to squeeze out a few more ms, we could add specialised CanUpperCase/CanLowerCase functions for Latin-1 characters:
If we assume most Latin-1 strings are actually ASCII strings, we can add MOZ_LIKELY around the ASCII case. And to make non-ASCII, Latin-1 strings a bit faster, we can avoid the CharInfo lookup by using some mild bit-twiddling. (Is this still acceptable or is already "too clever"?)
Attachment #8892437 -
Flags: review?(jdemooij)
| Assignee | ||
Comment 7•8 years ago
|
||
(In reply to André Bargull from comment #5)
> Created attachment 8892433 [details] [diff] [review]
> bug1383647-part1-vmcall-from-ion.patch
Note to self: This patch doesn't apply cleanly on inbound.
| Assignee | ||
Comment 8•8 years ago
|
||
Comment on attachment 8892437 [details] [diff] [review]
bug1383647-part2-latin1-specialization.patch
Review of attachment 8892437 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/vm/Unicode.h
@@ +299,5 @@
> }
>
> +// Returns true iff ToUpperCase(ch) != ch.
> +inline bool
> +CanUpperCase(Latin1Char ch)
JS::Latin1Char here and below
| Assignee | ||
Comment 9•8 years ago
|
||
Comment on attachment 8892433 [details] [diff] [review]
bug1383647-part1-vmcall-from-ion.patch
Review of attachment 8892433 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/MIR.h
@@ +7501,5 @@
> + TRIVIAL_NEW_WRAPPERS
> + NAMED_OPERANDS((0, string))
> +
> + bool congruentTo(const MDefinition* ins) const override {
> + return congruentIfOperandsEqual(ins);
Also needs to test that |mode_| matches.
| Assignee | ||
Comment 10•8 years ago
|
||
Try-servering found two small issues, noted above. I'll probably post fixed patches later today.
| Assignee | ||
Comment 11•8 years ago
|
||
Updated part 1 so it applies cleanly on inbound and to fix the issue noted in comment #9.
Attachment #8892433 -
Attachment is obsolete: true
Attachment #8892433 -
Flags: review?(jdemooij)
Attachment #8892576 -
Flags: review?(jdemooij)
| Assignee | ||
Comment 12•8 years ago
|
||
Updated part two to fix the comment described in comment #8.
Attachment #8892437 -
Attachment is obsolete: true
Attachment #8892437 -
Flags: review?(jdemooij)
Attachment #8892577 -
Flags: review?(jdemooij)
| Assignee | ||
Comment 13•8 years ago
|
||
Comment 14•8 years ago
|
||
Comment on attachment 8892576 [details] [diff] [review]
bug1383647-part1-vmcall-from-ion.patch
Review of attachment 8892576 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks, LGTM. Sorry for the delay.
Attachment #8892576 -
Flags: review?(jdemooij) → review+
Comment 15•8 years ago
|
||
Comment on attachment 8892577 [details] [diff] [review]
bug1383647-part2-latin1-specialization.patch
Review of attachment 8892577 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/vm/Unicode.h
@@ +307,5 @@
> +
> + // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form.
> + bool canUpper = ch == MICRO_SIGN ||
> + (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN);
> + MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch)));
Yeah this is almost too clever :) but there are only 128 char possibilities and this assert verifies all cases are handled correctly, so IMO this is fine.
Attachment #8892577 -
Flags: review?(jdemooij) → review+
| Assignee | ||
Comment 16•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #15)
> ::: js/src/vm/Unicode.h
> @@ +307,5 @@
> > +
> > + // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form.
> > + bool canUpper = ch == MICRO_SIGN ||
> > + (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN);
> > + MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch)));
>
> Yeah this is almost too clever :) but there are only 128 char possibilities
> and this assert verifies all cases are handled correctly, so IMO this is
> fine.
TBH I think the test in CanLowerCase is even worse: and'ing |ch & 0xD7| to exclude U+00D7 and U+00DF. :-)
| Assignee | ||
Comment 17•8 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8997b7d045fca1be174dfedb8d14aa563e09a8bf
Keywords: checkin-needed
Comment 18•8 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0e69aaf7f47
Part 1: Use direct vm calls for String.prototype.toLower/UpperCase. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/68ba6a81d15d
Part 2: Add unicode::Can{Lower,Upper}Case specialized for Latin1 characters. r=jandem
Keywords: checkin-needed
Comment 19•8 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/a0e69aaf7f47
https://hg.mozilla.org/mozilla-central/rev/68ba6a81d15d
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in
before you can comment on or make changes to this bug.
Description
•