Closed Bug 1383647 Opened 8 years ago Closed 8 years ago

See if there's more to optimize in str_toLowerCase for small, Latin-1 strings which are already lower-cased

Categories

(Core :: JavaScript: Standard Library, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(2 files, 2 obsolete files)

Speedometer call counts per gdb breakpoints (*) (filtered with |if (str->d.u1.flags&63)==32| because I wanted to check for the presence of external strings). Name Non-External External Vanilla: 800 200 React: 3184 2011 React-Redux: 2724 1710 Ember: 4641 10 Backbone: 1135 302 AngularJS: 43485 1370 Angular2: 117 2 Vue: 1277 2 jQuery: 1027 200 Preact: 18 0 Inferno: 32407 0 Elm: 0 0 Flight: 2928 1301 So most calls are from AngularJS and Inferno AngularJS: Num Type Disp Enb Address What 15 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 stop only if result==linear.ptr breakpoint already hit 41960 times 16 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 breakpoint already hit 43746 times Inferno: Num Type Disp Enb Address What 15 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 stop only if result==linear.ptr breakpoint already hit 21124 times 16 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 breakpoint already hit 32590 times Most AngularJS calls come from here [1] with |"false".toLowerCase()| and |"true".toLowerCase()|, so we'd need to optimize traversing a small, Latin-1 string which is already lower-cased. :-/ It still could be useful to search for optimization possibilities given that str_toLowerCase if the fourth most called built-in per bug 1365361. (*) All frameworks run once from InteractiveRunner.html [1] https://github.com/WebKit/webkit/blob/80b1e93d42aac34fddd652bb6f7749d26a265d36/PerformanceTests/Speedometer/resources/todomvc/architecture-examples/angularjs/node_modules/angular/angular.js#L18303-L18305
For the angular case, maybe we can fold toLowerCase in Ion?
(In reply to Jan de Mooij [:jandem] from comment #1) > For the angular case, maybe we can fold toLowerCase in Ion? Unfortunately the input strings aren't constant, so I don't think we can just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in Ion"?) The AngularJS code looks like: actual = lowercase('' + actual); expected = lowercase('' + expected); return actual.indexOf(expected) !== -1; where the "lowercase" function is defined as |var lowercase = function(string) {return isString(string) ? string.toLowerCase() : string;};|. I was able to improve the performance it a bit by doing more or less useful things like: - Manually inlining unicode::CanLowerCase and then adding MOZ_LIKELY around the ASCII case. - And removing the two roots in str_toLowerCase. (Rooting analysis even seems to be okay with that change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d1821b2be1c116181b2667c25ec93c3a6a2a91f) These two changes improved this test case from ~170ms to ~140ms for me. For comparison: V8 needs ~115ms, Chakra ~120ms, and JSC 55ms (https://bugs.webkit.org/show_bug.cgi?id=162887). function f() { var ss = ["true", "false"]; var q = 0; var t = dateNow(); for (var i = 0; i < 5000000; ++i) { q += ss[i & 1].toLowerCase().length; } return [dateNow() - t, q]; } for (var i = 0; i < 10; ++i) print(f());
(In reply to André Bargull from comment #2) > Unfortunately the input strings aren't constant, so I don't think we can > just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in > Ion"?) I was thinking maybe we know more about the types/strings if Ion is able to inline the "comparator" function and then inlines the "lowercase" function as well, but we probably won't have good enough type information anyway. I wonder what perf would look like with a MIR instruction that just calls into the VM. In your micro-benchmark it would also make it easier for Ion to optimize the array accesses because then it knows the call cannot affect the array length.
(In reply to Jan de Mooij [:jandem] from comment #3) > I wonder what perf would look like with a MIR instruction that just calls > into the VM. In your micro-benchmark it would also make it easier for Ion to > optimize the array accesses because then it knows the call cannot affect the > array length. Paired with MOZ_LIKELY for the ASCII case, it improves the µ-benchmark to ~95ms!
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
I've changed StringToLowerCase/StringToUpperCase to take a HandleString instead of a HandleLinearString, so we can directly call the function from the generated code. Apart from that, the patch simply adds a vm-call for str_toLowerCase/str_toUpperCase.
Attachment #8892433 - Flags: review?(jdemooij)
If we want to squeeze out a few more ms, we could add specialised CanUpperCase/CanLowerCase functions for Latin-1 characters: If we assume most Latin-1 strings are actually ASCII strings, we can add MOZ_LIKELY around the ASCII case. And to make non-ASCII, Latin-1 strings a bit faster, we can avoid the CharInfo lookup by using some mild bit-twiddling. (Is this still acceptable or is already "too clever"?)
Attachment #8892437 - Flags: review?(jdemooij)
(In reply to André Bargull from comment #5) > Created attachment 8892433 [details] [diff] [review] > bug1383647-part1-vmcall-from-ion.patch Note to self: This patch doesn't apply cleanly on inbound.
Comment on attachment 8892437 [details] [diff] [review] bug1383647-part2-latin1-specialization.patch Review of attachment 8892437 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Unicode.h @@ +299,5 @@ > } > > +// Returns true iff ToUpperCase(ch) != ch. > +inline bool > +CanUpperCase(Latin1Char ch) JS::Latin1Char here and below
Comment on attachment 8892433 [details] [diff] [review] bug1383647-part1-vmcall-from-ion.patch Review of attachment 8892433 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MIR.h @@ +7501,5 @@ > + TRIVIAL_NEW_WRAPPERS > + NAMED_OPERANDS((0, string)) > + > + bool congruentTo(const MDefinition* ins) const override { > + return congruentIfOperandsEqual(ins); Also needs to test that |mode_| matches.
Try-servering found two small issues, noted above. I'll probably post fixed patches later today.
Updated part 1 so it applies cleanly on inbound and to fix the issue noted in comment #9.
Attachment #8892433 - Attachment is obsolete: true
Attachment #8892433 - Flags: review?(jdemooij)
Attachment #8892576 - Flags: review?(jdemooij)
Updated part two to fix the comment described in comment #8.
Attachment #8892437 - Attachment is obsolete: true
Attachment #8892437 - Flags: review?(jdemooij)
Attachment #8892577 - Flags: review?(jdemooij)
Comment on attachment 8892576 [details] [diff] [review] bug1383647-part1-vmcall-from-ion.patch Review of attachment 8892576 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, LGTM. Sorry for the delay.
Attachment #8892576 - Flags: review?(jdemooij) → review+
Comment on attachment 8892577 [details] [diff] [review] bug1383647-part2-latin1-specialization.patch Review of attachment 8892577 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Unicode.h @@ +307,5 @@ > + > + // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form. > + bool canUpper = ch == MICRO_SIGN || > + (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN); > + MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch))); Yeah this is almost too clever :) but there are only 128 char possibilities and this assert verifies all cases are handled correctly, so IMO this is fine.
Attachment #8892577 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #15) > ::: js/src/vm/Unicode.h > @@ +307,5 @@ > > + > > + // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form. > > + bool canUpper = ch == MICRO_SIGN || > > + (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN); > > + MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch))); > > Yeah this is almost too clever :) but there are only 128 char possibilities > and this assert verifies all cases are handled correctly, so IMO this is > fine. TBH I think the test in CanLowerCase is even worse: and'ing |ch & 0xD7| to exclude U+00D7 and U+00DF. :-)
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a0e69aaf7f47 Part 1: Use direct vm calls for String.prototype.toLower/UpperCase. r=jandem https://hg.mozilla.org/integration/mozilla-inbound/rev/68ba6a81d15d Part 2: Add unicode::Can{Lower,Upper}Case specialized for Latin1 characters. r=jandem
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: