Closed Bug 1383647 Opened 8 years ago Closed 8 years ago

See if there's more to optimize in str_toLowerCase for small, Latin-1 strings which are already lower-cased

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla57

Tracking Flags:

Tracking

Status

firefox57

---

fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(2 files, 2 obsolete files)

bug1383647-part1-vmcall-from-ion.patch 8 years ago André Bargull [:anba] 21.38 KB, patch		Details \| Diff \| Splinter Review
bug1383647-part2-latin1-specialization.patch 8 years ago André Bargull [:anba] 5.33 KB, patch		Details \| Diff \| Splinter Review
bug1383647-part1-vmcall-from-ion.patch 8 years ago André Bargull [:anba] 21.42 KB, patch	jandem : review+	Details \| Diff \| Splinter Review
bug1383647-part2-latin1-specialization.patch 8 years ago André Bargull [:anba] 5.34 KB, patch	jandem : review+	Details \| Diff \| Splinter Review

André Bargull [:anba]

Assignee

Description

•

8 years ago

Speedometer call counts per gdb breakpoints (*) (filtered with |if (str->d.u1.flags&63)==32| because I wanted to check for the presence of external strings). Name Non-External External Vanilla: 800 200 React: 3184 2011 React-Redux: 2724 1710 Ember: 4641 10 Backbone: 1135 302 AngularJS: 43485 1370 Angular2: 117 2 Vue: 1277 2 jQuery: 1027 200 Preact: 18 0 Inferno: 32407 0 Elm: 0 0 Flight: 2928 1301 So most calls are from AngularJS and Inferno AngularJS: Num Type Disp Enb Address What 15 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 stop only if result==linear.ptr breakpoint already hit 41960 times 16 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 breakpoint already hit 43746 times Inferno: Num Type Disp Enb Address What 15 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 stop only if result==linear.ptr breakpoint already hit 21124 times 16 breakpoint keep y 0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013 breakpoint already hit 32590 times Most AngularJS calls come from here [1] with |"false".toLowerCase()| and |"true".toLowerCase()|, so we'd need to optimize traversing a small, Latin-1 string which is already lower-cased. :-/ It still could be useful to search for optimization possibilities given that str_toLowerCase if the fourth most called built-in per bug 1365361. (*) All frameworks run once from InteractiveRunner.html [1] https://github.com/WebKit/webkit/blob/80b1e93d42aac34fddd652bb6f7749d26a265d36/PerformanceTests/Speedometer/resources/todomvc/architecture-examples/angularjs/node_modules/angular/angular.js#L18303-L18305

Jan de Mooij [:jandem]

Comment 1

•

8 years ago

For the angular case, maybe we can fold toLowerCase in Ion?

André Bargull [:anba]

Assignee

Comment 2

•

8 years ago

(In reply to Jan de Mooij [:jandem] from comment #1) > For the angular case, maybe we can fold toLowerCase in Ion? Unfortunately the input strings aren't constant, so I don't think we can just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in Ion"?) The AngularJS code looks like: actual = lowercase('' + actual); expected = lowercase('' + expected); return actual.indexOf(expected) !== -1; where the "lowercase" function is defined as |var lowercase = function(string) {return isString(string) ? string.toLowerCase() : string;};|. I was able to improve the performance it a bit by doing more or less useful things like: - Manually inlining unicode::CanLowerCase and then adding MOZ_LIKELY around the ASCII case. - And removing the two roots in str_toLowerCase. (Rooting analysis even seems to be okay with that change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d1821b2be1c116181b2667c25ec93c3a6a2a91f) These two changes improved this test case from ~170ms to ~140ms for me. For comparison: V8 needs ~115ms, Chakra ~120ms, and JSC 55ms (https://bugs.webkit.org/show_bug.cgi?id=162887). function f() { var ss = ["true", "false"]; var q = 0; var t = dateNow(); for (var i = 0; i < 5000000; ++i) { q += ss[i & 1].toLowerCase().length; } return [dateNow() - t, q]; } for (var i = 0; i < 10; ++i) print(f());

Jan de Mooij [:jandem]

Comment 3

•

8 years ago

(In reply to André Bargull from comment #2) > Unfortunately the input strings aren't constant, so I don't think we can > just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in > Ion"?) I was thinking maybe we know more about the types/strings if Ion is able to inline the "comparator" function and then inlines the "lowercase" function as well, but we probably won't have good enough type information anyway. I wonder what perf would look like with a MIR instruction that just calls into the VM. In your micro-benchmark it would also make it easier for Ion to optimize the array accesses because then it knows the call cannot affect the array length.

André Bargull [:anba]

Assignee

Comment 4

•

8 years ago

(In reply to Jan de Mooij [:jandem] from comment #3) > I wonder what perf would look like with a MIR instruction that just calls > into the VM. In your micro-benchmark it would also make it easier for Ion to > optimize the array accesses because then it knows the call cannot affect the > array length. Paired with MOZ_LIKELY for the ASCII case, it improves the µ-benchmark to ~95ms!

André Bargull [:anba]

Assignee

Updated

•

8 years ago

Assignee: nobody → andrebargull

Status: NEW → ASSIGNED

André Bargull [:anba]

Assignee

Comment 5

•

8 years ago

Attached patch bug1383647-part1-vmcall-from-ion.patch (obsolete) — Details — Splinter Review

I've changed StringToLowerCase/StringToUpperCase to take a HandleString instead of a HandleLinearString, so we can directly call the function from the generated code. Apart from that, the patch simply adds a vm-call for str_toLowerCase/str_toUpperCase.

Attachment #8892433 - Flags: review?(jdemooij)

André Bargull [:anba]

Assignee

Comment 6

•

8 years ago

Attached patch bug1383647-part2-latin1-specialization.patch (obsolete) — Details — Splinter Review

If we want to squeeze out a few more ms, we could add specialised CanUpperCase/CanLowerCase functions for Latin-1 characters: If we assume most Latin-1 strings are actually ASCII strings, we can add MOZ_LIKELY around the ASCII case. And to make non-ASCII, Latin-1 strings a bit faster, we can avoid the CharInfo lookup by using some mild bit-twiddling. (Is this still acceptable or is already "too clever"?)

Attachment #8892437 - Flags: review?(jdemooij)

André Bargull [:anba]

Assignee

Comment 7

•

8 years ago

(In reply to André Bargull from comment #5) > Created attachment 8892433 [details] [diff] [review] > bug1383647-part1-vmcall-from-ion.patch Note to self: This patch doesn't apply cleanly on inbound.

André Bargull [:anba]

Assignee

Comment 8

•

8 years ago

Comment on attachment 8892437 [details] [diff] [review] bug1383647-part2-latin1-specialization.patch Review of attachment 8892437 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Unicode.h @@ +299,5 @@ > } > > +// Returns true iff ToUpperCase(ch) != ch. > +inline bool > +CanUpperCase(Latin1Char ch) JS::Latin1Char here and below

André Bargull [:anba]

Assignee

Comment 9

•

8 years ago

Comment on attachment 8892433 [details] [diff] [review] bug1383647-part1-vmcall-from-ion.patch Review of attachment 8892433 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MIR.h @@ +7501,5 @@ > + TRIVIAL_NEW_WRAPPERS > + NAMED_OPERANDS((0, string)) > + > + bool congruentTo(const MDefinition* ins) const override { > + return congruentIfOperandsEqual(ins); Also needs to test that |mode_| matches.

André Bargull [:anba]

Assignee

Comment 10

•

8 years ago

Try-servering found two small issues, noted above. I'll probably post fixed patches later today.

André Bargull [:anba]

Assignee

Comment 11

•

8 years ago

Attached patch bug1383647-part1-vmcall-from-ion.patch — Details — Splinter Review

Updated part 1 so it applies cleanly on inbound and to fix the issue noted in comment #9.

Attachment #8892433 - Attachment is obsolete: true

Attachment #8892433 - Flags: review?(jdemooij)

Attachment #8892576 - Flags: review?(jdemooij)

André Bargull [:anba]

Assignee

Comment 12

•

8 years ago

Attached patch bug1383647-part2-latin1-specialization.patch — Details — Splinter Review

Updated part two to fix the comment described in comment #8.

Attachment #8892437 - Attachment is obsolete: true

Attachment #8892437 - Flags: review?(jdemooij)

Attachment #8892577 - Flags: review?(jdemooij)

André Bargull [:anba]

Assignee

Comment 13

•

8 years ago

Try looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8997b7d045fca1be174dfedb8d14aa563e09a8bf

Jan de Mooij [:jandem]

Comment 14

•

8 years ago

Comment on attachment 8892576 [details] [diff] [review] bug1383647-part1-vmcall-from-ion.patch Review of attachment 8892576 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, LGTM. Sorry for the delay.

Attachment #8892576 - Flags: review?(jdemooij) → review+

Jan de Mooij [:jandem]

Comment 15

•

8 years ago

Comment on attachment 8892577 [details] [diff] [review] bug1383647-part2-latin1-specialization.patch Review of attachment 8892577 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/Unicode.h @@ +307,5 @@ > + > + // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form. > + bool canUpper = ch == MICRO_SIGN || > + (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN); > + MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch))); Yeah this is almost too clever :) but there are only 128 char possibilities and this assert verifies all cases are handled correctly, so IMO this is fine.

Attachment #8892577 - Flags: review?(jdemooij) → review+

André Bargull [:anba]

Assignee

Comment 16

•

8 years ago

(In reply to Jan de Mooij [:jandem] from comment #15) > ::: js/src/vm/Unicode.h > @@ +307,5 @@ > > + > > + // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form. > > + bool canUpper = ch == MICRO_SIGN || > > + (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN); > > + MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch))); > > Yeah this is almost too clever :) but there are only 128 char possibilities > and this assert verifies all cases are handled correctly, so IMO this is > fine. TBH I think the test in CanLowerCase is even worse: and'ing |ch & 0xD7| to exclude U+00D7 and U+00DF. :-)

André Bargull [:anba]

Assignee

Comment 17

•

8 years ago

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8997b7d045fca1be174dfedb8d14aa563e09a8bf

Keywords: checkin-needed

Pulsebot

Comment 18

•

8 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a0e69aaf7f47 Part 1: Use direct vm calls for String.prototype.toLower/UpperCase. r=jandem https://hg.mozilla.org/integration/mozilla-inbound/rev/68ba6a81d15d Part 2: Add unicode::Can{Lower,Upper}Case specialized for Latin1 characters. r=jandem

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 19

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a0e69aaf7f47 https://hg.mozilla.org/mozilla-central/rev/68ba6a81d15d

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

You need to log in before you can comment on or make changes to this bug.