Closed Bug 1383647 Opened 4 years ago Closed 4 years ago

See if there's more to optimize in str_toLowerCase for small, Latin-1 strings which are already lower-cased

Categories

(Core :: JavaScript: Standard Library, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: anba, Assigned: anba)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Speedometer call counts per gdb breakpoints (*) (filtered with |if (str->d.u1.flags&63)==32| because I wanted to check for the presence of external strings).

Name     Non-External   External
Vanilla:          800        200
React:           3184       2011
React-Redux:     2724       1710
Ember:           4641         10
Backbone:        1135        302
AngularJS:      43485       1370
Angular2:         117          2
Vue:             1277          2
jQuery:          1027        200
Preact:            18          0
Inferno:        32407          0
Elm:                0          0
Flight:          2928       1301


So most calls are from AngularJS and Inferno 

AngularJS:

Num     Type           Disp Enb Address            What
15      breakpoint     keep y   0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
        stop only if result==linear.ptr
        breakpoint already hit 41960 times
16      breakpoint     keep y   0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
        breakpoint already hit 43746 times



Inferno:

Num     Type           Disp Enb Address            What
15      breakpoint     keep y   0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
        stop only if result==linear.ptr
        breakpoint already hit 21124 times
16      breakpoint     keep y   0x00007fffe8d37729 in js::str_toLowerCase(JSContext*, unsigned int, JS::Value*) at /home/andre/hg/mozilla-inbound/js/src/jsstr.cpp:1013
        breakpoint already hit 32590 times


Most AngularJS calls come from here [1] with |"false".toLowerCase()| and |"true".toLowerCase()|, so we'd need to optimize traversing a small, Latin-1 string which is already lower-cased. :-/
It still could be useful to search for optimization possibilities given that str_toLowerCase if the fourth most called built-in per bug 1365361.

(*) All frameworks run once from InteractiveRunner.html
[1] https://github.com/WebKit/webkit/blob/80b1e93d42aac34fddd652bb6f7749d26a265d36/PerformanceTests/Speedometer/resources/todomvc/architecture-examples/angularjs/node_modules/angular/angular.js#L18303-L18305
For the angular case, maybe we can fold toLowerCase in Ion?
(In reply to Jan de Mooij [:jandem] from comment #1)
> For the angular case, maybe we can fold toLowerCase in Ion?

Unfortunately the input strings aren't constant, so I don't think we can just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in Ion"?)

The AngularJS code looks like:

    actual = lowercase('' + actual);
    expected = lowercase('' + expected);
    return actual.indexOf(expected) !== -1;

where the "lowercase" function is defined as |var lowercase = function(string) {return isString(string) ? string.toLowerCase() : string;};|.


I was able to improve the performance it a bit by doing more or less useful things like:
- Manually inlining unicode::CanLowerCase and then adding MOZ_LIKELY around the ASCII case.
- And removing the two roots in str_toLowerCase. (Rooting analysis even seems to be okay with that change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d1821b2be1c116181b2667c25ec93c3a6a2a91f)

These two changes improved this test case from ~170ms to ~140ms for me. For comparison: V8 needs ~115ms, Chakra ~120ms, and JSC 55ms (https://bugs.webkit.org/show_bug.cgi?id=162887).

  function f() {
    var ss = ["true", "false"];
    var q = 0;
    var t = dateNow();
    for (var i = 0; i < 5000000; ++i) {
      q += ss[i & 1].toLowerCase().length;
    }
    return [dateNow() - t, q];
  }
  for (var i = 0; i < 10; ++i) print(f());
(In reply to André Bargull from comment #2)
> Unfortunately the input strings aren't constant, so I don't think we can
> just constant-fold the calls. (Or did I misunderstand "fold toLowerCase in
> Ion"?)

I was thinking maybe we know more about the types/strings if Ion is able to inline the "comparator" function and then inlines the "lowercase" function as well, but we probably won't have good enough type information anyway.

I wonder what perf would look like with a MIR instruction that just calls into the VM. In your micro-benchmark it would also make it easier for Ion to optimize the array accesses because then it knows the call cannot affect the array length.
(In reply to Jan de Mooij [:jandem] from comment #3)
> I wonder what perf would look like with a MIR instruction that just calls
> into the VM. In your micro-benchmark it would also make it easier for Ion to
> optimize the array accesses because then it knows the call cannot affect the
> array length.

Paired with MOZ_LIKELY for the ASCII case, it improves the µ-benchmark to ~95ms!
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
I've changed StringToLowerCase/StringToUpperCase to take a HandleString instead of a HandleLinearString, so we can directly call the function from the generated code. Apart from that, the patch simply adds a vm-call for str_toLowerCase/str_toUpperCase.
Attachment #8892433 - Flags: review?(jdemooij)
If we want to squeeze out a few more ms, we could add specialised CanUpperCase/CanLowerCase functions for Latin-1 characters:

If we assume most Latin-1 strings are actually ASCII strings, we can add MOZ_LIKELY around the ASCII case. And to make non-ASCII, Latin-1 strings a bit faster, we can avoid the CharInfo lookup by using some mild bit-twiddling. (Is this still acceptable or is already "too clever"?)
Attachment #8892437 - Flags: review?(jdemooij)
(In reply to André Bargull from comment #5)
> Created attachment 8892433 [details] [diff] [review]
> bug1383647-part1-vmcall-from-ion.patch

Note to self: This patch doesn't apply cleanly on inbound.
Comment on attachment 8892437 [details] [diff] [review]
bug1383647-part2-latin1-specialization.patch

Review of attachment 8892437 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Unicode.h
@@ +299,5 @@
>  }
>  
> +// Returns true iff ToUpperCase(ch) != ch.
> +inline bool
> +CanUpperCase(Latin1Char ch)

JS::Latin1Char here and below
Comment on attachment 8892433 [details] [diff] [review]
bug1383647-part1-vmcall-from-ion.patch

Review of attachment 8892433 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +7501,5 @@
> +    TRIVIAL_NEW_WRAPPERS
> +    NAMED_OPERANDS((0, string))
> +
> +    bool congruentTo(const MDefinition* ins) const override {
> +        return congruentIfOperandsEqual(ins);

Also needs to test that |mode_| matches.
Try-servering found two small issues, noted above. I'll probably post fixed patches later today.
Updated part 1 so it applies cleanly on inbound and to fix the issue noted in comment #9.
Attachment #8892433 - Attachment is obsolete: true
Attachment #8892433 - Flags: review?(jdemooij)
Attachment #8892576 - Flags: review?(jdemooij)
Updated part two to fix the comment described in comment #8.
Attachment #8892437 - Attachment is obsolete: true
Attachment #8892437 - Flags: review?(jdemooij)
Attachment #8892577 - Flags: review?(jdemooij)
Comment on attachment 8892576 [details] [diff] [review]
bug1383647-part1-vmcall-from-ion.patch

Review of attachment 8892576 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, LGTM. Sorry for the delay.
Attachment #8892576 - Flags: review?(jdemooij) → review+
Comment on attachment 8892577 [details] [diff] [review]
bug1383647-part2-latin1-specialization.patch

Review of attachment 8892577 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Unicode.h
@@ +307,5 @@
> +
> +    // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form.
> +    bool canUpper = ch == MICRO_SIGN ||
> +                    (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN);
> +    MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch)));

Yeah this is almost too clever :) but there are only 128 char possibilities and this assert verifies all cases are handled correctly, so IMO this is fine.
Attachment #8892577 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #15)
> ::: js/src/vm/Unicode.h
> @@ +307,5 @@
> > +
> > +    // U+00B5 and U+00E0 to U+00FF, except U+00F7, have an uppercase form.
> > +    bool canUpper = ch == MICRO_SIGN ||
> > +                    (((ch & ~0x1F) == LATIN_SMALL_LETTER_A_WITH_GRAVE) && ch != DIVISION_SIGN);
> > +    MOZ_ASSERT(canUpper == CanUpperCase(char16_t(ch)));
> 
> Yeah this is almost too clever :) but there are only 128 char possibilities
> and this assert verifies all cases are handled correctly, so IMO this is
> fine.

TBH I think the test in CanLowerCase is even worse: and'ing |ch & 0xD7| to exclude U+00D7 and U+00DF. :-)
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0e69aaf7f47
Part 1: Use direct vm calls for String.prototype.toLower/UpperCase. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/68ba6a81d15d
Part 2: Add unicode::Can{Lower,Upper}Case specialized for Latin1 characters. r=jandem
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a0e69aaf7f47
https://hg.mozilla.org/mozilla-central/rev/68ba6a81d15d
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.