Closed Bug 813836 Opened 7 years ago Closed Last year

IonMonkey is significantly slower than V8 in comparing two characters from a string

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: zbraniecki, Assigned: anba)

References

Details

Attachments

(3 files, 1 obsolete file)

Attached file demo html
Not sure if it's one or two bugs, but in the following demo:

1) IonMonkey scores ~416 on the first test vs. ~257 of V8
2) same code with charAt is 35% slower than charCodeAt (no diff in V8)
That's definitely two separate issues.

So a few more comments:

1)  The only reason the first testcase measures anything at all is due to bug 813852.
    You can see this by commenting out one of the charCodeAt and watching what happens to
    the times.  Compare to what happens if you loop-hoist the set of "x" altogether.
2)  We're likely being bitten by the silly aliasing analysis thing in ion.  Otherwise we
    might have constant-folded the if condition altogether.

(all of which is to say, this is a really bad benchmark if you want to measure charCodeAt performance).  Which is questionable itself, since the times being measured are comparable to the time to increment the loop counter.

As soon as I change the testcase to not use charCodeAt with constant input, suddenly we're faster than V8 on that test...

As for charAt, the obvious way to get the same perf is to have a peephole optimization that converts charAt followed by == against a 1-char string literal into something fast that doesn't involve string objects.  Not sure whether we do something like that here.  But again, charAt and charCodeAt should have two separate bugs.
Attached file Shell test case.
charAt is currently implemented as charCodeAt & fromCharCode, which returns an atom string (see ion/MCallOptimize.cpp).

By reading the code, we should have moved both operations outside the loop by noticing that nothing can alias "s" or even change its content and then run empty loops.

The strangest thing which also show up with the previous test case is:

$ js --no-jm _build/charAt.js
charCodeAt: 310
charAt: 451
charCodeAt: 265
charAt: 483
charCodeAt: 264
charAt: 1240
charCodeAt: 263
charAt: 1230

An inexplicable slow-down which appear latter on.
yeah, the initial test case is dumb. Sorry for that.

So the three separate bugs here are:

1) We're likely being bitten by the silly aliasing analysis thing in ion.  Otherwise we might have constant-folded the if condition altogether.

2) As for charAt, the obvious way to get the same perf is to have a peephole optimization that converts charAt followed by == against a 1-char string literal into something fast that doesn't involve string objects.

3) By reading the code, we should have moved both operations outside the loop by noticing that nothing can alias "s" or even change its content and then run empty loops.

4) An inexplicable slow-down which appear latter on.

Does it sound right and do you want me to file those bugs and close this one? :)

btw. 2) is most important for writing parsers since without it in order to improve performance it's beneficial to write less readable code (which I'm doing right now) by replacing character search with charcode searches.
Nicolas, what would allow us to move the "&&" outside the loop currently?  If I don't have the "&&" in there, then the charCodeAt _is_ moved outside the loop.

Going to leave the rest of comment 3 to someone who knows something about Ion.  ;)
(In reply to Boris Zbarsky (:bz) from comment #4)
> Nicolas, what would allow us to move the "&&" outside the loop currently? 
> If I don't have the "&&" in there, then the charCodeAt _is_ moved outside
> the loop.

I cannot reproduce when I replace the 'i' by 5 in the shell test case.
CC marty.
(In reply to Boris Zbarsky (:bz) from comment #4)
> If I don't have the "&&" in there, then the charCodeAt _is_ moved outside
> the loop.
> 

var x = s.charCodeAt(5) == 47 && s.charCodeAt(6) == 42;

The problem is that the && RHS is never evaluated (LHS is always false), so we don't have good type information for it and don't inline the charCodeAt and compare. Alias analysis then can't prove the call + compare on the RHS don't touch the "s" global variable, so it can't hoist the LHS. You should get better numbers if both branches are evaluated.

Best fix is probably for Ion to not compile branches for which it has no type information. May also help regalloc and compilation time, and we will invalidate anyway when we reach the branch.
Assignee: general → nobody
On Windows 7

Nightly 37
charCodeAt: 114
charAt: 315

Chrome 39
charCodeAt: 376
charAt: 299
Recent JS

zbraniecki@zbmba:~/Desktop$ js ./charAt.js 
charCodeAt: 130
charAt: 181
charCodeAt: 153
charAt: 203
charCodeAt: 157
charAt: 203
charCodeAt: 151
charAt: 229
charCodeAt: 144
charAt: 227
charCodeAt: 169
charAt: 219
charCodeAt: 146
charAt: 214
charCodeAt: 137
charAt: 204
charCodeAt: 144
charAt: 209
charCodeAt: 138
charAt: 214

What's interesting is that in the browser I get:

charCodeAt: 41
charAt: 192

So, charCodeAt is still significantly faster than charAt and charCodeAt in the browser is way faster than in shell JS. I can't compile V8 on Mac due to their changes in infra, but on Chrome charAt is still 15-20% faster than charCodeAt.

Is it still worth keeping this bug alive? I'd say that if we can get charCodeAt in the browser so much faster than charAt, then there should be room to optimize charAt. Does it make sense?
Attached patch bug813836.patch (obsolete) — Splinter Review
Does this approach look okay?

The patch fixes the difference between charAt() and charCodeAt() from the attached test case. And it also vastly improves relational comparisons when single-element string are involved. For example the following test case needs 2600ms for me without the patch, but only 85ms when the patch is applied.

---
// https://searchfox.org/mozilla-central/rev/0f4d50ff5211e8dd85e119ef683d04b63062ed90/js/src/builtin/intl/CommonFunctions.js#679
function IsASCIIAlphaString(s) {
    for (var i = 0; i < s.length; i++) {
        var c = s[i];
        if (!(("A" <= c && c <= "Z") || ("a" <= c && c <= "z")))
            return false;
    }
    return true;
}

var xs = ["abcabcabcabc", "abcabcabcabc"];
var q = 0;
var t = dateNow();
for (var i = 0; i < 1000000; ++i) {
    var x = xs[i & 1];
    if (IsASCIIAlphaString(x))
        ++q;
}
print(dateNow() - t, q);
---
Attachment #8998869 - Flags: feedback?(jdemooij)
Comment on attachment 8998869 [details] [diff] [review]
bug813836.patch

Review of attachment 8998869 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, I think the pattern in your test case is common enough that it's worth optimizing.

::: js/src/jit/IonBuilder.cpp
@@ +5872,5 @@
>      // Try to emit an compare based on the input types.
>  
> +    // Optimize |MConstant(string) <compare> (MFromCharCode MCharCodeAt)| to
> +    // |MConstant(charcode) <compare> MCharCodeAt|.
> +    auto isCharacterComparison = [](auto* left, auto* right) {

Nit: Add compareTry* for this optimization and then we can inline the code in this lambda directly into that? I think it would be a bit easier to read.

I also wondered if we could do this as part of GVN (in MCompare::foldsTo) but because we're looking at the atom's chars etc it's probably safer to do this on the main thread...
Attachment #8998869 - Flags: feedback?(jdemooij) → feedback+
(In reply to Jan de Mooij [:jandem] from comment #10)
> I also wondered if we could do this as part of GVN (in MCompare::foldsTo)
> but because we're looking at the atom's chars etc it's probably safer to do
> this on the main thread...

Actually when I first started the patch, I had this in MCompare::foldsTo, but then I thought it's probably a bit too late, considering that relational comparison will already have gone through all these steps [1] and then flagged as a slow vm-call [2]. And potentially it may have been turned into a stub via [3].

[1] https://searchfox.org/mozilla-central/rev/aff5d4ad5d7fb2919d267cbc23b1d87ae3cf0110/js/src/jit/IonBuilder.cpp#5821-5835
[2] https://searchfox.org/mozilla-central/rev/aff5d4ad5d7fb2919d267cbc23b1d87ae3cf0110/js/src/jit/IonBuilder.cpp#5837-5841
[3] https://searchfox.org/mozilla-central/rev/aff5d4ad5d7fb2919d267cbc23b1d87ae3cf0110/js/src/jit/IonBuilder.cpp#5833-5835
Attached patch bug813836.patchSplinter Review
Updated patch to add a separate |compareTryCharacter| to IonBuilder. Also added a new TrackedStrategy entry, now that the optimization is split from |compareTrySpecialized|. 


Applies on top of bug 1482359.
Assignee: nobody → andrebargull
Attachment #8998869 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8999228 - Flags: review?(jdemooij)
Comment on attachment 8999228 [details] [diff] [review]
bug813836.patch

Review of attachment 8999228 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks!
Attachment #8999228 - Flags: review?(jdemooij) → review+
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/78d5bc33afd0
Optimize comparisons of single-element strings. r=jandem
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/78d5bc33afd0
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
In latest Nightly I still see different numbers:
charCodeAt: 136
charAt: 196

These numbers are similar to Firefox Release.
Chrome shows the same number in both situations, but it's slower (~350).
The difference for the html test case is kind of expected, because in the html file the "charCodeAt" part measures the performance of an empty loop (*). And the "charAt" part is also fast even without the patch from this bug, because the character access is moved outside of the loop (at least I assume it is based on the runtime numbers), so the loop body is probably just |var x = boolean1 && boolean2; if (x) { count++; }|.

If |constantString.charAt(constantIndex)| (or |constantString[constantIndex]|) is a frequent pattern in real-world sites, we could optimise it similar to |constantString.charCodeAt(constantIndex)|, but just for this µ-benchmark it's hardly worth it to add an extra optimisation. :-)

(*) |constantString.charCodeAt(constantIndex)| is optimised to a constant int32 in <https://searchfox.org/mozilla-central/rev/2466b82b729765fb0a3ab62f812c1a96a7362478/js/src/jit/MCallOptimize.cpp#1944-1949>, and |constant1Int32 == constant2Int32| is then also optimised away.
Duplicate of this bug: 813727
You need to log in before you can comment on or make changes to this bug.