Closed Bug 1884360 Opened 3 months ago Closed 2 months ago

Reimplement Array.prototype.sort with a JIT trampoline

Categories

(Core :: JavaScript Engine, task, P2)

Tracking

RESOLVED FIXED
126 Branch
Tracking Status
firefox126 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [sp3])

Attachments

(7 files)

Array.prototype.sort with a comparator function shows up as slow in a few Speedometer 3 tests, especially in Baseline. We currently don't have a great strategy for these builtins: calling from C++ into JIT code is slow, so we implement them in self-hosted code, but that also has a lot of overhead, especially in Baseline.

I've been prototyping a new approach for Array.prototype.sort. It becomes a C++ builtin with a custom JitEntry trampoline (similar to how we optimize Wasm functions). The trampoline stack frame stores all sort state, and execution alternates between C++ and JIT code. In other words, the sort algorithm itself is implemented in C++, but it behaves like a generator or async function that we can "yield" from and resume. This gives us the best of both worlds: fast calls to the comparator function, while the main algorithm stays in C++.
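
For illustration only, here is a rough JS sketch of that control flow (made-up names; the real implementation is C++ driving an explicit state machine, and it uses merge sort rather than insertion sort). The sort routine "yields" a pair of elements whenever it needs the comparator's verdict, and the driver, playing the role of the trampoline, calls the comparator and resumes the sort with the result:

// Rough sketch only: the "sort" is a generator that yields a pair whenever
// it needs a comparison (insertion sort for brevity).
function* sortWithComparatorSketch(arr) {
  for (var i = 1; i < arr.length; i++) {
    for (var j = i; j > 0; j--) {
      var r = yield [arr[j - 1], arr[j]]; // "yield" back to the trampoline
      if (r > 0) {
        var tmp = arr[j - 1];
        arr[j - 1] = arr[j];
        arr[j] = tmp;
      } else {
        break;
      }
    }
  }
}

// The trampoline's role: resume the sort, and whenever it asks for a
// comparison, call the (JIT-compiled) comparator and feed the result back.
function trampolineSortSketch(arr, comparator) {
  var gen = sortWithComparatorSketch(arr);
  var step = gen.next();
  while (!step.done) {
    step = gen.next(comparator(step.value[0], step.value[1]));
  }
  return arr;
}

print(trampolineSortSketch([3, 1, 4, 1, 5, 9, 2, 6], (a, b) => a - b));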

Performance is very promising. Some Speedometer 3 Backbone and Vue subtests are a few percent faster with this. On sort micro-benchmarks it makes us faster than both Chrome and Safari in cases where we make a similar number of comparator calls. Unfortunately our sorting algorithm isn't the most efficient and we often make more comparator calls than they do, but that's an orthogonal issue.

The micro-benchmark results below show that the new implementation is faster across the board, and especially so without Ion. This matters for Speedometer, where time spent in Baseline often hurts us.

Length 10:
* normal:           70 =>  61 ms
* --no-ion:        302 =>  95 ms
* --no-blinterp:  2674 => 847 ms

Length 100:
* normal:         754 => 507 ms
* --no-ion:      3367 => 801 ms
* --no-blinterp:   28 =>   8 sec

Length 10000 (1000 iterations):
* normal:        1267 =>  863 ms
* --no-ion:      5731 => 1384 ms
* --no-blinterp:   51 =>   14 sec

// Micro-benchmark used for the numbers above (adjust |len| for the different lengths).
function f() {
  var len = 100;
  // The comparator reads the loop counter |i| (a hoisted var), so the sort
  // direction alternates between iterations.
  var fun = (x, y) => {
    if (i & 1) {
      return x.id - y.id;
    }
    return y.id - x.id;
  };
  var arr = [];
  for (var j = 0; j < len; j++) {
    arr.push({id: (j / 3) | 0});
  }
  var t = new Date;
  for (var i = 0; i < 100_000; i++) {
    arr.sort(fun);
  }
  print(new Date - t);
  return arr;
}
f();

Summary: Re-implement Array.prototype.sort with a JIT trampoline → Reimplement Array.prototype.sort with a JIT trampoline

toSorted is very similar and should also be ported.

TypedArraySort shows up on Speedometer 3 too and should be optimized as well.

This approach might also work for some of the string/regexp builtins such as RegExp.prototype[@@replace], but those are messier and our Ion code for them is already relatively fast, so we'll have to see.

Severity: -- → N/A
Priority: -- → P2
Depends on: 1884368
See Also: → 1855968

Rename WASM_JIT_ENTRY to NATIVE_JIT_ENTRY and fix the code to allow using this for
non-Wasm functions.

Simplify the handling of Ion frames by checking for Ion frames directly instead of
checking for all the other frame types and excluding those. This is less of a
footgun when adding new calls to this function.

This patch adds code for using the JitEntry mechanism for native builtins.

The TrampolineNative JSJitInfo mechanism is very similar to what we do for
InlinableNative. For each trampoline native, a JIT trampoline is generated and
we use this as the function's JitEntry when creating the native JSFunction.

This ports Array.prototype.sort to C++ code with a JIT trampoline. The sort algorithm
is the same as the self-hosted code to avoid subtle differences in behavior.

All data used by the sorting algorithm is kept in the ArraySortData instance stored
in the trampoline frame. The sortWithComparator C++ function behaves like a generator
that we can yield from and resume. This avoids slow calls from C++ into JIT code.

Sort calls with a JS comparator function are much faster with this implementation.
Especially in Baseline, the implementation is more than 3x faster on micro-benchmarks.
Some Speedometer 3 subtests improve by 2-3%.

Longer-term we could make toSorted itself a trampoline native, but that's more
complicated and this patch gets us most of the performance and code reuse benefits.
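
For illustration, a rough JS sketch of the reuse described above (not the actual self-hosted code; per the part 6 commit, the real patch calls the std_Array_sort intrinsic from self-hosted toSorted): toSorted copies the elements and then runs the same trampoline-backed sort as Array.prototype.sort, so it picks up the speedup for free.

// Illustrative only: toSorted never mutates the receiver, so conceptually it
// copies the elements and delegates to the same sort path as Array.prototype.sort.
function toSortedSketch(arr, comparator) {
  var copy = arr.slice();  // toSorted returns a sorted copy
  copy.sort(comparator);   // reuses the trampoline-backed sort
  return copy;
}

print(toSortedSketch([3, 1, 2], (a, b) => a - b)); // 1,2,3
print([3, 1, 2]);                                  // original is untouched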

After the previous patch there's now a single caller so we can de-duplicate
the |length| lookup and check.

Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9a6d5b48ad86
part 1 - Allow using Wasm JitEntry mechanism for other natives. r=iain
https://hg.mozilla.org/integration/autoland/rev/5138353c675f
part 2 - Improve TraceThisAndArguments. r=iain
https://hg.mozilla.org/integration/autoland/rev/9ef13ed79db6
part 3 - Add infrastructure for trampoline natives. r=iain
https://hg.mozilla.org/integration/autoland/rev/b743977afb83
part 4 - Reimplement Array.prototype.sort with a JIT trampoline. r=iain
https://hg.mozilla.org/integration/autoland/rev/4fc4b3124b4f
part 5 - Add tests. r=iain
https://hg.mozilla.org/integration/autoland/rev/5351a46c47b5
part 6 - Use std_Array_sort in Array.prototype.toSorted. r=iain
https://hg.mozilla.org/integration/autoland/rev/0ebc91ec13c2
part 7 - Rename ArrayNativeSortImpl and simplify it a bit. r=iain
Blocks: 1888119
Blocks: 1889117
Blocks: 1889685
Blocks: 1893977
Regressions: 1897150