Reimplement Array.prototype.sort with a JIT trampoline
Categories
(Core :: JavaScript Engine, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox126 | --- | fixed |
People
(Reporter: jandem, Assigned: jandem)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [sp3])
Attachments
(7 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Array.prototype.sort
with a comparator function shows up in a few Speedometer 3 tests as being slow, especially in Baseline. We currently don't have a great strategy for these builtins: calling from C++ into JIT code is slow so we implement them in self-hosted code, but that also has a lot of overhead especially in Baseline.
I've been prototyping a new approach for Array.prototype.sort
. It becomes a C++ builtin with a custom JitEntry trampoline (similar to how we optimize Wasm functions). The trampoline stack frame stores all sort state and alternates between calling C++ and JIT code. In other words, the sort algorithm itself is implemented in C++ code but it's like a generator or async function that we can "yield" from and resume. This gives us the best of both worlds: fast calls to the comparator function but the main algorithm is implemented in C++.
Performance is very promising. Some Speedometer 3 Backbone and Vue subtests are a few percent faster with this. On sort
micro-benchmarks this makes us faster than both Chrome and Safari in cases where we use a similar number of comparator calls. Unfortunately our sorting algorithm isn't the most efficient and we often have more calls to the comparator, but that's an orthogonal issue.
Micro-benchmark results below show that it's faster especially without Ion. This matters for Speedometer where the Baseline time is often hurting us.
Length 10:
* normal: 70 => 61 ms
* --no-ion: 302 => 95 ms
* --no-blinterp: 2674 => 847 ms
Length 100:
* normal: 754 => 507 ms
* --no-ion: 3367 => 801 ms
* --no-blinterp: 28 => 8 sec
Length 10000 (1000 iterations):
* normal: 1267 => 863 ms
* --no-ion: 5731 => 1384 ms
* --no-blinterp: 51 => 14 sec
function f() {
var len = 100;
var fun = (x, y) => {
if (i & 1) {
return x.id - y.id;
}
return y.id - x.id;
};
var arr = [];
for (var j = 0; j < len; j++) {
arr.push({id: (j / 3)|0});
}
var t = new Date;
for (var i = 0; i < 100_000; i++) {
arr.sort(fun);
}
print(new Date - t);
return arr;
}
f();
Updated•3 months ago
|
Assignee | ||
Updated•3 months ago
|
Assignee | ||
Comment 1•3 months ago
|
||
toSorted
is very similar and should also be ported.
TypedArraySort
shows up on Speedometer 3 too and should be optimized as well.
This approach might also work for some of the string/regexp builtins such as RegExp.prototype[@@replace]
, but they're more messy and our Ion code for them is relatively fast so we'll have to see.
Updated•3 months ago
|
Assignee | ||
Comment 2•3 months ago
|
||
Rename WASM_JIT_ENTRY
to NATIVE_JIT_ENTRY
and fix the code to allow using this for
non-Wasm functions.
Assignee | ||
Comment 3•3 months ago
|
||
Simplify the handling of Ion frames by checking for Ion frames directly instead of
checking for all the other frame types and excluding those. This is less of a
footgun when adding new calls to this function.
Assignee | ||
Comment 4•3 months ago
|
||
This patch adds code for using the JitEntry mechanism for native builtins.
The TrampolineNative JSJitInfo mechanism is very similar to what we do for
InlinableNative. For each trampoline native, a JIT trampoline is generated and
we use this as the function's JitEntry when creating the native JSFunction.
Assignee | ||
Comment 5•3 months ago
|
||
This ports Array.prototype.sort to C++ code with a JIT trampoline. The sort algorithm
is the same as the self-hosted code to avoid subtle differences in behavior.
Any data used by the sorting algorithm is stored in the ArraySortData
instance stored
in the trampoline frame. The sortWithComparator
C++ function is like a generator
that we can yield from and resume into. This avoids slow calls from C++ into JIT code.
Sort calls with a JS comparator function are much faster with this implementation.
Especially in Baseline, the implementation is more than 3x faster on micro-benchmarks.
Some Speedometer 3 subtests improve by 2-3%.
Assignee | ||
Comment 6•3 months ago
|
||
Assignee | ||
Comment 7•3 months ago
|
||
Longer-term we could make toSorted
itself a trampoline native, but that's more
complicated and this patch gets us most of the performance and code reuse benefits.
Assignee | ||
Comment 8•3 months ago
|
||
After the previous patch there's now a single caller so we can de-duplicate
the |length| lookup and check.
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9a6d5b48ad86 part 1 - Allow using Wasm JitEntry mechanism for other natives. r=iain https://hg.mozilla.org/integration/autoland/rev/5138353c675f part 2 - Improve TraceThisAndArguments. r=iain https://hg.mozilla.org/integration/autoland/rev/9ef13ed79db6 part 3 - Add infrastructure for trampoline natives. r=iain https://hg.mozilla.org/integration/autoland/rev/b743977afb83 part 4 - Reimplement Array.prototype.sort with a JIT trampoline. r=iain https://hg.mozilla.org/integration/autoland/rev/4fc4b3124b4f part 5 - Add tests. r=iain https://hg.mozilla.org/integration/autoland/rev/5351a46c47b5 part 6 - Use std_Array_sort in Array.prototype.toSorted. r=iain https://hg.mozilla.org/integration/autoland/rev/0ebc91ec13c2 part 7 - Rename ArrayNativeSortImpl and simplify it a bit. r=iain
Comment 10•2 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9a6d5b48ad86
https://hg.mozilla.org/mozilla-central/rev/5138353c675f
https://hg.mozilla.org/mozilla-central/rev/9ef13ed79db6
https://hg.mozilla.org/mozilla-central/rev/b743977afb83
https://hg.mozilla.org/mozilla-central/rev/4fc4b3124b4f
https://hg.mozilla.org/mozilla-central/rev/5351a46c47b5
https://hg.mozilla.org/mozilla-central/rev/0ebc91ec13c2
Comment 11•2 months ago
|
||
Improvements on AWFY:
2% Sunspider , which is Driven by 9% on string-tagcloud
SP2
2%-3% on BackboneJS-TodoMVC driven by 7% on BackboneJS-TodoMVC/Adding100Items and 6%-7% on BackboneJS-TodoMVC/Adding100Items/Sync
Potential small/gentle improvements on React-Redux and Preact
Jetstream2
5.2% on tagcloud-SP-Average
1.6% and maybe a bit more on UniPoker-Average
Description
•