ARM64 wasm prologue/epilogue should use STP/LDP
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
| Version | Tracking | Status |
|---|---|---|
| firefox150 | --- | fixed |
People
(Reporter: lth, Assigned: yury)
References
(Blocks 3 open bugs)
Details
(Keywords: perf-alert)
Attachments
(1 file)
The ARM64 wasm prologue is currently sub sp, 16; store lr; store fp; mov fp, sp. We should be able to use a single stp {lr,fp} for the same effect, and we may even be able to use its auto-decrement (writeback) addressing to avoid the sub.
Likewise, the epilogue should be able to use ldp.
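A minimal sketch of why the two sequences are interchangeable, modelling the machine as a small Python dataclass. It assumes the standard AAPCS64 frame-record layout (saved fp at [sp], saved lr at [sp + 8]), an epilogue of ldr/ldr/add that runs with sp pointing at the frame record, and made-up register values; `Machine` and the helper names are hypothetical and this is not SpiderMonkey's MacroAssembler.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Toy ARM64 state: only sp, fp (x29), lr (x30) and a sparse memory."""
    sp: int = 0x1000
    fp: int = 0xF00D
    lr: int = 0xCAFE
    mem: dict = field(default_factory=dict)


def prologue_separate(m: Machine) -> int:
    # sub sp, sp, #16 ; str lr, [sp, #8] ; str fp, [sp] ; mov fp, sp
    m.sp -= 16
    m.mem[m.sp + 8] = m.lr
    m.mem[m.sp] = m.fp
    m.fp = m.sp
    return 4  # instructions executed


def prologue_stp(m: Machine) -> int:
    # stp fp, lr, [sp, #-16]! ; mov fp, sp
    m.sp -= 16              # pre-index: stores go to sp - 16, which becomes the new sp
    m.mem[m.sp] = m.fp      # first register of the pair goes to the lower address
    m.mem[m.sp + 8] = m.lr
    m.fp = m.sp
    return 2


def epilogue_separate(m: Machine) -> int:
    # ldr fp, [sp] ; ldr lr, [sp, #8] ; add sp, sp, #16
    m.fp = m.mem[m.sp]
    m.lr = m.mem[m.sp + 8]
    m.sp += 16
    return 3


def epilogue_ldp(m: Machine) -> int:
    # ldp fp, lr, [sp], #16
    m.fp, m.lr = m.mem[m.sp], m.mem[m.sp + 8]
    m.sp += 16              # post-index: loads use the old sp, then sp advances by 16
    return 1


if __name__ == "__main__":
    old, new = Machine(), Machine()
    prologue_separate(old)
    prologue_stp(new)
    assert old == new       # same sp, fp and saved slots after either prologue
    epilogue_separate(old)
    epilogue_ldp(new)
    assert old == new
```

Both pairs leave the model in identical states, which is the "same behavior" the description relies on; only the instruction count differs.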
This will save six instructions per function: each function has two copies of the prologue and we save two instructions in each, plus two in the epilogue. WasmCheckedTailEntryOffset should be updated accordingly to avoid pointless NOP fills.
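The six-instruction figure is simple arithmetic, reproduced by the tally below. It assumes the old epilogue is three instructions (ldr, ldr, add) and the new one a single post-incrementing ldp, which is consistent with the two-instruction epilogue saving stated above; the mnemonic lists are illustrative rather than copied from the code generator.

```python
# One string per 32-bit A64 instruction (illustrative sequences).
OLD_PROLOGUE = ["sub sp, sp, #16", "str lr, [sp, #8]", "str fp, [sp]", "mov fp, sp"]
NEW_PROLOGUE = ["stp fp, lr, [sp, #-16]!", "mov fp, sp"]
OLD_EPILOGUE = ["ldr fp, [sp]", "ldr lr, [sp, #8]", "add sp, sp, #16"]
NEW_EPILOGUE = ["ldp fp, lr, [sp], #16"]

INSN_BYTES = 4  # every A64 instruction is 32 bits wide


def instructions_saved(prologue_copies: int = 2) -> int:
    """Instructions saved per function: each prologue copy plus one epilogue."""
    per_prologue = len(OLD_PROLOGUE) - len(NEW_PROLOGUE)
    per_epilogue = len(OLD_EPILOGUE) - len(NEW_EPILOGUE)
    return prologue_copies * per_prologue + per_epilogue


if __name__ == "__main__":
    n = instructions_saved()
    print(f"{n} instructions, {n * INSN_BYTES} bytes per function")
    # prints: 6 instructions, 24 bytes per function
```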
Using stp is probably easy; using the auto-decrement will be harder, because StartUnwinding then has to be updated in more radical ways.
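One way to see why the auto-decrement form touches StartUnwinding: a profiler that interrupts a function mid-prologue must recognise each intermediate machine state by its PC offset, and folding the sp adjustment into the store changes that set of states. The sketch below enumerates the states for both sequences. It is a hypothetical model (the `State` tuple and `prologue_states` are invented for illustration) and does not reproduce StartUnwinding's actual logic.

```python
from typing import FrozenSet, List, Tuple

# (sp offset from entry, frame-record slots already written, fp points at new frame)
State = Tuple[int, FrozenSet[str], bool]

PROLOGUES = {
    "separate": ["sub sp, sp, #16", "str lr, [sp, #8]", "str fp, [sp]", "mov fp, sp"],
    "stp": ["stp fp, lr, [sp, #-16]!", "mov fp, sp"],
}


def prologue_states(name: str) -> List[State]:
    """State visible to an unwinder after 0, 1, ..., n prologue instructions."""
    sp, saved, fp_set = 0, set(), False
    states: List[State] = [(sp, frozenset(saved), fp_set)]
    for insn in PROLOGUES[name]:
        op = insn.split()[0]
        if op == "sub":
            sp -= 16
        elif op == "str":
            saved.add(insn.split()[1].rstrip(","))  # "lr" or "fp"
        elif op == "stp":
            sp -= 16                  # sp update and both stores in one instruction
            saved |= {"fp", "lr"}
        elif op == "mov":
            fp_set = True
        else:
            raise ValueError(f"unmodelled instruction: {insn}")
        states.append((sp, frozenset(saved), fp_set))
    return states


if __name__ == "__main__":
    for name in PROLOGUES:
        for index, state in enumerate(prologue_states(name)):
            print(f"{name:8} pc+{index * 4:2}: {state}")
```

In this model the separate sequence has PC offsets where sp has already moved but lr and fp are still only in registers; with the pre-indexed stp those states vanish and the offset of mov fp, sp moves from 12 to 4, so any per-offset unwinding logic has to change.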
Comment 1 • 4 years ago (Reporter)
Even if traps are placed out of line (OOL, bug 1680243), the savings won't be enough to get rid of many of the NOPs on ARM64 that align the unchecked call entry on a 16-byte boundary. A 16-byte boundary comes every four instructions, and the callable prologue will be reduced to two instructions, so only two remain for the signature check, and two would always have to be enough. They are enough if the signature is a smallish immediate or a pointer easily loaded from TLS (i.e., the offset is small), but not in general: sometimes the immediate will be large, or the pointer has to be loaded from a large offset.
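The alignment constraint above is arithmetic on 4-byte instructions. A minimal sketch, assuming the checked entry starts on a 16-byte boundary and consists of one prologue copy followed by the signature check, with the unchecked entry at the next 16-byte boundary; the signature-check lengths tried below are hypothetical.

```python
INSN_BYTES = 4   # every A64 instruction is 32 bits
ALIGN = 16       # the unchecked call entry must start on a 16-byte boundary


def nop_padding(prologue_insns: int, sig_check_insns: int) -> int:
    """NOPs needed after the checked entry so the unchecked entry is aligned.

    Assumes the checked entry itself starts on a 16-byte boundary.
    """
    used = (prologue_insns + sig_check_insns) * INSN_BYTES
    padded = -(-used // ALIGN) * ALIGN   # round up to a multiple of ALIGN
    return (padded - used) // INSN_BYTES


if __name__ == "__main__":
    for sig in (2, 3, 4):  # short vs. longer signature checks
        print(f"sig check of {sig}: old prologue {nop_padding(4, sig)} NOPs, "
              f"new prologue {nop_padding(2, sig)} NOPs")
```

With a two-instruction check the shorter prologue removes all padding, but as soon as the check needs three or four instructions the checked entry rounds up to 32 bytes with either prologue, so the saved instructions are simply replaced by NOPs.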
So I think the right combination of fixes is: introduce STP (pre-decrement) and LDP (post-increment) to reduce code size generally, without worrying about the checked function entry size; this will reduce function size in general by four words. Bug 1756792 can then perhaps reduce the bloat from the checked call entry.
This is not urgent.
Comment 2 • 1 month ago (Assignee)
Comment 3 • 1 month ago (Assignee)
No performance regressions were noticed in the try comparison: https://perf.compare/compare-lando-results?baseLando=183718&newLando=183719&baseRepo=try&newRepo=try&framework=13
Comment 5 • 1 month ago
bugherder
Comment 6 • 1 month ago
(In reply to Sandor Molnar[:smolnar] from comment #5)
Perfherder has detected a browsertime performance change from push 9b39d51f95dd8a5f4836c1872ec79872134c7a70.
No action is required from the author; this comment is provided for informational purposes only.
| Improvement | Test | Platform | Options | Absolute values [old vs new] | Performance Profiles |
|---|---|---|---|---|---|
| 5% | jetstream3 Dart-flute-todomvc-wasm-Geometric | macosx1500-aarch64-shippable | fission webrender | 62.50 score -> 65.81 score | Before/After |
| 5% | jetstream3 Dart-flute-todomvc-wasm-Average | macosx1500-aarch64-shippable | fission webrender | 68.50 ms -> 65.02 ms | Before/After |
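Both rows report roughly a 5% improvement, in opposite numeric directions: the geometric row is a score (higher is better) while the average row is a time in ms (lower is better). A quick check from the absolute values in the table:

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


geometric = pct_change(62.50, 65.81)  # score, higher is better
average = pct_change(68.50, 65.02)    # time in ms, lower is better

if __name__ == "__main__":
    print(f"Geometric: {geometric:+.1f}%")  # Geometric: +5.3%
    print(f"Average:   {average:+.1f}%")    # Average:   -5.1%
```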