Closed Bug 1773434 Opened 3 years ago Closed 3 years ago

Use frame pointer relative addressing in Ion

Tracking

()

Status:

RESOLVED FIXED

Milestone:

104 Branch

Tracking Flags:

Tracking

Status

firefox104

---

fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Regressed 1 open bug)

Details

Attachments

(10 files)

Bug 1773434 part 1 - Remove bogus assertion on ARM32. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 2 - Tidy up unused-stack computation for calls. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 3 - Replace StackOffsetOfPassedArg with AddressOfPassedArg. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 4 - Remove unused ToStackOffset and ToFramePointerOffset overloads. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 5 - Inline some functions into ToAddress. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 6 - Assert framePushed matches frameSize in ToAddress and AddressOfPassedArg. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 7 - Always use frame pointer for incoming arguments. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 8 - Use frame pointer to access local slots and passed-argument slots on x86/x64. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 9 - Add offsetOfArgsFromFP_ to simplify arguments code a bit. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1773434 part 10 - Templatize ToAddress. r?iain! 3 years ago Jan de Mooij [:jandem] 48 bytes, text/x-phabricator-request		Details \| Review

Jan de Mooij [:jandem]

Assignee

Description

•

3 years ago

•

Edited

After bug 1770366 we can use [rbp+N] instead of [rsp+N] to access arguments and stack slots. (Wasm is already doing this for arguments but not stack slots.)

On x86/x64, this saves one byte per instruction and has other benefits. C/C++ compilers also prefer the frame pointer over the stack pointer if they're not omitting frame pointers.

Using the frame pointer is also less error-prone because it's not affected by stack pushes.

Jan de Mooij [:jandem]

Assignee

Comment 1

•

3 years ago

On x86/x64, this works great: using [RBP+-N] is one byte shorter than [RSP+N], and I see a small improvement on some benchmarks.

The problem is ARM64: to access locals, we'll need a negative offset from the frame pointer, but the range for load/store immediates on ARM is [-256, 4095]. This means that switching everything to use FP as base register would require a scratch register and an extra instruction in more cases.

We could keep using SP-relative addressing on ARM, or at least for larger offsets, but then there's a class of bugs that will only show up on ARM (when you get masm.framePushed() wrong for example) and I had hoped we could stop using SP-relative addressing completely.

I'm not sure what the best option is, but for x86/x64 there are clear benefits.

Iain Ireland [:iain]

Comment 2

•

3 years ago

Given that one ISA pushes heavily towards frame-relative, and the other pushes towards stack-relative, the optimal solution perf-wise is presumably to support both approaches. Not sure whether that's justifiable from a maintainability perspective.

How bad would it be to continue supporting stack-relative locals on x86 behind a flag, so that we can fuzz both ways?

Julian Seward [:jseward]

Comment 3

•

3 years ago

How bad would it be to continue supporting stack-relative locals on x86
behind a flag, so that we can fuzz both ways?

The increment in overall complexity and in verification burden concerns me,
especially considering that there are unresolved issues with the arm64 16-byte
SP-alignment requirement. I spent several weeks last year working on that
without much success, and it looks like we will need to revisit it soon, see
bug 1774449.

[..] I see a small improvement on some benchmarks [..]

Can you show some numbers? Do we know why there is an improvement? It sounds
like it could be either due to reduced icache misses, or the removal of
stack-sync uops on some Intels.

Jan de Mooij [:jandem]

Assignee

Comment 4

•

3 years ago

I've reworked the patches to easily support both modes based on a JitOption setting. On x86/x64 we default to FP-relative and on other platforms we stay with SP-relative.

Here are some numbers for embenchen in the shell on x64:

JS_SHELL1=obj-shell-opt/dist/bin/js-sp JS_SHELL2=obj-shell-opt/dist/bin/js-fp python3 wasm_bench.py --mode=ion+ion --range -n 7
# mode=ion+ion, runs=7, problem size=default
# Lower score is better
box2d           736     743     0.991   [725, 740]      [736, 746]
bullet          890     906     0.982   [878, 910]      [892, 911]
conditionals    293     250     1.172   [290, 295]      [248, 258]
copy            536     509     1.053   [509, 540]      [504, 518]
corrections     936     921     1.016   [919, 940]      [914, 945]
fannkuch        2154    2084    1.034   [2133, 2219]    [2080, 2093]
fasta           668     662     1.009   [661, 668]      [655, 666]
fib             641     649     0.988   [640, 662]      [637, 650]
fib_indirect1   797     804     0.991   [784, 800]      [803, 823]
fib_indirect2   840     841     0.999   [837, 843]      [840, 841]
fib_indirect3   805     836     0.963   [804, 806]      [834, 837]
ifs             173     182     0.951   [154, 192]      [161, 198]
binarytrees     3030    3031    1.0     [2994, 3061]    [3004, 3051]
memops          1025    908     1.129   [998, 1041]     [907, 910]
primes          971     844     1.15    [968, 977]      [843, 859]
raybench        1036    1045    0.991   [1022, 1044]    [1043, 1054]
rust-fannkuch   2130    2087    1.021   [2130, 2133]    [2084, 2088]
skinning        637     644     0.989   [633, 647]      [636, 655]
zlib            1008    1001    1.007   [983, 1011]     [996, 1007]

JS_SHELL1=obj-shell-opt/dist/bin/js-sp JS_SHELL2=obj-shell-opt/dist/bin/js-fp python3 wasm_bench.py --mode=ion+ion --range -n 7
# mode=ion+ion, runs=7, problem size=default
# Lower score is better
box2d           727     730     0.996   [721, 751]      [721, 741]
bullet          876     879     0.997   [874, 910]      [874, 887]
conditionals    293     253     1.158   [291, 311]      [251, 256]
copy            539     502     1.074   [533, 542]      [496, 516]
corrections     935     937     0.998   [932, 939]      [925, 947]
fannkuch        2175    2110    1.031   [2121, 2257]    [2087, 2130]
fasta           671     670     1.001   [665, 681]      [667, 676]
fib             649     653     0.994   [642, 657]      [639, 654]
fib_indirect1   787     803     0.98    [785, 792]      [803, 804]
fib_indirect2   837     840     0.996   [837, 843]      [840, 842]
fib_indirect3   805     846     0.952   [804, 809]      [836, 859]
ifs             157     170     0.924   [156, 159]      [161, 182]
binarytrees     3006    3036    0.99    [2998, 3015]    [3009, 3058]
memops          1023    931     1.099   [912, 1033]     [910, 932]
primes          977     841     1.162   [968, 993]      [839, 843]
raybench        1035    1057    0.979   [1033, 1040]    [1044, 1073]
rust-fannkuch   2131    2082    1.024   [2129, 2149]    [2082, 2097]
skinning        640     643     0.995   [628, 647]      [619, 649]
zlib            987     982     1.005   [972, 999]      [978, 1000]

The >= 10% improvement on conditionals, memops, and primes is very consistent.

Steve Fink [:sfink] [:s:]

Comment 5

•

3 years ago

Chromium switched ARM64 from fp-relative to sp-relative: https://codereview.chromium.org/1376173003/

Jan de Mooij [:jandem]

Assignee

Comment 6

•

3 years ago

Attached file Bug 1773434 part 1 - Remove bogus assertion on ARM32. r?iain! — Details

This code used to use ARM-specific data structures with limits on the offset range,
but we now match the x86-shared version and the assertion is unnecessary.
The StackPointer-case already uses larger offsets too.

Jan de Mooij [:jandem]

Assignee

Comment 7

•

3 years ago

Attached file Bug 1773434 part 2 - Tidy up unused-stack computation for calls. r?iain! — Details

Add a separate UnusedStackBytesForCall function so that we can change
StackOffsetOfPassedArg in a later patch.

Depends on D150944

Jan de Mooij [:jandem]

Assignee

Comment 8

•

3 years ago

Attached file Bug 1773434 part 3 - Replace StackOffsetOfPassedArg with AddressOfPassedArg. r?iain! — Details

Depends on D150945

Jan de Mooij [:jandem]

Assignee

Comment 9

•

3 years ago

Attached file Bug 1773434 part 4 - Remove unused ToStackOffset and ToFramePointerOffset overloads. r?iain! — Details

Depends on D150946

Jan de Mooij [:jandem]

Assignee

Comment 10

•

3 years ago

Attached file Bug 1773434 part 5 - Inline some functions into ToAddress. r?iain! — Details

The idea is to have the code for LAllocation => Address in one function, so
it's easier to change.

Depends on D150947

Jan de Mooij [:jandem]

Assignee

Comment 11

•

3 years ago

Attached file Bug 1773434 part 6 - Assert framePushed matches frameSize in ToAddress and AddressOfPassedArg. r?iain! — Details

Depends on D150948

Jan de Mooij [:jandem]

Assignee

Comment 12

•

3 years ago

Attached file Bug 1773434 part 7 - Always use frame pointer for incoming arguments. r?iain! — Details

Wasm was already doing this and we can now do this to JS too.

Always use SP for MIonToWasmCall because in GenerateDirectCallFromJit we clobber
FP and have code to adjust the stack offset.

Depends on D150949

Jan de Mooij [:jandem]

Assignee

Comment 13

•

3 years ago

Attached file Bug 1773434 part 8 - Use frame pointer to access local slots and passed-argument slots on x86/x64. r?iain! — Details

Now that we have a frame pointer, we can use it to access local slots in Ion, instead of
using the stack pointer. This results in more compact and faster code on x86 and x64.

On other platforms we still use the stack pointer, but we allow changing this in the
shell so that fuzzers can test both cases on each platform.

Depends on D150950

Jan de Mooij [:jandem]

Assignee

Comment 14

•

3 years ago

Attached file Bug 1773434 part 9 - Add offsetOfArgsFromFP_ to simplify arguments code a bit. r?iain! — Details

Depends on D150951

Jan de Mooij [:jandem]

Assignee

Comment 15

•

3 years ago

Attached file Bug 1773434 part 10 - Templatize ToAddress. r?iain! — Details

This ensures we eliminate some branching at compile-time. There are only two instantiations,
for Default and SP.

Depends on D150952

Pulsebot

Comment 16

•

3 years ago

Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4b218eed4568 part 1 - Remove bogus assertion on ARM32. r=iain https://hg.mozilla.org/integration/autoland/rev/868fd0d9d3ad part 2 - Tidy up unused-stack computation for calls. r=iain https://hg.mozilla.org/integration/autoland/rev/ef8b19ce75a3 part 3 - Replace StackOffsetOfPassedArg with AddressOfPassedArg. r=iain https://hg.mozilla.org/integration/autoland/rev/86e26be2815d part 4 - Remove unused ToStackOffset and ToFramePointerOffset overloads. r=iain https://hg.mozilla.org/integration/autoland/rev/e73abfde161c part 5 - Inline some functions into ToAddress. r=iain https://hg.mozilla.org/integration/autoland/rev/fa4753dfcd8a part 6 - Assert framePushed matches frameSize in ToAddress and AddressOfPassedArg. r=iain https://hg.mozilla.org/integration/autoland/rev/8154b8fbb975 part 7 - Always use frame pointer for incoming arguments. r=iain https://hg.mozilla.org/integration/autoland/rev/49b19abe314d part 8 - Use frame pointer to access local slots and passed-argument slots on x86/x64. r=iain https://hg.mozilla.org/integration/autoland/rev/e289b0cf4176 part 9 - Add offsetOfArgsFromFP_ to simplify arguments code a bit. r=iain https://hg.mozilla.org/integration/autoland/rev/f17954dda80a part 10 - Templatize ToAddress. r=iain

Iulian Moraru

Comment 17

•

3 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox104: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 104 Branch

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

3 years ago

Regressions: 1793929

You need to log in before you can comment on or make changes to this bug.