Open Bug 1710012 Opened 3 years ago Updated 3 years ago

ARM64: Address expressions are not commoned

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

ARM64
All
enhancement

People

(Reporter: lth, Unassigned)

References

(Blocks 1 open bug)

Attachments

(1 file)

218 bytes, application/x-javascript
Attached file loadvar.js

Consider:

  (func $f1 (param $p i32) (result i32)
    (i32.add (i32.load offset=36 (local.get $p))
             (i32.load offset=36 (local.get $p))))

The code for this is:

0x3970aa2d4034  91009010  add     x16, x0, #0x24 (36)
0x3970aa2d4038  b8706aa1  ldr     w1, [x21, x16]
0x3970aa2d403c  91009010  add     x16, x0, #0x24 (36)
0x3970aa2d4040  b8706aa0  ldr     w0, [x21, x16]
0x3970aa2d4044  0b000020  add     w0, w1, w0

Here, the address computed at ...34 should have been reused at ...3c instead of being recomputed, but it is not, because the computation is exposed too late in the pipeline to be commoned.

This is not a problem on x86/x64 because that platform has more elaborate addressing modes that can fold the base, index, and constant offset into a single instruction.

(When the offset is so large that the computation requires an overflow check, the redundancy is much more visible, and then it is a problem on x86/x64 as well. This may become an interesting problem if we ever cut the max non-checking offset from 2^31-1 to something much smaller; code in the wild has been observed with offsets up to about 2^20.)

To fix this, we probably need a notion of legalization / pre-lowering, where the MIR that is generated exposes these address computations as explicit nodes so that the existing optimizations can see them. This is a good idea in general, especially for Wasm.
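
A minimal sketch of the pre-lowering idea, written in Python against a toy IR rather than SpiderMonkey's actual MIR. Every name here (`Param`, `AddImm`, `Load`, `prelower`, `count_address_computations`) is hypothetical and only for illustration. It shows that once `base + offset` is an explicit node, structural value numbering sees that the two loads in `$f1` share one address computation:

```python
from dataclasses import dataclass

# Toy IR for illustration only; this is NOT SpiderMonkey's MIR.
# Frozen dataclasses compare and hash structurally, which stands in
# for value numbering in this sketch.

@dataclass(frozen=True)
class Param:
    name: str

@dataclass(frozen=True)
class AddImm:
    base: object   # node producing the base pointer
    imm: int       # constant offset added to it

@dataclass(frozen=True)
class Load:
    addr: object     # address expression
    offset: int = 0  # offset still hidden inside the memory access


def prelower(loads):
    """Expose 'base + offset' as an explicit AddImm node on each load."""
    out = []
    for ld in loads:
        if ld.offset != 0:
            out.append(Load(AddImm(ld.addr, ld.offset), 0))
        else:
            out.append(ld)
    return out


def count_address_computations(loads):
    """Count distinct AddImm nodes, i.e. adds left after commoning."""
    distinct = set()
    for ld in loads:
        if isinstance(ld.addr, AddImm):
            distinct.add(ld.addr)
    return len(distinct)


# The two loads of $f1: same base, same offset.
p = Param("p")
f1 = prelower([Load(p, 36), Load(p, 36)])
print(count_address_computations(f1))
```

Before `prelower` runs, there is no add node for the optimizer to common at all, which mirrors the "exposed too late" problem described above.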

There's also the consideration that if the offset varies but the base pointer stays the same, it's heapptr+baseptr that should be commoned, with each offset then applied to that sum as an immediate so that every load is a single instruction. Consider a dot product:

    (local.set $tmp (i32.load (local.get $a)))
    (local.set $sum (i32.mul (local.get $tmp) (i32.load offset=0 (local.get $b))))
    (local.set $sum (i32.add (local.get $sum) (i32.mul (local.get $tmp) (i32.load offset=4 (local.get $b)))))
    (local.set $sum (i32.add (local.get $sum) (i32.mul (local.get $tmp) (i32.load offset=8 (local.get $b)))))
    (local.set $sum (i32.add (local.get $sum) (i32.mul (local.get $tmp) (i32.load offset=12 (local.get $b)))))

which turns into this:

0x3427e36c4034  b8606aa0  ldr     w0, [x21, x0]
0x3427e36c4038  b8616aa2  ldr     w2, [x21, x1]
0x3427e36c403c  1b007c42  mul     w2, w2, w0
0x3427e36c4040  91001030  add     x16, x1, #0x4 (4)
0x3427e36c4044  b8706aa3  ldr     w3, [x21, x16]
0x3427e36c4048  1b007c63  mul     w3, w3, w0
0x3427e36c404c  0b030042  add     w2, w2, w3
0x3427e36c4050  91002030  add     x16, x1, #0x8 (8)
0x3427e36c4054  b8706aa3  ldr     w3, [x21, x16]
0x3427e36c4058  1b007c63  mul     w3, w3, w0
0x3427e36c405c  0b030042  add     w2, w2, w3
0x3427e36c4060  91003030  add     x16, x1, #0xc (12)
0x3427e36c4064  b8706aa1  ldr     w1, [x21, x16]
0x3427e36c4068  1b007c20  mul     w0, w1, w0
0x3427e36c406c  0b000040  add     w0, w2, w0

but which could be this (ignore addresses and code bytes):

0x3427e36c4034  b8606aa0  ldr     w0, [x21, x0]
                ........  add     x4, x21, x1
0x3427e36c4038  b8616aa2  ldr     w2, [x4]
0x3427e36c403c  1b007c42  mul     w2, w2, w0
0x3427e36c4044  b8706aa3  ldr     w3, [x4, 4]
0x3427e36c4048  1b007c63  mul     w3, w3, w0
0x3427e36c404c  0b030042  add     w2, w2, w3
0x3427e36c4054  b8706aa3  ldr     w3, [x4, 8]
0x3427e36c4058  1b007c63  mul     w3, w3, w0
0x3427e36c405c  0b030042  add     w2, w2, w3
0x3427e36c4064  b8706aa1  ldr     w1, [x4, 12]
0x3427e36c4068  1b007c20  mul     w0, w1, w0
0x3427e36c406c  0b000040  add     w0, w2, w0
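
The choice between the two shapes above could be sketched roughly as follows. This is a hedged illustration in Python, not code from the compiler: `plan_loads` and `fits_ldr_w_imm` are hypothetical names, the temporaries `t0`, `t1`, ... and the destination names are placeholders with no register allocation, and bounds checking is ignored entirely. The only hardware fact it relies on is that a 32-bit `ldr` with an unsigned immediate takes a multiple of 4 in the range 0 to 16380.

```python
from collections import defaultdict

def fits_ldr_w_imm(offset):
    """True if offset fits the scaled unsigned 12-bit immediate of a 32-bit LDR."""
    return offset % 4 == 0 and 0 <= offset <= 4095 * 4


def plan_loads(accesses, heap="x21"):
    """Plan 32-bit loads for a list of (base_reg, offset) accesses.

    Hypothetical sketch: when one base is used with several offsets that
    all fit the LDR immediate, compute heap + base once and fold each
    offset into the load. Otherwise fall back to the current code shape.
    """
    offsets_by_base = defaultdict(set)
    for base, offset in accesses:
        offsets_by_base[base].add(offset)

    shared = {}  # base register -> placeholder temp holding heap + base
    lines = []
    for i, (base, offset) in enumerate(accesses):
        offsets = offsets_by_base[base]
        if len(offsets) > 1 and all(fits_ldr_w_imm(o) for o in offsets):
            if base not in shared:
                shared[base] = f"t{len(shared)}"
                lines.append(f"add {shared[base]}, {heap}, {base}")
            lines.append(f"ldr w{i}, [{shared[base]}, #{offset}]")
        elif offset == 0:
            lines.append(f"ldr w{i}, [{heap}, {base}]")
        else:
            # Current shape: materialize base + offset, then index the heap.
            lines.append(f"add x16, {base}, #{offset}")
            lines.append(f"ldr w{i}, [{heap}, x16]")
    return lines


# The dot product above: one load from $a (x0), four from $b (x1).
for line in plan_loads([("x0", 0), ("x1", 0), ("x1", 4), ("x1", 8), ("x1", 12)]):
    print(line)
```

For the dot product this emits a single `add` of the heap pointer and `x1` followed by four immediate-offset loads, matching the hand-written listing above. A real implementation would also have to account for bounds checks and register pressure.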
See Also: → 1442544, 1712078