ARM64: Address expressions are not commoned
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
People
(Reporter: lth, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
218 bytes,
application/x-javascript
Consider:
(func $f1 (param $p i32) (result i32)
  (i32.add (i32.load offset=36 (local.get $p))
           (i32.load offset=36 (local.get $p))))
The code for this is:
0x3970aa2d4034 91009010 add x16, x0, #0x24 (36)
0x3970aa2d4038 b8706aa1 ldr w1, [x21, x16]
0x3970aa2d403c 91009010 add x16, x0, #0x24 (36)
0x3970aa2d4040 b8706aa0 ldr w0, [x21, x16]
0x3970aa2d4044 0b000020 add w0, w1, w0
Here, the address computed at ...34 should have been reused at ...3c, but it is not, because the computation is exposed too late in the pipeline for it to be commoned.
This is not a problem on x86/x64 because that platform has more elaborate addressing modes.
(When the computation requires an overflow check because the offset is too large, the redundancy becomes really obvious, and then it is also a problem on x86/x64. This may become an interesting problem if we ever cut the maximum non-checking offset from 2^31-1 to something much smaller; code in the wild has been observed with offsets up to about 2^20.)
To fix this, we probably need a notion of legalization / pre-lowering, in which the generated MIR exposes these computations. This is a good idea in general, especially for Wasm.
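As a rough illustration of what pre-lowering buys, here is a minimal Python sketch. It is a toy and not SpiderMonkey's MIR: the names `Local`, `AddImm`, and `pre_lower` are hypothetical, and bounds checks, overflow checks, and aliasing are all ignored. The only point is that once base+offset is its own node, equal nodes can be found and computed once.

```python
from dataclasses import dataclass

# Toy IR for illustration only; none of these names come from SpiderMonkey.

@dataclass(frozen=True)
class Local:
    name: str

@dataclass(frozen=True)
class AddImm:
    base: Local
    imm: int

def pre_lower(accesses):
    """Turn (base, offset) loads into explicit address nodes plus loads.

    AddImm is a frozen dataclass, so equal address computations compare and
    hash alike; a dictionary stands in for value numbering here.
    """
    seen = {}   # address node -> temporary holding its value
    code = []
    for base, offset in accesses:
        addr = AddImm(Local(base), offset)
        if addr not in seen:
            seen[addr] = f"t{len(seen)}"
            code.append(f"{seen[addr]} = add {base}, #{offset}")
        code.append(f"load [heap, {seen[addr]}]")
    return code

# The two loads from $f1: same base, same offset, so one add is emitted.
for line in pre_lower([("p", 36), ("p", 36)]):
    print(line)
```

With the address hidden inside the load node, as in the code shown earlier, there is nothing for such a pass to match on until lowering, which is the problem described above.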
Comment 1•3 years ago (Reporter)
There is also a related case: if the offset varies but the base pointer stays the same, it is the sum heapptr+baseptr that should be commoned, with the offset then applied to that sum so that each load is a single instruction. Consider a dot product:
(local.set $tmp (i32.load (local.get $a)))
(local.set $sum (i32.mul (local.get $tmp) (i32.load offset=0 (local.get $b))))
(local.set $sum (i32.add (local.get $sum) (i32.mul (local.get $tmp) (i32.load offset=4 (local.get $b)))))
(local.set $sum (i32.add (local.get $sum) (i32.mul (local.get $tmp) (i32.load offset=8 (local.get $b)))))
(local.set $sum (i32.add (local.get $sum) (i32.mul (local.get $tmp) (i32.load offset=12 (local.get $b)))))
which turns into this:
0x3427e36c4034 b8606aa0 ldr w0, [x21, x0]
0x3427e36c4038 b8616aa2 ldr w2, [x21, x1]
0x3427e36c403c 1b007c42 mul w2, w2, w0
0x3427e36c4040 91001030 add x16, x1, #0x4 (4)
0x3427e36c4044 b8706aa3 ldr w3, [x21, x16]
0x3427e36c4048 1b007c63 mul w3, w3, w0
0x3427e36c404c 0b030042 add w2, w2, w3
0x3427e36c4050 91002030 add x16, x1, #0x8 (8)
0x3427e36c4054 b8706aa3 ldr w3, [x21, x16]
0x3427e36c4058 1b007c63 mul w3, w3, w0
0x3427e36c405c 0b030042 add w2, w2, w3
0x3427e36c4060 91003030 add x16, x1, #0xc (12)
0x3427e36c4064 b8706aa1 ldr w1, [x21, x16]
0x3427e36c4068 1b007c20 mul w0, w1, w0
0x3427e36c406c 0b000040 add w0, w2, w0
but which could be this (ignore addresses and code bytes):
0x3427e36c4034 b8606aa0 ldr w0, [x21, x0]
........ add x4, x21, x1
0x3427e36c4038 b8616aa2 ldr w2, [x4]
0x3427e36c403c 1b007c42 mul w2, w2, w0
0x3427e36c4044 b8706aa3 ldr w3, [x4, 4]
0x3427e36c4048 1b007c63 mul w3, w3, w0
0x3427e36c404c 0b030042 add w2, w2, w3
0x3427e36c4054 b8706aa3 ldr w3, [x4, 8]
0x3427e36c4058 1b007c63 mul w3, w3, w0
0x3427e36c405c 0b030042 add w2, w2, w3
0x3427e36c4064 b8706aa1 ldr w1, [x4, 12]
0x3427e36c4068 1b007c20 mul w0, w1, w0
0x3427e36c406c 0b000040 add w0, w2, w0
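The transformation between the two listings can be sketched in Python as follows. This is a toy under stated assumptions, not a proposal for the actual lowering code: `lower_shared_base` and the `s0`, `s1`, ... temporaries are hypothetical, destination registers are omitted, and the sketch assumes every offset fits the load's immediate field and that no bounds or overflow check is needed.

```python
from collections import Counter

def lower_shared_base(accesses, heap="x21"):
    """Emit one heap+base add per reused base; loads then use [sum, #offset].

    accesses is a list of (base, offset) pairs in program order.
    Hypothetical helper: assumes offsets fit the immediate field.
    """
    uses = Counter(base for base, _ in accesses)
    sums = {}   # base -> temporary holding heap + base
    code = []
    for base, offset in accesses:
        # A base used once with no offset is already a single instruction.
        if uses[base] == 1 and offset == 0:
            code.append(f"ldr [{heap}, {base}]")
            continue
        if base not in sums:
            sums[base] = f"s{len(sums)}"
            code.append(f"add {sums[base]}, {heap}, {base}")
        suffix = f", #{offset}" if offset else ""
        code.append(f"ldr [{sums[base]}{suffix}]")
    return code

# The dot-product accesses: one load from $a, four from $b.
dot = [("a", 0), ("b", 0), ("b", 4), ("b", 8), ("b", 12)]
for line in lower_shared_base(dot):
    print(line)
```

For the dot product this yields one add for `b` followed by four loads that carry their offsets, which is the shape of the second listing; the load from `a` keeps its register-offset form because nothing would be shared.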
Updated•3 years ago (Reporter)