ARM64 signature check + code alignment constraints can generate nopfill
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
People
(Reporter: lth, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
For large immediate signatures, the checking code can turn into this:
0x1729efa0e000 d10043ff sub sp, sp, #0x10 (16)
0x1729efa0e004 f90007fe str x30, [sp, #8]
0x1729efa0e008 f90003fd str x29, [sp]
0x1729efa0e00c 910003fd mov x29, sp
0x1729efa0e010 529254f0 mov w16, #0x92a7
0x1729efa0e014 72a00490 movk w16, #0x24, lsl #16
0x1729efa0e018 6b10015f cmp w10, w16
0x1729efa0e01c 54000120 b.eq #+0x24 (addr 0x1729efa0e040)
0x1729efa0e020 d4a00000 unimplemented (Exception)
0x1729efa0e024 d503201f nop
0x1729efa0e028 d503201f nop
0x1729efa0e02c d503201f nop
where the first four instructions are the fixed prologue and the rest is the signature check: a two-instruction constant load, the compare, a branch on success, a trap, and nopfill. The wasm code here is:
(module
(func (param i64) (param i64) (param i64) (param i64) (param i64) (result i64)
(i64.and (local.get 1) (i64.const 64))))
The nopfill results from requiring 16-byte alignment for the unchecked entry. It may be possible to reduce this to 8 bytes. It is also possible that fixing the prologue to use stp (bug 1705495) will change the calculus here. Either way, we should pay attention to bloat. It would be better for code size to load the constant pc-relative.
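To make the alignment cost concrete, here is a minimal sketch of the padding arithmetic. It assumes only what the listing shows: the trap ends at offset 0x24 from the start of the function and every instruction is 4 bytes. The helper name `nopfill` is hypothetical, not a SpiderMonkey function.

```python
NOP_SIZE = 4  # every ARM64 instruction, including nop, is 4 bytes


def nopfill(offset, alignment):
    """Number of nops needed to pad `offset` up to a multiple of `alignment`."""
    padding = -offset % alignment
    return padding // NOP_SIZE


# In the listing, the trap at 0x...e020 ends at offset 0x24.
end_of_check = 0x24
print(nopfill(end_of_check, 16))  # 16-byte alignment: the three nops in the listing
print(nopfill(end_of_check, 8))   # 8-byte alignment would need a single nop
```

Under these assumptions, dropping to 8-byte alignment saves two instructions for this function, and a check that ends exactly on a 16-byte boundary needs no fill under either rule.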
When the signature is no longer representable as a constant, the pointer that represents it is stored in the tls, and we load it relative to the tls. When there are many different signatures, the offset into the tls may overflow, and we may again get several instructions (at least a couple) to load the offset, with nopfill resulting. Try this:
(module
(func (param i64) (param i64) (param i64) (param i64) (param i64) (param i64) (param i64) (param i64) (result i64)
(i64.and (local.get 1) (i64.const 64))))
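As a rough illustration of when the tls offset "overflows", the sketch below checks whether an offset fits the unsigned, scaled 12-bit immediate of an ARM64 `ldr`. It assumes the signature pointer is read with a 64-bit load, so the offset must be a multiple of 8 no larger than 32760. The function name is hypothetical, and the real code generator may choose among other addressing forms.

```python
def fits_ldr_offset(offset, access_size=8):
    """True if `offset` fits the unsigned scaled 12-bit immediate of ARM64 ldr."""
    if offset < 0 or offset % access_size != 0:
        return False
    return offset // access_size < (1 << 12)


print(fits_ldr_offset(32760))  # largest offset a single 64-bit ldr can encode
print(fits_ldr_offset(32768))  # one slot further: the offset must be built separately
```

Once the offset no longer fits, it has to be materialized in a scratch register first, which is where the extra instructions come from.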
Comment 1 (Reporter, 4 years ago)
In truth, this is probably an issue on x86 too; it's just more obvious on arm64, since large arm64 constants really bloat the code and arm64 code size matters more because of mobile.
Comment 2 (Reporter, 4 years ago)
Looking at TypeIdDesc::immediate, it's clear that while the signature representation is relatively compact, we can do better for C/C++-type code to avoid loading immediates before the signature check. There are several approaches. For each, assume there is a single tag bit to distinguish "compact" (0) from "other" (1). (Then "other" becomes the representation we use now, but with one extra tag bit.)
One compact representation admits only i32/i64/f32/f64 for the argument and return types, signals the presence of a return type with a single bit, adds the return type at the end of the array of types so that it needs no bits if it is absent, and limits the length field to 3 bits. Thus we have a shared overhead of five bits (tag, return-present bit, 3-bit length) plus 2 bits per type for up to 7 argument types and one return type, for a maximum of 5 + 2 * 8 = 21 bits. More typical signatures have 3-4 types, so 11-13 bits, which will sometimes fit in the immediate of the compare and otherwise need a single move-immediate to set up.
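The bit budget above can be sketched as follows. The field order (tag in bit 0, return-present in bit 1, argument count in bits 2-4, then 2 bits per type) and the 2-bit type codes are assumptions made for illustration; the comment only fixes the field widths.

```python
TYPE_CODES = {"i32": 0, "i64": 1, "f32": 2, "f64": 3}  # hypothetical 2-bit codes
MAX_ARGS = 7  # the argument count must fit in the 3-bit length field


def encode_compact(params, result=None):
    """Return (value, bit_width), or None if the signature is not eligible."""
    types = list(params) + ([result] if result is not None else [])
    if len(params) > MAX_ARGS or any(t not in TYPE_CODES for t in types):
        return None
    value = 0                             # bit 0: tag, 0 means "compact"
    value |= (result is not None) << 1    # bit 1: a return type is present
    value |= len(params) << 2             # bits 2-4: number of arguments
    shift = 5
    for t in types:                       # arguments first, return type last
        value |= TYPE_CODES[t] << shift
        shift += 2
    return value, shift


# First example module: five i64 params and an i64 result need 17 bits.
print(encode_compact(["i64"] * 5, "i64"))
# Second example module: eight params exceed the 3-bit length field.
print(encode_compact(["i64"] * 8, "i64"))
```

Under this layout the first example in the report would still need a move-immediate, but the value fits one instruction's worth of constant instead of a mov/movk pair once it stays within 16 bits; the second example falls back to the "other" representation.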
Another compact representation has a dictionary of common signatures (computed from a corpus) that maps each signature to an integer, and uses the normal typedesc immediate as a key to look up a compact type in this dictionary at compile time. The tag bit ensures there's no confusion between these compressed signatures and the normal ones. Now we can arbitrarily limit compact signatures to something that fits in the compare instruction. Some signatures will not be assigned a value and will end up being represented using the normal immediate. This approach is nice in any case because it is not limited to a specific C/C++ subset of types; it applies to all types and does not discriminate.
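A minimal sketch of the dictionary idea, under stated assumptions: the corpus is just a list of normal immediates, ids are assigned by frequency, and the budget is the 12-bit unshifted immediate of an ARM64 `cmp`. The names `build_dictionary` and `signature_id` are hypothetical.

```python
from collections import Counter

CMP_IMM_BITS = 12  # unshifted immediate width of an ARM64 cmp


def build_dictionary(corpus_immediates, bits=CMP_IMM_BITS):
    """Map the most common normal immediates to compact ids with tag bit 0."""
    capacity = 1 << (bits - 1)  # one bit of the budget is spent on the tag
    common = Counter(corpus_immediates).most_common(capacity)
    return {imm: index << 1 for index, (imm, _count) in enumerate(common)}


def signature_id(normal_immediate, dictionary):
    """Compact id if the signature is in the dictionary, else tagged 'other'."""
    compact = dictionary.get(normal_immediate)
    if compact is not None:
        return compact
    return (normal_immediate << 1) | 1  # tag bit 1: normal representation


# 0x2492a7 is the constant from the listing; the other values are made up.
corpus = [0x2492A7, 0x2492A7, 0x11, 0x2492A7, 0x11, 0x99]
table = build_dictionary(corpus, bits=2)  # tiny budget to show the fallback
print(signature_id(0x2492A7, table), signature_id(0x99, table))
```

With the full 12-bit budget every compact id is below 4096, so the check can compare against an immediate directly without loading a constant first.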
A third idea is that signatures that are encodable in a small immediate using the existing system should be left alone and we could bias a system in favor of signatures that are common but not encodable using the existing system; this effectively increases the reach of the dictionary approach, for example.
Before undertaking this work, we should try to get data (via a corpus analysis or just a search of the type space) on whether we stand to gain much in practice.