Ion: global constants stores require a temporary register; f32 global/heap stores could use integer instructions
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
()
People
(Reporter: bbouvier, Unassigned)
References
(Blocks 1 open bug)
Details
In asmjs mode, on x86 and x64, storing constants require a temporary register (except for storing int in the heap in x64 architecture). It might be the same on ARM platform. --- For instance, the following code: var mymod = function(stdlib, ffi, heap) { "use asm"; var H32 = new stdlib.Int32Array(heap); var F32 = new stdlib.Float32Array(heap); var myGlobalInt = 0 var myGlobalFloat = 0.0 function storeGlobal() { myGlobalInt = 42; myGlobalFloat = 42.0; } function storeHeap() { H32[ 0 >> 2 ] = 1337; F32[ 0 >> 2 ] = 13.37; } } generates the following codes (I skipped the stack overflow checks): //storeGlobal (x64) 0x7ffff7ff2097: mov $0x2a,%eax 0x7ffff7ff209c: mov %eax,0x128e(%rip) # 0x7ffff7ff3330 0x7ffff7ff20a2: movabs $0x4045000000000000,%r11 0x7ffff7ff20ac: movq %r11,%xmm0 0x7ffff7ff20b1: movsd %xmm0,0x127f(%rip) # 0x7ffff7ff3338 0x7ffff7ff20b9: retq //storeHeap (x64) 0x7ffff7ff20cf: xor %eax,%eax 0x7ffff7ff20d1: movl $0x539,(%r15,%rax,1) // this one doesn't use any supplementary register 0x7ffff7ff20d9: movabs $0x402abd70a3d70a3d,%r11 0x7ffff7ff20e3: movq %r11,%xmm0 0x7ffff7ff20e8: xor %eax,%eax 0x7ffff7ff20ea: cvtsd2ss %xmm0,%xmm15 0x7ffff7ff20ef: movss %xmm15,(%r15,%rax,1) 0x7ffff7ff20f5: retq //storeGlobal (x86) 0xf767d0b4: mov $0x2a,%eax 0xf767d0b9: mov %eax,0xf767e1fc 0xf767d0be: movsd 0xf767e1e0,%xmm0 0xf767d0c6: movsd %xmm0,0xf767e204 0xf767d0ce: ret //storeHeap (x86) 0xf767d0dc: xor %eax,%eax 0xf767d0de: mov $0x539,%ecx 0xf767d0e3: cmp $0x8000,%eax 0xf767d0e9: jae 0xf767d0f5 0xf767d0ef: mov %ecx,0x8b08758(%eax) 0xf767d0f5: movsd 0xf767e1e8,%xmm0 0xf767d0fd: xor %eax,%eax 0xf767d0ff: cmp $0x8000,%eax 0xf767d105: jae 0xf767d117 0xf767d10b: cvtsd2ss %xmm0,%xmm7 0xf767d10f: movss %xmm7,0x8b08758(%eax) 0xf767d117: ret
Reporter | ||
Comment 1•11 years ago
|
||
Found this in Lowering-x86.cpp:visitAsmJSStoreHeap: "// For now, don't allow constants. The immediate operand affects // instruction layout which affects patching." In the mean time, constants are used for x64, which also requires patching, although integer store heap operations use immediate and no register. Is it not doable in x86, or this comment is outdated?
Comment 2•11 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #1) > Found this in Lowering-x86.cpp:visitAsmJSStoreHeap: > "// For now, don't allow constants. The immediate operand affects > // instruction layout which affects patching." > > In the mean time, constants are used for x64, which also requires patching, > although integer store heap operations use immediate and no register. > > Is it not doable in x86, or this comment is outdated? The issue is the encoding of the instruction 'mov %ecx,0x8b08758(%eax)'. The x86 does support an immediate source 'mov $0x539,0x8b08758(%eax)', but this instruction needs patching for the heap base offset and the code for patching the instruction appears to assume the offset is at the end of the instruction, and based on the comments this is probably not correct is there is an immediate source. It would be useful to avoid the register allocation when storing a constant integer, and it should be possible to decode the instruction and patch the appropriate location. The ARM instructions do not support an immediate source for these moves. The x64 does not need to patch these instructions.
Comment 3•11 years ago
|
||
Good analysis. Given that we can record per-heap-access data (in AsmJSHeapAccess), it wouldn't be too hard to record the correct offset to patch. I was just lazy :) Also, good point that we can do this immediately on x64.
Assignee | ||
Updated•10 years ago
|
Comment 4•6 years ago
|
||
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 5•5 years ago
|
||
Ion optimizations are getting less relevant as we implement Cranelift for wasm.
Comment 6•3 years ago
|
||
We don't care about asm.js any longer but we do care about wasm, so let's look:
wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
(memory 1)
(func $heap
(i32.store (i32.const 16) (i32.const 0))
(i32.store (i32.const 16) (i32.const 37))
(f32.store (i32.const 16) (f32.const 0.0))
(f32.store (i32.const 16) (f32.const 42.0))))
`)));
which yields (on x64)
00000014 41 c7 47 10 00 00 00 00 movl $0x00, 0x10(%r15)
0000001C 41 c7 47 10 25 00 00 00 movl $0x25, 0x10(%r15)
00000024 0f 57 c0 xorps %xmm0, %xmm0
00000027 f3 41 0f 11 47 10 movssl %xmm0, 0x10(%r15)
0000002D f3 0f 10 05 0b 00 00 00 movssl 0x0000000000000040, %xmm0
00000035 f3 41 0f 11 47 10 movssl %xmm0, 0x10(%r15)
which is optimal for integers. For the floating point values, it would be better to just store the bit patterns using integer instructions (and one could also discuss whether moving the bit pattern for 42.0f into a gpr and moving that to the fpr would not be faster than loading from memory into the fpr), but barring those types of optimizations the current code is about as good as it gets. Storing FP constants other than 0.0 is probably somewhat rare, too.
The ARM64 code is pretty good too:
0x38c12034 52800000 mov w0, #0x0
0x38c12038 b90012a0 str w0, [x21, #16]
0x38c1203c 528004a0 mov w0, #0x25
0x38c12040 b90012a0 str w0, [x21, #16]
0x38c12044 1e2703e0 fmov s0, wzr
0x38c12048 bd0012a0 str s0, [x21, #16]
0x38c1204c 1c000120 ldr s0, pc+36 (addr 0x38c12070)
0x38c12050 bd0012a0 str s0, [x21, #16]
although clearly the mov on ..34 is redundant, the store on ..38 could just store wzr, that's bug 1710087. Otherwise, the same comments apply as for x64.
Comment 7•3 years ago
|
||
Looking at globals:
wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
(memory 1)
(global $f (mut f32) (f32.const 13.0))
(global $g (mut i32) (i32.const 12))
(func $heap
(global.set $g (i32.const 0))
(global.set $g (i32.const 37))
(global.set $f (f32.const 0.0))
(global.set $f (f32.const 42.0))))
`)));
we see (x64)
00000014 33 c0 xor %eax, %eax
00000016 41 89 46 64 movl %eax, 0x64(%r14)
0000001A b8 25 00 00 00 mov $0x25, %eax
0000001F 41 89 46 64 movl %eax, 0x64(%r14)
00000023 0f 57 c0 xorps %xmm0, %xmm0
00000026 f3 41 0f 11 46 60 movssl %xmm0, 0x60(%r14)
0000002C f3 0f 10 05 08 00 00 00 movssl 0x000000000000003C, %xmm0
00000034 f3 41 0f 11 46 60 movssl %xmm0, 0x60(%r14)
which is interestingly less good than it was for the heap stores.
ARM64:
0x1dea82bcf034 52800000 mov w0, #0x0
0x1dea82bcf038 b90066e0 str w0, [x23, #100]
0x1dea82bcf03c 528004a0 mov w0, #0x25
0x1dea82bcf040 b90066e0 str w0, [x23, #100]
0x1dea82bcf044 1e2703e0 fmov s0, wzr
0x1dea82bcf048 bd0062e0 str s0, [x23, #96]
0x1dea82bcf04c 1c000120 ldr s0, pc+36 (addr 0x1dea82bcf070)
0x1dea82bcf050 bd0062e0 str s0, [x23, #96]
is no worse than the code for heap stores since the constants have to be set up anyway.
Updated•3 years ago
|
Description
•