Here's an example:

```
wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
  (func (param v128) (param v128) (result v128)
    (i32x4.add (local.get 1) (local.get 0))))
`)))
```

The core of the code generated for this is:

```
00000024  66 0f 6f d1  movdqa %xmm1, %xmm2
00000028  66 0f 6f ca  movdqa %xmm2, %xmm1
0000002C  66 0f fe c8  paddd %xmm0, %xmm1
00000030  66 0f 6f c1  movdqa %xmm1, %xmm0
```

which is not great. FP addition is commutative so the optimal code is just the addition; almost-optimal code has at most one move (the one at the end). But even ignoring that, the first two moves are clearly redundant, as xmm2 is dead and the move accomplishes nothing.

(Note the swapped operand order. If the body were param0 + param1 then the machine code is simply the paddd.)

Regalloc reports that the input is this:

```
[RegAlloc] [2,3 WasmParameter] [def v1<simd128>:%xmm0.i4]
[RegAlloc] [4,5 WasmParameter] [def v2<simd128>:%xmm1.i4]
[RegAlloc] [6,7 WasmParameter] [def v3<g>:r14]
[RegAlloc] [8,9 WasmBinarySimd128] [def v4<simd128>:tied(0)] [use v2:r?] [use v1:r]
[RegAlloc] [10,11 WasmReturn] [use v4:%xmm0.d] [use v3:r14]
```

but at the end the IR looks like this:

```
[RegAlloc] [2,3 WasmParameter] [def v1<simd128>:%xmm0.i4]
[RegAlloc] [4,5 WasmParameter] [def v2<simd128>:%xmm1.i4]
[RegAlloc] [MoveGroup] [%xmm1.i4 -> %xmm2.i4]
[RegAlloc] [6,7 WasmParameter] [def v3<g>:r14]
[RegAlloc] [MoveGroup] [%xmm2.i4 -> %xmm1.i4]
[RegAlloc] [8,9 WasmBinarySimd128] [def v4<simd128>:%xmm1.i4] [use v2:r %xmm1.i4] [use v1:r %xmm0.i4]
[RegAlloc] [MoveGroup] [%xmm1.i4 -> %xmm0.i4]
[RegAlloc] [10,11 WasmReturn] [use v4:%xmm0.d %xmm0.i4] [use v3:r14 r14]
```
Bug 1701164 Comment 1 Edit History
Here's an example:

```
wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
  (func (param v128) (param v128) (result v128)
    (i32x4.add (local.get 1) (local.get 0))))
`)))
```

The core of the code generated for this is:

```
00000024  66 0f 6f d1  movdqa %xmm1, %xmm2
00000028  66 0f 6f ca  movdqa %xmm2, %xmm1
0000002C  66 0f fe c8  paddd %xmm0, %xmm1
00000030  66 0f 6f c1  movdqa %xmm1, %xmm0
```

which is not great. Integer SIMD addition is commutative so the optimal code is just the addition; almost-optimal code has at most one move (the one at the end). But even ignoring that, the first two moves are clearly redundant, as xmm2 is dead and the move accomplishes nothing.

(Note the swapped operand order. If the body were param0 + param1 then the machine code is simply the paddd.)

Regalloc reports that the input is this:

```
[RegAlloc] [2,3 WasmParameter] [def v1<simd128>:%xmm0.i4]
[RegAlloc] [4,5 WasmParameter] [def v2<simd128>:%xmm1.i4]
[RegAlloc] [6,7 WasmParameter] [def v3<g>:r14]
[RegAlloc] [8,9 WasmBinarySimd128] [def v4<simd128>:tied(0)] [use v2:r?] [use v1:r]
[RegAlloc] [10,11 WasmReturn] [use v4:%xmm0.d] [use v3:r14]
```

but at the end the IR looks like this:

```
[RegAlloc] [2,3 WasmParameter] [def v1<simd128>:%xmm0.i4]
[RegAlloc] [4,5 WasmParameter] [def v2<simd128>:%xmm1.i4]
[RegAlloc] [MoveGroup] [%xmm1.i4 -> %xmm2.i4]
[RegAlloc] [6,7 WasmParameter] [def v3<g>:r14]
[RegAlloc] [MoveGroup] [%xmm2.i4 -> %xmm1.i4]
[RegAlloc] [8,9 WasmBinarySimd128] [def v4<simd128>:%xmm1.i4] [use v2:r %xmm1.i4] [use v1:r %xmm0.i4]
[RegAlloc] [MoveGroup] [%xmm1.i4 -> %xmm0.i4]
[RegAlloc] [10,11 WasmReturn] [use v4:%xmm0.d %xmm0.i4] [use v3:r14 r14]
```
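
To make the redundancy claim concrete, here is a toy register-file model in plain JavaScript. It is only an illustrative sketch, not SpiderMonkey code: the helper names (`movdqa`, `paddd`, `runGenerated`, `runWithoutRedundantMoves`, `runOptimal`) are hypothetical, registers are modeled as four-element arrays of 32-bit lanes, and operands follow the AT&T `src, dst` order used in the disassembly above. Under those assumptions, the generated four-instruction sequence, the same sequence with the first two moves dropped, and a single commuted add all leave the same value in xmm0.

```javascript
// Toy model only: registers are arrays of four int32 lanes.
function movdqa(regs, src, dst) {
  regs[dst] = regs[src].slice();
}

function paddd(regs, src, dst) {
  // Lane-wise 32-bit integer add with wraparound: dst += src.
  regs[dst] = regs[dst].map((x, i) => (x + regs[src][i]) | 0);
}

function initialRegs(p0, p1) {
  // The two v128 parameters arrive in xmm0 and xmm1; xmm2 is scratch.
  return { xmm0: p0.slice(), xmm1: p1.slice(), xmm2: [0, 0, 0, 0] };
}

// The sequence from the disassembly.
function runGenerated(p0, p1) {
  const regs = initialRegs(p0, p1);
  movdqa(regs, "xmm1", "xmm2");
  movdqa(regs, "xmm2", "xmm1");
  paddd(regs, "xmm0", "xmm1");
  movdqa(regs, "xmm1", "xmm0");
  return regs.xmm0;
}

// Same sequence minus the xmm1 -> xmm2 -> xmm1 round trip.
function runWithoutRedundantMoves(p0, p1) {
  const regs = initialRegs(p0, p1);
  paddd(regs, "xmm0", "xmm1");
  movdqa(regs, "xmm1", "xmm0");
  return regs.xmm0;
}

// Commuting the operands lets the add write directly into xmm0.
function runOptimal(p0, p1) {
  const regs = initialRegs(p0, p1);
  paddd(regs, "xmm1", "xmm0");
  return regs.xmm0;
}
```

Calling all three functions with the same pair of inputs, for example `[1, 2, 3, 4]` and `[10, 20, 30, 40]`, should produce identical lane values, which is the sense in which the first two moves accomplish nothing and the final move is avoidable by commuting the add.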