Open Bug 877321 Opened 11 years ago Updated 2 years ago

Ion: global constants stores require a temporary register; f32 global/heap stores could use integer instructions

Categories

(Core :: JavaScript: WebAssembly, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: bbouvier, Unassigned)

References

(Blocks 1 open bug)

Details

In asmjs mode, on x86 and x64, storing constants require a temporary register (except for storing int in the heap in x64 architecture).
It might be the same on ARM platform.

---
For instance, the following code:

var mymod = function(stdlib, ffi, heap)
{
    "use asm";
    var H32 = new stdlib.Int32Array(heap);
    var F32 = new stdlib.Float32Array(heap);

    var myGlobalInt = 0
    var myGlobalFloat = 0.0

    function storeGlobal()
    {
        myGlobalInt = 42;
        myGlobalFloat = 42.0;
    }

    function storeHeap()
    {
        H32[ 0 >> 2 ] = 1337;
        F32[ 0 >> 2 ] = 13.37;
    }
}

generates the following codes (I skipped the stack overflow checks):

//storeGlobal (x64)
   0x7ffff7ff2097:	mov    $0x2a,%eax
   0x7ffff7ff209c:	mov    %eax,0x128e(%rip)        # 0x7ffff7ff3330
   0x7ffff7ff20a2:	movabs $0x4045000000000000,%r11
   0x7ffff7ff20ac:	movq   %r11,%xmm0
   0x7ffff7ff20b1:	movsd  %xmm0,0x127f(%rip)        # 0x7ffff7ff3338
   0x7ffff7ff20b9:	retq

//storeHeap (x64)
   0x7ffff7ff20cf:	xor    %eax,%eax
   0x7ffff7ff20d1:	movl   $0x539,(%r15,%rax,1)      // this one doesn't use any supplementary register
   0x7ffff7ff20d9:	movabs $0x402abd70a3d70a3d,%r11
   0x7ffff7ff20e3:	movq   %r11,%xmm0
   0x7ffff7ff20e8:	xor    %eax,%eax
   0x7ffff7ff20ea:	cvtsd2ss %xmm0,%xmm15
   0x7ffff7ff20ef:	movss  %xmm15,(%r15,%rax,1)
   0x7ffff7ff20f5:	retq

//storeGlobal (x86)
   0xf767d0b4:	mov    $0x2a,%eax
   0xf767d0b9:	mov    %eax,0xf767e1fc
   0xf767d0be:	movsd  0xf767e1e0,%xmm0
   0xf767d0c6:	movsd  %xmm0,0xf767e204
   0xf767d0ce:	ret

//storeHeap (x86)
   0xf767d0dc:	xor    %eax,%eax
   0xf767d0de:	mov    $0x539,%ecx
   0xf767d0e3:	cmp    $0x8000,%eax
   0xf767d0e9:	jae    0xf767d0f5
   0xf767d0ef:	mov    %ecx,0x8b08758(%eax)
   0xf767d0f5:	movsd  0xf767e1e8,%xmm0
   0xf767d0fd:	xor    %eax,%eax
   0xf767d0ff:	cmp    $0x8000,%eax
   0xf767d105:	jae    0xf767d117
   0xf767d10b:	cvtsd2ss %xmm0,%xmm7
   0xf767d10f:	movss  %xmm7,0x8b08758(%eax)
   0xf767d117:	ret
Found this in Lowering-x86.cpp:visitAsmJSStoreHeap:
"// For now, don't allow constants. The immediate operand affects
// instruction layout which affects patching."

In the mean time, constants are used for x64, which also requires patching, although integer store heap operations use immediate and no register.

Is it not doable in x86, or this comment is outdated?
Flags: needinfo?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #1)
> Found this in Lowering-x86.cpp:visitAsmJSStoreHeap:
> "// For now, don't allow constants. The immediate operand affects
> // instruction layout which affects patching."
> 
> In the mean time, constants are used for x64, which also requires patching,
> although integer store heap operations use immediate and no register.
> 
> Is it not doable in x86, or this comment is outdated?

The issue is the encoding of the instruction 'mov %ecx,0x8b08758(%eax)'.
The x86 does support an immediate source 'mov $0x539,0x8b08758(%eax)',
but this instruction needs patching for the heap base offset and
the code for patching the instruction appears to assume the
offset is at the end of the instruction, and based on the comments
this is probably not correct is there is an immediate source.

It would be useful to avoid the register allocation when storing
a constant integer, and it should be possible to decode the
instruction and patch the appropriate location.

The ARM instructions do not support an immediate source for these
moves.  The x64 does not need to patch these instructions.
Flags: needinfo?(luke)
Good analysis.  Given that we can record per-heap-access data (in AsmJSHeapAccess), it wouldn't be too hard to record the correct offset to patch.  I was just lazy :)

Also, good point that we can do this immediately on x64.
Assignee: general → nobody
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Component: JavaScript Engine → Javascript: WebAssembly

Ion optimizations are getting less relevant as we implement Cranelift for wasm.

Type: defect → enhancement
Priority: -- → P5

We don't care about asm.js any longer but we do care about wasm, so let's look:

wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
  (memory 1)
  (func $heap
    (i32.store (i32.const 16) (i32.const 0))
    (i32.store (i32.const 16) (i32.const 37))
    (f32.store (i32.const 16) (f32.const 0.0))
    (f32.store (i32.const 16) (f32.const 42.0))))
`)));

which yields (on x64)

00000014  41 c7 47 10 00 00 00 00   movl $0x00, 0x10(%r15)
0000001C  41 c7 47 10 25 00 00 00   movl $0x25, 0x10(%r15)
00000024  0f 57 c0                  xorps %xmm0, %xmm0
00000027  f3 41 0f 11 47 10         movssl %xmm0, 0x10(%r15)
0000002D  f3 0f 10 05 0b 00 00 00   movssl 0x0000000000000040, %xmm0
00000035  f3 41 0f 11 47 10         movssl %xmm0, 0x10(%r15)

which is optimal for integers. For the floating point values, it would be better to just store the bit patterns using integer instructions (and one could also discuss whether moving the bit pattern for 42.0f into a gpr and moving that to the fpr would not be faster than loading from memory into the fpr), but barring those types of optimizations the current code is about as good as it gets. Storing FP constants other than 0.0 is probably somewhat rare, too.

The ARM64 code is pretty good too:

0x38c12034  52800000  mov     w0, #0x0
0x38c12038  b90012a0  str     w0, [x21, #16]
0x38c1203c  528004a0  mov     w0, #0x25
0x38c12040  b90012a0  str     w0, [x21, #16]
0x38c12044  1e2703e0  fmov    s0, wzr
0x38c12048  bd0012a0  str     s0, [x21, #16]
0x38c1204c  1c000120  ldr     s0, pc+36 (addr 0x38c12070)
0x38c12050  bd0012a0  str     s0, [x21, #16]

although clearly the mov on ..34 is redundant, the store on ..38 could just store wzr, that's bug 1710087. Otherwise, the same comments apply as for x64.

See Also: → 1710087

Looking at globals:

wasmDis(new WebAssembly.Module(wasmTextToBinary(`
(module
  (memory 1)
  (global $f (mut f32) (f32.const 13.0))
  (global $g (mut i32) (i32.const 12))
  (func $heap
    (global.set $g (i32.const 0))
    (global.set $g (i32.const 37))
    (global.set $f (f32.const 0.0))
    (global.set $f (f32.const 42.0))))
`)));

we see (x64)

00000014  33 c0                     xor %eax, %eax
00000016  41 89 46 64               movl %eax, 0x64(%r14)
0000001A  b8 25 00 00 00            mov $0x25, %eax
0000001F  41 89 46 64               movl %eax, 0x64(%r14)
00000023  0f 57 c0                  xorps %xmm0, %xmm0
00000026  f3 41 0f 11 46 60         movssl %xmm0, 0x60(%r14)
0000002C  f3 0f 10 05 08 00 00 00   movssl 0x000000000000003C, %xmm0
00000034  f3 41 0f 11 46 60         movssl %xmm0, 0x60(%r14)

which is interestingly less good than it was for the heap stores.

ARM64:

0x1dea82bcf034  52800000  mov     w0, #0x0
0x1dea82bcf038  b90066e0  str     w0, [x23, #100]
0x1dea82bcf03c  528004a0  mov     w0, #0x25
0x1dea82bcf040  b90066e0  str     w0, [x23, #100]
0x1dea82bcf044  1e2703e0  fmov    s0, wzr
0x1dea82bcf048  bd0062e0  str     s0, [x23, #96]
0x1dea82bcf04c  1c000120  ldr     s0, pc+36 (addr 0x1dea82bcf070)
0x1dea82bcf050  bd0062e0  str     s0, [x23, #96]

is no worse than the code for heap stores since the constants have to be set up anyway.

Severity: normal → N/A
Status: REOPENED → NEW
Priority: P5 → P3
Hardware: x86 → All
Summary: OdinMonkey: {global,heap} constants stores require a temporary register → Ion: global constants stores require a temporary register; f32 global/heap stores could use integer instructions
Blocks: 1687630
You need to log in before you can comment on or make changes to this bug.