Closed Bug 1062067 Opened 10 years ago Closed 6 years ago

IonMonkey: backtracking register allocator bad SIMD code generation.

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

x86_64
All
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: dougc, Unassigned)

References

Details

Attachments

(2 files)

Unaligned stack spilling is reported in bug 1060789 comment 12.

   0x00007f2629a030a8:	cvtsd2ss %xmm0,%xmm1
   0x00007f2629a030ac:	shufps $0x0,%xmm1,%xmm1
   0x00007f2629a030b0:	movss  0x4(%rsp),%xmm0
   0x00007f2629a030b6:	shufps $0x0,%xmm0,%xmm0
   0x00007f2629a030ba:	movaps %xmm0,%xmm2
   0x00007f2629a030bd:	movaps %xmm0,0x4(%rsp)    <<<<

Focusing on the allocation of this spill location in the Ion debug output gives

[RegAlloc] Allocating v15[1] [34,192) v15:r?@34 v15:r?@34 v15:r?@137 v15:r?@143 [priority 158] [weight 25]
[RegAlloc]   Hint xmm0, used by group allocation
[RegAlloc]   xmm0 collides with v68[0] [147,151) [weight 1000]
[RegAlloc]   Spilling interval
[RegAlloc]     Reusing group spill location stack:12  <<<<

  v15[0] req(r,xmm0?) has(xmm0) [33,34) / v15[1] req(xmm0?) has(stack:12) [34,192) v15:r?@34 v15:r?@34 v15:r?@137 v15:r?@143

[MoveGroup] [stack:12 -> xmm0]
[32,33 SimdSplatX4] [def v15<(null)>:xmm0] [use v8:r xmm0]
[MoveGroup] [xmm0 -> xmm2] [xmm0 -> stack:12]    <<<<


[Codegen] instruction MoveGroup
[Codegen] movss      0x4(%rsp), %xmm0
[Codegen] instruction SimdSplatX4
[Codegen] shufps     0x0, %xmm0, %xmm0
[Codegen] instruction MoveGroup
[Codegen] movaps     %xmm0, %xmm2
[Codegen] movaps     %xmm0, 0x4(%rsp)        <<<<
Simply disabling the reuse of group spill locations worked around the problem. Not sure yet if it actually fixed it or just happened to make the SIMD spill aligned.

Might there be a problem adding to groups, or reusing spill locations?

Do we need to update some checks to make sure the locations are compatible for SIMD?
Flags: needinfo?(sunfish)
Yes, there very well could be a problem with how we share a stack slot with all members of a group here. We want to group registers with different types, and that can now mean that the spill slot for one may not be suitably aligned for the other.

Ideally, we'd like to leave grouping and stack slot sharing unconstrained by type mismatches, and just make sure that when multiple registers with differing types share a stack slot, the stack slot is aligned for any of the types, but offhand I don't know if that's feasible within the current framework.

A less ideal but not bad fix would be to disable stack slot sharing in the case where there are multiple types involved.
Flags: needinfo?(sunfish)
Is the issue really the spill? Just a wild guess, but it seems the choice of register is wrong in the first place:

0x00007f2629a030b0:	movss  0x4(%rsp),%xmm0     // we load the input X to float32x4.splat here
0x00007f2629a030b6:	shufps $0x0,%xmm0,%xmm0    // we apply shufps on the same register (because of defineReuseInput)
0x00007f2629a030ba:	movaps %xmm0,%xmm2         // we copy the result of shufps
0x00007f2629a030bd:	movaps %xmm0,0x4(%rsp)     // we try to put X back to its original position

It sounds like the register allocator should copy xmm0 before applying shufps, as shufps clobbers its input, and use the copy to put it back to 0x4($rsp), so this code should look more like:

movss 0x4($sp), $xmm0
movss $xmm0, $xmm1
shufps $0x0, $xmm0, $xmm0
// maybe copy out $xmm0 here, according to next requirements
movss $xmm1, 0x4($sp)

Does that make sense, or am I missing something?
Attached file fbird.shell.js
Here is a version of fbirds in the shell that shows the issue, once you've applied bug 1061637's patch. Hope that will help.
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)
> Is the issue really the spill? Just a wild guess, but it seems the choice of
> register is wrong in the first place:
> 
> 0x00007f2629a030b0:	movss  0x4(%rsp),%xmm0     // we load the input X to
> float32x4.splat here
> 0x00007f2629a030b6:	shufps $0x0,%xmm0,%xmm0    // we apply shufps on the
> same register (because of defineReuseInput)
> 0x00007f2629a030ba:	movaps %xmm0,%xmm2         // we copy the result of
> shufps
> 0x00007f2629a030bd:	movaps %xmm0,0x4(%rsp)     // we try to put X back to
> its original position
> 
> It sounds like the register allocator should copy xmm0 before applying
> shufps, as shufps clobbers its input, and use the copy to put it back to
> 0x4($rsp), so this code should look more like:
> 
> movss 0x4($sp), $xmm0
> movss $xmm0, $xmm1
> shufps $0x0, $xmm0, $xmm0
> // maybe copy out $xmm0 here, according to next requirements
> movss $xmm1, 0x4($sp)
> 
> Does that make sense, or am I missing something?

That does not look right. Note that the input is not needed after shufps so there is no need to save it across the instruction. Looks like an issue spilling to the stack.
Priority: -- → P5
This has been dormant for some time, and the test case was using SIMD.js, which is being removed. Let's close it and re-open another bug if this happens again with wasm SIMD when we get to implement it.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: