Improve alias analysis for nbody
Categories
(Core :: JavaScript Engine: JIT, task, P2)
Tracking
()
People
(Reporter: iain, Unassigned)
References
(Blocks 1 open bug)
Details
There are a number of hotspots in JS3 where Ion simply generates slower code than Turbofan (with no/minimal calls to builtins, etc). NBodySystem.prototype.advance in n-body-SP is one of the hottest. I did an instruction-by-instruction comparison of the generated assembly for a cut-down version:
NBodySystem.prototype.advance = function(dt){
var size = this.bodies.length;
for (var i=0; i<size; i++) {
var body = this.bodies[i];
body.x += dt * body.vx;
}
}
Here are the hot inner loops. First, V8 (note that V8 peels one iteration from the start of the loop, and size is always 5 here, so the sample counts are slightly misleading):
TOP-OF-LOOP:
49 mov r9d, r12d
// check loop exit condition
27 cmp r9d, r8d
jnb 0x1e37cc
34 lea r12d, dword [r9 + 0x1]
// load bodies[i]
43 mov eax, dword [rcx + r9 * 4 + 0x7]
40 add rax, r14
// Guard that it's not a SMI
31 test al, 0x1
jz 0x1e3997
// Shape-guard bodies[i]
38 mov ebx, dword [rax - 0x1]
27 add rbx, r14
33 cmp ebx, r11d
jnz 0x1e38b5
// Load bodies[i].x
54 mov r9d, dword [rax + 0xb]
38 add r9, r14
// Load bodies[i].vx
46 mov eax, dword [rax + 0x17]
// Multiply bodies[i].vx by dt
49 vmulsd xmm0, xmm2, qword [r14 + rax * 1 + 0x3]
// Add this.bodies[i].x
92 vaddsd xmm0, xmm0, qword [r9 + 0x3]
// Store it back
54 vmovsd qword [r9 + 0x3], xmm0
// Interrupt check / backedge
58 cmp byte [r13 - 0x4f], 0x0
37 jz 0x1e3774 (TOP-OF-LOOP)
Second, SM:
InterruptCheck
22 mov r11, 0x716d3d835aac
60 cmp dword [r11], 0x0
105 jnz 0xd677
CompareAndBranch
82 cmp edx, ecx
jge 0xd5cc
LoadFixedSlotAndUnbox // this.bodies
102 mov r11, -0x2000000000000
10 xor r11, qword [rax + 0x18]
73 mov rbx, r11
1 shr r11, 0x2f
49 jnz 0xd6a1
GuardShape // this.bodies
33 mov r11, 0x319f1966c840
36 cmp qword [rbx], r11
1 jnz 0xd6a8
Elements // this.bodies[]
92 mov rsi, qword [rbx + 0x10]
InitializedLength
63 mov ebx, dword [rsi - 0xc]
BoundsCheck
32 cmp ebx, edx
2 jna 0xd6af
LoadElementAndUnbox // this.bodies[i]
72 mov r11, -0x2000000000000
xor r11, qword [rsi + rdx * 8]
40 mov rbx, r11
25 shr r11, 0x2f
28 jnz 0xd6b9
GuardShape // this.bodies[i]
318 mov r11, 0x319f196753a0
113 cmp qword [rbx], r11
3 jnz 0xd6c3
LoadFixedSlotAndUnbox // this.bodies[i].x
40 mov r11, qword [rbx + 0x18]
78 shr r11, 0x2f
18 cmp r11d, 0x1fff0
jna 0xd546
cmp dword [rbx + 0x1c], -0x78000
jnz 0xd6cd
xorpd xmm1, xmm1
cvtsi2sd xmm1, dword [rbx + 0x18]
jmp 0xd54b
62 vmovsd xmm1, qword [rbx + 0x18]
LoadFixedSlotAndUnbox // this.bodies[i].vx
109 mov r11, qword [rbx + 0x30]
51 shr r11, 0x2f
462 cmp r11d, 0x1fff0
jna 0xd57b
cmp dword [rbx + 0x34], -0x78000
jnz 0xd6d7
xorpd xmm2, xmm2
cvtsi2sd xmm2, dword [rbx + 0x30]
jmp 0xd580
90 vmovsd xmm2, qword [rbx + 0x30]
MathD
150 vmulsd xmm2, xmm0, xmm2
MathD
239 addsd xmm1, xmm2
StoreFixedSlotT // this.bodies[i].x
307 mov r11, 0x716d3d84fe10
19 test dword [r11], 0x1
133 jz 0xd5bf
mov r11, qword [rbx + 0x18]
shr r11, 0x2f
cmp r11d, 0x1fff6
jb 0xd5bf
push rdx
lea rdx, qword [rbx + 0x18]
call 0xfffffffffffeba15
pop rdx
28 vmovsd qword [rbx + 0x18], xmm1
AddI
129 add edx, 0x1
jmp 0xd48f
The obvious difference here is that we have way more code in the loop. In particular, we're loading this.bodies.elements each time through the loop, whereas V8 correctly identifies that it doesn't alias anything in the loop and can be hoisted out.
LICM doesn't hoist anything because everything depends on this.bodies, and alias analysis indicates that the load of this.bodies could alias with the store of body.x.
My first thought was that we could note that these two objects have different shapes, so they can't alias. A quick test patch closed more than half of the performance gap on my cut-down testcase. However, it's not clear that the logic is correct in general; for example, storing to an object of shape A might change its shape to shape B, in which case loading a property from an object of shape B would definitely alias a store to shape A.
So we would need to find a different way to improve the aliasing here.
| Reporter | ||
Comment 1•11 months ago
|
||
Other differences that may or may not matter:
- When we unbox a Number we always generate paths for both Int32 and Double, whereas IIUC V8 knows that the field is a double because they reshape objects when Int32 changes to Double.
- V8 does their interrupt check at the end of the loop, and unifies it with the backedge.
Comment 2•11 months ago
|
||
I presume, that the shape guard is sufficient for their alias analysis if the shape encodes the type which does not overlap with the aliasing bit set. Then the store operation probably does not change the type and thus would not alias.
I suspect one solution would be to implement Bug 1038917, as Kannan suggested a long time ago, or an equivalent based on alias sets & MIRType maybe.
| Reporter | ||
Comment 3•11 months ago
|
||
Yeah, I was thinking that it might be sound to compare baseshape instead, because that shouldn't change as you add properties. It's not obvious to me that we're allowed to load the baseshape from a shape off-thread, though.
Actually, as I write this, it occurs to me that my current patch might actually be sound. An arbitrary store to an object might change its shape, but an MAddFixedSlot won't; you would need an MAddAndStoreSlot. So if you have MStoreFixedSlot with shape S1, and MLoadFixedSlot with shape S2, they can't alias each other directly without another intervening op. If that op exists, and is guaranteed to alias the MLoadFixedSlot, then we can safely say that the load and the store don't alias each other.
But then looking at the alias set for MAllocateAndStoreSlot, it only aliases ObjectFields and DynamicSlots. So maybe something could go wrong there? I've spent some time fruitlessly trying to break my current patch that way, and I have a feeling that it isn't actually possible, but the justification for why that's the case feels pretty fragile. Hmm.
Description
•