Open Bug 1961056 Opened 11 months ago Updated 11 months ago

Improve alias analysis for nbody

Categories

(Core :: JavaScript Engine: JIT, task, P2)

task

Tracking

()

People

(Reporter: iain, Unassigned)

References

(Blocks 1 open bug)

Details

There are a number of hotspots in JS3 where Ion simply generates slower code than Turbofan (with no/minimal calls to builtins, etc). NBodySystem.prototype.advance in n-body-SP is one of the hottest. I did an instruction-by-instruction comparison of the generated assembly for a cut-down version:

NBodySystem.prototype.advance = function(dt){
   var size = this.bodies.length;

   for (var i=0; i<size; i++) {
      var body = this.bodies[i];
      body.x += dt * body.vx;
   }
}

Here are the hot inner loops. First, V8 (note that V8 peels one iteration from the start of the loop, and size is always 5 here, so the sample counts are slightly misleading):

TOP-OF-LOOP:
49	mov r9d, r12d

// check loop exit condition
27	cmp r9d, r8d
	jnb 0x1e37cc
34	lea r12d, dword [r9 + 0x1]

// load bodies[i]
43	mov eax, dword [rcx + r9 * 4 + 0x7]
40	add rax, r14

// Guard that it's not a SMI	
31	test al, 0x1
	jz 0x1e3997

// Shape-guard bodies[i]
38	mov ebx, dword [rax - 0x1]
27	add rbx, r14
33	cmp ebx, r11d
	jnz 0x1e38b5

// Load bodies[i].x
54	mov r9d, dword [rax + 0xb]
38	add r9, r14

// Load bodies[i].vx
46	mov eax, dword [rax + 0x17]

// Multiply bodies[i].vx by dt
49	vmulsd xmm0, xmm2, qword [r14 + rax * 1 + 0x3]

// Add this.bodies[i].x
92	vaddsd xmm0, xmm0, qword [r9 + 0x3]

// Store it back	
54	vmovsd qword [r9 + 0x3], xmm0

// Interrupt check / backedge
58	cmp byte [r13 - 0x4f], 0x0
37	jz 0x1e3774 (TOP-OF-LOOP)

Second, SM:

InterruptCheck		
	22	mov r11, 0x716d3d835aac
	60	cmp dword [r11], 0x0
	105	jnz 0xd677
CompareAndBranch	
	82	cmp edx, ecx
		jge 0xd5cc
LoadFixedSlotAndUnbox // this.bodies	
	102	mov r11, -0x2000000000000
	10	xor r11, qword [rax + 0x18]
	73	mov rbx, r11
	1	shr r11, 0x2f
	49	jnz 0xd6a1
GuardShape // this.bodies
	33	mov r11, 0x319f1966c840
	36	cmp qword [rbx], r11
	1	jnz 0xd6a8
Elements // this.bodies[]
	92	mov rsi, qword [rbx + 0x10]
InitializedLength	
	63	mov ebx, dword [rsi - 0xc]
BoundsCheck	
	32	cmp ebx, edx
	2	jna 0xd6af
LoadElementAndUnbox // this.bodies[i]
	72	mov r11, -0x2000000000000
		xor r11, qword [rsi + rdx * 8]
	40	mov rbx, r11
	25	shr r11, 0x2f
	28	jnz 0xd6b9
GuardShape // this.bodies[i]
	318	mov r11, 0x319f196753a0
	113	cmp qword [rbx], r11
	3	jnz 0xd6c3
LoadFixedSlotAndUnbox // this.bodies[i].x
	40	mov r11, qword [rbx + 0x18]
	78	shr r11, 0x2f
	18	cmp r11d, 0x1fff0
		jna 0xd546
		cmp dword [rbx + 0x1c], -0x78000
		jnz 0xd6cd
		xorpd xmm1, xmm1
		cvtsi2sd xmm1, dword [rbx + 0x18]
		jmp 0xd54b
	62	vmovsd xmm1, qword [rbx + 0x18]
LoadFixedSlotAndUnbox // this.bodies[i].vx
	109	mov r11, qword [rbx + 0x30]
	51	shr r11, 0x2f
	462	cmp r11d, 0x1fff0
		jna 0xd57b
		cmp dword [rbx + 0x34], -0x78000
		jnz 0xd6d7
		xorpd xmm2, xmm2
		cvtsi2sd xmm2, dword [rbx + 0x30]
		jmp 0xd580
	90	vmovsd xmm2, qword [rbx + 0x30]
MathD
	150	vmulsd xmm2, xmm0, xmm2
MathD	
	239	addsd xmm1, xmm2
StoreFixedSlotT // this.bodies[i].x
	307	mov r11, 0x716d3d84fe10
	19	test dword [r11], 0x1
	133	jz 0xd5bf
		mov r11, qword [rbx + 0x18]
		shr r11, 0x2f
		cmp r11d, 0x1fff6
		jb 0xd5bf
		push rdx
		lea rdx, qword [rbx + 0x18]
		call 0xfffffffffffeba15
		pop rdx
	28	vmovsd qword [rbx + 0x18], xmm1
AddI	
	129	add edx, 0x1
		jmp 0xd48f

The obvious difference here is that we have way more code in the loop. In particular, we're loading this.bodies.elements each time through the loop, whereas V8 correctly identifies that it doesn't alias anything in the loop and can be hoisted out.

LICM doesn't hoist anything because everything depends on this.bodies, and alias analysis indicates that the load of this.bodies could alias with the store of body.x.

My first thought was that we could note that these two objects have different shapes, so they can't alias. A quick test patch closed more than half of the performance gap on my cut-down testcase. However, it's not clear that the logic is correct in general; for example, storing to an object of shape A might change its shape to shape B, in which case loading a property from an object of shape B would definitely alias a store to shape A.

So we would need to find a different way to improve the aliasing here.

Other differences that may or may not matter:

  • When we unbox a Number we always generate paths for both Int32 and Double, whereas IIUC V8 knows that the field is a double because they reshape objects when Int32 changes to Double.
  • V8 does their interrupt check at the end of the loop, and unifies it with the backedge.

I presume, that the shape guard is sufficient for their alias analysis if the shape encodes the type which does not overlap with the aliasing bit set. Then the store operation probably does not change the type and thus would not alias.

I suspect one solution would be to implement Bug 1038917, as Kannan suggested a long time ago, or an equivalent based on alias sets & MIRType maybe.

Yeah, I was thinking that it might be sound to compare baseshape instead, because that shouldn't change as you add properties. It's not obvious to me that we're allowed to load the baseshape from a shape off-thread, though.

Actually, as I write this, it occurs to me that my current patch might actually be sound. An arbitrary store to an object might change its shape, but an MAddFixedSlot won't; you would need an MAddAndStoreSlot. So if you have MStoreFixedSlot with shape S1, and MLoadFixedSlot with shape S2, they can't alias each other directly without another intervening op. If that op exists, and is guaranteed to alias the MLoadFixedSlot, then we can safely say that the load and the store don't alias each other.

But then looking at the alias set for MAllocateAndStoreSlot, it only aliases ObjectFields and DynamicSlots. So maybe something could go wrong there? I've spent some time fruitlessly trying to break my current patch that way, and I have a feeling that it isn't actually possible, but the justification for why that's the case feels pretty fragile. Hmm.

You need to log in before you can comment on or make changes to this bug.