IonMonkey: backtracking allocator errors for large asm.js game demos.

RESOLVED FIXED in mozilla27

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: dougc, Assigned: bhackett)

Tracking

unspecified
mozilla27
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

Tried the backtracking register allocator and noticed that some of the larger asm.js game demos are freezing, for example Citadel and BananaBench.  Examined one of the hot loops that Citadel was caught in but could not spot a register allocation problem.  A differential analysis might be possible and help narrow it down.
Some of the errors have been identified and appear to be a semantic differences in the register intervals for fixed register uses.

For example LIRGeneratorX86Shared::lowerUDiv targets the first argument to EAX using useFixed(div->getOperand(0), eax) however the backtracking register allocator appears to interpret this as EAX still holding the argument live at the end of the operation.  This operation destructively modifies EAX causing a computation error when using the backtracking allocator.  The LSRA appears to handle this case differently. The logical difference is around line 690 of LiveRangeAllocator.cpp.

Is the backtracking allocator expected to handle such usage in the same way as the LSRA?

Would it be best to re-work the ops, such as in lowerUDiv, to allocate a fixed temp for EAX, and might it be possible to target the argument to the same register as this temp to avoid a move when possible?
Flags: needinfo?(bhackett1024)
The two register allocators treat live intervals somewhat differently.  The backtracking allocator doesn't rely on fixed ranges for every place an instruction has a predefined use/def register, in the way that LSRA does.  This is nice because the backtracking allocator is not then required to move/spill an instruction live across such an instruction if it does not redefine the fixed register.  Instead, the backtracking allocator looks at the uses and definitions associated with an interval to decide where it can be allocated.  This is done in BacktrackingAllocator::setIntervalRequirement.  This should work properly for UDiv, which defines a fixed register for eax, as the vreg defined by the UDiv must be in eax, and satisfying that will require evicting whatever input vreg was in eax.  So this should be a problem with the allocator rather than the instruction.

Do you have a reduced testcase?  Also, have you tried this in a debug build?  We run an analysis to check the correctness of the allocation (AllocationIntegrityState::checkIntegrity) which should catch this sort of error.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #2)
> ... This should work properly
> for UDiv, which defines a fixed register for eax, as the vreg defined by the
> UDiv must be in eax, and satisfying that will require evicting whatever
> input vreg was in eax.  So this should be a problem with the allocator
> rather than the instruction.

But does it expect that eax is modified by the instruction?

The 'div' instruction requires an argument in eax (and edx), but also requires the result to be in eax and edx.  Might it be pinning the vreg to eax but not splitting it to take into account that eax is modified and ends its life at the start of the operation?

> Do you have a reduced testcase?  Also, have you tried this in a debug build?
> We run an analysis to check the correctness of the allocation
> (AllocationIntegrityState::checkIntegrity) which should catch this sort of
> error.

Sorry, no luck reproducing this out of context.  Here is an example from Citadel.

The source is:
  h = i32[i >> 2];
  i = b + 1 + h;
  b = i - ((i >>> 0) % (h >>> 0) >>> 0) | 0

The Odin compiled x86 assembly code:
   0xb8878455:	mov    -0x70cff000(%eax),%ecx       h = i32[i >> 2]
   0xb887845b:	mov    0x24(%esp),%eax              load arg b
   0xb887845f:	add    $0x1,%eax                    b + 1
   0xb8878462:	add    %ecx,%eax                    i = b + 1 + h

                                                    ((i >>> 0) % (h >>> 0) >>> 0) | 0
   0xb8878464:	test   %ecx,%ecx                    h == 0
   0xb8878466:	je     0xb8878552                   div by zero ool branch
   0xb887846c:	xor    %edx,%edx                    zero upper 32 bits, its a 64/32 bit op.
   0xb887846e:	div    %ecx                         

   0xb8878470:	sub    %edx,%eax                    Expected: i - ((i >>> 0) % (h >>> 0) >>> 0) | 0

This last instruction appears broken, with bad register allocation. EAX has been destroyed by the 'div' instruction, yet the instruction is assuming that 'i' is still live in EAX.

This is on a DEBUG build, and no assertions failed.
Got an example illustrating the register allocation failure when using the backtracking register allocator.  Here's some of the debug output:

Register Allocation:
Bock 0
[2,3 Label]
[4,5 AsmJSCheckOverRecursed]
[6,7 AsmJSParameter] [def v1 arg:0]
[8,9 Nop]
[0,1 MoveGroup] [arg:0 -> =eax]
[10,11 AddI] [def v2 =eax] [use v1:r =eax] [use c c]
[12,13 Integer] [def v3 =ecx]
[14,15 UDivOrMod] [def v4 =edx] [use v2:eax =eax] [use v3:r =ecx]
[16,17 Nop]
[18,19 SubI] [def v5 =eax] [use v2:r =eax] [use v4:* =edx]
[20,21 AsmJSReturn] [use v5:eax =eax]

[Codegen] cmpl       %esp, 0xffffffff
[Codegen] jae        ((12))
[Codegen] movl       0x4(%esp), %eax
[Codegen] addl       $0xb, %eax
[Codegen] movl       $0x3, %ecx
[Codegen] testl      %ecx, %ecx
[Codegen] je         ((32))
[Codegen] xorl       %edx, %edx
[Codegen] div        %ecx
[Codegen] subl       %edx, %eax
[Codegen] ret
Posted patch patchSplinter Review
Great, thanks!  This is actually a lowering problem with UMod, not UDiv.  UMod marked eax as an input, but not as an output or temp, so the regalloc had no reason to believe that eax would be modified by the instruction.  This works with LSRA because it (inefficiently) treats fixed inputs in effectively the same way as temps.  With this patch the test works with both LSRA and the backtracking allocator.
Assignee: nobody → bhackett1024
Attachment #814628 - Flags: review?(jdemooij)
Component: JavaScript Engine → JavaScript Engine: JIT
Attachment #814628 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/mozilla-central/rev/266e7e5e3a2b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Thank you for the quick patch.

Differential testing of Citadel on the x86 and x64 shows no difference in asm.js heap stores when comparing runs using the LSRA versus Backtracking register allocator - so it's looking solid.

Have only encountered one minor issue on the ARM backend using the backtracking register allocator, and with this fixed everything tried so far is running.  Needs more testing.
You need to log in before you can comment on or make changes to this bug.