Tried the backtracking register allocator and noticed that some of the larger asm.js game demos are freezing, for example Citadel and BananaBench. Examined one of the hot loops that Citadel was caught in but could not spot a register allocation problem. A differential analysis might be possible and help narrow it down.
Some of the errors have been identified and appear to be a semantic differences in the register intervals for fixed register uses. For example LIRGeneratorX86Shared::lowerUDiv targets the first argument to EAX using useFixed(div->getOperand(0), eax) however the backtracking register allocator appears to interpret this as EAX still holding the argument live at the end of the operation. This operation destructively modifies EAX causing a computation error when using the backtracking allocator. The LSRA appears to handle this case differently. The logical difference is around line 690 of LiveRangeAllocator.cpp. Is the backtracking allocator expected to handle such usage in the same way as the LSRA? Would it be best to re-work the ops, such as in lowerUDiv, to allocate a fixed temp for EAX, and might it be possible to target the argument to the same register as this temp to avoid a move when possible?
The two register allocators treat live intervals somewhat differently. The backtracking allocator doesn't rely on fixed ranges for every place an instruction has a predefined use/def register, in the way that LSRA does. This is nice because the backtracking allocator is not then required to move/spill an instruction live across such an instruction if it does not redefine the fixed register. Instead, the backtracking allocator looks at the uses and definitions associated with an interval to decide where it can be allocated. This is done in BacktrackingAllocator::setIntervalRequirement. This should work properly for UDiv, which defines a fixed register for eax, as the vreg defined by the UDiv must be in eax, and satisfying that will require evicting whatever input vreg was in eax. So this should be a problem with the allocator rather than the instruction. Do you have a reduced testcase? Also, have you tried this in a debug build? We run an analysis to check the correctness of the allocation (AllocationIntegrityState::checkIntegrity) which should catch this sort of error.
(In reply to Brian Hackett (:bhackett) from comment #2) > ... This should work properly > for UDiv, which defines a fixed register for eax, as the vreg defined by the > UDiv must be in eax, and satisfying that will require evicting whatever > input vreg was in eax. So this should be a problem with the allocator > rather than the instruction. But does it expect that eax is modified by the instruction? The 'div' instruction requires an argument in eax (and edx), but also requires the result to be in eax and edx. Might it be pinning the vreg to eax but not splitting it to take into account that eax is modified and ends its life at the start of the operation? > Do you have a reduced testcase? Also, have you tried this in a debug build? > We run an analysis to check the correctness of the allocation > (AllocationIntegrityState::checkIntegrity) which should catch this sort of > error. Sorry, no luck reproducing this out of context. Here is an example from Citadel. The source is: h = i32[i >> 2]; i = b + 1 + h; b = i - ((i >>> 0) % (h >>> 0) >>> 0) | 0 The Odin compiled x86 assembly code: 0xb8878455: mov -0x70cff000(%eax),%ecx h = i32[i >> 2] 0xb887845b: mov 0x24(%esp),%eax load arg b 0xb887845f: add $0x1,%eax b + 1 0xb8878462: add %ecx,%eax i = b + 1 + h ((i >>> 0) % (h >>> 0) >>> 0) | 0 0xb8878464: test %ecx,%ecx h == 0 0xb8878466: je 0xb8878552 div by zero ool branch 0xb887846c: xor %edx,%edx zero upper 32 bits, its a 64/32 bit op. 0xb887846e: div %ecx 0xb8878470: sub %edx,%eax Expected: i - ((i >>> 0) % (h >>> 0) >>> 0) | 0 This last instruction appears broken, with bad register allocation. EAX has been destroyed by the 'div' instruction, yet the instruction is assuming that 'i' is still live in EAX. This is on a DEBUG build, and no assertions failed.
Got an example illustrating the register allocation failure when using the backtracking register allocator. Here's some of the debug output: Register Allocation: Bock 0 [2,3 Label] [4,5 AsmJSCheckOverRecursed] [6,7 AsmJSParameter] [def v1 arg:0] [8,9 Nop] [0,1 MoveGroup] [arg:0 -> =eax] [10,11 AddI] [def v2 =eax] [use v1:r =eax] [use c c] [12,13 Integer] [def v3 =ecx] [14,15 UDivOrMod] [def v4 =edx] [use v2:eax =eax] [use v3:r =ecx] [16,17 Nop] [18,19 SubI] [def v5 =eax] [use v2:r =eax] [use v4:* =edx] [20,21 AsmJSReturn] [use v5:eax =eax] [Codegen] cmpl %esp, 0xffffffff [Codegen] jae ((12)) [Codegen] movl 0x4(%esp), %eax [Codegen] addl $0xb, %eax [Codegen] movl $0x3, %ecx [Codegen] testl %ecx, %ecx [Codegen] je ((32)) [Codegen] xorl %edx, %edx [Codegen] div %ecx [Codegen] subl %edx, %eax [Codegen] ret
Great, thanks! This is actually a lowering problem with UMod, not UDiv. UMod marked eax as an input, but not as an output or temp, so the regalloc had no reason to believe that eax would be modified by the instruction. This works with LSRA because it (inefficiently) treats fixed inputs in effectively the same way as temps. With this patch the test works with both LSRA and the backtracking allocator.
Assignee: nobody → bhackett1024
Attachment #814628 - Flags: review?(jdemooij)
Attachment #814628 - Flags: review?(jdemooij) → review+
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Thank you for the quick patch. Differential testing of Citadel on the x86 and x64 shows no difference in asm.js heap stores when comparing runs using the LSRA versus Backtracking register allocator - so it's looking solid. Have only encountered one minor issue on the ARM backend using the backtracking register allocator, and with this fixed everything tried so far is running. Needs more testing.
You need to log in before you can comment on or make changes to this bug.