Closed Bug 1337367 Opened 3 years ago Closed 3 years ago

Postpone spilling bundles till end of register allocation's main loop

Categories

(Core :: JavaScript Engine: JIT, defect)


Tracking


RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: sandervv, Assigned: sandervv)

References

Details

Attachments

(1 file, 1 obsolete file)

When a bundle has no hint or register requirement, the register allocator spills it immediately to avoid increasing register pressure by blocking a register that a more important bundle could use.

This patch postpones spilling these bundles until the end of the main register allocation loop. Once the main loop is done, the patch tries to allocate a register for each postponed bundle. If no register is available, the bundle is spilled as before; however, there are cases where a register is still free, and the patch then avoids the spill and uses an otherwise unused register.
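
A minimal, self-contained sketch of the idea (illustrative names only; the real change is in js/src/jit/BacktrackingAllocator.cpp and uses SpiderMonkey's own bundle and requirement machinery, and this toy ignores live ranges and interference entirely):

  // Toy allocator: bundles without a hint/requirement are queued instead of
  // being spilled right away; after the main loop they take any register
  // that is still free, and only spill if none is left.
  #include <bitset>
  #include <vector>

  struct Bundle {
      bool hasHintOrRequirement = false;
      int reg = -1;          // -1 means "no register assigned"
      bool spilled = false;
  };

  class ToyAllocator {
      static constexpr int NumRegs = 4;
      std::bitset<NumRegs> used_;
      std::vector<Bundle*> deferred_;   // spills postponed until after the main loop

    public:
      // Main loop: defer instead of spilling immediately.
      void process(Bundle* b) {
          if (!b->hasHintOrRequirement) {
              deferred_.push_back(b);
              return;
          }
          allocateOrSpill(b);
      }

      // After the main loop: last chance at an otherwise unused register.
      void finish() {
          for (Bundle* b : deferred_)
              allocateOrSpill(b);
      }

    private:
      void allocateOrSpill(Bundle* b) {
          for (int r = 0; r < NumRegs; r++) {
              if (!used_[r]) {
                  used_[r] = true;
                  b->reg = r;
                  return;
              }
          }
          b->spilled = true;   // no free register: spill to the stack as before
      }
  };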

I have seen one regression in runtime with this patch, in an Emscripten benchmark (primes) compiled using Cheerp: the runtime was 4.7s before the patch and 5.1s after. The patch only removes spills to the stack and renames registers when a spill is avoided, which should reduce code size and avoid stores to and loads from the stack. See the difference in codegen below for the regression (left: before the patch, right: after the patch; the important parts are marked with "<<<"):

  [Codegen] movabsq    $0x7fffffffffff, %rcx     |  [Codegen] movabsq    $0x7fffffffffff, %rcx
  [Codegen] andq       %rdx, %rcx                |  [Codegen] andq       %rdx, %rcx
  [Codegen] instruction MoveGroup                |  ----------------------------------------------  # <<<
  [Codegen] movq       %rcx, 0x20(%rsp)          |  ----------------------------------------------  # <<<
  [Codegen] instruction StackArgT                |  [Codegen] instruction StackArgT
  [Codegen] movabsq    $0xfff9000000000000, %r11 |  [Codegen] movabsq    $0xfff9000000000000, %r11
  [Codegen] movq       %r11, 0x0(%rsp)           |  [Codegen] movq       %r11, 0x0(%rsp)
  [Codegen] instruction StackArgV                |  [Codegen] instruction StackArgV
  [Codegen] movq       %rbx, 0x8(%rsp)           |  [Codegen] movq       %rbx, 0x8(%rsp)
  [Codegen] instruction StackArgT                |  [Codegen] instruction StackArgT
  [Codegen] movabsq    $0xfff8800000000000, %r11 |  [Codegen] movabsq    $0xfff8800000000000, %r11
  [Codegen] movq       %r11, 0x10(%rsp)          |  [Codegen] movq       %r11, 0x10(%rsp)
  [Codegen] instruction StackArgT                |  [Codegen] instruction StackArgT
  [Codegen] movl       %eax, 0x18(%rsp)          |  [Codegen] movl       %eax, 0x18(%rsp)
  [Codegen] movl       $0xfff88000, 0x1c(%rsp)   |  [Codegen] movl       $0xfff88000, 0x1c(%rsp)
  [Codegen] instruction MoveGroup                |  [Codegen] instruction MoveGroup
  [Codegen] movq       0x20(%rsp), %rax          |  [Codegen] movq       %rcx, %rax                 # <<<
  [Codegen] instruction CallGeneric              |  [Codegen] instruction CallGeneric
  [Codegen] movq       0x0(%rax), %r8            |  [Codegen] movq       0x0(%rax), %r8
  [Codegen] movq       0x0(%r8), %r8             |  [Codegen] movq       0x0(%r8), %r8
  [Codegen] cmpq       $0x1bbe540, %r8           |  [Codegen] cmpq       $0x1bbe540, %r8
  [Codegen] jne        .Lfrom712                 |  [Codegen] jne        .Lfrom705               
  [Codegen] testl      $0x10000, 0x20(%rax)      |  [Codegen] testl      $0x10000, 0x20(%rax)
  [Codegen] je         .Lfrom725                 |  [Codegen] je         .Lfrom718               
  [Codegen] movl       0x20(%rax), %ebx          |  [Codegen] movl       0x20(%rax), %ebx
  [Codegen] andl       $0xe0000000, %ebx         |  [Codegen] andl       $0xe0000000, %ebx
  [Codegen] cmpl       $0x60000000, %ebx         |  [Codegen] cmpl       $0x60000000, %ebx

It was hard to track down the regression, but it turned out to be caused by unaligned branch targets. The branch targets were part of the ModI instruction, which was executed many times in an inner loop below the assembly shown above. The branches inside ModI were not properly aligned, which caused the 400ms regression in runtime. I have a patch available for the regression that aligns the branch targets of ModI and DivI to a 4-byte boundary; with that alignment patch, the runtime even improves from the original 4.7s to 4.35s. I'll attach the alignment patch in another bug report.
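
A stand-alone sketch of the alignment idea (only the padding arithmetic; the actual follow-up patch goes through SpiderMonkey's MacroAssembler rather than a helper like this):

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Pad the emitted code with single-byte NOPs (0x90 on x86) so that the next
  // label bound, e.g. a branch target inside ModI/DivI, starts on a 4-byte
  // boundary.
  void alignBranchTarget(std::vector<uint8_t>& code, size_t alignment = 4) {
      while (code.size() % alignment != 0)
          code.push_back(0x90);   // x86 NOP
  }
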
Attachment #8834400 - Flags: review?(bhackett1024)
Assignee: nobody → sandervv
Comment on attachment 8834400 [details] [diff] [review]
postpone-spilling-bundles-till-end-of-main-loop.patch

Review of attachment 8834400 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks!

::: js/src/jit/BacktrackingAllocator.cpp
@@ +1617,5 @@
> +            if (success)
> +                break;
> +        }
> +
> +        // If the bundle has still no register, spill the bundle.

still has
Attachment #8834400 - Flags: review?(bhackett1024) → review+
Fixed typo in comment.
Attachment #8834400 - Attachment is obsolete: true
Attachment #8834422 - Flags: review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b7adf3986079
Postpone spilling bundles till after regalloc main loop r=bhackett
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b7adf3986079
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Depends on: 1337967
Depends on: 1353681
Depends on: 1368577