Closed Bug 1205073 Opened 9 years ago Closed 8 years ago

Regalloc inserts unnecessary MoveGroups into hot loop of 3bit-bits-in-byte

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: sstangl, Assigned: h4writer)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 3 obsolete files)

The hot loop of SunSpider's 3bit-bits-in-byte contains four unnecessary MoveGroup instructions around Integer instructions.

> Block 7 [successor 8]
> [MoveGroup] [rbx -> rcx]
> [92,93 ShiftI] [def v32<i>:rcx] [use v30:r rcx] [use c]
> [94,95 BitOpI] [def v33<i>:rcx] [use v32:r rcx] [use c]
> [96,97 Integer] [def v34<i>:rbp]
> [98,99 ShiftI] [def v35<i>:rbp] [use v34:r rbp] [use v33:rcx rcx]
> [100,101 BitOpI] [def v36<i>:rbp] [use v35:r rbp] [use c]
> [MoveGroup] [rbx -> rcx]
> [102,103 ShiftI] [def v37<i>:rcx] [use v30:r rcx] [use c]
> [104,105 BitOpI] [def v38<i>:rcx] [use v37:r rcx] [use c]
> [MoveGroup] [rcx -> stack:4] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> [106,107 Integer] [def v39<i>:rsi]
> [MoveGroup] [stack:4 -> rcx] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> [108,109 ShiftI] [def v40<i>:rsi] [use v39:r rsi] [use v38:rcx rcx]
> [110,111 BitOpI] [def v41<i>:rsi] [use v40:r rsi] [use c]
> [112,113 AddI] [def v42<i>:rbp] [use v36:r rbp] [use v41:r? rsi]
> [MoveGroup] [rbx -> rcx]
> [114,115 ShiftI] [def v43<i>:rcx] [use v30:r rcx] [use c]
> [116,117 BitOpI] [def v44<i>:rcx] [use v43:r rcx] [use c]
> [MoveGroup] [rcx -> stack:4] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> [118,119 Integer] [def v45<i>:rsi]
> [MoveGroup] [stack:4 -> rcx] <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> [120,121 ShiftI] [def v46<i>:rsi] [use v45:r rsi] [use v44:rcx rcx]
> [122,123 BitOpI] [def v47<i>:rsi] [use v46:r rsi] [use c]
> [124,125 AddI] [def v48<i>:rbp] [use v42:r rbp] [use v47:r? rsi]
> [126,127 Goto]
Blocks: sm-js-perf
Priority: -- → P5
Attached patch Patch (obsolete) — Splinter Review
I tracked this back to a sub optimal splitting.

The instruction:
> [14,15 ShiftI] [def v6<i>:tied(0)] [use v5:r?] [use c]

What the allocator does:
> [RegAlloc] Allocating  v4 [5,13) (def) v4:*@7 v4:*@11 v4:r?@12 ## v5 [13,15) (def) v5:r?@14 ## v6 [15,20) (def) v6:%ecx@19 [priority 15] [weight 666]
> [RegAlloc]   Requirement arg:8, fixed by definition
> [RegAlloc]   bundle does not contain hot code
> [RegAlloc]   split before first register use at 18
> [RegAlloc]     splitting bundle  v4 [5,13) (def) ## v5 [13,15) (def) ## v6 [15,20) (def) into:
> [RegAlloc]        v4 [12,13) v4:r?@12 ## v5 [13,15) (def) v5:r?@14 ## v6 [15,16) (def)
> [RegAlloc]        v6 [18,20) v6:%ecx@19
> [RegAlloc]        v4 [5,13) (def) v4:*@7 v4:*@11 ## v5 [14,15) ## v6 [16,20)

We split the bundle at the first register "use", which is 18 here.
Now this seems incorrect to me. The v5 and v6 "defs" are register defines.
As a result this splitting splits "v6" at codepoint 18, which splits the "define and use" of that virtual reg, while both will be in registers.

Given that "trySplitBeforeFirstRegisterUse" tries to split non-register uses from register uses, we should actually also split at defines that need a register use. The attachment does that

> [RegAlloc] Allocating  v4 [5,13) (def) v4:*@7 v4:*@11 v4:r?@12 ## v5 [13,15) (def) v5:r?@14 ## v6 [15,20) (def) v6:%ecx@19 [priority 15] [weight 666]
> [RegAlloc]   Requirement arg:8, fixed by definition
> [RegAlloc]   bundle does not contain hot code
> [RegAlloc]   split before first register use at 13
> [RegAlloc]     splitting bundle  v4 [5,13) (def) ## v5 [13,15) (def) ## v6 [15,20) (def) into:
> [RegAlloc]        v4 [12,13) v4:r?@12
> [RegAlloc]        v5 [13,15) (def) v5:r?@14 ## v6 [15,20) (def) v6:%ecx@19
> [RegAlloc]        v4 [5,13) (def) v4:*@7 v4:*@11 ## v5 [14,15) ## v6 [16,20)

You now see that we split at "13", which is a define that is a reuseDef of a register. Which seems more logical. And we don't have this awkward split at 18, which split two register uses. 


I looked at octane for how many "moves in movegroups we have" and could confirm this is a reduction.
To have a better view I only looked at octane benchmarks that were giving stable numbers, but even in the unstable numbers it looked good:
>            Before Patched
> Richards:     653     642
> DeltaBlue:   1652    1641
> Raytrace:    2131    2116
> zlib:        8139    8138
Assignee: nobody → hv1989
Attachment #8803269 - Flags: review?(bhackett1024)
Attached patch Even better? (obsolete) — Splinter Review
I'm still sometimes surprised by what splitAt does. Therefore not entirely sure of everything.

Two potential improvements here:
1) I wasn't getting a nice split. Not sure this is caused by not setting the split position correct? Or is this splitAt working incorrectly for defines. I would think it would be best to get the whole LiveRange out of the Bundle. I think that is what supposed to happen.

2) firstRegisterFrom seems to not break out of both loops. As are result this doesn't contain firstRegisterFrom, but firstRegisterFrom on last LiveRange? Fixed it here.
Attachment #8803275 - Flags: feedback?(bhackett1024)
Comment on attachment 8803275 [details] [diff] [review]
Even better?

Review of attachment 8803275 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BacktrackingAllocator.cpp
@@ +2776,4 @@
>          LiveRange* range = LiveRange::get(*iter);
>  
>          if (range->hasDefinition() && isRegisterDefinition(range)) {
> +            firstRegisterFrom = range->from().previous();

I'm not sure why range->from() is not doing what you want, splitAt should be putting the entire range into a new bundle if range->from() is a split position, per the UseNewBundle(..., range->from(), ...) call.

@@ +2790,4 @@
>                  }
>              }
>          }
> +        if (firstRegisterFrom > CodePosition::MIN)

Maybe test firstRegisterFrom.bits(), like the tests below this?
Attachment #8803275 - Flags: feedback?(bhackett1024) → feedback+
Attachment #8803269 - Flags: review?(bhackett1024) → review+
Attached patch Patch (obsolete) — Splinter Review
Thank you for the quick review. I double checked the split without the "previous()" and I think I made a mistake. Without the "previous()" it works fine.

I've put everything in one patch and also copied the "conflict" checks. This gives me:

>            Before Patched
> Richards:     653     637
> DeltaBlue:   1652    1613
> Raytrace:    2131    2107
> zlib:        8139    8135
Attachment #8803269 - Attachment is obsolete: true
Attachment #8803275 - Attachment is obsolete: true
Attachment #8803471 - Flags: review?(bhackett1024)
Attached patch PatchSplinter Review
And now with the correct patch.
Attachment #8803471 - Attachment is obsolete: true
Attachment #8803471 - Flags: review?(bhackett1024)
Attachment #8803476 - Flags: review?(bhackett1024)
Attachment #8803476 - Flags: review?(bhackett1024) → review+
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/22ed1c400c4e
IonMonkey - Take definition type in consideration when splitting before first register use, r=bhackett
https://hg.mozilla.org/mozilla-central/rev/22ed1c400c4e
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: