Bug 1758274 Comment 3 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by

Julian Seward [:jseward]

on 2022-03-18 01:21:30 PDT

TL;DR

This is the final version of the splitting-X0 (Splitting Experiment #0)
patches.  The patch is able to correctly allocate a couple of small test cases
and stays alive long enough for a more in-depth assessment on the the wasm
jit-tests.

It is clear now that design of the patch cannot work as-is.  But developing it
has cast a lot of light on how the existing allocator works, and points the
way to a possible rework that is more likely to be successful.

                               ----------------

In marginally more detail:

This patch introduces two main changes relative to the existing allocator:

1.  ::splitAt has been reworked so as to split exactly at the points
    requested.  This allows a clean separation of mechanism (implementing
    splits) from policy (deciding where to have them).  Together with a rework
    of ::trySplitAcrossCalls, this appears to give much better (viz, actually
    understandable) splitting behaviour across calls and loops, at least for
    the tiny test case in comment 2, and one other.  This change is worth
    keeping and is not discussed further.

2.  The existing allocator generates overlapping bundles when splitting.  This
    was a cause for concern -- what would that even mean? -- and so this patch
    implements a strict no-overlap policy: when a bundle is split, all of the
    resulting pieces, including the spill bundle, are mutually
    non-overlapping.  (It's a bit more complex than that, but that is the end
    result.)

    Although that worked for tiny test cases and even for some of the wasm
    test suite, the wasm tests also make clear this is unworkable.  Creating
    overlapping bundles is actually *necessary* in the most general case.
    However:
    
    - how the existing allocator does that leads to situations where we have
      unnecessary spills, particularly in loops.  Hence a rework is desirable.
    
    - we now have an answer to "what is the meaning of overlapping bundles?"

                               ----------------

In gory detail:

"SSA and the Art of Overlapping Bundles" (with hat-tip to Robert M. Pirsig)

Addressing (2) in detail.  What does it mean for bundles to overlap?  First
let's see why a strict no-overlap policy is unworkable.  Consider this LIR:
```
148-149 Float32 [def v32<f>]
150-151 WasmCall [use v32:F:xmm0] [use v32:F:xmm1] [use v3:F:r14]
```
This defines `v32` as a float value, assigned a constant by the first LIR,
which is then used twice in the second LIR.  The floatness and constantness
are irrelevant.  What matters is that the uses -- because they are for a call
-- produce a fixed register register conflict -- that value needs to be in
both xmm0 and xmm1.

The (patched) allocator produces the following bundle to encapsulate the def
and two uses:
```
LB95(p=none) v32 149-150 { 149_def:R 150_v32:F:xmm0 150_v32:F:xmm1 }
```
The fixed-reg conflict means this can't be allocated as-is, so it's handed off
to the splitter.  That is constrained by the no-overlap rule, so its only
option is to split between the two LIRs, producing:
```
LB411(p=LB410) v32 149-149 { 149_def:R }
LB412(p=LB410) v32 150-150 { 150_v32:F:xmm0 150_v32:F:xmm1 }
```
But this is useless: LB412 is still un-allocatable, but now it's unsplittable
(and unspillable) too.  The allocator asserts, and it's clear something is
broken at a conceptual level.

The original allocator (obviously) handles this without difficulty by
splitting this into three bundles, two of which overlap:
```
LB287(p=LB285 v32 149-149 { 149_def })
LB288(p=LB285 v32 150-150 { 150_v32:F:xmm0 })
LB289(p=LB285 v32 150-150 { 150_v32:F:xmm1 })
```
But what does such an overlap mean?  And in particular, what is the
MoveGroup-generation phase (::resolveControlFlow) supposed to do with it?

The answer lies in the fact that the input LIR's virtual regs (the `vXX`
values) are in SSA form.  The examples below show how to create MoveGroups for
such a case, using straight-line code (within a basic block).  The same
principles hold in the presence of arbitrary control flow -- the only
difficulty there is the added complexity of deciding where to put the
MoveGroups.

Recall that an LIR virtual register (vreg) will have a bunch of live ranges
attached to it.  Because the vreg is SSA, only one of those ranges may contain
a definition.  All the rest must contain only uses.  Hence it's OK to have
overlapping ranges in certain circumstances, because they must all by
definition hold the same value.  So when we need to "use" the value, we can
take it from any range that is live at the use point -- they are all the same.

A live range -- even with no defs and no uses -- represents a request for
storage of a value (the vreg) in some (statically) consecutive range of
instructions.  Each range will need to "acquire" that value from somewhere.  A
range that contains a definition is considered to start at the point of the
definition, so the definition provides the value.  Non-defining ranges will
need to get a copy of the value from some other range, and it is this "get a
copy" mechanism that the MoveGroup-generator (::resolveControlFlow)
implements.

Consider this, with instructions left-to-right across the page:
```
(v42 10-20 { 10_def })  (v42 21-29 { 25_use 27_use })
```
where each `(..)` pair is one range, and all (by definition) are for the same
vreg, `v42`.  Note that the actual allocations for each bundle are not shown
-- they are important for constructing the MoveGroups but are irrelevant here.

The first range defines its own value, so we ignore it.  The second range
needs to get its value from somewhere, and fortunately the first range flows
directly into it.  So ::resolveControlFlow must insert a MoveGroup in between
points 20 and 21.

Now consider this, with overlapping bundles:
```
(v42 10-20 { 10_def })  (v42 21-29 { 25_use 27_use })
            (v42 15-25 { 19_use })
```
As before we need a MoveGroup at the 20/21 transition.  The third range will
also need an initial value.  We can we get it from?  Well, from any range that
happens to be live at the third range's start point.  In this we can choose
only the defining group.  Hence we can place a MoveGroup at the 14/15
transition.  In fact, it can be anywhere before the first use in the third
group: that is, any transition in 14/15 to 18/19.

With this in mind, ::resolveControlFlow now makes a whole lot more sense.
What it basically does is to visit all of the ranges of a vreg, in increasing
order of their starting points, ignoring any overlaps.  For each new range, it
(presumably) tries to find either a predecessor or an overlapping range, and
copies the value from it:
```
LiveRange* predecessorRange =
    reg.rangeFor(start.previous(), /* preferRegister = */ true);
..
if (!moveInput(ins->toInstruction(), predecessorRange, range, reg.type())) {
```

This explains a couple of other things.  Firstly, this:
```
// The range which defines the register does not have a predecessor
// to add moves from.
if (range->hasDefinition()) {
  iter++;
  continue;
}
```
Of course .. a defining range doesn't need to acquire its value from anywhere
else.  So it is ignored.

Secondly, ::splitAt has the curious behaviour that any spill bundle created
starts from the point immediately following the definition point:
```
CodePosition from = range->from();
if (isRegisterDefinition(range)) {
  from = minimalDefEnd(insData[from]).next();
}
```
Why is that?  It is so that the "insert MoveGroup every time a new non-def
range appears" strategy will cause a spill to be inserted immediately after
the def.  Hence the value is safely spilled.  This is suboptimal though; see
below.

I believe that this overall design also has the advantage that:

* it avoids the "re-spilling" that is a natural consequence of a strict
  no-overlap discipline.  Here, if a range-for-reload comes to an end, so
  what?  We generate no MoveGroup.  Other ranges-for-reload will have to pull
  a copy from some other simultaneously live range (one in the spill bundle),
  but that's fine.

* more generally, it gives flexibility in where a non-defining group can
  acquire its value from.  It can pull it from any simultaneously-live or
  directly-flows-to-here range.  If one of those has been assigned to a stack
  slot and another to a register, then we can take a copy from the register as
  that's cheaper.

                               ----------------

The design of overlapping bundles and how they interact MoveGroup generation
is certainly ingenious.  But the above, and experimentation, does show some
weaknesses:

1.  ::resolveControlFlow again mixes up policy with implementation.  It would
be better if it limited itself to implementation.  Recall this example:
```
(v42 10-20 { 10_def })  (v42 21-29 { 25_use 27_use })
            (v42 15-25 { 19_use })
```
The transfer from the first to the third ranges can happen anywhere in 14/15
to 18/19.  ::resolveControlFlow is hardwired to use the first opportunity, viz
14/15, even if a later transition would be better.

2.  The above, combined with the fact that spill bundles are defined to start
at the point immediately after the original's definition leads to bad results
(observed in the wild).  Consider the following loop, written SSAishly for
clarity, with each `x1/2/3` representing a bundle:
```
int x1 = 0;
loop {
   x2 = phi(x1, x3);
   x3 = x2 + 1;
}
call foo()
.. use x3 ..
```
`x3` will be split into (at least) two bundles: its defining point till the
end of the loop, and the point just after the loop onwards.  The latter will
be further split across the call, but that's irrelevant here.  What matters is
that the first split will have created a spill bundle that (per ::splitAt's
behaviour) starts immediately after `x3`s definition -- inside the loop.
Unfortunately that means that ::resolveControlFlow will spill it at that point
-- inside the loop, quite unnecessarily.  What we really need is for that
spill to happen after the loop.

                               ----------------

So what now?  Possibly:

* Think up a new, more restricted semantics for splitting, that ..

* .. still allows overlapping bundles (because it must), but ..

* .. that retains the advantages of "split where I tell you to, and not at
  some other random place", as detailed far above ..

* .. and that allows placement of MoveGroups (per ::resolveControlFlow) to be
  precisely controlled.  In other words, move policy decisions out of
  ::resolveControlFlow.

Revision 1 by

Julian Seward [:jseward]

on 2022-03-18 01:25:59 PDT

TL;DR

This is the final version of the splitting-X0 (Splitting Experiment #0)
patches.  The patch is able to correctly allocate a couple of small test cases
and stays alive long enough for a more in-depth assessment on the the wasm
jit-tests.

It is clear now that design of the patch cannot work as-is.  But developing it
has cast a lot of light on how the existing allocator works, and points the
way to a possible rework that is more likely to be successful.

                               ----------------

In marginally more detail:

This patch introduces two main changes relative to the existing allocator:

1.  ::splitAt has been reworked so as to split exactly at the points
    requested.  This allows a clean separation of mechanism (implementing
    splits) from policy (deciding where to have them).  Together with a rework
    of ::trySplitAcrossCalls, this appears to give much better (viz, actually
    understandable) splitting behaviour across calls and loops, at least for
    the tiny test case in comment 2, and one other.  This change is worth
    keeping and is not discussed further.

2.  The existing allocator generates overlapping bundles when splitting.  This
    was a cause for concern -- what would that even mean? -- and so this patch
    implements a strict no-overlap policy: when a bundle is split, all of the
    resulting pieces, including the spill bundle, are mutually
    non-overlapping.  (It's a bit more complex than that, but that is the end
    result.)

    Although that worked for tiny test cases and even for some of the wasm
    test suite, the wasm tests also make clear this is unworkable.  Creating
    overlapping bundles is actually *necessary* in the most general case.
    However:
    
    - how the existing allocator does that leads to situations where we have
      unnecessary spills, particularly in loops.  Hence a rework is desirable.
    
    - we now have an answer to "what is the meaning of overlapping bundles?"

                               ----------------

In gory detail:

"SSA and the Art of Overlapping Bundles" (with hat-tip to Robert M. Pirsig)

Addressing (2) in detail.  What does it mean for bundles to overlap?  First
let's see why a strict no-overlap policy is unworkable.  Consider this LIR:
```
148-149 Float32 [def v32<f>]
150-151 WasmCall [use v32:F:xmm0] [use v32:F:xmm1] [use v3:F:r14]
```
This defines `v32` as a float value, assigned a constant by the first LIR,
which is then used twice in the second LIR.  The floatness and constantness
are irrelevant.  What matters is that the uses -- because they are for a call
-- produce a fixed register register conflict -- that value needs to be in
both xmm0 and xmm1.

The (patched) allocator produces the following bundle to encapsulate the def
and two uses:
```
LB95(p=none) v32 149-150 { 149_def:R 150_v32:F:xmm0 150_v32:F:xmm1 }
```
The fixed-reg conflict means this can't be allocated as-is, so it's handed off
to the splitter.  That is constrained by the no-overlap rule, so its only
option is to split between the two LIRs, producing:
```
LB411(p=LB410) v32 149-149 { 149_def:R }
LB412(p=LB410) v32 150-150 { 150_v32:F:xmm0 150_v32:F:xmm1 }
```
But this is useless: LB412 is still un-allocatable, but now it's unsplittable
(and unspillable) too.  The allocator asserts, and it's clear something is
broken at a conceptual level.

The original allocator (obviously) handles this without difficulty by
splitting this into three bundles, two of which overlap:
```
LB287(p=LB285 v32 149-149 { 149_def })
LB288(p=LB285 v32 150-150 { 150_v32:F:xmm0 })
LB289(p=LB285 v32 150-150 { 150_v32:F:xmm1 })
```
But what does such an overlap mean?  And in particular, what is the
MoveGroup-generation phase (::resolveControlFlow) supposed to do with it?

The answer lies in the fact that the input LIR's virtual regs (the `vXX`
values) are in SSA form.  The examples below show how to create MoveGroups for
such a case, using straight-line code (within a basic block).  The same
principles hold in the presence of arbitrary control flow -- the only
difficulty there is the added complexity of deciding where to put the
MoveGroups.

Recall that an LIR virtual register (vreg) will have a bunch of live ranges
attached to it.  Because the vreg is SSA, only one of those ranges may contain
a definition.  All the rest must contain only uses.  Hence it's OK to have
overlapping ranges in certain circumstances, because they must all by
definition hold the same value.  So when we need to "use" the value, we can
take it from any range that is live at the use point -- they are all the same.

A live range -- even with no defs and no uses -- represents a request for
storage of a value (the vreg) in some (statically) consecutive range of
instructions.  Each range will need to "acquire" that value from somewhere.  A
range that contains a definition is considered to start at the point of the
definition, so the definition provides the value.  Non-defining ranges will
need to get a copy of the value from some other range, and it is this "get a
copy" mechanism that the MoveGroup-generator (::resolveControlFlow)
implements.

Consider this, with instructions left-to-right across the page:
```
(v42 10-20 { 10_def })  (v42 21-29 { 25_use 27_use })
```
where each `(..)` pair is one range, and all (by definition) are for the same
vreg, `v42`.  Note that the actual allocations for each bundle are not shown
-- they are important for constructing the MoveGroups but are irrelevant here.

The first range defines its own value, so we ignore it.  The second range
needs to get its value from somewhere, and fortunately the first range flows
directly into it.  So ::resolveControlFlow must insert a MoveGroup in between
points 20 and 21.

Now consider this, with overlapping bundles:
```
(v42 10-20 { 10_def })  (v42 21-29 { 25_use 27_use })
            (v42 15-25 { 19_use })
```
As before we need a MoveGroup at the 20/21 transition.  The third range will
also need an initial value.  We can we get it from?  Well, from any range that
happens to be live at the third range's start point.  In this case we can choose
only the defining group.  Hence we can place a MoveGroup at the 14/15
transition.  In fact, it can be anywhere before the first use in the third
group: that is, any transition in 14/15 to 18/19.

With this in mind, ::resolveControlFlow now makes a whole lot more sense.
What it basically does is to visit all of the ranges of a vreg, in increasing
order of their starting points, ignoring any overlaps.  For each new range, it
(presumably) tries to find either a predecessor or an overlapping range, and
copies the value from it:
```
LiveRange* predecessorRange =
    reg.rangeFor(start.previous(), /* preferRegister = */ true);
..
if (!moveInput(ins->toInstruction(), predecessorRange, range, reg.type())) {
```

This explains a couple of other things.  Firstly, this:
```
// The range which defines the register does not have a predecessor
// to add moves from.
if (range->hasDefinition()) {
  iter++;
  continue;
}
```
Of course .. a defining range doesn't need to acquire its value from anywhere
else.  So it is ignored.

Secondly, ::splitAt has the curious behaviour that any spill bundle created
starts from the point immediately following the definition point:
```
CodePosition from = range->from();
if (isRegisterDefinition(range)) {
  from = minimalDefEnd(insData[from]).next();
}
```
Why is that?  It is so that the "insert MoveGroup every time a new non-def
range appears" strategy will cause a spill to be inserted immediately after
the def.  Hence the value is safely spilled.  This is suboptimal though; see
below.

I believe that this overall design also has the advantage that:

* it avoids the "re-spilling" that is a natural consequence of a strict
  no-overlap discipline.  Here, if a range-for-reload comes to an end, so
  what?  We generate no MoveGroup.  Other ranges-for-reload will have to pull
  a copy from some other simultaneously live range (one in the spill bundle),
  but that's fine.

* more generally, it gives flexibility in where a non-defining group can
  acquire its value from.  It can pull it from any simultaneously-live or
  directly-flows-to-here range.  If one of those has been assigned to a stack
  slot and another to a register, then we can take a copy from the register as
  that's cheaper.

                               ----------------

The design of overlapping bundles and how they interact MoveGroup generation
is certainly ingenious.  But the above, and experimentation, does show some
weaknesses:

1.  ::resolveControlFlow again mixes up policy with implementation.  It would
be better if it limited itself to implementation.  Recall this example:
```
(v42 10-20 { 10_def })  (v42 21-29 { 25_use 27_use })
            (v42 15-25 { 19_use })
```
The transfer from the first to the third ranges can happen anywhere in 14/15
to 18/19.  ::resolveControlFlow is hardwired to use the first opportunity, viz
14/15, even if a later transition would be better.

2.  The above, combined with the fact that spill bundles are defined to start
at the point immediately after the original's definition leads to bad results
(observed in the wild).  Consider the following loop, written SSAishly for
clarity, with each `x1/2/3` representing a bundle:
```
int x1 = 0;
loop {
   x2 = phi(x1, x3);
   x3 = x2 + 1;
}
call foo()
.. use x3 ..
```
`x3` will be split into (at least) two bundles: its defining point till the
end of the loop, and the point just after the loop onwards.  The latter will
be further split across the call, but that's irrelevant here.  What matters is
that the first split will have created a spill bundle that (per ::splitAt's
behaviour) starts immediately after `x3`s definition -- inside the loop.
Unfortunately that means that ::resolveControlFlow will spill it at that point
-- inside the loop, quite unnecessarily.  What we really need is for that
spill to happen after the loop.

                               ----------------

So what now?  Possibly:

* Think up a new, more restricted semantics for splitting, that ..

* .. still allows overlapping bundles (because it must), but ..

* .. that retains the advantages of "split where I tell you to, and not at
  some other random place", as detailed far above ..

* .. and that allows placement of MoveGroups (per ::resolveControlFlow) to be
  precisely controlled.  In other words, move policy decisions out of
  ::resolveControlFlow.

Back to Bug 1758274 Comment 3