Bug 1708381 Comment 16 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

I checked that this problem occurs also on arm64-Ion, kind-of in contradiction
to comment 2.

----

I see the attraction of threshold-based fixes per comments 14 and 15, but I'm
concerned it could lead to performance cliffs, the avoidance of which I had
understood to be an informal goal of wasm.  Here's a different kind of
proposal.

----

A possible solution -- assuming the analysis is correct -- would be to disallow
LICM and GVN for expressions of the form `wasmHeapBase + constant`, justified
thusly:

* We can compute such expressions "for free" as part of loads/stores on all
  targets if `constant` is smallish (2^12, on arm64?), and on arm64 even if we
  have to use a second insn to synthesise a larger constant into a register.

* So there's no point in LICMing them, and allocating a second register to hold
  it (in the best case) or spilling (as here)

* For GVN, we need to avoid the danger that GVNing `wasmHeapBase + constant` in
  two separate arms would lead to the value being hoisted to a block inside the
  loop but prior to the switch (viz, the immediate dominator of each arm).
  (But will GVN do that?  It would generate partial redundancies unless the
  expression was used in *all* arms).  Even just within each arm, GVNing of
  these expressions seems marginal to me.

This leads into dangerous territory, though.  The above proposal might fix the
immediate problem.  But it only works because the simulator state structure
being accessed is parked at the start of the linear memory -- I'd guess it's a
C++ global variable and that's what emscripten does with global variables.

Consider the situation if instead the structure lived in the emscripten C++
heap [in linear memory].  Then each access would be of the form
```
  *(wasmHeapBase + structBaseOffset + constant)
```
where `structBaseOffset` would also be a parameter to fn612.

If that sum is presented in MIR as
```
  wasmHeapBase + (structBaseOffset + constant)
```
then the above rules would not help: all of the 2100 different
`structBaseOffset + constant` terms would get lifted out, and we'd be just as
badly off as at present.  If the sum was presented as
```
  (wasmHeapBase + structBaseOffset) + constant
```
then all the `wasmHeapBase + structBaseOffset` get lifted out as a single value
(good, call it `structAddr`), but that value is also obviously loop invariant,
and so all the `structAddr + constant` exprs also get lifted out, so we're
still no further forward.

Maybe the LICM-inhibition rule needs to apply to expressions of the form
`E + const` where `E` can contain `wasmHeapBase` plus any other stuff we 
want to add to it, but not any constants.  Then duplicated `E`s can get 
LICM'd/GVN'd, no problem, and we just need ensure that the addition of the
constant is the last thing that happens, and that it can't be LICM'd/GVN'd.

I'm at least moderately sure this is all bog-standard
generate-good-array-and-structure-access-code C/C++/Fortran stuff, and we just
need to look in the right textbook to find the "standard" solution.
I verified that this problem occurs also on arm64-Ion, kind-of in contradiction
to comment 2.

----

I see the attraction of threshold-based fixes per comments 14 and 15, but I'm
concerned it could lead to performance cliffs, the avoidance of which I had
understood to be an informal goal of wasm.  Here's a different kind of
proposal.

----

A possible solution -- assuming the analysis is correct -- would be to disallow
LICM and GVN for expressions of the form `wasmHeapBase + constant`, justified
thusly:

* We can compute such expressions "for free" as part of loads/stores on all
  targets if `constant` is smallish (2^12, on arm64?), and on arm64 even if we
  have to use a second insn to synthesise a larger constant into a register.

* So there's no point in LICMing them, and allocating a second register to hold
  it (in the best case) or spilling (as here)

* For GVN, we need to avoid the danger that GVNing `wasmHeapBase + constant` in
  two separate arms would lead to the value being hoisted to a block inside the
  loop but prior to the switch (viz, the immediate dominator of each arm).
  (But will GVN do that?  It would generate partial redundancies unless the
  expression was used in *all* arms).  Even just within each arm, GVNing of
  these expressions seems marginal to me.

This leads into dangerous territory, though.  The above proposal might fix the
immediate problem.  But it only works because the simulator state structure
being accessed is parked at the start of the linear memory -- I'd guess it's a
C++ global variable and that's what emscripten does with global variables.

Consider the situation if instead the structure lived in the emscripten C++
heap [in linear memory].  Then each access would be of the form
```
  *(wasmHeapBase + structBaseOffset + constant)
```
where `structBaseOffset` would also be a parameter to fn612.

If that sum is presented in MIR as
```
  wasmHeapBase + (structBaseOffset + constant)
```
then the above rules would not help: all of the 2100 different
`structBaseOffset + constant` terms would get lifted out, and we'd be just as
badly off as at present.  If the sum was presented as
```
  (wasmHeapBase + structBaseOffset) + constant
```
then all the `wasmHeapBase + structBaseOffset` get lifted out as a single value
(good, call it `structAddr`), but that value is also obviously loop invariant,
and so all the `structAddr + constant` exprs also get lifted out, so we're
still no further forward.

Maybe the LICM-inhibition rule needs to apply to expressions of the form
`E + const` where `E` can contain `wasmHeapBase` plus any other stuff we 
want to add to it, but not any constants.  Then duplicated `E`s can get 
LICM'd/GVN'd, no problem, and we just need ensure that the addition of the
constant is the last thing that happens, and that it can't be LICM'd/GVN'd.

I'm at least moderately sure this is all bog-standard
generate-good-array-and-structure-access-code C/C++/Fortran stuff, and we just
need to look in the right textbook to find the "standard" solution.

Back to Bug 1708381 Comment 16