OdinMonkey: optimize add-with-carry and related JS patterns to exploit machine code instructions.

RESOLVED WONTFIX

Status

()

Core
JavaScript Engine: JIT
RESOLVED WONTFIX
3 years ago
a year ago

People

(Reporter: dougc, Unassigned)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
The machine code add-with-carry instruction can be emulated in JS and is an LLVM primitive that needs to representable in JS by a pattern that can be compiled back to efficient machine code.

An issue has been opened to work on the Emscripten side, the patterns that need to be supported: https://github.com/kripken/emscripten/issues/2578

For example, add-with-carry can be implemented by the pattern:

 sum = ((x | 0) + (y | 0)) | 0;
 carry = ((sum >>> 0) < (x >>> 0)) ? 1 : 0;

Or:

 sum = ((x | 0) + (y | 0)) | 0;
 carry = ((sum >>> 0) < (y >>> 0)) ? 1 : 0;

If the carry is only used in a conditional clause then the pattern can be:

 sum = ((x | 0) + (y | 0)) | 0;
 if ((sum >>> 0) < (x >>> 0)) ...

Or:

 sum = ((x | 0) + (y | 0)) | 0;
 if ((sum >>> 0) < (y >>> 0)) ...

The compiler could detect these patterns and optimize them to use add-with-carry instructions and make use of the carry result produced without the extra comparison in the above JS code.

Comment 1

3 years ago
Cool.  Does this show up in any workloads you've seen?  Do you see the signed version show up?  I guess that is UB in C++ :)

I assume the expected consumer of llvm.uadd.with.overflow is a branch?  In that case, will Emscripten be able to fold the 'carry' branch (in your second two examples above) with the branch performed by the consumer of the llvm.uadd.with.overflow?  If so then it seems like lowering could produce an LAddAndBranch that generates the desired 'add;jo'.
(Reporter)

Comment 2

3 years ago
(In reply to Luke Wagner [:luke] from comment #1)
> Cool.  Does this show up in any workloads you've seen?  Do you see the
> signed version show up?  I guess that is UB in C++ :)

Yes, it showed up as very hot on my first try of your profiler work in an asm.js game demo.
 
> I assume the expected consumer of llvm.uadd.with.overflow is a branch?

Yes, it seems so. All the examples I have seen consume the carry in a branch.

> In
> that case, will Emscripten be able to fold the 'carry' branch (in your
> second two examples above) with the branch performed by the consumer of the
> llvm.uadd.with.overflow?

That would be ideal. Looking at some example code shows that the carry is not always consumed by the immediately following instruction, so this might not be possible. It would still be useful to inline, and we can do better than emitting the extra comparison.

> If so then it seems like lowering could produce an
> LAddAndBranch that generates the desired 'add;jo'.

In some cases this should be possible. In other cases it might be optimized to an addition-with-carry followed by a single instruction to move the carry into a GRP for later use.

Comment 3

3 years ago
(In reply to Douglas Crosher [:dougc] from comment #2)
> Yes, it showed up as very hot on my first try of your profiler work in an
> asm.js game demo.

\o/

> In some cases this should be possible. In other cases it might be optimized
> to an addition-with-carry followed by a single instruction to move the carry
> into a GRP for later use.

Ok, then it seems like there would be two cases:
 1. Emscripten is able to fold the carry with a branch on the carry
   -> emit pattern 2 or 3 (in comment 0), emit add;jo
 2. Emscripten isn't able to fold the carry with a branch
   -> emit this pattern:
        sum = ((x | 0) + (y | 0)) | 0;
        carry = (((sum >>> 0) < (x >>> 0)))|0;
      and replace it with an MAddWithOverflow at the MIR-level and generate add;seto

Comment 4

3 years ago
Alright, Alon says (https://github.com/kripken/emscripten/issues/2578#issuecomment-50064995) that Emscripten emits:
 $uadd1$arith = (($mul7) + ($shl10))|0;
 $uadd1$overflow = ($uadd1$arith>>>0)<($mul7>>>0);

Alon, if the C++ code then branches on the i1 result, will it look like:
  if ($uadd1$overflow) { ... }
?

Comment 5

3 years ago
It should, yes, the eliminator pass will remove the second variable (assuming it has one use) and arrive at

 $uadd1$arith = (($mul7) + ($shl10))|0;
 if ( ($uadd1$arith>>>0) < ($mul7>>>0) ) { .. }

The first variable might also be eliminated (again, if it has only one use, which in this case would mean that we only care about whether an overflow occurred - probably rare).

To verify that the whole process works as it should, it would be good if we had a testcase to test on. But I'm not sure when LLVM emits llvm.uadd.with.overflow, no C code that I write seems to hit it, so all I have are artificial hand-written LLVM IR. sunfish, any idea?
The main place where clang emits llvm.uadd.with.overflow and llvm.umul.with.overflow is C++ operator new:

struct A {
  int x;
  A();
  ~A();
};

void *foo(unsigned long n) {
  return new A[n];
}

Comment 7

3 years ago
I assume these do as well
  http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins
  http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
?
Yep. Another thing that causes them to be emitted is -fsanitize=integer.

Comment 9

3 years ago
Ok, on that code with new[] the result looks fine, as expected.
I assume with WebAssembly opcodes, there is no point anymore in pattern-matching JS patterns in general. Please reopen if you think otherwise.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.