Closed Bug 1321189 Opened 3 years ago Closed 3 years ago

wasm: integer div-by-zero false positive

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox52 + fixed
firefox53 --- fixed

People

(Reporter: pipcet, Assigned: bbouvier)

References

Details

Attachments

(2 files, 1 obsolete file)

I suspect there is a bug in ion which causes UDivOrModConstant instructions to be emitted even in cases where the control flow never reaches the division in question, which causes false "integer division by zero" runtime errors in wasm.

I'm not a hundred percent certain yet this isn't just a compiler bug, but I've looked at the wasm code for a while and it looks correct to me.

I'm compiling this C code:

UDItype
__udivmoddi4 (UDItype n, UDItype d, UDItype *rp)
{
  const DWunion nn = {.ll = n};
  const DWunion dd = {.ll = d};
  DWunion rr;
  USItype d0, d1, n0, n1, n2;
  USItype q0, q1;
  USItype b, bm;

  d0 = dd.s.low;
  d1 = dd.s.high;
  n0 = nn.s.low;
  n1 = nn.s.high;
  if (d1 == 0)
    {
      if (d0 <= n1)
        {
          if (d0 == 0) {
            volatile int i = 0;
            if (i == 0)
              return 0;
            d0 = 1 / d0;
          }

          do {
            USItype __d1, __d0, __q1, __q0;
            USItype __r1, __r0, __m;
            __d1 = (((USItype) (d0)) | 1);
            (n0) = (n1) % __d1;
          } while (0);


        }

      if (rp != 0)
        {
          rr.s.low = n0 >> bm;
          rr.s.high = 0;
          *rp = rr.ll;
        }
    }
}

(derived from libgcc) to wasm using my gcc backend (https://github.com/pipcet/asmjs). There are two divison operations in this code; the second one is never a modulo by zero because its argument is ored with 1; the first one is a division by zero, but control can never reach it.  Yet, when I execute the resulting wasm code, passing 1 and 1 as arguments, I get an "integer divide by zero" runtime error at the bytecode offset corresponding to the 1/0 division.

I'm attaching the entire AST dump, but here are what I believe to be the relevant bits:

            (block $label$4
              (block $label$5
                (block $label$6
                  (block $label$7
                    (block $label$8
                      (br_table $label$2 $label$7 $label$6 $label$5 $label$4 $label$3 $label$8
                        (get_local $var$0)
                      )
                    ...
                    )
                  ...
                  )
                ...
                )
              ...
              )
              (i32.store
                (i32.add
                  (get_local $var$7)
                  (i32.const -4)
                )
                (i32.const 0)
              )
              (set_local $var$9
                (i32.load
                  (i32.add
                    (get_local $var$7)
                    (i32.const -4)
                  )
                )
              )
              (if
                (i32.ne
                  (get_local $var$9)
                  (i32.const 0)
                )
                (block $label$13
                  (set_local $var$0
                    (i32.const 4)
                  )
                  (br $label$1)
                )
              )
              (i32.store
                (get_local $var$2)
                (i32.const 0)
              )
              (i32.store
                (i32.add
                  (get_local $var$2)
                  (i32.const 4)
                )
                (i32.const 0)
              )
              (return
                (i32.add
                  (i32.const 80)
                  (get_local $var$7)
                )
              )
            )
            (set_local $var$9
              (i32.div_u
                (i32.const 1)
                (i32.const 0)
              )
            )

(There is no other break to the loop with $var$0 = 4). That code looks correct to me, and the division should never be reached. Yet the generated assembler code begins:

[Codegen] instruction WasmParameter
[Codegen] instruction WasmParameter
[Codegen] instruction MoveGroup
[Codegen] movl       %esi, 0x9c(%rsp)
[Codegen] instruction WasmParameter
[Codegen] instruction MoveGroup
[Codegen] movl       %edx, %r10d
[Codegen] instruction WasmParameter
[Codegen] instruction MoveGroup
[Codegen] movl       %ecx, 0x98(%rsp)
[Codegen] instruction WasmParameter
[Codegen] instruction MoveGroup
[Codegen] movl       %r8d, 0x94(%rsp)
[Codegen] instruction WasmParameter
[Codegen] instruction MoveGroup
[Codegen] movl       %r9d, 0x90(%rsp)
[Codegen] instruction WasmParameter
[Codegen] instruction MoveGroup
[Codegen] movl       0x9c(%rsp), %eax
[Codegen] instruction AddI
[Codegen] addl       $-16, %eax
[Codegen] instruction MoveGroup
[Codegen] movl       %eax, 0x8c(%rsp)
[Codegen] instruction Integer
[Codegen] movl       $0x1, %eax
[Codegen] instruction Integer
[Codegen] movl       $0x1, %ecx
[Codegen] instruction UDivOrModConstant
[Codegen] jmp        .Lfrom162

Not only has the unconditional jump to the exception code been moved to happen before the return does, it's been moved to the very beginning of the function, so if I insert a printf() statement in the function, it's never executed.

Any advice on how to proceed with this bug would be much appreciated, and I'm willing to provide more information if needed. There is no exception with baseline.
Disabling LICM with "--ion-licm=off" avoids the exception, so I'll try investigating what that does.
(In reply to pipcet from comment #1)
> Disabling LICM with "--ion-licm=off" avoids the exception, so I'll try
> investigating what that does.

That makes a lot of sense. LICM hoists code out of loops, so if you have a DIV that looks loop-invariant (except for the trap-on-div-by-zero), it will hoist it before the loop and trigger the bug. We shouldn't move instructions that can trap.
Flags: needinfo?(bbouvier)
Interesting, I can reproduce the issue with a very simple test case: if you have a div-by-zero in a loop, it's hoisted by LICM although it should not.

If I upload a patch, could you test against it and confirm it fixes the issue?
Hah, I wrote comment 3 and pressed "save changes" just before comment 1 was submitted, race condition hit.
Flags: needinfo?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)
> Interesting, I can reproduce the issue with a very simple test case: if you
> have a div-by-zero in a loop, it's hoisted by LICM although it should not.

I've been trying to build a simple test case, but somehow it's not been working for me.

> If I upload a patch, could you test against it and confirm it fixes the
> issue?

Absolutely!

(In reply to Jan de Mooij [:jandem] from comment #2)
> (In reply to pipcet from comment #1)
> > Disabling LICM with "--ion-licm=off" avoids the exception, so I'll try
> > investigating what that does.
> 
> That makes a lot of sense. LICM hoists code out of loops, so if you have a
> DIV that looks loop-invariant (except for the trap-on-div-by-zero), it will
> hoist it before the loop and trigger the bug. We shouldn't move instructions
> that can trap.

Is it just a matter of redefining isEffectful to catch those instructions?
Attached patch movable.patch (obsolete) — Splinter Review
This fixes the issue, along with a few others, for modulo (rem_s) and fallible truncations to integer types.
Assignee: nobody → bbouvier
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8815682 - Flags: review?(jdemooij)
Blocks: wasm
Summary: ion: wasm integer div-by-zero false positive → wasm: integer div-by-zero false positive
(In reply to Benjamin Bouvier [:bbouvier] from comment #6)
> This fixes the issue, along with a few others, for modulo (rem_s) and
> fallible truncations to integer types.

It fixes things here. Thanks!
Comment on attachment 8815682 [details] [diff] [review]
movable.patch

Review of attachment 8815682 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay! Forgot to look at this last week.

::: js/src/jit/MIR.h
@@ +5438,5 @@
>          isUnsigned_(isUnsigned),
>          trapOffset_(trapOffset)
>      {
>          setResultType(MIRType::Int64);
> +        setGuard(); // not removable nor movable because of possible side-effects.

Nit: I think here and below it's more grammatically correct to say "not removable or movable" or "neither removable nor movable".
Attachment #8815682 - Flags: review?(jdemooij) → review+
[Tracking Requested - why for this release]: wasm correctness issue
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d2ba2e62f871
Don't set instructions that can trap as Movable; r=jandem
https://hg.mozilla.org/mozilla-central/rev/d2ba2e62f871
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Tracking wasm issue for 52.  Benjamin, are you planning on requesting this gets uplifted?
Flags: needinfo?(bbouvier)
Attached patch uplift.patchSplinter Review
Yes, thanks for the reminder Julien! 
Updated patch for comment nit; carrying forward r+ from jandem.

Approval Request Comment
[Feature/Bug causing the regression]: bug 1268910
[User impact if declined]: correctness issues in wasm programs
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not much
[Why is the change risky/not risky?]: well tested and green on treeherder for 1 week+
[String changes made/needed]: n/a
Attachment #8815682 - Attachment is obsolete: true
Flags: needinfo?(bbouvier)
Attachment #8820800 - Flags: review+
Attachment #8820800 - Flags: approval-mozilla-aurora?
Comment on attachment 8820800 [details] [diff] [review]
uplift.patch

wasm fix for aurora52
Attachment #8820800 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.