Rounding difference between JIT and interpreter

RESOLVED FIXED

Status

()

defect
P2
normal
RESOLVED FIXED
10 years ago
7 years ago

People

(Reporter: gkw, Assigned: graydon)

Tracking

(Blocks 2 bugs, {regression, testcase})

Trunk
x86
Linux
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.2 +
blocking1.9.1 -
in-testsuite ?

Firefox Tracking Flags

(status1.9.2 beta1-fixed)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 attachment)

(Reporter)

Description

10 years ago
v = 0
for (x = 0; x < 2; ++x) {
    --v;
    for each(e in ['', String(''), '']) {
        print(v /= --v)
    }
}

Note the different output between -j on and -j off in 1.9.1 branch.

===

$ ~/Desktop/25902-191/js-dbg-moz191-intelmac -j
js> v = 0
for (x = 0; x < 2; ++x) {
    --v;
    for each(e in ['', String(''), '']) {
        print(v /= --v)
    }
}0
js> 
0.5
-1
0.5
0.3333333333333333
-0.49999999999999994
0.3333333333333333
js> ^C
$ ~/Desktop/25902-191/js-dbg-moz191-intelmac
js> v = 0
for (x = 0; x < 2; ++x) {
    --v;
    for each(e in ['', String(''), '']) {
        print(v /= --v)
    }
}0
js> 
0.5
-1
0.5
0.3333333333333333
-0.4999999999999999
0.33333333333333326
js> ^C
Flags: blocking1.9.1?
That range doesn't really have anything that touches JS ...
(Reporter)

Comment 3

10 years ago
(In reply to comment #1)
> This is a regression between:
> 
> http://hg.mozilla.org/releases/mozilla-1.9.1/rev/3f1e6876d253 - identical -j
> and no -j output
> 
> and
> 
> http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ff069347b3e5 - different
> output.
> 
> http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?fromchange=3f1e6876d253&tochange=ff069347b3e5

(In reply to comment #2)
> That range doesn't really have anything that touches JS ...

Sorry, I got confused myself. That range is invalid. Both listed changeset have different outputs and thus both show the bug.

That said, I tested this on Ubuntu. I _think_ this doesn't affect Mac, but am not sure.

Comment 4

10 years ago
This is TM tip on macosx. This is a rounding issue, so it might be platform and compiler dependent. Gary, are you using Windows?

host-7-221:src gal$ ./Darwin_DBG.OBJ/js -j x.js
0.5
-1
0.5
0.3333333333333333
-0.49999999999999994
0.3333333333333333
host-7-221:src gal$ ./Darwin_DBG.OBJ/js x.js
0.5
-1
0.5
0.3333333333333333
-0.49999999999999994
0.3333333333333333

Comment 5

10 years ago
Note: I have noticed differences in rounding behavior between MSVC and gcc before. sin(x) is often different between macosx firefox and safari and our Windows builds. MSVC was usually right in those cases, and everyone else was wrong. However, I think the rounding modes are somewhat underspecified and both results were acceptable (for sin anyway).

Comment 6

10 years ago
So it seems the non-jit linux results are incorrect since jit and nojit on macosx both agree with jit linux:

$ ~/Desktop/25902-191/js-dbg-moz191-intelmac
js> v = 0
for (x = 0; x < 2; ++x) {
    --v;
    for each(e in ['', String(''), '']) {
        print(v /= --v)
    }
}0
js> 
0.5
-1
0.5
0.3333333333333333
-0.4999999999999999
0.33333333333333326
js> ^C

Updated

10 years ago
Summary: TM: Different result from testcase involving for...in, print, for → Rounding difference between JIT and interpreter
You might not want to treat Mac output as normative. Mac forces 80-bit x87 mode while using 64-bit in SSE2, or did. See bug 338756.

ECMA-262 specifies the exact results for these inputs, I would think (for the sake of interoperability). To conform to the standard, we used to use fdlibm to gain cross-platform consistency. We gave it up for speed, OS compatibility where possible, and code footprint (both source and object).

/be
But maybe there's a tweak for Mac we need, or some way to bypass extended precision (80-bit) f.p. in the x87, if that is what is biting. From bug 338756 you can see that I found no way to convince the OS and hardware to go to 64-bit mode in that FPU.

/be
(Reporter)

Comment 9

10 years ago
(In reply to comment #4)
> This is TM tip on macosx. This is a rounding issue, so it might be platform and
> compiler dependent. Gary, are you using Windows?

Nope, I'm on Ubuntu 9.04, for this particular testcase.
I get the same results as comment #4, using Windows (VS2k5).
Fwiw, the numbers in comment 4 look correct to me, from a general "what is the most precise output you could get given the previous value" perspective...
Rhino agrees with comment 4, and for what it's worth, so does OpenJDK:

public strictfp class Fx {
    public static void main(String[] args) {
        double v = 0;
        for (int x = 0; x < 2; ++x) {
            --v;
            for (int e = 0; e < 3; e++) {
                v = v / (v - 1);
                System.out.println("" + v);
            }
        }
    }
}

0.5
-1.0
0.5
0.3333333333333333
-0.49999999999999994
0.3333333333333333

The "strictfp" is supposed to ensure freshness: http://en.wikipedia.org/wiki/Strictfp

java version "1.6.0_0"
IcedTea6 1.3.1 (6b12-0ubuntu6.4) Runtime Environment (build 1.6.0_0-b12)
OpenJDK Server VM (build 1.6.0_0-b12, mixed mode)
Without -j, the results are the same in tip as in rev 1.
Interpreter:
        0.3333333333333333  (0x3fd5555555555555)
     / -0.6666666666666667  (0xbfe5555555555556)
    == -0.4999999999999999  (0xbfdffffffffffffe)

JIT:
        0.3333333333333333  (0x3fd5555555555555)
     / -0.6666666666666667  (0xbfe5555555555556)
    == -0.49999999999999994 (0xbfdfffffffffffff)

I have no idea which might be correct or whether both might be allowed.  I defer to Graydon.
>>> import decimal
>>> n = decimal.Decimal("0.3333333333333333")
>>> n
Decimal("0.3333333333333333")
>>> d = decimal.Decimal("-0.6666666666666667")
>>> d
Decimal("-0.6666666666666667")
>>> n / d
Decimal("-0.4999999999999999250000000000")
         -0.49999999999999994
         -0.4999999999999999

JIT seems to be correct, then.
Ignore me, I'm stupid, recalculating correctly now...
find-waldo-now:/tmp jwalden$ cat foo.cpp 
#include <stdio.h>
#include <fenv.h>
#include <stdint.h>
#include <math.h>

int main()
{
  union { double d; uint64_t u; } a, b;

  printf("%d\n", fegetround());

  a.u = 0x3fd5555555555555ULL;
  b.u = 0xbfe5555555555556ULL;

  return (int) a.u;
}

Breakpoint 1, main () at foo.cpp:10
10        printf("%d\n", fegetround());
(gdb) n
0
12        a.u = 0x3fd5555555555555ULL;
(gdb) s
13        b.u = 0xbfe5555555555556ULL;
(gdb) s
15        return (int) a.u;
(gdb) set variable b.d = a.d / b.d
(gdb) p/x b.u
$1 = 0xbfdffffffffffffe

So interpreter is correct.

Comment 18

10 years ago
what code does gcc emit here?
gcc emits fdivl:
	.loc 1 4008 0
	fldl	-936(%ebp)
	fdivl	-944(%ebp)
	fstpl	-936(%ebp)

Nanojit is using sse:

    fsub1 = fsub $global0, #0x3ff00000:0
    fdiv1 = fdiv $global0, fsub1
    stqi state[748] = fdiv1
    ld8 = ld state[732]
    add1 = add ld8, 1
    ov1 = ov add1
    xt1: xt ov1 -> pc=0x8aabbe6 imacpc=(nil) sp+0 rp+0
              movq xmm0,740(ecx)              ecx(state) esi(sp)
              movsd xmm2,(#0x8215850) // =1.0 ecx(state) esi(sp) xmm0($global0)
              movsd xmm1,xmm0                 ecx(state) esi(sp) xmm0($global0) xmm2(#0x3ff00000:0)
              subsd xmm1,xmm2                 ecx(state) esi(sp) xmm0($global0) xmm2(#0x3ff00000:0)
              divsd xmm0,xmm1                 ecx(state) esi(sp) xmm0($global0) xmm1(fsub1)
              movq 748(ecx),xmm0              ecx(state) esi(sp) xmm0(fdiv1)
              mov ebx,732(ecx)                ecx(state) esi(sp)
              add ebx,1                       ecx(state) ebx(ld8) esi(sp)
              jo 0xb7ad1fe8                   ecx(state) ebx(add1) esi(sp)
(Assignee)

Comment 20

10 years ago
Add "-msse2 -mfpmath=sse" perhaps?
Not blocker: per comment 5, we've had this deviation across platforms before.
Flags: blocking1.9.1? → blocking1.9.1-
(Reporter)

Updated

10 years ago
Flags: blocking1.9.2?
(Assignee)

Comment 22

10 years ago
If (as is worth confirming) this is just x87-vs-sse2, IMO it should neither be a blocker for any release, nor even really a "bug"; it's the price we will continue to pay for having runtime-detected sse2 math vs. compile-time-chosen conservative x87 math in the interpreter.

Other options are to step the minimum requirements for running the interpreter up to sse2 hardware and compile with -msse2 -mfpmath=sse (plausible, that only means P4/K8 or newer) or else stick a softfp library or some other widget that eliminates the 80-bit-ness of the x87 operands. 

Good to check for sure, of course. I'll poke at it some more later.
Assignee: general → graydon
(Reporter)

Comment 23

10 years ago
(In reply to comment #22)
> If (as is worth confirming) this is just x87-vs-sse2, IMO it should neither be
> a blocker for any release, nor even really a "bug"; it's the price we will
> continue to pay for having runtime-detected sse2 math vs. compile-time-chosen
> conservative x87 math in the interpreter.

This bug arguably interferes with the fuzzing in bug 465479.

Updated

10 years ago
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
(Reporter)

Updated

10 years ago
Flags: in-testsuite?
(Reporter)

Comment 24

10 years ago
(In reply to comment #23)
> (In reply to comment #22)
> > If (as is worth confirming) this is just x87-vs-sse2, IMO it should neither be
> > a blocker for any release, nor even really a "bug"; it's the price we will
> > continue to pay for having runtime-detected sse2 math vs. compile-time-chosen
> > conservative x87 math in the interpreter.
> 
> This bug arguably interferes with the fuzzing in bug 465479.

Since the fuzzer in bug 465479 just got folded into jsfunfuzz, we have had to disable JITcompare (which does find issues nowadays) on Linux due to the noise generated by this issue.
(Assignee)

Comment 25

10 years ago
Hm. I looked into this a bit and am now more perplexed than previously.

I added a mechanism to force-disable SSE2 (an environment variable like the ARM one); when I do so, and confirm we're generating x87 code, we still differ in JIT and interp modes. Funny.

Also, I noticed a wrinkle here that the original diagnosis didn't: your original test was on mac, yes? Macs compile with SSE2 by default. All our interpreter routines use SSE there (I just disassembled them), so if it were just an SSE issue you would not see it on mac. I thought you were reporting seeing this everywhere *except* mac.

I talked to Julian and he suggested that he's seen bugs like this before due to spilling an 80 bit operand to a 64 bit slot when in x87 mode. That's a possibility -- we are in fact performing such precision-losing spills -- and it explains the phenomenon I'm seeing on linux, but *not* if it happens on mac.

But then, I cannot reproduce this on a mac, only a linux-gcc configuration. Can you confirm the exact host-compiler combinations you're seeing this on? Ideally it'll help if you objdump -d or otool -tV jsinterp.o on the offending platform, and grep for 'div', paste in what you find. Also tell me if it's a machine that *has* SSE2.

I'm CC'ing Julian on this bug so he can chime in if anything else that happens here reminds him of one of something he's seen before.
(Assignee)

Comment 26

10 years ago
Posted patch fix the bugSplinter Review
Ok, I think I know what's going on here now, and I can fix the bug, but I'd appreciate comments from anyone else who remembers this code, and/or assumptions baked into our FP code. It's pretty old.

On some platforms, the x87 FPU is set to 80 bit mode on startup. On others, 64 bit mode. We have existing code in jsnum.cpp that clobbers the FPU state back to 64 bit mode -- FIX_FPU -- but it's only called on a specific (and very narrow) set of platforms: OS/2 or WINXP when building with icc or gcc but *not* mingw.

As far as I can tell, archaeologically, this narrow ifdef-condition was arrived at merely to get the thing to build given that it uses some mask definitions and an intrinsic that are not available in the header files of all, or even most, x86 platforms. But it's not hard to just insert the necessary fpu-environment-twiddling asm manually, and ignore the issue of "getting the include files to work" entirely. 

The attached patch does just that, and correspondingly relaxes the ifdef to actually fire on linux, where it seems this was biting us. It's not clear to me which (if any) other platforms were reporting the problem, but I can only reproduce it on linux-x86, and the patch fixes that case. Also removes an ancient comment about Alpha, which I'd bet money we don't even build on anymore.

(The reason the bug exists is that, when you get the x87 FPU running in 80 bit mode, it will perform more precise calculations in interpreter mode than the JIT does in SSE mode. I'll file a followup patch to force-disable SSE mode in the JIT as well, but it turns out it's not necessary in this case.)
Attachment #397078 - Flags: review?
(Assignee)

Updated

10 years ago
Attachment #397078 - Flags: review? → review?(brendan)
(Reporter)

Comment 27

10 years ago
Unfortunately comment #0 had a wrongly named js binary - it's not supposed to be called js-dbg-moz191-intelmac - I had hardcoded my compile scripts to output "intelmac" on every platform. :-/ It's a Linux-only bug as far as I can tell, doesn't seem to occur on Mac/WinXP. Apologies for the confusion.

The patch seems to work out great on Linux! Thanks Graydon!
(Assignee)

Comment 28

10 years ago
Comment on attachment 397078 [details] [diff] [review]
fix the bug

Brendan reviewed in person at my desk just now, so removing the r? on him and landing.
Attachment #397078 - Flags: review?(brendan)
(Assignee)

Comment 29

10 years ago
http://hg.mozilla.org/tracemonkey/rev/8ad65f099bb5
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/8ad65f099bb5
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Reporter)

Updated

7 years ago
Blocks: 349611
You need to log in before you can comment on or make changes to this bug.