Closed Bug 948321 Opened 7 years ago Closed 3 years ago

Differential Testing: Different division results on x86 platforms

Categories

(Core :: JavaScript Engine: JIT, defect, P5)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: gkw, Unassigned)

References

Details

(Keywords: regression, testcase)

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #946969 +++

function f() {
    return Math.sin((Math.asinh(Math.PI) / Number.MAX_VALUE))
}
x = [];
x.push(f())
x.push(f())
print(x)


$ ./js-opt-32-dm-ts-linux-75c0c92d7fa4 w1529-cj-in.js
1.035936393812552e-308,1.035936393812552e-308

$ ./js-opt-32-dm-ts-linux-75c0c92d7fa4 --ion-eager w1529-cj-in.js
1.035936393812552e-308,1.0359363938125513e-308


Tested on js 32-bit opt threadsafe shell on m-c changeset 75c0c92d7fa4 on Ubuntu Linux.

My configure flags are:

CC="gcc -m32" AR=ar PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig CXX="g++ -m32" sh ./configure --target=i686-pc-linux --enable-optimize --disable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --enable-more-deterministic --with-ccache --enable-threadsafe <other NSPR options>


autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/348b2ba27515
user:        David Caabeiro
date:        Mon Jul 15 10:03:14 2013 -0500
summary:     Bug 717379, part 1 - Implement the new ES6 math functions. r=jorendorff.

This may have been a bug from whenever ES6 math functions landed in bug 717379. Setting needinfo? from jandem, as this may be a JIT issue.
Flags: needinfo?(jdemooij)
Gary, are you still seeing this? Do you also see this on OS X or only Linux?
Flags: needinfo?(jdemooij) → needinfo?(gary)
Can reproduce with --ion-parallel-compile=off --ion-eager on 32 bits builds, on Linux at least.
Flags: needinfo?(gary)
Simpler test case:

function f() {
    var x = Math.asinh(Math.PI);
    var y = Number.MAX_VALUE;
    var z = x / y;
    return z;
}
assertEq(f(), f())

So I have investigated a little bit and while the generated JIT code uses divsd, the compiled NumberDiv uses the fdiv instruction (specialized division for the FPU). Maybe we could try to use the fdiv instruction on x86 if it's available (SSE3?), otherwise there will be some small errors like this for sure.

FWIW, there was apparently a bug with the FDIV instruction at some point, see also [0]. Maybe we should ask some Intel engineers if that's expected or if that's a bug?

[0] https://en.wikipedia.org/wiki/Pentium_FDIV_bug
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)
> So I have investigated a little bit and while the generated JIT code uses
> divsd, the compiled NumberDiv uses the fdiv instruction (specialized
> division for the FPU). Maybe we could try to use the fdiv instruction on x86
> if it's available (SSE3?), otherwise there will be some small errors like
> this for sure.

fdiv is part of the x87 instruction set that's much older than SSE2/divsd. x87 has its own stack and is a pain to use compared to SSE; we try to avoid it as much as possible. We do use it for the fisttp instruction for instance because there's no SSE equivalent on x86.

Unfortunately, fdiv uses 80-bit double-extended precision and divsd/SSE2 uses 64-bit double precision. That's probably causing the difference here. This sucks because it's not very easy to work around I'm afraid.

For differential testing we could try to use a compiler switch so that it's free to emit SSE2+ instructions; maybe that will keep it from emitting fdiv, or maybe it won't.. Is this also a problem with 64-bit builds?
(In reply to Jan de Mooij [:jandem] from comment #4)
> Unfortunately, fdiv uses 80-bit double-extended precision and divsd/SSE2
> uses 64-bit double precision. That's probably causing the difference here.
> This sucks because it's not very easy to work around I'm afraid.
> 
> For differential testing we could try to use a compiler switch so that it's
> free to emit SSE2+ instructions; maybe that will keep it from emitting fdiv,
> or maybe it won't.. Is this also a problem with 64-bit builds?

I can't tell for other compilers, but gcc 4.8-real on ubuntu generates the divsd instruction in NumberDiv, so the test passes.
Summary: Differential Testing: Different output message involving Math.sin and Math.asinh → Differential Testing: Different division results on x86 platforms
After trying locally, it didn't make a difference when using the gcc compiler flags -msse2, -mfpmath=sse or -ffast-math, the compiled NumberDiv function still contains x87 instructions.

Dan, do you have any idea of any combination of compiler flags that could prevent gcc from emitting x87 instructions?
Flags: needinfo?(sunfish)
If you use both -msse2 and -mfpmath=sse at the same time, it should use SSE2 instructions.
Flags: needinfo?(sunfish)
Setting needinfo? from myself to get this added to both our GCC and Clang parameters, and I need to check if Windows shows this issue as well.
Flags: needinfo?(gary)
I have some questions after thinking about this a little more:

(1) Is this 32-bit only?
(2) Does this affect ARM/ARM64?
Flags: needinfo?(sunfish)
For (1), all x86-64 hardware supports SSE2, and every C++ compiler I know of uses it by default when targetting x86-64, so yes, the x87 vs sse issue is 32-bit only here.

For (2), ARM doesn't have anything like 80-bit floating-point, but I believe NEON does not implement IEEE-754 subnormals [0], while VFP does is some configurations, and I don't know what soft-float implementations do. You could potentially see different results when switching from a compiler/platform that supports subnormals to one that doesn't. I don't know how big of a problem this will be for you in practice though.

[0] http://en.wikipedia.org/wiki/Denormal_number
Flags: needinfo?(sunfish)
Thanks Dan!

I have verified that adding "-msse2 -mfpmath=sse" to GCC compiler flags fixed the issue.

Mac 32-bit didn't seem to be affected (Mac uses Clang), but adding the flag didn't seem to cause any hurt.

Windows 32-bit is also affected, and I was pointed to http://msdn.microsoft.com/en-us/library/7t5yh4fd%28v=vs.110%29.aspx

Todos:

1. Recheck ARM on the pandaboard.
2. Fix this for Win32.
OS: Linux → All
> 1. Recheck ARM on the pandaboard.

ARM is not affected, and does not need the extra "-msse2 -mfpmath=sse" flags.

Final todo: fix this on Win32.
Clearing needinfo, waiting on bug 998577.
Flags: needinfo?(gary)
Could we just switch the x87 FPU into doing arithmetic at 53 bit
mantissa precision (that is, to 64 bit double accuracy)
instead of 64 bit mantissas (that is, 80 bit extended doubles)?
That is, set %fpucw to 0x027F instead of the default 0x037F?

Theoretically I think we could do that for our own JIT generated
code, but in practice I suspect this would cause breakage in other
libraries.  So, probably a bad idea.
(In reply to Julian Seward from comment #14)
> Theoretically I think we could do that for our own JIT generated
> code, but in practice I suspect this would cause breakage in other
> libraries.  So, probably a bad idea.

It's not a problem for JITted code, as it generates SSE2 instructions (in this case, divsd), while the interpreter (which is plain C++ and compiled by gcc) uses FPU (in this case, fdivl). Maybe we could set the control word before making arithmetic operations and resetting it afterwards (to avoid issues with other libraries), in more deterministic builds. This might have a perf cost but it wouldn't be a blocker if it's low, as it'd be only for fuzzing builds.

Actually, after some digging and testing:
1) bug 496816 (that Gary also filed a few years ago!) shows the same kind of error and a patch that is setting the FPU control word so that double precision is used (see also [1]). Copying the FIX_FPU function in js::NumberDiv doesn't solve the issue.
2) in js::NumberDiv, setting the control word manually to 0x037F (i.e. use extended precision, 80 bits doubles) fixes the assertion (!). Not sure which one (divsd or fdivl) yields the right result.

[1] http://dxr.mozilla.org/mozilla-central/source/js/src/jsnum.cpp#1116
Attached file C test case
Fwiw, in plain C:
- on x64 (with or without setting the FPU flag), the result is correct.
- on x86 without setting the FPU flag (i.e. having it to 0x037F in the first place), the result is correct.
- on x86 with setting the FPU flag (i.e. as we do in FIX_FPU on x86 platforms), the result is wrong (by one bit, I think).
Interesting. It reproduces for me too. You can reproduce the x86 behavior on x64 too by compiling with -mfpmath=387.

I looked up the FPU flags in Intel's documentation. Setting bit 9 (and clearing bit 8) is indeed documented to select double-precision mode. However, if I change the FIX_FPU code to set both bits 8 and 9 (changing 0x2f3 to 0x3f3), that should select double-extended-precision mode, and I get the "correct" result. Is the x87's double-precision mode rounded differently than SSE2's?

As an aside, FIX_FPU's comment also says "Raise bits 0-5" but the code actually does "control |= 0x2f3" which doesn't raise bits 2 or 3, but it isn't actually important because those flags are set by default these days. It also sets bits 6 and 7, which are "reserved", but that doesn't seem to be affecting anything.
This still affects MSVC 2013 Update 3 even though SSE mode is now compiled by default. (as per http://msdn.microsoft.com/en-us/library/7t5yh4fd%28v=vs.120%29.aspx )

$ ./js-dbg-opt-32-dm-nsprBuild-windows-d7ec36a77534.exe --fuzzing-safe --ion-eager 948321.js
1.035936393812552e-308,1.0359363938125513e-308

$ ./js-dbg-opt-32-dm-nsprBuild-windows-d7ec36a77534.exe --fuzzing-safe 948321.js
1.035936393812552e-308,1.035936393812552e-308

Dan/Benjamin, any suggestions on how to move forward?
Flags: needinfo?(sunfish)
Flags: needinfo?(benj)
Debug configure options for comment 18:

MAKE=mozmake AR=ar sh c://Users//fuzz1win8//trees//mozilla-central//js//src//configure --enable-debug --enable-optimize --enable-nspr-build --enable-more-deterministic --enable-gczeal --enable-debug-symbols --disable-tests

Tested on Windows 8.1, MSVC 2013 Update 3, MozillaBuild 1.10.0 and m-c rev d7ec36a77534.
To answer my question in comment 17, with FIX_FPU(), the x87 divide *does* round differently than the SSE2 divide in this case:

$ gcc -m32 f.c -msse2 -mfpmath=387 && ./a.out 
0x0.772fda7c4001cp-1022
$ gcc -m32 f.c -msse2 -mfpmath=sse && ./a.out 
0x0.772fda7c4001bp-1022
$ cat f.c
#include <stdio.h>

volatile double x = 0x1.dcbf69f10006dp+0;
volatile double y = 0x1.fffffffffffffp+1023;
volatile double z;

int main() {
  short control;
  asm("fstcw %0" : "=m" (control) : );
  control &= ~0x300; // Lower bits 8 and 9 (precision control).
  control |= 0x2f3;  // Raise bits 0-5 (exception masks) and 9 (64-bit precision).
  asm("fldcw %0" : : "m" (control) );

  z = x / y;

  printf("%a\n", z);
}
I don't see what else we could do. Gary, did using the switch work fine for former versions of Microsoft Visual Studio? What if you force using the /arch:SSE2 compiler option? Does changing any of the compiler options indicated in [0] avoids the assertion to be triggered?

[0] http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx
Flags: needinfo?(benj)
(In reply to Benjamin Bouvier [:bbouvier] from comment #21)
> I don't see what else we could do. Gary, did using the switch work fine for
> former versions of Microsoft Visual Studio? What if you force using the
> /arch:SSE2 compiler option? Does changing any of the compiler options
> indicated in [0] avoids the assertion to be triggered?
> 
> [0] http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx

I was unable to get the /arch:SSE2 (or other compiler) option(s) to work properly in our build system, so it never worked with MSVC 2010.

I switched to MSVC 2013 since it's the default option there, and this bug still shows.
(In reply to Gary Kwong [:gkw] [:nth10sd] GMT-4 Sep 29 - Oct 3 from comment #22)
> I switched to MSVC 2013 since it's the default option there, and this bug
> still shows.

We explicitly use -arch:IA32 for 32-bit builds since bug 836658. So you could probably undo attachment 8454247 [details] [diff] [review] or otherwise use it for inspiration.
Oh, and I guess attachment 8454247 [details] [diff] [review] from bug 1043108 landed on top of that to fix x64.
Thanks Emanuel! This is a big clue on how to move forward... I'll look into seeing if removing those lines causes the bug to still occur, so setting a needinfo? for myself.

Also, I wonder if configure.in can accept an #ifdef for --fuzzing-safe. Does anyone know?
Flags: needinfo?(gary)
Filed bug 1077074 to get a --enable-sse2 configure option, especially since I already pass in additional options for Mac and Linux (-msse2 -mfpmath=sse)
I don't think I know anything else that's in question here now.
Flags: needinfo?(sunfish)
Sadly, undo-ing the patches in bug 836658 and bug 1043108 didn't seem to cause the issue to go away. I guess I'll wait for bug 1077074 to land and see if it helps.
Flags: needinfo?(gary)
Just my $0.02 here, but ES is only required to provide an approximation of mathematical functions. If you really need 80-bit precision, then you should consider either using a library written in JS that provides more accuracy, or not rely on a scripting language at all.

The spec says:
"For other argument values, these functions are intended to compute approximations to the results of familiar mathematical functions, but some latitude is allowed in the choice of approximation algorithms. The general intent is that an implementer should be able to use the same mathematical library for ECMAScript on a given hardware platform that is available to C programmers on that platform."
Since what is implemented is what is available to C programmers, what is currently present should suffice, even if there are small accuracy differences between generated SIMD and x87 code by the compiler in use.

MSVC also makes no guarantee about which path will be chosen by the compiler. even with /arch:SSE2, there is no guarantee that the compiler will always use SSE2 instructions, since the compiler decides at compile-time what to use -- which, in the case of PGO especially, will simply be unknown.
Leaving precision up to the whims of PGO and JIT compilation makes me sad.

Would it make sense to replace the C div in js::NumberDiv with asm for the exact division instruction we want?
:sunfish requests me to needinfo? him this time. We have some sort of a likely solution on how to proceed.
Flags: needinfo?(sunfish)
Here's a patch implementing my idea. When SSE2 is available but not necessarily used for all math, use inline assembly to explicitly perform an SSE2 divide. Under JS_MORE_DETERMINISTIC only. I've not extensively tested it, but in theory this ought to work.
Flags: needinfo?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #32)
> Created attachment 8560580 [details] [diff] [review]
> deterministic-div.patch

Since I'm interested in this bug I had a look at the patch, and something stood out to me:

76 # else
77 #  error "deterministic divide implemented yet"
78 # endif
79         return a;

If you're neither having GNUC nor MSVC, you hit this error and looks like you return the UNdivided result (just a) instead of a / b.
(In reply to Mark Straver from comment #33)
> 76 # else
> 77 #  error "deterministic divide implemented yet"
> 78 # endif
> 79         return a;
> 
> If you're neither having GNUC nor MSVC, you hit this error and looks like
> you return the UNdivided result (just a) instead of a / b.

#error means you get a build error, so it never gets far enough to be wrong. I did miss a "not" in the error string though -- I've fixed that in my local copy.
Right, so, what if I wanted to build with, say, the Intel compiler?
1. Patches welcome.  :-)
2. It's not actualy an issue.  Many compilers (including clang and icc) competing against gcc, also define __GNUC__ and attempt to support most of the things gcc supports.  Assembly as here is definitely part of that.
Comment on attachment 8560580 [details] [diff] [review]
deterministic-div.patch

Putting this on my plate to test.
Attachment #8560580 - Flags: feedback?(gary)
Comment on attachment 8560580 [details] [diff] [review]
deterministic-div.patch

See compile failure log.
Attachment #8560580 - Flags: feedback?(gary) → feedback-
Flags: needinfo?(sunfish)
This is probably the issue:

c:\users\mozillaadmin\trees\mozilla-central\js\src\jslibmath.h(69) : error C2039: 'supports_sse2' : is not a member of 'mozilla'
c:\users\mozillaadmin\trees\mozilla-central\js\src\jslibmath.h(69) : error C3861: 'supports_sse2': identifier not found
c:\users\mozillaadmin\trees\mozilla-central\js\src\jslibmath.h(74) : error C2415: improper operand type
Oops, the first issue needs a #include "mozilla/SSE.h".

The second issue is that compilers apparently don't permit inline asm to require xmm registers when SSE is disabled. We could theoretically hack around this restriction by passing the values through memory, and using an xmm register covertly, with something like this in GNU-style inline asm:

        double temp[2];
        asm("movups %%xmm0, %0;"
            "movsd %1, %%xmm0;"
            "divsd %2, %%xmm0;"
            "movsd %%xmm0, %1;"
            "movups %0, %%xmm0" : "=m"(temp), "+m"(a) : "m"(b));

Does anyone know how to do something like this in Microsoft-style inline asm?
Flags: needinfo?(sunfish)
Sorry if I don't understand properly, but if you know the significant width of the result from an SSE instruction, and you know if it either rounds or truncs the result (which I think is likely the culprit here), can't you just make sure the result is treated the same way?

So: even if x87 instructions (double-extended) are used, round or trunc the result accordingly to double, to match what would be the result from SSE division, instead of trying to force SSE manipulation of the operation?
x87 and sse are both set to round-to-nearest, so in theory we're already doing that. And they do get the same results in most cases; it's just these extreme subnormal cases like in comment #20 where we're seeing a difference.
jsfunfuzz works around this by not using compareJIT on Windows if the testcase looks like it uses division:

https://github.com/MozillaSecurity/funfuzz/blob/6b21e57589c1e1236fa4f0a24a096d45077932f7/js/loopjsfunfuzz.py#L234
This still happens with Visual Studio 2015 with default configure parameters (e.g. --enable-debug --enable-more-deterministic) on 32-bit js shell builds.
Priority: -- → P5
With Visual Studio 2017, this seems to be resolved:

autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/21edf7e4f6ad
user:        Jan de Mooij
date:        Thu Sep 14 10:03:09 2017 +0200
summary:     Bug 1399471 - Add a mechanism to check callWithABI invariants in debug builds. r=nbp

Jan, is bug 1399471 a likely fix?
Flags: needinfo?(jdemooij)
fwiw, this fix was found while testing with 32-bit deterministic builds on Windows 10 compiled with MSVC 2017. Tested with the testcase in comment 3.
Jan and I spoke f2f at Austin and while it seems that his patch is not likely to be the fix, SSE-only situations are on their way out anyway, and I don't see the bug anymore with the testcases above anymore. Thus, resolving WORKSFORME and we can file issues as needed where necessary in the future.
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jdemooij)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.