Closed
Bug 521245
Opened 16 years ago
Closed 14 years ago
Differences in some ecma3/Math tests when running 64bit vs 32bit shells
Categories
(Tamarin Graveyard :: Virtual Machine, defect, P2)
Tamarin Graveyard
Virtual Machine
Tracking
(Not tracked)
RESOLVED
FIXED
Q3 11 - Serrano
People
(Reporter: brbaker, Assigned: wmaddox)
References
Details
(Whiteboard: fixed-in-tr-serrano)
Attachments
(3 files)
|
4.29 KB,
patch
|
rreitmai
:
review+
|
Details | Diff | Splinter Review |
|
4.32 KB,
patch
|
edwsmith
:
review+
|
Details | Diff | Splinter Review |
|
1.32 KB,
text/plain
|
Details |
There are a handful of math testcases that produce different results when run in the 32bit vs 64bit shells. Tests and results follow.
There are a couple of odd observations:
1) mac always produces the 64bit results for both versions of the shell. This was confirmed on a new mac that could execute both 32 and 64bit shells and on an older mac that could only execute the 32bit shell.
2) linux and windows will produce 2 different values based on whether it is 32bit or 64bit.
3) running the tests in sunspider on a windows32 machine will produce the 64bit values
Math.sin(3.14159265359)
32bit: -2.0682311115474694e-13
64bit: -2.0682310711021444e-13
Math.tan(Math.PI)
32bit: -1.2246063538223773e-16
64bit: -1.2246467991473532e-16
Math.cos(1.570796326795)
32bit: -1.0341155557737347e-13
64bit: -1.0341155355510722e-13
Math.cos(4.712388980385)
32bit: 3.0979057752227035e-13
64bit: 3.097905714554716e-13
Math.cos(Math.PI/2)
32bit: 6.123031769111886e-17
64bit: 6.123233995736766e-17
Math.cos(3*Math.PI/2)
32bit: -1.836909530733566e-16
64bit: -1.8369701987210297e-16
Math.cos(-1.570796326795)
32bit: -1.0341155557737347e-13
64bit: -1.0341155355510722e-13
Math.cos(-4.712388980385)
32bit: 3.0979057752227035e-13
64bit: 3.097905714554716e-13
Math.cos(-Math.PI/2)
32bit: 6.123031769111886e-17
64bit: 6.123233995736766e-17
Math.cos(-3*Math.PI/2)
32bit: -1.836909530733566e-16
64bit: -1.8369701987210297e-16
Flags: flashplayer-triage+
Flags: flashplayer-qrb?
| Reporter | ||
Comment 1•16 years ago
|
||
These tests are in the following acceptance files:
ecma3/Math/e15_8_2_16.as
ecma3/Math/e15_8_2_18.as
ecma3/Math/e15_8_2_7.as
Comment 2•16 years ago
|
||
You should (in theory) be able to use 'arch i386', 'arch x86_64', 'arch ppc' to force the relevant execution even on a 64-bit mac
Assignee: nobody → edwsmith
Flags: flashplayer-qrb? → flashplayer-qrb+
Priority: -- → P2
Target Milestone: --- → flash10.1
Comment 3•16 years ago
|
||
is this problem limited to x86-32 and x86-64 or is it also a problem with ppc-32 and ppc-64?
| Reporter | ||
Comment 4•16 years ago
|
||
- ppc-32 and ppc-64 produce the results of x86_64 on a 32/64bit capable hardware
- ppc-32 produces the results of x86_64 on a 32bit capable hardware
- intel mac machines that are capable of running x86_64 and x86_32 will always produce the results of x86_64
Comment 5•16 years ago
|
||
any evidence that this is new since previous releases of the VM? otherwise imo it's not a P2.
Comment 6•16 years ago
|
||
Please retest to see if this is a regression.
Flags: flashplayer-triage+ → flashplayer-triage?
| Reporter | ||
Comment 7•16 years ago
|
||
This is /not/ a regression from previous version of VM.
- tested against Marlin r42_034, windows x86 and x64 SAP
Running the x64 and x86 versions of the Marlin SAP on a windows64 machine you get different results, same as what we are seeing in current version of the VM.
Math.sin(3.14159265359)
32bit: -2.0682311115474694e-13
64bit: -2.0682310711021444e-13
Flags: flashplayer-triage? → flashplayer-triage+
| Reporter | ||
Comment 8•16 years ago
|
||
Priority should be re-evaluated since this is not a regression from previous release of vm.
Comment 9•16 years ago
|
||
Working assumption is that using 80-bit precision on x87 FPU's is what causes this difference. I beleive spidermonkey operates the x87 FPU in 64-bit mode; that is what the ECMA language spec requires, and what I recommend we investigate doing, if the assumption can be confirmed.
Assignee: edwsmith → wmaddox
Target Milestone: flash10.1 → flash10.2
| Assignee | ||
Comment 10•16 years ago
|
||
This appears to be related to the use of x87 floating point instructions. I modified a Win32 build to call the platform-provided math library functions rather than the inline x87 assembler routines defined in platform/MathUtilsWin.cpp, and got the same results as reported for 64-bit platforms.
| Assignee | ||
Comment 11•16 years ago
|
||
While investigating this issue, I observe that we use the -fp:fast option when compiling on Win32. Removing it did not correct the issue reported here, but the documented effects of -fp:fast are truly horrid, if this blog post is to be believed:
http://blogs.sun.com/khb/entry/an_amazing_floating_point_misoptimization
Unfortunately, the link to the original Microsoft source quoted is now dead.
This is what the current MSDN documentation has to say:
http://msdn.microsoft.com/en-us/library/f99tchzc%28VS.80%29.aspx
Note in particular that proper handling of NaNs is not guaranteed, such as testing for NaN with an expression such as "x != x". This may explain why our MathUtils::isNaN function is defined in an apparently more costly manner, but we aren't consistent -- see, for example, ArraySort::NumericCompare().
| Assignee | ||
Comment 12•16 years ago
|
||
In my earlier experiment disabling the inlined x87 trig functions, I had also removed the -fp:fast option from the compiler command line. It turns out that *both* are required in order to produce the same results on 32-bit Windows as reported for 64-bit Windows.
| Assignee | ||
Comment 13•16 years ago
|
||
To clarify, the inlined assembler x87 trig functions must be disabled (i.e, the platform sin,cos,and tan functions must be called) *and* the -fp:fast switch must be removed.
Comment 14•16 years ago
|
||
Rumor has it that spidermonkey puts the x87 FPU in 64-bit mode, which brings it into ECMAScript compliance and erases issues that come from unwanted 80-bit precision (I think that's happening here). Arguably we should explore that, and it means some tricky engineering around the edges of the VM. There's nowhere inside the vm that requires or wants 80-bit precision, but callouts to native methods could matter.
Updated•15 years ago
|
Flags: flashplayer-bug+
Whiteboard: must–fix-candidate
Updated•15 years ago
|
Whiteboard: must–fix-candidate → must-fix-candidate
| Assignee | ||
Comment 15•15 years ago
|
||
I further investigated Math.cos() on 32-bit Windows with VS2008, confirming the observations made in comment 12. The problem here is not a generic 80-bit precision issue. In fact, the default precision in VS2008 is 53-bits significand, that is, IEEE double. I verified with instrumentation in the code that this was the mode actually in effect in the x87 control register just before MathUtils::cos() was invoked. Instead, the issue appears to be differing result computed by the x97 'fcos' instruction vs. the out-of-line C++ 'cos()' library function. With -fp:precise set and X86_MATH disabled, the library routine is invoked. Stepping through the code reveals that it takes a path appropriate for SSE2-enabled hardware. With X86_MATH enabled (as is always the case on IA32 without hacking the avmplus source), we use our own inline assembler code, which uses the x87 'fcos' instruction. With the -fp:fast option, VS2008 performs the same inlining, replacing the library calls. I verified the latter via stepping through the code at the assembler level.
It is worth noting that all of the divergent cases occur close to multiples of pi/2. I suspect, and seem to recall reading, that points close to the inflection points and zero-crossings are problematical for numeric accuracy in the trig functions.
It's not obvious which answer is the best mathematically. I would not be surprised, for example, if the there is excess precision internally in the x87's trig computations, rounded to double only at the end. Conversely, the x87 might also be taking shortcuts driven by performance or the considerations of a hardware implementation. It's clear, though, that all of the platforms using SSE2 rather than x87 floating point have settled on what we are calling the "64-bit" result, and that is also what the ecma3 tests expect.
I suggest that we rip out the x87-specific inline implementations of the trig
functions in MathUtils, and either drop the global -fp:fast option, or selectively disable it via appropriately-placed pragmas. It is quite likely that the library trig functions on a non-SSE CPU will use the same x87 trig instructions, and result in the same divergence, but these outdated CPUs are of minimal concern at this point.
I have not examined the issue on Linux, but I'm reasonably confident that the root cause is the same, though the exact means to avoid the inlining of x87 trig instructions will be different for the GNU toolchain.
| Assignee | ||
Comment 16•15 years ago
|
||
Fixing this problem may be a bit more complicated in Linux (Ubuntu 8.04).
It appears that we are not inlining the x87 trig here, but the library code is using it. There's some funny business, which I presume is range reduction,
but it's clearly using 'fcos'.
$ gdb --args tr/rel32/shell/avmshell tr/test/acceptance/ecma3/Math/e15_8_2_7.abc
...
(gdb) break cos
...
(gdb) run
...
Breakpoint 1, 0xb7df2920 in cos () from /lib/tls/i686/cmov/libm.so.6
(gdb) bt
#0 0xb7df2920 in cos () from /lib/tls/i686/cmov/libm.so.6
#1 0xb7bd5235 in ?? ()
#2 0x0810f81f in avmplus::BaseExecMgr::invokeGeneric ()
#3 0x0816d307 in avmplus::callprop_b<avmplus::Toplevel*> ()
#4 0x081141fa in avmplus::interpBoxed ()
#5 0x080b4643 in avmplus::AvmCore::callScriptEnvEntryPoint ()
...
In /lib/tls/i686/cmov/libm.so.6:
0xb7df2920 <cos>: fldl 0x4(%esp)
0xb7df2924 <cos+4>: fcos
0xb7df2926 <cos+6>: fnstsw %ax
0xb7df2928 <cos+8>: test $0x400,%eax
0xb7df292d <cos+13>: jne 0xb7df2930 <cos+16>
0xb7df292f <cos+15>: ret
0xb7df2930 <cos+16>: fldpi
0xb7df2932 <cos+18>: fadd %st(0),%st
0xb7df2934 <cos+20>: fxch %st(1)
0xb7df2936 <cos+22>: fprem1
0xb7df2938 <cos+24>: fnstsw %ax
0xb7df293a <cos+26>: test $0x400,%eax
0xb7df293f <cos+31>: jne 0xb7df2936 <cos+22>
0xb7df2941 <cos+33>: fstp %st(1)
0xb7df2943 <cos+35>: fcos
0xb7df2945 <cos+37>: ret
Comment 17•15 years ago
|
||
Sure would be nice if we could drop support for non-SSE2 machines. Might be worth pinging Flash/AIR mgmt to see if this is on their radar.
Comment 18•15 years ago
|
||
(In reply to comment #17)
> Sure would be nice if we could drop support for non-SSE2 machines. Might be
> worth pinging Flash/AIR mgmt to see if this is on their radar.
Recovering some information:
Last time this came up I wrote (4 November 2010) this in response to one of your messages on the same topic:
"Evidently the low-power Via C7, which is fairly recent, does not support SSE2 but is supported in configurations up to 2GHz, ie, SSE2 issues aside Flash content might run well on systems based around that CPU. Wikipedia notes that HP ultraportables running XP and Vista were based on this CPU as late as 2008."
Comment 19•15 years ago
|
||
10.2 Air and FP system requirements explicitly call for Pentium 4 and Athlon 64, both of which have SSE1/2/3. Even Atom has SSE3. OSX requires Core Duo. I think < SSE2 is behind us. In fact, we've recently removed the Pentium 3 builder from buildbot in bug 634957 per this rationale.
http://www.adobe.com/products/flashplayer/systemreqs/
| Assignee | ||
Comment 20•15 years ago
|
||
Note that we can fix this on Win32 simply by yanking out some dubious optimizations. Perhaps a compromise would be to do the easy fix for
Windows and let the bug eventually die when non-SSE2 support is truly
behind us. The only reason I haven't already done this is the fear of
an unexpected performance regression, particularly in player code that
might be calling into MathUtils.
| Assignee | ||
Comment 21•14 years ago
|
||
The Windows platform libraries look to be OK.
Attachment #535265 -
Flags: review?(rreitmai)
| Assignee | ||
Comment 22•14 years ago
|
||
ECMA 262 specifies the treatment of certain special arguments, such as NaNs, infinities, and zeroes, but otherwise allows the implementation some lattitude in exactly what approximation is used for the result, which need not be the most accurate representable double value. It is recommended, however, in the 5th edition of ECMA 262 that the values returned be those yielded by the algorithms of the 'fdlibm' math library, an allegedly portable math library capable of provding IEEE 754-compliant results. Using a common implementation of these algorithms on all platforms will permit bit-for-bit consistency across platforms, though at some expense in performance. The 'fdlibm' library is distributed under an extremely liberal open-source license, and there would likely be no objection to its usage on legal grounds.
Unfortunately, fdlibm is very old code, and makes extensive use of an idiom for extracting the high and low words of a double that is not C99 compliant, due to the strict aliasing rule. As a result, GCC miscompiles the code.
It appears that the Firefox JS implementation formerly used fdlibm, but dropped it recently as a result of code quality and portability issues, primarily the strict aliasing issue. The current JS code simply calls out to the platform libm library, with a few conditionally-compiled adjustments for divergences from the ECMA rules, e.g., the MSVC atan2 function handles infinities incorrectly. The adjustments are small enough, however, that there doesn't seem to be any case for splitting up the code into separate platform-specific implementations, and it can all be found in jsmath.cpp and jslibmath.h.
Inspection of the the OpenJDK for Java shows that the StrictMath class, which is intended to provide bit-for-bit cross-platform compatibility, appears to be using fldlibm, which appears to be included in the codebase verbatim. As for how they've dealt with the strict aliasing problems, I found a posting by Joe Darcy (JDK floating point guru) from 2007, as well as some other discussion, that suggest that the OpenJDK community has dealt with this by fiddling with compiler options, possibly without truly appreciating that the optimizers are acting within spec, and that the code itself is just not C99 compliant.
Apparently, the fdlibm-based StrictMath has been called out for substandard performance, e.g., by game developers. While there is something to be said for having an additional StrictMath library in AS3, we'd likely take a significant performance hit to ditch the tuned math libraries provided with the platforms, which can take advantage of both x87 and SSE2 support, for example, where appropriate.
Comment 23•14 years ago
|
||
Comment on attachment 535265 [details] [diff] [review]
Use platform libraries, not x87 inlines, for sin, cos, and tan
Any particular reason we'd want to keep the code around?
I'd say rip it out and put a comment in there regarding what happened and a link to this bug and be done with it.
Attachment #535265 -
Flags: review?(rreitmai) → review+
| Assignee | ||
Comment 24•14 years ago
|
||
Apparently, it is well-known that the x87 does not do accurate argument reduction, resulting in significant errors outside the range -pi/4 <= theta <= pi/4. See:
http://blogs.oracle.com/jag/entry/transcendental_meditation#comment-1122530621000
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4857011
| Assignee | ||
Comment 25•14 years ago
|
||
Following Rick's suggestion in his review, I just ripped out the code, but left some commentary behind. Also, I now limit the scope of the float_control pragma so that it's effect doesn't leak out of MathUtils.
| Assignee | ||
Comment 26•14 years ago
|
||
Comment on attachment 535528 [details] [diff] [review]
Remove x87 inlines for sin, cos, and tan
>diff --git a/platform/win32/MathUtilsWin-inlines.h b/platform/win32/MathUtilsWin-inlines.h
>--- a/platform/win32/MathUtilsWin-inlines.h
>+++ b/platform/win32/MathUtilsWin-inlines.h
>@@ -38,16 +38,22 @@
> * ***** END LICENSE BLOCK ***** */
>
> #include <math.h>
>
> #ifdef AVMPLUS_IA32
> #define X86_MATH
> #endif
>
>+// Avoid unsafe floating-point optimizations, including replacing library calls
>+// with inlined x87 instructions. We will do this explicitly with inline asm
>+// where appropriate.
>+#pragma float_control(push)
>+#pragma float_control(precise, on)
>+
> // warning this code is used by amd64 and arm builds
>
> namespace avmplus
> {
>
> REALLY_INLINE double MathUtils::abs(double value)
> {
> #ifdef X86_MATH
>@@ -105,28 +111,26 @@ namespace avmplus
>
> #ifndef X86_MATH
> REALLY_INLINE double MathUtils::ceil(double value)
> {
> return ::ceil(value);
> }
> #endif /* X86_MATH */
>
>-#ifdef X86_MATH
>- REALLY_INLINE double MathUtils::cos(double value)
>- {
>- _asm fld [value];
>- _asm fcos;
>- }
>-#elif !defined(AVMPLUS_ARM)
>+// The x87 trigonometric instructions produce values that differ from the
>+// results expected by the ECMAscript test suite for sin, cos, and tan.
>+// Use the standard library instead on Win32 as well as Win64. See bug 521245.
>+
>+#ifndef AVMPLUS_ARM
> REALLY_INLINE double MathUtils::cos(double value)
> {
> return ::cos(value);
> }
>-#endif /* X86_MATH */
>+#endif /* AVMPLUS_ARM */
>
> #ifndef X86_MATH
> REALLY_INLINE double MathUtils::exp(double value)
> {
> return ::exp(value);
> }
> #endif /* X86_MATH */
>
>@@ -176,39 +180,38 @@ namespace avmplus
>
> #ifndef X86_MATH
> REALLY_INLINE double MathUtils::powInternal(double x, double y)
> {
> return ::pow(x, y);
> }
> #endif /* X86_MATH */
>
>-#ifdef X86_MATH
>- REALLY_INLINE double MathUtils::sin(double value)
>- {
>- _asm fld [value];
>- _asm fsin;
>- }
>-#elif !defined(AVMPLUS_ARM)
>+// The x87 trigonometric instructions produce values that differ from the
>+// results expected by the ECMAscript test suite for sin, cos, and tan.
>+// Use the standard library instead on Win32 as well as Win64. See bug 521245.
>+
>+#ifndef AVMPLUS_ARM
> REALLY_INLINE double MathUtils::sin(double value)
> {
> return ::sin(value);
> }
>-#endif /* X86_MATH */
>
>-#if !defined(X86_MATH) && !defined(AVMPLUS_ARM)
> REALLY_INLINE double MathUtils::tan(double value)
> {
> return ::tan(value);
> }
>-#endif /* !defined(X86_MATH) && !defined(AVMPLUS_ARM) */
>+#endif /* AVMPLUS_ARM */
>
> REALLY_INLINE double MathUtils::sqrt(double value)
> {
> #ifdef X86_MATH
> _asm fld [value];
> _asm fsqrt;
> #else
> return ::sqrt(value);
> #endif /* X86_MATH */
> }
>
> }
>+
>+// Restore the prevailing floating-point model.
>+#pragma float_control(pop)
>diff --git a/platform/win32/MathUtilsWin.cpp b/platform/win32/MathUtilsWin.cpp
>--- a/platform/win32/MathUtilsWin.cpp
>+++ b/platform/win32/MathUtilsWin.cpp
>@@ -39,16 +39,21 @@
>
> #include "avmplus.h"
> #include <math.h>
>
> #ifdef AVMPLUS_IA32
> #define X86_MATH
> #endif
>
>+// Avoid unsafe floating-point optimizations, including replacing library calls
>+// with inlined x87 instructions. We will do this explicitly with inline asm
>+// where appropriate.
>+#pragma float_control(precise, on)
>+
> // warning this code is used by amd64 and arm builds
>
> namespace avmplus
> {
> #ifdef AVMPLUS_ARM
>
> const static double PI = 3.141592653589793;
> const static double PI3_BY_4 = 3*PI/4;
>@@ -346,33 +351,22 @@ namespace avmplus
> {
> return ::sin(adjustValueForTrigFuncs(value));
> }
> else
> {
> return ::sin(value);
> }
> }
>-#endif /* AVMPLUS_ARM */
>
>-#ifdef X86_MATH
>- double MathUtils::tan(double value)
>- {
>- // This is a good candidate for inlining, but VC++ 2008 chokes on it.
>- _asm fld [value];
>- _asm fptan;
>- _asm _emit 0xDD; // fstp st(0);
>- _asm _emit 0xD8;
>- }
>-#elif defined(AVMPLUS_ARM)
> double MathUtils::tan(double value)
> {
> if( broken_trig_funcs && (value > AVMPLUS_TRIG_FUNC_MAX || value < -AVMPLUS_TRIG_FUNC_MAX) )
> {
> return ::tan(adjustValueForTrigFuncs(value));
> }
> else
> {
> return ::tan(value);
> }
> }
>-#endif /* X86_MATH */
>+#endif /* AVMPLUS_ARM */
> }
Attachment #535528 -
Attachment is patch: true
| Assignee | ||
Comment 27•14 years ago
|
||
Comment on attachment 535528 [details] [diff] [review]
Remove x87 inlines for sin, cos, and tan
This is essentially what Rick approved, so this is more of an SR.
Are we good with a Windows-only fix for this?
Attachment #535528 -
Flags: review?(edwsmith)
| Assignee | ||
Comment 28•14 years ago
|
||
(In reply to comment #26)
> Comment on attachment 535528 [details] [diff] [review] [review]
> Remove x87 inlines for sin, cos, and tan
I apparently fat-fingered the "edit attachment as comment" button.
Ok, so that's what it does...
Comment 29•14 years ago
|
||
Comment on attachment 535528 [details] [diff] [review]
Remove x87 inlines for sin, cos, and tan
If this brings windows inline with other platforms (and other browsers on windows), great. But we shouldn't do this blind to the performance regression - what is it? or maybe more importantly, how do the asmicro/jsmicro numbers look across different platforms on the same cpu (e.g. win v. mac on a core 2 duo)
Attachment #535528 -
Flags: review?(edwsmith) → review+
| Assignee | ||
Comment 30•14 years ago
|
||
(In reply to comment #29)
It makes Win32 performance dependent on the quality of the platform libraries,
as we are on every other platform, including Win64. What I don't know is whether the Win32 libraries are substandard in performance, so this is worth a look. I'm sure it is possible to construct a microbenchmark that will show a significant regression on Win32, but the extra speed of the inlined instruction comes at the expense of getting the wrong answer, because we omit the argument reduction step necessary to get the right one. We could have written the spec such that the onus for argument reduction was on the user, but that's not what we and ECMA-262 have done, nor is it the prevailing practice elsewhere. I've looked at the code for some trig libraries that use the x87, and they have the additional argument reductoin logic, more code than we want to inline. On the other hand, judging from the complaints about slow trig in Java that I've seen on the web, there is a significant constituency for "fast and dirty" trig, either of limited accuracy or limited argument range. There may be some call for a 'FastMath' library if we got to the point where our overall performance was getting competitive with Java.
| Assignee | ||
Comment 31•14 years ago
|
||
This benchmark repeatedly evaluates Math.cos() at the arguments used in the ecma3/Math/ecma15_8_2_7.as test case, avoiding non-numeric arguments and those that yield a NaN result. On my T60p laptop, runtime is ~6.2s using the inlined x87 fcos instruction, and ~7.1s using the platform cos() function. I would expect similar results for sin and tan. This is a 15% hit in an extreme case that is doing nothing but trig, and is actually a smaller penalty than I expected.
| Assignee | ||
Comment 32•14 years ago
|
||
(In reply to comment #29)
> how do the asmicro/jsmicro numbers look across different platforms on the
> same cpu (e.g. win v. mac on a core 2 duo)
Only a handful of our performance tests do any trig at all, and none
do very much of it. I created a microbenchmark for Math.cos(). Perhaps
it should be converted to a proper jsmicro test, along with similar tests
for other Math.* functions.
Comment 33•14 years ago
|
||
changeset: 6361:11b703b874a0
user: William Maddox <wmaddox@adobe.com>
summary: Bug 521245 - Use platform sin/cos/tan on Win32 (r=rreitmai,edwsmith)
http://hg.mozilla.org/tamarin-redux/rev/11b703b874a0
| Assignee | ||
Comment 34•14 years ago
|
||
Pushed to tamarin-redux-serrano:
http://asteam.macromedia.com/hg/tamarin-redux-serrano/rev/e8b12237d93b
Updated•14 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•