Closed Bug 997459 Opened 6 years ago Closed 6 years ago

Try out different sets of coefficients for new sincos implementation to optimize accuracy and performance

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: ehoogeveen, Assigned: ehoogeveen)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

As requested, here's a fairly rough set of coefficients for approximating cosine (I included the 1.0 at the end for completeness):

 2.06467337476762997948e-9
-2.75555495413759160741e-7
 2.48015808595638122085e-5
-1.38888888779622760722e-3
 4.16666666665987187046e-2
-4.99999999999999888978e-1
 1.0

These give a worst error of 7.87e-17 in arbitrary precision, less than 1.11e-16, the epsilon of the output range for the approximation, so with any luck the maximum error will still be 1 bit. It's still big enough that the errors may add up to 2 bits though (I haven't got the program from bug 967709 comment #6 set up yet, so I haven't been able to check).

The optimization goal for these coefficients was to minimize the maximum error (in arbitrary precision), so if these give a maximum error of 1 bit in practice, optimizing for the mean error might give a somewhat better result - but if these give a maximum error of 2 bits, this is probably the best we're going to get with this number of coefficients.

I also have a slightly better set of coefficients for sine, but I'll post those later.
Actually, here's the new set of coefficients for sine that I was working on:

 1.59046813973877163292e-10 //  6152825598094877 / exp2(85)
-2.50509001624159785668e-08 // -7571170002733246 / exp2(78)
 2.75573146431678644161e-06 //  6506786951439440 / exp2(71)
-1.98412698327005105692e-04 // -7320136534024805 / exp2(65)
 8.33333333332626768897e-03 //  4803839602524456 / exp2(59)
-1.66666666666666490881e-01 // -6004799503160655 / exp2(55)

In arbitrary precision, these have a 10.57% smaller mean error than the current set (the difference will be smaller in double precision), at the cost of a somewhat larger maximum error (the maximum error is still so low that the maximum difference in double precision should be 1 bit).
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #1)
> (the difference will be smaller in double precision)

These coefficients appear to have about an 8.21% smaller mean error in double precision - better than I expected! That's using Mathematica to do my own testing, so probably comparing against a different version of sin() as a baseline (is standard library sin() guaranteed to be accurate to the last bit?).
These coefficients should simply be better, but it would be good to confirm that the maximum error is still 1 epsilon with these.

It might be worth noting that the errors peak closer to Pi/4 with this set, but the peaks are narrower (hence the reduction in mean error). It would also be possible to generate sets that 1) optimize solely to minimize the maximum error or 2) optimize for the geometric mean of the maximum error and mean error. These would give lower, but broader peaks in arbitrary precision. But given that simply squaring the input in double precision introduces errors of the same magnitude, which are then shifted around in applying the approximation, I don't think it would help make the errors significantly more predictably spaced.
Assignee: nobody → emanuel.hoogeveen
Status: NEW → ASSIGNED
Attachment #8408199 - Flags: review?(sunfish)
This applies the same coefficients as in comment #0 and applies on top of part 1. I don't know if these coefficients are good enough to use, so asking for feedback instead of review. If we decide that these are good enough to land, I'll work on tweaking them a bit more later.
Attachment #8408201 - Flags: feedback?(sunfish)
Comment on attachment 8408199 [details] [diff] [review]
Part 1: Coefficients for polevl_sin with lower mean error.

Review of attachment 8408199 [details] [diff] [review]:
-----------------------------------------------------------------

My tests also show a maximum error still at one bit in the [0,pi/4] range, and the average error is reduced.
Attachment #8408199 - Flags: review?(sunfish) → review+
Comment on attachment 8408201 [details] [diff] [review]
Part 2: Coefficients for polevl_cos with fewer terms; potentially less precise, but more performant.

Review of attachment 8408201 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good. I see a maximum error of 2 bits on most ranges. When I translate this into the SIMD version, it shortens the cos lane to be the same length as the sin lane.

::: js/src/jsmath.cpp
@@ +384,5 @@
>  
>  static double polevl_cos(double zz)
>  {
> +    // Constants generated using Mathematica's GeneralMiniMaxApproximation
> +    double ans = 2.06467337476762997948e-9;

It'd be good to add a comment here explaining that this cos polynomial is one term shorter than is common, with a tradeoff in precision for performance.
Attachment #8408201 - Flags: feedback?(sunfish) → feedback+
(In reply to Dan Gohman [:sunfish] from comment #6)
> This looks good. I see a maximum error of 2 bits on most ranges. When I
> translate this into the SIMD version, it shortens the cos lane to be the
> same length as the sin lane.

Yes, I see the same maximum error in my own testing. By the way, I could easily generate a series for sine using one *more* coefficient if we want that final bit of precision for both of them (of course you could just set the final coefficient to 0.0 for testing). How much worse is performance using another coefficient?
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #7)
> (In reply to Dan Gohman [:sunfish] from comment #6)
> > This looks good. I see a maximum error of 2 bits on most ranges. When I
> > translate this into the SIMD version, it shortens the cos lane to be the
> > same length as the sin lane.
> 
> Yes, I see the same maximum error in my own testing. By the way, I could
> easily generate a series for sine using one *more* coefficient if we want
> that final bit of precision for both of them (of course you could just set
> the final coefficient to 0.0 for testing). How much worse is performance
> using another coefficient?

Doing one fewer multiply and add is about a 10% difference in performance. With my current projections, this speedup is important if we're going to be performance-competitive using the current algorithm.
Okay, that's a fair bit. I guess if there comes a point when browser vendors agree that high precision implementations of sin and cos are important, it's easy to add back another coefficient.

Further refining the current coefficients to reduce the maximum error (unlikely) or mean error (likely) will take time, so I'll post an updated patch for review.
Comment on attachment 8408199 [details] [diff] [review]
Part 1: Coefficients for polevl_sin with lower mean error.

># HG changeset patch
># Parent c55dfb01a02757b15d5c5d1f2bfec4310d0232fc
># User Emanuel Hoogeveen <emanuel.hoogeveen@gmail.com>
>Bug 997459 - Part 1: Coefficients for polevl_sin with lower mean error.
>
>diff --git a/js/src/jsmath.cpp b/js/src/jsmath.cpp
>--- a/js/src/jsmath.cpp
>+++ b/js/src/jsmath.cpp
>@@ -360,28 +360,28 @@ js::math_clz32(JSContext *cx, unsigned a
>  * [3] http://netlib.org/cephes/
>  * [4] https://svnweb.cern.ch/trac/vdt
>  * [5] http://www.ecma-international.org/ecma-262/5.1/#sec-15.8.2
>  * [6] http://netlib.org/fdlibm
>  */
> 
> static double polevl_sin(double z, double zz)
> {
>-    // Constants taken from fdlibm k_sin.c
>-    double ans = 1.58969099521155010221e-10;
>+    // Constants generated using Mathematica's GeneralMiniMaxApproximation.
>+    double ans = 1.59046813973877163292e-10; // 6152825598094877 / exp2(85)
>     ans *= zz;
>-    ans += -2.50507602534068634195e-08;
>+    ans += -2.50509001624159785668e-08; // -7571170002733246 / exp2(78)
>     ans *= zz;
>-    ans +=  2.75573137070700676789e-06;
>+    ans +=  2.75573146431678644161e-06; //  6506786951439440 / exp2(71)
>     ans *= zz;
>-    ans += -1.98412698298579493134e-04;
>+    ans += -1.98412698327005105692e-04; // -7320136534024805 / exp2(65)
>     ans *= zz;
>-    ans +=  8.33333333332248946124e-03;
>+    ans +=  8.33333333332626768897e-03; //  4803839602524456 / exp2(59)
>     ans *= zz;
>-    ans += -1.66666666666666324348e-01;
>+    ans += -1.66666666666666490881e-01; // -6004799503160655 / exp2(55)
>     ans *= zz * z;
>     ans += z;
>     return ans;
> }
> 
> static double polevl_cos(double zz)
> {
>     // Constants taken from fdlibm k_cos.c
Comment on attachment 8409085 [details] [diff] [review]
Part 2 v2: Coefficients for polevl_cos with fewer terms; less precise, but more performant.

># HG changeset patch
># Parent 8f23ed95cca6805045b49ff949438a3ff34eddad
># User Emanuel Hoogeveen <emanuel.hoogeveen@gmail.com>
>Bug 997459 - Part 2: Coefficients for polevl_cos with fewer terms; less precise, but more performant.
>
>diff --git a/js/src/jsmath.cpp b/js/src/jsmath.cpp
>--- a/js/src/jsmath.cpp
>+++ b/js/src/jsmath.cpp
>@@ -379,30 +379,30 @@ static double polevl_sin(double z, doubl
>     ans += -1.66666666666666490881e-01; // -6004799503160655 / exp2(55)
>     ans *= zz * z;
>     ans += z;
>     return ans;
> }
> 
> static double polevl_cos(double zz)
> {
>-    // Constants taken from fdlibm k_cos.c
>-    double ans = -1.13596475577881948265e-11;
>+    // Constants generated using Mathematica's GeneralMiniMaxApproximation.
>+    // This set uses one less coefficient than usual implementations to
>+    // increase performance, raising the maximum approximation error to 2 bits.
>+    double ans = 2.06467337476762997948e-9;
>     ans *= zz;
>-    ans += 2.08757232129817482790e-09;
>+    ans += -2.75555495413759160741e-7;
>     ans *= zz;
>-    ans += -2.75573143513906633035e-07;
>+    ans +=  2.48015808595638122085e-5;
>     ans *= zz;
>-    ans += 2.48015872894767294178e-05;
>+    ans += -1.38888888779622760722e-3;
>     ans *= zz;
>-    ans += -1.38888888888741095749e-03;
>+    ans +=  4.16666666665987187046e-2;
>     ans *= zz;
>-    ans += 4.16666666666666019037e-02;
>-    ans *= zz;
>-    ans += -0.5;
>+    ans += -4.99999999999999888978e-1;
>     ans *= zz;
>     ans += 1.0;
>     return ans;
> }
> 
> namespace {
> struct sincos_result { double s, c; };
> }
Attachment #8409085 - Attachment description: Part 2 v2: Coefficients for polevl_cos with fewer terms; potentially less precise, but more performant. → Part 2 v2: Coefficients for polevl_cos with fewer terms; less precise, but more performant.
Comment on attachment 8409085 [details] [diff] [review]
Part 2 v2: Coefficients for polevl_cos with fewer terms; less precise, but more performant.

Review of attachment 8409085 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8409085 - Flags: review?(sunfish) → review+
Dan verified on IRC that the reduced precision isn't a problem for Sunspider, so setting checkin-needed :)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e6bb48383502
https://hg.mozilla.org/mozilla-central/rev/25925f2cb271
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Eyeballing the AWFY results, it looks like this change improved ss-morph by around 3% (although something seems to have regressed it back to previous levels at the moment). So it seems like this did help performance slightly in the scalar code, albeit not much :)
You need to log in before you can comment on or make changes to this bug.