Closed Bug 984018 Opened 10 years ago Closed 9 years ago

sincos optimization

Categories

(Core :: JavaScript Engine: JIT, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: sunfish, Assigned: victorcarlquist, Mentored)

References

Details

Attachments

(4 files, 31 obsolete files)

8.69 KB, text/plain
Details
10.92 KB, patch
victorcarlquist
: review+
Details | Diff | Splinter Review
45.65 KB, patch
nbp
: review+
Details | Diff | Splinter Review
2.52 KB, patch
nbp
: review+
Details | Diff | Splinter Review
IonMonkey should recognize when both sin(x) and cos(x) are computed for the same x and form a merged sincos(x) operation which returns both results, which can be significantly faster.

sincos may be implemented with the support in bug 967709 when/if it lands, or alternatively it may be implemented in terms of the sincos function provided by the standard libraries on some platforms.

Opportunities for this optimization include numerous Box2d workloads, and it looks like it would also trigger in pdf.js, as well as math-partial-sums.js and 3d-cube.js in SunSpider.

An alternative approach would be to use the math function cache; computing a sin(x) could insert a value into the cache for both sin(x) and cos(x) so that a subsequent call to cos(x) would be fast. However, it is hoped that the cache will be eliminated some day, possibly through bug 967709, so it would be nice to perform this optimization without using it.
Depends on: 967709
Note that asm.js does not currently use the math function cache at all, so the cache approach wouldn't help it unless we also enable the cache for it.
ZongShen, would you be interested to work on using  sincos(tau, &s, &c)  for computing both  sin(tau)  and  cos(tau) , and caching the result in the MathCache?  This would be a first part, and then after we can look at making a second part to make sure IonMonkey is only making one call to get both sin & cos results instead of doing 2 calls.

Basically, the reason behind is that it is fast to compute sin and cos at the same time as opposed to compute sin and cos separately.

Dan, would you be interested in mentoring?
Flags: needinfo?(sunfish)
Flags: needinfo?(andy.zsshen)
Yes, I can be a mentor here. I may be absent some days over the next few weeks, so I may not be able to respond quickly, but I will respond when I can. Box2d would be a good benchmark to guide this work.

For the cache-based strategy, the other interesting thing to watch for is what the speed of sincos is for programs which only want one result, compared to calling just sin or just cos.
Mentor: sunfish
No longer depends on: 967709
Flags: needinfo?(sunfish)
Flags: needinfo?(andy.zsshen)
Yes, I can try to implement this hybrid function.
I would prefer the initial version of this feature to use the sincos functionality provided by the standard libraries on some platforms, rather than implementing it manually. That way, we can focus on how this feature fits into the JIT without worrying about floating-point math implementation at the same time. sincos is not a standard C/C++ feature, but it is available on some platforms, for example:

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/__sincos.3.html
http://www.gnu.org/software/libc/manual/html_node/Trig-Functions.html

I had previously thought it was available on Windows, but further investigation suggests that it's only available to C++ AMP and other specialized environments. Of course, I'd be happy to be corrected here.
OK, so instead of mathematics implementation, the initial version would be a cross-platform wrapper which
leverages the exported libraries from the supported platforms. Just a pseudo code, it might be:

#ifdef _WIN32
    #include <amp.math.h>
#endif

void
math_sincos(double in, double *p_sin, double *p_cos)
{
    #if defined(__linux__)
        sincos(in, p_sin, p_cos);
    #elif defined(__APPLE__)
        __sincos(in, p_sin, p_cos);
    #elif defined(_WIN32)
        sincos(in, p_sin, p_cos);
    #endif
}

Is that correct?
With a fallback that does |*p_sin = sin(in); *p_cos = cos(in);|, I'm guessing, so we can play around with using it whenever we know we need both.

I'm mostly replying to point out the bugs where we're investigating our own implementation: bug 967709 and bug 996375. I don't know if those will ever land, though I've been working on generating optimal sets of coefficients so we can make an informed decision.

The work in this bug could help land our own implementation, since it would let us call the function only once when we know we need both.
(In reply to andy.zsshen from comment #6)
> OK, so instead of mathematics implementation, the initial version would be a
> cross-platform wrapper which
> leverages the exported libraries from the supported platforms. Just a pseudo
> code, it might be:
> 
> #ifdef _WIN32
>     #include <amp.math.h>
> #endif

I'm not familiar with AMP; are there any negative consequences for using it in regular C++ code?

> void
> math_sincos(double in, double *p_sin, double *p_cos)
> {
>     #if defined(__linux__)

This should be __GLIBC__ rather than __linux__.

>         sincos(in, p_sin, p_cos);
>     #elif defined(__APPLE__)
>         __sincos(in, p_sin, p_cos);
>     #elif defined(_WIN32)
>         sincos(in, p_sin, p_cos);

And add a fallback as mentioned in comment 7.

>     #endif
> }
> 
> Is that correct?
 
Yes, with comments addressed. We'll likely revisit how we actually want to compute sin and cos later. For now this is good, and will let us proceed implementing the rest of the JIT side. Once we finish the JIT work, it'll be easier to evaluate options for sin and cos computation under real-world conditions.
I mentioned this over IRL a couple weeks ago, but it would be really really sweet, once SIMD is working and all, to use SIMD ops to implement a self-hosted MathSinCos method (maybe as asm.js to start, if necessary, then have math_sin and math_cos both be self-hosted methods that use that.  Fast, no need to worry about platform differences, as quick as computing sin or cos alone, etc.

That's probably not something to do now, tho, given the current state of our SIMD implementation.  But sometime eventually, maybe.  Just noting here to have in the record *somewhere*.
I just ignored the AMP library and drafted an uncached prototype.

void
js::math_sincos_uncached(double x, double *p_sin, double *p_cos)
{
    #if defined(__GLIBC__)
        sincos(x, p_sin, p_cos);
    #elif defined(__APPLE__)
        __sincos(x, p_sin, p_cos);
    #else
        *p_sin = js::math_sin_uncached(x);
        *p_cos = js::math_cos_uncached(x);
    #endif
}

And I am thinking how to cache the result of such hybrid functions in MathCache.
Could I work on this bug if Andy is working on something else?
I don't know. andy.zsshen, are you still working on this, or are you ok if someone else does?
Hi Victor,

Sorry for the delay response.

Yes. You can work on it, since I am now working on other bugs.
Nice, thanks ;)
Assignee: nobody → victorcarlquist
Attached patch WIP - Sincos (obsolete) — Splinter Review
Hi Dan,
With this path we can use the sincos function, the next step is to create the analysis to Ion recognize sin(x) and cos(x) and replace it by sincos(x).

Thanks!
Attachment #8566506 - Flags: feedback?(sunfish)
Comment on attachment 8566506 [details] [diff] [review]
WIP - Sincos

Review of attachment 8566506 [details] [diff] [review]:
-----------------------------------------------------------------

At a first glance, it looks like a good start! Some review comments below.

::: js/src/jit/CodeGenerator.cpp
@@ +5035,5 @@
>  
>      MOZ_ASSERT(ToFloatRegister(ins->output()) == ReturnDoubleReg);
>  }
>  
> +inline bool

This should be static, rather than inline. Static in this context gives the function internal name linkage, which hides the symbol name so that other compilation units in the program don't collide with it. And, most compilers will inline a static function that's only called in one place anyway.

@@ +5040,5 @@
> +OptimizeSinCos(const MathCache *mathCache, CodeGenerator *code, Register &temp, FloatRegister &input,
> +               LMathFunctionD *ins)
> +{
> +    MMathFunction::Function idFun = ins->mir()->function();
> +    if (!mathCache || idFun != MMathFunction::Sin || idFun != MMathFunction::Sin)

Should that last one be Cos?

@@ +5067,5 @@
>      MOZ_ASSERT(ToFloatRegister(ins->output()) == ReturnDoubleReg);
>  
>      const MathCache *mathCache = ins->mir()->cache();
> +    // WIP: if (ins->mir()->OptimizeSin() || ins->mir()->OptimizeCos()).
> +    if (OptimizeSinCos(mathCache, this, temp, input, ins))

Does this fully replace the Sin and Cos code in the switch below? If so, we should remove that code. And possibly, we should put this code inside the switch there the old code is so that it doesn't hvae to check the idFun itself.

::: js/src/jit/MIR.h
@@ +5801,5 @@
>              return false;
>          }
>      }
>  
> +    void setOptimizeSin(){

Style nit (here and elsewhere): put a space before the '{'.

@@ +5809,5 @@
> +    void setOptimizeCos(){
> +        isOptimizeCos_ = true;
> +    }
> +
> +    bool OptimizeSin(){

Getter functions can be marked const.

::: js/src/jsmath.cpp
@@ +932,5 @@
> +
> +    MathCache::MathFuncId idFunc = static_cast<MathCache::MathFuncId>(id);
> +    math_sincos_uncached(x, &s, &c);
> +    if (idFunc == MathCache::Sin) {
> +        cache->store(MathCache::Cos, x, c);

This caches the cos value but not the sin value. Is that intentional?

@@ +944,5 @@
> +
> +void
> +js::math_sincos_uncached(double x, double *p_sin, double *p_cos)
> +{
> +    #if defined(__GLIBC__)

Style nit: don't indent the #if.
Attachment #8566506 - Flags: feedback?(sunfish) → feedback+
Attached patch WIP - Sincos analysis. (obsolete) — Splinter Review
Hi Dan!
Thanks for your feedback :)

This patch is slower (~9%) on box-2d, I think that the issue is on store/lookup in MathCache.

I'll refactor OptimizeSinCos in IonBuilder...
Attachment #8566506 - Attachment is obsolete: true
Attachment #8567400 - Flags: feedback?(sunfish)
Attached patch WIP - Sincos analysis. (obsolete) — Splinter Review
Sorry,
This patch is the right one, I'll bench it.
Attachment #8567400 - Attachment is obsolete: true
Attachment #8567400 - Flags: feedback?(sunfish)
Attachment #8567505 - Flags: feedback?(sunfish)
Here are some results:
 
Box2D (average of 30)
before: 25894 
after : 25533 

So... this patch is little slower, I need to investigate it, do you have any idea?
Attached patch WIP - Sincos (obsolete) — Splinter Review
Comments added and OptimizeSinCos function, in IonBuilder,  refactored.
Attachment #8567505 - Attachment is obsolete: true
Attachment #8567505 - Flags: feedback?(sunfish)
Attachment #8567656 - Flags: feedback?(sunfish)
Comment on attachment 8567656 [details] [diff] [review]
WIP - Sincos

Review of attachment 8567656 [details] [diff] [review]:
-----------------------------------------------------------------

If you are testing this on a GLIBC system, one significant reason for the lack of speedup is that GLIBC (at least on x86/x64) is not using an optimized implementation. I looked around, and a few machines I have handy have code which appears to be compiled versions of this:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sincos.c;h=c8a99991cc8d47dcb56c7a9d6ca817db9cf2223c;hb=HEAD
We shouldn't do this optimization on top of that sincos implementation, unfortunately. Do we know if the Apple __sincos function is fast?

For people following along, the patch here relies on the cache approach described earlier in this bug. Given
  a = Math.sin(x);
  b = Math.cos(x);
it turns the Math.sin call into a sincos call and caches the sin and cos values in the math cache, and then the Math.cos call should hit the cache. This is subject to a few disadvantages as outlined above, but it is simple and should work, in theory, provided that the platform has a sincos which is actually fast.

While IonBuilder works as a place to perform this optimization, it means that the optimization doesn't kick in on code like this:
        var y = Math.sin(d[i]);
        var z = Math.cos(d[i]);
At IonBuilder time, it's not known that the second d[i] will always return the same result as the first one. GVN is IonMonkey's pass which figures such things out, so after GVN, the second load will have been replaced by a reuse of the first load, which would make it possible to apply the sincos optimization. It might be reasonable to do this optimization as a post-processing step in GVN. What do you think?

::: js/src/jit/IonBuilder.cpp
@@ +746,5 @@
>      return true;
>  }
>  
> +static void
> +OptimizeSinCos(IonBuilder *code)

Making functions static means that you can have functions with the same name, but in general, this kind of thing is still a little confusing for readers when they happen and they're closely related, as the two in this patch are. Using different names helps human readers, even if compilers don't need it :).

@@ +766,5 @@
> +                if (s->function() == MMathFunction::Sin)
> +                    s->setOptimizeSin();
> +                else
> +                    s->setOptimizeCos();
> +                code->optimizeSinCos.erase(&n);

Watch out, in general. This line passes the address of a local variable into erase.

@@ +772,5 @@
> +                j++;
> +            }
> +        }
> +        if (!(s->OptimizeSin() || s->OptimizeCos()))
> +            code->optimizeSinCos.erase(&s);

Ditto.
Attachment #8567656 - Flags: feedback?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #21)

Thanks for your feedback.

> Do we know if the Apple __sincos function is fast?
Yes, on Apple machine, we are still slower... (It's weird)
Box2D (average of 30 - Apple)
Before: 54942
After:  54468

> While IonBuilder works as a place to perform this optimization, it means
> that the optimization doesn't kick in on code like this:
>         var y = Math.sin(d[i]);
>         var z = Math.cos(d[i]);
> At IonBuilder time, it's not known that the second d[i] will always return
> the same result as the first one. GVN is IonMonkey's pass which figures such
> things out, so after GVN, the second load will have been replaced by a reuse
> of the first load, which would make it possible to apply the sincos
> optimization. It might be reasonable to do this optimization as a
> post-processing step in GVN. What do you think?

I'm not familiarized with GVN yet, so, I'll need some feedbacks..., ok? 

Thanks!
(In reply to Victor Carlquist from comment #22)
> > Do we know if the Apple __sincos function is fast?
> Yes, on Apple machine, we are still slower... (It's weird)
> Box2D (average of 30 - Apple)
> Before: 54942
> After:  54468

Ok. I don't have a good guess as to what's happening here. Can you insert some debug code and determine whether the patch is actually reducing the number of sin/cos/sincos calls?

> I'm not familiarized with GVN yet, so, I'll need some feedbacks..., ok? 

Sure. GVN (Global Value Numbering, and it's the pass in ValueNumbering.cpp) has the job of eliminating redundant expressions. For example, if the source code has this:

   var y = Math.sin(d[i]);
   var z = Math.cos(d[i]);

At IonBuilder time, each d[i] will show up as its own Load instruction MIR node. It will appear that the sin and cos functions here are operating on different values. When the GVN pass runs, it figures out that these two loads are loading the same value, and transforms the code to look effectively like this:

   var temp = d[i];
   var y = Math.sin(temp);
   var z = Math.cos(temp);

This is great on its own because now we only have to evaluate d[i] once rather than twice. But even better, if we do sin+cos optimization after GVN has done this, we'll see that the sin and cos are operating on the same value.
I found the issue, I forgot to add OptimizeSinCos(this) in IonBuilder::buildInline, sorry.
Although, It's slower:

Box2D (average of 30 - Apple)
Before: 54942 (old test)
After: 54667 (a little bit faster than the other patch :)

Maybe, tomorrow, I'll bench some native c++ code with sincos to check if it is faster than sin/cos.
Hi! I benched the sincos and sin/cos with the native c++ code:
> #include <cmath>
> 
> int main()
> {
>   double b,c;
>   for (int i = 0; i < 100000000; i++) { 
>      __sincos(i, &b, &c); 
>   }
>   return 0;
> }

and:

> #include <cmath>
> 
> int main()
> {
>   double b,c;
>   for (int i = 0; i < 100000000; i++) { 
>      b = sin(i);
>      c = cos(i); 
>   }
>   return 0;
> }

Dan, you were right about the issue with GLIBC, on my linux I got the following results:
sincos: 10.713s
sin/cos: 10.576s

But, on apple machine I got a nice result:
sincos: 4.465s
sin/cos: 8.622s

So, I have no idea why the patch is slower, yet!
Attached patch WIP - Sincos (obsolete) — Splinter Review
I'm using this patch to bench the Box2D.
Attachment #8567656 - Attachment is obsolete: true
Attached patch WIP - Sincos (obsolete) — Splinter Review
Hi, sorry for delay...

I'm not using the store/lookup in mathCache anymore, without this we are faster (~4%) on linux and apple machine :)
Following some results:
(average of 30)
Box2D linux: 
before: 25748
after: 26455

(average of 30)
Box2D apple: 
before: 54942
after: 56542

I still don't know why the __sincos on apple machine seems not work faster :(

Thanks!
Attachment #8571061 - Attachment is obsolete: true
Attachment #8574477 - Flags: feedback?(sunfish)
Nice :) How does Sunspider compare? I believe that's what the math cache was originally created for.
Attached file Sunspider (obsolete) —
Sunspider results!

I'll refactor the |OptimizeSinCos()|, because, on Linux, we are slower on some tests.
Comment on attachment 8574477 [details] [diff] [review]
WIP - Sincos

Review of attachment 8574477 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, this patch is moving in the right direction.

Unfortunately, I had been hoping to see rather bigger speedups on Box2D, and possibly even on Sunspider 3d-cube and math-partial-sums. GLIBC's sincos is a bummer (mentioned above), but it'd be good to understand what's going on with the MacOS sincos before we get too much further here.

::: js/src/jit/Ion.cpp
@@ +1372,5 @@
>          if (mir->shouldCancel("GVN"))
>              return false;
>      }
>  
> +    OptimizeSinCos(mir, graph);

I think I'm going to suggest moving this pass even later, after RangeAnalysis. This because it's not really an enabling optimization that makes other optimizations better. And, it is somewhat helped by other optimizations.

Also, we should really dress this pass up with TraceLogger support and an option to disable it.
Attachment #8574477 - Flags: feedback?(sunfish)
Ok, I'll investigate the sincos on a Mac.
Attached patch Patch (obsolete) — Splinter Review
Hi Dan,

I tested the micro-bench below, and the patch seems to work fine:
> var x = 0;
> function test() {
>    function sincos(i, u) {
>        var c = Math.cos(i);
>        var s =  Math.sin(i);
>        
>        return  c + s;
>    }
>    for (var i=0; i<10000000; i++) {
>        x += sincos(i);
>    }
> }
> test(); 

Following the results:
With --ion-sincos=off
real	0m0.561s
user	0m0.553s
sys	0m0.007s

With --ion-sincos=on
real	0m0.305s
user	0m0.298s
sys	0m0.006s

We don't get a speed up on cube-2d because it isn't hot enough on MacOS. 
To confirm this, I put some messages in |OptimizeSinCos()| and it shows up the messages when I added |setJitCompilerOption("baseline.warmup.trigger", 0); setJitCompilerOption("ion.warmup.trigger", 1);| in cube-2d.js.

Sunspider's results (MacOS):

TEST                   COMPARISON               FROM                 TO       

==============================================================================

** TOTAL **:           -                 158.5ms +/- 0.8%    157.3ms +/- 0.5% 

==============================================================================

  3d:                  -                  27.3ms +/- 1.9%     27.3ms +/- 1.7% 
    cube:              ??                 10.4ms +/- 2.0%     10.5ms +/- 2.4% 
    morph:             -                   4.6ms +/- 1.6%      4.6ms +/- 1.5% 
    raytrace:          -                  12.3ms +/- 3.5%     12.2ms +/- 3.1% 

  access:              -                  14.4ms +/- 1.5%     14.4ms +/- 1.4% 
    binary-trees:      ??                  2.5ms +/- 2.9%      2.5ms +/- 2.9% 
    fannkuch:          -                   6.3ms +/- 2.0%      6.3ms +/- 2.5% 
    nbody:             ??                  2.8ms +/- 2.5%      2.8ms +/- 2.6% 
    nsieve:            -                   2.7ms +/- 1.9%      2.7ms +/- 1.5% 

  bitops:              -                   7.5ms +/- 1.1%      7.5ms +/- 1.0% 
    3bit-bits-in-byte: ??                  0.7ms +/- 1.8%      0.8ms +/- 1.5% 
    bits-in-byte:      -                   1.9ms +/- 2.1%      1.9ms +/- 2.3% 
    bitwise-and:       -                   1.9ms +/- 1.8%      1.9ms +/- 1.8% 
    nsieve-bits:       -                   3.0ms +/- 1.6%      3.0ms +/- 1.1% 

  controlflow:         ??                  2.2ms +/- 1.5%      2.2ms +/- 2.0% 
    recursive:         ??                  2.2ms +/- 1.5%      2.2ms +/- 2.0% 

  crypto:              ??                 15.7ms +/- 2.0%     15.8ms +/- 1.8% 
    aes:               ??                  7.8ms +/- 3.5%      7.8ms +/- 3.3% 
    md5:               ??                  4.3ms +/- 1.8%      4.3ms +/- 1.6% 
    sha1:              ??                  3.7ms +/- 2.4%      3.7ms +/- 2.7% 

  date:                ??                 18.7ms +/- 1.7%     19.1ms +/- 2.0% 
    format-tofte:      ??                  7.6ms +/- 2.9%      7.7ms +/- 3.6% 
    format-xparb:      ??                 11.1ms +/- 2.5%     11.3ms +/- 3.3% 

  math:                1.064x as fast      9.2ms +/- 1.7%      8.7ms +/- 2.2% 
    cordic:            -                   2.3ms +/- 3.0%      2.2ms +/- 3.8% 
    partial-sums:      1.107x as fast      5.3ms +/- 2.2%      4.7ms +/- 2.9% 
    spectral-norm:     -                   1.7ms +/- 2.5%      1.7ms +/- 2.0% 

  regexp:              -                   7.8ms +/- 1.8%      7.7ms +/- 0.7% 
    dna:               -                   7.8ms +/- 1.8%      7.7ms +/- 0.7% 

  string:              1.016x as fast     55.7ms +/- 1.4%     54.8ms +/- 0.5% 
    base64:            1.040x as fast      5.7ms +/- 3.4%      5.5ms +/- 1.3% 
    fasta:             -                   6.6ms +/- 1.3%      6.5ms +/- 1.0% 
    tagcloud:          -                  15.1ms +/- 1.7%     14.9ms +/- 0.8% 
    unpack-code:       -                  22.0ms +/- 3.0%     21.5ms +/- 0.7% 
    validate-input:    ??                  6.3ms +/- 1.1%      6.4ms +/- 1.2% 

Thanks.
Attachment #8574477 - Attachment is obsolete: true
Attachment #8577779 - Flags: feedback?(sunfish)
Comment on attachment 8577779 [details] [diff] [review]
Patch

Review of attachment 8577779 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay. Besides SunSpider, it'd be useful to see if this optimization helps box2d; a js benchmark can be found here [0].

Note for people following along; the present patch only works on non-asm.js code. Asm.js handles math functions differently.

One other high-level suggestion I have for this patch is that it seems desirable to disable the optimization entirely on platforms which don't actually have a sincos, or where it isn't actually faster.

[0] https://github.com/joelgwebber/bench2d
Attachment #8577779 - Flags: feedback?(sunfish)
I made the test, unfortunately, we didn't get a big speed up on Box2d (~%4)

--ion-sincos=off
~9.4s
--ion-sincos=on
~9s

I'll disable the sincos on platforms which don't have a sincos fast enough. I still want to do this today.

Thanks for your feedbacks!
Flags: needinfo?(sunfish)
Attached patch Patch (obsolete) — Splinter Review
Attachment #8577779 - Attachment is obsolete: true
Flags: needinfo?(sunfish)
Attachment #8585731 - Flags: feedback?(sunfish)
Box2d is one of the primary motivators for this optimization. Do you know if it contains sin/cos pairs which the current patch is unable to optimize?
Comment on attachment 8585731 [details] [diff] [review]
Patch

Review of attachment 8585731 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +5158,5 @@
> +            if (GenerateCodeToSincos(mathCache, this, temp, input, ins))
> +                return;
> +        }
> +
> +        // The siscos was called, so we need to get the value cached.

typo: siscos

::: js/src/jit/Ion.cpp
@@ +1177,5 @@
> +            for ( ; j;) {
> +                MDefinition *nextDef = *j;
> +                if (!nextDef->isMathFunction()) {
> +                    *j++;
> +                    continue;

Having a single special-purpose sincos field in the mathCache is a clever idea, but I'm worried that we could get into a situation where there's a call to sin(x) followed by some intervening code followed by a call to cos(x), and that the intervening code could involve, say, a call to a function which has its own sin/cos pair, clobbering the sincos cache. Is that possible? It looks like this code will walk past any arbitrary code.

An alternative approach which might be less risky would be to allocate the sincos cache field in a stack slot. That way, there'd be no risk of one sin/cos pair clobbering another sin/cos pair's value. What do you think?
Attachment #8585731 - Flags: feedback?(sunfish)
Attached patch Patch (WIP) (obsolete) — Splinter Review
(In reply to Dan Gohman [:sunfish] from comment #37)

Sorry for the delay, I've been busy lately....

> An alternative approach which might be less risky would be to allocate the
> sincos cache field in a stack slot. That way, there'd be no risk of one
> sin/cos pair clobbering another sin/cos pair's value. What do you think?

Good idea!
I used the native c++ stack to solve this issue. (I am not sure if it's right :)

> Box2d is one of the primary motivators for this optimization. Do you know if it contains sin/cos pairs which the current patch is unable to optimize?

I investigated the box2d in octane, It has only one sin/cos, and this patch can optimize it.

Thanks for your feedback ;)
Attachment #8585731 - Attachment is obsolete: true
Attachment #8596975 - Flags: feedback?(sunfish)
Comment on attachment 8596975 [details] [diff] [review]
Patch (WIP)

Review of attachment 8596975 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay on my side as well. I too have been quite busy.

When I said to store the cached value on the stack, I meant the JS function stack, similar to a register spill slot. Using a std::stack is a lot more than we need here. Is it possible to use a fixed temp fixed to a stack slot?

When you tested box2d, did you happen to measure the performance?
Attachment #8596975 - Flags: feedback?(sunfish)
(In reply to Dan Gohman [:sunfish] from comment #39)
> When I said to store the cached value on the stack, I meant the JS function
> stack, similar to a register spill slot. Using a std::stack is a lot more
> than we need here. Is it possible to use a fixed temp fixed to a stack slot?

I'm a beginner, so, I never used slots (yet), sorry :), I have a question about where I should store the sincos value. 

Should I store the value in the js::HeapSlot *slots_ field in a NativeObject and load it on the next sin or cos call using something like the line below?

> masm.loadPtr(Address(ObjectReg, NativeObject::offsetOfSlots())

Or Does It have another stack slot?
Flags: needinfo?(sunfish)
I'm unfortunately pretty busy elsewhere, but :nbp expressed an interest in this bug recently. :nbp, what do you see as the best way forward for this feature?
Flags: needinfo?(sunfish) → needinfo?(nicolas.b.pierron)
Hi Dan, 
ok, no problem ;)
Comment on attachment 8596975 [details] [diff] [review]
Patch (WIP)

Review of attachment 8596975 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, if you already tried this in the past (or not), but I was wondering if it would not be easier to have a new MIRType_SinCosDouble, which just gets translated into 2 LIR outputs of one MIR instruction.

Then the transform will match:

1 Op
5 MMathFunction[Sin] (Op1)
8 MMathFunction[Cos] (Op1)

and modify it to:

1 Op
2 MSinCos Op1
5 MMathFunction[Sin] (SinCos2)    // alias input
8 MMathFunction[Cos] (SinCos2)    // alias input

As an example, we already have multiple LIR output when we are producing boxed values (MIRType_Value) on 32 bits platforms.

What do you think?

::: js/src/jit/CodeGenerator.cpp
@@ +5145,5 @@
> +
> +static double getSincos(MathCache *cache)
> +{
> +    double r = cache->sincos.top();
> +    cache->sincos.pop();

I might be wrong, but don't we have an issue with mixing up cache entries with a test case like:

function foo(a, b) {
  var sa = Math.sin(a);
  var sb = Math.sin(b);
  var ca = Math.cos(a);
  var cb = Math.cos(b);
  assertEq(sa * sa + ca * ca, sb * sb + cb * cb);
}


Also, saving the result of the sincos in the MathCache sounds like we might be manipulating extra memory.

::: js/src/jit/Ion.cpp
@@ +1187,5 @@
> +                    continue;
> +                }
> +
> +                // Check if the next sin/cos have the same parameter.
> +                if (def->getOperand(0)->id() == nextDef->getOperand(0)->id()) {

This second loop sounds like this might be a O(n²) complexity.  I wonder it would not be easier to walk over the uses of the operand instead of looking at the control flow graph.

for (MUseDef uses(def->getOperand(0)); …; uses++) {
    …
    if (nextDef->function() == …)
}
(In reply to Nicolas B. Pierron [:nbp] from comment #43)
> Comment on attachment 8596975 [details] [diff] [review]
> Patch (WIP)
> 
> Review of attachment 8596975 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Sorry, if you already tried this in the past (or not), but I was wondering
> if it would not be easier to have a new MIRType_SinCosDouble, which just
> gets translated into 2 LIR outputs of one MIR instruction.
> 
> Then the transform will match:
> 
> 1 Op
> 5 MMathFunction[Sin] (Op1)
> 8 MMathFunction[Cos] (Op1)
> 
> and modify it to:
> 
> 1 Op
> 2 MSinCos Op1
> 5 MMathFunction[Sin] (SinCos2)    // alias input
> 8 MMathFunction[Cos] (SinCos2)    // alias input
> 
> As an example, we already have multiple LIR output when we are producing
> boxed values (MIRType_Value) on 32 bits platforms.
> 
> What do you think?

It's a good idea, thanks!
I'll try to insert the MSinCos modifying the OptimizeSincos function in Ion.cpp (I don't know if it's possible to do, but I'll study it)

Cleaning needinfo? flag ;)
Flags: needinfo?(nicolas.b.pierron)
Attached patch Patch - MIRTYPE_SinCosDouble (obsolete) — Splinter Review
Hi Nicolas,

thanks for your last feedback :)

This patch is working, but I used a static double to return the second value of the sincos, Is there a better way to make this?
Attachment #8596975 - Attachment is obsolete: true
Attachment #8631929 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8631929 [details] [diff] [review]
Patch - MIRTYPE_SinCosDouble

Review of attachment 8631929 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, there is still some issues/improvements but this is a big step forward :)

::: js/src/jit/CodeGenerator.cpp
@@ +5124,5 @@
>      MOZ_ASSERT(ToFloatRegister(ins->output()) == ReturnDoubleReg);
>  
> +    if (ins->mir()->optimizeSin() || ins->mir()->optimizeCos()) {
> +        FloatRegister output = ToFloatRegister(ins->output());
> +        masm.moveDouble(input, output);

Use the "redefine" in the lowering instead of doing a move in the CodeGen.

@@ +6229,5 @@
> +    FloatRegister outputCos = ToFloatRegister(lir->outputCos());
> +
> +    static double cos;
> +    masm.setupUnalignedABICall(2, tmp);
> +    masm.movePtr(ImmPtr(&cos), tmp);

Do not use a static location, because we can have multiple worker thread which each have a different runtime, which might race on this static memory location.  Have a look at the trampoline code (generateVMWrapper) on how we handle out-param pointers, by reserving space on the stack.

::: js/src/jit/Ion.cpp
@@ +1177,5 @@
> +            if (!(defFunc->function() == MMathFunction::Sin || defFunc->function() == MMathFunction::Cos))
> +                continue;
> +
> +            if (iter == end)
> +                break;

How could this condition succeed?

@@ +1179,5 @@
> +
> +            if (iter == end)
> +                break;
> +
> +            for (MUseDefIterator uses(def->getOperand(0)); uses; uses++) {

This loop is made to replace the argument of the MMathFunction by a MSinCos, but if we are to replace this value in both of them, when we hit the second MMathFunction, we would probably already have a MSinCos as argument.

We should skip this loop if the first operand is already a MSinCos instruction.

@@ +1183,5 @@
> +            for (MUseDefIterator uses(def->getOperand(0)); uses; uses++) {
> +                MInstruction *nextDef = uses.def()->toInstruction();
> +                if (!nextDef->isMathFunction()) {
> +                    continue;
> +                }

nit: don't brace single statement.

@@ +1194,5 @@
> +                // Check if the next sin/cos have the same parameter.
> +                if (def->getOperand(0)->id() == nextDef->getOperand(0)->id()) {
> +                    if (defFunc->function() == MMathFunction::Sin) {
> +                        defFunc->setOptimizeSin();
> +                        nextFunc->setOptimizeCos();

I understand how this end-up being working, because of GVN, but would *highly* prefer if you could make this work with GVN disabled, i-e do not assume that we have no duplicate.

You can disable gvn on the JS shell command line with  --ion-gvn=off

Also add tests cases, such as the one the I mentionned in my previous comment.

::: js/src/jit/IonTypes.h
@@ +364,5 @@
>      MIRType_Boolean,
>      MIRType_Int32,
>      MIRType_Double,
>      MIRType_Float32,
> +    MIRType_SinCosDouble,              // Optimizing a sin/cos to sincos.

Move it after MIRType_Value.

::: js/src/jit/LIR.h
@@ +572,1 @@
>              return LDefinition::DOUBLE;

This sounds wrong, to associate only one Double output to a SinCos instruction.

::: js/src/jit/Lowering.cpp
@@ +2880,5 @@
> +               ins->input()->type() == MIRType_Float32 ||
> +               ins->input()->type() == MIRType_Int32);
> +
> +    LSinCos *lir = new (alloc()) LSinCos(useRegisterAtStart(ins->input()), tempFixed(CallTempReg0));
> +    defineReturn(lir, ins);

You should have a look at defineBox, and create a similar function named defineSinCos.

::: js/src/jit/MIR.h
@@ +5980,5 @@
>  
> +    void setOptimizeSin() {
> +        MOZ_ASSERT(!isOptimizeSin_);
> +        isOptimizeSin_ = true;
> +    }

These function no longer seems to be necessary, as the SinCos operand (with its MIRType_SinCosDouble) should be enough to distinguish between an optimized Sin/Cos in the lowering phase.

@@ +6502,5 @@
> +    }
> +
> +    MDefinition *input()
> +    {
> +        return getOperand(0);

This function is already provided by MUnaryInstruction. ;)

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +146,5 @@
>          break;
> +      case MIRType_SinCosDouble:
> +        lir->setDef(0, LDefinition(vreg, LDefinition::DOUBLE, LFloatReg(ReturnDoubleReg)));
> +#if defined(JS_CODEGEN_MIPS)
> +        lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(f2)));

defineReturn is used for the ABI hard-coded register output, but not for out-param, so-far.  I do not think you should hard-code the register of the out-param.

@@ +150,5 @@
> +        lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(f2)));
> +#elif defined(JS_CODEGEN_ARM)
> +        lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(d1)));
> +#else
> +        lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(xmm1)));

do not use #else for a fallback on unimplemented architectures.
Use

#elif defined(…) || defined(…)
…
#else
#error("Unsupported architecture for SinCos")
#endif

@@ +451,5 @@
> +LIRGeneratorShared::useSinCos(MDefinition *mir, MMathFunction *math)
> +{
> +    MOZ_ASSERT(mir->type() == MIRType_SinCosDouble);
> +
> +    return LUse(VirtualRegisterOfSinCos(mir, math), LUse::REGISTER, true);

Nice work :)

::: js/src/jsmath.cpp
@@ +934,5 @@
> +
> +#if defined(__GLIBC__)
> +    sincos(x, &sin, cos);
> +#elif defined(__APPLE__)
> +    __sincos(x, &sin, cos);

Any reasons for not using these functions directly?  From what I understand, using js::math_sincos instead of sincos saves you from restoring the "sin" value from the stack,  which would be what you will do in the codegen for both sin & cos if you did not had this wrapper.
Attachment #8631929 - Flags: feedback?(nicolas.b.pierron) → feedback+
Attached patch patch - WIP (obsolete) — Splinter Review
Thanks Nicolas!

This patch is failing on this test:
var x = 0;
function test() {
	function foo(x) {
		var s1 = Math.sin(x);
		var c1 = Math.cos(x);
		
		return s1 + c1;
	}
    for (var i=0; i<500; i++) {
        x += foo(i); 
    }
}
x += 1;
test();

It is showing up:
> Assertion failure: uintptr_t(obj) > 0x1000 || uintptr_t(obj) == 0x42, at /js/src/shell/../js/Value.h:828

What does it mean?
Attachment #8631929 - Attachment is obsolete: true
Attachment #8633446 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8633446 [details] [diff] [review]
patch - WIP

Review of attachment 8633446 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, but I wondering about bailouts, can you add the following test case:

function bailoutHere() { bailout(); }

function sincos3(x) {
  var s = Math.sin(x);
  bailoutHere();
  var c = Math.cos(x);
  assertEq(Math.round(s * s + c * c), 1);
}

::: js/src/jit/CodeGenerator.cpp
@@ +6227,5 @@
> +
> +    masm.reserveStack(sizeof(double));
> +    masm.movq(esp, tmpSin);
> +    masm.reserveStack(sizeof(double));
> +    masm.movq(esp, tmpCos);

You can reserve 2 * sizeof(double), and only use one temp register for both Sin & Cos.

::: js/src/jit/Ion.cpp
@@ +1168,5 @@
> +    // - 7 mathfunction sincos6 Sin
> +    // - 8 mathfunction sincos6 Cos
> +    for (MBasicBlockIterator i(graph.begin()); i != graph.end(); i++) {
> +        for (MInstructionIterator iter(i->begin()), end(i->end()); iter != end; ) {
> +            MInstruction *def = *iter++;

nit: the coding style changed, and now we are writing  "MInstruction* def"  :/

@@ +1170,5 @@
> +    for (MBasicBlockIterator i(graph.begin()); i != graph.end(); i++) {
> +        for (MInstructionIterator iter(i->begin()), end(i->end()); iter != end; ) {
> +            MInstruction *def = *iter++;
> +            if (!def->isMathFunction() || def->getOperand(0)->type() == MIRType_SinCosDouble)
> +                continue;

Filter out recovered on bailout instructions.  And make sure the MMathFunction(Sin) cannot be recovered on bailout.

@@ +1179,5 @@
> +
> +            for (MUseDefIterator uses(def->getOperand(0)); uses; uses++) {
> +                MInstruction *nextDef = uses.def()->toInstruction();
> +                if (!nextDef->isMathFunction())
> +                    continue;

same here.

@@ +1202,5 @@
> +                    nextFunc->replaceOperand(0, ins);
> +                }
> +
> +                // Jump to next instruction, because we found a sin/cos par.
> +                break;

Replace all inputs, not only the last two.  Otherwise sincos2 would be slower if gvn is disabled.

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +165,5 @@
> +    MOZ_ASSERT(!lir->isCall());
> +
> +    uint32_t vreg = getVirtualRegister();
> +
> +    lir->setDef(0, LDefinition(vreg, LDefinition::DOUBLE, LFloatReg(ReturnDoubleReg)));

We do not need a specific use of this double register.
Attachment #8633446 - Flags: feedback?(nicolas.b.pierron) → feedback+
(In reply to Victor Carlquist from comment #47)
> It is showing up:
> > Assertion failure: uintptr_t(obj) > 0x1000 || uintptr_t(obj) == 0x42, at /js/src/shell/../js/Value.h:828
> 
> What does it mean?

This means that we are trying to do a setObject with something which is probably not an object.
Do you have a stack trace?
Thanks for your feedback again :)

Here is the trace stack and lir graph.


(gdb) backtrace 
#0  0x00000000004138b3 in OBJECT_TO_JSVAL_IMPL (obj=0x126) at /home/victor/Documentos/firefox/SpiderMonkey/js/src/shell/../js/Value.h:828
#1  0x00000000004308a2 in JS::Value::setObject (this=0x7fffffffa318, obj=...)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/shell/../js/Value.h:1057
#2  0x00000000004465e3 in JS::Value::setObjectOrNull (this=0x7fffffffa318, arg=0x126)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/shell/../js/Value.h:1095
#3  0x0000000000b636ec in JS::ObjectOrNullValue (obj=0x126) at /home/victor/Documentos/firefox/SpiderMonkey/js/src/js/Value.h:1604
#4  0x0000000000b49935 in js::jit::FromObjectPayload (payload=294) at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/JitFrames.cpp:1753
#5  0x0000000000b497c6 in js::jit::FromTypedPayload (type=JSVAL_TYPE_OBJECT, payload=294)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/JitFrames.cpp:1781
#6  0x0000000000b492e3 in js::jit::SnapshotIterator::allocationValue (this=0x7fffffffaf30, alloc=..., rm=js::jit::SnapshotIterator::RM_Normal)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/JitFrames.cpp:1864
#7  0x0000000000addf5d in js::jit::SnapshotIterator::read (this=0x7fffffffaf30)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/JitFrameIterator.h:541
#8  0x00000000009b9fe5 in InitFromBailout (cx=0x1f331c0, caller=..., callerPC=0x0, fun=..., script=..., ionScript=0x204cbb0, iter=..., 
    invalidate=false, builder=..., startFrameFormals=..., nextCallee=..., callPC=0x7fffffffadc8, excInfo=0x0)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/BaselineBailouts.cpp:775
#9  0x00000000009b82d1 in js::jit::BailoutIonToBaseline (cx=0x1f331c0, activation=0x7fffffffb900, iter=..., invalidate=false, 
    bailoutInfo=0x7fffffffb688, excInfo=0x0) at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/BaselineBailouts.cpp:1514
#10 0x00000000009b765b in js::jit::Bailout (sp=0x7fffffffb690, bailoutInfo=0x7fffffffb688)
    at /home/victor/Documentos/firefox/SpiderMonkey/js/src/jit/Bailouts.cpp:53
#11 0x00007ffff7f7292c in ?? ()
#12 0x00007fffffffb6b0 in ?? ()
#13 0x00007fffffffb688 in ?? ()
#14 0x0000000001fa3320 in ?? ()
#15 0x0000000000000000 in ?? ()

############################################

IONFLAGS=logs,bailouts  _DBG_OBJ/js/src/js --ion-sincos=on --ion-offthread-compile=off --ion-eager --no-threads   test_sincos.js 
[IonBailouts] Took bailout! Snapshot offset: 0
[IonBailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: label [0]
[IonBailouts] Took invalidation bailout! Snapshot offset: 40
[IonBailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: osipoint [15]
[IonBailouts] Took bailout! Snapshot offset: 0
[IonBailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: label [0]
[IonBailouts] Took bailout! Snapshot offset: 0
[IonBailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: label [0]
[IonBailouts] Took invalidation bailout! Snapshot offset: 114
[IonBailouts]  bailing from bytecode: nop, MIR: constant [0], LIR: osipoint [38]
[IonBailouts] Took bailout! Snapshot offset: 114
[IonBailouts]  bailing from bytecode: return, MIR: unbox [49], LIR: unbox [40]
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #50)
> Created attachment 8634163 [details]
> func06-pass00-Allocate Registers [Backtracking]-lir.gv.png
> 
> Thanks for your feedback again :)
> 
> Here is the trace stack and lir graph.

The backtrace highlights that we are trying to reconstruct an object when we bailout.  The only objects which are present are the scope chain and the function environment.  One thing we can notice, is that they remain alive, even across the call made in SinCos, which is likely the issue you are facing here.

It is easier to understand with "r8", our register allocator should save this register on the stack in order to prevent the it from being erased by any of the registers used in the sincos function.  Thus, I think the problem comes from the fact that LSinCos is not flagged as isCall.  I suggest you replace

> class LSinCos : public LInstructionHelper<2, 1, 3>

by

> class LSinCos : public LCallInstructionHelper<2, 1, 3>
Flags: needinfo?(nicolas.b.pierron)
Attached patch WIP - Patch (obsolete) — Splinter Review
Hi Nicolas, 

You were right about the LCallInstructionHelper, thanks!

This patch only works with the --ion-regalloc=stupid :(, because using the backtraking or lsra it shows up:
> Assertion failure: callPositions.length(), at js/src/jit/BacktrackingAllocator.cpp:2299

and if we comment the second replaceOperand in Ion.cpp
> nextFunc->replaceOperand(0, insSinCos);

it shows up this assert:
> Assertion failure: !minimalInterval(interval), at js/src/jit/BacktrackingAllocator.cpp:610

I'm studying how the backtracking is allocating the regs to fix it.
Attachment #8633446 - Attachment is obsolete: true
Attachment #8634163 - Attachment is obsolete: true
Attachment #8635851 - Flags: feedback?(nicolas.b.pierron)
Brian / Sunfish, do you have any idea what might be wrong in conment 52?
Victor, can you attach the last LIR phase before this failure?
Flags: needinfo?(sunfish)
Flags: needinfo?(bhackett1024)
Attached image func06-pass00-Generate LIR-lir.gv.png (obsolete) —
Sure.

Look the test case in the comment 47, please.

Thanks.
Hi folks,

The failure |callPositions.length()|, in comment 52, is occurring because the |splitAcrossCalls| method is trying to split a bundle that is already split, so the condition: 
> if (bundle->rangeFor(callRange->from()) && bundle->rangeFor(callRange->from().previous()) 
in |splitAcrossCalls| is false, because the position of the call is in other range, so the callPosition.length() returns 0.

Is the |splitAcrossCalls| in BacktrackingAllocator.cpp  called twice because the sincos has two Defs?
Can you attach the spew produced when setting IONFLAGS=regalloc?  It's hard to say what the backtracking allocator is doing without that output.
Attached file regalloc.txt (obsolete) —
Sure!
I'm using the test case below:
> var x = 0;
> function foo(v) {
>   var s1 = Math.sin(v);
>   var c1 = Math.cos(v);
>   return s1 + c1 ;
> }
> for (var i = 0; i < 500; i++)
>    x += foo(i); 
> x += 1;
Thanks.  The problem seems to be that the patch is creating an instruction LSinCos which is a call but does not have fixed registers for the outputs.  The backtracking allocator treats all registers as in use by call instructions so won't be able to find registers for the output vregs of LSinCos.  I think that LIRGeneratorShared::defineSinCos should specify specific registers for the output, like defineReturn() does.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #59)
Thank you for your fast reply!
I'll make the modifications.
Attached patch Patch (obsolete) — Splinter Review
Hi Nicolas, 
I did some the modifications and now we are using the mathcache because we are faster with it.

Tonight, I'll send the benchmark with Mac.

Thank you.
Attachment #8635851 - Attachment is obsolete: true
Attachment #8635851 - Flags: feedback?(nicolas.b.pierron)
Attachment #8644917 - Flags: feedback?(nicolas.b.pierron)
Attached file Bench - Mac
Hi guys,

The test with sin/cos in a loop is faster, 
but with the box2d (bench2d) we can't see a big difference in the time execution :/
Comment on attachment 8644917 [details] [diff] [review]
Patch

Review of attachment 8644917 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +6370,5 @@
> +    FloatRegister outputSin = ToFloatRegister(lir->outputSin());
> +    FloatRegister outputCos = ToFloatRegister(lir->outputCos());
> +
> +    masm.reserveStack(sizeof(double) * 2);
> +    masm.movePtr(esp, params);

use getStackPointer()

@@ +6391,5 @@
> +
> +    masm.callWithABI(JS_FUNC_TO_DATA_PTR(void *,  MAYBE_CACHED(js::math_sincos)));
> +
> +    masm.loadDouble(Address(esp, 0), outputCos);
> +    masm.loadDouble(Address(esp, sizeof(double)), outputSin);

and here as well.

::: js/src/jit/Ion.cpp
@@ +1305,5 @@
> +            if (!(insFunc->function() == MMathFunction::Sin || insFunc->function() == MMathFunction::Cos))
> +                continue;
> +
> +            for (MUseDefIterator uses(ins->getOperand(0)->type() != MIRType_SinCosDouble ?
> +                 ins->getOperand(0) : ins->getOperand(0)->getOperand(0)); uses; uses++)

Move the computation of the argument of the MUseDefIterator above, and add a comment, the mention that we are trying to match the initiali operand, and thus walk past the MSinCos instruction.

@@ +1312,5 @@
> +                if (!nextIns->isMathFunction())
> +                    continue;
> +
> +                if (nextIns->getOperand(0)->type() == MIRType_SinCosDouble ||
> +                    nextIns->isRecoveredOnBailout())

if (a || b)
  continue;

can be rewritten to

if (a)
  continue;

if (b)
  continue;

Do that here.

@@ +1324,5 @@
> +                // parameter in a block using only one sincos call.
> +                if (!((nextFunc->function() == MMathFunction::Sin  ||
> +                       nextFunc->function() == MMathFunction::Cos) &&
> +                      (nextFunc->function() != insFunc->function() ||
> +                       insFunc->input()->type() == MIRType_SinCosDouble)))

and for this expression too.


Also, I am not sure this work as expected if you have multiple calls such as:

  [ cos(x), cos(x), cos(x), sin(x) ]

I think solving this problem would be way easier for you if you iterate twice on the usesList, a first time to analyze if a SinCos is interesting, and a second time to make the transformation on all uses at once.


bool hasSin = false;
bool hasCos = false;
for (MuseDefIterator uses(ins->input()); uses; uses++) {
    …

    if (!hasSin && mathUse->function() == MMathFunction::Sin)
        hasSin = true;
    else if (!hasCos && mathUse->function() == MMathFunction::Cos)
        hasCos = true;
    else
        continue;

    if (hasSin && hasCos)
        break;
}

if (!hasSin || !hasCos)
    continue;

MSinCos *insSinCos = MSinCos::New(…);

for (MuseDefIterator uses(ins->input()); uses; uses++) {
    …

    if (mathUse->function() != MMathFunction::Sin && mathUse->function() != MMathFunction::Cos)
        continue;

    mathUse->replaceOperand(0, insSinCos);
}


What do you think?

Note, every program transformation phase can be transformed into an Analyze & Transform steps, and is easier when you address context-dependent problems.

::: js/src/jit/MIR.cpp
@@ +1262,5 @@
>      return ins->toParameter()->index() == index_;
>  }
>  
> +MDefinition*
> +MSinCos::foldsTo(TempAllocator& alloc)

I would prefer, if we did not add a MSinCosConstant, and if we added this as part of the MMathFunction, as we should already have code there, and the SinCos, would then become dead.

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +166,5 @@
> +    lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(d1)));
> +#elif defined(JS_CODEGEN_X86) || defined(JS_CODEGEN_X64)
> +    lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(xmm1)));
> +#else
> +#error "Unsupported architecture for SinCos"

add code for JS_CODEGEN_ARM64, and JS_CODEGEN_NONE.
Attachment #8644917 - Flags: feedback?(nicolas.b.pierron) → feedback+
(In reply to Nicolas B. Pierron [:nbp] from comment #63)
> Comment on attachment 8644917 [details] [diff] [review]
> Patch
> 
> Review of attachment 8644917 [details] [diff] [review]:
 
> I think solving this problem would be way easier for you if you iterate
> twice on the usesList, a first time to analyze if a SinCos is interesting,
> and a second time to make the transformation on all uses at once.
> What do you think?

It's an awesome idea and very elegant solution! Thank you =D.

I'll do the modifications and bench the code again.
Attached patch Patch (obsolete) — Splinter Review
Hi Nicolas,
Thank you for the great idea about Analyse & Transform the sincos and all feedbacks!

I'm worried about the bench results [attachment 8645268 [details]], 
this patch is not faster in a real world...

I'll bench this new patch to see if the performance will be better than previous patch.
Attachment #8574976 - Attachment is obsolete: true
Attachment #8635967 - Attachment is obsolete: true
Attachment #8642662 - Attachment is obsolete: true
Attachment #8644917 - Attachment is obsolete: true
Attachment #8648488 - Flags: feedback?(nicolas.b.pierron)
Hi guys, 
I made the bench with the new patch, the results keep the same (look the attachment 8645268 [details])
Thanks.
Flags: needinfo?(sunfish)
Comment on attachment 8648488 [details] [diff] [review]
Patch

Review of attachment 8648488 [details] [diff] [review]:
-----------------------------------------------------------------

(Sorry for the delay)

From what I see from the micro benchmarks, this saves 1/3 of the time.  Which sounds like a good sign if we consider this micro benchmark to be representative. Another question is if there is any benchmark which relies on this pattern, as we expected, or if we have near misses, and what can we do to optimize even more.

Can you instrument the new MIR phase to spew (JitSpew functions) messages when the feature is used, and when it hit/miss an optimization opportunity?

::: js/src/jit/CodeGenerator.cpp
@@ +6368,5 @@
> +        masm.movePtr(ImmPtr(mathCache), temp);
> +        masm.passABIArg(temp);
> +    }
> +
> +#define MAYBE_CACHED(fcn) (mathCache ? (void*)fcn ## _impl : (void*)fcn ## _uncached)

nit: Kannan once suggested that temporary macro should have a _ suffix.

::: js/src/jit/Ion.cpp
@@ +1309,5 @@
> +                continue;
> +
> +            // We'll analyse the block to find sin(x) and cos(x) with the same
> +            // parameter and adding the MSinCos before calling and replacing
> +            // then parameters of the sin and cos to the MSinCos.

nit: s/block/uses/; s/parameter/argument/

@@ +1343,5 @@
> +            MSinCos *insSinCos = MSinCos::New(graph.alloc(),
> +                                              insFunc->input(),
> +                                              insFunc->toMathFunction()->cache());
> +            insSinCos->setImplicitlyUsedUnchecked();
> +            block->insertBefore(insFunc, insSinCos);

Add the following test case:

function f(x) {
  if (x < 0.5)
    x = sin(x);
  else
    x = cos(x);
  return x;
}

And test this with a debug build.  This should raise an issue which is caused by the fact that sincos is not dominating the cos instruction, which is caused by this line.

::: js/src/jit/Lowering.cpp
@@ +1398,5 @@
> +                  ins->type() == ins->input()->type());
> +
> +    if (ins->input()->type() == MIRType_SinCosDouble) {
> +        MOZ_ASSERT(ins->type() == MIRType_Double);
> +        redefine(ins, ins->input());

Reading this code out-of-context is weird, I think this deserves a comment saying that this will either alias the Sin or Cos input, based on the operation of the MathFunction.

Also, assert that the function is either  Sin or Cos.
And, I think having a new redefine function with third argument which correspond to the MMathFunction operation as argument will make much more sense.
Attachment #8648488 - Flags: feedback?(nicolas.b.pierron) → feedback+
Attached patch Path Sincos (obsolete) — Splinter Review
Hi Nicolas, thank you for your feedback.

In the previous patch, I was replacing the sin/cos in the same block only, now, we are replacing the sin/cos in all blocks where it dominates other block.

Following some results:

Box2d Octane (average  of 50)
49687.26  --ion-sincos=off
50154.52  --ion-sincos=on

Microbench (average of 10)
0m0.589s  --ion-sincos=off
0m0.379s  --ion-sincos=on


bench2d (average of 10)
ms/frame 9.4204101563  --ion-sincos=off
ms/frame 9.1006835938  --ion-sincos=on
Attachment #8654611 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8654611 [details] [diff] [review]
Path Sincos

Review of attachment 8654611 [details] [diff] [review]:
-----------------------------------------------------------------

Apply the nits, fix the comments, and re-attach a new patch on this bug.

::: js/src/jit/CodeGenerator.cpp
@@ +6408,5 @@
> +                                MoveOp::GENERAL);
> +    masm.passABIArg(MoveOperand(params, 0, MoveOperand::EFFECTIVE_ADDRESS),
> +                                MoveOp::GENERAL);
> +
> +    masm.callWithABI(JS_FUNC_TO_DATA_PTR(void *,  MAYBE_CACHED_(js::math_sincos)));

nit: void␣* -> void*
nit: ,␣␣MAYBE_CACHED_  ->  ,␣MAYBE_CACHED_

@@ +6414,5 @@
> +    masm.loadDouble(Address(masm.getStackPointer(), 0), outputCos);
> +    masm.loadDouble(Address(masm.getStackPointer(), sizeof(double)), outputSin);
> +    masm.freeStack(sizeof(double) * 2);
> +
> +#undef MAYBE_CACHED_

nit: Move this line right after the callWithABI.

::: js/src/jit/Ion.cpp
@@ +1390,5 @@
> +    // Graph will look like:
> +    // - 1 op
> +    // - 6 sincos op1
> +    // - 7 mathfunction sincos6 Sin
> +    // - 8 mathfunction sincos6 Cos

nit: In this comment, the Id of the MMathFunction should remain identical, as we only substitute their inputs.

@@ +1407,5 @@
> +                continue;
> +
> +            // We'll analyse the use to find sin(x) and cos(x) with the same
> +            // argument and adding the MSinCos before calling and replacing
> +            // then argument of the sin and cos to the MSinCos.

nit: This comment is not clear, maybe something like:

  // insFunc is either a |sin(x)| or |cos(x)| instruction. The following loop iterates over the uses of |x| to check if both |sin(x)| and |cos(x)| instructions exist.

@@ +1416,5 @@
> +                if (!uses.def()->isInstruction())
> +                    continue;
> +
> +                // We should replace the argument of the sin/cos just when it
> +                // is dominating by the |block|.

nit: is dominating by -> is dominated by

@@ +1439,5 @@
> +                    break;
> +            }
> +
> +            if (!hasCos || !hasSin) {
> +                JitSpew(JitSpew_Sincos, "Not found a pair sin/cos.");

nit: No sin/cos pair found.

@@ +1454,5 @@
> +            insSinCos->setImplicitlyUsedUnchecked();
> +            block->insertBefore(insFunc, insSinCos);
> +            for (MUseDefIterator uses(insFunc->input()); uses; uses++)
> +            {
> +                if (!uses.def()->isInstruction())

Before this line add:

  MDefinition* def = uses.def();
  uses++;

and replace all references of  uses.def()  by  def  in the rest of this loop.

@@ +1458,5 @@
> +                if (!uses.def()->isInstruction())
> +                    continue;
> +
> +                // We should replace the argument of the sin/cos just when it
> +                // is dominating by the |block|.

nit: is dominating by -> is dominated by

@@ +1473,5 @@
> +
> +                mathIns->replaceOperand(0, insSinCos);
> +
> +                // We need to refresh the operand uses because we modified it.
> +                uses.refresh(insSinCos->input());

MUses is a linked list, thus replacing an operand will after having passed the element is not a problem.  Thus this refresh becomes useless.

::: js/src/jit/JitSpewer.cpp
@@ +403,5 @@
>              "  escape     Escape analysis\n"
>              "  alias      Alias analysis\n"
>              "  gvn        Global Value Numbering\n"
>              "  licm       Loop invariant code motion\n"
> +            "  sincos     Optimization sin/cos by sincos\n"

nit: Optimization -> Replace

::: js/src/jit/MIR.h
@@ +6645,5 @@
> +    bool congruentTo(const MDefinition *ins) const override {
> +        return congruentIfOperandsEqual(ins);
> +    }
> +    bool canRecoverOnBailout() const override {
> +        return false;

nit: This is the default value, no need to override this function.

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +218,5 @@
>          (from == MIRType_Int32 || from == MIRType_Boolean)) {
>          return true;
>      }
> +    if (to == MIRType_Double && from == MIRType_SinCosDouble)
> +        return true;

nit: Remove this lines, …

@@ +230,5 @@
> +// We can redefine the sin(x) and cos(x) function to return the sincos result.
> +void
> +LIRGeneratorShared::redefine(MDefinition* def, MDefinition* as, MMathFunction::Function func)
> +{
> +    MOZ_ASSERT(IsCompatibleLIRCoercion(def->type(), as->type()));

nit: Replace this assertion by the corresponding MIRType of the input/output.
Attachment #8654611 - Flags: review?(nicolas.b.pierron) → review+
Attached patch Path sincos (obsolete) — Splinter Review
Attachment #8648488 - Attachment is obsolete: true
Attachment #8654611 - Attachment is obsolete: true
Attachment #8656245 - Flags: review+
Hi, 
I have enabled the sincos in all platforms to do the tests.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=206524cf2d72
Hi Nicolas, 
Thank you very much for your review and feedbacks!
I am very glad.

I have pushed the patch over the try server (see comment 71), but the patch is not compiling on Mac OS.
The error is:
> use of undeclared identifier '__sincos'
For some reason the Mac OS doesn't seem to have the __sincos function... 

Do you have any idea what is happening?
Flags: needinfo?(nicolas.b.pierron)
doing a grep for "MACOS" over the source highlight than we have some macro named XP_MACOSX  and DARWIN, you might try to replace the __APPLE__ with these.
Flags: needinfo?(nicolas.b.pierron)
Attached patch Part 0 - ABI Signatures. (obsolete) — Splinter Review
Hi, thanks again for your big help.

In the last attempt [1] over try server, the sincos failed on the arm test, 
because the signatures of
> math_sincos_impl(int, double, int, int) 
and 
> math_sincos_uncached(double , double *, double *) 

don't exist in the simulators (arm, arm64 and mips32).


[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=2490b3802647
Attachment #8657299 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8657299 [details] [diff] [review]
Part 0 - ABI Signatures.

Review of attachment 8657299 [details] [diff] [review]:
-----------------------------------------------------------------

Welcome to the low-level and unexplainable parts of ABI calls :)

Unfortunately this from far one of the hardest part to get right (except for ARM64).
Lucky you, you won't have to dig unreadable documentation of each ABI and you can have a look at the ABIArgGenerator::next functions. ;)

::: js/src/jit/arm/Simulator-arm.cpp
@@ +2302,5 @@
> +            ival1 = get_register(1);
> +            if (UseHardFpABI())
> +                dval = get_double_from_d_register(0);
> +            else
> +                dval = get_double_from_register_pair(2);

soft-fp: [1]
  int (*)(double /* r0, r1 */, int /* r2 */, int /* r3 */);

hard-fp: [2]
  int (*)(double /* d0 */, int /* r0 */, int /* r1 */);


[1] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/arm/Assembler-arm.cpp#46
[1] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/arm/Assembler-arm.cpp#91

@@ +2318,5 @@
> +            ival2 = get_register(2);
> +            if (UseHardFpABI())
> +                dval = get_double_from_d_register(0);
> +            else
> +                dval = get_double_from_register_pair(2);

soft-fp:
  int (*)(int /* r0 */, double /* r2, r3 */, int /* stack[0] */, int /* stack[1] */);

hard-fp:
  int (*)(int /* r0 */, double /* d0 */, int /* r1 */, int /* r2 */);

::: js/src/jit/mips32/Simulator-mips32.cpp
@@ +1948,5 @@
>              setRegister(v0, res);
>              break;
>            }
> +          case Args_Int_DoubleIntInt: {
> +            double dval = getDoubleFromRegisterPair(a0);

The register allocation of the O32 ABI should be the following:

int (*)(double /* f12 */, int /* a2 */, int /* a3 */);

See https://dxr.mozilla.org/mozilla-central/source/js/src/jit/mips32/Assembler-mips32.cpp#32
Attachment #8657299 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch Part 0 - ABI Signatures. (obsolete) — Splinter Review
I got it,
Thanks for the nice explication :)
Attachment #8657299 - Attachment is obsolete: true
Attachment #8657879 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8657879 [details] [diff] [review]
Part 0 - ABI Signatures.

Review of attachment 8657879 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Simulator-arm.cpp
@@ +2309,5 @@
> +            }
> +            Prototype_Int_DoubleIntInt target = reinterpret_cast<Prototype_Int_DoubleIntInt>(external);
> +            int32_t result = target(dval, ival0, ival1);
> +            scratchVolatileRegisters(/* scratchFloat = true */);
> +            setCallResultDouble(result);

use set_register(r0, result)

@@ +2329,5 @@
> +            }
> +            Prototype_Int_IntDoubleIntInt target = reinterpret_cast<Prototype_Int_IntDoubleIntInt>(external);
> +            int32_t result = target(ival0, dval, ival1, ival2);
> +            scratchVolatileRegisters(/* scratchFloat = true */);
> +            setCallResultDouble(result);

use set_register(r0, result)

::: js/src/jit/arm64/vixl/MozSimulator-vixl.cpp
@@ +596,5 @@
>      }
>  
> +    case js::jit::Args_Int_IntDoubleIntInt: {
> +      int64_t ret = reinterpret_cast<Prototype_Int_IntDoubleIntInt>(nativeFn)(x0, d0, x1, x2);
> +      setFP64Result(ret);

use setGPR64Result

@@ +602,5 @@
> +    }
> +
> +    case js::jit::Args_Int_DoubleIntInt: {
> +      int64_t ret = reinterpret_cast<Prototype_Int_DoubleIntInt>(nativeFn)(d0, x0, x1);
> +      setFP64Result(ret);

use setGPR64Result

::: js/src/jit/mips32/Simulator-mips32.cpp
@@ +1948,5 @@
>              setRegister(v0, res);
>              break;
>            }
> +          case Args_Int_DoubleIntInt: {
> +            double dval = getFpuRegisterFloat(12);

use getFpuRegisterDouble

@@ +1953,5 @@
> +            int32_t ival0, ival1;
> +            ival0 = getRegister(a2);
> +            ival1 = getRegister(a3);
> +            Prototype_Int_DoubleIntInt target = reinterpret_cast<Prototype_Int_DoubleIntInt>(external);
> +            int32_t res = target(dval, ival0, ival1);

nit: s/ival0/arg2/; s/ival1/arg3/;  These are already pre-loaded ahead of all function calls.

@@ +1965,5 @@
> +            dval = getDoubleFromRegisterPair(a2);
> +            ival1 = stack_pointer[4];
> +            ival2 = stack_pointer[5];
> +            Prototype_Int_IntDoubleIntInt target = reinterpret_cast<Prototype_Int_IntDoubleIntInt>(external);
> +            int32_t res = target(dval, ival0, ival1, ival2);

nit: s/ival0/arg0/; s/ival1/arg4/; s/ival2/arg5/;  (same as above)
Attachment #8657879 - Flags: review?(nicolas.b.pierron) → feedback+
Attached patch Part 0 - ABI Signatures. (obsolete) — Splinter Review
Sorry for that.
Attachment #8657879 - Attachment is obsolete: true
Attachment #8657887 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8657887 [details] [diff] [review]
Part 0 - ABI Signatures.

Review of attachment 8657887 [details] [diff] [review]:
-----------------------------------------------------------------

This sounds good :)
Attachment #8657887 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #73)
> doing a grep for "MACOS" over the source highlight than we have some macro
> named XP_MACOSX  and DARWIN, you might try to replace the __APPLE__ with
> these.

Hi Nicolas, 
I tried to replace the __APPLE__ with the others macros, but the patch isn't compiling on the try server, it is showing up the "undeclared identifier '__sincos'" in every attempts [1][2][3][4][5].

Is the patch compiled on MacOS or the try server uses a cross compiler on linux?

Thanks in advance.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=731e1ebb7d95
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=2490b3802647
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=59326f6e6a3a
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=063e6f2ebf6f
[5] https://treeherder.mozilla.org/#/jobs?repo=try&revision=172735747199
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #80)
> I tried to replace the __APPLE__ with the others macros, but the patch isn't
> compiling on the try server, it is showing up the "undeclared identifier
> '__sincos'" in every attempts [1][2][3][4][5].

I have no clue, could this be related to some version of the system library?
Flags: needinfo?(nicolas.b.pierron)
Added the signatures in switch in MacroAssembler-inl.h.
Attachment #8657887 - Attachment is obsolete: true
Attachment #8660428 - Flags: review+
Attached patch Part 1 - Sincos.Splinter Review
Our patch is now building correctly on try server!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1839c88aa7b2

Now, we are checking in 'configure.in' file if the '__sincos' is defined in math library.
Attachment #8656245 - Attachment is obsolete: true
Attachment #8660430 - Flags: review?(nicolas.b.pierron)
Attachment #8660430 - Flags: review?(nicolas.b.pierron) → review+
Thank you!
Try is green (comment 83)
I'm setting the "keywords" field.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b125230a7d32
https://hg.mozilla.org/mozilla-central/rev/3dec2b935295
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Thank you guys!
(In reply to Victor Carlquist from comment #87)
> Thank you guys!

https://twitter.com/sayrer/status/19304989209
(In reply to Victor Carlquist from comment #84)
> Try is green (comment 83)

Hi Victor,

Thanks for making this patch!

Did you try compiling with --enable-simulator=arm64? (Try doesn't support that yet, I'm working on it).

I'm seeing build errors like this:

In file included from /Users/jolesen/gecko-dev/obj-a64simdev/js/src/Unified_cpp_js_src3.cpp:83:
In file included from /Users/jolesen/gecko-dev/js/src/jit/BaselineIC.cpp:38:
/Users/jolesen/gecko-dev/js/src/jit/shared/Lowering-shared-inl.h:188:76: error: no matching conversion for functional-style cast from 'const ARMFPRegister'
      (aka 'const vixl::FPRegister') to 'js::jit::LFloatReg'
    lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(d1)));
                                                                           ^~~~~~~~~~~~
/Users/jolesen/gecko-dev/js/src/jit/LIR.h:306:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from
      'const ARMFPRegister' (aka 'const vixl::FPRegister') to 'const js::jit::LFloatReg' for 1st argument
class LFloatReg : public LAllocation
      ^
/Users/jolesen/gecko-dev/js/src/jit/LIR.h:306:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from
      'const ARMFPRegister' (aka 'const vixl::FPRegister') to 'js::jit::LFloatReg' for 1st argument
class LFloatReg : public LAllocation
      ^
/Users/jolesen/gecko-dev/js/src/jit/LIR.h:309:14: note: candidate constructor not viable: no known conversion from 'const ARMFPRegister'
      (aka 'const vixl::FPRegister') to 'js::jit::FloatRegister' for 1st argument
    explicit LFloatReg(FloatRegister reg)
             ^
1 error generated.
Flags: needinfo?(victorcarlquist)
(In reply to Jakob Stoklund Olesen from comment #89)
> Did you try compiling with --enable-simulator=arm64?
No, I didn't... sorry

I'm working on it right now!
Thanks Jakob.
Flags: needinfo?(victorcarlquist)
Attached patch Part 2 - Fixed build on Arm64 (obsolete) — Splinter Review
Now we can build on arm64 simulator :)

Thanks again.
Attachment #8662147 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8662147 [details] [diff] [review]
Part 2 - Fixed build on Arm64

Review of attachment 8662147 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/LIR.h
@@ +309,5 @@
>      explicit LFloatReg(FloatRegister reg)
>        : LAllocation(FPU, reg.code())
>      { }
> +    explicit LFloatReg(ARMFPRegister reg)
> +      : LAllocation(FPU, reg.code())

Have you tried to compile again with enabling the simulator?
I suggest you change the lowering to do

  LFloatReg(FloatRegister(FloatRegisters::d1, FloatRegisters::Double))

by the way, I think this is an ARM64 issue if vixl types are leaking in the js::jit namespace.
Attachment #8662147 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #92)
> Comment on attachment 8662147 [details] [diff] [review]
> Part 2 - Fixed build on Arm64
> 
> Review of attachment 8662147 [details] [diff] [review]:

Thanks for your review.

We have two definition in Arm64 with same register name [0] [1], 
so we need to use LFloatReg(FloatRegisters::d1) to choose the right definition.

> Have you tried to compile again with enabling the simulator?
Yes, That patch was compiling, but it was wrong.


[0] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/arm64/Assembler-arm64.h#122
[1] https://dxr.mozilla.org/mozilla-central/source/js/src/jit/arm64/Architecture-arm64.h#188
Attached patch Part 2 - Fixed build on Arm64 (obsolete) — Splinter Review
Please, see the comment 93.
Thanks!
Attachment #8662147 - Attachment is obsolete: true
Attachment #8662343 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8662343 [details] [diff] [review]
Part 2 - Fixed build on Arm64

Review of attachment 8662343 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +184,5 @@
>  
>      uint32_t vreg = getVirtualRegister();
>      lir->setDef(0, LDefinition(vreg, LDefinition::DOUBLE, LFloatReg(ReturnDoubleReg)));
>  #if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
> +    lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(FloatRegisters::d1)));

This patch is unexpectedly working because of the lack of "explicit" keyword on the FloatRegister constructors!
We should address that in a follow-up.
Attachment #8662343 - Flags: review?(nicolas.b.pierron) → review+
> This patch is unexpectedly working because of the lack of "explicit" keyword
> on the FloatRegister constructors!
> We should address that in a follow-up.

Should we create a new patch to do this or modify the last patch to address this issue?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Victor Carlquist from comment #96)
> > This patch is unexpectedly working because of the lack of "explicit" keyword
> > on the FloatRegister constructors!
> > We should address that in a follow-up.
> 
> Should we create a new patch to do this or modify the last patch to address
> this issue?

We should create a new bug to do this.
Flags: needinfo?(nicolas.b.pierron)
Comment on attachment 8662343 [details] [diff] [review]
Part 2 - Fixed build on Arm64

Setting checkin flag.
Attachment #8662343 - Flags: checkin?
this doesn't apply to inbound:

adding 984018 to series file
renamed 984018 -> bug984018-part3.patch
applying bug984018-part3.patch
patching file js/src/jit/shared/Lowering-shared-inl.h
Hunk #1 FAILED at 179
1 out of 1 hunks FAILED -- saving rejects to file js/src/jit/shared/Lowering-shared-inl.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and refresh bug984018-part3.patch

can you take a look please ? Thanks!
Flags: needinfo?(victorcarlquist)
Attached patch Part 2 - Fixed build on Arm64. (obsolete) — Splinter Review
Patch rebased.
Attachment #8662343 - Attachment is obsolete: true
Attachment #8662343 - Flags: checkin?
Flags: needinfo?(victorcarlquist)
Attachment #8663087 - Flags: review+
Attachment #8663087 - Flags: checkin?
sorry had to back this out since this caused a bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=14415261&repo=mozilla-inbound
Flags: needinfo?(victorcarlquist)
Attached patch Part 2 - Fixed build on Arm64. (obsolete) — Splinter Review
Patch rebased and fixed.
Attachment #8663087 - Attachment is obsolete: true
Attachment #8663087 - Flags: checkin?
Flags: needinfo?(victorcarlquist)
Attachment #8663951 - Flags: review?(nicolas.b.pierron)
Attachment #8663951 - Flags: review?(nicolas.b.pierron) → review+
Depends on: 1207236
Attached patch Part 2 - Fixed build on Arm64 (obsolete) — Splinter Review
Thanks for your reviews!

Try is red again[1]
Fixed Arm32...

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=fec4de707f16
Attachment #8663951 - Attachment is obsolete: true
Attachment #8664344 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8664344 [details] [diff] [review]
Part 2 - Fixed build on Arm64

This patch is wrong.
Attachment #8664344 - Flags: review?(nicolas.b.pierron)
Attached patch Part 2 - Fixed build on Arm64 (obsolete) — Splinter Review
Fixed Arm32.

Thanks again.
Attachment #8664344 - Attachment is obsolete: true
Attachment #8664481 - Flags: review?(nicolas.b.pierron)
Attached patch Part 2 - Fixed build on Arm64 (obsolete) — Splinter Review
Fixed parameters.
Attachment #8664481 - Attachment is obsolete: true
Attachment #8664481 - Flags: review?(nicolas.b.pierron)
Attachment #8664586 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8664586 [details] [diff] [review]
Part 2 - Fixed build on Arm64

Review of attachment 8664586 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +188,1 @@
>      lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE, LFloatReg(d1)));

Modify the patch by using the following on ARM,

  LFloatReg(FloateRegister(FloatRegisters::d1, FloatRegister::Double))

Apparently the Architecture-arm.h does not follow the same organization, and this should be fixed in follow-up bugs.
Then r=me.
Attachment #8664586 - Flags: review?(nicolas.b.pierron) → review+
Attached patch Part 2 - Fixed build on Arm64 (obsolete) — Splinter Review
Attachment #8664586 - Attachment is obsolete: true
Attachment #8665760 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8665760 [details] [diff] [review]
Part 2 - Fixed build on Arm64

Review of attachment 8665760 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/Lowering-shared-inl.h
@@ +185,5 @@
>      uint32_t vreg = getVirtualRegister();
>      lir->setDef(0, LDefinition(vreg, LDefinition::DOUBLE, LFloatReg(ReturnDoubleReg)));
>  #if defined(JS_CODEGEN_ARM) || defined(JS_CODEGEN_ARM64)
> +    lir->setDef(1, LDefinition(vreg + VREG_INCREMENT, LDefinition::DOUBLE,
> +                LFloatReg(FloatRegister(FloatRegisters::d1, FloatRegisters::Double))));

Please test with both simulator. This would not compile on ARM, as you already saw with one of the previous patches.
As I mentioned in the previous review, Architecture-arm.h / Architecture-arm64.h are not consistent in the location of the enumeration for the kind of the float register content.

In one case you need to use 

    FloatRegisters::Double

and in the other you need to use

    FloatRegister::Double
                 ^
               no "s"
Attachment #8665760 - Flags: review?(nicolas.b.pierron)
I'm Sorry for that patch....

Try is green now :)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f6ce9d834dc

Thanks
Attachment #8665760 - Attachment is obsolete: true
Attachment #8666111 - Flags: review?(nicolas.b.pierron)
Attachment #8666111 - Flags: review?(nicolas.b.pierron) → review+
thank you for your reviews.
Keywords: checkin-needed
Hi, did we ultimately get any speedups on the expected benchmarks like box2d?  I looked on awfy but I didn't see anything, but maybe I wasn't looking at the right place.
(In reply to Luke Wagner [:luke] from comment #118)
> Hi, did we ultimately get any speedups on the expected benchmarks like
> box2d?  I looked on awfy but I didn't see anything, but maybe I wasn't
> looking at the right place.

Hi,
currently this optimization is enabled only on MacOS, because it has a fast __sincos function.
 
It's probably occurring because the server doesn't has the __sincos function on MacOS (see comment 80).
I think that the builds on AWFY is calling the sin/cos instead of __sincos.

*Maybe*, we could implement a moz_sincos function to enable this optimization on all platforms,
what do you think?
Flags: needinfo?(luke)
Ah, ok, I wasn't aware that this was MacOS-only.  I would be in favor of enabling this optimization on all platforms, but I don't know what all this entails.  I know we had a separate bug about importing our own sin/cos impls instead of relying on the platform ones.  sunfish might be able to comment more on that.
Flags: needinfo?(luke) → needinfo?(sunfish)
When this landed, there was a improvement of 5% on Kraken-dft on Mac 32-bit and 4% on Kraken-dft on Mac 64-bit (the 32-bit one is marked on AWFY - https://arewefastyet.com/regressions/#/regression/1796698).
The sincos work I did in bug 967709 was done under the assumption that we could tolerate a slightly lower precision than typical libm sin and cos implementations. My subsequent understanding is that TC-39 now generally wishes JS implementations to aim for very high precision in the math functions, even though the spec doesn't cover this.

Most open-source libm-level sin/cos/sincos implementations use some variant of fdlibm, for example http://git.musl-libc.org/cgit/musl/tree/src/math/sincos.c, which unfortunately isn't very fast; it does save some work compared to calling the fdlibm code for sin and cos separately, but it falls back to the slow path argument reduction even for fairly low values. A new fast implementation could be interesting, both for SpiderMonkey here and potentially for WebAssembly as well. If anyone wants to work on this, I could offer some support.
Flags: needinfo?(sunfish)
Comment on attachment 8666111 [details] [diff] [review]
Part 2 - Fixed build on Arm64.

Review of attachment 8666111 [details] [diff] [review]:
-----------------------------------------------------------------

[Approval Request Comment]
[Bug caused by:] Bug 984018, involve: Bug 1213351
[User Impact] if declined: Impossible to build on ARM64.
[Describe test coverage new/current, TreeHerder]: This patch has already landed on m-c, for the last two weeks without problems on try server.
[Risk to taking this patch]: Low, it will fix the build on ARM64 at the sincos optimization without affect other codes.
[String/UUID change made/needed]: None.
Attachment #8666111 - Flags: approval-mozilla-aurora?
Comment on attachment 8666111 [details] [diff] [review]
Part 2 - Fixed build on Arm64.

Fix for ARM builds, approved for uplift to aurora.
Attachment #8666111 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 1213351
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: