Closed Bug 930477 Opened 11 years ago Closed 10 years ago

IonMonkey: Specialize floor and round for float32

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: jandem, Assigned: bbouvier)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 4 obsolete files)

Attached file Testcase
For the attached testcase I get the following output:

floor d 24
floor f 249
round d 38
round f 269
pow d 118
pow f 336

Especially Math.floor we can't regress. Benjamin, do you think we can get this fixed for Firefox 27? We have a few days until the next merge.
Flags: needinfo?(benj)
Challenge accepted. Let me give it a try
Flags: needinfo?(benj)
Attached patch Floor + tests (obsolete) — Splinter Review
This implements Math.floor specialization for Float32.

Try: (I am not sure that floorf is available on all platforms / compilers)
https://tbpl.mozilla.org/?tree=Try&rev=7ecc7d5392b7

Asking r? for jonco for the ARM parts, jandem for the rest.
Attachment #821722 - Flags: review?(jdemooij)
Attachment #821722 - Flags: review?(jcoppeard)
Comment on attachment 821722 [details] [diff] [review]
Floor + tests

Review of attachment 821722 [details] [diff] [review]:
-----------------------------------------------------------------

ARM parts look good to me.
Attachment #821722 - Flags: review?(jcoppeard) → review+
Comment on attachment 821722 [details] [diff] [review]
Floor + tests

Review of attachment 821722 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks! r=me with comments below addressed.

::: js/src/assembler/assembler/X86Assembler.h
@@ +1273,5 @@
>              m_formatter.immediate32(imm);
>          }
>      }
>  
> +    void cmpd_ir(int imm, RegisterID dst)

You can remove this method (see other comments).

@@ +1450,5 @@
>               nameIReg(8,src), nameIReg(8,dst));
>          m_formatter.oneByteOp64(OP_TEST_EvGv, src, dst);
>      }
>  
> +    void testd_rr(RegisterID src, RegisterID dst)

And this one.

::: js/src/jit/LIR-Common.h
@@ +4364,5 @@
>      LFloor(const LAllocation &num) {
>          setOperand(0, num);
>      }
>  
>      MRound *mir() const {

Pre-existing but this looks wrong (MRound instead of MFloor), can you just remove this method?

@@ +4379,5 @@
> +    LFloorF(const LAllocation &num) {
> +        setOperand(0, num);
> +    }
> +
> +    MRound *mir() const {

Same here.

::: js/src/jit/MIR.cpp
@@ +676,5 @@
> +            ConvertDefinitionToDouble<0>(input(), this);
> +        return;
> +    }
> +
> +    if (type() == MIRType_Double)

I think you can remove this and JS_ASSERT(type() == MIRType_Int32); ?

::: js/src/jit/x64/Assembler-x64.h
@@ +601,5 @@
>      }
>      void cmpq(const Register &lhs, Imm32 rhs) {
>          masm.cmpq_ir(rhs.value, lhs.code());
>      }
> +    void cmpd(const Register &lhs, Imm32 rhs) {

You can remove this method, see other comment.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +368,5 @@
> +Assembler::Condition
> +MacroAssemblerX64::testNegativeZeroFloat32(const FloatRegister &reg, const Register &scratch)
> +{
> +    movd(reg, scratch);
> +    cmpd(scratch, Imm32(1));

s/cmpd/cmpl

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +402,5 @@
>      return Zero;
>  }
> +
> +Assembler::Condition
> +MacroAssemblerX86::testNegativeZeroFloat32(const FloatRegister &reg, const Register &scratch)

I think we can do better for float32 and use what we have for x64:

movd(reg, scratch);
cmpl(scratch, Imm32(1));
return Overflow;
Attachment #821722 - Flags: review?(jdemooij) → review+
Attached patch Floor + tests + updated x86 code (obsolete) — Splinter Review
Carrying forward r+. Ready to land, as soon as jandem approves the diff.
Assignee: nobody → benj
Attachment #821722 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #821886 - Flags: review+
Attached patch diff.patch (obsolete) — Splinter Review
Not asking for review but just for feedback for this part. This diff contains all the changes I had to add for the x86 backend to work too. Basically, it just consists in using fstp for 32 bits instead of 64 bits when retrieving the result of the called function.
Attachment #821891 - Flags: feedback?(jdemooij)
Forgot to qref for compilation on x64.

Carrying forward r+
Attachment #821886 - Attachment is obsolete: true
Attachment #821892 - Flags: review+
Comment on attachment 821891 [details] [diff] [review]
diff.patch

Review of attachment 821891 [details] [diff] [review]:
-----------------------------------------------------------------

Nice find.
Attachment #821891 - Flags: feedback?(jdemooij) → feedback+
Attached patch Round + testsSplinter Review
This patch specializes Math.round for Float32 inputs. It makes round on par with doubles for the given test case.

Fixed a bug in ARM by the way: result of a MMathFunction call wasn't moved in the return result if the Result kind was FLOAT. Curious that this didn't show up last time I ran the tests.

Funny that libc's round() function rounds -0.5 to -1 while Math.round(-0.5) rounds it to -0. I launched a try run to be sure that all platforms have roundf:
https://tbpl.mozilla.org/?tree=Try&rev=c8933f55d03d

Asking for review to Marty for the ARM parts and Jan for the rest.
Attachment #821891 - Attachment is obsolete: true
Attachment #828344 - Flags: review?(mrosenberg)
Attachment #828344 - Flags: review?(jdemooij)
Comment on attachment 828344 [details] [diff] [review]
Round + tests

Review of attachment 828344 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks,

::: js/src/jit/MIR.cpp
@@ +685,5 @@
>  
> +void
> +MRound::trySpecializeFloat32()
> +{
> +    // No need to look at the output, as it's an integer (unique way to have this instruction in IonBuilder::inlineMathRound)

Nit: comment doesn't fit in 79 characters. Also add a JS_ASSERT(type() == MIRType_Int32).
Attachment #828344 - Flags: review?(jdemooij) → review+
Comment on attachment 828344 [details] [diff] [review]
Round + tests

Review of attachment 828344 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +3613,5 @@
>  {
>      if (secondScratchReg_ != lr)
>          ma_mov(secondScratchReg_, lr);
>  
> +    if (result == DOUBLE || result == FLOAT) {

This doesn't look correct.  When the result is a float, it will be returned in s0, not d0, so the two possible transfers should both be changed to the floating point equivalents.
Attachment #828344 - Flags: review?(mrosenberg)
Attachment #828344 - Flags: review?(jdemooij)
Attachment #828344 - Flags: review+
Comment on attachment 828344 [details] [diff] [review]
Round + tests

Review of attachment 828344 [details] [diff] [review]:
-----------------------------------------------------------------

Restore r+ flag.
Attachment #828344 - Flags: review?(jdemooij) → review+
Thanks for the reviews! As the try run shows, the build didn't work on Win XP, which doesn't implement roundf, so I have slightly more work to do here...
Attached patch roundf impl for all platforms (obsolete) — Splinter Review
I wish I had something more interesting to have you review. This is almost only copy and paste, with s/double/float and the right values at the right places.

The jit-tests in the previous patch pass, thus the function is correct.
Attachment #829552 - Flags: review?(jwalden+bmo)
Depends on: 936710
No longer depends on: 936710
Comment on attachment 829552 [details] [diff] [review]
roundf impl for all platforms

Review of attachment 829552 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't look at the FloatingPoint.h stuff, as I'll just need to reread it all super-carefully to verify all the algorithms in a unified world anyway.

::: js/src/jsmath.cpp
@@ +47,2 @@
>  using mozilla::ExponentComponent;
> +using mozilla::FloatExponentComponent;

Keep these alphabetical, not enough value in pairing stuff up, I think.

@@ +789,5 @@
>      return js_copysign(floor(x + 0.5), x);
>  }
>  
> +float
> +js::math_roundf_impl(float x)

I want tests for this method.  Including for specific interesting values like -1, -0.5, -0.25, -FLT_MIN, -0, +0, +FLT_MIN, +0.25, +0.5, (1 << 22), (1 << 23) - 1, (1 << 23) - 0.5, (1 << 23), (1 << 23) + 0.5 (mostly for sanity), (1 << 23) + 1.0, (1 << 23) + 1.5, FLT_MAX, -FLT_MAX, the number just smaller than FLT_MAX, Infinity, -Infinity, and for a sampling of NaN bit patterns.

This is totally ripe for all sorts of errors, especially fenceposting ones, without a bunch of tests.  :-)

@@ +793,5 @@
> +js::math_roundf_impl(float x)
> +{
> +    int32_t i;
> +    if (FloatIsInt32(x, &i))
> +        return float(i);

Seems like

    int32_t ignored;
    if (FloatIsInt32(x, &ignored))
        return x;

would be better wrt returning its input unchanged.

@@ +797,5 @@
> +        return float(i);
> +
> +    /* Some numbers are so big that adding 0.5 would give the wrong number. */
> +    if (FloatExponentComponent(x) >= mozilla::FloatExponentShift)
> +        return x;

ExponentComponent is signed.  ExponentShift is unsigned.  By the usual arithmetic conversions, the signed value will be promoted to an unsigned value, using the normal modular arithmetic.  So ExponentComponent, on sufficiently small numbers (I think anything 0.5 or lower in magnitude) won't be rounded by this method!  Unless I'm horribly misreading, but I'm pretty sure I'm not.

Did I mention tests yet?

@@ +799,5 @@
> +    /* Some numbers are so big that adding 0.5 would give the wrong number. */
> +    if (FloatExponentComponent(x) >= mozilla::FloatExponentShift)
> +        return x;
> +
> +    return js_copysign(floorf(x + 0.5f), x);

We should move js_copysign into FloatingPoint.h at some point.  Separate patch, separate bug if you prefer, should be quick to do.

::: mfbt/FloatingPoint.h
@@ +289,5 @@
>    MOZ_ASSERT(IsFloatNaN(f));
>    return f;
>  }
>  
>  /**

I think we're at the point with FloatingPoint.h that we should condense the float/double implementations into one, to avoid typos and the like.  Something like:

namespace mozilla {

namespace detail {

struct FloatTypeTraits
{
    typedef uint32_t Bits;

    const unsigned FloatExponentBias = 127;
    const unsigned FloatExponentShift = 23;

    static const Bits SignBit         = 0x80000000UL;
    static const Bits ExponentBits    = 0x7F800000UL;
    static const Bits SignificandBits = 0x007FFFFFUL;
};

struct DoubleTypeTraits
{
    typedef uint64_t Bits;

    static const unsigned ExponentBias = 1023;
    static const unsigned ExponentShift = 52;

    static const Bits SignBit         = 0x8000000000000000ULL;
    static const Bits ExponentBits    = 0x7ff0000000000000ULL;
    static const Bits SignificandBits = 0x000fffffffffffffULL;
};

template<typename T> struct SelectTrait;
template<float> struct SelectTrait : FloatTypeTraits {};
template<double> struct SelectTrait : DoubleTypeTraits {};

template<typename T>
struct FPTraits : SelectTrait<T>
{
    typedef typename SelectTrait<T> Base;

    static_assert((Base::SignBit & Base::ExponentBits) == 0,
                  "sign bit shouldn't overlap exponent bits");
    static_assert((Base::SignBit & Base::SignificandBits) == 0,
                  "sign bit shouldn't overlap significand bits");
    static_assert((Base::ExponentBits & Base::SignificandBits) == 0,
                  "exponent bits shouldn't overlap significand bits");

    static_assert((Base::SignBit | Base::ExponentBits | Base::SignificandBits) ==
                  ~Base::Bits(0),
                  "all bits accounted for");

    /*
     * These implementations assume float/double are 32/64-bit single/double format
     * number types compatible with the IEEE-754 standard.  C/C++ don't require this
     * to be the case.  But we required this in implementations of these algorithms
     * that preceded this header, so we shouldn't break anything if we keep doing so.
     */
    static_assert(sizeof(T) == sizeof(Base::Bits), "Bits must be same size as T");
};

} // namespace detail

template<typename T>
static MOZ_ALWAYS_INLINE bool
IsNaN(T t)
{
  typedef typename FPTraits<T> Traits;
  Traits::Bits bits = BitwiseCast<Traits::Bits>(t);
  return (bits & Traits::ExponentBits) == Traits::ExponentBits &&
         (bits & Traits::SignificandBits) != 0;
}

// ...and so on

} // namespace mozilla
Attachment #829552 - Flags: review?(jwalden+bmo) → review-
Ah, you had tests in a separate patch.  Eyeballing quickly, it looks like you had double tests where |x| < 0.5 but not any float ones.  Err on the side of more tests!  :-)
Thanks for the careful review! Making FloatingPoint.h more generic seems a good idea, and as I wouldn't want to spam this thread (and there might also be some uses of FloatingPoint.h that would need to be updated), I filed bug 939843.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #18)
> Ah, you had tests in a separate patch.  Eyeballing quickly, it looks like
> you had double tests where |x| < 0.5 but not any float ones.  Err on the
> side of more tests!  :-)

Heh, the "double tests" denomination is kinda wrong, this deserves a comment in the tests I added to explain the difference between testRound and testDoubleRound. Actually, when creating the MIR nodes, the baseline can conclude that the return value of a Math.round can be an Integer (first round (pun not intended) of tests) or a Double (second round, which tests < 0.5 values, etc.). The above roundf function is actually called if and only if the returned value is expected to be a Double, so this was actually tested and passing on my machine, but not for all special cases you mentioned above.
Depends on: 939843
Benjamin, what's the next step here? Can we land these patches?
Flags: needinfo?(benj)
Not yet, bug 939843 has to be implemented and finished before, so that we have a nice implementation of roundf for all platforms (win xp doesn't have one). I'll work on bug 939843 again soon.
Flags: needinfo?(benj)
Long time no see, bug 930477! Here is a new patch based on the last API from bug 939843, fixing the related other comments.

Waldo, there is no C++ tests for the JS maths so far. To avoid the unsigned / signed comparison, I have used the same trick as in the equivalent double function, which is to write directly the value here (dirty)(but it works).
Should we begin some C++ js_math tests too?
Attachment #829552 - Attachment is obsolete: true
Attachment #8374165 - Flags: review?(jwalden+bmo)
Pow doesn't strictly commute -> can't specialize it.
Summary: IonMonkey: Specialize floor, round and pow for float32 → IonMonkey: Specialize floor and round for float32
(In reply to Benjamin Bouvier [:bbouvier] from comment #24)
> Pow doesn't strictly commute -> can't specialize it.

OK, but can you make sure we get the same numbers for "pow d" and "pow f" for the attached testcase? When I filed this bug, "pow f" was almost 3x slower (see comment 0), it would be good to fix that by converting to double or something.
Flags: needinfo?(benj)
Attached patch Parity for PowSplinter Review
Sure thing. This inlines the MPow or MPowHalf even if the input is a Float32, but the DoublePolicy (or PowPolicy) will convert these to a double before applying the instruction.
Attachment #8379100 - Flags: review?(jdemooij)
Flags: needinfo?(benj)
Comment on attachment 8379100 [details] [diff] [review]
Parity for Pow

Review of attachment 8379100 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/jit/MCallOptimize.cpp
@@ +779,2 @@
>          return InliningStatus_NotInlined;
> +    if (powerType != MIRType_Int32 && !IsFloatingPointType(powerType))

Nit: if (!IsNumberType(powerType)), same for baseType.
Attachment #8379100 - Flags: review?(jdemooij) → review+
I already reported this to bbouvier on IRC, but I thought I'd add a note to bugzilla. This patch seems to have caused a slowdown in Octane 2 Mandreel of ~50% (or at least this is what I saw when running tests manually).
Got r+ from h4writer over IRC. Fixes bug 975290 and the regression. Added test case from bug 975920.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9ac8d347e6c6
Attachment #8379638 - Flags: review+
Comment on attachment 8374165 [details] [diff] [review]
Roundf for all platforms

Review of attachment 8374165 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +4326,5 @@
> +      case MMathFunction::ASin:  funptr = JS_FUNC_TO_DATA_PTR(void *, asinf);            break;
> +      case MMathFunction::ACos:  funptr = JS_FUNC_TO_DATA_PTR(void *, acosf);            break;
> +      case MMathFunction::Floor: funptr = JS_FUNC_TO_DATA_PTR(void *, floorf);           break;
> +      case MMathFunction::Round: funptr = JS_FUNC_TO_DATA_PTR(void *, math_roundf_impl); break;
> +      case MMathFunction::Ceil:  funptr = JS_FUNC_TO_DATA_PTR(void *, ceilf);            break;

Blugh, this is why I dislike |expr; break;|, with attempted break-alignment, on the same line as a case.  I would favor expanding this into the normal multiple-line thing, myself.

::: js/src/jsmath.cpp
@@ +751,3 @@
>  
>      /* Some numbers are so big that adding 0.5 would give the wrong number. */
>      if (ExponentComponent(x) >= 52)

After you land bug 939843, could you make a followup that changes this to mozilla::FloatingPoint<double>::ExponentShift?

@@ +763,5 @@
> +    if (NumberIsInt32(x, &ignored))
> +        return x;
> +
> +    /* Some numbers are so big that adding 0.5 would give the wrong number. */
> +    if (ExponentComponent(x) >= 23)

And this to mozilla::FloatingPoint<float>::ExponentShift.

@@ +766,5 @@
> +    /* Some numbers are so big that adding 0.5 would give the wrong number. */
> +    if (ExponentComponent(x) >= 23)
> +        return x;
> +
> +    return js_copysign(floorf(x + 0.5f), x);

Probably a mozilla::CopySign method would be useful, perhaps mozilla::Floor as well, given the C-required floor/floorf/floorl(?) mess, and the messiness of defining js_copysign across compilers.  Separate bug.
Attachment #8374165 - Flags: review?(jwalden+bmo) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #22)
> Waldo, there is no C++ tests for the JS maths so far. To avoid the unsigned
> / signed comparison, I have used the same trick as in the equivalent double
> function, which is to write directly the value here (dirty)(but it works).
> Should we begin some C++ js_math tests too?

I don't quite follow what's being commented on, or asked about, here.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #34)
> I don't quite follow what's being commented on, or asked about, here.

Waldo, may I quote what Waldo-from-the-past said about using ExponentShift in this condition.

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #17)
> @@ +797,5 @@
> > +        return float(i);
> > +
> > +    /* Some numbers are so big that adding 0.5 would give the wrong number. */
> > +    if (FloatExponentComponent(x) >= mozilla::FloatExponentShift)
> > +        return x;
> 
> ExponentComponent is signed.  ExponentShift is unsigned.  By the usual
> arithmetic conversions, the signed value will be promoted to an unsigned
> value, using the normal modular arithmetic.  So ExponentComponent, on
> sufficiently small numbers (I think anything 0.5 or lower in magnitude)
> won't be rounded by this method!  Unless I'm horribly misreading, but I'm
> pretty sure I'm not.
> 
> Did I mention tests yet?

That's what I was referring to. You were asking at some points for tests, but didn't precise whether you wanted tests for the Floating point generic MFBT operations OR the js maths functions. So I was asking whether you wanted tests for js maths, and simply observed that there weren't any I could see so far.
Try looks good. Added a coercion to int16_fast_t in the comparisons (same return type as ExponentComponent()) and everything is fine.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a1b396e1f1dd
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/a07b45e32644
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/a1b396e1f1dd
https://hg.mozilla.org/mozilla-central/rev/a07b45e32644
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
(In reply to Benjamin Bouvier [:bbouvier] from comment #35)
> (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #34)
> > I don't quite follow what's being commented on, or asked about, here.
> 
> Waldo, may I quote what Waldo-from-the-past said about using ExponentShift
> in this condition.
> 
> (In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #17)
> > @@ +797,5 @@
> > > +        return float(i);
> > > +
> > > +    /* Some numbers are so big that adding 0.5 would give the wrong number. */
> > > +    if (FloatExponentComponent(x) >= mozilla::FloatExponentShift)
> > > +        return x;
> > 
> > ExponentComponent is signed.  ExponentShift is unsigned.  By the usual
> > arithmetic conversions, the signed value will be promoted to an unsigned
> > value, using the normal modular arithmetic.  So ExponentComponent, on
> > sufficiently small numbers (I think anything 0.5 or lower in magnitude)
> > won't be rounded by this method!  Unless I'm horribly misreading, but I'm
> > pretty sure I'm not.
> > 
> > Did I mention tests yet?
> 
> That's what I was referring to. You were asking at some points for tests,
> but didn't precise whether you wanted tests for the Floating point generic
> MFBT operations OR the js maths functions. So I was asking whether you
> wanted tests for js maths, and simply observed that there weren't any I
> could see so far.

Ah!  The code you're implementing is in the JS engine, exposed purely via JS stuff.  So this should be a test in JS.  (The mfbt stuff already largely has tests.  We just need them for algorithms that use the low-level mfbt primitive bits, without just using them exactly in a 1-to-1 relationship.)  I suggest changing the C++, writing the test and verifying it fails as I noted there it should, then undoing the C++ change, to be sure the test would have properly caught the mistake.
Okay, this is weird.  math_round_impl has the exact same ExponentComponent(d) >= FloatingPoint<T>::ExponentShift thing going on, and on Windows int_fast16_t is int, and so Math.round(.125) should wrongly be .125 there, it appears.  But it doesn't seem to be if I test in a nightly.  Dunno what's up.  But definitely round and roundf should be identical in these respects here, and they're not, and somehow round() is bad, which *really* should have been caught by our existing tests, so I really dunno what's up here.  (int_fast16_t is long [!] on my new Linux box, so I can't reproduce anything there.  Strange, I thought I remembered it being int even on 64-bit there, before, but maybe not.)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #39)
> Okay, this is weird.  math_round_impl has the exact same
> ExponentComponent(d) >= FloatingPoint<T>::ExponentShift thing going on, and
> on Windows int_fast16_t is int, and so Math.round(.125) should wrongly be
> .125 there, it appears.  But it doesn't seem to be if I test in a nightly. 
> Dunno what's up.  But definitely round and roundf should be identical in
> these respects here, and they're not, and somehow round() is bad, which
> *really* should have been caught by our existing tests, so I really dunno
> what's up here.  (int_fast16_t is long [!] on my new Linux box, so I can't
> reproduce anything there.  Strange, I thought I remembered it being int even
> on 64-bit there, before, but maybe not.)

I've added a conversion from FloatingPoint<T>::ExponentShift to int_fast16_t, which is safe as the exponent shift is <= 52 (double exponent shift) and 52 has the same binary representation as an int, long or unsigned (at least in base 2). That makes the comparison safe. In the meanwhile, once again, as the unsigned and the int / long representations are the same for these values, it doesn't seem dangerous to compare them against each other directly and I suppose that's the reason why the compiler doesn't complain about it.

For sanity, should we add an assert that int_fast16_t(FloatingPoint<T>::ExponentShift) >= 0 and <= 52?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: