Closed Bug 774364 Opened 12 years ago Closed 9 years ago

IonMonkey: Inline Math.random().

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla43

People

(Reporter: sstangl, Assigned: arai)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [ion:t][games:p2])

Attachments

(6 files, 8 obsolete files)

8.08 KB, text/plain (Details)
34.20 KB, patch, arai: review+, arai: checkin+ (Details | Diff | Splinter Review)
80.24 KB, patch, sstangl: review+ (Details | Diff | Splinter Review)
3.59 KB, patch, sstangl: review+ (Details | Diff | Splinter Review)
95.10 KB, patch, sstangl: review+, hev: review+ (Details | Diff | Splinter Review)
7.04 KB, patch, nbp: review+ (Details | Diff | Splinter Review)
Spin-off of Bug 773859 (for JM) for tracking in Ion.
I think I can do this :)
Assignee: general → evilpies
Well, actually it's not so easy, because our implementation does int64 arithmetic and shifting with doubles and the like.
Assignee: evilpies → general
Whiteboard: [ion:t] → [ion:t][games:p1]
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: [ion:t][games:p1] → [ion:t][games:p2]
Assignee: general → nobody
WIP patch.

It enables inlining Math.random() on x86_64, but it uses several 64-bit operations directly and is not yet generalized.
It would be better to enable inlining on all archs at once and wrap those operations in more generic functions, right?

Performance testcase and result:

  let t = elapsed();
  (function() {
    var n = 0;
    for (var i = 0; i < 100000000; i++)  {
      n += Math.random();
    }
    return n;
  })();
  print(elapsed() - t);

Before: 678540 [us]
After:  512344 [us] -- about 30% faster


On x86, the problem is that MULL uses the fixed register pair edx:eax as its destination, and I'm not sure how to handle that properly alongside the other temporary registers...

I haven't touched the other archs yet.
Attachment #8585016 - Flags: feedback?(jdemooij)
Comment on attachment 8585016 [details] [diff] [review]
Part 1: Inline Math.random() in Ion on x86_64.

Review of attachment 8585016 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks arai! This needs a careful review because random is potentially security-sensitive and there can be subtle bugs here (numbers that appear random but are not).

I've had a ton of reviews lately so I'll forward this to Sean because he added the current MRandom/LRandom code (bug 797551).
Attachment #8585016 - Flags: feedback?(jdemooij) → feedback?(sstangl)
Comment on attachment 8585016 [details] [diff] [review]
Part 1: Inline Math.random() in Ion on x86_64.

Review of attachment 8585016 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +5130,5 @@
>  
>      MOZ_ASSERT(ToFloatRegister(ins->output()) == ReturnDoubleReg);
>  }
>  
> +#ifdef JS_CODEGEN_X64

There should be no #ifdefs in shared files. It's OK to just implement inlining for x86_64 for this patch, but the LIR, lowering, and codegen must still be made arch-specific, even if they just wind up using callVM().

@@ +5134,5 @@
> +#ifdef JS_CODEGEN_X64
> +const double RNG_DSCALE_INV = 1/double(1LL << 53);
> +static const uint64_t RNG_MULTIPLIER = 0x5DEECE66DLL;
> +static const uint64_t RNG_ADDEND = 0xBLL;
> +static const uint64_t RNG_MASK = (1LL << 48) - 1;

These should not be duplicated from jsmath.cpp. They could be moved into jsmath.h.
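
(For context: these constants implement the same 48-bit linear congruential generator as java.util.Random; 0x5DEECE66D is that generator's multiplier. A standalone sketch, reconstructed from the snippets quoted in this review; the 48 - 26 and 48 - 27 shifts match the comments further down, but jsmath.cpp is the authoritative implementation:)

  #include <cstdint>
  #include <cstdio>

  static const uint64_t RNG_MULTIPLIER = 0x5DEECE66DULL;
  static const uint64_t RNG_ADDEND = 0xBULL;
  static const uint64_t RNG_MASK = (1ULL << 48) - 1;
  static const double RNG_DSCALE_INV = 1.0 / double(1LL << 53);

  // Advance the LCG and return the top `bits` bits of the new state.
  static uint64_t rngNext(uint64_t* state, int bits)
  {
      *state = (*state * RNG_MULTIPLIER + RNG_ADDEND) & RNG_MASK;
      return *state >> (48 - bits);
  }

  // 26 high bits plus 27 more bits form a 53-bit value, scaled into [0, 1).
  static double rngNextDouble(uint64_t* state)
  {
      return double((rngNext(state, 26) << 27) + rngNext(state, 27)) * RNG_DSCALE_INV;
  }

  int main()
  {
      uint64_t state = 1;  // any nonzero seed
      printf("%f\n", rngNextDouble(&state));
      return 0;
  }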

@@ +5146,5 @@
> +
> +    Register temp1 = ToRegister(ins->temp());
> +    Register temp2 = ToRegister(ins->temp2());
> +    Register temp3 = ToRegister(ins->temp3());
> +    Register temp4 = ToRegister(ins->temp4());

It would really be nice to give these registers better names if possible!

@@ +5157,5 @@
> +    masm.loadPtr(Address(temp3, JSCompartment::offsetOfRngState()), temp2);
> +
> +    // if (rngState == 0)
> +    //     random_initState();
> +    Label initState[2], initState_return[2];

This is pretty weird and at least deserves an explanation.

@@ +5172,5 @@
> +    // nextstate &= RNG_MASK;
> +    masm.movq(ImmWord(RNG_MASK), temp4);
> +    masm.andq(temp4, temp2);
> +
> +    // temp3 = (nextstate >> (48 - 26)) << 27;

No magic numbers, please. These should be given names and shared with jsmath.h.

@@ +5194,5 @@
> +    masm.andq(temp4, temp2);
> +
> +    // temp4 = nextstate >> (48 - 27);
> +    masm.movq(temp2, temp4);
> +    masm.shrq(Imm32(48 - 27), temp4);

Same comment for magic numbers.

@@ +5207,5 @@
> +    masm.vmulsd(tempFloat, output, output);
> +
> +    // cx->compartment()->rngState = nextstate
> +    masm.loadJSContext(temp1);
> +    masm.loadPtr(Address(temp1, JSContext::offsetOfCompartment()), temp3);

Could probably just remember this value from earlier.

@@ +5213,5 @@
> +
> +    Label end;
> +    masm.jump(&end);
> +
> +    // call random_initState

This could be done in a separate MIR instruction, say MInitRandom, that dominates each instance of MRandom. That would also let it be hoisted out of loops.

Of course, that also subtly changes the behavior of the RNG, so it might be a better idea to instead just put this code into an OutOfLine path. Grep for "OutOfLine" to find plenty of examples.

If you're calling out to C code for purposes of initialization, you might as well just call random_nextDouble() or math_random(), which will do the initialization check again and not require you to rejoin in the middle of the inline code.

Since rngstate isn't overwritten until the end, this is safe to do at any point.
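
(For readers unfamiliar with the pattern, the rough shape of such a path, with illustrative names; the real examples are the existing OutOfLineCodeBase subclasses in js/src/jit:)

  // Illustrative sketch only; see existing OutOfLineCodeBase users for
  // the exact conventions.
  class OutOfLineRandom : public OutOfLineCodeBase<CodeGenerator>
  {
      LRandom* lir_;

    public:
      explicit OutOfLineRandom(LRandom* lir)
        : lir_(lir)
      { }

      void accept(CodeGenerator* codegen) {
          codegen->visitOutOfLineRandom(this);
      }
      LRandom* lir() const {
          return lir_;
      }
  };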

::: js/src/jit/Lowering.cpp
@@ +1403,5 @@
>  LIRGenerator::visitRandom(MRandom *ins)
>  {
> +#ifdef JS_CODEGEN_X64
> +    LRandom *lir = new(alloc()) LRandom(tempFixed(CallTempReg0), tempFixed(CallTempReg1),
> +                                        tempFixed(CallTempReg2), tempFixed(CallTempReg3),

Are fixed registers really required here?

::: js/src/jscompartment.h
@@ +420,5 @@
>      /* Random number generator state, used by jsmath.cpp. */
>      uint64_t rngState;
>  
> +    static size_t offsetOfRngState() {
> +        return offsetof(JSCompartment, rngState);

This use of offsetof() is technically undefined in C++ since JSCompartment is non-POD, but that's not your problem.
Attachment #8585016 - Flags: feedback?(sstangl) → feedback+
Thank you for your feedback :D

(In reply to Sean Stangl [:sstangl] from comment #5)
> Comment on attachment 8585016 [details] [diff] [review]
> Part 1: Inline Math.random() in Ion on x86_64.
> 
> Review of attachment 8585016 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/CodeGenerator.cpp
> @@ +5130,5 @@
> >  
> >      MOZ_ASSERT(ToFloatRegister(ins->output()) == ReturnDoubleReg);
> >  }
> >  
> > +#ifdef JS_CODEGEN_X64
> 
> There should be no #ifdefs in shared files. It's OK to just implement
> inlining for x86_64 for this patch, but the LIR, lowering, and codegen must
> still be made arch-specific, even if they just wind up using callVM().

Okay, I moved the code to the x86/x64/arm/mips dirs.
In this patch, x86/arm/mips are the same as before. Is it better to file bugs for each arch?

I'm working on x86 now, but it may take some more time. Actually, it at least works, but it's slower than the non-inlined version; it needs support for more opcodes to do the same thing as math_random_no_outparam does.

> @@ +5134,5 @@
> > +#ifdef JS_CODEGEN_X64
> > +const double RNG_DSCALE_INV = 1/double(1LL << 53);
> > +static const uint64_t RNG_MULTIPLIER = 0x5DEECE66DLL;
> > +static const uint64_t RNG_ADDEND = 0xBLL;
> > +static const uint64_t RNG_MASK = (1LL << 48) - 1;
> 
> These should not be duplicated from jsmath.cpp. They could be moved into
> jsmath.h.

Moved. Also removed same definition from TestingFunctions.cpp.

> @@ +5146,5 @@
> > +
> > +    Register temp1 = ToRegister(ins->temp());
> > +    Register temp2 = ToRegister(ins->temp2());
> > +    Register temp3 = ToRegister(ins->temp3());
> > +    Register temp4 = ToRegister(ins->temp4());
> 
> It would really be nice to give these registers better names if possible!

Okay, renamed them.
One register (lowReg) is now used for multiple purposes, so I declared aliases (rngMultiplierReg and rngDscaleInvReg) for it.

> @@ +5207,5 @@
> > +    masm.vmulsd(tempFloat, output, output);
> > +
> > +    // cx->compartment()->rngState = nextstate
> > +    masm.loadJSContext(temp1);
> > +    masm.loadPtr(Address(temp1, JSContext::offsetOfCompartment()), temp3);
> 
> Could probably just remember this value from earlier.

It needs one more register, so I added it.
Also, I realized that a temporary FloatRegister is not required for vmulsd, so I removed it. LRandom therefore still needs 5 temporary registers.

> @@ +5213,5 @@
> > +
> > +    Label end;
> > +    masm.jump(&end);
> > +
> > +    // call random_initState
> 
> This could be done in a separate MIR instruction, say MInitRandom, that
> dominates each instance of MRandom. That would also let it be hoisted out of
> loops.
> 
> Of course, that also subtly changes the behavior of the RNG, so it might be
> a better idea to instead just put this code into an OutOfLine path. Grep for
> "OutOfLine" to find plenty of examples.
> 
> If you're calling out to C code for purposes of initialization, you might as
> well just call random_nextDouble() or math_random(), which will do the
> initialization check again and not require you to rejoin in the middle of
> the inline code.
> 
> Since rngstate isn't overwritten until the end, this is safe to do at any
> point.

Added OutOfLineRandom, which calls math_random_no_outparam.
Also, I changed define to defineFixed in LIRGeneratorX64::visitRandom, to make sure the output register is ReturnDoubleReg and receive the return value of math_random_no_outparam directly. Is that the correct way?

Then I need to keep the value of xmm0 alive across the restoreVolatile call, but `saveVolatile(ReturnDoubleReg)` still saves/restores xmm0; I have to add ReturnFloat32Reg/ReturnDoubleReg/ReturnInt32x4Reg/ReturnFloat32x4Reg to a register set and call saveVolatile/restoreVolatile with that instead. I wonder if this is a kind of bug; what do you think?

> ::: js/src/jit/Lowering.cpp
> @@ +1403,5 @@
> >  LIRGenerator::visitRandom(MRandom *ins)
> >  {
> > +#ifdef JS_CODEGEN_X64
> > +    LRandom *lir = new(alloc()) LRandom(tempFixed(CallTempReg0), tempFixed(CallTempReg1),
> > +                                        tempFixed(CallTempReg2), tempFixed(CallTempReg3),
> 
> Are fixed registers really required here?

Thanks, made all of them `temp()`.

> ::: js/src/jscompartment.h
> @@ +420,5 @@
> >      /* Random number generator state, used by jsmath.cpp. */
> >      uint64_t rngState;
> >  
> > +    static size_t offsetOfRngState() {
> > +        return offsetof(JSCompartment, rngState);
> 
> This use of offsetof() is technically undefined in C++ since JSCompartment
> is non-POD, but that's not your problem.

So is it okay to keep using offsetof there? Or is there any other trick (maybe getting the offset dynamically)?


Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=32c0aa87f322
Attachment #8585016 - Attachment is obsolete: true
Attachment #8599112 - Flags: review?(sstangl)
Okay, inlining on x86 finally works, and it's about 90% faster!

Currently it uses 7 registers (eax, edx, 4 generic temps, 1 float temp).
I'll post the register mapping in the next post. If we need to reduce the number of registers, temp5 and temp6 (which are used exclusively for JSCompartment and RNG_MULTIPLIER) could be removed by using temp2 (edx) instead.

Performance results are as follows (same code as comment #3):
x86
  before: 1280866
  after:   667245 -- 92% faster

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2105fa3912aa&exclusion_profile=false
Attachment #8600777 - Flags: review?(sstangl)
Comment on attachment 8599112 [details] [diff] [review]
Part 1: Inline Math.random() in Ion on x86_64.

Review of attachment 8599112 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the long wait. This is an exceptionally good patch. Thank you!

I will land it for you tomorrow independent of the x86 patch.

::: js/src/jit/x64/Assembler-x64.h
@@ +568,5 @@
> +    void imulq(Register src, Register dest) {
> +        masm.imulq_rr(src.encoding(), dest.encoding());
> +    }
> +    void vcvtsi2sdq(Register src, FloatRegister dest)
> +    {

nit: { on previous line

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +830,5 @@
> +namespace js {
> +namespace jit {
> +
> +// Out-of-line math_random_no_outparam call for LRandom.
> +class OutOfLineRandom : public OutOfLineCodeBase<CodeGeneratorX64>

Normally this would go into the header file, but I'm not opposed to it going here instead.

@@ +896,5 @@
> +    // rngState = nextstate
> +
> +    // if rngState == 0, escape from inlined code and call
> +    // math_random_no_outparam.
> +    masm.branchTestPtr(Assembler::Zero, rngStateReg, rngStateReg, ool->entry());

Instead of testing again here, we can use j(Assembler::Zero, ool->entry()) after the andq(rngMaskReg, rngStateReg) above, since andq sets ZF.
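
(That is, something like the following sketch:)

  masm.andq(rngMaskReg, rngStateReg);     // nextstate &= RNG_MASK; sets ZF
  masm.j(Assembler::Zero, ool->entry());  // state == 0: take the OOL path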

@@ +909,5 @@
> +    masm.andq(rngMaskReg, rngStateReg);
> +
> +    // low = nextstate >> (RNG_STATE_WIDTH - RNG_LOW_BITS);
> +    masm.movq(rngStateReg, lowReg);
> +    masm.shrq(Imm32(RNG_STATE_WIDTH - RNG_LOW_BITS), lowReg);

This is OK instead of sarq because of the mask (also the shrq above).

::: js/src/jit/x64/LIR-x64.h
@@ +126,5 @@
> +{
> +  public:
> +    LIR_HEADER(Random)
> +    LRandom(const LDefinition &temp, const LDefinition &temp2, const LDefinition &temp3,
> +            const LDefinition &temp4, const LDefinition &temp5) {

nit: { on line by itself (for multi-line definitions)

::: js/src/jsmath.h
@@ +99,5 @@
>  static const double RNG_DSCALE = double(1LL << 53);
> +static const uint64_t RNG_MULTIPLIER = 0x5DEECE66DLL;
> +static const uint64_t RNG_ADDEND = 0xBLL;
> +static const int RNG_STATE_WIDTH = 48;
> +static const uint64_t RNG_MASK = (1LL << RNG_STATE_WIDTH) - 1;

nit: Could you please swap RNG_MASK with RNG_STATE_WIDTH, so that all the uint64_ts are together?
Attachment #8599112 - Flags: review?(sstangl) → review+
Thank you for reviewing!

(In reply to Sean Stangl [:sstangl] from comment #9)
> Comment on attachment 8599112 [details] [diff] [review]
> Part 1: Inline Math.random() in Ion on x86_64.
> 
> Review of attachment 8599112 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Sorry for the long wait. This is an exceptionally good patch. Thank you!
> 
> I will land it for you tomorrow independent of the x86 patch.

I appreciate your kindness :)

> @@ +896,5 @@
> > +    // rngState = nextstate
> > +
> > +    // if rngState == 0, escape from inlined code and call
> > +    // math_random_no_outparam.
> > +    masm.branchTestPtr(Assembler::Zero, rngStateReg, rngStateReg, ool->entry());
> 
> Instead of testing again here, we can use j(Assembler::Zero, ool->entry())
> after the andq(rngMaskReg, rngStateReg) above, since andq sets ZF.

Cool!

> ::: js/src/jsmath.h
> @@ +99,5 @@
> >  static const double RNG_DSCALE = double(1LL << 53);
> > +static const uint64_t RNG_MULTIPLIER = 0x5DEECE66DLL;
> > +static const uint64_t RNG_ADDEND = 0xBLL;
> > +static const int RNG_STATE_WIDTH = 48;
> > +static const uint64_t RNG_MASK = (1LL << RNG_STATE_WIDTH) - 1;
> 
> nit: Could you please swap RNG_MASK with RNG_STATE_WIDTH, so that all the
> uint64_ts are together?

They cannot be swapped because of the dependency, so I moved the int variables above the uint64_t variables instead. Is this okay?
Assignee: nobody → arai.unmht
Attachment #8599112 - Attachment is obsolete: true
Attachment #8613925 - Flags: review+
Keywords: leave-open
Depends on: 1176629
Comment on attachment 8600777 [details] [diff] [review]
Part 2: Inline Math.random() in Ion on x86.

Review of attachment 8600777 [details] [diff] [review]:
-----------------------------------------------------------------

Rather than duplicating this code for every architecture, I think it would be better to now take the x64 code from the earlier patch and translate it into platform-independent MacroAssembler invocations which can be shared across all of the architectures. That means the codegen will be ever-so-slightly suboptimal for each individual architecture, but in practice it will produce no meaningful difference.
Attachment #8600777 - Flags: review?(sstangl)
Thank you for reviewing :)

I moved the entire logic into js/src/jit/CodeGenerator.cpp and added the required methods to each macro assembler.

Other notable changes are as follows:

1. Added a Register64 struct, which holds one register on 64-bit and two registers on 32-bit, to make CodeGenerator compatible with both archs (see the sketch after this list).

2. Added CodeGenerator*::setReturnDoubleRegs to set all bits for ReturnDoubleRegs in LiveRegisterSet. saveVolatile doesn't exclude ReturnDoubleRegs unless all the related bits are set.

3. The LIRGenerator*::visitRandom implementations are still in each arch directory, since x86 needs dedicated code to use EAX and EDX.
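
(A minimal sketch of the Register64 idea; the actual definition lands in js/src/jit/Registers.h, with the constructor style per the review comments below:)

  #ifdef JS_PUNBOX64
  struct Register64
  {
      Register reg;

      MOZ_CONSTEXPR Register64(Register r)
        : reg(r)
      { }
  };
  #else
  struct Register64
  {
      Register high;
      Register low;

      MOZ_CONSTEXPR Register64(Register h, Register l)
        : high(h), low(l)
      { }
  };
  #endif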


Performance comparison on x86 is as follows (same code as comment #3):
  x86
    before: 1272289
    after:  705209  -- 80% faster
There's no notable performance change in the x64 code (it was already jitted).


Then, here are several questions:

a) x86 doesn't have a scratch general register (am I correct?), and some methods (branchTest64, convertUInt64ToDouble, mulDoublePtr) take a temporary register as an argument. Is this the correct solution?

b) x86's MULL clobbers EAX and EDX (meaning 2 extra registers are needed), so I have to push/pop a register (highReg) only on x86 (nbp told me that we cannot go above 5 registers, especially with the profiler). Do you have any ideas other than using #ifdef around the push/pop in CodeGenerator.cpp? (See the sketch after this list.)

c) Currently several methods on 32-bit archs are only partially implemented (branchTest64 supports only Assembler::Zero; mul64 only supports the case where the higher 32 bits of imm are 5). Is that okay?

d) On 64-bit archs, 5 registers are allocated but only 3 are used. Is there any downside to allocating unused registers?

e) I haven't yet compared performance on ARM/MIPS (I tested only the correctness of the result, on the simulator). I'm going to check ARM performance on my Flame device. Can I check MIPS performance somewhere? (I think some operations are not optimized well, like convertUInt64ToDouble, and they might introduce a performance regression.)

f) I haven't tested the ARM64 code yet (neither performance nor correctness). Might it be better to put MOZ_CRASH there?

g) I prepared a testing function that sets the RNG state, to compare the result of Math.random() between the interpreter and JIT:
     https://hg.mozilla.org/try/rev/d5ff53f85d4b
   Do you think it's better to make and land a testsuite using this function, to check that the inlined code generates the same results?
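
(For question b), the #ifdef shape in CodeGenerator.cpp looks roughly like this; a sketch only, with register names as in the patch:)

  #ifdef JS_CODEGEN_X86
      // mull clobbers eax/edx, so keep highReg alive across the multiply.
      masm.push(highReg);
  #endif
      masm.mul64(Imm64(RNG_MULTIPLIER), rngState);
  #ifdef JS_CODEGEN_X86
      masm.pop(highReg);
  #endif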


Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=82b37d7cd375
Attachment #8600777 - Attachment is obsolete: true
Attachment #8630828 - Flags: feedback?(sstangl)
Performance comparison on ARM with the following code:
  var t = Date.now();
  (function() {
    var n = 0;
    for (var i = 0; i < 10000000; i++)  {
      n += Math.random();
    }
    return n;
  })();
  var time = Date.now() - t;

Results (average of 20 runs):
  before: 1302 [ms]
  after: 803 [ms] -- 62% faster :D
Comment on attachment 8630828 [details] [diff] [review]
Part 2: Move Math.random() to macro assembler.

Review of attachment 8630828 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great. I like it a lot.

a) Yes, x86 doesn't have a scratch register. The correct solution is to design the function call to take a temp register, as you have done. In general, our current solution of just using a magical global ScratchRegister occasionally leads to tricky problems when two functions (usually one calling the other) both believe they have exclusive ownership of that register. On ARM, this led to the very bad solution of some functions using ScratchReg2. ARM64 requires mutation of MacroAssembler state to declare ownership of a scratch register before using it. Having the caller provide the scratch register is a good compromise.

b) No obvious improvement stands out to me. Push/pop is fine IMO.

c) It's fine by me. It isn't likely that these functions will be used elsewhere, and they are annotated with MOZ_CRASH. It doesn't seem likely that someone will come along and try to use these functions differently without encountering the crashes.

d) Those extra registers have to come from somewhere, so registers will be spilled. This is still better than the present situation where every register is spilled around a call. It would be even better to not have to spill needlessly.

e) None of us have MIPS hardware, unfortunately (it's not officially a tested platform).

f) No need, it should be fine. We'll figure it out in testing later anyway, once Ion works.

g) That's a great idea!

::: js/src/jit/LIR-Common.h
@@ +7068,5 @@
> +        setTemp(2, temp1);
> +        setTemp(3, temp2);
> +        setTemp(4, temp3);
> +    }
> +    const LDefinition* tempMaybeEAX() {

This probably deserves a comment.

::: js/src/jit/Registers.h
@@ +92,5 @@
>          return Codes::LastBit(x);
>      }
>  };
>  
> +#if defined(JS_CODEGEN_X64) || defined(JS_CODEGEN_ARM64)

#ifdef JS_PUNBOX64 covers all 64-bit architectures.

::: js/src/jit/arm64/MacroAssembler-arm64.h
@@ +1480,5 @@
>          Adds(scratch32, scratch32, Operand(imm.value));
>          Str(scratch32, MemOperand(ARMRegister(dest.base, 64), dest.offset));
>      }
> +    void add64(Imm32 imm, Register64 dest) {
> +        Adds(ARMRegister(dest.reg, 64), ARMRegister(dest.reg, 64), Operand(imm.value));

Does this have to be Adds instead of Add? (The -s version sets flags).

@@ +3121,5 @@
>          Add(xdest, xsrc, Operand(xsrc, vixl::LSL, 1));
>      }
>  
> +    void mul64(Imm64 imm, const Register64& dest) {
> +        mov(ImmWord(imm.value), ScratchReg);

Scratch registers cannot be used on ARM64 like this. For an example of proper usage, grep this file for "UseScratchRegisterScope".
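
(The pattern referred to, for reference; this is the same shape shown later in this bug:)

  vixl::UseScratchRegisterScope temps(this);
  const ARMRegister scratch64 = temps.AcquireX();
  mov(ImmWord(imm.value), scratch64.asUnsized());
  Mul(ARMRegister(dest.reg, 64), ARMRegister(dest.reg, 64), scratch64);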

@@ +3130,5 @@
> +        Ucvtf(ARMFPRegister(dest, 64), ARMRegister(src.reg, 64));
> +    }
> +    void mulDoublePtr(ImmPtr imm, Register temp, FloatRegister dest) {
> +        movePtr(imm, ScratchReg);
> +        loadDouble(Address(ScratchReg, 0), ScratchDoubleReg);

This also requires a UseScratchRegisterScope.

::: js/src/jit/mips/MacroAssembler-mips.h
@@ +731,5 @@
> +    void branchTest64(Condition cond, Register64 lhs, Register64 rhs, Register temp,
> +                      Label* label) {
> +        if (cond != Assembler::Zero) {
> +            MOZ_CRASH("Not supported condition");
> +        }

nit: { not used for single-line if()

@@ +1391,5 @@
>  
> +    void mul64(Imm64 imm, const Register64& dest) {
> +        // HIGH(dest) = LOW(HIGH(dest) * LOW(imm));
> +        ma_li(ScratchRegister, Imm32(imm.value & 0xFFFFFFFFL));
> +        as_multu(dest.high, ScratchRegister);

I would ask for a specific review of the MIPS code from :rankov.

::: js/src/jit/x86-shared/Assembler-x86-shared.h
@@ +1128,5 @@
>            default:
>              MOZ_CRASH("unexpected operand kind");
>          }
>      }
> +#ifdef JS_CODEGEN_X86

Can these x86-only functions be instead added to Assembler-x86.h?

::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
@@ +436,5 @@
>          spew("addb       %s, " MEM_obs, GPReg8Name(src), ADDR_obs(offset, base, index, scale));
>          m_formatter.oneByteOp8(OP_ADD_EbGb, offset, base, index, scale, src);
>      }
>  
> +#ifdef JS_CODEGEN_X86

Ditto for these functions?

::: js/src/jit/x86/Lowering-x86.cpp
@@ +416,5 @@
>  
>  void
>  LIRGeneratorX86::visitRandom(MRandom* ins)
>  {
> +    LRandom *lir = new(alloc()) LRandom(tempFixed(eax),

Suggest comment about eax/edx being necessary for imul.
Attachment #8630828 - Flags: feedback?(sstangl) → feedback+
Thank you for reviewing :D

(In reply to Sean Stangl [:sstangl] from comment #16)
> b) No obvious improvement stands out to me. Push/pop is fine IMO.

So, it's okay to use #ifdef in CodeGenerator.cpp, right? (Or should I remove the #ifdef and do push/pop on all archs? Or wrap them in some method that does push/pop only on x86?)

> d) Those extra registers have to come from somewhere, so registers will be
> spilled. This is still better than the present situation where every
> register is spilled around a call. It would be even better to not have to
> spill needlessly.

What would be the best way to reduce the register allocation only on 64-bit archs? Can I specify an invalid register, the same register, or something similar as the LDefinition passed to LRandom? Or should I move the LRandom definition into each directory and reduce its arguments?

> ::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
> @@ +436,5 @@
> >          spew("addb       %s, " MEM_obs, GPReg8Name(src), ADDR_obs(offset, base, index, scale));
> >          m_formatter.oneByteOp8(OP_ADD_EbGb, offset, base, index, scale, src);
> >      }
> >  
> > +#ifdef JS_CODEGEN_X86
> 
> Ditto for these functions?

Where should I move them to? Currently BaseAssembler exists only in x86-shared. Maybe into Assembler-x86.cpp?
Flags: needinfo?(sstangl)
(In reply to Tooru Fujisawa [:arai] from comment #17)
> Thank you for reviewing :D
> 
> (In reply to Sean Stangl [:sstangl] from comment #16)
> > b) No obvious improvement stands out to me. Push/pop is fine IMO.
> 
> So, it's okay to use #ifdef in CodeGenerator.cpp, right? (or should I remove
> the #ifdef and do push/pop on all archs? or wrap them with some method which
> does push/pop only on x86?)

In general no, but in this one particular case I think an exception is valid. The #ifdefs are probably easiest to read.

> 
> > d) Those extra registers have to come from somewhere, so registers will be
> > spilled. This is still better than the present situation where every
> > register is spilled around a call. It would be even better to not have to
> > spill needlessly.
> 
> What would be the best way to reduce register allocation only on 64bit
> archs?  Can I specify invalid register or same register or something as
> LDefinition passed to LRandom?  or should I move LRandom definition to each
> directory and reduce its argument?

Can you just use an #ifndef to ask for one fewer register on those architectures?

> 
> > ::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
> > @@ +436,5 @@
> > >          spew("addb       %s, " MEM_obs, GPReg8Name(src), ADDR_obs(offset, base, index, scale));
> > >          m_formatter.oneByteOp8(OP_ADD_EbGb, offset, base, index, scale, src);
> > >      }
> > >  
> > > +#ifdef JS_CODEGEN_X86
> > 
> > Ditto for these functions?
> 
> Where should I move them to?  currently BaseAssembler exists only in
> x86-shared.  maybe into Assembler-x86.cpp?

Making a new file BaseAssembler-x86.cpp would be valid.
Flags: needinfo?(sstangl)
Attachment #8613925 - Flags: checkin+
Okay, in the new Part 2 I moved the #ifdefs in BaseAssembler-x86-shared.h to each directory. It was almost just cut-and-paste work for each method; I fixed the brace position after the function arguments in some places.
Currently I moved only the BaseAssembler methods, not its inner classes. Also, I changed private to protected, so they can be called from the inherited BaseAssemblerX86/BaseAssemblerX64 classes.
Attachment #8630828 - Attachment is obsolete: true
Attachment #8634608 - Flags: review?(sstangl)
(In reply to Sean Stangl [:sstangl] from comment #16)
> a) Yes, x86 doesn't have a scratch register. The correct solution is to
> design the function call to take a temp register, as you have done. In
> general, our current solution of just using a magical global ScratchRegister
> occasionally leads to tricky problems when two functions (usually one
> calling the other) both believe they have exclusive ownership of that
> register. On ARM, this lead to the very bad solution of some functions using
> ScratchReg2. ARM64 requires mutation of MacroAssembler state to declare
> ownership of a scratch register before using it. Having the caller provide
> the scratch register is a good compromise.

Added a comment to the tempReg declaration, to clarify that it doesn't conflict with the new rngStateReg on 64-bit.

> d) Those extra registers have to come from somewhere, so registers will be
> spilled. This is still better than the present situation where every
> register is spilled around a call. It would be even better to not have to
> spill needlessly.

Added #ifdef/#ifndefs to LIR-Common.h and CodeGenerator.cpp, and changed the Register64 ctor on 64-bit to take only one argument; now rngStateReg uses tempMaybeEAX, and highReg uses tempMaybeEDX.
temp2 and temp3 are removed on 64-bit archs, so I changed visitOutOfLineRandom to use tempMaybeEAX/tempMaybeEDX instead of temp1/temp2.

> g) That's a great idea!

Thanks :) will post it as Part 4.

> ::: js/src/jit/arm64/MacroAssembler-arm64.h
> @@ +1480,5 @@
> >          Adds(scratch32, scratch32, Operand(imm.value));
> >          Str(scratch32, MemOperand(ARMRegister(dest.base, 64), dest.offset));
> >      }
> > +    void add64(Imm32 imm, Register64 dest) {
> > +        Adds(ARMRegister(dest.reg, 64), ARMRegister(dest.reg, 64), Operand(imm.value));
> 
> Does this have to be Adds instead of Add? (The -s version sets flags).

Fixed. no need to use -s.

> @@ +3121,5 @@
> >          Add(xdest, xsrc, Operand(xsrc, vixl::LSL, 1));
> >      }
> >  
> > +    void mul64(Imm64 imm, const Register64& dest) {
> > +        mov(ImmWord(imm.value), ScratchReg);
> 
> Scratch registers cannot be used on ARM64 like this. For an example of
> proper usage, grep this file for "UseScratchRegisterScope".
> 
> @@ +3130,5 @@
> > +        Ucvtf(ARMFPRegister(dest, 64), ARMRegister(src.reg, 64));
> > +    }
> > +    void mulDoublePtr(ImmPtr imm, Register temp, FloatRegister dest) {
> > +        movePtr(imm, ScratchReg);
> > +        loadDouble(Address(ScratchReg, 0), ScratchDoubleReg);
> 
> This also requires a UseScratchRegisterScope.

Thanks, fixed both; ScratchReg and ScratchDoubleReg now use AcquireX and AcquireD.

> ::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
> @@ +436,5 @@
> >          spew("addb       %s, " MEM_obs, GPReg8Name(src), ADDR_obs(offset, base, index, scale));
> >          m_formatter.oneByteOp8(OP_ADD_EbGb, offset, base, index, scale, src);
> >      }
> >  
> > +#ifdef JS_CODEGEN_X86
> 
> Ditto for these functions?

Moved to BaseAssembler-x86.h, added by Part 2.


rankov, would you review the MIPS part of this patch?
Attachment #8634609 - Flags: review?(sstangl)
Attachment #8634609 - Flags: review?(branislav.rankov)
Added testcases for Math.random(), which check some values of the deterministic output sequence to test the visitRandom impl, and the `rngState == 0` cases both before the high bits and before the low bits to test the visitOutOfLineRandom impl.

Green on try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7dbd307cb6a
Attachment #8634610 - Flags: review?(sstangl)
Comment on attachment 8634609 [details] [diff] [review]
Part 3: Move Math.random() to macro assembler.

Review of attachment 8634609 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the late review.

I have only looked at MIPS code. It seems good with a few minor issues.

I'm giving feedback+ since Sean needs to look at the rest of the files.

::: js/src/jit/mips/CodeGenerator-mips.cpp
@@ +2159,5 @@
> +    MOZ_ASSERT(ReturnFloat32Reg.code_ == FloatRegisters::f0);
> +    MOZ_ASSERT(ReturnDoubleReg.code_ == FloatRegisters::f0);
> +    FloatRegister f1  = { FloatRegisters::f1, FloatRegister::Single };
> +    regs->add(ReturnFloat32Reg);
> +    regs->add(f1);

You can use ReturnDoubleReg.singleOverlay(1) instead of f1.

::: js/src/jit/mips/MacroAssembler-mips.cpp
@@ +84,5 @@
> +MacroAssemblerMIPSCompat::convertUInt64ToDouble(Register64 src, Register temp, FloatRegister dest)
> +{
> +    convertUInt32ToDouble(src.high, dest);
> +    movePtr(ImmPtr(&TO_DOUBLE_HIGH_SCALE), ScratchRegister);
> +    loadDouble(Address(ScratchRegister, 0), ScratchDoubleReg);

You can use loadConstantDouble here. It will be more efficient (3 instructions instead of 3+). And it is cleaner.

::: js/src/jit/mips/MacroAssembler-mips.h
@@ +727,5 @@
>      }
>      void branchTestPtr(Condition cond, const Address& lhs, Imm32 imm, Label* label) {
>          branchTest32(cond, lhs, imm, label);
>      }
> +    void branchTest64(Condition cond, Register64 lhs, Register64 rhs, Register temp,

Maybe move the implementation to .cpp file.

@@ +734,5 @@
> +            MOZ_CRASH("Not supported condition");
> +        MOZ_ASSERT(lhs.low == rhs.low);
> +        MOZ_ASSERT(lhs.high == rhs.high);
> +        move32(lhs.low, ScratchRegister);
> +        or32(lhs.high, ScratchRegister);

Since this is mips specific file, you can use:
ma_or(ScratchRegister, lhs.low, lhs.high);

@@ +1190,5 @@
>      void and32(Imm32 imm, const Address& dest);
>      void and32(const Address& src, Register dest);
> +    void and64(Imm64 imm, Register64 dest) {
> +        and32(Imm32(imm.value & 0xFFFFFFFFL), dest.low);
> +        and32(Imm32((imm.value >> 32) & 0xFFFFFFFFL), dest.high);

You should create a constant at the top of the file for the mask. Also, it seems the preferred way to write these masks is:
#define LOW_32_MASK ((1LL << 32) - 1)

@@ +1241,5 @@
>      void load32(const BaseIndex& address, Register dest);
>      void load32(AbsoluteAddress address, Register dest);
> +    void load64(const Address& address, Register64 dest) {
> +        load32(address, dest.low);
> +        load32(Address(address.base, address.offset + 4), dest.high);

Please add constants on the top of the file similar to PAYLOAD_OFFSET and TAG_OFFSET. You can call them LOW32_OFFSET and HIGH32_OFFSET.
This will be useful if and when someone does big endian port.

@@ +1315,5 @@
>      }
>  
> +    void store64(Register64 src, Address address) {
> +        store32(src.low, address);
> +        store32(src.high, Address(address.base, address.offset + 4));

Same comment as load64.

@@ +1387,5 @@
>          as_addu(dest, src, src);
>          as_addu(dest, dest, src);
>      }
>  
> +    void mul64(Imm64 imm, const Register64& dest) {

This is also large function and should be moved to cpp file.

@@ +1405,5 @@
> +        if (((imm.value >> 32) & 0xFFFFFFFFL) == 5) {
> +            as_sll(ScratchRegister, dest.low, 2);
> +            as_addu(ScratchRegister, ScratchRegister, dest.low);
> +        } else {
> +            MOZ_CRASH("Not supported imm");

Why not implement this case also? You only need to save mflo value to SecondScratchReg.
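
(I.e., roughly the following, as a sketch; multu writes its 64-bit result to HI/LO, so LO must be saved before the next multiply clobbers it:)

  ma_li(ScratchRegister, Imm32(imm.value & 0xFFFFFFFFL));
  as_multu(dest.low, ScratchRegister);  // HI/LO = LOW(dest) * LOW(imm)
  as_mflo(SecondScratchReg);            // keep LO across the next multu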
Attachment #8634609 - Flags: review?(branislav.rankov) → feedback+
Thank you rankov :)
Addressed those comments locally.

(In reply to Branislav Rankov [:rankov] from comment #22)
> @@ +1405,5 @@
> > +        if (((imm.value >> 32) & 0xFFFFFFFFL) == 5) {
> > +            as_sll(ScratchRegister, dest.low, 2);
> > +            as_addu(ScratchRegister, ScratchRegister, dest.low);
> > +        } else {
> > +            MOZ_CRASH("Not supported imm");
> 
> Why not implement this case also? You only need to save mflo value to
> SecondScratchReg.

Simply because that case is not used by Math.random().  Okay, I'll add that code too.
rebased and addressed feedback comment
Attachment #8634609 - Attachment is obsolete: true
Attachment #8634609 - Flags: review?(sstangl)
Attachment #8646636 - Flags: review?(sstangl)
rebased
Attachment #8634610 - Attachment is obsolete: true
Attachment #8634610 - Flags: review?(sstangl)
Attachment #8646637 - Flags: review?(sstangl)
Comment on attachment 8646635 [details] [diff] [review]
Part 2: Move architecture specific function from BaseAssembler-x86-shared.h to BaseAssembler-x86.h and BaseAssembler-x64.h.

Review of attachment 8646635 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Thanks for doing this.

::: js/src/jit/x64/BaseAssembler-x64.h
@@ +15,5 @@
> +namespace X86Encoding {
> +
> +class BaseAssemblerX64 : public BaseAssembler
> +{
> +public:

nit: 2-space indent

::: js/src/jit/x86/BaseAssembler-x86.h
@@ +15,5 @@
> +namespace X86Encoding {
> +
> +class BaseAssemblerX86 : public BaseAssembler
> +{
> +public:

nit: 2-space indent
Attachment #8646635 - Flags: review?(sstangl) → review+
Comment on attachment 8646636 [details] [diff] [review]
Part 3: Move Math.random() to macro assembler.

Review of attachment 8646636 [details] [diff] [review]:
-----------------------------------------------------------------

This is very cool. It took a while to review. Giving r+, but there are still some parts that could use added commentary to increase ease-of-reading for future maintenance. I noted those sections in the review. In particular, I'm not sure where the magic numbers in ALIGN_TO_DOUBLE come from, and the code that uses ALIGN_TO_DOUBLE really needs verbal explanation.

r+ assuming commentary for MacroAssemblerX86::convertUInt64ToDouble is provided and makes sense :)

::: js/src/jit/Registers.h
@@ +92,5 @@
>          return Codes::LastBit(x);
>      }
>  };
>  
> +#ifdef JS_PUNBOX64

Would prefer the #ifdef within the body of the struct {}.

@@ +93,5 @@
>      }
>  };
>  
> +#ifdef JS_PUNBOX64
> +struct Register64 {

nit: { must be on new line.

@@ +97,5 @@
> +struct Register64 {
> +    Register reg;
> +
> +    Register64(Register r) {
> +        reg = r;

Preferred:

MOZ_CONSTEXPR Register64(Register r)
  : reg(r)
{ }

Similarly for the other struct.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +96,5 @@
> +    movePtr(ImmPtr(&TO_DOUBLE_HIGH_SCALE), ScratchRegister);
> +    loadDouble(Address(ScratchRegister, 0), ScratchDoubleReg);
> +    mulDouble(ScratchDoubleReg, dest);
> +    convertUInt32ToDouble(src.low, ScratchDoubleReg);
> +    addDouble(ScratchDoubleReg, dest);

Is this same logic not suitable for x86 also? It is much easier to read and verify.

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +1014,5 @@
>          branch32(cond, lhs, Imm32(0), label);
>      }
> +    void branchTest64(Condition cond, Register64 lhs, Register64 rhs, Register temp,
> +                      Label* label) {
> +        if (cond != Assembler::Zero)

See comment on the x86 code about how to structure this

@@ +1715,5 @@
>      }
>      void mulBy3(const Register& src, const Register& dest) {
>          as_add(dest, src, lsl(src, 1));
>      }
> +    void mul64(Imm64 imm, const Register64& dest) {

Could use similar commentary as with the x86 version.

::: js/src/jit/arm64/MacroAssembler-arm64.h
@@ +1182,5 @@
>          load32(src, scratch32.asUnsized());
>          And(ARMRegister(dest, 32), ARMRegister(dest, 32), Operand(scratch32));
>      }
> +    void and64(Imm64 imm, Register64 dest) {
> +        mov(ImmWord(imm.value), ScratchReg);

ARM64 doesn't have a named ScratchReg, so this won't compile. To get a scratch register, use vixl::UseScratchRegisterScope with AcquireX().

Something like this should work:

vixl::UseScratchRegisterScope temps(this);
const Register scratch = temps.AcquireX().asUnsized();
mov(ImmWord(imm.value), scratch);
andPtr(scratch, dest.reg);

To compile ARM64 code, pass --enable-simulator=arm64 as a build option. Although you can't actually test the code since we don't support Ion yet.

@@ +1275,5 @@
>      void loadDouble(const Address& src, FloatRegister dest) {
>          Ldr(ARMFPRegister(dest, 64), MemOperand(ARMRegister(src.base,64), src.offset));
>      }
> +    void loadDouble(const Address& src, ARMFPRegister dest) {
> +        Ldr(dest, MemOperand(ARMRegister(src.base,64), src.offset));

MemOperand has an Address constructor, so you can just do MemOperand(src).

The loadDouble() above it could also be changed...

@@ +1372,5 @@
>      }
>      void mulDouble(FloatRegister src, FloatRegister dest) {
>          fmul(ARMFPRegister(dest, 64), ARMFPRegister(dest, 64), ARMFPRegister(src, 64));
>      }
> +    void mulDouble(ARMFPRegister src, FloatRegister dest) {

It's strange to see an ARMFPRegister in the non-VIXL MacroAssembler interface. Should this be a FloatRegister also?

@@ +2032,5 @@
>          Cmp(ARMRegister(value.valueReg(), 64), Operand(scratch64));
>          B(label, cond);
>      }
> +    void branchTest64(Condition cond, Register64 lhs, Register64 rhs, Register temp,
> +                      Label* label) {

Move last argument to previous line

@@ +3153,5 @@
> +        vixl::UseScratchRegisterScope temps(this);
> +        const ARMRegister scratch64 = temps.AcquireX();
> +        MOZ_ASSERT(dest.reg != scratch64.asUnsized());
> +        mov(ImmWord(imm.value), scratch64.asUnsized());
> +        Smulh(ARMRegister(dest.reg, 64), ARMRegister(dest.reg, 64), scratch64);

This instruction multiplies dest by scratch and returns the bits in positions <127:64>! Probably this is not what is intended. Do you instead mean Mul?

::: js/src/jit/mips/MacroAssembler-mips.cpp
@@ +79,5 @@
>  }
>  
>  void
> +MacroAssemblerMIPSCompat::mul64(Imm64 imm, const Register64& dest)
> +{

I would ask :rankov to review the MIPS code.

::: js/src/jit/shared/LIR-shared.h
@@ +7159,5 @@
>  
> +// Math.random().
> +class LRandom : public LInstructionHelper<1, 0,
> +#ifdef JS_PUNBOX64
> +                                          3

Heh, instead of this, I would prefer:

#ifdef JS_PUNBOX64
# define LRANDOM_NUM_TEMPS 3
#else
# define LRANDOM_NUM_TEMPS 5
#endif

class LRandom : public LInstructionHelper<1, 0, LRANDOM_NUM_TEMPS>

::: js/src/jit/x64/MacroAssembler-x64.h
@@ +711,5 @@
>          j(cond, label);
>      }
>  
> +    void branchTest64(Condition cond, Register64 lhs, Register64 rhs, Register temp,
> +                      Label* label) {

Label* on previous line (or "{" on new line if it can't fit in 100 columns)

::: js/src/jit/x86/Lowering-x86.cpp
@@ +440,5 @@
>  
>  void
>  LIRGeneratorX86::visitRandom(MRandom* ins)
>  {
> +    // eax and edx are nexessary for mull.

nit: necessary

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +25,5 @@
> +#else
> +#define ALIGN __attribute__(( aligned (16) ))
> +#endif
> +
> +static const uint64_t ALIGN TO_DOUBLE[4] = {

This /really/ wants commentary. I'm not sure what this is.

@@ +44,5 @@
> +
> +    movePtr(ImmPtr(TO_DOUBLE), temp);
> +    vpunpckldq(Operand(temp, 0), dest, dest);
> +    vsubpd(Operand(temp, sizeof(uint64_t) * 2), dest, dest);
> +    vhaddpd(dest, dest);

Could you explain what's happening here, extremely verbosely?

::: js/src/jit/x86/MacroAssembler-x86.h
@@ +644,5 @@
>      void mulBy3(const Register& src, const Register& dest) {
>          lea(Operand(src, src, TimesTwo), dest);
>      }
> +    // Note: this function clobbers eax and edx.
> +    void mul64(Imm64 imm, const Register64& dest) {

This looks good, although it did take a while to read. An explanatory comment would help, something obvious like:

Low32 = Low(Low(Dest) * Low(Imm));
High32 = Low(High(Dest) * Low(Imm)) [multiply Imm into upper bits]
       + Low(Low(Dest) * High(Imm)) [multiply Dest into upper bits]
       + High(Low(Dest) * Low(Imm)) [carry]
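
(A runnable scalar check of that decomposition, as a sketch:)

  #include <cassert>
  #include <cstdint>

  static uint64_t mul64via32(uint64_t dest, uint64_t imm)
  {
      uint32_t dLo = uint32_t(dest), dHi = uint32_t(dest >> 32);
      uint32_t iLo = uint32_t(imm),  iHi = uint32_t(imm >> 32);
      uint64_t lo = uint64_t(dLo) * iLo;  // full 64-bit product of the low halves
      uint32_t hi = dHi * iLo             // imm into the upper bits
                  + dLo * iHi             // dest into the upper bits
                  + uint32_t(lo >> 32);   // carry from the low product
      return (uint64_t(hi) << 32) | uint32_t(lo);
  }

  int main()
  {
      uint64_t state = 0x0123456789ABCDEFULL;
      assert(mul64via32(state, 0x5DEECE66DULL) == state * 0x5DEECE66DULL);
      return 0;
  }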

@@ +759,5 @@
>          j(cond, label);
>      }
>  
> +    void branchTest64(Condition cond, Register64 lhs, Register64 rhs, Register temp,
> +                      Label* label) {

Label on previous line or { on new line

@@ +766,5 @@
> +        MOZ_ASSERT(lhs.low == rhs.low);
> +        MOZ_ASSERT(lhs.high == rhs.high);
> +        movl(lhs.low, temp);
> +        orl(lhs.high, temp);
> +        branchTestPtr(cond, temp, temp, label);

Prefer the structure:

if (cond == Assembler::Zero) {
    movl(lhs.low, temp);
    orl(lhs.high, temp);
    branchTestPtr(cond, temp, temp, label);
} else {
    MOZ_CRASH("Unsupported condition");
}
Attachment #8646636 - Flags: review?(sstangl) → review+
Comment on attachment 8646637 [details] [diff] [review]
Part 4: Add setRNGState testing function.

Review of attachment 8646637 [details] [diff] [review]:
-----------------------------------------------------------------

Great patch.

::: js/src/builtin/TestingFunctions.cpp
@@ +2678,5 @@
>      return true;
>  }
>  
> +static bool
> +SetRNGState(JSContext* cx, unsigned argc, Value* vp)

I would strongly prefer that SetRNGState() be only defined #ifdef DEBUG.

@@ +3127,5 @@
>  "setARMHwCapFlags(\"flag1,flag2 flag3\")",
>  "  On non-ARM, no-op. On ARM, set the hardware capabilities. The list of \n"
>  "  flags is available by calling this function with \"help\" as the flag's name"),
>  
> +    JS_FN_HELP("setRNGState", SetRNGState, 1, 0,

Likewise #ifdef DEBUG.

::: js/src/jit-test/tests/basic/math-random.js
@@ +1,3 @@
> +// With this seed, state won't be reset in 10000 iteration.  The result is
> +// deterministic and it can be used to check the correctness of implementation.
> +setRNGState(0x12341234);

This test should only run if "setRNGState" is defined.
Attachment #8646637 - Flags: review?(sstangl) → review+
Comment on attachment 8646636 [details] [diff] [review]
Part 3: Move Math.random() to macro assembler.

Would you be able to look at the MIPS code here?

It is defining methods to work on a new struct "Register64", which allows for manipulation of 64-bit data (add, mul, etc.) on 32-bit platforms.
Attachment #8646636 - Flags: review?(branislav.rankov)
Thank you for reviewing :D

Try run with updated patch is running now:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=055eec63367a

(In reply to Sean Stangl [:sstangl] from comment #28)
> ::: js/src/jit/arm/MacroAssembler-arm.cpp
> @@ +96,5 @@
> > +    movePtr(ImmPtr(&TO_DOUBLE_HIGH_SCALE), ScratchRegister);
> > +    loadDouble(Address(ScratchRegister, 0), ScratchDoubleReg);
> > +    mulDouble(ScratchDoubleReg, dest);
> > +    convertUInt32ToDouble(src.low, ScratchDoubleReg);
> > +    addDouble(ScratchDoubleReg, dest);
> 
> Is this same logic not suitable for x86 also? It is much easier to read and
> verify.

This is slightly less performant than the x86 version (I'm not sure how it can be optimized for ARM and MIPS). There's a 5% performance difference in total (not in convertUInt64ToDouble itself); performance results are as follows (same code as comment #3):
  x86 with arm's convertUInt64ToDouble
    769283
  x86 with current patch
    732451 -- 5 % faster

> @@ +1372,5 @@
> >      }
> >      void mulDouble(FloatRegister src, FloatRegister dest) {
> >          fmul(ARMFPRegister(dest, 64), ARMFPRegister(dest, 64), ARMFPRegister(src, 64));
> >      }
> > +    void mulDouble(ARMFPRegister src, FloatRegister dest) {
> 
> It's strange to see an ARMFPRegister in the non-VIXL MacroAssembler
> interface. Should this be a FloatRegister also?

Those loadDouble and mulDouble overloads taking ARMFPRegister were used only by mulDoublePtr, so I moved that code into mulDoublePtr instead of adding those methods.

> @@ +3153,5 @@
> > +        vixl::UseScratchRegisterScope temps(this);
> > +        const ARMRegister scratch64 = temps.AcquireX();
> > +        MOZ_ASSERT(dest.reg != scratch64.asUnsized());
> > +        mov(ImmWord(imm.value), scratch64.asUnsized());
> > +        Smulh(ARMRegister(dest.reg, 64), ARMRegister(dest.reg, 64), scratch64);
> 
> This instruction multiplies dest by scratch and returns the bits in
> positions <127:64>! Probably this is not what is intended. Do you instead
> mean Mul?

Thanks, it should be Mul.

> ::: js/src/jit/mips/MacroAssembler-mips.cpp
> @@ +79,5 @@
> >  }
> >  
> >  void
> > +MacroAssemblerMIPSCompat::mul64(Imm64 imm, const Register64& dest)
> > +{
> 
> I would ask :rankov to review the MIPS code.

It already got f+ in comment #5; should I post an updated one and get r+?

> ::: js/src/jit/x86/MacroAssembler-x86.cpp
> @@ +25,5 @@
> > +#else
> > +#define ALIGN __attribute__(( aligned (16) ))
> > +#endif
> > +
> > +static const uint64_t ALIGN TO_DOUBLE[4] = {
> 
> This /really/ wants commentary. I'm not sure what this is.
> 
> @@ +44,5 @@
> > +
> > +    movePtr(ImmPtr(TO_DOUBLE), temp);
> > +    vpunpckldq(Operand(temp, 0), dest, dest);
> > +    vsubpd(Operand(temp, sizeof(uint64_t) * 2), dest, dest);
> > +    vhaddpd(dest, dest);
> 
> Could you explain what's happening here, extremely verbosely?

Okay, added comments.
  https://hg.mozilla.org/try/rev/8aeca4d55b03#l40.12
Do they make sense?
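(For posterity, the gist of the trick those comments describe, as a scalar sketch; the JIT version handles both lanes at once with vpunpckldq/vsubpd/vhaddpd, and the exact TO_DOUBLE constants are in the patch:)

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  static double bitsToDouble(uint64_t bits)
  {
      double d;
      std::memcpy(&d, &bits, sizeof(d));
      return d;
  }

  static double uint64ToDouble(uint64_t x)
  {
      // Splice the 32-bit halves into the mantissas of 2^52 and 2^84:
      //   exponent 0x433 with mantissa lo32 has the value 2^52 + lo32,
      //   exponent 0x453 with mantissa hi32 has the value 2^84 + hi32 * 2^32.
      double lo = bitsToDouble(0x4330000000000000ULL | (x & 0xFFFFFFFFULL));
      double hi = bitsToDouble(0x4530000000000000ULL | (x >> 32));
      // Subtract the exponent-only constants, then sum the two lanes
      // (the vsubpd/vhaddpd steps in the JIT code).
      return (hi - bitsToDouble(0x4530000000000000ULL)) +
             (lo - bitsToDouble(0x4330000000000000ULL));
  }

  int main()
  {
      uint64_t x = (1ULL << 48) - 123;  // RNG states stay below 2^48, so exact
      assert(uint64ToDouble(x) == double(x));
      return 0;
  }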
Flags: needinfo?(sstangl)
(In reply to Tooru Fujisawa [:arai] from comment #31)
> This is a bit less-performant code than x86's one (I'm not sure how it can
> be optimized for arm and mips).  There're 5% performance difference in total
> (not convertUInt64ToDouble itself),  performance results are following (same
> code as comment #3):
>   x86 with arm's convertUInt64ToDouble
>     769283
>   x86 with current patch
>     732451 -- 5 % faster

OK, just wondering. Thanks for checking!

> > ::: js/src/jit/mips/MacroAssembler-mips.cpp
> > @@ +79,5 @@
> > >  }
> > >  
> > >  void
> > > +MacroAssemblerMIPSCompat::mul64(Imm64 imm, const Register64& dest)
> > > +{
> > 
> > I would ask :rankov to review the MIPS code.
> 
> Already got f+ on comment #5, should I post updated one and get r+?

I'm not sure what f+ you're referring to, but I asked rankov for r? on the MIPS code in Comment 30.

> Okay, added comments.
>   https://hg.mozilla.org/try/rev/8aeca4d55b03#l40.12
> Do they make sense?

Discussed on IRC -- the comments are great, but they show that we're accidentally using named double registers when we should be using named XMM registers! That causes us to clobber XMM registers to which we do not have access. Most of this can probably be fixed with some usage of ScratchSimdReg. Can post an updated patch with that fix for a quick r+.

The comments are really great, though. Thank you so much for writing them!
Flags: needinfo?(sstangl)
(In reply to Sean Stangl [:sstangl] from comment #32)
> > > ::: js/src/jit/mips/MacroAssembler-mips.cpp
> > > @@ +79,5 @@
> > > >  }
> > > >  
> > > >  void
> > > > +MacroAssemblerMIPSCompat::mul64(Imm64 imm, const Register64& dest)
> > > > +{
> > > 
> > > I would ask :rankov to review the MIPS code.
> > 
> > Already got f+ on comment #5, should I post updated one and get r+?
> 
> I'm not sure what f+ you're referring to, but I asked rankov for r? on the
> MIPS code in Comment 30.

sorry, I meant comment #22.

> > Okay, added comments.
> >   https://hg.mozilla.org/try/rev/8aeca4d55b03#l40.12
> > Do they make sense?
> 
> Discussed on IRC -- the comments are great, but they show that we're
> accidentally using named double registers when we should be using named XMM
> registers! That causes us to clobber XMM registers to which we do not have
> access. Most of this can probably be fixed with some usage of
> ScratchSimdReg. Can post an updated patch with that fix for a quick r+.
> 
> The comments are really great, though. Thank you so much for writing them!

Does that mean I should keep the value of the higher 64 bits of xmm0 (for ReturnDoubleReg) through the operations? I cannot find any instruction that doesn't clobber the higher 64 bits when moving a value into the lower 64 bits; do you know of such an instruction? Also, in which case would the higher 64 bits contain other data while the lower 64 bits can be rewritten?
Flags: needinfo?(sstangl)
As the lower 64 bits of an XMM register are used as the double register, and the higher 64 bits are currently unused in that case, I'd like to go with the current approach, using the higher 64 bits as temporary space.
To make this clear, I changed dest and ScratchDoubleReg to the 128-bit registers dest128 (= FloatRegister(dest.encoding(), FloatRegisters::Int32x4)) and ScratchInt32x4Reg (= FloatRegister(X86Encoding::xmm7, FloatRegisters::Int32x4); added because ScratchSimdReg.size() is 8), and added some comments and asserts.
Attachment #8646636 - Attachment is obsolete: true
Attachment #8646636 - Flags: review?(branislav.rankov)
Attachment #8650622 - Flags: review?(sstangl)
Attachment #8650622 - Flags: review?(branislav.rankov)
Comment on attachment 8650622 [details] [diff] [review]
Part 3: Move Math.random() to macro assembler.

Review of attachment 8650622 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the changes. LGTM.

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +23,5 @@
> +// vpunpckldq requires 16-byte boundary for memory operand.
> +#ifdef _MSC_VER
> +#define ALIGN __declspec(align(16))
> +#else
> +#define ALIGN __attribute__(( aligned (16) ))

ultra nit: indent both of these #defines as "# define"

Also, unfortunately, ALIGN() seems to be defined by ffi_common.h. Normally this wouldn't matter, but with our unified-file build system, the symbol conflict may produce an error based on file ordering. Just to be safe, this could be named something like ALIGN16 or something.
Attachment #8650622 - Flags: review?(sstangl) → review+
Comment on attachment 8650622 [details] [diff] [review]
Part 3: Move Math.random() to macro assembler.

Review of attachment 8650622 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/MacroAssembler-x86.cpp
@@ +27,5 @@
> +#define ALIGN __attribute__(( aligned (16) ))
> +#endif
> +
> +// See convertUInt64ToDouble for the details.
> +static const uint64_t ALIGN TO_DOUBLE[4] = {

Use #include "mozilla/Alignment.h" and MOZ_ALIGNED_DECL for this.
Looks like rankov is busy now; is there anything I can do for this in the meantime?
Comment on attachment 8650622 [details] [diff] [review]
Part 3: Move Math.random() to macro assembler.

Hi hev, can you review the changes under js/src/jit/mips32? This patch adds several 64-bit operations used by Math.random().
Attachment #8650622 - Flags: review?(branislav.rankov) → review?(r)
Attachment #8650622 - Flags: review?(r) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/91f4cc676c0a for somehow causing SM(p) on Windows to OOM in oomInFormatStackDump.js
Here are try runs for these patches separately:

for bug 195578: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5703860d9ed
for this bug Part 2: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a06c150001c9
for this bug Part 3 (and Part 5): https://treeherder.mozilla.org/#/jobs?repo=try&revision=84c0d68984a8
for this bug Part 4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aaca0ed7b83e
removing tests added by above: https://treeherder.mozilla.org/#/jobs?repo=try&revision=17165a3811b5

So Part 4 causes the orange, and removing the test does not help. Just adding the SetRNGState testing function causes it :P
Oops, wrong bug number; it's bug 1195578.
Depends on: 1203814
The test is the following:
  https://dxr.mozilla.org/mozilla-central/source/js/src/jit-test/tests/gc/oomInFormatStackDump.js
  https://dxr.mozilla.org/mozilla-central/source/js/src/jit-test/lib/oomTest.js

Putting resetOOMFailure inside a |try-catch| helps, so the OOM happens there.
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=b156811e08b9

Putting resetOOMFailure inside a |with| makes it worse; it causes a bunch of OOMs.
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=22b446a4f72c

Also, here's a try run that only adds the testing function, with no other related patches before it. It also causes the orange, so the other parts are definitely not related to this orange :/
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb3d7a51f85e

Filed as bug 1203814.
Separated and64 and or64 out of Part 3 and moved them to the generic MacroAssembler.
I'll merge them back later.
Attachment #8659882 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8659882 [details] [diff] [review]
Part 3 followup: Add and64 and or64 to the generic macro assembler and *-inl.h (will be merged to Part 3 later).

Review of attachment 8659882 [details] [diff] [review]:
-----------------------------------------------------------------

Apply nits and r=me.

::: js/src/jit/MacroAssembler.h
@@ +679,5 @@
>  
>      inline void andPtr(Register src, Register dest) PER_ARCH;
>      inline void andPtr(Imm32 imm, Register dest) PER_ARCH;
>  
> +    inline void and64(Imm64 imm, Register64 dest) PER_SHARED_ARCH;

nit: Run "make check-masm" and fix this line.

@@ +688,5 @@
>  
>      inline void orPtr(Register src, Register dest) PER_ARCH;
>      inline void orPtr(Imm32 imm, Register dest) PER_ARCH;
>  
> +    inline void or64(Register64 src, Register64 dest) PER_SHARED_ARCH;

same here.
Attachment #8659882 - Flags: review?(nicolas.b.pierron) → review+
Flags: needinfo?(sstangl)
forgot to remove leave-open
Status: NEW → RESOLVED
Closed: 9 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
According to AWFY, this is a 30% win on dromaeo-sunspider-string-validate-input (https://arewefastyet.com/#machine=17&view=single&suite=dromaeo&subtest=sunspider-string-validate-input)
Depends on: 1208876
Depends on: 1209132