Closed Bug 1279248 Opened 4 years ago Closed 3 years ago

Implement 64bit integer operations on x86

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: h4writer, Assigned: h4writer)

References

Details

Attachments

(27 files, 9 obsolete files)

9.50 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
8.86 KB, patch
lth
: review+
Details | Diff | Splinter Review
11.54 KB, patch
lth
: review+
Details | Diff | Splinter Review
29.90 KB, patch
jandem
: review+
Details | Diff | Splinter Review
30.12 KB, patch
luke
: review+
Details | Diff | Splinter Review
10.58 KB, patch
luke
: review+
Details | Diff | Splinter Review
30.55 KB, patch
jandem
: review+
Details | Diff | Splinter Review
8.22 KB, patch
efaust
: review+
Details | Diff | Splinter Review
2.40 KB, patch
efaust
: review+
Details | Diff | Splinter Review
12.19 KB, patch
efaust
: review+
Details | Diff | Splinter Review
1.73 KB, patch
efaust
: review+
Details | Diff | Splinter Review
9.06 KB, patch
lth
: review+
Details | Diff | Splinter Review
7.26 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
1.94 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
48.87 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
11.13 KB, patch
sunfish
: review+
Details | Diff | Splinter Review
20.76 KB, patch
nbp
: review+
Details | Diff | Splinter Review
1.82 KB, patch
nbp
: review+
Details | Diff | Splinter Review
8.47 KB, patch
luke
: review+
Details | Diff | Splinter Review
38.62 KB, patch
lth
: review+
Details | Diff | Splinter Review
32.90 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
4.40 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
5.00 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
25.30 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
16.47 KB, patch
h4writer
: review+
Details | Diff | Splinter Review
32.66 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
3.63 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
Wasm is introducing int64 values. As a result we need to be able to handle int64 values. This is already working on x64, since most instruction can just be ported to their 64 assembly that exist on x64. For x86 this is a bit harder. We need to work with 2x32bit operands and need have the same result, but using 32bit operations.
Attached patch WIP (obsolete) — Splinter Review
This is a preliminary patch. It implements the 64 bit instructions on x86. I think I covered them all, together with a bunch of typos. Next steps is to get those typos removed and to split them in reviewable patches, but most of the hard work is done.
Assignee: nobody → hv1989
Attached patch WIP (obsolete) — Splinter Review
Passes tests already. Still some work to do:
- I Need to find right semantics for unsigned truncate to int64.
- Polishing (move more into shared macroassemblers/codegen).
- Make sure it passes try
- Split into multiple patches.
Attachment #8761623 - Attachment is obsolete: true
(In reply to Hannes Verschore [:h4writer] from comment #2)
> Passes tests already. Still some work to do:
...
> - Polishing (move more into shared macroassemblers/codegen).

The less that goes in codegen the better: I'm just going to have to move it out again to make use of it from the baseline compiler.

The first baseline patch moved a number of things from codegen to masm, and upcoming patches I have on bug 1278283 move even more.  Code that adds OOL code is hard to share, but otherwise it's useful if the masm layer can expose abstractions - even per-platform abstractions - that we can just call into.

(Sharing code that adds OOL code could be done by either using lambdas to add the OOL code or by abstracting the OOL management system, but I haven't investigated either of those yet.)
(In reply to Lars T Hansen [:lth] from comment #3)
> (In reply to Hannes Verschore [:h4writer] from comment #2)
> > Passes tests already. Still some work to do:
> ...
> > - Polishing (move more into shared macroassemblers/codegen).
> 
> The less that goes in codegen the better: I'm just going to have to move it
> out again to make use of it from the baseline compiler.

Oh I see. Though "move more into shared codegen" means having more codegen that don't need a specific per architecture visitXXX function, but a platform-independent visitXXX function. In most cases that means leveraging the functionality into the MacroAssemblers in order to have no platform-independent instructions in the visitXXX functions.
Therefore I think this is only helping you ;).
(In reply to Lars T Hansen [:lth] from comment #3)
> (In reply to Hannes Verschore [:h4writer] from comment #2)
> (Sharing code that adds OOL code could be done by either using lambdas to
> add the OOL code or by abstracting the OOL management system, but I haven't
> investigated either of those yet.)

I have no code that really improves the situation on OOL, sadly.
Attachment #8763674 - Attachment is obsolete: true
Attached patch Part 4: Implement LCompareI64 (obsolete) — Splinter Review
Attached patch Part 5: Implement LShiftI64 (obsolete) — Splinter Review
Comment on attachment 8766128 [details] [diff] [review]
Part 1: Make ionmonkey ready for int64 on 32bit systems

Review of attachment 8766128 [details] [diff] [review]:
-----------------------------------------------------------------

The initial int64 assumed an int64 can fit into a register in IonMonkey. This isn't the case on x86. This fixes that.
Attachment #8766128 - Flags: review?(jdemooij)
Comment on attachment 8766129 [details] [diff] [review]
Part 2: implement LAsmJSParameterI64

Review of attachment 8766129 [details] [diff] [review]:
-----------------------------------------------------------------

Defines LAsmJSParameterI64, which has INT64_PIECES as output defined
Attachment #8766129 - Flags: review?(jdemooij)
Comment on attachment 8766131 [details] [diff] [review]
Part 3: Implement LAsmJSReturnI64

Review of attachment 8766131 [details] [diff] [review]:
-----------------------------------------------------------------

Adds LAsmJSReturnI64 which takes the correct register/registers to put the values in.
Attachment #8766131 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8766132 [details] [diff] [review]
Part 4: Implement LCompareI64

Review of attachment 8766132 [details] [diff] [review]:
-----------------------------------------------------------------

- Changes LCompare64 to LCompareI64 which seems to correspond better to to all our other LXXXI64 instructions.
- Implements LCompareI64(AndBranch)
- Sadly due to "jumpToBlock" I couldn't uplift more into the MacroAssemblers.
Attachment #8766132 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8766133 [details] [diff] [review]
Part 5: Implement LShiftI64

Review of attachment 8766133 [details] [diff] [review]:
-----------------------------------------------------------------

Implements everything to make the shift code shared on all platforms. Now only x86 and x64, but should be easy now to have this on mips and arm too.
Attachment #8766133 - Flags: review?(bbouvier)
Comment on attachment 8766134 [details] [diff] [review]
Part 6: Implement LBitOpI64

Review of attachment 8766134 [details] [diff] [review]:
-----------------------------------------------------------------

Same but now for the bit operations
Attachment #8766134 - Flags: review?(bbouvier)
Comment on attachment 8766135 [details] [diff] [review]
Part 7: Implement LAddI64

Review of attachment 8766135 [details] [diff] [review]:
-----------------------------------------------------------------

Implementing LAddI64 as shared as possible by implementing add64 on x86 and x64. The intention is to also share this on arm/mips.
Attachment #8766135 - Flags: review?(lhansen)
Comment on attachment 8766136 [details] [diff] [review]
Part 8: Implement LSubI64

Review of attachment 8766136 [details] [diff] [review]:
-----------------------------------------------------------------

Same but for SubI
Attachment #8766136 - Flags: review?(lhansen)
Comment on attachment 8766137 [details] [diff] [review]
Part 9: Implement LMulI64

Review of attachment 8766137 [details] [diff] [review]:
-----------------------------------------------------------------

Implementing LMulI64. I was able to share the codegen, but the lowering is harder. X86 needs specific registers.
Attachment #8766137 - Flags: review?(jdemooij)
Comment on attachment 8766138 [details] [diff] [review]
Part 10: Implement LRotateI64

Review of attachment 8766138 [details] [diff] [review]:
-----------------------------------------------------------------

Added rotation functions in the MacroAssembler and made the codegen shared in IonMonkey (only x86, x64, hopefully arm, mips later too).
Attachment #8766138 - Flags: review?(luke)
Comment on attachment 8766139 [details] [diff] [review]
Part 11: Implement LAsmJSPassStackArgI64

Review of attachment 8766139 [details] [diff] [review]:
-----------------------------------------------------------------

Adds LAsmJSPassStackArgI64, since LAsmJSPassStackArg has only 1 output, while we need 2 on x86. The INT64_PIECES make sure we compile the right amount.
Attachment #8766139 - Flags: review?(luke)
Comment on attachment 8766140 [details] [diff] [review]
Part 12: Implement L(U)DivOrModI64

Review of attachment 8766140 [details] [diff] [review]:
-----------------------------------------------------------------

As a first implementation delegate this to a call. Div/Mod is generally assumed to be slow. If we need more performance, we can improve later.
Attachment #8766140 - Flags: review?(jdemooij)
Comment on attachment 8766141 [details] [diff] [review]
Part 13: Implement LAsmSelectI64

Review of attachment 8766141 [details] [diff] [review]:
-----------------------------------------------------------------

Adds LAsmSelectI64, which has an INT64_PIECES operand. Allowing us to have two regs inputs on x86 and one reg on x64 to contain the Int64.
Attachment #8766141 - Flags: review?(efaustbmo)
Comment on attachment 8766142 [details] [diff] [review]
Part 14: Implement LAsmReinterpret(From|To)I64

Review of attachment 8766142 [details] [diff] [review]:
-----------------------------------------------------------------

Implements the LIR LAsmReinterpret(From|To)I64 that already exists.
Attachment #8766142 - Flags: review?(efaustbmo)
Comment on attachment 8766144 [details] [diff] [review]
Part 15: Implement LExtendInt32ToInt64

Review of attachment 8766144 [details] [diff] [review]:
-----------------------------------------------------------------

Had to split this, since x86 uses cdq() to sign extend. Which uses specific registers.
(Shifting is also possible, but way slower).
Attachment #8766144 - Flags: review?(efaustbmo)
Comment on attachment 8766145 [details] [diff] [review]
Part 16: Implement LWrapInt64ToInt32

Review of attachment 8766145 [details] [diff] [review]:
-----------------------------------------------------------------

Implement the already existing LWrapInt64ToInt32
Attachment #8766145 - Flags: review?(efaustbmo)
Comment on attachment 8766146 [details] [diff] [review]
Part 17: Implement LPopcntI64

Review of attachment 8766146 [details] [diff] [review]:
-----------------------------------------------------------------

Again try to make codegen of LPopcntI64 platform independent
Attachment #8766146 - Flags: review?(lhansen)
Comment on attachment 8766147 [details] [diff] [review]
Part 18: Implement LClzI64 and LCtzI64

Review of attachment 8766147 [details] [diff] [review]:
-----------------------------------------------------------------

Sadly couldn't share the codegen, since on x86 we need to zero the upper bits which are in another register.
Attachment #8766147 - Flags: review?(bbouvier)
Comment on attachment 8766148 [details] [diff] [review]
Part 19: Implement LNotI64

Review of attachment 8766148 [details] [diff] [review]:
-----------------------------------------------------------------

Implement LNotI64 on x86
Attachment #8766148 - Flags: review?(bbouvier)
Comment on attachment 8766149 [details] [diff] [review]
Part 20: Implement LWasmTruncateToInt64

Review of attachment 8766149 [details] [diff] [review]:
-----------------------------------------------------------------

- Tried to leverage a good chunk into MacroAssembler hoping that will help when doing the arm part.
- Also renamed branchTruncateDouble to branchTruncateDoubleMaybeMod32, since that is what it actually does!
- Luke told me that a call to c++ would be good enough for a first iteration, but found a way that this wasn't needed. Not sure if it is the most optimal, though.
Attachment #8766149 - Flags: review?(sunfish)
Comment on attachment 8766150 [details] [diff] [review]
Part 21: Implement LInt64ToFloatingPoint

Review of attachment 8766150 [details] [diff] [review]:
-----------------------------------------------------------------

Now the other way around. Was easier ;)
Attachment #8766150 - Flags: review?(sunfish)
Comment on attachment 8766151 [details] [diff] [review]
Part 22: Implement LAsmJSCallI64

Review of attachment 8766151 [details] [diff] [review]:
-----------------------------------------------------------------

Add LAsmJSCallI64 for when we want to get int64 back from a call.
Comment on attachment 8766152 [details] [diff] [review]
Part 23: Implement LTestI64AndBranch

Review of attachment 8766152 [details] [diff] [review]:
-----------------------------------------------------------------

Last but not least.
Attachment #8766152 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8766153 [details] [diff] [review]
Part 24: Make Wasm ready for int64 on 32bit systems

Review of attachment 8766153 [details] [diff] [review]:
-----------------------------------------------------------------

Changes needed for wasm. To get it working with the 64bit operations on x86
Attachment #8766153 - Flags: review?(luke)
Comment on attachment 8766154 [details] [diff] [review]
Part 25: Some tests on things I got wrong the first time

Review of attachment 8766154 [details] [diff] [review]:
-----------------------------------------------------------------

Just some tests
Attachment #8766154 - Flags: review?(lhansen)
Attachment #8766151 - Flags: review?(nicolas.b.pierron)
Attachment #8766135 - Flags: review?(lhansen) → review+
Comment on attachment 8766136 [details] [diff] [review]
Part 8: Implement LSubI64

Review of attachment 8766136 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/BaseAssembler-x86.h
@@ +47,5 @@
>      }
>  
> +    void sbbl_ir(int32_t imm, RegisterID dst)
> +    {
> +        spew("adcl       $%d, %s", imm, GPReg32Name(dst));

sbbl

@@ +59,5 @@
> +    }
> +
> +    void sbbl_rr(RegisterID src, RegisterID dst)
> +    {
> +        spew("adcl       %s, %s", GPReg32Name(src), GPReg32Name(dst));

sbbl
Attachment #8766136 - Flags: review?(lhansen) → review+
Comment on attachment 8766146 [details] [diff] [review]
Part 17: Implement LPopcntI64

Review of attachment 8766146 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MacroAssembler.h
@@ +860,5 @@
>      inline void ctz64(Register64 src, Register64 dest) DEFINED_ON(x64);
>  
>      // temp may be invalid only if the chip has the POPCNT instruction.
>      inline void popcnt32(Register src, Register dest, Register temp) DEFINED_ON(x86_shared);
> +    inline void popcnt64(Register64 src, Register64 dest, Register temp) DEFINED_ON(x86, x64);

My gut feeling is that it would be better for this API (and clz64 and ctz64) to have a 32-bit destination and to take an additional 32-bit temp on x86.  It's hard to back that up with any hard data, though, at this point, so I'm probably OK with keeping it as is.

::: js/src/jit/x86/MacroAssembler-x86-inl.h
@@ +552,5 @@
> +    // The tmp register is only needed if there is no native POPCNT.
> +
> +    MOZ_ASSERT(dest.low != tmp);
> +    MOZ_ASSERT(dest.high != tmp);
> +    MOZ_ASSERT(dest.low != dest.high);

I don't think the following code will work if tmp aliases src.high either, do you want to assert that?

IMO we should probably just require tmp not to alias either src or dest and for src and dest not to alias each other.  What is the justification for allowing the latter aliasing below?

@@ +563,5 @@
> +        popcnt32(src.high, dest.low, tmp);
> +    }
> +    addl(dest.high, dest.low);
> +    xorl(dest.high, dest.high);
> +    return;

Redundant return.
Attachment #8766146 - Flags: review?(lhansen) → review+
Attachment #8766131 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8766132 [details] [diff] [review]
Part 4: Implement LCompareI64

Review of attachment 8766132 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/Assembler-x86-shared.cpp
@@ +224,5 @@
> +    }
> +}
> +
> +AssemblerX86Shared::Condition
> +AssemblerX86Shared::ConditionWithoutEqual(Condition cond)

nit: what do you think of  StrictCondition / StrictInequalityCondition?

@@ +239,5 @@
> +        return GreaterThan;
> +      case Above:
> +      case AboveOrEqual:
> +        return Above;
> +        return AboveOrEqual;

nit: copy and pasta? let's remove the second return ;)

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +1192,5 @@
> +        Register64 rhsRegs = ToRegister64(rhs);
> +        masm.branch64(condition, lhsRegs, rhsRegs, &done);
> +    }
> +
> +    masm.bind(&falseBranch);

nit: falseBranch label is bound but never used, we can remove it.

@@ +1199,5 @@
> +    masm.bind(&done);
> +}
> +
> +void
> +CodeGeneratorX86::visitCompareI64AndBranch(LCompareI64AndBranch* lir)

comment: It seems to me that the problem comes from a lack of expressivity of the macro assembler labels.  I guess we could make this part of the MacroAssembler, if we could annotate the labels with Normal / FallThrough / InterruptBackedge.
Attachment #8766132 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8766154 [details] [diff] [review]
Part 25: Some tests on things I got wrong the first time

Review of attachment 8766154 [details] [diff] [review]:
-----------------------------------------------------------------

Generally seems OK but there are enough open issues here to perhaps have a second round.

::: js/src/jsapi-tests/testJitMacroAssembler.cpp
@@ +142,5 @@
> +    TEST(0, 0);
> +    TEST(0x0.0000000000001p-1022, 0);
> +    TEST(1, 1);
> +    TEST(9223372036854774784.0, 9223372036854774784);
> +    TEST((uint64_t)0x8000000000000000, 0x8000000000000000);

Don't these constants need to be ULL for the casts to be effective on a 32-bit-int platform?

@@ +288,5 @@
> +    allRegs.take(shift);
> +#elif defined(JS_CODEGEN_X64)
> +    Register shift = rcx;
> +    allRegs.take(shift);
> +#elif defined(JS_CODEGEN_X64)

This case is never reached.

@@ +315,5 @@
> +    {                                                                                       \
> +        Label next;                                                                         \
> +        masm.move64(Imm64(INPUT), input);                                                   \
> +        masm.move32(Imm32(SHIFT), shift);                                                   \
> +        masm.lshift64(shift, input);                                                        \

Presumably this should use an immediate shift, not one in a register, since you're already testing the in-register case above?  Ditto for the two cases below (right shift / right shift arithmetic).

@@ +331,5 @@
> +    TEST(0, -1, 0xffffffffffffffff);
> +    TEST(1, -1, 0xfffffffffffffffe);
> +    TEST(2, -1, 0xfffffffffffffffc);
> +    TEST(32, -1, 0xffffffff00000000);
> +    TEST(-1, 1, 0x8000000000000000);

Hm...  This makes sense when the shift count is in a register of course, I'd be much happier if the assembler would assert for !(0 <= shift <= 63), for immediate shifts (which are not getting tested, see previous comments).  That's what happens for 32-bit shifts.

Still, that said, I'd be happier to see 0xffffffff here than -1.
Attachment #8766154 - Flags: review?(lhansen)
Comment on attachment 8766151 [details] [diff] [review]
Part 22: Implement LAsmJSCallI64

Review of attachment 8766151 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Lowering.cpp
@@ +4187,4 @@
>      if (ins->type() == MIRType::None)
>          add(lir, ins);
>      else
>          defineReturn(lir, ins);

Which patch modifies defineReturn to account for Int64 / UInt64 MIRTypes to map them to ReturnReg64?

::: js/src/jit/shared/LIR-shared.h
@@ +8167,5 @@
> +};
> +
> +class LAsmJSCallI64 : public LAsmJSCallBase
> +{
> +    LDefinition defs_[INT64_PIECES];

This patch sounds fine, even if it is a bit verbose.

If we were to store numDefs_ like we do for numOperands_, we could always allocate a vector of 2 LDefinitions for all LAsmJSCall, which would simplify this patch, but this can be done as a follow-up if needs be.
Attachment #8766151 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8766152 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8766149 [details] [diff] [review]
Part 20: Implement LWasmTruncateToInt64

Review of attachment 8766149 [details] [diff] [review]:
-----------------------------------------------------------------

x64 can do a cvttsd2si with a 64-bit integer result, so it doesn't need the fistp/fisttp code, however that's an optimization that we can do later.

::: js/src/asmjs/WasmBaselineCompile.cpp
@@ +2746,5 @@
>              addOutOfLineCode(new (alloc_) OutOfLineTruncateF32OrF64ToI32(AnyReg(src),
>                                                                           dest));
>          if (!ool)
>              return false;
> +        // FIXME: This shouldn't do mod32??

No question marks needed; it shouldn't do mod32 :-).

::: js/src/jit/MacroAssembler.h
@@ +953,3 @@
>          DEFINED_ON(arm, arm64, mips_shared, x86, x64);
> +    inline void branchTruncateDoubleMaybeMod32(FloatRegister src, Register dest, Label* fail)
> +        DEFINED_ON(arm, arm64, mips_shared, x86, x64);

I'm not sure which patch introduces the "MaybeMod32" name, but is this actually meant to be modulo 2**32, rather than modulo 32?

Also, the comment above is being deleted here, but so there no longer appears to be a comment describing what "branchTruncateDoubleMaybeMod32" does now. Does it guarantee to perform the branch if overflow occurs? If so, then it seems like it isn't ever going to be doing a modulo.

::: js/src/jit/x64/MacroAssembler-x64-inl.h
@@ +716,5 @@
>      uint64_t magic = MagicValue(why).asRawBits();
>      cmpPtr(valaddr, ImmWord(magic));
>      j(cond, label);
>  }
> +// ========================================================================

This "====" comment line isn't present in the x86 version of this code. Since the surrounding code here is otherwise quite similar between x86 and x64, it'd be good to keep extraneous details like this in sync too.

::: js/src/jit/x86-shared/MacroAssembler-x86-shared-inl.h
@@ +1090,5 @@
> +    fldcw(Operand(esp, 1*sizeof(int32_t)));
> +
> +    // Load double on fp stack, convert and load regular stack.
> +    fld(Operand(src));
> +    fistp(Operand(dest));

Could this load or store trap, and could this leave the control word in a non-default state?

::: js/src/jit/x86/LOpcodes-x86.h
@@ +17,5 @@
>      _(UDivOrMod)                \
>      _(UDivOrModConstant)        \
>      _(UDivOrModI64)             \
> +    _(DivOrModI64)              \
> +    _(WasmTruncateToInt64)

For consistency with the previous operators, should this be WasmTruncateToI64?
Attachment #8766149 - Flags: review?(sunfish) → review+
Attachment #8766150 - Flags: review?(sunfish) → review+
Comment on attachment 8766128 [details] [diff] [review]
Part 1: Make ionmonkey ready for int64 on 32bit systems

Review of attachment 8766128 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this.

::: js/src/jit/LIR.h
@@ +340,5 @@
>  #endif
>  };
>  
>  using LInt64Allocation = LInt64Value<LAllocation>;
> +using LInt64AllocationPtr = LInt64Value<LAllocation*>;

Is this because we sometimes need to modify the allocations? I'd prefer to use LInt64Allocation where we can: LAllocation fits in a pointer so there's no perf difference, and it makes a copy so it's clear we don't modify the original value. If we use |const LInt64Allocation&| as argument it's even more clear/efficient.

@@ +467,5 @@
>      // This should be kept in sync with LIR.cpp's TypeChars.
>      enum Type {
>          GENERAL,      // Generic, integer or pointer-width data (GPR).
>          INT32,        // int32 data (GPR).
> +        INT64,        // int64 data (GPR).

I was hoping we could use 1 (64-bit) or 2 (32-bit) GENERAL defs, and we shouldn't call LDefinition::TypeFrom on 32-bit with MIRType::Int64.

Is that not possible? On 32-bit it's a bit confusing to see LDefinition::INT64 because there are two defs.

::: js/src/jit/x64/Lowering-x64.cpp
@@ +119,5 @@
>  LIRGeneratorX64::lowerUntypedPhiInput(MPhi* phi, uint32_t inputPosition, LBlock* block, size_t lirIndex)
>  {
>      lowerTypedPhiInput(phi, inputPosition, block, lirIndex);
>  }
> + 

Nit: trailing whitespace

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.h
@@ +8,5 @@
>  #define jit_x86_shared_CodeGenerator_x86_shared_h
>  
>  #include "jit/shared/CodeGenerator-shared.h"
>  
> +#include "jit/shared/CodeGenerator-shared-inl.h"

The style checker will complain about #including an 'inline' file in a non-inline header file. Maybe just move the definitions below into the .cpp file for now, like ToOperand?

@@ +122,5 @@
>      Operand ToOperand(const LAllocation& a);
>      Operand ToOperand(const LAllocation* a);
>      Operand ToOperand(const LDefinition* def);
>  
> +#if JS_BITS_PER_WORD == 32

Maybe use CodeGenerator-x86.h/CodeGenerator-x64.h?
Attachment #8766128 - Flags: review?(jdemooij) → review+
Attachment #8766129 - Flags: review?(jdemooij) → review+
Comment on attachment 8766137 [details] [diff] [review]
Part 9: Implement LMulI64

Review of attachment 8766137 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86/Lowering-x86.cpp
@@ +211,5 @@
> +void
> +LIRGeneratorX86::lowerForALUInt64(LInstructionHelper<INT64_PIECES, 2 * INT64_PIECES, 1>* ins,
> +                                  MDefinition* mir, MDefinition* lhs, MDefinition* rhs)
> +{
> +    MOZ_ASSERT(mir->isMul());

Most of this is mul specific, so I think we should call this lowerMul64 (like lowerDivI64), and change the signature to take MMul*, LMul*.
Attachment #8766137 - Flags: review?(jdemooij) → review+
Comment on attachment 8766140 [details] [diff] [review]
Part 12: Implement L(U)DivOrModI64

Review of attachment 8766140 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice.

::: js/src/asmjs/WasmTypes.cpp
@@ +164,5 @@
> +    int64_t x = ((uint64_t)x_hi << 32) + x_lo;
> +    int64_t y = ((uint64_t)y_hi << 32) + y_lo;
> +    MOZ_ASSERT(x != INT64_MIN || y != -1);
> +    MOZ_ASSERT(y != 0);
> +    return x/y;

Nit: add a space before and after the / (and below)

::: js/src/jit/x86/Lowering-x86.cpp
@@ +513,5 @@
> +    defineReturn(lir, mod);
> +}
> +
> +void
> +LIRGeneratorX86::lowerUDivI64(MDiv* div) 

Nit: trailing whitespace here and below.
Attachment #8766140 - Flags: review?(jdemooij) → review+
Comment on attachment 8766138 [details] [diff] [review]
Part 10: Implement LRotateI64

Review of attachment 8766138 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great
Attachment #8766138 - Flags: review?(luke) → review+
Comment on attachment 8766139 [details] [diff] [review]
Part 11: Implement LAsmJSPassStackArgI64

Review of attachment 8766139 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp
@@ +310,1 @@
>      }

No {} around then/else
Attachment #8766139 - Flags: review?(luke) → review+
Comment on attachment 8766153 [details] [diff] [review]
Part 24: Make Wasm ready for int64 on 32bit systems

Review of attachment 8766153 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/Wasm.cpp
@@ +66,2 @@
>  {
> +#if defined(JS_CPU_X64) or defined(JS_CODEGEN_X86)

\o/

::: js/src/asmjs/WasmStubs.cpp
@@ +217,5 @@
> +              case MIRType::Int64: {
> +#if JS_BITS_PER_WORD == 32
> +                masm.load32(Address(src.base, src.offset + INT64LOW_OFFSET), scratch);
> +                masm.store32(scratch,
> +                    Address(masm.getStackPointer(), iter->offsetFromArgBase() + INT64LOW_OFFSET));

ubernit: could you have a "Register sp = masm.getStackPointer();" so that you can use "sp" in these stores which should then fit on one line (which looks a bit nicer)?
Attachment #8766153 - Flags: review?(luke) → review+
(In reply to Lars T Hansen [:lth] from comment #59)
> ::: js/src/jsapi-tests/testJitMacroAssembler.cpp
> @@ +142,5 @@
> > +    TEST(0, 0);
> > +    TEST(0x0.0000000000001p-1022, 0);
> > +    TEST(1, 1);
> > +    TEST(9223372036854774784.0, 9223372036854774784);
> > +    TEST((uint64_t)0x8000000000000000, 0x8000000000000000);
> 
> Don't these constants need to be ULL for the casts to be effective on a
> 32-bit-int platform?

Imm64 takes a uint64. I assume that is the reason why this just works.

> @@ +315,5 @@
> > +    {                                                                                       \
> > +        Label next;                                                                         \
> > +        masm.move64(Imm64(INPUT), input);                                                   \
> > +        masm.move32(Imm32(SHIFT), shift);                                                   \
> > +        masm.lshift64(shift, input);                                                        \
> 
> Presumably this should use an immediate shift, not one in a register, since
> you're already testing the in-register case above?  Ditto for the two cases
> below (right shift / right shift arithmetic).

Correct.

> @@ +331,5 @@
> > +    TEST(0, -1, 0xffffffffffffffff);
> > +    TEST(1, -1, 0xfffffffffffffffe);
> > +    TEST(2, -1, 0xfffffffffffffffc);
> > +    TEST(32, -1, 0xffffffff00000000);
> > +    TEST(-1, 1, 0x8000000000000000);
> 
> Hm...  This makes sense when the shift count is in a register of course, I'd
> be much happier if the assembler would assert for !(0 <= shift <= 63), for
> immediate shifts (which are not getting tested, see previous comments). 
> That's what happens for 32-bit shifts.
> 
> Still, that said, I'd be happier to see 0xffffffff here than -1.

Fixed. I added asserts to only take 0 - 63.

I also made branch64 with Address work with Equal condition. I'll do the same for arm in that patches queue. For now only x86 and x64. I use it in one of my tests.
Attachment #8766154 - Attachment is obsolete: true
Attachment #8767141 - Flags: review?(lhansen)
Comment on attachment 8767141 [details] [diff] [review]
Part 25: Some tests on things I got wrong the first time

Review of attachment 8767141 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.

::: js/src/jit/MacroAssembler.h
@@ +900,5 @@
>          DEFINED_ON(x86, x64);
>      inline void branch64(Condition cond, Register64 lhs, Register64 rhs, Label* label)
>          DEFINED_ON(x86, x64);
> +    // On x86 and x64 NotEqual and Equal conditions are allowed for the brancht64 variants
> +    // with Address as lhs. On others only the NotEqual condition.

Typo, "brancht64".

::: js/src/jit/arm/MacroAssembler-arm-inl.h
@@ +387,5 @@
>  
>  void
>  MacroAssembler::lshiftPtr(Imm32 imm, Register dest)
>  {
> +    MOZ_ASSERT(0 <= imm.value && imm.value < 32);

Technically this is redundant because ma_lsl asserts it too (at least I tripped over that), but you don't have to remove it on my account, I'm just mentioning it.

::: js/src/jsapi-tests/testJitMacroAssembler.cpp
@@ +184,5 @@
> +        Label next;                                                                         \
> +        masm.loadConstantDouble(double(INPUT), input);                                      \
> +        masm.storeDouble(input, Operand(esp, 0));                                           \
> +        if (!!OUTPUT) {                                                                     \
> +            masm.branchDoubleNotInInt64Range(Address(esp, 0), temp, &next);                 \

I don't thknk the "!!" is needed here, after all OUTPUT is always a boolean.  Also below.
Attachment #8767141 - Flags: review?(lhansen) → review+
Comment on attachment 8766133 [details] [diff] [review]
Part 5: Implement LShiftI64

Review of attachment 8766133 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch. Double-check me, but there might be a few bugs; plus the lshiftArithmetic don't seem very useful. I'd like to see another version of this patch.

::: js/src/jit/MacroAssembler.h
@@ +794,3 @@
>      inline void lshiftPtr(Imm32 imm, Register dest) PER_ARCH;
> +    inline void lshift64(Imm32 imm, Register64 dest) PER_ARCH;
> +    inline void lshift64(Register shift, Register64 srcDest) DEFINED_ON(x86,x64);

nit: spaces before and after comma in the DEFINED_ON macro

@@ +797,4 @@
>  
> +    inline void lshift32Arithmetic(Register shift, Register srcDest) DEFINED_ON(x86_shared);
> +    inline void lshift64Arithmetic(Imm32 imm, Register64 srcDest);
> +    inline void lshift64Arithmetic(Register shift, Register64 srcDest) DEFINED_ON(x86_shared);

Since these are just renamings for lshift32/lshift64, can you just remove those and just use lshift32/lshift64 instead?

@@ +803,4 @@
>      inline void rshiftPtr(Imm32 imm, Register dest) PER_ARCH;
>      inline void rshiftPtr(Imm32 imm, Register src, Register dest) DEFINED_ON(arm64);
> +    inline void rshift64(Imm32 imm, Register64 dest) PER_ARCH;
> +    inline void rshift64(Register shift, Register64 srcDest) DEFINED_ON(x86,x64);

nit: spaces, again

::: js/src/jit/shared/LIR-shared.h
@@ +3274,5 @@
>        : op_(op)
>      { }
>  
> +    static const size_t Lhs = 0;
> +    static const size_t Rhs = INT64_PIECES;

Why not simply two accessors that return the right values?

::: js/src/jit/x64/MacroAssembler-x64-inl.h
@@ +258,5 @@
> +void
> +MacroAssembler::lshift64(Register shift, Register64 srcDest)
> +{
> +   MOZ_ASSERT(shift == rcx);
> +   shlq_cl(srcDest.reg);

nit: indentation is wrong (3 spaces)

@@ +277,5 @@
> +void
> +MacroAssembler::rshift64(Register shift, Register64 srcDest)
> +{
> +   MOZ_ASSERT(shift == rcx);
> +   shrq_cl(srcDest.reg);

nit: indentation!

@@ +292,2 @@
>  {
> +   rshiftPtrArithmetic(imm, dest.reg);

nit: indentation!

@@ +295,5 @@
> +
> +void
> +MacroAssembler::rshift64Arithmetic(Register shift, Register64 srcDest)
> +{
> +   MOZ_ASSERT(shift == rcx);

nit: indentation!

::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
@@ +1465,5 @@
>      }
>  
> +    void shrdl_CLr(RegisterID src, RegisterID dst)
> +    {
> +        spew("shrdl      %%cl, %s, %s", GPReg32Name(dst), GPReg32Name(src));

We generally put the dest at the right-most position. Can you fix this here and below?

::: js/src/jit/x86/MacroAssembler-x86-inl.h
@@ +281,5 @@
> +
> +    shldl_cl(srcDest.low, srcDest.high);
> +    shll_cl(srcDest.low);
> +
> +    testl(Imm32(0x20), ecx);

I think you need to test against 0x30 to have ecx ready for the (missing) left-shift of the high bits.

@@ +286,5 @@
> +    j(Condition::Equal, &done);
> +
> +    // 32 - 63 bit shift
> +    movl(srcDest.low, srcDest.high);
> +    xorl(srcDest.low, srcDest.low);

I think an operation is missing before after the movl for the high counterparts: we still need to shift the high bits. Can you add a test case for this, please?

@@ +326,5 @@
> +    j(Condition::Equal, &done);
> +
> +    // 32 - 63 bit shift
> +    movl(srcDest.high, srcDest.low);
> +    xorl(srcDest.high, srcDest.high);

Same remarks as in lshift64?

@@ +366,5 @@
> +    j(Condition::Equal, &done);
> +
> +    // 32 - 63 bit shift
> +    movl(srcDest.high, srcDest.low);
> +    sarl(Imm32(0x1f), srcDest.high);

Same remarks as in lshift64.
Attachment #8766133 - Flags: review?(bbouvier)
Comment on attachment 8766135 [details] [diff] [review]
Part 7: Implement LAddI64

Review of attachment 8766135 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MacroAssembler.h
@@ +750,5 @@
>  
>      inline void add64(Register64 src, Register64 dest) PER_ARCH;
>      inline void add64(Imm32 imm, Register64 dest) PER_ARCH;
> +    inline void add64(Imm64 imm, Register64 dest) DEFINED_ON(x86, x64);
> +    inline void add64(const Operand& src, Register64 dest) DEFINED_ON(x64);

Is it expected that this variant is not implemented on x86?
Comment on attachment 8766134 [details] [diff] [review]
Part 6: Implement LBitOpI64

Review of attachment 8766134 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks good. Why don't we need to implement the Operand variants?

::: js/src/jit/MacroAssembler.h
@@ +727,5 @@
>      inline void xorPtr(Imm32 imm, Register dest) PER_ARCH;
>  
> +    inline void and64(const Operand& src, Register64 dest) DEFINED_ON(x64);
> +    inline void or64(const Operand& src, Register64 dest) DEFINED_ON(x64);
> +    inline void xor64(const Operand& src, Register64 dest) DEFINED_ON(x64);

Why are these only on x64? Will regalloc make sure we never use an operand/register for src on x86?

::: js/src/jit/x86/MacroAssembler-x86-inl.h
@@ +47,5 @@
>  
>  void
>  MacroAssembler::and64(Imm64 imm, Register64 dest)
>  {
> +    if (imm.low().value != (int32_t)0xFFFFFFFF)

Here and below, can you use C++ casts: int32_t(0xffffffff)

@@ +65,5 @@
>  
>  void
>  MacroAssembler::xor64(Imm64 imm, Register64 dest)
>  {
> +    xorl(imm.low(), dest.low);

The neutral element of xor is also 0, so you might want to check for that value before doing the xor.
Attachment #8766134 - Flags: review?(bbouvier) → review+
Comment on attachment 8766147 [details] [diff] [review]
Part 18: Implement LClzI64 and LCtzI64

Review of attachment 8766147 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks.

::: js/src/jit/MacroAssembler.h
@@ +856,5 @@
>      inline void clz32(Register src, Register dest, bool knownNotZero) DEFINED_ON(x86_shared);
>      inline void ctz32(Register src, Register dest, bool knownNotZero) DEFINED_ON(x86_shared);
>  
> +    inline void clz64(Register64 src, Register dest) DEFINED_ON(x86, x64);
> +    inline void ctz64(Register64 src, Register dest) DEFINED_ON(x86, x64);

If the `dest` register was an Register64 too, you could have the zeroing of high bits to be done in the masm function directly, providing the opportunity to share the codegen code between x64 and x86. I think it makes sense since it's a 64 bits operation, one can expect the input and output to be 64 bits registers (even if we know the high bits to be 0).

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +1124,5 @@
>  
>  void
>  CodeGeneratorX64::visitClzI64(LClzI64* lir)
>  {
> +    Register64 input = ToRegister64(lir->getInt64Operand(0));

(follow up) how about making an input64() next to the input() helper in LInstruction?
Attachment #8766147 - Flags: review?(bbouvier) → review+
Comment on attachment 8766148 [details] [diff] [review]
Part 19: Implement LNotI64

Review of attachment 8766148 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks!
Attachment #8766148 - Flags: review?(bbouvier) → review+
Attached patch Part 5: Implement LShiftI64 (obsolete) — Splinter Review
(In reply to Benjamin Bouvier [:bbouvier] from comment #70)
> Comment on attachment 8766133 [details] [diff] [review]
> Part 5: Implement LShiftI64
> 
> Review of attachment 8766133 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for the patch. Double-check me, but there might be a few bugs; plus
> the lshiftArithmetic don't seem very useful. I'd like to see another version
> of this patch.

I did it for symmetry. Not really needed indeed. Removed.

> ::: js/src/jit/shared/LIR-shared.h
> @@ +3274,5 @@
> >        : op_(op)
> >      { }
> >  
> > +    static const size_t Lhs = 0;
> > +    static const size_t Rhs = INT64_PIECES;
> 
> Why not simply two accessors that return the right values?

The default way is to have these constants. Since sometimes we need to access the indexes during lowering. Which having only accessors don't help with. Having only these constants do solve this.
Because of this I didn't add accessors. Some LIRs do have this, but most not. I could add them if you insist.

> ::: js/src/jit/x86-shared/BaseAssembler-x86-shared.h
> @@ +1465,5 @@
> >      }
> >  
> > +    void shrdl_CLr(RegisterID src, RegisterID dst)
> > +    {
> > +        spew("shrdl      %%cl, %s, %s", GPReg32Name(dst), GPReg32Name(src));
> 
> We generally put the dest at the right-most position. Can you fix this here
> and below?

Check

> ::: js/src/jit/x86/MacroAssembler-x86-inl.h
> @@ +281,5 @@
> > +
> > +    shldl_cl(srcDest.low, srcDest.high);
> > +    shll_cl(srcDest.low);
> > +
> > +    testl(Imm32(0x20), ecx);
> 
> I think you need to test against 0x30 to have ecx ready for the (missing)
> left-shift of the high bits.

The code is actually correct. Shifting is decomposable.
Shifting with a shift in the 0x3f range is simulated by:
- first shift value with 'shift & 0x1f'
- secondly shift value with 'shift & 0x20'

1) Shift input with "shift & 0x1f"

- shldl_cl(srcDest.low, srcDest.high);
- shll_cl(srcDest.low);

2) If the shift was in the 0x1f range, we are done.

- testl(Imm32(0x20), ecx);
- j(Condition::Equal, &done);

3) Shift input with "shift & 0x20" (which is shifting by exactly 32 bits)

(32bits shift is moving low register to high register and zero filling the low register)
-  movl(srcDest.low, srcDest.high);  
-  xorl(srcDest.low, srcDest.low);


> @@ +286,5 @@
> > +    j(Condition::Equal, &done);
> > +
> > +    // 32 - 63 bit shift
> > +    movl(srcDest.low, srcDest.high);
> > +    xorl(srcDest.low, srcDest.low);
> 
> I think an operation is missing before after the movl for the high
> counterparts: we still need to shift the high bits. Can you add a test case
> for this, please?

Same trick as explained above.

> @@ +326,5 @@
> > +    j(Condition::Equal, &done);
> > +
> > +    // 32 - 63 bit shift
> > +    movl(srcDest.high, srcDest.low);
> > +    xorl(srcDest.high, srcDest.high);
> 
> Same remarks as in lshift64?

Same trick as explained above.

> @@ +366,5 @@
> > +    j(Condition::Equal, &done);
> > +
> > +    // 32 - 63 bit shift
> > +    movl(srcDest.high, srcDest.low);
> > +    sarl(Imm32(0x1f), srcDest.high);
> 
> Same remarks as in lshift64.

Same trick as explained above.

In part 25 I have macroassembler tests for this. Since it was tricky to get them right the first time.
Attachment #8766133 - Attachment is obsolete: true
Attachment #8767681 - Flags: review?(bbouvier)
Comment on attachment 8767681 [details] [diff] [review]
Part 5: Implement LShiftI64

Review of attachment 8767681 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the explanation! Indeed I missed the fact that the shift in the low bits was already done just before, which made the whole shift work. Perhaps could be useful as comments in the masm x86 code? Thanks for the updated patch.

::: js/src/jit/x86/MacroAssembler-x86-inl.h
@@ +276,5 @@
> +MacroAssembler::lshift64(Register shift, Register64 srcDest)
> +{
> +    MOZ_ASSERT(shift == ecx);
> +
> +    Label done, shiftAbove32;

In this function and the two others below, shiftAbove32 is unused.

@@ +279,5 @@
> +
> +    Label done, shiftAbove32;
> +
> +    shldl_cl(srcDest.low, srcDest.high);
> +    shll_cl(srcDest.low);

I haven't seen any changes to lowering, so this remark might be stall if this is in another patch: but if srcDest.high == ecx, then this code might be wrong. The use of ecx is marked as useAtStart here, so I guess this could be the case that srcDest.high == ecx. Can you either MOZ_ASSERT this isn't the case, or change the code here? (applies to the two other functions below too)
Attachment #8767681 - Flags: review?(bbouvier) → review+
(In reply to Jan de Mooij [:jandem] from comment #62)
> ::: js/src/jit/LIR.h
> @@ +340,5 @@
> >  #endif
> >  };
> >  
> >  using LInt64Allocation = LInt64Value<LAllocation>;
> > +using LInt64AllocationPtr = LInt64Value<LAllocation*>;
> 
> Is this because we sometimes need to modify the allocations? I'd prefer to
> use LInt64Allocation where we can: LAllocation fits in a pointer so there's
> no perf difference, and it makes a copy so it's clear we don't modify the
> original value. If we use |const LInt64Allocation&| as argument it's even
> more clear/efficient.

We cannot make it a pointer since we create and return it in getInt64Operand. Therefore I now changed it in "const LInt64Allocation". Like you said it is equally slim as the LInt64AllocationPtr variant.

> 
> @@ +467,5 @@
> >      // This should be kept in sync with LIR.cpp's TypeChars.
> >      enum Type {
> >          GENERAL,      // Generic, integer or pointer-width data (GPR).
> >          INT32,        // int32 data (GPR).
> > +        INT64,        // int64 data (GPR).
> 
> I was hoping we could use 1 (64-bit) or 2 (32-bit) GENERAL defs, and we
> shouldn't call LDefinition::TypeFrom on 32-bit with MIRType::Int64.
> 
> Is that not possible? On 32-bit it's a bit confusing to see
> LDefinition::INT64 because there are two defs.

I copied the MIRType::Value to LDefinition::BOX approach here. Where on 64-bit we would generate a INT64 for MIRType::Int64. (I should have made that a little bit more clear by ifdef this to 64-bit machines). Though in contrast to the BOX approach there is no special casing needed for INT64. I updated to return GENERAL now on 64-bit now.

> ::: js/src/jit/x86-shared/CodeGenerator-x86-shared.h
> @@ +8,5 @@
> >  #define jit_x86_shared_CodeGenerator_x86_shared_h
> >  
> >  #include "jit/shared/CodeGenerator-shared.h"
> >  
> > +#include "jit/shared/CodeGenerator-shared-inl.h"
> 
> The style checker will complain about #including an 'inline' file in a
> non-inline header file. Maybe just move the definitions below into the .cpp
> file for now, like ToOperand?
> 
> @@ +122,5 @@
> >      Operand ToOperand(const LAllocation& a);
> >      Operand ToOperand(const LAllocation* a);
> >      Operand ToOperand(const LDefinition* def);
> >  
> > +#if JS_BITS_PER_WORD == 32
> 
> Maybe use CodeGenerator-x86.h/CodeGenerator-x64.h?

Done.
Attachment #8766128 - Attachment is obsolete: true
Attachment #8767894 - Flags: review+
Attachment #8767894 - Attachment description: bugxxx-ion-preparations-for-i64 → Part 1: Make ionmonkey ready for int64 on 32bit systems
Attachment #8766129 - Attachment is obsolete: true
Attachment #8767898 - Flags: review+
Attachment #8766131 - Attachment is obsolete: true
(In reply to Nicolas B. Pierron [:nbp] from comment #58)
> Comment on attachment 8766132 [details] [diff] [review]
> Part 4: Implement LCompareI64
> 
> Review of attachment 8766132 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/x86-shared/Assembler-x86-shared.cpp
> @@ +224,5 @@
> > +    }
> > +}
> > +
> > +AssemblerX86Shared::Condition
> > +AssemblerX86Shared::ConditionWithoutEqual(Condition cond)
> 
> nit: what do you think of  StrictCondition / StrictInequalityCondition?

Not sure about it. To me it looks like it could cause some confusion with strict equality. Therefore I'm not really in favor of it. Now I'm not a native English speaker and maybe this is the term for it? I currently left it as is.

> 
> @@ +239,5 @@
> > +        return GreaterThan;
> > +      case Above:
> > +      case AboveOrEqual:
> > +        return Above;
> > +        return AboveOrEqual;
> 
> nit: copy and pasta? let's remove the second return ;)

Good call.

> ::: js/src/jit/x86/CodeGenerator-x86.cpp
> @@ +1192,5 @@
> > +        Register64 rhsRegs = ToRegister64(rhs);
> > +        masm.branch64(condition, lhsRegs, rhsRegs, &done);
> > +    }
> > +
> > +    masm.bind(&falseBranch);
> 
> nit: falseBranch label is bound but never used, we can remove it.

Yes, indeed.

> @@ +1199,5 @@
> > +    masm.bind(&done);
> > +}
> > +
> > +void
> > +CodeGeneratorX86::visitCompareI64AndBranch(LCompareI64AndBranch* lir)
> 
> comment: It seems to me that the problem comes from a lack of expressivity
> of the macro assembler labels.  I guess we could make this part of the
> MacroAssembler, if we could annotate the labels with Normal / FallThrough /
> InterruptBackedge.

Yeah there is an annoyance here. Though I don't think labels are the culprit. The issue is due to skipping of trivial blocks and the way backedges are handled. When doing arm I might want to do that already to remove the code duplication.
Attachment #8766132 - Attachment is obsolete: true
Attachment #8767905 - Flags: review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #76)
> ::: js/src/jit/x86/MacroAssembler-x86-inl.h
> @@ +276,5 @@
> > +MacroAssembler::lshift64(Register shift, Register64 srcDest)
> > +{
> > +    MOZ_ASSERT(shift == ecx);
> > +
> > +    Label done, shiftAbove32;
> 
> In this function and the two others below, shiftAbove32 is unused.

Oh indeed.

> @@ +279,5 @@
> > +
> > +    Label done, shiftAbove32;
> > +
> > +    shldl_cl(srcDest.low, srcDest.high);
> > +    shll_cl(srcDest.low);
> 
> I haven't seen any changes to lowering, so this remark might be stall if
> this is in another patch: but if srcDest.high == ecx, then this code might
> be wrong. The use of ecx is marked as useAtStart here, so I guess this could
> be the case that srcDest.high == ecx. Can you either MOZ_ASSERT this isn't
> the case, or change the code here? (applies to the two other functions below
> too)

I don't think they can overlap. The inputs cannot overlap each other. So the shift and two input registers should be different. And we request the same input registers as output registers. That way they cannot overlap either. I did add the asserts in the MacroAssembler.
Attachment #8767681 - Attachment is obsolete: true
Attachment #8767909 - Flags: review+
Comment on attachment 8766141 [details] [diff] [review]
Part 13: Implement LAsmSelectI64

Review of attachment 8766141 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/LIR-shared.h
@@ +7690,5 @@
>  {
>    public:
>      LIR_HEADER(AsmSelectI64);
>  
> +    static const size_t TrueExprIndex = 0;

It's a little weird, but is it better for TrueExprIndex to still live on LAsmSelectBase, so that it can be shared and we don't have to defineReuseInput(...,0)?
Attachment #8766141 - Flags: review?(efaustbmo) → review+
Attachment #8766142 - Flags: review?(efaustbmo) → review+
Comment on attachment 8766144 [details] [diff] [review]
Part 15: Implement LExtendInt32ToInt64

Review of attachment 8766144 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Lowering-arm.cpp
@@ +807,5 @@
> +
> +void
> +LIRGeneratorARM::visitExtendInt32ToInt64(MExtendInt32ToInt64* ins)
> +{
> +    MOZ_CRASH("NY");

nit: NYI everywhere

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +1435,5 @@
> +    if (lir->mir()->isUnsigned()) {
> +        masm.xorl(output.high, output.high);
> +    } else {
> +        MOZ_ASSERT(output.low == eax);
> +        MOZ_ASSERT(output.high == edx);

the cdq register pinning requirements are hilarious.
Attachment #8766144 - Flags: review?(efaustbmo) → review+
Attachment #8766145 - Flags: review?(efaustbmo) → review+
Priority: -- → P1
- moved the "masm.append(wasm::MemoryAccess(masm.size()));" inside the load/store functions, since the int64 variant needs two of those.
- renamed visitWasmLoadBase to emitWasmLoad, which is how we normally call such functions.
Attachment #8775160 - Flags: review?(bbouvier)
Comment on attachment 8775160 [details] [diff] [review]
Part 26: Implement LWasmLoadI64 and LWasmStoreI64

Review of attachment 8775160 [details] [diff] [review]:
-----------------------------------------------------------------

Only nits, so nothing preventing landing, I think. Thank you for doing that!

::: js/src/jit/shared/LIR-shared.h
@@ +7807,5 @@
>    public:
>      LIR_HEADER(WasmStore);
> +
> +    static const size_t PtrIndex = 0;
> +    static const size_t ValueIndex = 1;

Why these constants? LWasmStore doesn't handle int64 and all the uses of these two constants are internal to this class. Can you remove this change, please?

@@ +7830,5 @@
> +};
> +
> +class LWasmStoreI64 : public LInstructionHelper<0, INT64_PIECES + 1, 1>
> +{
> +

nit: blank line

::: js/src/jit/x64/CodeGenerator-x64.cpp
@@ +503,5 @@
>      }
>  }
>  
>  void
> +CodeGeneratorX64::loadI64(Scalar::Type type, const Operand& srcAddr, Register64 out)

Or make it just take a Register instead of a Register64, and the caller can pass ToOutRegister(ins).reg

::: js/src/jit/x64/CodeGenerator-x64.h
@@ +63,5 @@
>      void visitInt64ToFloatingPoint(LInt64ToFloatingPoint* lir);
>      void visitLoadTypedArrayElementStatic(LLoadTypedArrayElementStatic* ins);
>      void visitStoreTypedArrayElementStatic(LStoreTypedArrayElementStatic* ins);
> +    template <typename T>
> +    void emitWasmLoad(T* ins);

Could it be private instead?

::: js/src/jit/x86-shared/Lowering-x86-shared.h
@@ +46,5 @@
>      void lowerForBitAndAndBranch(LBitAndAndBranch* baab, MInstruction* mir,
>                                   MDefinition* lhs, MDefinition* rhs);
>      void visitAsmJSNeg(MAsmJSNeg* ins);
>      void visitWasmBoundsCheck(MWasmBoundsCheck* ins);
> +    void lowerWasmLoad(MWasmLoad* ins);

Could it be protected instead?

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +328,5 @@
> +        } else {
> +            MOZ_ASSERT(srcAddr.kind() == Operand::MEM_REG_DISP);
> +            Register base = Register::FromCode(Register::Code(srcAddr.base()));
> +            Operand low(base, uint32_t(srcAddr.disp()) + INT64LOW_OFFSET);
> +            Operand high(base, uint32_t(srcAddr.disp()) + INT64HIGH_OFFSET);

Probably less trouble to use

Address addr = srcAddr.toAddress();

and then use addr.base and addr.offset directly.

@@ +330,5 @@
> +            Register base = Register::FromCode(Register::Code(srcAddr.base()));
> +            Operand low(base, uint32_t(srcAddr.disp()) + INT64LOW_OFFSET);
> +            Operand high(base, uint32_t(srcAddr.disp()) + INT64HIGH_OFFSET);
> +
> +            if (base != out.low) {

Ha, useRegisterAtStart can be tricky with int64...

@@ +336,5 @@
> +                masm.append(wasm::MemoryAccess(masm.size()));
> +                masm.movlWithPatch(high, out.high);
> +                masm.append(wasm::MemoryAccess(masm.size()));
> +            } else {
> +                MOZ_ASSERT(srcAddr.disp() != out.high.encoding());

I don't understand this assertion: the LHS is an offset, the RHS is the internal integer representation of a register. Am I missing something?

@@ +699,5 @@
> +        masm.movlWithPatch(input.high, high);
> +        masm.append(wasm::MemoryAccess(masm.size()));
> +    } else {
> +        MOZ_ASSERT(dstAddr.kind() == Operand::MEM_REG_DISP);
> +        Register base = Register::FromCode(Register::Code(dstAddr.base()));

Same comment as in load64.

@@ +709,5 @@
> +            masm.append(wasm::MemoryAccess(masm.size()));
> +            masm.movlWithPatch(input.high, high);
> +            masm.append(wasm::MemoryAccess(masm.size()));
> +        } else {
> +            MOZ_ASSERT(dstAddr.disp() != input.high.encoding());

and same question.

::: js/src/jit/x86/CodeGenerator-x86.h
@@ +62,5 @@
>      void emitAsmJSCall(LAsmJSCallBase* ins);
>      void visitAsmJSCall(LAsmJSCall* ins);
>      void visitAsmJSCallI64(LAsmJSCallI64* ins);
> +    template <typename T>
> +    void emitWasmLoad(T* ins);

Could this (and emitWasmStore) be private or protected instead?

::: js/src/jit/x86/LOpcodes-x86.h
@@ +19,5 @@
>      _(UDivOrModI64)             \
>      _(DivOrModI64)              \
>      _(WasmTruncateToInt64)      \
> +    _(Int64ToFloatingPoint)     \
> +    _(WasmStoreI64)

Could you put it next to WasmStore, in LOpcodes.h, please? (with a dummy impl for other platforms instead, that's what we've done for the rest of opcodes).
Attachment #8775160 - Flags: review?(bbouvier) → review+
Comment on attachment 8775207 [details] [diff] [review]
Part 27: Implement LWasmLoadGlobalVarI64 and LWasmStoreGlobalVarI64

Review of attachment 8775207 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8775207 - Flags: review?(bbouvier) → review+
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f7f0a9f0833e
Part 1: Preparations in IonMonkey to support i64 on x86, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a594c9bd29f
Part 2: Implement the 64bit variant of AsmJSParameter on 32bit platforms, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/c65db91b411f
Part 3: Implement the 64bit variant of AsmJSReturn on 32bit platforms, r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/8bc3e8bbbf52
Part 4: Implement the 64bit variant of Compare on x86, r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/27826b22e140
Part 5: Implement the 64bit variant of Shift on x86, r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/02f604c9ad73
Part 6: Implement the 64bit variant of BitOp on x86, r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3e8bb6b8d81
Part 7: Implement the 64bit variant of Add on x86, r=lth
https://hg.mozilla.org/integration/mozilla-inbound/rev/5512359e559f
Part 8: Implement the 64bit variant of Sub on x86, r=lth
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c56943e6d0e
Part 9: Implement the 64bit variant of Mul on x86, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c7e1e2e1a9d
Part 10: Implement the 64bit variant of Rotate on x86, r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/61f6a1448812
Part 11: Implement the 64bit variant of AsmJSPassStackArg on x86, r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/b15c4d7a91ac
Part 12: Implement the 64bit variant of Div and Mod on x86, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/39ec8e937bf8
Part 13: Implement the 64bit variant of AsmSelect on x86, r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/1189e5f979cf
Part 14: Implement the 64bit variant of AsmReinterpret on x86, r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/27f7299c0454
Part 15: Implement the 64bit variant of ExtendInt32toInt64 on x86, r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd1c4839cd66
Part 16: Implement the 64bit variant of WrapInt64ToInt32 on x86, r=efaust
https://hg.mozilla.org/integration/mozilla-inbound/rev/91846989e364
Part 17: Implement the 64bit variant of PopCnt on x86, r=lth
https://hg.mozilla.org/integration/mozilla-inbound/rev/0ef5f5b0c607
Part 18: Implement the 64bit variant of Clz and Ctz on x86, r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/efe2a71423f7
Part 19: Implement the 64bit variant of Not on x86, r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb47a62a37a4
Part 20: Implement the 64bit variant of WasmTruncate on x86, r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c40b01a4cba
Part 21: Implement the 64bit variant of ToFloatingPoint on x86, r=sunfish
https://hg.mozilla.org/integration/mozilla-inbound/rev/c63714ab5d4d
Part 22: Implement the 64bit variant of AsmJSCall on x86, r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/fa0c24a5b89d
Part 23: Implement the 64bit variant of Test on x86, r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/346f051f5978
Part 24: Make WebAssembly ready to run 64bit instructions on x86, r=luke
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb7803606205
Part 25: Extra tests, r=lth
https://hg.mozilla.org/integration/mozilla-inbound/rev/4dabba8cf926
Part 26: Implement the 64bit variant of WasmLoad and WasmStore on x86, r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1e1449a0ad8
Part 27: Implement the 64bit variant of WasmLoadGlobalVar and WasmStoreGlobalVar on x86, r=bbouvier
https://hg.mozilla.org/integration/mozilla-inbound/rev/ec9e3ca99293
Part 27: Temporarily disable the Wasm baseline compiler for x86, r=lth
https://hg.mozilla.org/mozilla-central/rev/f7f0a9f0833e
https://hg.mozilla.org/mozilla-central/rev/1a594c9bd29f
https://hg.mozilla.org/mozilla-central/rev/c65db91b411f
https://hg.mozilla.org/mozilla-central/rev/8bc3e8bbbf52
https://hg.mozilla.org/mozilla-central/rev/27826b22e140
https://hg.mozilla.org/mozilla-central/rev/02f604c9ad73
https://hg.mozilla.org/mozilla-central/rev/e3e8bb6b8d81
https://hg.mozilla.org/mozilla-central/rev/5512359e559f
https://hg.mozilla.org/mozilla-central/rev/0c56943e6d0e
https://hg.mozilla.org/mozilla-central/rev/8c7e1e2e1a9d
https://hg.mozilla.org/mozilla-central/rev/61f6a1448812
https://hg.mozilla.org/mozilla-central/rev/b15c4d7a91ac
https://hg.mozilla.org/mozilla-central/rev/39ec8e937bf8
https://hg.mozilla.org/mozilla-central/rev/1189e5f979cf
https://hg.mozilla.org/mozilla-central/rev/27f7299c0454
https://hg.mozilla.org/mozilla-central/rev/bd1c4839cd66
https://hg.mozilla.org/mozilla-central/rev/91846989e364
https://hg.mozilla.org/mozilla-central/rev/0ef5f5b0c607
https://hg.mozilla.org/mozilla-central/rev/efe2a71423f7
https://hg.mozilla.org/mozilla-central/rev/cb47a62a37a4
https://hg.mozilla.org/mozilla-central/rev/0c40b01a4cba
https://hg.mozilla.org/mozilla-central/rev/c63714ab5d4d
https://hg.mozilla.org/mozilla-central/rev/fa0c24a5b89d
https://hg.mozilla.org/mozilla-central/rev/346f051f5978
https://hg.mozilla.org/mozilla-central/rev/bb7803606205
https://hg.mozilla.org/mozilla-central/rev/4dabba8cf926
https://hg.mozilla.org/mozilla-central/rev/c1e1449a0ad8
https://hg.mozilla.org/mozilla-central/rev/ec9e3ca99293
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Depends on: 1292563
Depends on: 1309476
No longer depends on: 1309476
You need to log in before you can comment on or make changes to this bug.