Bug 1077014 (Closed) — Opened 5 years ago, Closed 5 years ago

Generate better code when the result is not used

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
All
defect
Not set

Tracking


RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 1 obsolete file)

Followup work on 979594.

When the result of an atomic ADD or SUB is not used, we can simply generate 'LOCK ADD' or 'LOCK SUB' instead of 'MOV; LOCK XADD' for add or 'MOV; NEG; LOCK XADD' for subtract.  Immediate operands fit directly into the instruction.

When the result of an atomic AND, OR, or XOR is not used, we can simply generate 'LOCK AND', etc, instead of the CMPXCHG loop that we have to use when the result is needed.  Again, immediate operands fit directly into the instruction.

(I'm not aware of similar tweaks for atomic CMPXCHG, and in any case the result of that operation will usually be used.)
A fair amount of the code is cross-platform, and even on ARM this optimization has pleasant effects (reduced register pressure).
Summary: x86 atomics: Generate better code when the result is not used → Generate better code when the result is not used
Depends on: 1141067
Blocks: 1141121
Simple variant on existing code.
Attachment #8575167 - Flags: review?(mrosenberg)
Simple API changes.
Attachment #8575225 - Flags: review?(hv1989)
Attached patch Optimizations and tests (obsolete) — Splinter Review
(There are additional opportunities, many will be taken care of by bug 1141121.)
Attachment #8575226 - Flags: review?(hv1989)
Comment on attachment 8575225 [details] [diff] [review]
Macroassembler changes

Review of attachment 8575225 [details] [diff] [review]:
-----------------------------------------------------------------

Can you open a follow-up bug for the mips changes and needinfo rankov?

::: js/src/jit/MacroAssembler.cpp
@@ +641,5 @@
>  MacroAssembler::atomicBinopToTypedIntArray(AtomicOp op, Scalar::Type arrayType,
>                                             const Register &value, const BaseIndex &mem,
>                                             Register temp1, Register temp2, AnyRegister output);
>  
> +// For effect

??
Attachment #8575225 - Flags: review?(hv1989) → review+
Comment on attachment 8575226 [details] [diff] [review]
Optimizations and tests

Review of attachment 8575226 [details] [diff] [review]:
-----------------------------------------------------------------

I think it would be better to add a new LIR for this case. LAsmJSAtomicBinopHeap has a 'def' that the register allocator needs to worry about; if we add a new LIR we can remove that 'def'.
Attachment #8575226 - Flags: review?(hv1989)
MIPS stubs factored out from the MacroAssembler patch, as requested by h4writer.
Attachment #8577118 - Flags: review?(branislav.rankov)
Refactored as requested.
Attachment #8575226 - Attachment is obsolete: true
Attachment #8577171 - Flags: review?(hv1989)
Comment on attachment 8577171 [details] [diff] [review]
Optimizations and tests, v2

Review of attachment 8577171 [details] [diff] [review]:
-----------------------------------------------------------------

I only had some reservations about the function "asmJSAtomicComputeAddress". Feel free to contact me in chat with questions about it or other possible suggestions.
r+ once we agree on the "asmJSAtomicComputeAddress" changes.

::: js/src/jit/LIR-Common.h
@@ +5094,5 @@
>          return mir_->toAtomicTypedArrayElementBinop();
>      }
>  };
>  
> +class LAtomicTypedArrayElementBinopForEffect : public LInstructionHelper<0, 3, 0>

Can you add an explanation above of what this LIR does?

@@ +6523,5 @@
>          return mir_->toAsmJSAtomicBinopHeap();
>      }
>  };
>  
> +class LAsmJSAtomicBinopHeapForEffect : public LInstructionHelper<0, 2, 1>

Can you add an explanation above of what this LIR does?

::: js/src/jit/x86/CodeGenerator-x86.cpp
@@ +694,1 @@
>  {

Oh, this is strange. Can we make sure the name "asmJSAtomicComputeAddress" matches what happens here?
This function does more than expected: it also sets 'out' to zero when the bounds check fails, instead of only computing the address.

So can we move this out of this function:
> memoryBarrier(MembarFull);
> if (out != InvalidReg)
>     masm.xorl(out,out);

To something like this:
> asmJSAtomicComputeAddress(Register addrTemp, Register ptrReg, bool boundsCheck,
>                           int32_t offset, int32_t endOffset, Label &fail)
> {
>     if (mir->needsBoundsCheck()) {
>         ...
>         masm.j(Assembler::Above, &fail);
>     }
>     ...
> }
> 
> visitAsmJSAtomicBinopHeap(..)
> {
>     asmJSAtomicComputeAddress(...);
> 
> 
>     if (fail.used) {
>         masm.jump(&rejoin);
>         masm.bind(&fail);
>         memoryBarrier(MembarFull);
>         if (out != InvalidReg)
>             masm.xorl(out,out);
> 
>     }
>     masm.bind(&rejoin);
> }

If the "fail" case is very uncommon we can even make it out-of-line code, removing the jump.
Attachment #8577171 - Flags: review?(hv1989) → review+
Discussion with Hannes on IRC: we'll add a comment to the existing asmJSAtomicComputeAddress() method to clarify what it does (bounds check, barrier + clear result on failure + compute address).
Comment on attachment 8575167 [details] [diff] [review]
Atomic binops for effect for ARM

Review of attachment 8575167 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5049,5 @@
> +                                        const T &mem)
> +{
> +    // Fork for non-word operations on ARMv6.
> +    //
> +    // Bug 1077321: We may further optimize for ARMv8 here.

ARMv8 is different enough from ARMv7 that it is getting an entirely new arch directory.
Attachment #8575167 - Flags: review?(mrosenberg) → review+
Comment on attachment 8577118 [details] [diff] [review]
Macroassembler changes (MIPS)

Review of attachment 8577118 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/mips/MacroAssembler-mips.h
@@ +1016,5 @@
>      template<typename T, typename S>
>      void atomicFetchAdd32(const S &value, const T &mem, Register temp, Register output) {
>          MOZ_CRASH("NYI");
>      }
> +    template <typename T, typename S> void atomicAdd8(const T &value, const S &mem) {

NIT: Add new line after template <typename T, typename S>

Do the same for all other instances.
Attachment #8577118 - Flags: review?(branislav.rankov) → review+
(In reply to Marty Rosenberg [:mjrosenb] from comment #11)
> Comment on attachment 8575167 [details] [diff] [review]
> Atomic binops for effect for ARM
> 
> Review of attachment 8575167 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jit/arm/MacroAssembler-arm.cpp
> @@ +5049,5 @@
> > +                                        const T &mem)
> > +{
> > +    // Fork for non-word operations on ARMv6.
> > +    //
> > +    // Bug 1077321: We may further optimize for ARMv8 here.
> 
> ARMv8 is different enough from ARMv7 that it is getting an entirely new arch
> directory.

You're probably thinking of 64-bit?  I was thinking about the fact that there are new AArch32 instructions for synchronization (some release-acquire thing that will allow the insertion of fewer memory barriers).

(I'll clarify the comment.)