SIMD: Generalize load/store

RESOLVED FIXED in Firefox 39

Status

()

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: bbouvier, Assigned: bbouvier)

Tracking

Trunk
mozilla39
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox39 fixed)

Details

Attachments

(6 attachments, 3 obsolete attachments)

Current Load/Store MIR nodes exist only for Odin and are very specialized to asm.js, as they reuse the same AsmJSLoad/StoreHeap already existing.  We should be probably able to do the same for load/store in Ion, that is, reusing the existing MIR nodes.
Assignee: nobody → benj
Status: NEW → ASSIGNED
Posted patch load.patch (obsolete) — Splinter Review
It's amazing how SIMD load can actually fit in the LoadTypedArrayElement scheme with not that much work... What do you think of this approach?
Attachment #8570042 - Flags: feedback?(sunfish)
Comment on attachment 8570042 [details] [diff] [review]
load.patch

Review of attachment 8570042 [details] [diff] [review]:
-----------------------------------------------------------------

I like it, provided we can figure out something to do about the overflow situation. I think we can then do the XYZ/XY/X versions in a similar manner too; do you agree?

We should also get input from people with more IonBuilder knowledge.

::: js/src/jit/MCallOptimize.cpp
@@ +3156,5 @@
> +
> +    // Artificially make sure the index is in bounds by adding the difference
> +    // number of slots needed (e.g. reading from Float32Array we need to make
> +    // sure to be in bounds for 4 slots, so add 3, etc.).
> +    int32_t suppSlotsNeeded = Simd128DataSize / Scalar::byteSize(arrayType) - 1;

It'd be nice to assert that this division has no remainder.

@@ +3160,5 @@
> +    int32_t suppSlotsNeeded = Simd128DataSize / Scalar::byteSize(arrayType) - 1;
> +    if (suppSlotsNeeded) {
> +        MConstant *suppSlots = constant(Int32Value(suppSlotsNeeded));
> +        MAdd *addedIndex = MAdd::New(alloc(), index, suppSlots);
> +        addedIndex->setInt32(); // XXX overflow?

Hrm. I don't have any great ideas here either yet.

::: js/src/jit/MIR.h
@@ +8865,4 @@
>      Scalar::Type arrayType() const {
>          return arrayType_;
>      }
> +

Nit: whitespace
Attachment #8570042 - Flags: feedback?(sunfish) → feedback+
Posted patch loads.patchSplinter Review
So this patch implements and inlines SIMD.int32x4.load (and for float32x4). Fortunately, most of the work already done for normal typed arrays can be reused. Even more fortunately, most caveats in typed arrays don't have to be handled here: reading out-of-bounds for SIMD memory accesses throws, so there's no chance we've seen undefined beforehands, and we don't need to apply particular things for clamping.

Note that SIMD.int32x4.load expects 2 arguments: any array view and an index. The fact that we need any array view implies that we need to split the meaning of what's the array type vs what's the type of read value in MLoadTypedArrayElement, thus the readType member.

I am fairly experimenting with this patch and might miss some cases. How does this approach look?
Attachment #8570042 - Attachment is obsolete: true
Attachment #8570601 - Flags: feedback?(bhackett1024)
Posted patch stores.patchSplinter Review
Same thing for stores, with some refactoring.

I haven't said it for the previous patch, but feel free to morph this into a review if you think the approach is acceptable.
Attachment #8570602 - Flags: feedback?(bhackett1024)
Comment on attachment 8570601 [details] [diff] [review]
loads.patch

Review of attachment 8570601 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/RangeAnalysis.cpp
@@ +1641,5 @@
>      Range *output = new(alloc) Range(input());
>      setRange(output);
>  }
>  
> +static Range *GetTypedArrayRange(TempAllocator &alloc, Scalar::Type type)

Preexisting, but there should be a line break before 'GetTypedArrayRange'.

@@ +1663,5 @@
>        case Scalar::Float32:
>        case Scalar::Float64:
> +      case Scalar::Float32x4:
> +      case Scalar::Int32x4:
> +      case Scalar::MaxTypedArrayViewType:

Maybe just change these cases to 'default'

@@ +1668,4 @@
>          break;
>      }
>  
>    return nullptr;

Preexisting, but the spacing on this line is wrong.

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +1403,5 @@
>  
>      void loadAlignedFloat32x4(const Address &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
>      void storeAlignedFloat32x4(FloatRegister src, Address addr) { MOZ_CRASH("NYI"); }
>      void loadUnalignedFloat32x4(const Address &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
> +    void loadUnalignedFloat32x4(const BaseIndex &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }

MacroAssembler-mips.h should be updated too.
Attachment #8570601 - Flags: feedback?(bhackett1024) → review+
Comment on attachment 8570602 [details] [diff] [review]
stores.patch

Review of attachment 8570602 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +1406,5 @@
>      void storeAlignedFloat32x4(FloatRegister src, Address addr) { MOZ_CRASH("NYI"); }
>      void loadUnalignedFloat32x4(const Address &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
>      void loadUnalignedFloat32x4(const BaseIndex &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
>      void storeUnalignedFloat32x4(FloatRegister src, Address addr) { MOZ_CRASH("NYI"); }
> +    void storeUnalignedFloat32x4(FloatRegister src, BaseIndex addr) { MOZ_CRASH("NYI"); }

MacroAssembler-mips.h should also be updated.
Attachment #8570602 - Flags: feedback?(bhackett1024) → feedback+
Attachment #8570602 - Flags: feedback+ → review+
Thanks for the reviews! I unfortunately get an assertion error in the backtracking allocator, when running an x86 build with --ion-eager --no-threads on the SIMD/store test:

Assertion failure: !minimalInterval(interval), at /home/ben/code/moz/repo/js/src/jit/BacktrackingAllocator.cpp:610

So apparently it means we're trying to split an interval that can't be split.  Can you spot something obvious in the inlineSimdStore method that could trigger this? What debug information would help? I'd like to investigate more but I'm just not sure about how to do it...
Flags: needinfo?(bhackett1024)
(In reply to Benjamin Bouvier [:bbouvier] from comment #7)
> Thanks for the reviews! I unfortunately get an assertion error in the
> backtracking allocator, when running an x86 build with --ion-eager
> --no-threads on the SIMD/store test:
> 
> Assertion failure: !minimalInterval(interval), at
> /home/ben/code/moz/repo/js/src/jit/BacktrackingAllocator.cpp:610
> 
> So apparently it means we're trying to split an interval that can't be
> split.  Can you spot something obvious in the inlineSimdStore method that
> could trigger this? What debug information would help? I'd like to
> investigate more but I'm just not sure about how to do it...

Can you attach the IONFLAGS=regalloc output for the script?
Flags: needinfo?(bhackett1024)
Posted file regalloc.txt
Sure, here we go!
Flags: needinfo?(bhackett1024)
Right before the regalloc hits this assert, it is trying to allocate a float32x4 vreg with a requirement that it be stored in eax.  Impossible allocations like this should be the only thing that cause this assert to fail, other than bugs in the regalloc itself.

The reason it is required to be in eax is that in LIRGenerator::visitStoreTypedArrayElement we useByteOpRegisterOrNonDoubleConstant on the value when storing to a byte array, which has the effect of requiring eax on x86 (x86 has a weird thing where not all registers can be used in byte width ops, and we don't support requirements that a vreg be in a particular set of registers).

Unfortunately this isn't very clear from the output, as instead of printing that the vreg needs to be in eax it prints this weird thing:

[RegAlloc]   Requirement %xmm0.s, due to use at 1439

This is I think due to the gpr-fpu mismatch between the def and use confusing GetFixedRegister.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #10)
> Right before the regalloc hits this assert, it is trying to allocate a
> float32x4 vreg with a requirement that it be stored in eax.  Impossible
> allocations like this should be the only thing that cause this assert to
> fail, other than bugs in the regalloc itself.
> 
> The reason it is required to be in eax is that in
> LIRGenerator::visitStoreTypedArrayElement we
> useByteOpRegisterOrNonDoubleConstant on the value when storing to a byte
> array, which has the effect of requiring eax on x86 (x86 has a weird thing
> where not all registers can be used in byte width ops, and we don't support
> requirements that a vreg be in a particular set of registers).
> 
> Unfortunately this isn't very clear from the output, as instead of printing
> that the vreg needs to be in eax it prints this weird thing:
> 
> [RegAlloc]   Requirement %xmm0.s, due to use at 1439
> 
> This is I think due to the gpr-fpu mismatch between the def and use
> confusing GetFixedRegister.

Thanks a lot! The fix is trivial: i've changed the test to isByteArray and !isSimdWrite, which makes me think this shouldn't happen and isByteArray should instead be isByteWrite (patch forthcoming).

I think I've understood enough about the spew here to be able to find the source of the error myself next time. Thanks again!
Posted patch cleanup.patchSplinter Review
So this patch cleans up the code introduced in the previous patches, by replacing checks like "is this array of this type" with "are we writing this type".  I've also shared some code between the 3 StoreTypedArray* classes, which have this in common.  Introducing isIntegerWrite, which is a cleaner check for range analysis.  Does the naming sound good?
Attachment #8571459 - Flags: review?(bhackett1024)
Comment on attachment 8571459 [details] [diff] [review]
cleanup.patch

Review of attachment 8571459 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +9061,5 @@
>      bool canProduceFloat32() const MOZ_OVERRIDE { return accessType() == Scalar::Float32; }
>      void collectRangeInfoPreTrunc() MOZ_OVERRIDE;
>  };
>  
> +// Base class for MStoreTypedArrayElement, MStoreTypedArrayElementHole, MStoreTypedArrayStatic

Maybe 'Base class for MIR ops that write to typed arrays'
Attachment #8571459 - Flags: review?(bhackett1024) → review+
sorry had to back this out for bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=7134040&repo=mozilla-inbound
Flags: needinfo?(benj)
Sorry about the bustage! Two mistakes of mine:
- forgot to make the ctor explicit :/
- and a real issue: i was initializing MStoreTypedArrayElementStatic writeType with the type of the uninitialized someArray_ field...

Try run, just to be safe:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63523112a379
Flags: needinfo?(benj)
Keywords: leave-open
Depends on: 1140709
Flags: needinfo?(benj)
Posted patch loadxyz.patch (obsolete) — Splinter Review
Attachment #8579532 - Flags: review?(bhackett1024)
Posted patch storexyz.patch (obsolete) — Splinter Review
Attachment #8579533 - Flags: review?(bhackett1024)
Comment on attachment 8579532 [details] [diff] [review]
loadxyz.patch

Review of attachment 8579532 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +985,5 @@
> +    }
> +    void loadQuadWord(const BaseIndex &src, FloatRegister dest) {
> +        vmovq(Operand(src), dest);
> +    }
> +    void loadThreeDoubleWords(const BaseIndex &src, const BaseIndex &srcZ, FloatRegister dest) {

What is the advantage of having this srcZ parameter?  Currently srcZ will always differ from src by twice the size of whatever type of data is being loaded, and it seems like loadThreeDoubleWords and loadThreeFloat32 could just make a new BaseIndex or Address rather than having it passed in explicitly.  Will the difference between src and srcZ always be set in this way?  If so, removing the parameter would save a lot of complexity in the callers.
Attachment #8579532 - Flags: review?(bhackett1024)
Comment on attachment 8579533 [details] [diff] [review]
storexyz.patch

Review of attachment 8579533 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +8802,5 @@
> +        Address dest(elements, offset);
> +        if (!Scalar::isSimdType(writeType) || numElems != 3) {
> +            StoreToTypedArray(masm, writeType, value, dest, numElems);
> +        } else {
> +            Address destZ(elements, offset + 2 * sizeof(int32_t));

This patch has the same issue with the destZ parameters as the loadXYZ patch, can you fix that here too (if it is appropriate to fix at all)?  Also, why is this 2 * sizeof(int32_t), rather than 2 * width?  (the loadXYZ patch has the same behavior, fwiw)
Attachment #8579533 - Flags: review?(bhackett1024)
Posted patch loadXYZSplinter Review
Good point! Plus, there was something wrong in the conditionals and one test was actually hitting it...
Attachment #8579532 - Attachment is obsolete: true
Attachment #8579533 - Attachment is obsolete: true
Attachment #8580025 - Flags: review?(bhackett1024)
Posted patch storeXYZSplinter Review
Attachment #8580026 - Flags: review?(bhackett1024)
Comment on attachment 8580025 [details] [diff] [review]
loadXYZ

Review of attachment 8580025 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +8850,5 @@
>                         int32_t offsetAdjustment, bool canonicalizeDoubles)
>        : MBinaryInstruction(elements, index),
>          indexType_(indexType),
>          readType_(indexType),
> +        numElems_(0),

numElems_ defaulting to zero is kind of weird.  Could you either add a comment that the field is only used by SIMD, or just default it to one?

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +973,5 @@
>      void zeroInt32x4(FloatRegister dest) {
>          vpxor(dest, dest, dest);
>      }
>  
> +    void loadDoubleWord(const Address &src, FloatRegister dest) {

Using 'Word' in these method names is kind of confusing as it is an architecture dependent term and the use here differs from how we use it in ImmWord.

Maybe make these methods loadInt32x{1,2,3} to match existing methods better?

@@ +1089,5 @@
>      void packedUnsignedRightShiftByScalar(Imm32 count, FloatRegister dest) {
>          vpsrld(count, dest, dest);
>      }
>  
> +    void loadThreeFloat32(const Address &src, FloatRegister dest) {

Similarly, maybe call this loadFloat32x3 to match existing methods better.
Attachment #8580025 - Flags: review?(bhackett1024) → review+
Comment on attachment 8580026 [details] [diff] [review]
storeXYZ

Review of attachment 8580026 [details] [diff] [review]:
-----------------------------------------------------------------

All the comments on the loadXYZ patch apply here as well.
Attachment #8580026 - Flags: review?(bhackett1024) → review+
Thanks!
Try build in comment 26, and all platforms build locally with the renamings, so we should be good.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/266fac2f7b2a
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/44aa05cc400f
Flags: needinfo?(benj)
(and that will be the last patches for that bug!)
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/266fac2f7b2a
https://hg.mozilla.org/mozilla-central/rev/44aa05cc400f
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.