Closed Bug 1135042 Opened 9 years ago Closed 9 years ago

SIMD: Generalize load/store

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla39

Tracking Flags:

Tracking

Status

firefox39

---

fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

References

Details

Attachments

(6 files, 3 obsolete files)

load.patch 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 16.38 KB, patch	sunfish : feedback+	Details \| Diff \| Splinter Review
loads.patch 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 19.97 KB, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
stores.patch 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 25.41 KB, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
regalloc.txt 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 727.23 KB, text/plain		Details
cleanup.patch 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 13.12 KB, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
loadxyz.patch 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 29.66 KB, patch		Details \| Diff \| Splinter Review
storexyz.patch 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 29.10 KB, patch		Details \| Diff \| Splinter Review
loadXYZ 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 28.73 KB, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
storeXYZ 9 years ago Benjamin Bouvier [:bbouvier] (inactive) 28.14 KB, patch	bhackett1024 : review+	Details \| Diff \| Splinter Review

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Description

•

9 years ago

Current Load/Store MIR nodes exist only for Odin and are very specialized to asm.js, as they reuse the same AsmJSLoad/StoreHeap already existing.  We should be probably able to do the same for load/store in Ion, that is, reusing the existing MIR nodes.

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

9 years ago

Assignee: nobody → benj

Status: NEW → ASSIGNED

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 1

•

9 years ago

Attached patch load.patch (obsolete) — Details — Splinter Review

It's amazing how SIMD load can actually fit in the LoadTypedArrayElement scheme with not that much work... What do you think of this approach?

Attachment #8570042 - Flags: feedback?(sunfish)

Dan Gohman [:sunfish]

Comment 2

•

9 years ago

Comment on attachment 8570042 [details] [diff] [review]
load.patch

Review of attachment 8570042 [details] [diff] [review]:
-----------------------------------------------------------------

I like it, provided we can figure out something to do about the overflow situation. I think we can then do the XYZ/XY/X versions in a similar manner too; do you agree?

We should also get input from people with more IonBuilder knowledge.

::: js/src/jit/MCallOptimize.cpp
@@ +3156,5 @@
> +
> +    // Artificially make sure the index is in bounds by adding the difference
> +    // number of slots needed (e.g. reading from Float32Array we need to make
> +    // sure to be in bounds for 4 slots, so add 3, etc.).
> +    int32_t suppSlotsNeeded = Simd128DataSize / Scalar::byteSize(arrayType) - 1;

It'd be nice to assert that this division has no remainder.

@@ +3160,5 @@
> +    int32_t suppSlotsNeeded = Simd128DataSize / Scalar::byteSize(arrayType) - 1;
> +    if (suppSlotsNeeded) {
> +        MConstant *suppSlots = constant(Int32Value(suppSlotsNeeded));
> +        MAdd *addedIndex = MAdd::New(alloc(), index, suppSlots);
> +        addedIndex->setInt32(); // XXX overflow?

Hrm. I don't have any great ideas here either yet.

::: js/src/jit/MIR.h
@@ +8865,4 @@
>      Scalar::Type arrayType() const {
>          return arrayType_;
>      }
> +

Nit: whitespace

Attachment #8570042 - Flags: feedback?(sunfish) → feedback+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 3

•

9 years ago

Attached patch loads.patch — Details — Splinter Review

So this patch implements and inlines SIMD.int32x4.load (and for float32x4). Fortunately, most of the work already done for normal typed arrays can be reused. Even more fortunately, most caveats in typed arrays don't have to be handled here: reading out-of-bounds for SIMD memory accesses throws, so there's no chance we've seen undefined beforehands, and we don't need to apply particular things for clamping.

Note that SIMD.int32x4.load expects 2 arguments: any array view and an index. The fact that we need any array view implies that we need to split the meaning of what's the array type vs what's the type of read value in MLoadTypedArrayElement, thus the readType member.

I am fairly experimenting with this patch and might miss some cases. How does this approach look?

Attachment #8570042 - Attachment is obsolete: true

Attachment #8570601 - Flags: feedback?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 4

•

9 years ago

Attached patch stores.patch — Details — Splinter Review

Same thing for stores, with some refactoring.

I haven't said it for the previous patch, but feel free to morph this into a review if you think the approach is acceptable.

Attachment #8570602 - Flags: feedback?(bhackett1024)

Brian Hackett [Laid off!]

Comment 5

•

9 years ago

Comment on attachment 8570601 [details] [diff] [review]
loads.patch

Review of attachment 8570601 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/RangeAnalysis.cpp
@@ +1641,5 @@
>      Range *output = new(alloc) Range(input());
>      setRange(output);
>  }
>  
> +static Range *GetTypedArrayRange(TempAllocator &alloc, Scalar::Type type)

Preexisting, but there should be a line break before 'GetTypedArrayRange'.

@@ +1663,5 @@
>        case Scalar::Float32:
>        case Scalar::Float64:
> +      case Scalar::Float32x4:
> +      case Scalar::Int32x4:
> +      case Scalar::MaxTypedArrayViewType:

Maybe just change these cases to 'default'

@@ +1668,4 @@
>          break;
>      }
>  
>    return nullptr;

Preexisting, but the spacing on this line is wrong.

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +1403,5 @@
>  
>      void loadAlignedFloat32x4(const Address &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
>      void storeAlignedFloat32x4(FloatRegister src, Address addr) { MOZ_CRASH("NYI"); }
>      void loadUnalignedFloat32x4(const Address &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
> +    void loadUnalignedFloat32x4(const BaseIndex &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }

MacroAssembler-mips.h should be updated too.

Attachment #8570601 - Flags: feedback?(bhackett1024) → review+

Brian Hackett [Laid off!]

Comment 6

•

9 years ago

Comment on attachment 8570602 [details] [diff] [review]
stores.patch

Review of attachment 8570602 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/MacroAssembler-arm.h
@@ +1406,5 @@
>      void storeAlignedFloat32x4(FloatRegister src, Address addr) { MOZ_CRASH("NYI"); }
>      void loadUnalignedFloat32x4(const Address &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
>      void loadUnalignedFloat32x4(const BaseIndex &addr, FloatRegister dest) { MOZ_CRASH("NYI"); }
>      void storeUnalignedFloat32x4(FloatRegister src, Address addr) { MOZ_CRASH("NYI"); }
> +    void storeUnalignedFloat32x4(FloatRegister src, BaseIndex addr) { MOZ_CRASH("NYI"); }

MacroAssembler-mips.h should also be updated.

Attachment #8570602 - Flags: feedback?(bhackett1024) → feedback+

Brian Hackett [Laid off!]

Updated

•

9 years ago

Attachment #8570602 - Flags: feedback+ → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 7

•

9 years ago

Thanks for the reviews! I unfortunately get an assertion error in the backtracking allocator, when running an x86 build with --ion-eager --no-threads on the SIMD/store test:

Assertion failure: !minimalInterval(interval), at /home/ben/code/moz/repo/js/src/jit/BacktrackingAllocator.cpp:610

So apparently it means we're trying to split an interval that can't be split.  Can you spot something obvious in the inlineSimdStore method that could trigger this? What debug information would help? I'd like to investigate more but I'm just not sure about how to do it...

Flags: needinfo?(bhackett1024)

Brian Hackett [Laid off!]

Comment 8

•

9 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #7)
> Thanks for the reviews! I unfortunately get an assertion error in the
> backtracking allocator, when running an x86 build with --ion-eager
> --no-threads on the SIMD/store test:
> 
> Assertion failure: !minimalInterval(interval), at
> /home/ben/code/moz/repo/js/src/jit/BacktrackingAllocator.cpp:610
> 
> So apparently it means we're trying to split an interval that can't be
> split.  Can you spot something obvious in the inlineSimdStore method that
> could trigger this? What debug information would help? I'd like to
> investigate more but I'm just not sure about how to do it...

Can you attach the IONFLAGS=regalloc output for the script?

Flags: needinfo?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 9

•

9 years ago

Attached file regalloc.txt — Details

Sure, here we go!

Flags: needinfo?(bhackett1024)

Brian Hackett [Laid off!]

Comment 10

•

9 years ago

Right before the regalloc hits this assert, it is trying to allocate a float32x4 vreg with a requirement that it be stored in eax.  Impossible allocations like this should be the only thing that cause this assert to fail, other than bugs in the regalloc itself.

The reason it is required to be in eax is that in LIRGenerator::visitStoreTypedArrayElement we useByteOpRegisterOrNonDoubleConstant on the value when storing to a byte array, which has the effect of requiring eax on x86 (x86 has a weird thing where not all registers can be used in byte width ops, and we don't support requirements that a vreg be in a particular set of registers).

Unfortunately this isn't very clear from the output, as instead of printing that the vreg needs to be in eax it prints this weird thing:

[RegAlloc]   Requirement %xmm0.s, due to use at 1439

This is I think due to the gpr-fpu mismatch between the def and use confusing GetFixedRegister.

Flags: needinfo?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 11

•

9 years ago

(In reply to Brian Hackett (:bhackett) from comment #10)
> Right before the regalloc hits this assert, it is trying to allocate a
> float32x4 vreg with a requirement that it be stored in eax.  Impossible
> allocations like this should be the only thing that cause this assert to
> fail, other than bugs in the regalloc itself.
> 
> The reason it is required to be in eax is that in
> LIRGenerator::visitStoreTypedArrayElement we
> useByteOpRegisterOrNonDoubleConstant on the value when storing to a byte
> array, which has the effect of requiring eax on x86 (x86 has a weird thing
> where not all registers can be used in byte width ops, and we don't support
> requirements that a vreg be in a particular set of registers).
> 
> Unfortunately this isn't very clear from the output, as instead of printing
> that the vreg needs to be in eax it prints this weird thing:
> 
> [RegAlloc]   Requirement %xmm0.s, due to use at 1439
> 
> This is I think due to the gpr-fpu mismatch between the def and use
> confusing GetFixedRegister.

Thanks a lot! The fix is trivial: i've changed the test to isByteArray and !isSimdWrite, which makes me think this shouldn't happen and isByteArray should instead be isByteWrite (patch forthcoming).

I think I've understood enough about the spew here to be able to find the source of the error myself next time. Thanks again!

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 12

•

9 years ago

Attached patch cleanup.patch — Details — Splinter Review

So this patch cleans up the code introduced in the previous patches, by replacing checks like "is this array of this type" with "are we writing this type".  I've also shared some code between the 3 StoreTypedArray* classes, which have this in common.  Introducing isIntegerWrite, which is a cleaner check for range analysis.  Does the naming sound good?

Attachment #8571459 - Flags: review?(bhackett1024)

Brian Hackett [Laid off!]

Comment 13

•

9 years ago

Comment on attachment 8571459 [details] [diff] [review]
cleanup.patch

Review of attachment 8571459 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +9061,5 @@
>      bool canProduceFloat32() const MOZ_OVERRIDE { return accessType() == Scalar::Float32; }
>      void collectRangeInfoPreTrunc() MOZ_OVERRIDE;
>  };
>  
> +// Base class for MStoreTypedArrayElement, MStoreTypedArrayElementHole, MStoreTypedArrayStatic

Maybe 'Base class for MIR ops that write to typed arrays'

Attachment #8571459 - Flags: review?(bhackett1024) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 14

•

9 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/756a0e4a9f48
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/d69585ea6ca9
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/b7b0305f5747

Carsten Book [:Tomcat]

Comment 15

•

9 years ago

sorry had to back this out for bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=7134040&repo=mozilla-inbound

Flags: needinfo?(benj)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 16

•

9 years ago

Sorry about the bustage! Two mistakes of mine:
- forgot to make the ctor explicit :/
- and a real issue: i was initializing MStoreTypedArrayElementStatic writeType with the type of the uninitialized someArray_ field...

Try run, just to be safe:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=63523112a379

Flags: needinfo?(benj)

Keywords: leave-open

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 17

•

9 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/ddee53b10d77
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/3f54fe544025
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/caea9f8b038e

Ryan VanderMeulen [:RyanVM]

Comment 18

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/ddee53b10d77
https://hg.mozilla.org/mozilla-central/rev/3f54fe544025
https://hg.mozilla.org/mozilla-central/rev/caea9f8b038e

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1140709

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Updated

•

9 years ago

Flags: needinfo?(benj)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 19

•

9 years ago

Attached patch loadxyz.patch (obsolete) — Details — Splinter Review

Attachment #8579532 - Flags: review?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 20

•

9 years ago

Attached patch storexyz.patch (obsolete) — Details — Splinter Review

Attachment #8579533 - Flags: review?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 21

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e1f6b8a37af

Brian Hackett [Laid off!]

Comment 22

•

9 years ago

Comment on attachment 8579532 [details] [diff] [review]
loadxyz.patch

Review of attachment 8579532 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +985,5 @@
> +    }
> +    void loadQuadWord(const BaseIndex &src, FloatRegister dest) {
> +        vmovq(Operand(src), dest);
> +    }
> +    void loadThreeDoubleWords(const BaseIndex &src, const BaseIndex &srcZ, FloatRegister dest) {

What is the advantage of having this srcZ parameter?  Currently srcZ will always differ from src by twice the size of whatever type of data is being loaded, and it seems like loadThreeDoubleWords and loadThreeFloat32 could just make a new BaseIndex or Address rather than having it passed in explicitly.  Will the difference between src and srcZ always be set in this way?  If so, removing the parameter would save a lot of complexity in the callers.

Attachment #8579532 - Flags: review?(bhackett1024)

Brian Hackett [Laid off!]

Comment 23

•

9 years ago

Comment on attachment 8579533 [details] [diff] [review]
storexyz.patch

Review of attachment 8579533 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/CodeGenerator.cpp
@@ +8802,5 @@
> +        Address dest(elements, offset);
> +        if (!Scalar::isSimdType(writeType) || numElems != 3) {
> +            StoreToTypedArray(masm, writeType, value, dest, numElems);
> +        } else {
> +            Address destZ(elements, offset + 2 * sizeof(int32_t));

This patch has the same issue with the destZ parameters as the loadXYZ patch, can you fix that here too (if it is appropriate to fix at all)?  Also, why is this 2 * sizeof(int32_t), rather than 2 * width?  (the loadXYZ patch has the same behavior, fwiw)

Attachment #8579533 - Flags: review?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 24

•

9 years ago

Attached patch loadXYZ — Details — Splinter Review

Good point! Plus, there was something wrong in the conditionals and one test was actually hitting it...

Attachment #8579532 - Attachment is obsolete: true

Attachment #8579533 - Attachment is obsolete: true

Attachment #8580025 - Flags: review?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 25

•

9 years ago

Attached patch storeXYZ — Details — Splinter Review

Attachment #8580026 - Flags: review?(bhackett1024)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 26

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5eda0e1fc7f

Brian Hackett [Laid off!]

Comment 27

•

9 years ago

Comment on attachment 8580025 [details] [diff] [review]
loadXYZ

Review of attachment 8580025 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/MIR.h
@@ +8850,5 @@
>                         int32_t offsetAdjustment, bool canonicalizeDoubles)
>        : MBinaryInstruction(elements, index),
>          indexType_(indexType),
>          readType_(indexType),
> +        numElems_(0),

numElems_ defaulting to zero is kind of weird.  Could you either add a comment that the field is only used by SIMD, or just default it to one?

::: js/src/jit/shared/MacroAssembler-x86-shared.h
@@ +973,5 @@
>      void zeroInt32x4(FloatRegister dest) {
>          vpxor(dest, dest, dest);
>      }
>  
> +    void loadDoubleWord(const Address &src, FloatRegister dest) {

Using 'Word' in these method names is kind of confusing as it is an architecture dependent term and the use here differs from how we use it in ImmWord.

Maybe make these methods loadInt32x{1,2,3} to match existing methods better?

@@ +1089,5 @@
>      void packedUnsignedRightShiftByScalar(Imm32 count, FloatRegister dest) {
>          vpsrld(count, dest, dest);
>      }
>  
> +    void loadThreeFloat32(const Address &src, FloatRegister dest) {

Similarly, maybe call this loadFloat32x3 to match existing methods better.

Attachment #8580025 - Flags: review?(bhackett1024) → review+

Brian Hackett [Laid off!]

Comment 28

•

9 years ago

Comment on attachment 8580026 [details] [diff] [review]
storeXYZ

Review of attachment 8580026 [details] [diff] [review]:
-----------------------------------------------------------------

All the comments on the loadXYZ patch apply here as well.

Attachment #8580026 - Flags: review?(bhackett1024) → review+

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 29

•

9 years ago

Thanks!
Try build in comment 26, and all platforms build locally with the renamings, so we should be good.

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/266fac2f7b2a
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/44aa05cc400f

Flags: needinfo?(benj)

Benjamin Bouvier [:bbouvier] (inactive)

Assignee

Comment 30

•

9 years ago

(and that will be the last patches for that bug!)

Keywords: leave-open

Phil Ringnalda (:philor)

Comment 31

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/266fac2f7b2a
https://hg.mozilla.org/mozilla-central/rev/44aa05cc400f

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox39: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla39

You need to log in before you can comment on or make changes to this bug.