Closed
Bug 1019831
Opened 10 years ago
Closed 10 years ago
SIMD x86/x64: Regalloc and moves bits
Categories
(Core :: JavaScript Engine: JIT, defect)
Tracking
()
RESOLVED
FIXED
mozilla34
People
(Reporter: bbouvier, Assigned: bbouvier)
References
Details
Attachments
(12 files, 13 obsolete files)
5.36 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
5.99 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
8.53 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
3.73 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
5.99 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
4.17 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
1.79 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
7.37 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
11.51 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
18.25 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
8.54 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
2.00 KB,
patch
|
sunfish
:
review+
|
Details | Diff | Splinter Review |
This should be x86 / x64 only and should crash under ARM, for now.
Assignee | ||
Comment 1•10 years ago
|
||
This one just creates a LIR definition for Int32x4, and adds a const bool in Architecture files to assert that we can't fall under ARM in SIMD functions.
Attachment #8433541 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 2•10 years ago
|
||
This one implements quad slots in the stack slot allocator, so that we can use them for register spilling later.
Attachment #8433543 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 3•10 years ago
|
||
Macro assembler and assembler {load,store,move}Int32x4 functions.
Attachment #8433544 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 4•10 years ago
|
||
And finally, MoveResolver and MoveEmitter for Int32x4. I'll update all my patches with your comments and Float32x4 implementation as well in the future.
Attachment #8433545 -
Flags: feedback?(sunfish)
Comment 5•10 years ago
|
||
Comment on attachment 8433541 [details] [diff] [review] SIMD x86-x64: LIR definition Review of attachment 8433541 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/LIR.h @@ +434,5 @@ > OBJECT, // Pointer that may be collected as garbage (GPR). > SLOTS, // Slots/elements pointer that may be moved by minor GCs (GPR). > FLOAT32, // 32-bit floating-point value (FPU). > DOUBLE, // 64-bit floating-point value (FPU). > + INT32X4, // Packed data containing four 32-bit integers (FPU). "Packed data" means a variety of things in tangentially related contexts. I suggest saying "SIMD data". @@ +483,5 @@ > return (Policy)((bits_ >> POLICY_SHIFT) & POLICY_MASK); > } > Type type() const { > + Type ret = (Type)((bits_ >> TYPE_SHIFT) & TYPE_MASK); > + JS_ASSERT_IF(ret == INT32X4, SupportsSIMD); It'd be better to put this assert in the set function, so we catch SIMD types at the point where they are produced, rather than where they are consumed. ::: js/src/jit/arm/Assembler-arm.h @@ +140,5 @@ > static const bool StackKeptAligned = true; > static const uint32_t NativeFrameSize = sizeof(void*); > static const uint32_t AlignmentAtAsmJSPrologue = sizeof(void*); > > +static const bool SupportsSIMD = false; A brief comment saying that SIMD is merely not implemented yet would be good.
Updated•10 years ago
|
Attachment #8433543 -
Flags: feedback?(sunfish) → feedback+
Comment 6•10 years ago
|
||
Comment on attachment 8433544 [details] [diff] [review] SIMD x86-x64: masm / asm instructions for int32x4 moves Review of attachment 8433544 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/assembler/assembler/X86Assembler.h @@ +2843,5 @@ > } > > + void movdqa_rr(XMMRegisterID src, XMMRegisterID dst) > + { > + spew("movdqa %s, %s", Please add spaces to keep the first operand indented consistently. ::: js/src/jit/shared/MacroAssembler-x86-shared.h @@ +470,5 @@ > + movdqa(Operand(src), dest); > + } > + void storeInt32x4(FloatRegister src, const Address &dest) { > + movdqa(src, Operand(dest)); > + } The interesting question here is alignment, since we're eventually going to need unaligned loads and stores too. Will it make sense to name them loadUnalignedInt32x4, or should we rename these aligned functions to loadAlignedInt32x4? I don't have a strong opinion right now. Either way, it'd be nice to have comments explaining which one these are.
Attachment #8433544 -
Flags: feedback?(sunfish)
Comment 7•10 years ago
|
||
Comment on attachment 8433545 [details] [diff] [review] SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 Review of attachment 8433545 [details] [diff] [review]: ----------------------------------------------------------------- MoveOperand.isFloatReg() returns true for SIMD types, right? If so, characterizeCycle will set allFloatRegs to true for SIMD registers, and maybeEmitOptimizedCycle will do the xor swap for SIMD values too, which is fun. ::: js/src/jit/shared/MoveEmitter-x86-shared.cpp @@ +158,5 @@ > +// SIMD prerequisites for cycle slot's size: > +static const size_t CYCLE_SLOT_SIZE = 4 * sizeof(int32_t); > +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 2 * sizeof(double)); > +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 4 * sizeof(float)); > +JS_STATIC_ASSERT(CYCLE_SLOT_SIZE == 4 * sizeof(int32_t)); It doesn't matter right now, but these asserts could be >= instead of ==, in case we ever add larger slots.
Attachment #8433545 -
Flags: feedback?(sunfish)
Updated•10 years ago
|
Attachment #8433541 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 8•10 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #6) > Comment on attachment 8433544 [details] [diff] [review] > SIMD x86-x64: masm / asm instructions for int32x4 moves > > Review of attachment 8433544 [details] [diff] [review]: > ----------------------------------------------------------------- > The interesting question here is alignment, since we're eventually going to > need unaligned loads and stores too. Will it make sense to name them > loadUnalignedInt32x4, or should we rename these aligned functions to > loadAlignedInt32x4? I don't have a strong opinion right now. Either way, > it'd be nice to have comments explaining which one these are. Actually I was wondering the same thing. Now, explicit is better than implicit, so I would even suggest not to have loadInt32x4 at all, but rather load{Un,}AlignedInt32x4. Otherwise, this would make any of the other two a norm (however if we need to choose a side, the exception should be loadUnalignedInt32x4).
Comment 9•10 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #8) > Actually I was wondering the same thing. Now, explicit is better than > implicit, so I would even suggest not to have loadInt32x4 at all, but rather > load{Un,}AlignedInt32x4. Yeah, that sounds good. Alignment of SIMD values is often enough a concern that it'll be good to make it explicit everywhere. Nice Zen of Python quote, btw :).
Assignee | ||
Comment 10•10 years ago
|
||
See bug 1014083 comment 21. There shall be a pass that checks whether there is at least one SIMD operation, and in this case will set needsSimdStackAlignment for CodeGen.
Attachment #8435635 -
Flags: feedback?(sunfish)
Comment 11•10 years ago
|
||
Comment on attachment 8435635 [details] [diff] [review] SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present Review of attachment 8435635 [details] [diff] [review]: ----------------------------------------------------------------- Is there a reason why we don't want to just align the stack to 16 bytes always, instead of trying to cleverly track when it's needed?
Updated•10 years ago
|
Attachment #8435635 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 12•10 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #11) > Is there a reason why we don't want to just align the stack to 16 bytes > always, instead of trying to cleverly track when it's needed? Last time we talked about it, Luke was very against unconditional alignment, to avoid de-optimizing leaf functions in asm.js. I'll let him weigh in here for more details.
Comment 13•10 years ago
|
||
Well, I wouldn't say *very*, just wanted to optimize leaf functions assuming it's not an undue burden. Btw, it seems like we could remove setPerformsAsmJSCall() and needsInitialStackAlignment_: we'd align the stack if gen->performsCall() and, since MAsmJSCall::isCall()=true, performsCall is already set for asm.js calls.
Assignee | ||
Comment 14•10 years ago
|
||
Fixed nits.
Attachment #8433541 -
Attachment is obsolete: true
Attachment #8439343 -
Flags: feedback?(sunfish)
Assignee | ||
Updated•10 years ago
|
Attachment #8433543 -
Attachment description: SIMD x86-x64: Allow stack slot allocator to emit quad slots → 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots
Assignee | ||
Comment 15•10 years ago
|
||
Fixed remarks.
Attachment #8433544 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Attachment #8439344 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8439345 -
Flags: feedback?(sunfish)
Assignee | ||
Updated•10 years ago
|
Attachment #8433545 -
Attachment is obsolete: true
Assignee | ||
Comment 17•10 years ago
|
||
I would like to make some measurements in a separate bug before having it aligned in all cases, as I think we'd also need the stack top to be aligned, to avoid cache lines faults. In the meanwhile, what do you think of the current patch?
Attachment #8435635 -
Attachment is obsolete: true
Attachment #8439349 -
Flags: feedback?(sunfish)
Assignee | ||
Comment 18•10 years ago
|
||
Float32x4 bundle.
Attachment #8439350 -
Flags: feedback?(sunfish)
Comment 19•10 years ago
|
||
Comment on attachment 8439343 [details] [diff] [review] 1 - SIMD x86-x64: LIR definition Review of attachment 8439343 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/LIR.h @@ +434,5 @@ > OBJECT, // Pointer that may be collected as garbage (GPR). > SLOTS, // Slots/elements pointer that may be moved by minor GCs (GPR). > FLOAT32, // 32-bit floating-point value (FPU). > DOUBLE, // 64-bit floating-point value (FPU). > + INT32X4, // SIMD data containing four 32-bit integers (FPU). Nit: It'd be nice to keep all the SIMD types next to each other. @@ +447,5 @@ > }; > > void set(uint32_t index, Type type, Policy policy) { > JS_STATIC_ASSERT(MAX_VIRTUAL_REGISTERS <= VREG_MASK); > + JS_ASSERT_IF(type == INT32X4, SupportsSIMD); This looks like it should use isSIMDType, or whatever it's called, rather than just checking for INT32X4. Or if we don't yet have such a function, it would be nice to create it :-).
Attachment #8439343 -
Flags: feedback?(sunfish) → feedback+
Comment 20•10 years ago
|
||
Comment on attachment 8439344 [details] [diff] [review] 3 - SIMD x86-x64: masm / asm instructions for int32x4 moves Review of attachment 8439344 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/assembler/assembler/X86Assembler.h @@ +2846,5 @@ > + { > + spew("movdqa %s, %s", > + nameFPReg(src), nameFPReg(dst)); > + m_formatter.prefix(PRE_SSE_F2); > + m_formatter.twoByteOp(OP2_MOVDQ_VsdWsd, (RegisterID)dst, (RegisterID)src); Should this be OP2_MOVQ_VdqWdq?
Attachment #8439344 -
Flags: feedback?(sunfish) → feedback+
Updated•10 years ago
|
Attachment #8439345 -
Flags: feedback?(sunfish) → feedback+
Comment 21•10 years ago
|
||
Comment on attachment 8439350 [details] [diff] [review] 6 - Same things for Float32x4 Review of attachment 8439350 [details] [diff] [review]: ----------------------------------------------------------------- One related thing that'd be nice to do is teach characterizeCycle and maybeEmitOptimizedCycle to use xorps or pxor instead of xorpd if all the moves are FLOAT32x4 or INT32x4 respectively. It isn't critical though, and doesn't have to happen in this patch series.
Attachment #8439350 -
Flags: feedback?(sunfish) → feedback+
Comment 22•10 years ago
|
||
Comment on attachment 8439349 [details] [diff] [review] 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present Review of attachment 8439349 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MIRGenerator.h @@ +117,5 @@ > } > bool performsCall() const { > return performsCall_; > } > + void setUsesSimd() { setUsesSimd() doesn't appear to be called anywhere. Also, this appears to be the only place where usesSimd_ is set to true, so with this patch, it appears it will never be set. ::: js/src/jit/arm/Assembler-arm.h @@ +145,5 @@ > // this architecture or not. Rather than a method in the LIRGenerator, it is > // here such that it is accessible from the entire codebase. Once full support > // for SIMD is reached on all tier-1 platforms, this constant can be deleted. > static const bool SupportsSIMD = false; > +static const bool SimdStackAlignment = 8; // TODO? If you plan to leave this TODO here for a while, it'd be good to have a little blurb explaining what needs to be done. e.g. // TODO: Does ARM want its stack 16-byte-aligned for SIMD?
Attachment #8439349 -
Flags: feedback?(sunfish) → feedback-
Assignee | ||
Comment 23•10 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #19) > Comment on attachment 8439343 [details] [diff] [review] > 1 - SIMD x86-x64: LIR definition > > Review of attachment 8439343 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: js/src/jit/LIR.h > @@ +434,5 @@ > > OBJECT, // Pointer that may be collected as garbage (GPR). > > SLOTS, // Slots/elements pointer that may be moved by minor GCs (GPR). > > FLOAT32, // 32-bit floating-point value (FPU). > > DOUBLE, // 64-bit floating-point value (FPU). > > + INT32X4, // SIMD data containing four 32-bit integers (FPU). > > Nit: It'd be nice to keep all the SIMD types next to each other. Huh, I am confused: as of this patch, there is only INT32X4? > This looks like it should use isSIMDType, or whatever it's called, rather > than just checking for INT32X4. Or if we don't yet have such a function, it > would be nice to create it :-). Done (this was in a later patch).
Assignee | ||
Comment 24•10 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #20) > Comment on attachment 8439344 [details] [diff] [review] > 3 - SIMD x86-x64: masm / asm instructions for int32x4 moves > > Review of attachment 8439344 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: js/src/assembler/assembler/X86Assembler.h > @@ +2846,5 @@ > > + { > > + spew("movdqa %s, %s", > > + nameFPReg(src), nameFPReg(dst)); > > + m_formatter.prefix(PRE_SSE_F2); > > + m_formatter.twoByteOp(OP2_MOVDQ_VsdWsd, (RegisterID)dst, (RegisterID)src); > > Should this be OP2_MOVQ_VdqWdq? Good question. I'd like to keep MOVDQ rather than MOVQ (to avoid confusion with (single) quad-words operations). As it operates on int32x4, VdqWdq is the right name. So, OP2_MOVDQ_VdqWdq? I've also changed existing uses of the reciprocal operation (WsdVsd) to follow that convention too.
Assignee | ||
Comment 25•10 years ago
|
||
We do actually need these in some places in Odin where we don't have control on the caller's stack pointer. (and if we don't actually need these, i'd prefer to have that done in follow up bugs)
Attachment #8461660 -
Flags: review?(sunfish)
Assignee | ||
Comment 26•10 years ago
|
||
Attachment #8461663 -
Flags: review?(sunfish)
Comment 27•10 years ago
|
||
Comment on attachment 8461660 [details] [diff] [review] Unaligned moves for Int32x4 Review of attachment 8461660 [details] [diff] [review]: ----------------------------------------------------------------- LGTM.
Attachment #8461660 -
Flags: review?(sunfish) → review+
Updated•10 years ago
|
Attachment #8461663 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 28•10 years ago
|
||
An enhanced version of sp alignment in Codegen shared: the attached MIRGraph will be scanned and if there's at least one SIMD instruction, alignment will be forced. To avoid scanning several times, the result value is cached.
Attachment #8439349 -
Attachment is obsolete: true
Attachment #8463356 -
Flags: review?(sunfish)
Comment 29•10 years ago
|
||
Comment on attachment 8463356 [details] [diff] [review] 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present Review of attachment 8463356 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MIRGraph.cpp @@ +48,5 @@ > + end = graph_->rpoEnd(); > + block != end; > + block++) > + { > + for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) { It's ok to use MInstructionIterator here because you don't need to worry about phis, since any reachable phi (or phi cycle) will have at least one instruction as an input. Please add a comment to this effect. @@ +49,5 @@ > + block != end; > + block++) > + { > + for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) { > + if (IsSimdType(inst->type())) { It's ok to ignore instructions which use SIMD but don't have a SIMD result type (e.g. stores) because they'll have at least one SIMD operand so there will be an instruction somewhere with a SIMD result type. At least, this is true now and for the forseeable future. Please comment this too. ::: js/src/jit/shared/CodeGenerator-shared.cpp @@ +74,5 @@ > if (unsigned rem = alignmentAtCall % StackAlignment) > frameDepth_ += StackAlignment - rem; > + > + if (gen->usesSimd()) { > + // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment, I don't understand this comment. frameDepth_ is some function of how many slots the function used, and other things, so it isn't always going to be less than StackAlignment here. I think you want something with % StackAlignment == 0. @@ +84,5 @@ > + frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment; > + // We need at least the frame depth that was available before all > + // adjustments. > + if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth) > + frameDepth_ += StackAlignment; I don't quite understand this code. Is this assuming that at most one StackAlignment is needed to reach Simd alignment? It happens that SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true, but it's not obvious.
Assignee | ||
Comment 30•10 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #29) > Comment on attachment 8463356 [details] [diff] [review] > 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD > instructions are present > > Review of attachment 8463356 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: js/src/jit/MIRGraph.cpp > @@ +48,5 @@ > > + end = graph_->rpoEnd(); > > + block != end; > > + block++) > > + { > > + for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) { > > It's ok to use MInstructionIterator here because you don't need to worry > about phis, since any reachable phi (or phi cycle) will have at least one > instruction as an input. Please add a comment to this effect. > > @@ +49,5 @@ > > + block != end; > > + block++) > > + { > > + for (MInstructionIterator inst = block->begin(); inst != block->end(); inst++) { > > + if (IsSimdType(inst->type())) { > > It's ok to ignore instructions which use SIMD but don't have a SIMD result > type (e.g. stores) because they'll have at least one SIMD operand so there > will be an instruction somewhere with a SIMD result type. At least, this is > true now and for the forseeable future. Please comment this too. Good points. I've added comments. > > ::: js/src/jit/shared/CodeGenerator-shared.cpp > @@ +74,5 @@ > > if (unsigned rem = alignmentAtCall % StackAlignment) > > frameDepth_ += StackAlignment - rem; > > + > > + if (gen->usesSimd()) { > > + // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment, > > I don't understand this comment. frameDepth_ is some function of how many > slots the function used, and other things, so it isn't always going to be > less than StackAlignment here. I think you want something with % > StackAlignment == 0. Hum, maybe just a notation nit: A == -B % C should be read "C divides A+B" here (like in math notation). I understand that there is some confusion with C syntax, so I've made it more explicit by just having (A+B) % C == 0. > > @@ +84,5 @@ > > + frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment; > > + // We need at least the frame depth that was available before all > > + // adjustments. > > + if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth) > > + frameDepth_ += StackAlignment; > > I don't quite understand this code. Is this assuming that at most one > StackAlignment is needed to reach Simd alignment? It happens that > SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true, > but it's not obvious. Wow, good catch. I think it's just a typo and I had in mind SimdStackAlignment. How about instead using SimdStackAlignment, and static assert above that SimdStackAlignment % StackAlignment == 0? (true on all platforms: x86,x64: 16 % (16 or 4), arm: 8 % 8) (if it were not for cases where StackAlignment == 4, we could get rid of SimdStackAlignment, as it's usually equal to StackAlignment)
Flags: needinfo?(sunfish)
Assignee | ||
Updated•10 years ago
|
Attachment #8463356 -
Flags: review?(sunfish)
Comment 31•10 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #30) > > ::: js/src/jit/shared/CodeGenerator-shared.cpp > > @@ +74,5 @@ > > > if (unsigned rem = alignmentAtCall % StackAlignment) > > > frameDepth_ += StackAlignment - rem; > > > + > > > + if (gen->usesSimd()) { > > > + // At this point, we have frameDepth_ == -AlignmentAtAsmJSPrologue % StackAlignment, > > > > I don't understand this comment. frameDepth_ is some function of how many > > slots the function used, and other things, so it isn't always going to be > > less than StackAlignment here. I think you want something with % > > StackAlignment == 0. > > Hum, maybe just a notation nit: A == -B % C should be read "C divides A+B" > here (like in math notation). I understand that there is some confusion with > C syntax, so I've made it more explicit by just having (A+B) % C == 0. Oh, I get it now. I do think that C syntax is best in this context; thanks for changing this. > > > > @@ +84,5 @@ > > > + frameInitialAdjustment_ = frameDepth_ % SimdStackAlignment; > > > + // We need at least the frame depth that was available before all > > > + // adjustments. > > > + if (frameDepth_ - frameInitialAdjustment_ < initialFrameDepth) > > > + frameDepth_ += StackAlignment; > > > > I don't quite understand this code. Is this assuming that at most one > > StackAlignment is needed to reach Simd alignment? It happens that > > SimdStackAlignment is at most 16 and StackAlignment is 8, so this is true, > > but it's not obvious. > > Wow, good catch. I think it's just a typo and I had in mind > SimdStackAlignment. How about instead using SimdStackAlignment, and static > assert above that SimdStackAlignment % StackAlignment == 0? (true on all > platforms: x86,x64: 16 % (16 or 4), arm: 8 % 8) Yeah, that's a reasonable static_assert. I'm still a little confused though. The code for aligning to plain StackAlignment does this: if (unsigned rem = alignmentAtCall % StackAlignment) frameDepth_ += StackAlignment - rem; Why doesn't the code for aligning to SimdStackAlignment have to do something similar? > (if it were not for cases where StackAlignment == 4, we could get rid of > SimdStackAlignment, as it's usually equal to StackAlignment) Yeah, this is unfortunately still relevant.
Flags: needinfo?(sunfish)
Assignee | ||
Comment 32•10 years ago
|
||
(In reply to Dan Gohman [:sunfish] from comment #31) > I'm still a little confused though. The code for aligning to plain > StackAlignment does this: > > if (unsigned rem = alignmentAtCall % StackAlignment) > frameDepth_ += StackAlignment - rem; > > Why doesn't the code for aligning to SimdStackAlignment have to do something > similar? I'll try to explain it to make it clearer for everyone (including me :)) The constraints we have are the following: - (frameDepth_ + AsmJSFrameSize) % StackAlignment == 0 (1 -- asmjs call alignment) - (frameDepth_ - frameInitialAdjustment_) % SimdStackAlignment == 0 (2 -- SIMD operand alignment) - (frameDepth_ - frameInitialAdjustment_) >= initialFrameDepth (3 -- the amount of stack depth must be sufficient) (3) tells us that frameInitialAdjustment_ := frameDepth_ % SimdStackAlignment is sufficient. Let's take a concrete example. On x86, with StackAlignment := 16 (assume GCC), if frameDepth_ := 16k + 8, then there's nothing to add to frameDepth_ for the asmjs call restriction (as AsmJSFrameSize is 2 * sizeof(void*) == 8), but frameInitialAdjustment == 8, so (frameDepth_ - 8) == 16k < 16k + 8, so (2) is not satisfied anymore. More generally, under these conditions (x86 and gcc), if frameDepth_ := 16k + r, frameInitialAdjustment_ ends up being 8 all the time and frameDepth_ - frameInitialAdjustment_ < initial frameDepth if r == 4 or r == 8. We actually need to add SimdStackAlignment if the first frame depth fix up is less than the initial adjustment, I think.
Assignee | ||
Comment 33•10 years ago
|
||
Different wording, same principle. As the comment makes this function way fatter than it should be, I moved it to a setupSimdAlignment function, which makes the constructor slimmer.
Attachment #8463356 -
Attachment is obsolete: true
Attachment #8465589 -
Flags: review?(sunfish)
Assignee | ||
Comment 34•10 years ago
|
||
This is the last thing needed for Odin support, at least: every time we use PushRegsInMask (or PopRegsInMask), we need to know whether we want to push 64 or 128 bits of a float register on the stack. The proper long term fix is to have the TypedRegisterSet<FloatRegister> contain a type information about the float register. More precisely, this type information could initially fit in a bit, indicating whether the type is 64 bits or 128 bits. Ideally, we'd want to indicate whether the type is float32, float64, or which SIMD type it is (to use the right SIMD move).
Attachment #8467595 -
Flags: review?(sunfish)
Comment 35•10 years ago
|
||
Comment on attachment 8467595 [details] [diff] [review] 7 - Add an optional simdSet parameter to push and pop regs in mask Review of attachment 8467595 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Benjamin Bouvier [:bbouvier] from comment #34) > The proper long term fix is to have the TypedRegisterSet<FloatRegister> > contain a type information about the float register. More precisely, this > type information could initially fit in a bit, indicating whether the type > is 64 bits or 128 bits. Ideally, we'd want to indicate whether the type is > float32, float64, or which SIMD type it is (to use the right SIMD move). Yeah, this sounds nice. Your current patch looks fine for now though.
Attachment #8467595 -
Flags: review?(sunfish) → review+
Comment 36•10 years ago
|
||
Comment on attachment 8465589 [details] [diff] [review] 5 - SIMD x86-x64: Align stack top on 16 bytes boundaries if SIMD instructions are present Review of attachment 8465589 [details] [diff] [review]: ----------------------------------------------------------------- Cool, this looks good.
Attachment #8465589 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 37•10 years ago
|
||
Attachment #8439343 -
Attachment is obsolete: true
Attachment #8468483 -
Flags: review?(sunfish)
Assignee | ||
Comment 38•10 years ago
|
||
Attachment #8433543 -
Attachment is obsolete: true
Attachment #8468484 -
Flags: review?(sunfish)
Assignee | ||
Comment 39•10 years ago
|
||
Attachment #8439344 -
Attachment is obsolete: true
Attachment #8468485 -
Flags: review?(sunfish)
Assignee | ||
Updated•10 years ago
|
Attachment #8468484 -
Attachment description: bug1019831-2-stackslot.patch → 2 - SIMD x86-x64: Allow stack slot allocator to emit quad slots
Assignee | ||
Comment 41•10 years ago
|
||
Coming from bug 1023404, carrying forward r+ from sunfish.
Attachment #8468489 -
Flags: review+
Assignee | ||
Comment 42•10 years ago
|
||
Attachment #8439345 -
Attachment is obsolete: true
Attachment #8468492 -
Flags: review?(sunfish)
Comment 43•10 years ago
|
||
Comment on attachment 8468483 [details] [diff] [review] 1 - SIMD x86-x64: LIR definition Review of attachment 8468483 [details] [diff] [review]: ----------------------------------------------------------------- LGTM. ::: js/src/jit/LIR.h @@ +446,5 @@ > > void set(uint32_t index, Type type, Policy policy) { > JS_STATIC_ASSERT(MAX_VIRTUAL_REGISTERS <= VREG_MASK); > bits_ = (index << VREG_SHIFT) | (policy << POLICY_SHIFT) | (type << TYPE_SHIFT); > + JS_ASSERT_IF(isSimdType(), SupportsSimd); This would be slightly nicer as JS_ASSERT_IF(!SupportsSimd, !isSimdType()) since SupportsSimd is the less varying of the two expressions.
Attachment #8468483 -
Flags: review?(sunfish) → review+
Updated•10 years ago
|
Attachment #8468484 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 44•10 years ago
|
||
carrying forward r+ from sunfish
Attachment #8465589 -
Attachment is obsolete: true
Attachment #8468496 -
Flags: review+
Assignee | ||
Comment 45•10 years ago
|
||
Attachment #8439350 -
Attachment is obsolete: true
Attachment #8468498 -
Flags: review?(sunfish)
Assignee | ||
Comment 46•10 years ago
|
||
carrying forward r+ from sunfish
Attachment #8467595 -
Attachment is obsolete: true
Attachment #8468503 -
Flags: review+
Updated•10 years ago
|
Attachment #8468485 -
Flags: review?(sunfish) → review+
Updated•10 years ago
|
Attachment #8468487 -
Flags: review?(sunfish) → review+
Comment 47•10 years ago
|
||
Comment on attachment 8468492 [details] [diff] [review] 6 - SIMD x86-x64: MoveResolver and MoveEmitter for Int32x4 Review of attachment 8468492 [details] [diff] [review]: ----------------------------------------------------------------- MoveEmitterX86::maybeEmitOptimizedCycle currently emits an xorpd sequence if all moves are "isFloatReg()" and other conditions hold. Ideally characterizeCycle should be extended to look at the move types too, so that maybeEmitOptimizedCycle can emit pxor or xorps as appropriate. If you don't want to do that in this patch, we can do it later; xorpd is functionally correct here. ::: js/src/jit/shared/MoveEmitter-x86-shared.cpp @@ +282,5 @@ > // saved value of B, to A. > switch (type) { > + case MoveOp::INT32X4: > + JS_ASSERT(pushedAtCycle_ != -1); > + JS_ASSERT(pushedAtCycle_ - pushedAtStart_ >= 4 * sizeof(int32_t)); Instead of computing 4 * sizeof(int32_t), could this use Simd128Size (or whatever it's called)?
Attachment #8468492 -
Flags: review?(sunfish) → review+
Updated•10 years ago
|
Attachment #8468498 -
Flags: review?(sunfish) → review+
Assignee | ||
Comment 48•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=7cc3f4f2a4e8
Comment 49•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/b0117ac74441 https://hg.mozilla.org/mozilla-central/rev/6120f6d2c90e https://hg.mozilla.org/mozilla-central/rev/113476b5c52b https://hg.mozilla.org/mozilla-central/rev/9c8ed786f307 https://hg.mozilla.org/mozilla-central/rev/a59517b4353b https://hg.mozilla.org/mozilla-central/rev/8333f43d2bfe https://hg.mozilla.org/mozilla-central/rev/b8887016ec4f https://hg.mozilla.org/mozilla-central/rev/c18202c2ccd0 https://hg.mozilla.org/mozilla-central/rev/7d603367bda5 https://hg.mozilla.org/mozilla-central/rev/55117245d4ac https://hg.mozilla.org/mozilla-central/rev/7cc3f4f2a4e8
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Assignee | ||
Comment 50•10 years ago
|
||
For completeness and consistency, these should be in this bug too. These are just stubs for the load{Una,A}ligned{Int,Float}32x4 methods.
Attachment #8472204 -
Flags: review?(sunfish)
Updated•10 years ago
|
Attachment #8472204 -
Flags: review?(sunfish) → review+
You need to log in
before you can comment on or make changes to this bug.
Description
•