Closed Bug 916167 Opened 12 years ago Closed 12 years ago

Zero destination register before cvtsi2sd to avoid dependency stalls

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: jandem, Assigned: sunfish)

Details

Attachments

(3 files, 2 obsolete files)

V8 recently landed a patch to do this and it was about a 24% win on Kraken audio-oscillator. I just added a xorps call to AssemblerX86Shared::cvtsi2sd and we get a similar win, so let's do this. See also bug 474654, a similar bug for nanojit/TM. A quote from Moh (Intel): “When cvtsi2sd is used for int->double conversion, the destination register is only partially written. Register renaming cannot properly handle the subsequent instructions that use the value. The "partial register stall" problem gets worse when you have it in a loop, for which case false cross-dependence relations are created. A solution is to clear out the destination register before the cvtsi2sd instruction using pxor.”
Attached patch dependence-breaking.patch (obsolete) — Splinter Review
Here's a patch which implements this.
Assignee: general → sunfish
Attachment #805166 - Flags: review?(jdemooij)
And here's a patch which implements a related feature, using movaps/movapd for register-to-register moves, since register-to-register movss/movsd don't write the whole destination register.
Attachment #805167 - Flags: review?(jdemooij)
Comment on attachment 805166 [details] [diff] [review] dependence-breaking.patch Review of attachment 805166 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for taking this! This is what I used to quickly test this in comment 0, but it's probably better to add masm.xorps/xorpd calls to convertInt32ToDouble and convertInt32ToFloat, and call these functions where we call masm.cvt* now. When you see masm.cvtsi2sd you expect it to only emit a cvtsi2sd instruction I think, and there are probably cases where the extra xor is not necessary..
Attachment #805166 - Flags: review?(jdemooij)
Comment on attachment 805167 [details] [diff] [review] dependence-breaking-moves.patch Review of attachment 805167 [details] [diff] [review]: ----------------------------------------------------------------- Nice find! ::: js/src/jit/shared/Assembler-x86-shared.h @@ +411,5 @@ > void movsd(const FloatRegister &src, const FloatRegister &dest) { > JS_ASSERT(HasSSE2()); > + // Use movapd instead of movsd so that we define the entire output > + // register and avoid false dependencies. > + masm.movapd_rr(src.code(), dest.code()); It would be nice if we could rename movsd(reg, reg) to movapd(reg, reg), but that name won't work for movsd(operand, reg). So maybe it's best to be consistent and use movsd everywhere else like you're doing here... It's a bit unfortunate because we're not really emitting a movsd instruction but I can't think of a better solution. We could rename movsd(operand, reg) to loadDouble but a reg => reg move is not really a load and loadDouble will probably clash with loadDouble in the MacroAssembler... What do you think?
Attachment #805167 - Flags: review?(jdemooij) → review+
> ::: js/src/jit/shared/Assembler-x86-shared.h > @@ +411,5 @@ > > void movsd(const FloatRegister &src, const FloatRegister &dest) { > > JS_ASSERT(HasSSE2()); > > + // Use movapd instead of movsd so that we define the entire output > > + // register and avoid false dependencies. > > + masm.movapd_rr(src.code(), dest.code()); > > It would be nice if we could rename movsd(reg, reg) to movapd(reg, reg), but > that name won't work for movsd(operand, reg). So maybe it's best to be > consistent and use movsd everywhere else like you're doing here... > > It's a bit unfortunate because we're not really emitting a movsd > instruction but I can't think of a better solution. We could rename > movsd(operand, reg) to loadDouble but a reg => reg move is not really a load > and loadDouble will probably clash with loadDouble in the MacroAssembler... > What do you think? I think this use of movsd is consistent with the way the code is already using it. I agree that it's a little surprising that an interface named movsd doesn't just do a movsd, but I think the problem is that movsd is currently being used as a general-purpose move interface. I'd like to migrate users of movsd over to loadDouble/storeDouble/moveDouble and make those be the primary move interface, so that movsd can go back to just being the movsd instruction. I'll look into this and submit a separate patch. On a related note, I think it's odd that loadDouble and storeDouble are overloaded to take Operands, since Operands can be registers too. What if we make loadDouble and storeDouble just load and store, and give moveDouble overloads for Operand instead?
Whiteboard: [leave open]
(In reply to Dan Gohman [:sunfish] from comment #5) > I think this use of movsd is consistent with the way the code is already > using it. I agree that it's a little surprising that an interface named > movsd doesn't just do a movsd, but I think the problem is that movsd is > currently being used as a general-purpose move interface. I'd like to > migrate users of movsd over to loadDouble/storeDouble/moveDouble and make > those be the primary move interface, so that movsd can go back to just being > the movsd instruction. I'll look into this and submit a separate patch. Sounds good. > On a related note, I think it's odd that loadDouble and storeDouble are > overloaded to take Operands, since Operands can be registers too. What if we > make loadDouble and storeDouble just load and store, and give moveDouble > overloads for Operand instead? We don't use "move" for loads/stores usually (for instance movePtr/move32 should not access memory) but moveDouble is fine here I think, I can't come up with a better name :)
(In reply to Jan de Mooij [:jandem] from comment #6) > We don't use "move" for loads/stores usually (for instance movePtr/move32 > should not access memory) but moveDouble is fine here I think, I can't come > up with a better name :) Hm on second thought, I see: void load32(const Operand &src, Register dest) { movl(src, dest); } And not move32(Operand, Register). So maybe it's best to keep loadDouble/storeDouble with Operand dest/source for consistency...
Attached patch dependence-breaking-reorg.patch (obsolete) — Splinter Review
(In reply to Jan de Mooij [:jandem] from comment #7) > (In reply to Jan de Mooij [:jandem] from comment #6) > > We don't use "move" for loads/stores usually (for instance movePtr/move32 > > should not access memory) but moveDouble is fine here I think, I can't come > > up with a better name :) > > Hm on second thought, I see: > > void load32(const Operand &src, Register dest) { > movl(src, dest); > } > > And not move32(Operand, Register). So maybe it's best to keep > loadDouble/storeDouble with Operand dest/source for consistency... What if we renamed the Operand form of load32 to move32 also? Attached is a proof-of-concept patch which implements the loadDouble/storeDouble changes and also implements this renaming for all kinds of moves.
(In reply to Jan de Mooij [:jandem] from comment #3) > Comment on attachment 805166 [details] [diff] [review] > dependence-breaking.patch > > Review of attachment 805166 [details] [diff] [review]: > ----------------------------------------------------------------- > > Thanks for taking this! > > This is what I used to quickly test this in comment 0, but it's probably > better to add masm.xorps/xorpd calls to convertInt32ToDouble and > convertInt32ToFloat, and call these functions where we call masm.cvt* now. > When you see masm.cvtsi2sd you expect it to only emit a cvtsi2sd instruction > I think, and there are probably cases where the extra xor is not necessary.. Ok. Here's a new version of this patch which implements this, and converts most users over to convertInt32ToDouble.
Attachment #805166 - Attachment is obsolete: true
Attachment #805793 - Flags: review?(jdemooij)
Comment on attachment 805793 [details] [diff] [review] dependence-breaking.patch Review of attachment 805793 [details] [diff] [review]: ----------------------------------------------------------------- Nice.
Attachment #805793 - Flags: review?(jdemooij) → review+
(In reply to Dan Gohman [:sunfish] from comment #8) > What if we renamed the Operand form of load32 to move32 also? Attached is a > proof-of-concept patch which implements the loadDouble/storeDouble changes > and also implements this renaming for all kinds of moves. Hm I don't know, the Operand versions are used most of the time for memory locations I think and I'd really like to have load/store in the name (also to keep people from introducing bug 812280 again). Problem is that neither move vs load/store is correct in all cases :(
(In reply to Jan de Mooij [:jandem] from comment #11) > Hm I don't know, the Operand versions are used most of the time for memory > locations I think and I'd really like to have load/store in the name (also > to keep people from introducing bug 812280 again). Problem is that neither > move vs load/store is correct in all cases :( However, this problem goes way beyond move. What about add, or sub? Do they dereference their Operand, in the sence of bug 812280? This kind of thing is common throughout the codebase. I don't think the names "load" and "store" support a viable mental model for reasoning about what things are dereferenced and what aren't. A mental model that I do think works is that an Operand can be talking about either memory or an undereferenced register, as determined by its kind(), and that any instruction which takes an Operand performs whatever kind of access its kind() implies, regardless of whether the operation is add, sub, or move. In the current code, its more confusing than it needs to be since Operand(reg) is a register and Operand(reg, 0) is memory, but this can be fixed. Attached is a new patch which takes the earlier patch and takes it a step further. It removes the confusing constructors from Operator, and introduces a new useful invariant: In order to talk to memory in the MacroAssembler API, you have to construct an Address, BaseIndex, or AbsoluteAddress (with the exception of some special interfaces on ARM). If you don't do one of those things, you're not talking to memory. If you do do one of those things, you are talking to memory (with the exception of lea/computeEffectiveAddress, which remains special). It only works on x64 at the moment, and it probably depends on some other cleanup patches in my tree, but I can clean up those details if you think this approach is worth pursuing.
Attachment #805779 - Attachment is obsolete: true
The initial movaps/movapd patch checked in here: https://hg.mozilla.org/integration/mozilla-inbound/rev/46c82e844637
The cvtsi2sd/cvtsi2ss patch checked in here: https://hg.mozilla.org/integration/mozilla-inbound/rev/a4bf8390deb0 I'll open a new bug for the changes under discussion.
Whiteboard: [leave open]
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
For the record, going back over awfy graphs, I think this patch was responsible for a 10% win on box2d on x64.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: