Closed Bug 876064 Opened 12 years ago Closed 12 years ago

IonMonkey: Use a double constant pool on x86/x64, fold loads into operands

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla25

People

(Reporter: luke, Unassigned)

References

Details

Attachments

(2 files, 1 obsolete file)

Attached patch use rip-relative loads (WIP) (obsolete) — Splinter Review
As a first step, we could load doubles into xmm registers with a single instruction instead of what we have now pcmpeqw %xmm1,%xmm1 psllq $0x36,%xmm1 psrlq $0x2,%xmm1 or movabsq $0x3ff0000000000000, %r11 movq %r11, %xmm1 I have an old, simple WIP patch (attached) that does this for x64 using rip-relative loads, but we could do the same thing with normal loads on x86. Better than that, we could fold this constant-double-load into the instructions that consume them (like we do with integer immediates with emitAtUses). This involves converting a bunch of double arith functions to useRegisterOrConstant(). One subtlety, though, is that, unlike with integer literals, several instructions using the same double constant, we probably want to load the constant into a register so it can be reused. Ideally, we'd integrate this with regalloc so that we don't unnecessarily extend live ranges. Jan, how hard would it be to get the regalloc to do this? This seems related to what we do with STACK_SLOTs.
Summary: IonMonkey: Use a constant pool on x86/x64, fold loads into operands → IonMonkey: Use a double constant pool on x86/x64, fold loads into operands
Depends on: 885175
This takes the NonassertingLabel part of the original patch, updates it, and replaces the HeapLabel class, which had been introduced in the meantime to serve the same purpose.
Attachment #768943 - Flags: review?(luke)
Attached patch the main patchSplinter Review
This patch updates the original patch to implement rip-relative constant-pool loads for double constants. It also makes a few minor cleanups for x86 for consistency with x64.
Attachment #754034 - Attachment is obsolete: true
Attachment #768945 - Flags: review?(luke)
Attachment #768943 - Flags: review?(luke) → review+
Comment on attachment 768945 [details] [diff] [review] the main patch Review of attachment 768945 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, but probalby sstangl or jan or hannes should review. ::: js/src/ion/x64/MacroAssembler-x64.cpp @@ +27,5 @@ > + return; > + } > + size_t doubleIndex; > + DoubleMap::AddPtr p = doubleMap_.lookupForAdd(d); > + if (p) { You can write if (DoubleMap::AddPtr p = ...) (and p will be in scope for the 'else').
Attachment #768945 - Flags: review?(luke) → review?(sstangl)
Comment on attachment 768945 [details] [diff] [review] the main patch Review of attachment 768945 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Could we measure effect on benchmarks before landing? ::: js/src/ion/x64/MacroAssembler-x64.cpp @@ +52,5 @@ > + > +void > +MacroAssemblerX64::finish() > +{ > + if (!doubles_.empty()) It would be useful here to JS_STATIC_ASSERT that CodeAlignment >= sizeof(double). ::: js/src/ion/x64/MacroAssembler-x64.h @@ +49,5 @@ > + double value; > + NonAssertingLabel uses; > + Double(double value) : value(value) {} > + }; > + Vector<Double, 0, SystemAllocPolicy> doubles_; SystemAllocPolicy allocates into the heap, while IonAllocPolicy allocates into a (logically) thread-local temporary buffer that persists until the end of compilation. Can we use IonAllocPolicy here and below? ::: js/src/ion/x86/MacroAssembler-x86.cpp @@ +17,5 @@ > > void > MacroAssemblerX86::loadConstantDouble(double d, const FloatRegister &dest) > { > + if (maybeInlineDouble(mozilla::BitwiseCast<uint64_t>(d), dest)) It may be nicer for maybeInlineDouble to take a double, performing the BitwiseCast internally.
Attachment #768945 - Flags: review?(sstangl) → review+
I don't see any change on the octane benchmark, but I do see speedups on microbenchmarks like the one in bug 876057.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/91851a7f561a for rather a lot of asm.js test failures.
It appears IonAllocPolicy's heap doesn't persist until the end of compilation; asm.js frees it after each function. Is it ok to switch back to SystemAllocPolicy for the constant pool, or would you like me to look for a different approach?
(In reply to Dan Gohman from comment #8) > It appears IonAllocPolicy's heap doesn't persist until the end of > compilation; asm.js frees it after each function. Is it ok to switch back to > SystemAllocPolicy for the constant pool, or would you like me to look for a > different approach? AsmJS doesn't destruct the MacroAssembler after each function; instead, it uses resetForNewCodeGenerator(). I'd prefer to keep the memory in the temporary buffers if it's not a hassle. Would it be reasonable to make constant pool emission an explicit step during linking? Then we could include constant pool emission and label binding as a step in CodeGenerator::generateAsmJS(), used via GenerateAsmJSCode().
AsmJS does push a LifoAllocScope around each function compilation though, which looks like it causes all the IonAllocPolicy memory to be reused. How about ContextAllocPolicy?
(In reply to Dan Gohman from comment #10) > AsmJS does push a LifoAllocScope around each function compilation though, > which looks like it causes all the IonAllocPolicy memory to be reused. Each function would have to have its own constant pool. > How about ContextAllocPolicy? SystemAllocPolicy would be OK. A number of other compilation objects use it for storage, and it's not the end of the world.
D'oh! Sorry I didn't see this coming; I think my original patch predates the single-macroassembler change to Odin. Could the constant double pool be moved into the CodeGenerator?
Luke: The CodeGenerator uses SystemAllocPolicy too :). I discussed with sstangl on IRC and he agreed with SystemAllocPolicy, at least for now. Of course, anyone with a better idea is free to refactor it. https://hg.mozilla.org/integration/mozilla-inbound/rev/65b5396b4401 https://hg.mozilla.org/integration/mozilla-inbound/rev/ccb0796f0869
There was a net regression in kraken on AWFY, but the details are ambiguous. For example, kraken-desaturate got 8.5% better on 64-bit MacOS, but got 5% worse on 32-bit MacOS, while kraken-fft got 3% worse on 64-bit MacOS but 1% better on 32-bit MacOS. I examined the assembly code for kraken-desaturate and kraken-stanford-crypto-aes and confirmed that the patch implements exactly the intended changes. Constant pools are the textbook way to do floating-point constants on x86, and they are significantly faster than what ion was doing before in some cases, so I'm assuming the slowdowns aren't a significant problem. These kraken loops are fairly small and really want their floating-point constants hoisted in any case (bug 881390).
kraken-desaturate: 64bit: 8.5% better, 32bit: 5% worse I think this is pretty normal, since the pool is only implemented in 64bit and you removed some inline code for loadConstant. So I can understand this would be a 32bit regression. kraken-fft: 64bit: 3% worse, 32bit: 1% better This is indeed strange.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: