Closed
Bug 876064
Opened 12 years ago
Closed 12 years ago
IonMonkey: Use a double constant pool on x86/x64, fold loads into operands
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla25
People
(Reporter: luke, Unassigned)
References
Details
Attachments
(2 files, 1 obsolete file)
|
16.20 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
|
10.34 KB,
patch
|
sstangl
:
review+
|
Details | Diff | Splinter Review |
As a first step, we could load doubles into xmm registers with a single instruction instead of what we have now
pcmpeqw %xmm1,%xmm1
psllq $0x36,%xmm1
psrlq $0x2,%xmm1
or
movabsq $0x3ff0000000000000, %r11
movq %r11, %xmm1
I have an old, simple WIP patch (attached) that does this for x64 using rip-relative loads, but we could do the same thing with normal loads on x86.
Better than that, we could fold this constant-double-load into the instructions that consume them (like we do with integer immediates with emitAtUses). This involves converting a bunch of double arith functions to useRegisterOrConstant().
One subtlety, though, is that, unlike with integer literals, several instructions using the same double constant, we probably want to load the constant into a register so it can be reused. Ideally, we'd integrate this with regalloc so that we don't unnecessarily extend live ranges. Jan, how hard would it be to get the regalloc to do this? This seems related to what we do with STACK_SLOTs.
| Reporter | ||
Updated•12 years ago
|
Summary: IonMonkey: Use a constant pool on x86/x64, fold loads into operands → IonMonkey: Use a double constant pool on x86/x64, fold loads into operands
Comment 1•12 years ago
|
||
This takes the NonassertingLabel part of the original patch, updates it, and replaces the HeapLabel class, which had been introduced in the meantime to serve the same purpose.
Attachment #768943 -
Flags: review?(luke)
Comment 2•12 years ago
|
||
This patch updates the original patch to implement rip-relative constant-pool loads for double constants.
It also makes a few minor cleanups for x86 for consistency with x64.
Attachment #754034 -
Attachment is obsolete: true
Attachment #768945 -
Flags: review?(luke)
| Reporter | ||
Updated•12 years ago
|
Attachment #768943 -
Flags: review?(luke) → review+
| Reporter | ||
Comment 3•12 years ago
|
||
Comment on attachment 768945 [details] [diff] [review]
the main patch
Review of attachment 768945 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good, but probalby sstangl or jan or hannes should review.
::: js/src/ion/x64/MacroAssembler-x64.cpp
@@ +27,5 @@
> + return;
> + }
> + size_t doubleIndex;
> + DoubleMap::AddPtr p = doubleMap_.lookupForAdd(d);
> + if (p) {
You can write if (DoubleMap::AddPtr p = ...) (and p will be in scope for the 'else').
Attachment #768945 -
Flags: review?(luke) → review?(sstangl)
Comment 4•12 years ago
|
||
Comment on attachment 768945 [details] [diff] [review]
the main patch
Review of attachment 768945 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good. Could we measure effect on benchmarks before landing?
::: js/src/ion/x64/MacroAssembler-x64.cpp
@@ +52,5 @@
> +
> +void
> +MacroAssemblerX64::finish()
> +{
> + if (!doubles_.empty())
It would be useful here to JS_STATIC_ASSERT that CodeAlignment >= sizeof(double).
::: js/src/ion/x64/MacroAssembler-x64.h
@@ +49,5 @@
> + double value;
> + NonAssertingLabel uses;
> + Double(double value) : value(value) {}
> + };
> + Vector<Double, 0, SystemAllocPolicy> doubles_;
SystemAllocPolicy allocates into the heap, while IonAllocPolicy allocates into a (logically) thread-local temporary buffer that persists until the end of compilation. Can we use IonAllocPolicy here and below?
::: js/src/ion/x86/MacroAssembler-x86.cpp
@@ +17,5 @@
>
> void
> MacroAssemblerX86::loadConstantDouble(double d, const FloatRegister &dest)
> {
> + if (maybeInlineDouble(mozilla::BitwiseCast<uint64_t>(d), dest))
It may be nicer for maybeInlineDouble to take a double, performing the BitwiseCast internally.
Attachment #768945 -
Flags: review?(sstangl) → review+
Comment 5•12 years ago
|
||
I don't see any change on the octane benchmark, but I do see speedups on microbenchmarks like the one in bug 876057.
Comment 6•12 years ago
|
||
Comment 7•12 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/91851a7f561a for rather a lot of asm.js test failures.
Comment 8•12 years ago
|
||
It appears IonAllocPolicy's heap doesn't persist until the end of compilation; asm.js frees it after each function. Is it ok to switch back to SystemAllocPolicy for the constant pool, or would you like me to look for a different approach?
Comment 9•12 years ago
|
||
(In reply to Dan Gohman from comment #8)
> It appears IonAllocPolicy's heap doesn't persist until the end of
> compilation; asm.js frees it after each function. Is it ok to switch back to
> SystemAllocPolicy for the constant pool, or would you like me to look for a
> different approach?
AsmJS doesn't destruct the MacroAssembler after each function; instead, it uses resetForNewCodeGenerator(). I'd prefer to keep the memory in the temporary buffers if it's not a hassle.
Would it be reasonable to make constant pool emission an explicit step during linking? Then we could include constant pool emission and label binding as a step in CodeGenerator::generateAsmJS(), used via GenerateAsmJSCode().
Comment 10•12 years ago
|
||
AsmJS does push a LifoAllocScope around each function compilation though, which looks like it causes all the IonAllocPolicy memory to be reused.
How about ContextAllocPolicy?
Comment 11•12 years ago
|
||
(In reply to Dan Gohman from comment #10)
> AsmJS does push a LifoAllocScope around each function compilation though,
> which looks like it causes all the IonAllocPolicy memory to be reused.
Each function would have to have its own constant pool.
> How about ContextAllocPolicy?
SystemAllocPolicy would be OK. A number of other compilation objects use it for storage, and it's not the end of the world.
| Reporter | ||
Comment 12•12 years ago
|
||
D'oh! Sorry I didn't see this coming; I think my original patch predates the single-macroassembler change to Odin.
Could the constant double pool be moved into the CodeGenerator?
Comment 13•12 years ago
|
||
Luke: The CodeGenerator uses SystemAllocPolicy too :).
I discussed with sstangl on IRC and he agreed with SystemAllocPolicy, at least for now. Of course, anyone with a better idea is free to refactor it.
https://hg.mozilla.org/integration/mozilla-inbound/rev/65b5396b4401
https://hg.mozilla.org/integration/mozilla-inbound/rev/ccb0796f0869
Comment 14•12 years ago
|
||
There was a net regression in kraken on AWFY, but the details are ambiguous. For example, kraken-desaturate got 8.5% better on 64-bit MacOS, but got 5% worse on 32-bit MacOS, while kraken-fft got 3% worse on 64-bit MacOS but 1% better on 32-bit MacOS.
I examined the assembly code for kraken-desaturate and kraken-stanford-crypto-aes and confirmed that the patch implements exactly the intended changes. Constant pools are the textbook way to do floating-point constants on x86, and they are significantly faster than what ion was doing before in some cases, so I'm assuming the slowdowns aren't a significant problem.
These kraken loops are fairly small and really want their floating-point constants hoisted in any case (bug 881390).
Comment 15•12 years ago
|
||
kraken-desaturate:
64bit: 8.5% better, 32bit: 5% worse
I think this is pretty normal, since the pool is only implemented in 64bit and you removed some inline code for loadConstant. So I can understand this would be a 32bit regression.
kraken-fft:
64bit: 3% worse, 32bit: 1% better
This is indeed strange.
Comment 16•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/65b5396b4401
https://hg.mozilla.org/mozilla-central/rev/ccb0796f0869
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
You need to log in
before you can comment on or make changes to this bug.
Description
•