Closed Bug 885186 Opened 11 years ago Closed 11 years ago

Optimize MoveEmitter for x86/x64

Categories

(Core :: JavaScript Engine, enhancement)

Platform: x86_64 All
Type: enhancement
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla25

People

(Reporter: sunfish, Unassigned)

References

Details

Attachments

(3 files, 1 obsolete file)

x86/x64 offer several fun ways to implement MoveGroups: xchg can swap two GPRs directly, push+pop can implement moves without requiring a temporary register, and xorpd can be used to perform an xor-swap on xmm registers without requiring stack storage.
Depends on: 885183
Attached patch a proposed fix (obsolete) — Splinter Review
This patch contains some changes which are not yet tested on x86 (32-bit), though I will do that soon.
Attachment #765184 - Flags: review?(jdemooij)
Attached patch an updated patch — Splinter Review
And it turns out that there was a bug in the untested code. The pop instruction computes the effective address of its memory operand *after* incrementing the stack pointer, so the code in toOperand needs to adjust the offset when the result is the destination of a pop.

The attached patch adds code to account for this.
Attachment #765184 - Attachment is obsolete: true
Attachment #765184 - Flags: review?(jdemooij)
Attachment #765514 - Flags: review?(jdemooij)
Comment on attachment 765514 [details] [diff] [review]
an updated patch

Review of attachment 765514 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. Good to see spilledReg_ go, it could be pretty confusing.

::: js/src/ion/shared/MoveEmitter-x86-shared.cpp
@@ +127,5 @@
> +            // Otherwise use the stack.
> +            breakCycle(to, move.kind());
> +            inCycle_ = true;
> +        }
> +    

Nit: some trailing whitespace

::: js/src/ion/shared/MoveEmitter-x86-shared.h
@@ +37,3 @@
>  
> +    size_t characterizeCycle(const MoveResolver &moves, size_t i,
> +                             bool &allGeneralRegs, bool &allFloatRegs);

Use pointers for the booleans here, to make it clear they are outparams.
Attachment #765514 - Flags: review?(jdemooij) → review+
https://hg.mozilla.org/mozilla-central/rev/eb8d520f9954
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
This appears to have slowed down the asm primes benchmark on awfy by 10%. The regression range on awfy is invalid, so I bisected manually on my machine.
The two codegen changes from this patch on the asm primes benchmark are these:

-[Codegen] push       %rdx
-[Codegen] leaq       0x28(%r10), %rdx
-[Codegen] movq       %rdx, 0x8(%rsp)
+[Codegen] leaq       0x28(%r10), %r11
+[Codegen] movq       %r11, 0x0(%rsp)
 [Codegen] leaq       0x20(%r10), %r9
 [Codegen] leaq       0x18(%r10), %r8
 [Codegen] movq       0x10(%r10), %rcx
 [Codegen] movq       0x8(%r10), %rdx
 [Codegen] movq       0x0(%r10), %rsi
-[Codegen] addq       $0x8, %rsp

which looks good, and several instances of diffs like this:

-[Codegen] subq       $0x8, %rsp
-[Codegen] movq       %rax, 0x0(%rsp)
-[Codegen] movq       %rcx, %rax
-[Codegen] movq       0x0(%rsp), %rcx
-[Codegen] addq       $0x8, %rsp
+[Codegen] xchgq      %rax, %rcx

which is good as long as reg-to-reg xchgq isn't slow. According to Agner Fog's instruction tables, it's at most 3 cycles on all the CPUs listed.

On my Sandy Bridge, microbenchmarks of the swap using xchgq are twice as fast as the swap using sub, store, mov, load, add.

What CPU are you seeing the slowdowns on? I'll continue to look into this.
Attached file regressing benchmarks
My local machine runs 64-bit Linux and reports |Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz|. I also see that zlib slows down by 1.5% with this commit (less than on awfy; that's the second of two regressions that landed at the same time).

I did build the benchmarks locally; I doubt the difference matters, but I've attached them in case you want to run exactly the same ones as me.

Your diffs there do look good, so perhaps the benefits of this patch outweigh the downsides?
Attached patch nops for SPEED — Splinter Review
I can reproduce it, but only with the attached testcases. With the precompiled versions in the AWFY repository, I see the opposite results - the MoveEmitter patch speeds up primes by 10%, and zlib is sped up slightly.

The primes regression appears to be entirely due to code alignment or some other microarchitectural quirk. The attached patch simply inserts 16 nops after every xchg, so it occupies the same code size as the sequence it replaces. With this patch, primes speeds back up by 10%.
Interesting. I guess the regression is not significant then.
