Folds Lsh/Rsh same bits to SignExntend

RESOLVED FIXED in Firefox 51

Status

()

defect
P1
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: hev, Assigned: hev)

Tracking

(Blocks 1 bug)

unspecified
mozilla51
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox51 fixed)

Details

Attachments

(4 attachments, 4 obsolete attachments)

Assignee

Description

3 years ago
e.g.

original instructions:
Lsh v1, v0, 24
Rsh v2, v1, 24

folds to sign-extend:
SignExtend(byte) v2, v0
Assignee

Comment 1

3 years ago
Simulate Test Results:
==== x86_64 Intel i7-4790 3.60GHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
        .text

        .global main

main:
        movabs $0x6fc23ac00,%rdx
        mov    $0xff,%ecx
1:
#if 0
        movsbl %cxl,%ecx
#else
        sall   $24, %ecx
        sarl   $24, %ecx
#endif
        sub    $0x1,%rdx
        jne    1b
        xor    %eax,%eax
        retq

        .end    main
        .size   main, . - main

Results (Unit: second) Test Loop count = 30000000000:
Sign-extned: 8.497, 8.283, 8.083, 8.103, 8.463, 8.613, 8.137, 8.827, 8.533, 8.267
    Average: 8.3806
Shift      : 15.067, 15.18, 15.09, 15.097, 15.243, 15.163, 15.027, 15.027, 15.043, 15.03
    Average: 15.0967


==== ARM Rpi2 Model-B ARMv7 900MHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
        .text
        .align  2

        .global main

main:
        movw    r0, #0x5e00
        movt    r0, #0xb2d0
        mov     r3, #0xff
1:
#if 0
        sxtb    r3, r3
#else
        mov     r3, r3, asl #24
        mov     r3, r3, asr #24
#endif

        subs    r0, r0, #1
        bne     1b

        bx      lr

        .end    main
        .size   main, . - main

Results (Unit: second) Test Loop count = 3000000000:
Sign-extned: 6.84, 6.69, 6.71, 6.68, 6.7, 6.71, 6.71, 6.69, 6.71, 6.69
    Average: 6.713
Shift      : 13.53 ,13.39, 13.42, 13.39, 13.42, 13.38, 13.42, 13.39, 13.39, 13.39
    Average: 13.412


==== MIPS Loongson-3A2000 1.0GHz & Ci20 1.0GHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
	.text
	.align	2

	.global	main
	.ent	main, 0
	.type	main, @function

main:
	li	$v0, 3000000000
	li	$t0, 0xff

1:
#if 0
	seb	$t0, $t0
#else
	sll	$t0, 24
	sra	$t0, 24
#endif
	addiu	$v0, -1
	bnez	$v0, 1b

	jr	$ra

	.end	main
	.size	main, . - main

Results (Unit: second) Test Loop count = 3000000000:
Loongson-3A2000:
Sign-extned: 6.02, 6.023, 6.02, 6.02, 6.02, 6.023, 6.02, 6.023, 6.016, 6.02
    Average: 6.0205
Shift      : 9.031, 9.027, 9.031, 9.027, 9.027, 9.031, 9.027, 9.031, 9.031, 9.031
    Average: 9.0294

Ci20:
Sign-extned: 12.51, 12.53, 12.52, 12.54, 12.53, 12.53, 12.52, 12.54, 12.53, 12.53
    Average: 12.528
Shift      : 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51
    Average: 12.51
Comment on attachment 8780423 [details] [diff] [review]
0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch

Review of attachment 8780423 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, the main remark from nits bellow is that this code is too redundant, and you can factor most of it and leave the differences for the */MacroAssembler-*-inl.h implementations.

::: js/src/jit/MIR.cpp
@@ +3004,5 @@
> +{
> +    MDefinition* lhs = getOperand(0);
> +    MDefinition* rhs = getOperand(1);
> +
> +    if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op())

nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is more common in the code base, and less verbose.

::: js/src/jit/MIR.h
@@ +5994,5 @@
> +  public:
> +    enum Mode {
> +        Byte,
> +        Half,
> +        Word,

nit: Word is not handled anywhere, remove it, and remove the default clauses from the switch statements, as these would generate a compiler warning if the switch is not complete.

@@ +6014,5 @@
> +        return new(alloc) MSignExtend(op, mode);
> +    }
> +    static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) {
> +        return new(alloc) MSignExtend(op, mode);
> +    }

nit: replace these New functions, including the NewAsmJS by the TRIVIAL_NEW_WRAPPER macro.

@@ +6018,5 @@
> +    }
> +
> +    Mode mode() { return mode_; }
> +
> +    ALLOW_CLONE(MSignExtend)

follow-up: Add a recover instruction for MSignExtend. (see js/src/jit/Recover.h)

::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +821,5 @@
> +    Register out = ToRegister(ins->output());
> +
> +    switch (ins->mode()) {
> +      case MSignExtend::Byte:
> +        masm.as_seb(out, input);

nit: Move this CodeGenerator function to the generic CodeGenerator, and add the following signature to the MacroAssembler.h file:

  inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH
  inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH;

Add these functions under the "Move instructions" section.

::: js/src/jit/x86/Lowering-x86.cpp
@@ +700,5 @@
> +
> +void
> +LIRGeneratorX86::visitSignExtend(MSignExtend* ins)
> +{
> +    define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);

Looking at the Intel documentation does not seem to mention any specific register allocation for MOVSX instructions.

nit: This implies that we have the same LIRGenetator for all platform, thus move this to the Lowering.cpp file instead of having one for each architecture.
Attachment #8780423 - Flags: review?(nicolas.b.pierron) → feedback+
Assignee

Comment 4

3 years ago
Thank you for review. :)

(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> Comment on attachment 8780423 [details] [diff] [review]
> 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch
> 
> Review of attachment 8780423 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice work, the main remark from nits bellow is that this code is too
> redundant, and you can factor most of it and leave the differences for the
> */MacroAssembler-*-inl.h implementations.
> 
> ::: js/src/jit/MIR.cpp
> @@ +3004,5 @@
> > +{
> > +    MDefinition* lhs = getOperand(0);
> > +    MDefinition* rhs = getOperand(1);
> > +
> > +    if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op())
> 
> nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is
> more common in the code base, and less verbose.
> 
> ::: js/src/jit/MIR.h
> @@ +5994,5 @@
> > +  public:
> > +    enum Mode {
> > +        Byte,
> > +        Half,
> > +        Word,
> 
> nit: Word is not handled anywhere, remove it, and remove the default clauses
> from the switch statements, as these would generate a compiler warning if
> the switch is not complete.
> 
> @@ +6014,5 @@
> > +        return new(alloc) MSignExtend(op, mode);
> > +    }
> > +    static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) {
> > +        return new(alloc) MSignExtend(op, mode);
> > +    }
> 
> nit: replace these New functions, including the NewAsmJS by the
> TRIVIAL_NEW_WRAPPER macro.
> 
> @@ +6018,5 @@
> > +    }
> > +
> > +    Mode mode() { return mode_; }
> > +
> > +    ALLOW_CLONE(MSignExtend)
> 
> follow-up: Add a recover instruction for MSignExtend. (see
> js/src/jit/Recover.h)

How to verify recover instruction is correct?

> 
> ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
> @@ +821,5 @@
> > +    Register out = ToRegister(ins->output());
> > +
> > +    switch (ins->mode()) {
> > +      case MSignExtend::Byte:
> > +        masm.as_seb(out, input);
> 
> nit: Move this CodeGenerator function to the generic CodeGenerator, and add
> the following signature to the MacroAssembler.h file:
> 
>   inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH
>   inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH;
> 
> Add these functions under the "Move instructions" section.
> 
> ::: js/src/jit/x86/Lowering-x86.cpp
> @@ +700,5 @@
> > +
> > +void
> > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins)
> > +{
> > +    define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
> 
> Looking at the Intel documentation does not seem to mention any specific
> register allocation for MOVSX instructions.

At the start, The LIR::visitSignExtend is a common function, but jit-test is busted on x86 (32-bit).
Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a subset of GPRs can be allocated here, not only %ebx. Do you know which use?? function is appropriate?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358

	movsbl %dil,%ecx

test.S: Assembler messages:
test.S:8: Error: bad register name `%dil'

or

js/src/jit/Lowering.cpp:

void
LIRGenerator::visitSignExtend(MSignExtend* ins)
{
#ifdef JS_CODEGEN_X86
    define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
#else
    define(new(alloc()) LSignExtend(useRegisterAtStart(ins->input()), ins->mode()), ins);
#endif
}

What do you think?

> 
> nit: This implies that we have the same LIRGenetator for all platform, thus
> move this to the Lowering.cpp file instead of having one for each
> architecture.
Flags: needinfo?(nicolas.b.pierron)
Assignee

Comment 5

3 years ago
Attachment #8780423 - Attachment is obsolete: true
(In reply to Heiher [:hev] from comment #4)
> (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > > +    ALLOW_CLONE(MSignExtend)
> > 
> > follow-up: Add a recover instruction for MSignExtend. (see
> > js/src/jit/Recover.h)
> 
> How to verify recover instruction is correct?

Have a look at
http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-with-rinstructions.js

> > Looking at the Intel documentation does not seem to mention any specific
> > register allocation for MOVSX instructions.
> 
> At the start, The LIR::visitSignExtend is a common function, but jit-test is
> busted on x86 (32-bit).
> Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g.
> movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a
> subset of GPRs can be allocated here, not only %ebx. Do you know which use??
> function is appropriate?
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
> 
> 	movsbl %dil,%ecx
> 
> test.S: Assembler messages:
> test.S:8: Error: bad register name `%dil'
> […]
> 
> What do you think?

Currently the register allocator is not aware of byte-size GPR.  Currently we have a function named useByteOpRegister [1] which either takes any register on platform where this is doable (such as x64), or a fixed register (such as x86).

Unfortunately, this function does not make use of the AtStart flag.  I suggest to either add an enum optional argument for AtStart flag, or copy these useByteOpRegister to make a AtStart variant.

[1] http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-32
Flags: needinfo?(nicolas.b.pierron)
Assignee

Comment 7

3 years ago
Attachment #8781468 - Attachment is obsolete: true
Attachment #8781524 - Flags: review?(nicolas.b.pierron)
Assignee

Comment 8

3 years ago
Attachment #8781533 - Flags: review?(nicolas.b.pierron)
Assignee

Comment 9

3 years ago
Attachment #8781534 - Flags: review?(nicolas.b.pierron)
Assignee

Comment 10

3 years ago
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> (In reply to Heiher [:hev] from comment #4)
> > (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > > > +    ALLOW_CLONE(MSignExtend)
> > > 
> > > follow-up: Add a recover instruction for MSignExtend. (see
> > > js/src/jit/Recover.h)
> > 
> > How to verify recover instruction is correct?
> 
> Have a look at
> http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-
> with-rinstructions.js
> 
> > > Looking at the Intel documentation does not seem to mention any specific
> > > register allocation for MOVSX instructions.
> > 
> > At the start, The LIR::visitSignExtend is a common function, but jit-test is
> > busted on x86 (32-bit).
> > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g.
> > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a
> > subset of GPRs can be allocated here, not only %ebx. Do you know which use??
> > function is appropriate?
> > 
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
> > 
> > 	movsbl %dil,%ecx
> > 
> > test.S: Assembler messages:
> > test.S:8: Error: bad register name `%dil'
> > […]
> > 
> > What do you think?
> 
> Currently the register allocator is not aware of byte-size GPR.  Currently
> we have a function named useByteOpRegister [1] which either takes any
> register on platform where this is doable (such as x64), or a fixed register
> (such as x86).
> 
> Unfortunately, this function does not make use of the AtStart flag.  I
> suggest to either add an enum optional argument for AtStart flag, or copy
> these useByteOpRegister to make a AtStart variant.
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-
> 32


Thank you reply. :)
Assignee

Comment 12

3 years ago
Fix build failure on arm.
Attachment #8781533 - Attachment is obsolete: true
Attachment #8781533 - Flags: review?(nicolas.b.pierron)
Attachment #8781853 - Flags: review?(nicolas.b.pierron)
Attachment #8781524 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8781534 [details] [diff] [review]
Part 3 - Add tests for SignExtend.

Review of attachment 8781534 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/dce-with-rinstructions.js
@@ +165,5 @@
> +    if (uceFault_signextend(i) || uceFault_signextend(i))
> +        assertEq(x, 99  /* = (99 << 24) >> 24 */);
> +    var y = (i << 16) >> 16;
> +    if (uceFault_signextend(i) || uceFault_signextend(i))
> +        assertEq(y, 99  /* = (99 << 16) >> 16 */);

This assert won't work as expected, create a new function to test the shifts by 16.
Also add 2 new functions to test with negative numbers. (-1 * i)

The reason this does not work as expect is that "uceFault" ensure the removal of the branch and the assertion is made to check the Recover instruction result.  In this case, recover instructions are executed during the bailout path, thus the first assertion will used the recover instruction path, while the second will compute the result within baseline.
Attachment #8781534 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8781853 [details] [diff] [review]
Part 2 - Folds Lsh/Rsh same bits to SignExntend.

Review of attachment 8781853 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Lowering.cpp
@@ +1307,5 @@
>  
>  void
> +LIRGenerator::visitSignExtend(MSignExtend* ins)
> +{
> +    define(new(alloc()) LSignExtend(useByteOpRegisterAtStart(ins->input()), ins->mode()), ins);

nit: Only use useByteOpRegisterAtStart for MSignExtend::Byte.

::: js/src/vm/Interpreter-inl.h
@@ +766,5 @@
> +template <typename T>
> +static MOZ_ALWAYS_INLINE bool
> +SignExtendOperation(JSContext* cx, HandleValue in, int* out)
> +{
> +    int i;

nit: int32_t
Attachment #8781853 - Flags: review?(nicolas.b.pierron) → review+
Assignee

Comment 16

3 years ago
Attachment #8781534 - Attachment is obsolete: true
Attachment #8782723 - Flags: review?(nicolas.b.pierron)
Attachment #8782723 - Flags: review?(nicolas.b.pierron) → review+

Comment 17

3 years ago
Pushed by r@hev.cc:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3172e3fa6e24
Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/defd76119eda
Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d05a40172ca
Part 3: Add tests for SignExtend. r=nbp
sorry had to back this out for Spider Monkey ARM64 bustage, e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-inbound#L979
Flags: needinfo?(r)

Comment 19

3 years ago
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/835eee39bb16
Backed out changeset 4d05a40172ca for Spider Monkey ARM64 bustage
https://hg.mozilla.org/integration/mozilla-inbound/rev/6568553f7432
Backed out changeset defd76119eda 
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9dcfd11f12d
Backed out changeset 3172e3fa6e24

Comment 20

3 years ago
Pushed by r@hev.cc:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b67594606d54
Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/803ae1fb9740
Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/d547167d7b56
Part 3: Add tests for SignExtend. r=nbp
Assignee

Comment 21

3 years ago
(In reply to Iris Hsiao [:ihsiao] from comment #18)
> sorry had to back this out for Spider Monkey ARM64 bustage, e.g.,
> https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-
> inbound#L979

Fixed, thank you.
Flags: needinfo?(r)

Comment 23

3 years ago
Hi,
I'm running into severe compilation time issues with recent versions of SpiderMonkey when running a (very) large asm.js file. I've bisected the problem down to patch part 2 of this bug. My first guess at what's going wrong is that the new MRSh::foldsTo function just gives up if it's not looking at a sign extension, rather than calling MBinaryBitwiseInstruction::foldsTo.

The details:
I'm working on a gcc (binutils,glibc) backend for asmjs (and wasm) at https://github.com/pipcet/asmjs. I'm producing large switch statements, one for each function, rather than using an Emscripten-like relooper. 

When compiling Perl, the Perl_yylex function turns into about 30,000 lines of code. When switching from SpiderMonkey-as-of-a-few-months ago to the current inbound branch, compilation time drastically increased, from one minute real time (which is painful to begin with) to about three minutes. Most of that time was indeed spent compiling Perl_yylex, and the problem appeared to be in the Ion register allocators. --wasm-always-baseline resulted in acceptable startup times, and nodejs --harmony similarly didn't seem to be too problematic.

I bisected it down to part 2 of the patch for this bug, noticed the lack of inheritance for ::foldsTo, tried fixing it, and performance appears to have gone back to the previous level.

I've attached a proposed patch which applies to the current (well, a few days old) inbound tree. If it looks okay I'll convert to SVN format.

Comment 24

3 years ago
I believe this fixes a minor issue that was causing major compile time performance degradation.
pipcet, thanks for the patch! Can you open a new bug for this and ask Nicolas (the reviewer of the patches in this bug) for feedback on your patch?

This bug is closed as the original issue it was about is fixed and all patches have been landed. Attaching a new patch here runs the risk of it being overlooked.
Flags: needinfo?(pipcet)

Comment 26

3 years ago
Thanks! I've done that at bug 1318926. Do let me know if there's anything else I can do (I'll try whittling down the problematic script to something manageable in size to see whether that still triggers the bug).
Flags: needinfo?(pipcet)
Blocks: jsperf
Priority: -- → P1
You need to log in before you can comment on or make changes to this bug.