Closed Bug 1294606 Opened 6 years ago Closed 6 years ago

Folds Lsh/Rsh same bits to SignExntend

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: hev, Assigned: hev)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 4 obsolete files)

e.g.

original instructions:
Lsh v1, v0, 24
Rsh v2, v1, 24

folds to sign-extend:
SignExtend(byte) v2, v0
Simulate Test Results:
==== x86_64 Intel i7-4790 3.60GHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
        .text

        .global main

main:
        movabs $0x6fc23ac00,%rdx
        mov    $0xff,%ecx
1:
#if 0
        movsbl %cxl,%ecx
#else
        sall   $24, %ecx
        sarl   $24, %ecx
#endif
        sub    $0x1,%rdx
        jne    1b
        xor    %eax,%eax
        retq

        .end    main
        .size   main, . - main

Results (Unit: second) Test Loop count = 30000000000:
Sign-extned: 8.497, 8.283, 8.083, 8.103, 8.463, 8.613, 8.137, 8.827, 8.533, 8.267
    Average: 8.3806
Shift      : 15.067, 15.18, 15.09, 15.097, 15.243, 15.163, 15.027, 15.027, 15.043, 15.03
    Average: 15.0967


==== ARM Rpi2 Model-B ARMv7 900MHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
        .text
        .align  2

        .global main

main:
        movw    r0, #0x5e00
        movt    r0, #0xb2d0
        mov     r3, #0xff
1:
#if 0
        sxtb    r3, r3
#else
        mov     r3, r3, asl #24
        mov     r3, r3, asr #24
#endif

        subs    r0, r0, #1
        bne     1b

        bx      lr

        .end    main
        .size   main, . - main

Results (Unit: second) Test Loop count = 3000000000:
Sign-extned: 6.84, 6.69, 6.71, 6.68, 6.7, 6.71, 6.71, 6.69, 6.71, 6.69
    Average: 6.713
Shift      : 13.53 ,13.39, 13.42, 13.39, 13.42, 13.38, 13.42, 13.39, 13.39, 13.39
    Average: 13.412


==== MIPS Loongson-3A2000 1.0GHz & Ci20 1.0GHz ====
Source code:
// test.S
// gcc -O3 -o test test.S
	.text
	.align	2

	.global	main
	.ent	main, 0
	.type	main, @function

main:
	li	$v0, 3000000000
	li	$t0, 0xff

1:
#if 0
	seb	$t0, $t0
#else
	sll	$t0, 24
	sra	$t0, 24
#endif
	addiu	$v0, -1
	bnez	$v0, 1b

	jr	$ra

	.end	main
	.size	main, . - main

Results (Unit: second) Test Loop count = 3000000000:
Loongson-3A2000:
Sign-extned: 6.02, 6.023, 6.02, 6.02, 6.02, 6.023, 6.02, 6.023, 6.016, 6.02
    Average: 6.0205
Shift      : 9.031, 9.027, 9.031, 9.027, 9.027, 9.031, 9.027, 9.031, 9.031, 9.031
    Average: 9.0294

Ci20:
Sign-extned: 12.51, 12.53, 12.52, 12.54, 12.53, 12.53, 12.52, 12.54, 12.53, 12.53
    Average: 12.528
Shift      : 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51
    Average: 12.51
Comment on attachment 8780423 [details] [diff] [review]
0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch

Review of attachment 8780423 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work, the main remark from nits bellow is that this code is too redundant, and you can factor most of it and leave the differences for the */MacroAssembler-*-inl.h implementations.

::: js/src/jit/MIR.cpp
@@ +3004,5 @@
> +{
> +    MDefinition* lhs = getOperand(0);
> +    MDefinition* rhs = getOperand(1);
> +
> +    if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op())

nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is more common in the code base, and less verbose.

::: js/src/jit/MIR.h
@@ +5994,5 @@
> +  public:
> +    enum Mode {
> +        Byte,
> +        Half,
> +        Word,

nit: Word is not handled anywhere, remove it, and remove the default clauses from the switch statements, as these would generate a compiler warning if the switch is not complete.

@@ +6014,5 @@
> +        return new(alloc) MSignExtend(op, mode);
> +    }
> +    static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) {
> +        return new(alloc) MSignExtend(op, mode);
> +    }

nit: replace these New functions, including the NewAsmJS by the TRIVIAL_NEW_WRAPPER macro.

@@ +6018,5 @@
> +    }
> +
> +    Mode mode() { return mode_; }
> +
> +    ALLOW_CLONE(MSignExtend)

follow-up: Add a recover instruction for MSignExtend. (see js/src/jit/Recover.h)

::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
@@ +821,5 @@
> +    Register out = ToRegister(ins->output());
> +
> +    switch (ins->mode()) {
> +      case MSignExtend::Byte:
> +        masm.as_seb(out, input);

nit: Move this CodeGenerator function to the generic CodeGenerator, and add the following signature to the MacroAssembler.h file:

  inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH
  inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH;

Add these functions under the "Move instructions" section.

::: js/src/jit/x86/Lowering-x86.cpp
@@ +700,5 @@
> +
> +void
> +LIRGeneratorX86::visitSignExtend(MSignExtend* ins)
> +{
> +    define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);

Looking at the Intel documentation does not seem to mention any specific register allocation for MOVSX instructions.

nit: This implies that we have the same LIRGenetator for all platform, thus move this to the Lowering.cpp file instead of having one for each architecture.
Attachment #8780423 - Flags: review?(nicolas.b.pierron) → feedback+
Thank you for review. :)

(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> Comment on attachment 8780423 [details] [diff] [review]
> 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch
> 
> Review of attachment 8780423 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Nice work, the main remark from nits bellow is that this code is too
> redundant, and you can factor most of it and leave the differences for the
> */MacroAssembler-*-inl.h implementations.
> 
> ::: js/src/jit/MIR.cpp
> @@ +3004,5 @@
> > +{
> > +    MDefinition* lhs = getOperand(0);
> > +    MDefinition* rhs = getOperand(1);
> > +
> > +    if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op())
> 
> nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is
> more common in the code base, and less verbose.
> 
> ::: js/src/jit/MIR.h
> @@ +5994,5 @@
> > +  public:
> > +    enum Mode {
> > +        Byte,
> > +        Half,
> > +        Word,
> 
> nit: Word is not handled anywhere, remove it, and remove the default clauses
> from the switch statements, as these would generate a compiler warning if
> the switch is not complete.
> 
> @@ +6014,5 @@
> > +        return new(alloc) MSignExtend(op, mode);
> > +    }
> > +    static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) {
> > +        return new(alloc) MSignExtend(op, mode);
> > +    }
> 
> nit: replace these New functions, including the NewAsmJS by the
> TRIVIAL_NEW_WRAPPER macro.
> 
> @@ +6018,5 @@
> > +    }
> > +
> > +    Mode mode() { return mode_; }
> > +
> > +    ALLOW_CLONE(MSignExtend)
> 
> follow-up: Add a recover instruction for MSignExtend. (see
> js/src/jit/Recover.h)

How to verify recover instruction is correct?

> 
> ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp
> @@ +821,5 @@
> > +    Register out = ToRegister(ins->output());
> > +
> > +    switch (ins->mode()) {
> > +      case MSignExtend::Byte:
> > +        masm.as_seb(out, input);
> 
> nit: Move this CodeGenerator function to the generic CodeGenerator, and add
> the following signature to the MacroAssembler.h file:
> 
>   inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH
>   inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH;
> 
> Add these functions under the "Move instructions" section.
> 
> ::: js/src/jit/x86/Lowering-x86.cpp
> @@ +700,5 @@
> > +
> > +void
> > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins)
> > +{
> > +    define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
> 
> Looking at the Intel documentation does not seem to mention any specific
> register allocation for MOVSX instructions.

At the start, The LIR::visitSignExtend is a common function, but jit-test is busted on x86 (32-bit).
Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a subset of GPRs can be allocated here, not only %ebx. Do you know which use?? function is appropriate?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358

	movsbl %dil,%ecx

test.S: Assembler messages:
test.S:8: Error: bad register name `%dil'

or

js/src/jit/Lowering.cpp:

void
LIRGenerator::visitSignExtend(MSignExtend* ins)
{
#ifdef JS_CODEGEN_X86
    define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins);
#else
    define(new(alloc()) LSignExtend(useRegisterAtStart(ins->input()), ins->mode()), ins);
#endif
}

What do you think?

> 
> nit: This implies that we have the same LIRGenetator for all platform, thus
> move this to the Lowering.cpp file instead of having one for each
> architecture.
Flags: needinfo?(nicolas.b.pierron)
Attachment #8780423 - Attachment is obsolete: true
(In reply to Heiher [:hev] from comment #4)
> (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > > +    ALLOW_CLONE(MSignExtend)
> > 
> > follow-up: Add a recover instruction for MSignExtend. (see
> > js/src/jit/Recover.h)
> 
> How to verify recover instruction is correct?

Have a look at
http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-with-rinstructions.js

> > Looking at the Intel documentation does not seem to mention any specific
> > register allocation for MOVSX instructions.
> 
> At the start, The LIR::visitSignExtend is a common function, but jit-test is
> busted on x86 (32-bit).
> Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g.
> movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a
> subset of GPRs can be allocated here, not only %ebx. Do you know which use??
> function is appropriate?
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
> 
> 	movsbl %dil,%ecx
> 
> test.S: Assembler messages:
> test.S:8: Error: bad register name `%dil'
> […]
> 
> What do you think?

Currently the register allocator is not aware of byte-size GPR.  Currently we have a function named useByteOpRegister [1] which either takes any register on platform where this is doable (such as x64), or a fixed register (such as x86).

Unfortunately, this function does not make use of the AtStart flag.  I suggest to either add an enum optional argument for AtStart flag, or copy these useByteOpRegister to make a AtStart variant.

[1] http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-32
Flags: needinfo?(nicolas.b.pierron)
Attachment #8781468 - Attachment is obsolete: true
Attachment #8781524 - Flags: review?(nicolas.b.pierron)
Attachment #8781533 - Flags: review?(nicolas.b.pierron)
Attachment #8781534 - Flags: review?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> (In reply to Heiher [:hev] from comment #4)
> > (In reply to Nicolas B. Pierron [:nbp] from comment #3)
> > > > +    ALLOW_CLONE(MSignExtend)
> > > 
> > > follow-up: Add a recover instruction for MSignExtend. (see
> > > js/src/jit/Recover.h)
> > 
> > How to verify recover instruction is correct?
> 
> Have a look at
> http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-
> with-rinstructions.js
> 
> > > Looking at the Intel documentation does not seem to mention any specific
> > > register allocation for MOVSX instructions.
> > 
> > At the start, The LIR::visitSignExtend is a common function, but jit-test is
> > busted on x86 (32-bit).
> > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g.
> > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a
> > subset of GPRs can be allocated here, not only %ebx. Do you know which use??
> > function is appropriate?
> > 
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358
> > 
> > 	movsbl %dil,%ecx
> > 
> > test.S: Assembler messages:
> > test.S:8: Error: bad register name `%dil'
> > […]
> > 
> > What do you think?
> 
> Currently the register allocator is not aware of byte-size GPR.  Currently
> we have a function named useByteOpRegister [1] which either takes any
> register on platform where this is doable (such as x64), or a fixed register
> (such as x86).
> 
> Unfortunately, this function does not make use of the AtStart flag.  I
> suggest to either add an enum optional argument for AtStart flag, or copy
> these useByteOpRegister to make a AtStart variant.
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-
> 32


Thank you reply. :)
Fix build failure on arm.
Attachment #8781533 - Attachment is obsolete: true
Attachment #8781533 - Flags: review?(nicolas.b.pierron)
Attachment #8781853 - Flags: review?(nicolas.b.pierron)
Attachment #8781524 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8781534 [details] [diff] [review]
Part 3 - Add tests for SignExtend.

Review of attachment 8781534 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/dce-with-rinstructions.js
@@ +165,5 @@
> +    if (uceFault_signextend(i) || uceFault_signextend(i))
> +        assertEq(x, 99  /* = (99 << 24) >> 24 */);
> +    var y = (i << 16) >> 16;
> +    if (uceFault_signextend(i) || uceFault_signextend(i))
> +        assertEq(y, 99  /* = (99 << 16) >> 16 */);

This assert won't work as expected, create a new function to test the shifts by 16.
Also add 2 new functions to test with negative numbers. (-1 * i)

The reason this does not work as expect is that "uceFault" ensure the removal of the branch and the assertion is made to check the Recover instruction result.  In this case, recover instructions are executed during the bailout path, thus the first assertion will used the recover instruction path, while the second will compute the result within baseline.
Attachment #8781534 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8781853 [details] [diff] [review]
Part 2 - Folds Lsh/Rsh same bits to SignExntend.

Review of attachment 8781853 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Lowering.cpp
@@ +1307,5 @@
>  
>  void
> +LIRGenerator::visitSignExtend(MSignExtend* ins)
> +{
> +    define(new(alloc()) LSignExtend(useByteOpRegisterAtStart(ins->input()), ins->mode()), ins);

nit: Only use useByteOpRegisterAtStart for MSignExtend::Byte.

::: js/src/vm/Interpreter-inl.h
@@ +766,5 @@
> +template <typename T>
> +static MOZ_ALWAYS_INLINE bool
> +SignExtendOperation(JSContext* cx, HandleValue in, int* out)
> +{
> +    int i;

nit: int32_t
Attachment #8781853 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8781534 - Attachment is obsolete: true
Attachment #8782723 - Flags: review?(nicolas.b.pierron)
Attachment #8782723 - Flags: review?(nicolas.b.pierron) → review+
Pushed by r@hev.cc:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3172e3fa6e24
Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/defd76119eda
Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/4d05a40172ca
Part 3: Add tests for SignExtend. r=nbp
sorry had to back this out for Spider Monkey ARM64 bustage, e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-inbound#L979
Flags: needinfo?(r)
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/835eee39bb16
Backed out changeset 4d05a40172ca for Spider Monkey ARM64 bustage
https://hg.mozilla.org/integration/mozilla-inbound/rev/6568553f7432
Backed out changeset defd76119eda 
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9dcfd11f12d
Backed out changeset 3172e3fa6e24
Pushed by r@hev.cc:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b67594606d54
Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/803ae1fb9740
Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/d547167d7b56
Part 3: Add tests for SignExtend. r=nbp
(In reply to Iris Hsiao [:ihsiao] from comment #18)
> sorry had to back this out for Spider Monkey ARM64 bustage, e.g.,
> https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-
> inbound#L979

Fixed, thank you.
Flags: needinfo?(r)
Hi,
I'm running into severe compilation time issues with recent versions of SpiderMonkey when running a (very) large asm.js file. I've bisected the problem down to patch part 2 of this bug. My first guess at what's going wrong is that the new MRSh::foldsTo function just gives up if it's not looking at a sign extension, rather than calling MBinaryBitwiseInstruction::foldsTo.

The details:
I'm working on a gcc (binutils,glibc) backend for asmjs (and wasm) at https://github.com/pipcet/asmjs. I'm producing large switch statements, one for each function, rather than using an Emscripten-like relooper. 

When compiling Perl, the Perl_yylex function turns into about 30,000 lines of code. When switching from SpiderMonkey-as-of-a-few-months ago to the current inbound branch, compilation time drastically increased, from one minute real time (which is painful to begin with) to about three minutes. Most of that time was indeed spent compiling Perl_yylex, and the problem appeared to be in the Ion register allocators. --wasm-always-baseline resulted in acceptable startup times, and nodejs --harmony similarly didn't seem to be too problematic.

I bisected it down to part 2 of the patch for this bug, noticed the lack of inheritance for ::foldsTo, tried fixing it, and performance appears to have gone back to the previous level.

I've attached a proposed patch which applies to the current (well, a few days old) inbound tree. If it looks okay I'll convert to SVN format.
I believe this fixes a minor issue that was causing major compile time performance degradation.
pipcet, thanks for the patch! Can you open a new bug for this and ask Nicolas (the reviewer of the patches in this bug) for feedback on your patch?

This bug is closed as the original issue it was about is fixed and all patches have been landed. Attaching a new patch here runs the risk of it being overlooked.
Flags: needinfo?(pipcet)
Thanks! I've done that at bug 1318926. Do let me know if there's anything else I can do (I'll try whittling down the problematic script to something manageable in size to see whether that still triggers the bug).
Flags: needinfo?(pipcet)
Blocks: jsperf
Priority: -- → P1
You need to log in before you can comment on or make changes to this bug.