Closed
Bug 1294606
Opened 6 years ago
Closed 6 years ago
Folds Lsh/Rsh same bits to SignExntend
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: hev, Assigned: hev)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 4 obsolete files)
7.08 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
13.60 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
2.29 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
466 bytes,
patch
|
Details | Diff | Splinter Review |
e.g. original instructions: Lsh v1, v0, 24 Rsh v2, v1, 24 folds to sign-extend: SignExtend(byte) v2, v0
Assignee | ||
Comment 1•6 years ago
|
||
Simulate Test Results: ==== x86_64 Intel i7-4790 3.60GHz ==== Source code: // test.S // gcc -O3 -o test test.S .text .global main main: movabs $0x6fc23ac00,%rdx mov $0xff,%ecx 1: #if 0 movsbl %cxl,%ecx #else sall $24, %ecx sarl $24, %ecx #endif sub $0x1,%rdx jne 1b xor %eax,%eax retq .end main .size main, . - main Results (Unit: second) Test Loop count = 30000000000: Sign-extned: 8.497, 8.283, 8.083, 8.103, 8.463, 8.613, 8.137, 8.827, 8.533, 8.267 Average: 8.3806 Shift : 15.067, 15.18, 15.09, 15.097, 15.243, 15.163, 15.027, 15.027, 15.043, 15.03 Average: 15.0967 ==== ARM Rpi2 Model-B ARMv7 900MHz ==== Source code: // test.S // gcc -O3 -o test test.S .text .align 2 .global main main: movw r0, #0x5e00 movt r0, #0xb2d0 mov r3, #0xff 1: #if 0 sxtb r3, r3 #else mov r3, r3, asl #24 mov r3, r3, asr #24 #endif subs r0, r0, #1 bne 1b bx lr .end main .size main, . - main Results (Unit: second) Test Loop count = 3000000000: Sign-extned: 6.84, 6.69, 6.71, 6.68, 6.7, 6.71, 6.71, 6.69, 6.71, 6.69 Average: 6.713 Shift : 13.53 ,13.39, 13.42, 13.39, 13.42, 13.38, 13.42, 13.39, 13.39, 13.39 Average: 13.412 ==== MIPS Loongson-3A2000 1.0GHz & Ci20 1.0GHz ==== Source code: // test.S // gcc -O3 -o test test.S .text .align 2 .global main .ent main, 0 .type main, @function main: li $v0, 3000000000 li $t0, 0xff 1: #if 0 seb $t0, $t0 #else sll $t0, 24 sra $t0, 24 #endif addiu $v0, -1 bnez $v0, 1b jr $ra .end main .size main, . - main Results (Unit: second) Test Loop count = 3000000000: Loongson-3A2000: Sign-extned: 6.02, 6.023, 6.02, 6.02, 6.02, 6.023, 6.02, 6.023, 6.016, 6.02 Average: 6.0205 Shift : 9.031, 9.027, 9.031, 9.027, 9.027, 9.031, 9.027, 9.031, 9.031, 9.031 Average: 9.0294 Ci20: Sign-extned: 12.51, 12.53, 12.52, 12.54, 12.53, 12.53, 12.52, 12.54, 12.53, 12.53 Average: 12.528 Shift : 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51, 12.51 Average: 12.51
Assignee | ||
Comment 2•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2658ea46bd2c
Attachment #8780423 -
Flags: review?(nicolas.b.pierron)
Comment 3•6 years ago
|
||
Comment on attachment 8780423 [details] [diff] [review] 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch Review of attachment 8780423 [details] [diff] [review]: ----------------------------------------------------------------- Nice work, the main remark from nits bellow is that this code is too redundant, and you can factor most of it and leave the differences for the */MacroAssembler-*-inl.h implementations. ::: js/src/jit/MIR.cpp @@ +3004,5 @@ > +{ > + MDefinition* lhs = getOperand(0); > + MDefinition* rhs = getOperand(1); > + > + if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op()) nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is more common in the code base, and less verbose. ::: js/src/jit/MIR.h @@ +5994,5 @@ > + public: > + enum Mode { > + Byte, > + Half, > + Word, nit: Word is not handled anywhere, remove it, and remove the default clauses from the switch statements, as these would generate a compiler warning if the switch is not complete. @@ +6014,5 @@ > + return new(alloc) MSignExtend(op, mode); > + } > + static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) { > + return new(alloc) MSignExtend(op, mode); > + } nit: replace these New functions, including the NewAsmJS by the TRIVIAL_NEW_WRAPPER macro. @@ +6018,5 @@ > + } > + > + Mode mode() { return mode_; } > + > + ALLOW_CLONE(MSignExtend) follow-up: Add a recover instruction for MSignExtend. (see js/src/jit/Recover.h) ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp @@ +821,5 @@ > + Register out = ToRegister(ins->output()); > + > + switch (ins->mode()) { > + case MSignExtend::Byte: > + masm.as_seb(out, input); nit: Move this CodeGenerator function to the generic CodeGenerator, and add the following signature to the MacroAssembler.h file: inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH; Add these functions under the "Move instructions" section. ::: js/src/jit/x86/Lowering-x86.cpp @@ +700,5 @@ > + > +void > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins) > +{ > + define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins); Looking at the Intel documentation does not seem to mention any specific register allocation for MOVSX instructions. nit: This implies that we have the same LIRGenetator for all platform, thus move this to the Lowering.cpp file instead of having one for each architecture.
Attachment #8780423 -
Flags: review?(nicolas.b.pierron) → feedback+
Assignee | ||
Comment 4•6 years ago
|
||
Thank you for review. :) (In reply to Nicolas B. Pierron [:nbp] from comment #3) > Comment on attachment 8780423 [details] [diff] [review] > 0001-Bug-1294606-Folds-Lsh-Rsh-same-bits-to-SignExntend.-.patch > > Review of attachment 8780423 [details] [diff] [review]: > ----------------------------------------------------------------- > > Nice work, the main remark from nits bellow is that this code is too > redundant, and you can factor most of it and leave the differences for the > */MacroAssembler-*-inl.h implementations. > > ::: js/src/jit/MIR.cpp > @@ +3004,5 @@ > > +{ > > + MDefinition* lhs = getOperand(0); > > + MDefinition* rhs = getOperand(1); > > + > > + if (!rhs->isConstant() || rhs->type() != MIRType::Int32 || MDefinition::Op_Lsh != lhs->op()) > > nit: replace "MDefinition::Op_Lsh != lhs->op()" by "lhs->isLsh()" which is > more common in the code base, and less verbose. > > ::: js/src/jit/MIR.h > @@ +5994,5 @@ > > + public: > > + enum Mode { > > + Byte, > > + Half, > > + Word, > > nit: Word is not handled anywhere, remove it, and remove the default clauses > from the switch statements, as these would generate a compiler warning if > the switch is not complete. > > @@ +6014,5 @@ > > + return new(alloc) MSignExtend(op, mode); > > + } > > + static MSignExtend* NewAsmJS(TempAllocator& alloc, MDefinition* op, Mode mode) { > > + return new(alloc) MSignExtend(op, mode); > > + } > > nit: replace these New functions, including the NewAsmJS by the > TRIVIAL_NEW_WRAPPER macro. > > @@ +6018,5 @@ > > + } > > + > > + Mode mode() { return mode_; } > > + > > + ALLOW_CLONE(MSignExtend) > > follow-up: Add a recover instruction for MSignExtend. (see > js/src/jit/Recover.h) How to verify recover instruction is correct? > > ::: js/src/jit/mips-shared/CodeGenerator-mips-shared.cpp > @@ +821,5 @@ > > + Register out = ToRegister(ins->output()); > > + > > + switch (ins->mode()) { > > + case MSignExtend::Byte: > > + masm.as_seb(out, input); > > nit: Move this CodeGenerator function to the generic CodeGenerator, and add > the following signature to the MacroAssembler.h file: > > inline void move8SignExtend(Register src, Register dst) PER_SHARED_ARCH > inline void move16SignExtend(Register src, Register dst) PER_SHARED_ARCH; > > Add these functions under the "Move instructions" section. > > ::: js/src/jit/x86/Lowering-x86.cpp > @@ +700,5 @@ > > + > > +void > > +LIRGeneratorX86::visitSignExtend(MSignExtend* ins) > > +{ > > + define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins); > > Looking at the Intel documentation does not seem to mention any specific > register allocation for MOVSX instructions. At the start, The LIR::visitSignExtend is a common function, but jit-test is busted on x86 (32-bit). Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a subset of GPRs can be allocated here, not only %ebx. Do you know which use?? function is appropriate? https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358 movsbl %dil,%ecx test.S: Assembler messages: test.S:8: Error: bad register name `%dil' or js/src/jit/Lowering.cpp: void LIRGenerator::visitSignExtend(MSignExtend* ins) { #ifdef JS_CODEGEN_X86 define(new(alloc()) LSignExtend(useFixed(ins->input(), ebx), ins->mode()), ins); #else define(new(alloc()) LSignExtend(useRegisterAtStart(ins->input()), ins->mode()), ins); #endif } What do you think? > > nit: This implies that we have the same LIRGenetator for all platform, thus > move this to the Lowering.cpp file instead of having one for each > architecture.
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 5•6 years ago
|
||
Attachment #8780423 -
Attachment is obsolete: true
Comment 6•6 years ago
|
||
(In reply to Heiher [:hev] from comment #4) > (In reply to Nicolas B. Pierron [:nbp] from comment #3) > > > + ALLOW_CLONE(MSignExtend) > > > > follow-up: Add a recover instruction for MSignExtend. (see > > js/src/jit/Recover.h) > > How to verify recover instruction is correct? Have a look at http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce-with-rinstructions.js > > Looking at the Intel documentation does not seem to mention any specific > > register allocation for MOVSX instructions. > > At the start, The LIR::visitSignExtend is a common function, but jit-test is > busted on x86 (32-bit). > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a > subset of GPRs can be allocated here, not only %ebx. Do you know which use?? > function is appropriate? > > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358 > > movsbl %dil,%ecx > > test.S: Assembler messages: > test.S:8: Error: bad register name `%dil' > […] > > What do you think? Currently the register allocator is not aware of byte-size GPR. Currently we have a function named useByteOpRegister [1] which either takes any register on platform where this is doable (such as x64), or a fixed register (such as x86). Unfortunately, this function does not make use of the AtStart flag. I suggest to either add an enum optional argument for AtStart flag, or copy these useByteOpRegister to make a AtStart variant. [1] http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28-32
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 7•6 years ago
|
||
Attachment #8781468 -
Attachment is obsolete: true
Attachment #8781524 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 8•6 years ago
|
||
Attachment #8781533 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 9•6 years ago
|
||
Attachment #8781534 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 10•6 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #6) > (In reply to Heiher [:hev] from comment #4) > > (In reply to Nicolas B. Pierron [:nbp] from comment #3) > > > > + ALLOW_CLONE(MSignExtend) > > > > > > follow-up: Add a recover instruction for MSignExtend. (see > > > js/src/jit/Recover.h) > > > > How to verify recover instruction is correct? > > Have a look at > http://searchfox.org/mozilla-central/source/js/src/jit-test/tests/ion/dce- > with-rinstructions.js > > > > Looking at the Intel documentation does not seem to mention any specific > > > register allocation for MOVSX instructions. > > > > At the start, The LIR::visitSignExtend is a common function, but jit-test is > > busted on x86 (32-bit). > > Looks some registers's low 8-bit is not addressable on x86 (32-bit), e.g. > > movsbl %dil, %xxx is a invalid instruction encoding. In fact, there have a > > subset of GPRs can be allocated here, not only %ebx. Do you know which use?? > > function is appropriate? > > > > https://treeherder.mozilla.org/#/ > > jobs?repo=try&revision=b8d3bcf83b22&selectedJob=25618358 > > > > movsbl %dil,%ecx > > > > test.S: Assembler messages: > > test.S:8: Error: bad register name `%dil' > > […] > > > > What do you think? > > Currently the register allocator is not aware of byte-size GPR. Currently > we have a function named useByteOpRegister [1] which either takes any > register on platform where this is doable (such as x64), or a fixed register > (such as x86). > > Unfortunately, this function does not make use of the AtStart flag. I > suggest to either add an enum optional argument for AtStart flag, or copy > these useByteOpRegister to make a AtStart variant. > > [1] > http://searchfox.org/mozilla-central/rev/ > 9ec085584d7491ddbaf6574d3732c08511709172/js/src/jit/x86/Lowering-x86.cpp#28- > 32 Thank you reply. :)
Assignee | ||
Comment 11•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8259ddd5ba0
Assignee | ||
Comment 12•6 years ago
|
||
Fix build failure on arm.
Attachment #8781533 -
Attachment is obsolete: true
Attachment #8781533 -
Flags: review?(nicolas.b.pierron)
Attachment #8781853 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 13•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=44136eb2be42
Updated•6 years ago
|
Attachment #8781524 -
Flags: review?(nicolas.b.pierron) → review+
Comment 14•6 years ago
|
||
Comment on attachment 8781534 [details] [diff] [review] Part 3 - Add tests for SignExtend. Review of attachment 8781534 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit-test/tests/ion/dce-with-rinstructions.js @@ +165,5 @@ > + if (uceFault_signextend(i) || uceFault_signextend(i)) > + assertEq(x, 99 /* = (99 << 24) >> 24 */); > + var y = (i << 16) >> 16; > + if (uceFault_signextend(i) || uceFault_signextend(i)) > + assertEq(y, 99 /* = (99 << 16) >> 16 */); This assert won't work as expected, create a new function to test the shifts by 16. Also add 2 new functions to test with negative numbers. (-1 * i) The reason this does not work as expect is that "uceFault" ensure the removal of the branch and the assertion is made to check the Recover instruction result. In this case, recover instructions are executed during the bailout path, thus the first assertion will used the recover instruction path, while the second will compute the result within baseline.
Attachment #8781534 -
Flags: review?(nicolas.b.pierron)
Comment 15•6 years ago
|
||
Comment on attachment 8781853 [details] [diff] [review] Part 2 - Folds Lsh/Rsh same bits to SignExntend. Review of attachment 8781853 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/Lowering.cpp @@ +1307,5 @@ > > void > +LIRGenerator::visitSignExtend(MSignExtend* ins) > +{ > + define(new(alloc()) LSignExtend(useByteOpRegisterAtStart(ins->input()), ins->mode()), ins); nit: Only use useByteOpRegisterAtStart for MSignExtend::Byte. ::: js/src/vm/Interpreter-inl.h @@ +766,5 @@ > +template <typename T> > +static MOZ_ALWAYS_INLINE bool > +SignExtendOperation(JSContext* cx, HandleValue in, int* out) > +{ > + int i; nit: int32_t
Attachment #8781853 -
Flags: review?(nicolas.b.pierron) → review+
Assignee | ||
Comment 16•6 years ago
|
||
Attachment #8781534 -
Attachment is obsolete: true
Attachment #8782723 -
Flags: review?(nicolas.b.pierron)
Updated•6 years ago
|
Attachment #8782723 -
Flags: review?(nicolas.b.pierron) → review+
Comment 17•6 years ago
|
||
Pushed by r@hev.cc: https://hg.mozilla.org/integration/mozilla-inbound/rev/3172e3fa6e24 Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/defd76119eda Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/4d05a40172ca Part 3: Add tests for SignExtend. r=nbp
Comment 18•6 years ago
|
||
sorry had to back this out for Spider Monkey ARM64 bustage, e.g., https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla-inbound#L979
Flags: needinfo?(r)
Comment 19•6 years ago
|
||
Backout by ihsiao@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/835eee39bb16 Backed out changeset 4d05a40172ca for Spider Monkey ARM64 bustage https://hg.mozilla.org/integration/mozilla-inbound/rev/6568553f7432 Backed out changeset defd76119eda https://hg.mozilla.org/integration/mozilla-inbound/rev/c9dcfd11f12d Backed out changeset 3172e3fa6e24
Comment 20•6 years ago
|
||
Pushed by r@hev.cc: https://hg.mozilla.org/integration/mozilla-inbound/rev/b67594606d54 Part 1: Implement LIRGenerator::useByteOpRegisterAtStart. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/803ae1fb9740 Part 2: Folds Lsh/Rsh same bits to SignExntend. r=nbp https://hg.mozilla.org/integration/mozilla-inbound/rev/d547167d7b56 Part 3: Add tests for SignExtend. r=nbp
Assignee | ||
Comment 21•6 years ago
|
||
(In reply to Iris Hsiao [:ihsiao] from comment #18) > sorry had to back this out for Spider Monkey ARM64 bustage, e.g., > https://treeherder.mozilla.org/logviewer.html#?job_id=34219022&repo=mozilla- > inbound#L979 Fixed, thank you.
Flags: needinfo?(r)
Comment 22•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b67594606d54 https://hg.mozilla.org/mozilla-central/rev/803ae1fb9740 https://hg.mozilla.org/mozilla-central/rev/d547167d7b56
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 23•6 years ago
|
||
Hi, I'm running into severe compilation time issues with recent versions of SpiderMonkey when running a (very) large asm.js file. I've bisected the problem down to patch part 2 of this bug. My first guess at what's going wrong is that the new MRSh::foldsTo function just gives up if it's not looking at a sign extension, rather than calling MBinaryBitwiseInstruction::foldsTo. The details: I'm working on a gcc (binutils,glibc) backend for asmjs (and wasm) at https://github.com/pipcet/asmjs. I'm producing large switch statements, one for each function, rather than using an Emscripten-like relooper. When compiling Perl, the Perl_yylex function turns into about 30,000 lines of code. When switching from SpiderMonkey-as-of-a-few-months ago to the current inbound branch, compilation time drastically increased, from one minute real time (which is painful to begin with) to about three minutes. Most of that time was indeed spent compiling Perl_yylex, and the problem appeared to be in the Ion register allocators. --wasm-always-baseline resulted in acceptable startup times, and nodejs --harmony similarly didn't seem to be too problematic. I bisected it down to part 2 of the patch for this bug, noticed the lack of inheritance for ::foldsTo, tried fixing it, and performance appears to have gone back to the previous level. I've attached a proposed patch which applies to the current (well, a few days old) inbound tree. If it looks okay I'll convert to SVN format.
Comment 24•6 years ago
|
||
I believe this fixes a minor issue that was causing major compile time performance degradation.
Comment 25•6 years ago
|
||
pipcet, thanks for the patch! Can you open a new bug for this and ask Nicolas (the reviewer of the patches in this bug) for feedback on your patch? This bug is closed as the original issue it was about is fixed and all patches have been landed. Attaching a new patch here runs the risk of it being overlooked.
Flags: needinfo?(pipcet)
Comment 26•6 years ago
|
||
Thanks! I've done that at bug 1318926. Do let me know if there's anything else I can do (I'll try whittling down the problematic script to something manageable in size to see whether that still triggers the bug).
Flags: needinfo?(pipcet)
You need to log in
before you can comment on or make changes to this bug.
Description
•