Closed
Bug 516909
Opened 15 years ago
Closed 15 years ago
nanojit: improve LIR_div/LIR_mod codegen
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | beta1-fixed |
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file)
7.80 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
Our codegen for LIR_div/LIR_mod is pretty sucky. This patch improves things. When it sees a mod, it generates a code for the div and the mod together, rather than trying to half of it and then the rest when the div is seen. This makes the code easier to understand and results in better code quality when the LIR_div result is not used (except to feed into the LIR_mod result). An example: Current code, which acts as if div6 is live after the mod: div6 = div ld1269, 4 mov edx,eax sar edx,31 idiv edx:eax, ebx mov -32(ebp),eax mod6 = mod div6 mov ebx,edx mov edx,-32(ebp) With patch: div6 = div ld1265, 4 # codegen'd with the mod mod6 = mod div6 mov edx,eax sar edx,31 idiv edx:eax, ebx mov ebx,edx The code generated when the div value is reused is unchanged by the patch. I'm getting a 3--5ms SunSpider speedup, but I'm not sure I believe it. Nonetheless the generated code is better, and the compile-time should be shorter because the div-and-mod code avoids some redundant steps. Cachegrind confirms small reductions in instructions executed. Note that the patch incorporates the patch from bug 516903.
Attachment #400964 -
Flags: review?(gal)
Comment 1•15 years ago
|
||
Comment on attachment 400964 [details] [diff] [review] patch >diff --git a/js/src/nanojit/Assembler.cpp b/js/src/nanojit/Assembler.cpp >--- a/js/src/nanojit/Assembler.cpp >+++ b/js/src/nanojit/Assembler.cpp >@@ -1352,20 +1352,33 @@ namespace nanojit > // > // Note that some live LIR instructions won't be printed. Eg. an > // immediate won't be printed unless it is explicitly loaded into > // a register (as opposed to being incorporated into an immediate > // field in another machine instruction). > // > if (_logc->lcbits & LC_Assembly) { > outputf(" %s", _thisfrag->lirbuf->names->formatIns(ins)); >- // Special case: a guard condition won't get printed next time >- // around the loop, so do it now. >- if (ins->isGuard() && ins->oprnd1()) { >- outputf(" %s # handled by the guard", >+ if ((ins->isGuard() && ins->oprnd1()) || >+ ins->isop(LIR_cmov) || >+ ins->isop(LIR_qcmov)) >+ { >+ // Special case: code is generated for guard/cmov >+ // conditions at the same time that code is generated for >+ // the guard/cmov itself. If the condition is only used >+ // by the guard/cmov, we must print it now otherwise it >+ // won't get printed. So we do print it now, with an >+ // explanatory comment. If the condition *is* used again >+ // we'll end up printing it twice, but that's ok. >+ outputf(" %s # codegen'd with the guard/cmov", >+ _thisfrag->lirbuf->names->formatIns(ins->oprnd1())); >+ >+ } else if (ins->isop(LIR_mod)) { >+ // There's a similar case when a div feeds into a mod. >+ outputf(" %s # codegen'd with the mod", > _thisfrag->lirbuf->names->formatIns(ins->oprnd1())); > } > } > #endif > > if (error()) > return; > >diff --git a/js/src/nanojit/Assembler.h b/js/src/nanojit/Assembler.h >--- a/js/src/nanojit/Assembler.h >+++ b/js/src/nanojit/Assembler.h >@@ -276,16 +276,17 @@ namespace nanojit > void asm_spill(Register rr, int d, bool pop, bool quad); > void asm_load64(LInsp i); > void asm_pusharg(LInsp p); > void asm_ret(LInsp p); > void asm_quad(LInsp i); > void asm_fcond(LInsp i); > void asm_cond(LInsp i); > void asm_arith(LInsp i); >+ void asm_div_mod(LInsp i); This is i386 only so lets move it into DECLARE_PLATFORM_ASSEMBLER (suggested by dvander) > void asm_neg_not(LInsp i); > void asm_ld(LInsp i); > void asm_cmov(LInsp i); > void asm_param(LInsp i); > void asm_int(LInsp i); > void asm_short(LInsp i); > void asm_qlo(LInsp i); > void asm_qhi(LInsp i); >diff --git a/js/src/nanojit/Nativei386.cpp b/js/src/nanojit/Nativei386.cpp >--- a/js/src/nanojit/Nativei386.cpp >+++ b/js/src/nanojit/Nativei386.cpp >@@ -681,16 +681,46 @@ namespace nanojit > > void Assembler::asm_switch(LIns* ins, NIns* exit) > { > LIns* diff = ins->oprnd1(); > findSpecificRegFor(diff, EDX); > JMP(exit); > } > >+ // This generates a 'test' or 'cmp' instruction for a condition, which >+ // causes the condition codes to be set appropriately. It's used with >+ // conditional branches, conditional moves, and when generating >+ // conditional values. For example: >+ // >+ // LIR: eq1 = eq a, 0 >+ // LIR: xf1: xf eq1 -> ... >+ // asm: test edx, edx # generated by this function >+ // asm: je ... >+ // >+ // If this is the only use of eq1, then on entry 'cond' is *not* marked as >+ // used, and we do not allocate a register for it. That's because its >+ // result ends up in the condition codes rather than a normal register. >+ // This doesn't get recorded in the regstate and so the asm code that >+ // consumes the result (eg. a conditional branch like 'je') must follow >+ // shortly after. >+ // >+ // If eq1 is instead used again later, we will also generate code >+ // (eg. in asm_cond()) to compute it into a normal register, something >+ // like this: >+ // >+ // LIR: eq1 = eq a, 0 >+ // LIR: test edx, edx >+ // asm: sete ebx >+ // asm: movzx ebx, ebx >+ // >+ // In this case we end up computing the condition twice, but that's ok, as >+ // it's just as short as testing eq1's value in the code generated for the >+ // guard. >+ // > void Assembler::asm_cmp(LIns *cond) > { > LOpcode condop = cond->opcode(); > > // LIR_ov recycles the flags set by arithmetic ops > if (condop == LIR_ov) > return; > >@@ -766,31 +796,30 @@ namespace nanojit > } > > void Assembler::asm_arith(LInsp ins) > { > LOpcode op = ins->opcode(); > LInsp lhs = ins->oprnd1(); > > if (op == LIR_mod) { >- /* LIR_mod expects the LIR_div to be near (no interference from the register allocator) */ >- findSpecificRegFor(lhs, EDX); >- prepResultReg(ins, rmask(EDX)); >- evictIfActive(EAX); >+ asm_div_mod(ins); > return; > } > > LInsp rhs = ins->oprnd2(); > > bool forceReg; > RegisterMask allow = GpRegs; > Register rb = UnknownReg; > > switch (op) { > case LIR_div: >+ // Nb: if the div feeds into a mod it will be handled by >+ // asm_div_mod() rather than here. > forceReg = true; > rb = findRegFor(rhs, (GpRegs ^ (rmask(EAX)|rmask(EDX)))); > allow = rmask(EAX); > evictIfActive(EDX); > break; > case LIR_mul: > forceReg = true; > break; >@@ -862,18 +891,18 @@ namespace nanojit > case LIR_rsh: > SAR(rr, rb); > break; > case LIR_ush: > SHR(rr, rb); > break; > case LIR_div: > case LIR_mod: LIR_mod should be unused here (dvander also). >- DIV(rb); >- CDQ(); >+ DIV(rb); >+ CDQ(); > break; > default: > NanoAssertMsg(0, "Unsupported"); > } > } > else > { > int c = rhs->imm32(); >@@ -912,16 +941,45 @@ namespace nanojit > break; > } > } > > if ( rr != ra ) > MR(rr,ra); > } > >+ // This is called when we have a mod(div(divLhs, divRhs)) sequence. >+ void Assembler::asm_div_mod(LInsp mod) >+ { >+ LInsp div = mod->oprnd1(); >+ >+ // LIR_mod expects the LIR_div to be near (no interference from the register allocator) >+ >+ NanoAssert(mod->isop(LIR_mod)); >+ NanoAssert(div->isop(LIR_div)); >+ >+ LInsp divLhs = div->oprnd1(); >+ LInsp divRhs = div->oprnd2(); >+ >+ prepResultReg(mod, rmask(EDX)); >+ prepResultReg(div, rmask(EAX)); >+ >+ Register rDivRhs = findRegFor(divRhs, (GpRegs ^ (rmask(EAX)|rmask(EDX)))); >+ >+ Register rDivLhs = ( divLhs->isUnusedOrHasUnknownReg() >+ ? findSpecificRegFor(divLhs, EAX) >+ : divLhs->getReg() ); >+ >+ DIV(rDivRhs); >+ CDQ(); // sign-extend EAX into EDX:EAX >+ >+ if ( EAX != rDivLhs ) >+ MR(EAX, rDivLhs); >+ } >+ > void Assembler::asm_neg_not(LInsp ins) > { > LOpcode op = ins->opcode(); > Register rr = prepResultReg(ins, GpRegs); > > LIns* lhs = ins->oprnd1(); > // if this is last use of lhs in reg, we can re-use result reg > // else, lhs already has a register assigned.
Attachment #400964 -
Flags: review?(gal) → review+
Assignee | ||
Comment 2•15 years ago
|
||
(In reply to comment #1) > >+ void asm_div_mod(LInsp i); > > This is i386 only so lets move it into DECLARE_PLATFORM_ASSEMBLER (suggested by > dvander) Won't it be necessary for x86-64 as well? But if it is I guess it can be put into x86-64's DECLARE_PLATFORM_ASSEMBLER as well. > >@@ -862,18 +891,18 @@ namespace nanojit > > case LIR_rsh: > > SAR(rr, rb); > > break; > > case LIR_ush: > > SHR(rr, rb); > > break; > > case LIR_div: > > case LIR_mod: > > LIR_mod should be unused here (dvander also). Yes, I have that in another patch I'm working on but I can include it here.
Assignee | ||
Comment 3•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/350fc3706ba8
Whiteboard: fixed-in-tracemonkey
Comment 4•15 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/350fc3706ba8
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Flags: wanted1.9.2+
Comment 6•15 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/558e94d98018
status1.9.2:
--- → beta1-fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•