nanojit: improve LIR_div/LIR_mod codegen

RESOLVED FIXED

Status

()

Core
JavaScript Engine
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

unspecified
x86
Mac OS X
Points:
---
Dependency tree / graph
Bug Flags:
wanted1.9.2 +

Firefox Tracking Flags

(status1.9.2 beta1-fixed)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 attachment)

(Assignee)

Description

9 years ago
Created attachment 400964 [details] [diff] [review]
patch

Our codegen for LIR_div/LIR_mod is pretty sucky.

This patch improves things.  When it sees a mod, it generates a code for the div and the mod together, rather than trying to half of it and then the rest when the div is seen.  This makes the code easier to understand and results in better code quality when the LIR_div result is not used (except to feed into the LIR_mod result).  An example:

Current code, which acts as if div6 is live after the mod:

      div6 = div ld1269, 4
               mov edx,eax
               sar edx,31
               idiv  edx:eax, ebx
               mov -32(ebp),eax
      mod6 = mod div6
               mov ebx,edx            
               mov edx,-32(ebp)

With patch:

      div6 = div ld1265, 4       # codegen'd with the mod
      mod6 = mod div6
               mov edx,eax
               sar edx,31
               idiv  edx:eax, ebx
               mov ebx,edx

The code generated when the div value is reused is unchanged by the patch.

I'm getting a 3--5ms SunSpider speedup, but I'm not sure I believe it.  Nonetheless the generated code is better, and the compile-time should be shorter because the div-and-mod code avoids some redundant steps.  Cachegrind confirms small reductions in instructions executed.

Note that the patch incorporates the patch from bug 516903.
Attachment #400964 - Flags: review?(gal)

Comment 1

9 years ago
Comment on attachment 400964 [details] [diff] [review]
patch

>diff --git a/js/src/nanojit/Assembler.cpp b/js/src/nanojit/Assembler.cpp
>--- a/js/src/nanojit/Assembler.cpp
>+++ b/js/src/nanojit/Assembler.cpp
>@@ -1352,20 +1352,33 @@ namespace nanojit
>             //
>             // Note that some live LIR instructions won't be printed.  Eg. an
>             // immediate won't be printed unless it is explicitly loaded into
>             // a register (as opposed to being incorporated into an immediate
>             // field in another machine instruction).
>             //
>             if (_logc->lcbits & LC_Assembly) {
>                 outputf("    %s", _thisfrag->lirbuf->names->formatIns(ins));
>-                // Special case: a guard condition won't get printed next time
>-                // around the loop, so do it now.
>-                if (ins->isGuard() && ins->oprnd1()) {
>-                    outputf("    %s       # handled by the guard",
>+                if ((ins->isGuard() && ins->oprnd1()) ||
>+                     ins->isop(LIR_cmov) ||
>+                     ins->isop(LIR_qcmov))
>+                {
>+                    // Special case: code is generated for guard/cmov
>+                    // conditions at the same time that code is generated for
>+                    // the guard/cmov itself.  If the condition is only used
>+                    // by the guard/cmov, we must print it now otherwise it
>+                    // won't get printed.  So we do print it now, with an
>+                    // explanatory comment.  If the condition *is* used again
>+                    // we'll end up printing it twice, but that's ok.
>+                    outputf("    %s       # codegen'd with the guard/cmov",
>+                            _thisfrag->lirbuf->names->formatIns(ins->oprnd1()));
>+
>+                } else if (ins->isop(LIR_mod)) {
>+                    // There's a similar case when a div feeds into a mod.
>+                    outputf("    %s       # codegen'd with the mod",
>                             _thisfrag->lirbuf->names->formatIns(ins->oprnd1()));
>                 }
>             }
> #endif
> 
>             if (error())
>                 return;
> 
>diff --git a/js/src/nanojit/Assembler.h b/js/src/nanojit/Assembler.h
>--- a/js/src/nanojit/Assembler.h
>+++ b/js/src/nanojit/Assembler.h
>@@ -276,16 +276,17 @@ namespace nanojit
>             void        asm_spill(Register rr, int d, bool pop, bool quad);
>             void        asm_load64(LInsp i);
>             void        asm_pusharg(LInsp p);
>             void        asm_ret(LInsp p);
>             void        asm_quad(LInsp i);
>             void        asm_fcond(LInsp i);
>             void        asm_cond(LInsp i);
>             void        asm_arith(LInsp i);
>+            void        asm_div_mod(LInsp i);

This is i386 only so lets move it into DECLARE_PLATFORM_ASSEMBLER (suggested by dvander)

>             void        asm_neg_not(LInsp i);
>             void        asm_ld(LInsp i);
>             void        asm_cmov(LInsp i);
>             void        asm_param(LInsp i);
>             void        asm_int(LInsp i);
>             void        asm_short(LInsp i);
>             void        asm_qlo(LInsp i);
>             void        asm_qhi(LInsp i);
>diff --git a/js/src/nanojit/Nativei386.cpp b/js/src/nanojit/Nativei386.cpp
>--- a/js/src/nanojit/Nativei386.cpp
>+++ b/js/src/nanojit/Nativei386.cpp
>@@ -681,16 +681,46 @@ namespace nanojit
> 
>     void Assembler::asm_switch(LIns* ins, NIns* exit)
>     {
>         LIns* diff = ins->oprnd1();
>         findSpecificRegFor(diff, EDX);
>         JMP(exit);
>     }
> 
>+    // This generates a 'test' or 'cmp' instruction for a condition, which
>+    // causes the condition codes to be set appropriately.  It's used with
>+    // conditional branches, conditional moves, and when generating
>+    // conditional values.  For example:
>+    //
>+    //   LIR:   eq1 = eq a, 0
>+    //   LIR:   xf1: xf eq1 -> ...
>+    //   asm:       test edx, edx       # generated by this function
>+    //   asm:       je ...
>+    //
>+    // If this is the only use of eq1, then on entry 'cond' is *not* marked as
>+    // used, and we do not allocate a register for it.  That's because its
>+    // result ends up in the condition codes rather than a normal register.
>+    // This doesn't get recorded in the regstate and so the asm code that
>+    // consumes the result (eg. a conditional branch like 'je') must follow
>+    // shortly after.
>+    //
>+    // If eq1 is instead used again later, we will also generate code
>+    // (eg. in asm_cond()) to compute it into a normal register, something
>+    // like this:
>+    //
>+    //   LIR:   eq1 = eq a, 0
>+    //   LIR:       test edx, edx
>+    //   asm:       sete ebx
>+    //   asm:       movzx ebx, ebx        
>+    //
>+    // In this case we end up computing the condition twice, but that's ok, as
>+    // it's just as short as testing eq1's value in the code generated for the
>+    // guard.
>+    //
>     void Assembler::asm_cmp(LIns *cond)
>     {
>         LOpcode condop = cond->opcode();
> 
>         // LIR_ov recycles the flags set by arithmetic ops
>         if (condop == LIR_ov)
>             return;
> 
>@@ -766,31 +796,30 @@ namespace nanojit
>     }
> 
>     void Assembler::asm_arith(LInsp ins)
>     {
>         LOpcode op = ins->opcode();
>         LInsp lhs = ins->oprnd1();
> 
>         if (op == LIR_mod) {
>-            /* LIR_mod expects the LIR_div to be near (no interference from the register allocator) */
>-            findSpecificRegFor(lhs, EDX);
>-            prepResultReg(ins, rmask(EDX));
>-            evictIfActive(EAX);
>+            asm_div_mod(ins);
>             return;
>         }
> 
>         LInsp rhs = ins->oprnd2();
> 
>         bool forceReg;
>         RegisterMask allow = GpRegs;
>         Register rb = UnknownReg;
> 
>         switch (op) {
>         case LIR_div:
>+            // Nb: if the div feeds into a mod it will be handled by
>+            // asm_div_mod() rather than here.
>             forceReg = true;
>             rb = findRegFor(rhs, (GpRegs ^ (rmask(EAX)|rmask(EDX))));
>             allow = rmask(EAX);
>             evictIfActive(EDX);
>             break;
>         case LIR_mul:
>             forceReg = true;
>             break;
>@@ -862,18 +891,18 @@ namespace nanojit
>             case LIR_rsh:
>                 SAR(rr, rb);
>                 break;
>             case LIR_ush:
>                 SHR(rr, rb);
>                 break;
>             case LIR_div:
>             case LIR_mod:

LIR_mod should be unused here (dvander also).

>-                 DIV(rb);
>-                 CDQ();
>+                DIV(rb);
>+                CDQ();
>                 break;
>             default:
>                 NanoAssertMsg(0, "Unsupported");
>             }
>         }
>         else
>         {
>             int c = rhs->imm32();
>@@ -912,16 +941,45 @@ namespace nanojit
>                 break;
>             }
>         }
> 
>         if ( rr != ra )
>             MR(rr,ra);
>     }
> 
>+    // This is called when we have a mod(div(divLhs, divRhs)) sequence.
>+    void Assembler::asm_div_mod(LInsp mod)
>+    {
>+        LInsp div = mod->oprnd1();
>+
>+        // LIR_mod expects the LIR_div to be near (no interference from the register allocator)
>+
>+        NanoAssert(mod->isop(LIR_mod));
>+        NanoAssert(div->isop(LIR_div));
>+
>+        LInsp divLhs = div->oprnd1();
>+        LInsp divRhs = div->oprnd2();
>+
>+        prepResultReg(mod, rmask(EDX));
>+        prepResultReg(div, rmask(EAX));
>+
>+        Register rDivRhs = findRegFor(divRhs, (GpRegs ^ (rmask(EAX)|rmask(EDX))));
>+
>+        Register rDivLhs = ( divLhs->isUnusedOrHasUnknownReg()
>+                           ? findSpecificRegFor(divLhs, EAX)
>+                           : divLhs->getReg() );
>+
>+        DIV(rDivRhs);
>+        CDQ();     // sign-extend EAX into EDX:EAX
>+
>+        if ( EAX != rDivLhs )
>+            MR(EAX, rDivLhs);
>+    }
>+
>     void Assembler::asm_neg_not(LInsp ins)
>     {
>         LOpcode op = ins->opcode();
>         Register rr = prepResultReg(ins, GpRegs);
> 
>         LIns* lhs = ins->oprnd1();
>         // if this is last use of lhs in reg, we can re-use result reg
>         // else, lhs already has a register assigned.
Attachment #400964 - Flags: review?(gal) → review+
(Assignee)

Comment 2

9 years ago
(In reply to comment #1)
> >+            void        asm_div_mod(LInsp i);
> 
> This is i386 only so lets move it into DECLARE_PLATFORM_ASSEMBLER (suggested by
> dvander)

Won't it be necessary for x86-64 as well?  But if it is I guess it can be put into x86-64's DECLARE_PLATFORM_ASSEMBLER as well.


> >@@ -862,18 +891,18 @@ namespace nanojit
> >             case LIR_rsh:
> >                 SAR(rr, rb);
> >                 break;
> >             case LIR_ush:
> >                 SHR(rr, rb);
> >                 break;
> >             case LIR_div:
> >             case LIR_mod:
> 
> LIR_mod should be unused here (dvander also).

Yes, I have that in another patch I'm working on but I can include it here.
(Assignee)

Comment 3

9 years ago
http://hg.mozilla.org/tracemonkey/rev/350fc3706ba8
Whiteboard: fixed-in-tracemonkey

Comment 4

9 years ago
http://hg.mozilla.org/mozilla-central/rev/350fc3706ba8
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED

Updated

9 years ago
Flags: wanted1.9.2+
You need to log in before you can comment on or make changes to this bug.