Closed Bug 560926 Opened 10 years ago Closed 10 years ago

Add support for arithmetic with branch on overflow

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: wmaddox, Assigned: wmaddox)

References

Details

(Whiteboard: has-patch, PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(3 files, 5 obsolete files)

This patch adds support for the addjovl, subjovl, and muljovl opcodes, which perform int32 arithmetic followed by a branch on overflow, analogous to the existing addxovl, subxovl, and mulxovl instructions which exit on overflow.
Int64 forms addjovq and subjovq are also defined, as addjovq is needed for a forthcoming fast arithmetic patch, and subjovq was trivial to provide.
Additionally, there is now an int64 subtract, subq, which will also be used in the forthcoming fast arithmetic patch.  The handling of the *jovl opcodes is modeled on the existing code for the *xovl instructions, and shares code with them.  The present patch is for i386 and x86_64 only, but it should be straightforward to support all architectures where the *xovl instructions are implemented. I have a draft patch for ARM and Sparc, but would prefer to get x86 and the client Tamarin codegen patches landed first.

See Bug 552542 for more information on the motivation for the new opcodes.

I've also added support for the LIR_label opcode in lirasm, so it may be written explicitly in preference to the use of the sugared "label :" syntax if desired, and added some tests cases for *jovl opcodes.
Attachment #440578 - Attachment is patch: true
Attachment #440578 - Flags: review?(edwsmith)
Attachment #440578 - Flags: feedback?(nnethercote)
Assignee: nobody → wmaddox
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Comment on attachment 440578 [details] [diff] [review]
Add support for arithmetic with branch on overflow

Nice patch!  I see very little to complain about, just a few nits below.



>+        // The combined arithmetic and branch-on-overflow instructions (addjov, etc.)
>+        // have the character of a conditional branch, and any changes to the logic
>+        // below will likely needed to be propagated there as well.
>+

I'd prefer this to be factored out -- the asm_jcc() code is very similar to
the case LIR_addjovl, etc. code.  You may need a boolean flag to indicate
whether it's an overflow jump or not, that's ok, it's better than having
almost the same code duplicated.


>+    // Thre is no CseFilter::insGuardJov(), as LIR_*jov are not CSEable.

Nits: "Thre" --> "There", "LIR_*jov" --> "LIR_*jov*".


>+    LIns* ValidateWriter::insBranchJov(LOpcode op, LIns* a, LIns* b, LIns* to)
>+    {
>+        int nArgs = 2;
>+        LTy formals[2];
>+        LIns* args[2] = { a, b };
>+
>+        switch (op) {
>+        case LIR_addjovl:
>+        case LIR_subjovl:
>+        case LIR_muljovl:
>+            formals[0] = LTy_I32;
>+            formals[1] = LTy_I32;
>+            break;
>+
>+#ifdef NANOJIT_64BIT
>+        case LIR_addjovq:
>+        case LIR_subjovq:
>+            formals[0] = LTy_I64;
>+            formals[1] = LTy_I64;
>+            break;
>+#endif
>+        default:
>+            NanoAssert(0);
>+        }
>+
>+        typeCheckArgs(op, nArgs, formals, args);

Nit: Can you add this comment here:

        // We check that target is a label in ValidateReader because it may
        // not have been set here.

insBranch() has the same comment.


>+
>+        return out->insBranchJov(op, a, b, to);
>+    }



>@@ -3168,16 +3263,23 @@ namespace nanojit
>     {
>         LIns *ins = in->read();
>         switch (ins->opcode()) {
>         case LIR_jt:
>         case LIR_jf:
>         case LIR_j:
>             NanoAssert(ins->getTarget() && ins->oprnd2()->isop(LIR_label));
>             break;
>+        case LIR_addjovl:
>+        case LIR_subjovl:
>+        case LIR_muljovl:
>+        CASE64(LIR_addjovq:)
>+        CASE64(LIR_subjovq:)
>+            NanoAssert(ins->getTarget() && ins->oprnd3()->isop(LIR_label));
>+            break;
>         case LIR_jtbl: {
>             uint32_t tableSize = ins->getTableSize();
>             NanoAssert(tableSize > 0);
>             for (uint32_t i = 0; i < tableSize; i++) {
>                 LIns* target = ins->getTarget(i);
>                 NanoAssert(target);
>                 NanoAssert(target->isop(LIR_label));
>             }

Nit: can you add a blank line before "case LIR_addjovl" and before "case
LIR_jtbl".


>diff --git a/nanojit/LIR.h b/nanojit/LIR.h
>--- a/nanojit/LIR.h
>+++ b/nanojit/LIR.h
>@@ -57,51 +57,53 @@ namespace nanojit
> #ifdef NANOJIT_64BIT
> #  define PTR_SIZE(a,b)  b
> #else
> #  define PTR_SIZE(a,b)  a
> #endif
> 
>         // Pointer-sized synonyms.
> 
>-        LIR_paramp  = PTR_SIZE(LIR_paraml, LIR_paramq),
>+        LIR_paramp  = PTR_SIZE(LIR_paraml,  LIR_paramq),
> 
>-        LIR_allocp  = PTR_SIZE(LIR_allocl, LIR_allocq),
>+        LIR_allocp  = PTR_SIZE(LIR_allocl,  LIR_allocq),
> 
>-        LIR_retp    = PTR_SIZE(LIR_retl,   LIR_retq),
>+        LIR_retp    = PTR_SIZE(LIR_retl,    LIR_retq),
> 
>-        LIR_livep   = PTR_SIZE(LIR_livel,  LIR_liveq),
>+        LIR_livep   = PTR_SIZE(LIR_livel,   LIR_liveq),
> 
>-        LIR_ldp     = PTR_SIZE(LIR_ldl,    LIR_ldq),
>+        LIR_ldp     = PTR_SIZE(LIR_ldl,     LIR_ldq),
> 
>-        LIR_stp     = PTR_SIZE(LIR_stl,    LIR_stq),
>+        LIR_stp     = PTR_SIZE(LIR_stl,     LIR_stq),
> 
>-        LIR_callp   = PTR_SIZE(LIR_calll,  LIR_callq),
>+        LIR_callp   = PTR_SIZE(LIR_calll,   LIR_callq),
> 
>-        LIR_eqp     = PTR_SIZE(LIR_eql,    LIR_eqq),
>-        LIR_ltp     = PTR_SIZE(LIR_ltl,    LIR_ltq),
>-        LIR_gtp     = PTR_SIZE(LIR_gtl,    LIR_gtq),
>-        LIR_lep     = PTR_SIZE(LIR_lel,    LIR_leq),
>-        LIR_gep     = PTR_SIZE(LIR_gel,    LIR_geq),
>-        LIR_ltup    = PTR_SIZE(LIR_ltul,   LIR_ltuq),
>-        LIR_gtup    = PTR_SIZE(LIR_gtul,   LIR_gtuq),
>-        LIR_leup    = PTR_SIZE(LIR_leul,   LIR_leuq),
>-        LIR_geup    = PTR_SIZE(LIR_geul,   LIR_geuq),
>+        LIR_eqp     = PTR_SIZE(LIR_eql,     LIR_eqq),
>+        LIR_ltp     = PTR_SIZE(LIR_ltl,     LIR_ltq),
>+        LIR_gtp     = PTR_SIZE(LIR_gtl,     LIR_gtq),
>+        LIR_lep     = PTR_SIZE(LIR_lel,     LIR_leq),
>+        LIR_gep     = PTR_SIZE(LIR_gel,     LIR_geq),
>+        LIR_ltup    = PTR_SIZE(LIR_ltul,    LIR_ltuq),
>+        LIR_gtup    = PTR_SIZE(LIR_gtul,    LIR_gtuq),
>+        LIR_leup    = PTR_SIZE(LIR_leul,    LIR_leuq),
>+        LIR_geup    = PTR_SIZE(LIR_geul,    LIR_geuq),
> 
>-        LIR_addp    = PTR_SIZE(LIR_addl,   LIR_addq),
>+        LIR_addp    = PTR_SIZE(LIR_addl,    LIR_addq),
>+        LIR_subp    = PTR_SIZE(LIR_subl,    LIR_subq),
>+        LIR_addjovp = PTR_SIZE(LIR_addjovl, LIR_addjovq),
> 
>-        LIR_andp    = PTR_SIZE(LIR_andl,   LIR_andq),
>-        LIR_orp     = PTR_SIZE(LIR_orl,    LIR_orq),
>-        LIR_xorp    = PTR_SIZE(LIR_xorl,   LIR_xorq),
>+        LIR_andp    = PTR_SIZE(LIR_andl,    LIR_andq),
>+        LIR_orp     = PTR_SIZE(LIR_orl,     LIR_orq),
>+        LIR_xorp    = PTR_SIZE(LIR_xorl,    LIR_xorq),
> 
>-        LIR_lshp    = PTR_SIZE(LIR_lshl,   LIR_lshq),
>-        LIR_rshp    = PTR_SIZE(LIR_rshl,   LIR_rshq),
>-        LIR_rshup   = PTR_SIZE(LIR_rshul,  LIR_rshuq),
>+        LIR_lshp    = PTR_SIZE(LIR_lshl,    LIR_lshq),
>+        LIR_rshp    = PTR_SIZE(LIR_rshl,    LIR_rshq),
>+        LIR_rshup   = PTR_SIZE(LIR_rshul,   LIR_rshuq),
> 
>-        LIR_cmovp   = PTR_SIZE(LIR_cmovl,  LIR_cmovq),
>+        LIR_cmovp   = PTR_SIZE(LIR_cmovl,   LIR_cmovq),

Nit: unnecessary whitespace changes, please revert.


>-void Assembler::asm_branch_xov(LOpcode op, NIns* target)
>+NIns* Assembler::asm_branch_xov(LOpcode op, NIns* target)

Nit: Now that this function is used for *jov* as well, asm_branch_ov() would
be a better name.
Attachment #440578 - Flags: feedback?(nnethercote) → feedback+
> Nit: unnecessary whitespace changes, please revert.

When I proof-read the patch just before posting it, I thought the same thing, then noticed the reason:  I expanded the spacing of the entire column so that
it would line up after adding the following:

>+        LIR_addjovp = PTR_SIZE(LIR_addjovl, LIR_addjovq),

The opcode names are a bit lengthier here, and I would have either had to
omit the space after the comma, or allowed the columns to be slightly out
of alignment.  I realize it does make the patch messier, and I should have
commented on it when I posted the patch.
Blocks: 561249
(In reply to comment #2)
> > Nit: unnecessary whitespace changes, please revert.
> 
> When I proof-read the patch just before posting it, I thought the same thing,
> then noticed the reason:  I expanded the spacing of the entire column so that
> it would line up after adding the following:
> 
> >+        LIR_addjovp = PTR_SIZE(LIR_addjovl, LIR_addjovq),

Oh sorry, I missed that.  That's fine then.
> I'd prefer this to be factored out -- the asm_jcc() code is very similar to
> the case LIR_addjovl, etc. code.  You may need a boolean flag to indicate
> whether it's an overflow jump or not, that's ok, it's better than having
> almost the same code duplicated.

This is turning out to be tough one.  Both the register state tracking and the address backpatching logic are identical in both cases, and the code really does cry out to be abstracted and shared.  The problem is that the embedded calls to asm_branch or asm_branch_ov are parameterized very differently.  Thus, to take the simple approach that you suggest, a boolean flag, leads to shared code like this:

    void Assembler::asm_cond_branch(LInsp ins,
                                    bool branchOnOverflow, 
                                    LInsp cond,
                                    bool branchOnFalse,
                                    LOpcode op,
                                    InsList& pending_lives)
    {
        countlir_jcc();
        LInsp to = ins->getTarget();
        LabelState *label = _labels.get(to);
        if (label && label->addr) {
            // Forward jump to known label.
            // Need to merge with label's register state.
            unionRegisterState(label->regs);
            if (branchOnOverflow)
                asm_branch_ov(op, label->addr);
            else
                asm_branch(branchOnFalse, cond, label->addr);
        }
        else {
            // Back edge.
            handleLoopCarriedExprs(pending_lives);
            if (!label) {
                // Evict all registers, most conservative approach.
                evictAllActiveRegs();
                _labels.add(to, 0, _allocator);
            }
            else {
                // Evict all registers, most conservative approach.
                intersectRegisterState(label->regs);
            }
            NIns *branch =
                (branchOnOverflow)
                  ? asm_branch_ov(op, 0)
                  : asm_branch(branchOnFalse, cond, 0);
            _patches.put(branch,to);
        }
    }

The difficulty with this, besides the lengthy parameter list, is that the set of parameters that is actually meaninful changes radically with the value of the boolean branchOnOverflow, and the extraneous parameters must be given dummy values at each call site.  In a language with decent support for macros and/or higher-order functions, the parameters to the asm_branch or asm_branch_ov other than the target address (of no interest to the common code) could be passed through transparently.  For example, with a suitable definition of assemble-conditional-branch, I could write in Lisp:

    (assemble-conditional-branch (ins pending-lives)
      (lambda (addr) (asm-branch branch-on-false condition addr))

    (assemble-conditional-branch (ins pending-lives)
      (lambda (addr) (asm-branch_ov op addr))

In C++, however, the machinations necessary to simulate closures are much too ugly and heavyweight to apply here.

The shared code could be plausibly broken up into two functions, so that an overflow branch would be assembled like so:

    LIns* target = ins->getTarget();
    NIns* addr = asm_cond_pre(target, pending_lives);
    LIns* branch = asm_branch_ov(op, addr);
    asm_cond_post(branch, target, addr);

Alternatively, the target address could always be backpatched, so an overflow branch would be assembled this way:

    LIns* branch = asm_branch_ov(op, 0);
    asm_resolve_branch_target(branch, ins, pending_lives);

While this seems reasonably clean, especially if we get rid of the now unneeded address argument, it would reorder the shared code and the platform-specific assembly of the branch, and I haven't verified if this can be tolerated.

Comments or suggestions are welcome.
(In reply to comment #4)
> 
> This is turning out to be tough one.

Bleh.  I'd suggest either:

(a) do the frankenstein all-in-one approach, and make it very clear via comments which args are used depending on the value of branchOnOverflow, or

(b) have the two functions as you originally did, but make sure they're next to each other and that there's a clear comment at the top of each saying "must be kept in sync with the other".

I'll let you choose, you've clearly given this a good amount of thought :)
Attachment #440578 - Attachment is obsolete: true
Attachment #441618 - Flags: review?(edwsmith)
Attachment #441618 - Flags: feedback?(nnethercote)
Attachment #440578 - Flags: review?(edwsmith)
One thing we did in TraceMonkey after too much bad macrology was to use template functions parameterized by traits classes, whose static (often inline) methods handled the differences, while the templated function contained the sameness. The inlining helped make this all efficient. Used for several different patterns, now that I think about it:

http://mxr.mozilla.org/mozilla-central/search?string=SlotVisitorBase
http://mxr.mozilla.org/mozilla-central/search?string=GetUpvarOnTrace

/be
Attachment #441618 - Flags: feedback?(nnethercote) → feedback+
Attachment #441618 - Flags: review?(edwsmith) → review+
Comment on attachment 441618 [details] [diff] [review]
 Add support for arithmetic with branch on overflow (revised)

Is there an #ifdef missing advertizing to LIR frontends that the JOV opcodes are implemented?
I am not a big fan of ifdefs for the nanojit API. We are both tracking nanojit trunk, so I think we can just start using stuff as it becomes available. I don't think anyone is trying to build libnanojit.so, so for now we don't have to worry too much :)
Generally I agree.  The reason for the ifdef here would be to allow a phased implementation - adding the new opcodes in i386/x64 first, then using them in TR, then adding them in other backends over time.

precedents are (for example) NJ_JTBL_SUPPORTED, NJ_EXPANDED_LOADSTORE_SUPPORTED, F2I_SUPPORTED.  in each case a frontend can generate less good code that doesn't use the opcodes.

I'm 100% for aggressively finishing support in other backends and removing these switches over time.  they're only a tool to ease transition, not to clutter up the code forever.
My understanding of the the advertised ifdefs is that they place the burden on
the front ends to deal with the possibility of unimplemented opcodes.  This is
appropriate when one or more target architectures cannot support an opcode, or
cannot do so with performance characteristics expected of native code.  In this
case, there is no apparent reason that any platform that supports *xov* opcodes
should not be able to support *jov* opcodes as well.  The burden is thus on the
backends to implement these, and frontends need not be conditionalized on their
presence in principle, though they must be in the interim until support is
extended to the full set of platforms.  I took my cue here from the *xov*
opcodes, which remain unimplemented on MIPS and PPC, but are not advertised
present or absent.

I am not averse to adding an ifdef, but I'd want to rationalize the treatment of the new opcodes and the existing incomplete *xov* implementation, and make it very clear in which cases the long-term obligation to acommodate the variation belongs to the frontends vs. nanojit.
I think the situation with the xov opcodes is that TR doesn't use them at all, and TM doesn't build at all on MIPS or PPC.  Hence, no ifdefs were needed.  the jov opcodes are different because TR will use them when available, and TR does build on MIPS/PPC (etc).
Sometimes a stick (ie. compile breakage) is needed to force people to update code appropriately.  Eg. the ARM/Sparc/MIPS/PPC backends still use the old deprecated reservation API, and unfortunately I don't see that being changed as long as it's supported because there's no great incentive to update.  So I'd suggest avoiding ifdefs in this case if possible -- just force backends to get updated quickly.
Rebased, and updated to conform to new liveness bits convention introduced in Bug 518267.

Please feel free to commit on approval -- I'm still waiting for my commit access request to go through the process.
Attachment #441618 - Attachment is obsolete: true
Attachment #442308 - Flags: review?(nnethercote)
Attachment #442308 - Flags: feedback?(edwsmith)
Comment on attachment 442308 [details] [diff] [review]
Add support for arithmetic with branch on overflow (rebased) 

>+                case LIR_addjovi:
>+                case LIR_subjovi:
>+                case LIR_muljovi:
>+                CASE64(LIR_addjovq:)
>+                CASE64(LIR_subjovq:)
>+                    countlir_jcc();
>+                    countlir_alu();
>+                    ins->oprnd1()->setResultLive();
>+                    ins->oprnd2()->setResultLive();
>+                    if (ins->isExtant()) {
>+                        asm_jov(ins, pending_lives);
>+#ifdef NANOJIT_64BIT
>+                        if (op == LIR_addjovq || op == LIR_subjovq)
>+                            asm_qbinop(ins);
>+                        else
>+#endif
>+                            asm_arith(ins);
>+                    }
>+                    break;

The general style in this switch would be to split the 32-bit cases from the
64-bit cases.  There'll be some code duplication, eg. the setResultLive()
calls, but no need for the additional test.

r=me with that changed.  Thanks.
Attachment #442308 - Flags: review?(nnethercote) → review+
Patch updated per Nick's comment.  Can Nick or Edwin push this?
I'll do it
Blocks: 562744
No longer blocks: 562744
Whiteboard: has-patch
Whiteboard: has-patch → has-patch, PACMAN
Assignee: wmaddox → edwsmith
Comment on attachment 442308 [details] [diff] [review]
Add support for arithmetic with branch on overflow (rebased) 

assigning to me for pushing to nanojit-central
Attachment #442308 - Flags: feedback?(edwsmith) → feedback+
Rebased one more time, and cleaned up opcode names (great opcode renaming is complete).  If it looks good to you I'll push and merge to TR.
Attachment #442308 - Attachment is obsolete: true
Attachment #442469 - Attachment is obsolete: true
Attachment #446889 - Flags: review?(wmaddox)
Comment on attachment 446889 [details] [diff] [review]
(rebased) adds LIR_[add,sub,mul]jov[i,q]

The rebased patch omits the new tests that I added to lirasm/tests:

addjovi.in	addjovi_ovf.in	muljovi.in	muljovi_ovf.in	subjovi.in	subjovi_ovf.in
addjovi.out	addjovi_ovf.out	muljovi.out	muljovi_ovf.out	subjovi.out	subjovi_ovf.out

Otherwise, patch looks good to go.

R+ with ommitted test files included.
Attachment #446889 - Flags: review?(wmaddox) → review+
NC: http://hg.mozilla.org/projects/nanojit-central/rev/68023362b083
Assignee: edwsmith → nobody
Whiteboard: has-patch, PACMAN → has-patch, PACMAN, fixed-in-nanojit
A renaming was not propagated to the MIPS backend.
Attachment #447805 - Flags: review?(rreitmai)
Attachment #447805 - Attachment is patch: true
Attachment #447805 - Flags: review?(rreitmai) → review+
Pushed fix for renaming issue to nanojit-central:

http://hg.mozilla.org/projects/nanojit-central/rev/8c6a61c93517
This builds in the tr sandbox now, but it doesn't look like were running the full acceptance tests.  Need as nj build with lirasm and the lirasm tests.
Attachment #447864 - Flags: review?(rreitmai)
This compiles and runs on MIPS, but the addjovi_ovf and subjovi_ovf tests fail.
The muljovi_ovf test passes.  Both of the --random tests fail.  This failure
is also observed in an earlier version, rev 1320, so there is evidence that
the MIPS backend is broken in other ways as well.  In fact, I do not know
what its claimed development status is -- AFAIK, MIPS is not a currently supported platform for the major clients.  The patch here does not presume to fix any preexisting issues, but simply extends the translation of the existing *xov* instructions in the most straightforward way.
Attachment #447877 - Flags: review?(rreitmai)
Attachment #447877 - Attachment is patch: true
Attachment #447864 - Attachment is obsolete: true
Attachment #447864 - Flags: review?(rreitmai)
Attachment #447877 - Flags: review?(rreitmai) → review+
Comment on attachment 447877 [details] [diff] [review]
Patch for MIPS support

Chris Dearman maintains this code, requesting feedback.
Attachment #447877 - Flags: feedback?(chris)
See Bug 568737 - Incorrect overflow tests generated for MIPS.
http://hg.mozilla.org/tracemonkey/rev/7e5432180193
Whiteboard: has-patch, PACMAN, fixed-in-nanojit → has-patch, PACMAN, fixed-in-nanojit, fixed-in-tracemonkey
For ARM support, see bug 571202.
Whiteboard: has-patch, PACMAN, fixed-in-nanojit, fixed-in-tracemonkey → has-patch, PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Reassigning to myself and closing -- it looks like this has made it to all the right places.  Issues related to support on additional platforms should have their own bugs.
Assignee: nobody → wmaddox
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Attachment #447877 - Flags: feedback?(chris) → feedback?
Attachment #447877 - Flags: feedback?
Flags: flashplayer-bug-
You need to log in before you can comment on or make changes to this bug.