Closed Bug 561249 Opened 14 years ago Closed 14 years ago

Speculative inlining and specialization for untyped addition

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: wmaddox, Assigned: wmaddox)

References

Details

(Whiteboard: PACMAN)

Attachments

(5 files, 9 obsolete files)

This is the first of a series of patches based on the experimental work reported in bug 552542.  It implements speculative inlining of integer addition involving one or more untyped (atom) arguments.  Additionally, if either argument is known to be numeric, a specialized out-of-line handler is invoked to avoid unnecessary conversion of an argument of known numeric type to an atom.

The patch depends on the addjovl, addjov1, and subq LIR opcodes added by the patch in bug 560926.  The optimizations are currently enabled only for i386 and
x86_64 platforms, but will apply to other platforms as the Nanojit support is expanded.
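As a sketch of the idea (the names, tag layout, and constant values below are illustrative stand-ins for this comment, not the actual avmplus definitions): for an untyped lhs added to a known int rhs, the generated code guards on the atom's 3-bit type tag, performs the add inline when the guard and a range check pass, and otherwise calls an out-of-line helper.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model of the speculative fastpath; the tag layout and the
// constant values are assumptions for this sketch, not avmplus definitions.
typedef intptr_t Atom;
const intptr_t kAtomTypeMask = 7;   // low 3 bits hold the type tag
const intptr_t kIntptrTag    = 6;   // stand-in tag value for intptr atoms

// Stand-in for the out-of-line slow path (full ECMA-262 addition);
// returns a sentinel here so the fallback is observable.
Atom slow_add(Atom /*lhs*/, int32_t /*rhs*/) { return 0; }

Atom add_atom_int(Atom lhs, int32_t rhs) {
    if ((lhs & kAtomTypeMask) == kIntptrTag) {      // speculate: lhs is an int atom
        intptr_t sum = (lhs >> 3) + rhs;            // untag, then add
        // 29-bit payload assumed, as on a 32-bit intptr with a 3-bit tag
        if (sum >= -(intptr_t(1) << 28) && sum < (intptr_t(1) << 28))
            return (sum * 8) | kIntptrTag;          // fits: retag and stay inline
    }
    return slow_add(lhs, rhs);                      // guard failed: out of line
}
```

If the patch works anything like this, the win is that the common int case never leaves straight-line code; the tag check and the range check each cost one compare and branch.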
Depends on: 560926, 555345
Assignee: nobody → wmaddox
Whiteboard: PACMAN
Attachment #440926 - Attachment is patch: true
Attachment #440926 - Flags: review?(edwsmith)
Attachment #440926 - Flags: feedback?(rreitmai)
The careful reader will no doubt notice a few small helper functions not otherwise used, e.g., LirHelper::ldl() and atomIsBothString(). These anticipate the subsequent patch to specialize relational and equality comparisons, and I was "working in the neighborhood".
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Priority: P3 → P1
Attachment #440926 - Attachment is obsolete: true
Attachment #441600 - Flags: review?(edwsmith)
Attachment #441600 - Flags: feedback?(rreitmai)
Attachment #440926 - Flags: review?(edwsmith)
Attachment #440926 - Flags: feedback?(rreitmai)
Blocks: 561963
Attachment #441600 - Attachment is obsolete: true
Attachment #442494 - Flags: review?(edwsmith)
Attachment #442494 - Flags: feedback?(rreitmai)
Attachment #441600 - Flags: review?(edwsmith)
Attachment #441600 - Flags: feedback?(rreitmai)
Attachment #442494 - Attachment is patch: true
Comment on attachment 442494 [details] [diff] [review]
Speculative inlining and specialization for untyped addition (rebased to track opcode renaming) 

I think you already referred to this in the code, but possibly merge common code in asm_jcc and asm_jov

Not necessarily an issue with your patch, but it looks like the interface between CodegenLabel and VarTracker is ready to be tightened up a bit.  The fact that we now have two places in CodegenLIR where unpatchedEdges is allocated by code external to CodegenLabel implies the design needs to be tweaked.
Attachment #442494 - Flags: feedback?(rreitmai) → feedback+
Depends on: 563944
Comment on attachment 442494 [details] [diff] [review]
Speculative inlining and specialization for untyped addition (rebased to track opcode renaming) 

removing review pending resolution of bug 563944
Attachment #442494 - Flags: review?(edwsmith)
Component: Virtual Machine → JIT Compiler (NanoJIT)
Hardware: x86 → All
Depends on: 591556
Attachment #442494 - Attachment is obsolete: true
Rebased.  Updated to build non-x86 platforms with required NanoJIT support.
Attachment #473732 - Attachment is obsolete: true
Attachment #478486 - Attachment is obsolete: true
Attachment #478827 - Flags: superreview?(edwsmith)
Attachment #478827 - Flags: review?(rreitmai)
Attachment #478827 - Flags: feedback?(wsharp)
Comment on attachment 478827 [details] [diff] [review]
Inlining and specialization for untyped addition, configurable to avoid speculative inlining (revised)

High level notes, then I'll wait for Rick/Werner's input before a final superreview.

It's really clear we're on a slippery slope with code organization.  I propose we finish things in-flight, then work on any cleanup we want to do before starting new patches.

* You've chosen a different naming convention for specialized helper functions than Werner chose.  Both are workable, but we need a convention and a brief comment saying what the convention is.

* The pattern of: if (type == INT_TYPE) { ... } else if (type == UINT_TYPE) ... should be turned into a switch(bt(type)) whenever possible.

* since these if/else or switch chains tend to get long, each case could/should be in a separate method; then reviewing the dispatch logic (which optimization is being chosen) is separated from reviewing the code that generates LIR.

(If a different principle is better, let's talk about it, then apply it uniformly.)

* I suggested to Werner that we need a coarse switch to turn on/off all the specialization optimizations, and maybe also a series of booleans, one per "interesting" case (how fine-grained? I don't know; put yourself in the shoes of tracking down a JIT bug in a large Flash application).

Given said switches, we probably don't need a #define for each fine-grained optimization.  But any helpers in instr.cpp, or StringObject.cpp, or elsewhere, need to be guarded by #ifdef FEATURE_NANOJIT, or a finer-grained switch, if they are only callable from JIT code.

Need acceptance tests to cover each variant so we can get to 100% line coverage of the helper functions, and 100% line coverage of each arm of the if/else-if chains in CodegenLIR.
Attached file small brightspot run
Some data on untyped addition in a 500+ file brightspot run.  Certainly not the majority but definitely some uses of *+String, *+Number, etc.
Comment on attachment 478827 [details] [diff] [review]
Inlining and specialization for untyped addition, configurable to avoid speculative inlining (revised)

Logic seems fine.  Do we want to support UINT_TYPE as well?  From the brightspot data I generated it's a less common case but still got some hits.

I think the "non-numeric" comments are not correct.  It's really that one argument is unknown (which may or may not be non-numeric).  Calling it non-numeric had me confused for a sec.

Also the "// no overflow possible" comment needs a bit more explanation.  Maybe "// no overflow possible because each int is only 29-bits"?  On 2nd thought, why can't that code overflow?  In op_add_a_ai, isn't the rhs argument a full 32-bit integer value?  0x7fffffff + Atom(1) would overflow.
Attachment #478827 - Flags: feedback?(wsharp) → feedback+
Also, this patch deserves a well crafted regression test included to hit the various overflow checks, type combinations, etc.
(In reply to comment #13)
> Comment on attachment 478827 [details] [diff] [review]
> Inlining and specialization for untyped addition, configurable to avoid
> speculative inlining (revised)
> 
> Logic seems fine.  Do we want to support UINT_TYPE as well?  From the
> brightspot data I generated it's a less common case but still got some hits.

Possibly for specialization only.  I'm hesitant to generate inline code for it.

> I think the "non-numeric" comments are not correct.  It's really that one
> argument is unknown (which may or may not be non-numeric).  Calling it
> non-numeric had me confused for a sec.

In CodegenLIR::emitAdd(), we have previously handled the case in which both arguments are numeric.  Thus this comment:

   // If we arrive here, we know that at least one argument is non-numeric.

In the following cases, where we have identified one of the arguments as being
numeric, e.g., of a specific numeric type such as INT, we then know that the
other argument *must* not be numeric.  Note that the "non-numeric" argument may be an atom representing a number -- here, "non-numeric" is to be interpreted in the narrow sense that !argument.traits->isNumeric() holds.  I think this is reasonably clear from the other comments in context, but perhaps this could be made more explicit.

> Also the "// no overflow possible" comment needs a bit more explanation.  Maybe
> "// no overflow possible because each int is only 29-bits"?  On 2nd thought,
> why can't that code overflow?  In op_add_a_ai, isn't the rhs argument a full
> 32-bit integer value?  0x7fffffff + Atom(1) would overflow.

Good catch.  I don't know what I was thinking...
This is what I was thinking:

	function simple(a:Number, b:*):Number
	{
		return a + b;
	}
	trace(simple(5, 4));

We're adding a NUMBER_TYPE to a NULL traits.  It does not mean that 'b' is non-numeric...it means that b's type is undefined.  In the current code, this gets into op_add with two integer atoms.  I'd rather see it be something like "non-numeric/untyped".  I realize b will be a number-encoded Atom at runtime, but I still found the comment confusing.
This fixes the missing overflow checks.  I've also refactored the specialized cases into their own methods.  A possible nit is the use of the 'type' argument to these methods, which, as should be clear from assertions in the code, will always be OBJECT_TYPE.  This is a nod to the planned elaboration of the scheme to specialize on expected type as well, but that may be a bit in the future.

Regression tests are forthcoming as a separate patch.
Attachment #472743 - Attachment is obsolete: true
Attachment #478827 - Attachment is obsolete: true
Attachment #479236 - Flags: review?(rreitmai)
Attachment #479236 - Flags: feedback?(wsharp)
Attachment #478827 - Flags: superreview?(edwsmith)
Attachment #478827 - Flags: review?(rreitmai)
Comment on attachment 478827 [details] [diff] [review]
Inlining and specialization for untyped addition, configurable to avoid speculative inlining (revised)

Comments on the obsolete patch...

Nothing fundamentally wrong with the patch, but I'm going to hold back r+ing until we work out some of the items below (and Ed's comments above), since I'd like another peek at the code prior to it landing.

- I'd like to see the common code factored out, e.g., the (val1.traits == INT_TYPE) case is almost identical to (val2.traits == INT_TYPE); ditto for NUMBER_TYPE.
- the dynamic add (addjovp) case did not strip the rhs; I don't see why this is not done, as loadAtomRep will give an atom, no?  In any event a comment block showing the math would be invaluable.
- also related to the above code block could you clarify why the 2nd arg of localSet() does not need to call atomToNativeRep().
- emitIntPlusAtomFastpath() shifts left then right; does this mean that negative values will fail this optimization and jump to fallback?
- A bit of scope creep, but with all of our bit shifting / fiddling going on in LIR, does it make sense to have some helper functions to encapsulate common manipulations surrounding Atom conversions?
- in branchJovToLabel() should trackForwardEdge() contain a 2nd parameter?
- I haven't looked in detail, but is there any reason we can't replace all the xxx_slowpath()'s with AvmCore op_add()?  Or are we gaining sufficiently in the slowpath cases that this special logic is worth it?
(In reply to comment #18)
> Comment on attachment 478827 [details] [diff] [review]
> Inlining and specialization for untyped addition, configurable to avoid
> speculative inlining (revised)
> 
> Comments on the obsolete patch...

> - like to see the common code factored out e.g (val1.traits == INT_TYPE) case
> is almost identical to (val2.traits == INT_TYPE) ditto for NUMBER_TYPE.

This duplication remains in the current patch, but the code has been broken up into smaller methods for each case, so I will refer to them by name here.  Note that some duplicated code in earlier revisions was factored out in CodegenLIR::emitIntPlusAtomFastpath().  The remaining duplication in CodegenLIR::emitAddIntToAtom() and CodegenLIR::emitAddAtomToInt() is due in part to the JIT profiling code.  The parameter to JIT_EVENT, a macro, is a name that will be declared as a local variable, not a use of a previously-defined value.  I cannot pass such a name as an argument, so if the duplicated code were to be factored out, I would end up conflating the profile events for the int+atom and atom+int cases, for example.  This wouldn't be disastrous, but it was a deliberate choice to preserve this information.

Aside from this concern, it appears rather awkward to deal with the role reversal of lhs and the rhs.  The attractive strategy of simply reversing the arguments to the method encapsulating the shared code will not work, because addition in AS3 is not commutative.  Here is an attempt:

    void CodegenLIR::emitAddHelper(int i,
                                   const CallInfo* fid_slow,
                                   const CallInfo* fid_spec,
                                   LIns* lhs, LIns* rhs,
                                   LIns* intArg, LIns* otherArg,
                                   Traits* type)
    {
    #ifdef VMCFG_FASTPATH_ADD_INLINE
        CodegenLabel fallback("fallback");
        CodegenLabel done("done");
        suspendCSE();
        emitIntPlusAtomFastpath(i, type, intArg, otherArg, fallback);
        branchToLabel(LIR_j, NULL, done);
        emitLabel(fallback);
        LIns* out = callIns(fid_slow, 3, coreAddr, lhs, rhs);
        localSet(i, out, type);
        JIT_EVENT(jit_add_a_ia_slow);
        emitLabel(done);
        resumeCSE();
    #else
        LIns* out = callIns(fid_spec, 3, coreAddr, lhs, rhs);
        localSet(i, out, type);
        JIT_EVENT(jit_add_a_ia);
    #endif
    }

    void CodegenLIR::emitAddIntToAtom(int i, int j, Traits* type)
    {
        LIns* rhs = loadAtomRep(j);
        LIns* lhs = localGet(i);
        emitAddHelper(i,
                      FUNCTIONID(op_add_a_ia_slowpath),
                      FUNCTIONID(op_add_a_ia),
                      lhs, rhs, lhs, rhs, type);
    }

    void CodegenLIR::emitAddAtomToInt(int i, int j, Traits* type)
    {
        LIns* lhs = loadAtomRep(i);
        LIns* rhs = localGet(j);
        emitAddHelper(i,
                      FUNCTIONID(op_add_a_ai_slowpath),
                      FUNCTIONID(op_add_a_ai),
                      lhs, rhs, rhs, lhs, type);
    }

To my eye, this is not an improvement, especially considering the loss of profiling specificity.  I'm open to concrete and constructive suggestions, however.

> - the dynamic add (addjovp) case did not strip the rhs , I don't see why this
> is not done as loadAtomRep will give an atom no?   In any event a comment block
> showing the math would be invaluable.

Indeed, the code is subtle here and needs some explanation.  What I am doing is preserving the tag of one of the arguments so that the sum is formed with the proper tag already in place.  The logic is similar to the existing out-of-line integer fastpath, where we add two tagged values, but subtract off one tag from the result, leaving the low-order bits as if we'd stripped both arguments and then OR'd in the tag after the sum was formed.
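The arithmetic can be checked concretely (the tag value below is an illustrative assumption; the identity holds for any tag occupying the low bits): if A = (a<<3)|tag and B = (b<<3)|tag, then A + B carries 2*tag in its low bits, so subtracting one tag yields ((a+b)<<3)|tag, i.e. the sum with the proper tag already in place.

```cpp
#include <cassert>
#include <cstdint>

// Checks the tag-preserving sum identity described above.
const intptr_t kTag = 6;            // illustrative 3-bit tag value

// Tag a value: equivalent to (v << 3) | kTag, written as a multiply
// so it is well-defined for negative v in pre-C++20 dialects.
intptr_t tagged(intptr_t v) { return (v * 8) | kTag; }

// Add two tagged values and subtract one tag: the result's low bits come
// out as if both arguments had been stripped and the tag OR'd back in.
intptr_t tagged_sum(intptr_t A, intptr_t B) { return A + B - kTag; }
```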

> - also related to the above code block could you clarify why the 2nd arg of
> localSet() does not need to call atomToNativeRep().

I think you mean why the sum (the second argument), in native representation, need not be converted to an atom.  That's because, as noted above, the sum is computed with the tag in place, so it is an atom, not a native integer.

> - emitIntPlusAtomFastpath() shifts left then right does this mean that negative
> values will fail this optimization and jump to fallback?

No.  The purpose of the shifts is to sign extend the lower 29 bits.
We then test that the result is numerically equal to the original value.
If this is the case, then we know that the value will fit in 29 bits,
and nothing will be lost if we convert to a 32-bit intptr with a 3-bit tag.
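A sketch of that check (the 29-bit payload width assumes a 32-bit intptr with a 3-bit tag, per the comment above; the unsigned cast just keeps the left shift well-defined in C++):

```cpp
#include <cassert>
#include <cstdint>

// Shift left then arithmetic-shift right by the tag width (3): the round
// trip is lossless exactly when the value already fits in 29 signed bits.
bool fits_in_29_bits(int32_t v) {
    int32_t round_trip = (int32_t)((uint32_t)v << 3) >> 3;  // sign-extends the low 29 bits
    return round_trip == v;
}
```

Note that negative values pass this test fine; only magnitudes needing more than 29 bits take the fallback.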

> - A bit of scope creep, but with all of our bit shifting / fiddling going on in
> LIR , does it make sense to have some helper functions to encapsulate 
> common manipulations surrounding Atom conversions.

I did add a few helpers earlier, in anticipation of this patch, for ORing and masking intptr values as needed for atom arithmetic.  There may be other opportunities, e.g., shifting by tagsize comes to mind.  Suggestions?

> - in branchJovToLabel() should trackForwardEdge() contain a 2nd parameter?

Yes.  A recent patch added this parameter to distinguish exception edges from normal edges.

> - I haven't looked in detail but any reason we can't replace all the
> xxx_slowpath()'s with AvmCore op_add() ?   Or are we gaining sufficiently in
> the slowpath cases, that this special logic is worth it.

Some early experiments showed regressions on some benchmarks that were apparently due to duplicated work on the slowpath after doing the fastpath tests.  I adopted the principle of avoiding such duplication by specializing op_add() for each context in which it was called while guarded by earlier fastpath tests.  I have not verified that all such cases actually pay off on our benchmarks.  Rather, I presumed that it would be possible to construct a benchmark that exhibited a regression for any case where there was duplicated work, and endeavored to avoid or minimize it in all cases.
Comment on attachment 479236 [details] [diff] [review]
Specialization and speculative inlining of addition (refactored, with fixes)

r+; nice cleanup!
Attachment #479236 - Flags: review?(rreitmai) → review+
Attachment #479236 - Flags: superreview?(edwsmith)
This AS3 program exercises integer addition at the boundary cases where overflow or a change of result representation occurs.  Integer values to be added are presented in all forms (int, double, atom) that can represent the value without loss.  Each pair of arguments is presented twice, with lhs and rhs roles reversed.

This is a standalone exerciser, not yet in a form suitable for automated testing.
Attachment #479662 - Flags: feedback?(wsharp)
Attachment #479662 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 479236 [details] [diff] [review]
Specialization and speculative inlining of addition (refactored, with fixes)

I'm wondering whether op_add_a_ai and op_add_a_ai_slowpath should be combined into one function and the VMCFG_FASTPATH_ADD_INLINE code should be integrated into the function itself.  One does overflow checks on integer math while the rest of the function is identical.

Also, are we considering inlining Number+* as direct ASM, or does that generate too much inline code?  From the brightspot run, Number+* is about 3x more common than int+*.
Attachment #479236 - Flags: feedback?(wsharp) → feedback+
Comment on attachment 479662 [details]
Exhaustive test of boundaries at which integer overflow or representation change may occur

This looks fine for the integer, double tests.

I'd like to see a test for all the combinations of addition possible:

int + String
int + Object
int + Object with defaultValue func
int + Namespace
int + xml
int + xmlList
int + whatever I'm forgetting

Whatever combination hits all the various cases in the int+*, *+int, Number+*, *+Number, *+* functions in this patch.
Attachment #479662 - Flags: feedback?(wsharp) → feedback+
(In reply to comment #22)
> Comment on attachment 479236 [details] [diff] [review]
> Specialization and speculative inlining of addition (refactored, with fixes)
> 
> I'm wondering whether op_add_a_ai and op_add_a_ai_slowpath should be combined
> into one function and the VMCFG_FASTPATH_ADD_INLINE code should be integrated
> into the function itself.  One does overflow checks on integer math while the
> rest of the function is identical.

If the inlined check succeeds, the function call to op_add_a_ai_slowpath is avoided.  Are you suggesting doing away with the inlining and just calling the specialized function?  op_add_a_ai is provided only to handle the case in which inlining is suppressed as a configuration option.  The inlined overflow check is also faster, because it can use hardware overflow detection on the addition rather than multiple ALU ops to infer the overflow after the fact.  Or are you suggesting that op_add_a_ai could replace the slowpath helper?  In this case, if the fastpath check fails (i.e., the speculated argument is *not* an integer), then we would forget this fact and repeat the test.  True, op_add_a_ai_slowpath eventually must check for the intptr case anyway, because both a tag mismatch (not an integer) and an integer overflow fall back to the same slowpath handler.  The presumption is that the overflow case is rare, though, so the fallback most likely means we've got a double.  This is probably one of the more marginal cases of specialization.  Though manually coded, the specialized routines were all derived systematically from the original op_add routine and the preconditions established by prior inlined guards.
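For contrast, an after-the-fact detection of signed overflow (what a helper without access to the hardware flag has to do) looks something like the sketch below; an inline jov branch gets the same answer for free from the flags set by the add itself.  This is an illustration of the cost difference, not the avmplus code:

```cpp
#include <cassert>
#include <cstdint>

// Classic post-hoc signed-overflow test: overflow occurred iff both
// operands share a sign that the result lacks.
bool add_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);  // wrapping add, well-defined
    return ((a ^ r) & (b ^ r)) < 0;                    // extra ALU ops: two xors, an and, a sign test
}
```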

> Also, are we considering inlining Number+* as direct ASM or does that generate
> too much inline code?  From the brightspot run, Number+* is about 3x more
> common than int+*.

As long as the result must be an atom, we'll likely have to do a function call to convert the double result to an atom anyway.  The logic here is complicated by the convention that small integral numeric values are represented as kIntptrType.  So, yes, I didn't want to inline all of that code.  There is, however, the opportunity to further specialize when the result will be converted back to a double: Number + * -> Number can be inlined nicely. I'm looking forward to getting this much of the solution in place, and then adding these coercion-context transformations as you have recently been doing elsewhere.
I was suggesting that you have one function whether or not the define is enabled.  If the helper function is used when VMCFG_FASTPATH_ADD_INLINE is turned on, you #ifdef out the overflow check.

Atom op_add_a_ai(AvmCore* core, Atom lhs, int32_t rhs)
{
#ifndef VMCFG_FASTPATH_ADD_INLINE
    // do overflow checks
#endif

   // same exact code in either case
}
(In reply to comment #25)
> I was suggesting that you had one function whether or not the define is
> enabled.  If the helper function is used when VMCFG_FASTPATH_ADD_INLINE is
> turned on, you #ifdef out the overflow check.  

Ah.  Interestingly, op_add_a_aa_slowpath is also very similar to op_add and should admit a similar unification.  The logic in op_add_a_aa_slowpath that is shared with op_add, however, is more tightly coded in op_add_a_aa_slowpath.  Also, op_add is called from the interpreter, so the fastpath logic should remain even if VMCFG_FASTPATH_ADD_INLINE is configured.  I've revised the patch with a new function op_add_a_aa that replaces calls to op_add from JIT'd code when VMCFG_FASTPATH_ADD is configured, even when not inlining.

Eventually, I think the generic op_add can simply go away -- it's not coded as tightly as op_add_a_aa, and the intptr+intptr fastpath can probably be lifted into the interpreter.  For now, I like having op_add around to be called when !VMCFG_FASTPATH_ADD, to be used as a baseline for comparison.  Its logic more directly mirrors the ECMA spec, contains useful comments, and serves as a reference implementation for the specialized variants.  I think it will eventually go, however.
Refactored to reflect comment #24, comment #25, and comment #26.

Fixed bug in int+Date case.
Attachment #479236 - Attachment is obsolete: true
Attachment #479991 - Flags: superreview?(edwsmith)
Attachment #479991 - Flags: review?(wsharp)
Attachment #479236 - Flags: superreview?(edwsmith)
Attachment #479991 - Attachment is patch: true
Attachment #479992 - Attachment description: ition/concatenation with a Tests for specialized addnumeric and non-numeric argument → Tests for specialized addition/concatentation of numeric and non-numeric argument
Attachment #479992 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 479991 [details] [diff] [review]
Specialization and speculative inlining of addition (refactored, with more fixes)

Looks good.
Attachment #479991 - Flags: review?(wsharp) → review+
Attachment #479992 - Flags: feedback?(wsharp) → feedback+
Comment on attachment 479991 [details] [diff] [review]
Specialization and speculative inlining of addition (refactored, with more fixes)

caveat: I didn't review the new slow-path helpers for correctness, and only skimmed the new inline jit code.  I'm counting on test coverage!  SR+ with the following fixes:

In the places you have #ifndef VMCFG_FASTPATH_ADD_INLINE in a slow-path helper, please add an #else case that asserts the condition that should have been handled by the inline jit code.  (in instr.cpp op_add_a_aa() for example, but also other places).

Comments are needed for the new helpers (op_add_a_aa, etc).  Yeah, the naming convention and signature sort of tell the story, but my eyes glaze over... 
maybe a one-liner showing the expected type signature, e.g.

  // atom + int => atom
  Atom op_add_a_ai(...)

avmbuild.h: 

  // Required NanoJIT support not available on PPC and SPARC.

Comment should say what instructions are missing and/or reference an outstanding bug to implement those instructions.  (create bug if needed).


+    void CodegenLIR::emitIntPlusAtomFastpath(int i, Traits* type, LIns* lhs, LIns* rhs, CodegenLabel &fallback)
+    {
+        LIns* tag = andp(rhs, AtomConstants::kAtomTypeMask);
+        branchToLabel(LIR_jf, eqp(tag, AtomConstants::kIntptrType), fallback);

Here and in similar emitters, it would be extremely helpful to include
lirasm or C pseudocode to show what is being generated.
Attachment #479991 - Flags: superreview?(edwsmith) → superreview+
There's nothing new here in the logic, but I did add quite a few lines of commentary and assertions.  Also, automated regression tests are included. You've seen this before, except insofar as I addressed the concerns in the previous comment, but I'd like someone to take one last look at it.
Attachment #482784 - Flags: superreview?(edwsmith)
Attachment #482784 - Attachment description: Updated patch, with commentary, assertions, and regression tests → Updated patch, incorporating changes requested in comment #30
Attachment #482784 - Flags: superreview?(edwsmith)
Pushed to tamarin-redux:

http://hg.mozilla.org/tamarin-redux/rev/e0d8f4a425de
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Flags: flashplayer-bug-