[redux] Avoid unbox/rebox on the interpreter call path; enable tail call optimization on call path

VERIFIED FIXED

Status

Tamarin
Virtual Machine
VERIFIED FIXED
10 years ago
9 years ago

People

(Reporter: Lars T Hansen, Unassigned)

Tracking

Details

Attachments

(1 attachment)

18.02 KB, patch
Edwin Smith
: review+
Steven Johnson
: review+
Details | Diff | Splinter Review
(Reporter)

Description

10 years ago
Created attachment 347758 [details] [diff] [review]
Patch

Avoiding unboxing/reboxing and enabling tail calling in compilers that support it (gcc, armcc) are intertwined and are being submitted as a single patch.

Unbox/rebox are being avoided by having coerceEnter check whether the execution engine it is about to invoke is an interpreter function.  If so then it bypasses first unboxCoerceArgs (replacing it with an in-place argument coercion) as well as interp32 and interpN, and goes directly to interp, passing the boxed arguments.

Tail call optimization is aided by making sure that the addresses of local variables are not being taken in functions that are on the interp->interp call path.  In practice, for functions that perform simple function calls, interp() frames end up being stacked back-to-back on the call stack, all frames for intermediate functions disappear.

One possibly controversial decision is alluded to above, boxed arguments are coerced in-place in coerceEnter.  I've seen no regressions because of this but the legality of this is not documented, so in principle it may break the embedding environment.  Opinions welcome.

Because interp() is now called with boxed arguments, there are implications for CallStackFrame and assorted logic.  This has been handled by making it possible for CallStackFrame to be initialized either by boxed or unboxed argument lists; complexity increases somewhat, also in client code I expect.

One remaining optimization, to be handled another day, is that tail calls are inhibited and overhead is added by having to copy arguments in cases when coerceEnter is called with a separate 'this' argument; its interface currently takes only a single arguments array, where the this object is argument zero.
Attachment #347758 - Flags: review?(edwsmith)
(Reporter)

Updated

10 years ago
Attachment #347758 - Flags: review?(stejohns)

Comment 1

10 years ago
It's an unwritten assumption throughout the VM that args can be modified in place by the called function, even if args are overlapped with locals (although they aren't currently).

... still reviewing

Updated

10 years ago
Attachment #347758 - Flags: review?(edwsmith) → review+

Comment 2

10 years ago
Comment on attachment 347758 [details] [diff] [review]
Patch

in MethodEnv::coerceAtom(), the code could be optimized a bit by

   switch(Traits::getBuiltinType(t)) {
     case BUILTIN_number .. 
     case BUILTIN_int ..
     ..
   }
(Reporter)

Comment 3

10 years ago
(In reply to comment #2)
> (From update of attachment 347758 [details] [diff] [review])
> in MethodEnv::coerceAtom(), the code could be optimized a bit by
> 
>    switch(Traits::getBuiltinType(t)) {
>      case BUILTIN_number .. 
>      case BUILTIN_int ..
>      ..
>    }

That depends.  Last time Steven and I were in Newton we tried something similar in a different situation, and the switch code was slower.  The reason is that short switches on non-dense value sets are turned back into an "if" or a binary search; if we write the "if" explicitly we can sometimes do better than the compiler because we can put common cases first.  That said, I have no particular reason to assume any particular distribution of types here; obviously it's program-dependent.
(Reporter)

Comment 4

10 years ago
(In reply to comment #1)
> It's an unwritten assumption throughout the VM that args can be modified in
> place by the called function, even if args are overlapped with locals (although
> they aren't currently).

That's good.

I'll consider making this a written assumption :-)

Comment 5

10 years ago
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 347758 [details] [diff] [review] [details])
> > in MethodEnv::coerceAtom(), the code could be optimized a bit by
> > 
> >    switch(Traits::getBuiltinType(t)) {
> >      case BUILTIN_number .. 
> >      case BUILTIN_int ..
> >      ..
> >    }
> 
> That depends.  Last time Steven and I were in Newton we tried something similar
> in a different situation, and the switch code was slower.  The reason is that
> short switches on non-dense value sets are turned back into an "if" or a binary
> search; if we write the "if" explicitly we can sometimes do better than the
> compiler because we can put common cases first.  That said, I have no
> particular reason to assume any particular distribution of types here;
> obviously it's program-dependent.

Ah, interesting.  My beleif that using BUILTIN_foo is faster was based on it being a small int constant, vs FOO_TYPE being an expression involving a couple pointer de-refs and a load.

So, if comparison order matters, maybe:

BuiltintType bt = Traits::getBuiltinType(t);
if (bt == BUILTIN_number) { }
...

assumptions:
  getBuiltinType gets inlined
  the if(!t) inside there doesn't hurt being first

agreed, lots of varibales that could affect performance.

Comment 6

10 years ago
aside: we'd clean up a lot of code by creating a real Traits for "*" and never having null Traits* to worry about.
(Reporter)

Comment 7

10 years ago
(In reply to comment #6)
> aside: we'd clean up a lot of code by creating a real Traits for "*" and never
> having null Traits* to worry about.

Is that a bug or a work item?  :-)

Comment 8

10 years ago
work item, if we agree its a good idea.  just being verbose today.

Comment 9

10 years ago
> > search; if we write the "if" explicitly we can sometimes do better than the
> > compiler because we can put common cases first.  That said, I have no
> > particular reason to assume any particular distribution of types here;
> > obviously it's program-dependent.
> 
> Ah, interesting.  My beleif that using BUILTIN_foo is faster was based on it
> being a small int constant, vs FOO_TYPE being an expression involving a couple
> pointer de-refs and a load.

Using the constants is definitely a (slight) win over the FOO_TYPE expression... the place we saw if-else as a win over switch was in get/setSlotAtom, where "atom" is most common, thus putting it first netted a small but measurable win. (that said, a switch that was done as a lookup table might be better still, but there's no way to ensure all compilers do that, alas)

Comment 10

10 years ago
(In reply to comment #8)
> work item, if we agree its a good idea.  just being verbose today.

+1, let's add it as a work item.... that's been a wart that should be cleaned someday.

Comment 11

10 years ago
(In reply to comment #4)
> (In reply to comment #1)
> > It's an unwritten assumption throughout the VM that args can be modified in
> > place by the called function, even if args are overlapped with locals (although
> > they aren't currently).

I'll fix nativegen.py to no longer make the arguments const, then :-)

Comment 12

10 years ago
Why does interp32, interpN, etc take uint32* instead of Atom* (or even uintptr*)? (I realize this is historical, just wondering if it matters... uint32 doesn't seem like a very 64-bit-safe declaration)

Updated

10 years ago
Attachment #347758 - Flags: review?(stejohns) → review+
(Reporter)

Comment 13

10 years ago
Pushed to tamarin-redux as changeset 1094:6321952e04d0
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
(Reporter)

Comment 14

10 years ago
(In reply to comment #12)
> Why does interp32, interpN, etc take uint32* instead of Atom* (or even
> uintptr*)? (I realize this is historical, just wondering if it matters...
> uint32 doesn't seem like a very 64-bit-safe declaration)

Atom* would be wrong because the arguments to interp32 and interpN are unboxed values (unlike the arguments to interp, which are now boxed).  But uint32_t* does not seem right, I agree -- should be byte*, void*, or uintptr_t* IMO.

Comment 15

10 years ago
(In reply to comment #14)

> Atom* would be wrong because the arguments to interp32 and interpN are unboxed
> values (unlike the arguments to interp, which are now boxed).  But uint32_t*
> does not seem right, I agree -- should be byte*, void*, or uintptr_t* IMO.

Better yet, declare a new type ("unboxed_t*" or some such) to emphasize this.
(Reporter)

Comment 16

10 years ago
(In reply to comment #15)
> (In reply to comment #14)
> 
> > Atom* would be wrong because the arguments to interp32 and interpN are unboxed
> > values (unlike the arguments to interp, which are now boxed).  But uint32_t*
> > does not seem right, I agree -- should be byte*, void*, or uintptr_t* IMO.
> 
> Better yet, declare a new type ("unboxed_t*" or some such) to emphasize this.

Will add a work item, should not be a lot of work though.

Comment 17

9 years ago
Resolved fixed engineering / work item that has been pushed.  Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.