Closed Bug 490565 Opened 15 years ago Closed 14 years ago

Port TT Box type to Redux

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED WONTFIX
Future

People

(Reporter: stejohns, Assigned: stejohns)

References

Details

Attachments

(29 files, 4 obsolete files)

37.88 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
54.22 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
11.66 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
94.25 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
49.16 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
55.21 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
51.83 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
47.65 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
25.06 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
17.60 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
6.29 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
664 bytes, patch
lhansen
: review+
Details | Diff | Splinter Review
86.50 KB, patch
Details | Diff | Splinter Review
46.26 KB, patch
Details | Diff | Splinter Review
26.40 KB, patch
Details | Diff | Splinter Review
7.72 KB, patch
Details | Diff | Splinter Review
64.25 KB, patch
Details | Diff | Splinter Review
29.46 KB, patch
Details | Diff | Splinter Review
23.26 KB, patch
Details | Diff | Splinter Review
151.76 KB, patch
Details | Diff | Splinter Review
3.82 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
7.18 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
5.12 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
5.14 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
2.33 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
15.71 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
6.91 KB, patch
lhansen
: review+
Details | Diff | Splinter Review
43.78 KB, patch
stejohns
: review-
Details | Diff | Splinter Review
30.02 KB, patch
Details | Diff | Splinter Review
In Tamarin-Tracing, we converted most uses of Atom into a new type, Box, that is able to store doubles without GC allocation, which dramatically improved interpreter performance. We should convert TR/TC to use the same type.

This bug is going to be heavily staged in multiple patches so as to facilitate reviewing, but the first dozen or so (!) patches will probably have to land at once; there will be many intermittent stages of partial conversion that will cause serious performance degradation.
Initial implementation of Box, based roughly on what we used in TT, with a few improvements.

The 64-bit-architecture implementation needs some more examination; currently it uses a 128-bit (!) structure but I think this could probably be trimmed with a bit of work.
Attachment #374967 - Flags: review?(lhansen)
Initial proof-of-concept to get things working.
Attachment #374968 - Flags: review?(lhansen)
A little cleanup of the FramePtr type used in CallStackNode and friends, preparatory for subsequent work.
Attachment #374969 - Flags: review?(lhansen)
Interpreter stack and frame become Box instead of Atom. Lots and lots of conversions to/from Atoms inside the loop, which will be gradually fixed over the next few patches.
Attachment #374970 - Flags: review?(lhansen)
More conversion of helper functions to use Box directly, and then cleanup of Interpreter (and other callers).
Attachment #374971 - Flags: review?(lhansen)
More conversion of helper functions to use Box directly, and then cleanup of Interpreter (and other callers).
Attachment #374972 - Flags: review?(lhansen)
More conversion of helper functions to use Box directly, and then cleanup of Interpreter (and other callers).
Attachment #374973 - Flags: review?(lhansen)
More conversion of helper functions to use Box directly, and then cleanup of Interpreter (and other callers).
Attachment #374974 - Flags: review?(lhansen)
In Wordcode mode, don't push Atoms for ints, just push the int (or uint) directly.
Attachment #374975 - Flags: review?(lhansen)
Comment on attachment 374967 [details] [diff] [review]
Patch #1 -- define Box

Despite the voluminous comment, BOX_NAN is not defined dependent on architecture.  We want a VMCFG_ value to control what a quiet nan looks like, and maybe expose it as an AVMSYSTEM_ definition that will usually be sniffed by the platform sniffer.

I really do not care for the _boxToDouble convention; not only does it violate the standard, but the underscore is virtually invisible to my eyes.  I use a "...Slow" convention myself: "boxToDoubleSlow".
Attachment #374967 - Flags: review?(lhansen) → review+
Attachment #374968 - Flags: review?(lhansen) → review+
Comment on attachment 374968 [details] [diff] [review]
Patch #2 -- convert coerceEnter() methods to use Box instead of Atom

Major bonus points for renaming unbox1.

Is uintptr_t* the most desirable return type from to_native_args?  We have an open work item on concocting an abstract type name here.

nit: the patch to nativegen.py probably belongs elsewhere.
Comment on attachment 374969 [details] [diff] [review]
Patch #3 -- clean up FramePtr

There's a forward ref here to a type "struct FramePtr_" (the underscore again) which isn't defined in this or any other patch.

Is this comment correct: "Note that the framep you hand to CallStackNode 
varies depending on whether the Method in question is jitted or interped: if jitted, framep points to a series of 8-byte native values. if interped, framep points to a series of Atom."  Specifically, is the last word correct?
Attachment #374969 - Flags: review?(lhansen) → review+
Comment on attachment 374970 [details] [diff] [review]
Patch #4 -- initial conversion of Interpreter stack to Box

Most of this is OK, provided that a lot of the box <-> atom conversions go away.

Getting rid of abstractions like IS_BOTH_DOUBLE and IS_BOTH_INT is not OK, because branch-free optimizations of those idioms may be possible.

I have misgivings about the rewrite of the INC and DEC macros but I have to spend more time on it than I have right now, plus they're known to be buggy anyway (wraparound is not working ok) so future scrutiny is inevitable.

There's some introduction of new local variables here, which is a no-no for stack frame size reasons, but not a biggie (not too many from what I can tell).

Introducing computation where there was none, as in the case of OP_not, is not OK.

Stylistically testing an integer for zero by using the variable name by itself is not acceptable.

r+ with serious reservations.
Attachment #374970 - Flags: review?(lhansen) → review+
Attachment #374971 - Flags: review?(lhansen) → review+
Attachment #374972 - Flags: review?(lhansen) → review+
Attachment #374973 - Flags: review?(lhansen) → review+
Attachment #374974 - Flags: review?(lhansen) → review+
Attachment #374975 - Flags: review?(lhansen) → review+
The last few were reviewed somewhat lightly, MEGO.  Generally I think it looks good.  Once you feel everything is in good shape I'll apply them all to a clean checkout and do a thorough review of all at once.
(In reply to comment #10)
> (From update of attachment 374967 [details] [diff] [review])
> Despite the voluminous comment, BOX_NAN is not defined dependent on
> architecture.  We want a VMCFG_ value to control what a quiet nan looks like,
> and maybe expose it as an AVMSYSTEM_ definition that will usually be sniffed by
> the platform sniffer.

That might be tricky to do, as the entire premise of the encoding of Box is predicated on what quiet NaN's look like -- on systems that differ substantially, nontrivial surgery might be necessary. I'd prefer to update the comment for now to indicate that, and add the relevant VMCFG_, etc. options if/when we encounter such a system.

> I really do not care for the _boxToDouble convention; not only does it violate
> the standard, but the underscore is virtually invisible to my eyes.  I use a
> "...Slow" convention myself: "boxToDoubleSlow".

The day that standard starts being enforced by compilers, the world will break, but I will defer to you on this and rename :-)

(In reply to comment #11)
> (From update of attachment 374968 [details] [diff] [review])
> Is uintptr_t* the most desirable return type from to_native_args?  We have an
> open work item on concocting an abstract type name here.

Eh, not sure, to be honest. My hunch is that this area of code will continue to be a bit mutable so I don't think that work item will get addressed real soon.

> nit: the patch to nativegen.py probably belongs elsewhere.

Yeah, it's a general bug that we were just getting lucky with before. Want me to split it into a separate bug?

(In reply to comment #12)
> (From update of attachment 374969 [details] [diff] [review])
> There's a forward ref here to a type "struct FramePtr_" (the underscore again)
> which isn't defined in this or any other patch.

Correct, that's by design; it's an opaque type that can't be dereferenced, as a "FramePtr" is context-dependent depending on the method in question, so an explicit cast is necessary after sniffing. 

And to respond in nit-kind: AFAIK the standard doesn't say *anything* about trailing underscore, only leading ones :-)
 
> Is this comment correct: "Note that the framep you hand to CallStackNode 
> varies depending on whether the Method in question is jitted or interped: if
> jitted, framep points to a series of 8-byte native values. if interped, framep
> points to a series of Atom."  Specifically, is the last word correct?

At this point in the patch sequence, yes. (I think one of the subsequent patches changes the last word to Box, because at that point that's what it is)


(In reply to comment #13)
> (From update of attachment 374970 [details] [diff] [review])
> Most of this is OK, provided that a lot of the box <-> atom conversions go
> away.

Plan is that none of this will be landed until all conversions (or at least, all hot conversions) go away.
 
> Getting rid of abstractions like IS_BOTH_DOUBLE and IS_BOTH_INT is not OK,
> because branch-free optimizations of those idioms may be possible.

I'm happy to add those back in, but FWIW, I experimentally had such abstractions in place (mimicking the existing) code and actually got worse peformance. Still, adding the abstractions back is no issue, but the underlying implementation might be the same as the branchy version you see.
 
> I have misgivings about the rewrite of the INC and DEC macros but I have to
> spend more time on it than I have right now, plus they're known to be buggy
> anyway (wraparound is not working ok) so future scrutiny is inevitable.

Yeah, me too, the behavior in all cases seems a little dodgy (both before and after). Still, I'd rather get this right (or at least as right as we know how) the first time, so can you expound more on your specific misgivings?

> There's some introduction of new local variables here, which is a no-no for
> stack frame size reasons, but not a biggie (not too many from what I can tell).

Can you expound a bit more on this? What are the offenders, and what is the preferred best practice? 
 
> Introducing computation where there was none, as in the case of OP_not, is not
> OK.

I don't see what you're referring to in this case -- where's the computation? Does  "!=0" count as computation?

> Stylistically testing an integer for zero by using the variable name by itself
> is not acceptable.

IMHO this is generally but not absolutely true (e.g. when stupid compilers attack), but where am I doing this? Generally I prefer explicit !=0 usage, so I'm surprised it's present...

> r+ with serious reservations.

I reject such an r+ :-) Let's get the serious reservations out of the way; nits can be dealt with later, but no point in approving stuff with known concerns.
BTW... as I make the suggested changes, want me to replace the existing patches, or add a subsequent patch with the requested changes? I'm guessing the latter will be easier for you to deal with, but I can do the former if you prefer...
(In reply to comment #16)
> BTW... as I make the suggested changes, want me to replace the existing
> patches, or add a subsequent patch with the requested changes? I'm guessing the
> latter will be easier for you to deal with, but I can do the former if you
> prefer...

I didn't even know you could replace a patch.  Do what you prefer; I'll cope.
(In reply to comment #15)
> (In reply to comment #10)
> > (From update of attachment 374967 [details] [diff] [review] [details])
> > Despite the voluminous comment, BOX_NAN is not defined dependent on
> > architecture.  We want a VMCFG_ value to control what a quiet nan looks
> > like,
> > and maybe expose it as an AVMSYSTEM_ definition that will usually be
> > sniffed by
> > the platform sniffer.
> 
> That might be tricky to do, as the entire premise of the encoding of Box is
> predicated on what quiet NaN's look like -- on systems that differ
> substantially, nontrivial surgery might be necessary. I'd prefer to update the
> comment for now to indicate that, and add the relevant VMCFG_, etc. options
> if/when we encounter such a system.

According to this: http://f2c.sourcearchive.com/documentation/20020621-3.4/uninit_8c-source.html the MIPS architecture, which we will support sooner rather than later, uses a different representation for QNaN/SNaN than x86.  I did some research on these representations at a previous job.  In all cases I could only ever see that it was the most significant digit of the mantissa that matters.  And we really want to be sure we get it right.

> > nit: the patch to nativegen.py probably belongs elsewhere.
> 
> Yeah, it's a general bug that we were just getting lucky with before. Want me
> to split it into a separate bug?

Consider it R+ and just ship it.

> > Getting rid of abstractions like IS_BOTH_DOUBLE and IS_BOTH_INT is not OK,
> > because branch-free optimizations of those idioms may be possible.
> 
> I'm happy to add those back in, but FWIW, I experimentally had such
> abstractions in place (mimicking the existing) code and actually got worse
> peformance. Still, adding the abstractions back is no issue, but the underlying
> implementation might be the same as the branchy version you see.

Sure, that's fair.

> > I have misgivings about the rewrite of the INC and DEC macros but I have to
> > spend more time on it than I have right now, plus they're known to be buggy
> > anyway (wraparound is not working ok) so future scrutiny is inevitable.
> 
> Yeah, me too, the behavior in all cases seems a little dodgy (both before and
> after). Still, I'd rather get this right (or at least as right as we know how)
> the first time, so can you expound more on your specific misgivings?

The cross reference is bug #482506.

I looked again.  If you're not seeing regression failures they're by definition OK until the final patch lands, in my view.  I think I got confused by the overflow checking, but I think it's probably all right.  BTW for the unsigned case of eg FAST_INC_MAYBE it's better style to keep the value unsigned and comparing to ~0U rather than casting to int and comparing to -1.

> > There's some introduction of new local variables here, which is a no-no for
> > stack frame size reasons, but not a biggie (not too many from what I can tell).
> 
> Can you expound a bit more on this? What are the offenders, and what is the
> preferred best practice? 

INSTR(negate) has a couple of locals that are fresh.  One reason the interpreter is so unreadable already is that all variables have been lifted to the top and are reused explicitly because many compilers failed to reuse them properly (looks like the optimizer caved in on the function, frankly).  Without this reuse the stack frames were gigantic.  So we should reuse variables already defined at the top of the function when possible, and define new variables there for reuse when necessary.  Of course, sometimes a local definition is still right, if a lifted var can't be reused elsewhere.

> > Introducing computation where there was none, as in the case of OP_not, is not
> > OK.
> 
> I don't see what you're referring to in this case -- where's the computation?
> Does  "!=0" count as computation?

In the case when a1 was a boolean already the old code would just xor it with a bit pattern and return it.  Now the boolean value is extracted from a box and then a new box is created.  My comment was probably a little harsh; different representations call for different logic.  I just didn't want to see the code generalized for the common case more than necessary.

> > Stylistically testing an integer for zero by using the variable name by itself
> > is not acceptable.
> 
> IMHO this is generally but not absolutely true (e.g. when stupid compilers
> attack), but where am I doing this? Generally I prefer explicit !=0 usage, so
> I'm surprised it's present...

A number of instances of (u && u <= 0x80000000) for various values of u and 0x80000000.

> > r+ with serious reservations.
> 
> I reject such an r+ :-) Let's get the serious reservations out of the way; nits
> can be dealt with later, but no point in approving stuff with known concerns.

As I wrote elsewhere, I think this is good enough as an intermediate stage, just be aware that the interpreter looks nasty not because I'm a bad programmer (necessarily) but because of biases toward certain cases, dealing with broken compilers, and so on, so it's good to try not to undo optimizations.  That said, I fully realize we will have to do a re-tuning pass once you're done with the conversion.

Consider reservations voided.
(In reply to comment #18)

> According to this:
> http://f2c.sourcearchive.com/documentation/20020621-3.4/uninit_8c-source.html
> the MIPS architecture, which we will support sooner rather than later, uses a
> different representation for QNaN/SNaN than x86.  I did some research on these
> representations at a previous job.  In all cases I could only ever see that it
> was the most significant digit of the mantissa that matters.  And we really
> want to be sure we get it right.

Good to know, I'll check it out.
 
> > > Getting rid of abstractions like IS_BOTH_DOUBLE and IS_BOTH_INT 

FWIW, I'm going to add these back in as new box methods -- not sure about the impementation, though. Using a clever technique is apparently causing GCC's optimizer to become very unhappy and generate stupid code, on x86-32 anyway. I'll have to poke a bit more, but having the "isBoth" (sic) approach can't hurt and improves readability.

> I looked again.  If you're not seeing regression failures they're by definition
> OK until the final patch lands, in my view.  

Not seeing any regression failures so far. Need to test on more compilers/architectures, though.

> > > There's some introduction of new local variables here, which is a no-no for
> > > stack frame size reasons, but not a biggie (not too many from what I can tell).
> > 
> > Can you expound a bit more on this? What are the offenders, and what is the
> > preferred best practice? 
> 
> INSTR(negate) has a couple of locals that are fresh. 

Doh! yep, will nuke these, good catch.

> (looks like the optimizer caved in on the function, frankly)

No argument here, I'm seeing similar unhappiness as I disassemble. (It probably doesn't help that replacing a1,a2->bx1,bx2 is doubling register pressure on the already-register-starved x86-32.)

> In the case when a1 was a boolean already the old code would just xor it with a
> bit pattern and return it.  Now the boolean value is extracted from a box and
> then a new box is created.  My comment was probably a little harsh; different
> representations call for different logic.  I just didn't want to see the code
> generalized for the common case more than necessary.

Point taken. I'll look at added some smarter abstraction for this case.
 
> A number of instances of (u && u <= 0x80000000) for various values of u and
> 0x80000000.

will fix.
 
> As I wrote elsewhere, I think this is good enough as an intermediate stage,
> just be aware that the interpreter looks nasty not because I'm a bad programmer
> (necessarily) but because of biases toward certain cases, dealing with broken
> compilers, and so on, so it's good to try not to undo optimizations.  That
> said, I fully realize we will have to do a re-tuning pass once you're done with
> the conversion.

Yep, definitely. Another issue I've avoided worrying about for now is optimization of various tail-call cases (eg MethodEnv::coerceEnter) as it's unlikely stay reliably fixed until the code stabilizes. 

I'll prepare a subsequent patch that address this stuff, plus a few other modest optimizations, but none of this will land until performance is equal-or-better across the board, which is isn't yet -- at least a few more days of work is necessary for that (eg, Arrays will probably need to be converted to Box, as they were in TT)
(In reply to comment #18)

> According to this:
> http://f2c.sourcearchive.com/documentation/20020621-3.4/uninit_8c-source.html
> the MIPS architecture, which we will support sooner rather than later, uses a
> different representation for QNaN/SNaN than x86.  I did some research on these
> representations at a previous job.  In all cases I could only ever see that it
> was the most significant digit of the mantissa that matters.  And we really
> want to be sure we get it right.

FYI, there's probably existing code in MathUtils.cpp that is wrong for MIPS in this case.
Misc cleanups per Lars suggestions in previous comments, plus a couple of minor optimizations based on light profiling.
Attachment #375270 - Flags: review?(lhansen)
(In reply to comment #20)

> FYI, there's probably existing code in MathUtils.cpp that is wrong for MIPS in
> this case.

Good point.  (I might worry about the JIT as well but I haven't looked.)
Comment on attachment 375270 [details] [diff] [review]
Patch #10 -- misc cleanup

Ultra-nit: in the case for INSTR(negate) this one slipped through the do-not-test-for-zero-by-using-value-as-boolean filter:

+					if (i1 && i1 != int32_t(0x80000000)) {
Attachment #375270 - Flags: review?(lhansen) → review+
Blocks: 482506
Blocks: 467776
Blocks: 465754
A few more minor tweaks to improve Box performance:

-- on x86-32, FASTCALL won't spread a single 64-bit arg into the two registers we have available, so do a bit of hackery to force the issue. For heavily used functions with a single Box input argument, this is a modest but worthwhile win (2-5%)

-- a few tweaks to the inlined fast-path wrapper for a few functions to reduce branching.
Attachment #376353 - Flags: review?(lhansen)
Comment on attachment 376353 [details] [diff] [review]
Patch #11 -- minor speed tweaks

Neat.

I doubt the parens required for BOX_FROM_FASTCALL_ARG add anything, and losing them would make the syntax more like ARG_IN and ARG_PASSTHRU and thus reduce brain print, but consider it my opinion only.
Attachment #376353 - Flags: review?(lhansen) → review+
FYI, I verified that the ARM ABI specifies that 64-bit (and 128-bit) args are passed by register, in the way you might expect. (Ditto 64 and 128-bit return values.) Not sure about PPC, MIPS, etc.
the bit-ops in Interpreter have a better fastpath now; after checking both-int, they check both-number, which allows for mixed int/uint/double usage without have to call out to generic compare.
Attachment #376753 - Flags: review?(lhansen)
Attachment #376753 - Attachment is patch: true
Attachment #376753 - Attachment mime type: application/octet-stream → text/plain
Attachment #376753 - Flags: review?(lhansen) → review+
Lars -- this and the next several patches are all conceptually a single patch, split into smaller chunks to make reviewing possible. The overall goal is to remove box<->atom conversions by pushing Box as far down the chain as possible; this will include converting all the methods in ScriptObject (and descendants) that take/return Atom, to take/return Box. It also includes a conversion of Array to hold Box rather than Atom. None of these reviews are time-critical, since more work has to be done before any could land, so don't rush.
Attachment #377462 - Flags: review?(lhansen)
Attachment #377463 - Flags: review?(lhansen)
Attachment #377470 - Flags: review?(lhansen)
Attachment #377471 - Flags: review?(lhansen)
This moves the needle substantially for code that hammers on global vars (eg jsbench/HeapSort improves by 5%)
Attachment #377475 - Flags: review?(lhansen)
Attachment #377475 - Flags: review?(lhansen) → review+
Comment on attachment 377475 [details] [diff] [review]
Patch #21 -- cache the global scope in the interpreter loop

Did you actually observe a performance difference?
Yep, though a bit backhanded -- HeapSort is one of a few benches that are actually slower (vs TR tip) with Box, vs without, at least with the current patch sequence. But with the global-scope patch, it went from -19% to -14%.
Minor tweaks based on integration into Flash:

-- make boxFromObj/boxFromObjNonNull accept const pointers, so that const member methods can use them on "this"

-- add bogus debug-only def of delUintProperty to catch improper redefinitions 

-- add inline wrappers for get/set/has/delproperty_box, that accept Object instead of Box and infer vtable, since this is how it's used almost all the time anyway

-- convert Toplevel::has/delproperty into box form, instead of atom
Attachment #378171 - Flags: review?(lhansen)
Grab-bag of compiler warning/error fixes turned up by the buildbot.
Attachment #378219 - Flags: review?(lhansen)
Blocks: 493922
Blocks: 493924
A few minor optimizations to Box and friends that arose from profiling. Mostly a wash, but a few benchmarks (eg FFT) get up to a 10% perf gain from this.
Attachment #379018 - Flags: review?(lhansen)
Use a trick to speed up readU30 on 32-bit systems: both x86 and ARM will return 64-bit values in a register pair, so return the updated pointer in the MSW of a 64-bit value. This allows the source address to stay registerized. (Note: I haven't attempted to duplicate this for 64-bit systems.)
Attachment #379224 - Flags: review?(lhansen)
Comment on attachment 379224 [details] [diff] [review]
Patch #25 -- speed up readU30 calls on 32-bit systems

obsoleting this patch for now -- initial testing showed a modest positive bump but further testing shows some anomalies. not worth figuring out at this point.
Attachment #379224 - Attachment is obsolete: true
Attachment #379224 - Flags: review?(lhansen)
Supporting methods boxToNativeRep, loadBoxRep, and storeBoxArgs and a 'newclass_box' call that confirms the new methods are correct.
Attachment #379018 - Flags: review?(lhansen) → review+
Attachment #378171 - Flags: review?(lhansen) → review+
Attachment #378219 - Flags: review?(lhansen) → review+
-- AS3_call and the related variants of coerceEnter were incorrectly assuming that argc/argv included a receiver at argv[0]. 
-- Inlined the simple ctor for Multiname; the old out-of-line one was showing up at over 1% of interp time in some cases...
Attachment #379750 - Flags: review?(lhansen)
Move Toplevel::coerce_box into Box.h as boxCoerce, and smarten it up to do some trivial checks inline. Checking for expected=* and for thinks that can be computed via the BUILTIN_xxx via bitmask is worthwhile, moving the needle up to 5% on some benches (x86).
Attachment #379778 - Flags: review?(lhansen)
Comment on attachment 379265 [details] [diff] [review]
JIT patch #1 - base support for box

loadBoxRep could probably just do a switch on t's BuiltinType rather than repeated if-else.
Attached patch Patch #27aSplinter Review
Same as Patch 27, but with a couple of minor fixes found after the fact.
Attachment #379778 - Attachment is obsolete: true
Attachment #379790 - Flags: review?(lhansen)
Attachment #379778 - Flags: review?(lhansen)
Attachment #379750 - Flags: review?(lhansen) → review+
Attachment #379790 - Flags: review?(lhansen) → review+
JIT box work in back-end - acceptance tests run cleanly!
- Add support for 64b args in Nativei386
- Tweak CallInfo ARG format to make it make clear the separation of
  32b args and 64b args.
- Fix some bugs in loadBoxRep and boxToNativeRep
- All acceptance testing running, with -Ojit 2 failures, but unclear
  if these are newly introduced or not.

Also linking bugs 416398 and 433793 here, since they contain comments on Box encoding and other relevant info.
Attachment #379265 - Attachment is obsolete: true
setSlot_box is (almost) always preceded by a call to boxCoerce(), but has enough information to do the job internally in many cases. Inlining the necessary logic inside the function and removing the preceding calls saves 5% in typical cases, 30% in a few outliers.

("almost" because setsuper doesn't do so, nor does it need to, but the extra coerce doesn't hurt and isn't worth optimizing out for this case.)
Attachment #379968 - Flags: review?(lhansen)
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
Priority: -- → P2
Target Milestone: --- → flash10.x
Comment on attachment 379968 [details] [diff] [review]
Patch #28 -- setSlot_box does boxCoerce() internally

I might prefer it if setSlot_box were renamed to reflect the fact that it now performs coercion; coerceAndSetSlot_box maybe.
Attachment #379968 - Flags: review?(lhansen) → review+
(In reply to comment #51)
> (From update of attachment 379968 [details] [diff] [review])
> I might prefer it if setSlot_box were renamed to reflect the fact that it now
> performs coercion; coerceAndSetSlot_box maybe.

good thought. I'll make it so.
A few final tweaks suggested by profiling, that add up to another 5% or so on average: these include removing the redundant VTable* argument to most of these; splitting up the heavily-used ScriptObject variant of get/set property from the rarelyused non-ScriptObject ones (thus avoiding the need to pass the redundant Box tag and also streamlining the code options; removing the redundant 'base' argument to callproperty (requiring argv[0] to be set properly by the caller instead), plus a few others.
Attachment #380269 - Flags: review?(lhansen)
Comment on attachment 380269 [details] [diff] [review]
Patch #29 -- optimize Toplevel::getproperty and friends

I am going to go ahead and "review-" this myself; though I think it has merit, it's not clear if the callproperty optimization (to remove the explicit receiver parameter) is correct in all cases (even though it passes acceptance tests).
Attachment #380269 - Flags: review?(lhansen) → review-
Patch applies against latest from http://asteam/hg/users/stejohns/tr-box/ and full acceptance test run passes with -Ojit option.
Attachment #379938 - Attachment is obsolete: true
Attachment #377462 - Flags: review?(lhansen)
Attachment #377463 - Flags: review?(lhansen)
Attachment #377467 - Flags: review?(lhansen)
Attachment #377469 - Flags: review?(lhansen)
Attachment #377470 - Flags: review?(lhansen)
Attachment #377471 - Flags: review?(lhansen)
Attachment #377472 - Flags: review?(lhansen)
Attachment #377474 - Flags: review?(lhansen)
Target Milestone: flash10.x → Future
No longer blocks: 465754
Priority: P2 → --
Blocks: 416398
Depends on: 502279
Most definitely sour; marking won't fix
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
bulk verifying resolved !fixed issues
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: