Closed Bug 531250 Opened 15 years ago Closed 6 years ago

Tune ScriptObject initialization

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: lhansen, Unassigned)

References

Details

(Whiteboard: PACMAN Tracking)

Attachments

(3 files, 14 obsolete files)

We really want to enable rapidly-allocating programs to function well.  (Rapidly-allocating is not an absolute, but it should never be a question whether it's right to reuse an event handler object or to allocate a new one - it should always be affordable to allocate a new one.)

There's probably some low-hanging fruit in ScriptObject initialization; I've just started looking. Consider ScriptObject::initHashtable:

  iht->initialize(this->gc(), capacity);

In turn InlineHashtable::initialize:

  capacity = MathUtils::nextPowerOfTwo(capacity);
  setCapacity(capacity*2);
  setAtoms((Atom *)gc->Calloc(getCapacity(), 
                              sizeof(Atom), 
                              GC::kContainsPointers|GC::kZero));

nextPowerOfTwo is a proper function and that function has a loop (which probably terminates reasonably quickly since most arguments are small, but still).  In turn setCapacity() calls FindOneBit to compute the log of capacity; getCapacity() undoes that computation to produce the original value, etc - small potatoes to be sure, but if the allocation rate is high enough this all adds up.  Most of the values could presumably be cached with the ScriptObject metadata in this case, i.e., we're looking at specializing / parameterizing fast factories for ScriptObjects.
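For illustration, "cached with the metadata" might look roughly like this (sketch only; HtSizing, computeSizing, initializeWithSizing and setCapacityRaw are invented names, not code in the tree):

  struct HtSizing
  {
      uint32_t atoms;     // nextPowerOfTwo(capacity) * 2, i.e. what getCapacity() returns today
      uint32_t logAtoms;  // log2 of the above, i.e. what setCapacity() computes via FindOneBit
  };

  // Computed once per class, e.g. when the Traits are resolved.
  static HtSizing computeSizing(uint32_t capacity)
  {
      HtSizing s;
      s.atoms = MathUtils::nextPowerOfTwo(capacity) * 2;
      s.logAtoms = 0;
      for (uint32_t c = s.atoms; c > 1; c >>= 1)
          s.logAtoms++;
      return s;
  }

  // Per-instance fast path: no nextPowerOfTwo, no FindOneBit, just one Calloc.
  void InlineHashtable::initializeWithSizing(MMgc::GC* gc, const HtSizing& s)
  {
      setCapacityRaw(s.logAtoms);     // invented raw setter: stores the precomputed log directly
      setAtoms((Atom*)gc->Calloc(s.atoms,
                                 sizeof(Atom),
                                 GC::kContainsPointers|GC::kZero));
  }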
Attached file Simple benchmark
This is a simple allocating loop that just creates and discards a lot of class instances.  Everything is strongly typed and I expect this is a typical "best case" for both the GC and the VM; the GC does not run, everything is handled by DRC as we want to.  The idea is just to get an idea of where the cost goes in instance creation and destruction.  Will post data when I have them.
Gross findings from this simple benchmark:

- Overall running time for 10 million allocations is 1523ms on my Mac Pro,
  a cost of 0.15us / object or ~450 clocks / object, including all overhead
  from the benchmark loop etc.

- 48.5% of the time spent is in creating and destroying Point objects (not 
  including memory management):
    ScriptObject::createInstance 20.1
    Traits::destroyInstance       9.5
    ScriptObject::~ScriptObject   5.9
    ScriptObject::ScriptObject    5.4
    AvmCore::newObject            5.1
    ClassClosure::newInstance     2.5

- 31% of the time spent in MMgc:
    Allocation, deallocation     17.5
    ZCT reaping                   9.0
    Write barrier                 4.4

- 20% executing the code:
    Jitted code                  14.4
    finddef_cache                 5.6

This is with the allocator/deallocator optimizations from bug #528338 applied.  Note that the GC basically does not run, everything is handled by reference counting.

General observations and questions:

- The best case for initializing an eight-word AS3 object (three 'int' fields,
  AS3 vtable, delegate pointer, RC overhead, C++ vtable, rounded up) is more
  than 8 simple stores, but 2.5 times the allocation+deallocation cost is
  still much more than expected.

- Why is finddef_cache showing up so much?  I tried my best to make
  everything early-bindable, but maybe I failed.

- 14.4% in jitted code is more than expected - it's just a loop calling a
  constructor - so presumably some of this time is actually for jitted
  initialization code?
Another finding is that __bzero accounts for fully 3.8% of the running time, and almost all of that time is attributable to destroyInstance (so it accounts for 40% of the time in destroyInstance).  __bzero is only called if the object has no pointer or atom slots (as is true here).  Its setup cost can be substantial however; I shaved 2.5% off the overall running time by open-coding special cases for objects up to 32 bytes inside destroyInstance.
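For illustration, the open-coded special cases look roughly like this (a sketch only - the actual change may differ; the cases are the 8-byte-aligned small-object sizes, and anything else falls back to the general path):

  static REALLY_INLINE void zeroSmallObject(void* p, size_t size)
  {
      uint32_t* q = (uint32_t*)p;
      switch (size)
      {
      case 32: q[7] = 0; q[6] = 0;   // fall through
      case 24: q[5] = 0; q[4] = 0;   // fall through
      case 16: q[3] = 0; q[2] = 0;   // fall through
      case 8:  q[1] = 0; q[0] = 0;
               break;
      default:
          memset(p, 0, size);        // larger objects: the memset setup cost is amortized
          break;
      }
  }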
AvmCore::newObject is just a trampoline but accounts for 5% of the running time, making it REALLY_INLINE removes about half of that.
A significant fraction of the time in ScriptObject::ScriptObject and ScriptObject::~ScriptObject - say 20% - is for adding the pointer of the new reference to the ZCT and removing it again.
It would seem that ScriptObject::createInstance could be specialized in various ways: once the first instance of a type has been created, some of the checking in createInstance appears unnecessary; once it's been established that a type does not extend a class with a custom createInstance then the recursive virtual calls can be performed as a loop.  The latter may be important because the prologue of createInstance is hot; there are many dependent loads.  If refactored as a loop, these loads could possibly be hoisted out of the loop (remains to be seen).  In general, perhaps the indirection could be flattened a bit.
... and of course that "loop" can be completely unrolled.
Attached patch Short-circuit createInstance (obsolete) — Splinter Review
Actually the "loop" disappears completely, if we assume the error checking is redundant; all other data are loop-invariant.

The proof-of-concept code here specializes the case where only ScriptObjects are on the path, so custom createInstance implementations don't get in the way.  This provides another healthy speedup - the three patches together have lopped off about 300ms (20%) of the running time.  Now jitted code consumes about 21% and MMgc nearly 40% of the remaining running time.
Annoyingly, GCC does not tail-call-optimize calls through a function pointer, so newInstance is more expensive than it needs to be - it calls the now-static createInstance helper indirectly; that function returns, only for newInstance to return again immediately.

newInstance also contains a check for whether 'prototype' is NULL, which is presumably almost never true - it's wasted work to an arbitrary number of decimal places.  It's particularly annoying since the indirection does not serve to create anything lazily, just to fetch a constant value (using a non-constant computation.)  

Part of the problem is that the prototype can be set by user code, so even if it isn't NULL now it can become NULL later.  That setting is always via an accessor so it can be intercepted, but there are other complications: lots of native code accesses the 'prototype' property on the C++ side directly and would then need to cope with reading NULL.  Splitting it into two separate properties /might/ fix that, but the protocol vis-a-vis host code is unclear right now.
The 'gcInterface' member is not used either in the shell or in the Player, but it adds one callback object to the GC's list of callback objects.  Callbacks are evil because of the prereap(obj) callback; that facility needs to be removed, but in the mean time we should only use it when necessary, not willy-nilly.

(Drops the time from 1230 to 1204 ms pretty consistently - 2%. Two percent here and two percent there... maybe it'll add up.)

It might be useful to split the prereap(obj) callback out as a separate object that the Player can register only when it has objects that require it - almost none I think (some SharedObject thing).  It may be the best we can do in the short term.
One more note about newInstance.  The code for OP_construct has two parts: one part invokes newInstance and the other calls the object constructor.  The jitter does this:

  LIns* vtable = loadVTable(objDisp);
  LIns* ivtable = loadIns(LIR_ldcp, offsetof(VTable, ivtable), vtable);
  method = loadIns(LIR_ldcp, offsetof(VTable, init), ivtable);
  LIns* inst = callIns(FUNCTIONID(newInstance),1, localGetp(objDisp));
  localSet(dest, inst, result);

If we set aside for the moment the song and dance around 'prototype', the body of newInstance is this:

  VTable* ivtable = this->ivtable();
  return ivtable->createInstance(this, ivtable, prototype);

where ClassClosure::ivtable() is

  return vtable->ivtable;

We see that if we can get rid of the song and dance around 'prototype' then newInstance is trivial and can be removed.  The savings from the jitter calling ivtable->createInstance directly will not just be the removed function call, but also removing redundant chained loads to load ivtable (the jitted code does it, and newInstance does it too).
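Concretely (just a sketch of the intent, not the CodegenLIR change itself; 'cc' stands for the ClassClosure the jit already has in hand), the per-construction work would reduce to:

  // Conceptual C++ equivalent of what the jit would emit for OP_construct once
  // newInstance is bypassed.
  VTable* ivtable = cc->vtable->ivtable;                                   // one chained load, no duplicate in a callee
  ScriptObject* obj = ivtable->createInstance(cc, ivtable, cc->prototype); // direct call, plus whatever prototype handling remains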
Anyway after all this dinking we're in a situation where coerceUnboxEnter and finddef_cache together take about 30% of the running time.  Yet the benchmark is a simple loop calling 'new' on a known class that has a trivial, generated constructor, inheriting directly from Object.  In turn Object has a trivial, generated constructor.  Where is all the time going?

(Lest this be thought an academic exercise, the Point class in the player is very similar to this example: http://help.adobe.com/en_US/AS3LCR/Flash_10.0/flash/geom/Point.html.  It only has x and y fields of type Number (not int),  it has some methods, and it has a constructor whose default argument values are 0 and 0 - basically a trivial constructor though it's harder to see that.)

The jitted code apparently calls finddef_cache to get the class, then newInstance to create an instance, then calls the constructor, which apparently calls the base class constructor.

This is the ABC for the Point constructor by asc -optimize:

  function foo::Point():*	/* disp_id -1*/
  {
    // local_count=1 max_scope=1 max_stack=1 code_len=6
    0       getlocal0     	
    1       pushscope     	
    2       getlocal0     	
    3       constructsuper	(0)
    5       returnvoid    	
  }

Judging from disassembly (I'm still a novice) this is compiled pretty straight:

             push ebp              
             mov ebp,esp           
             sub esp,56            
             mov -28(ebp),ebx
             mov -44(ebp),esi
             mov ecx,8(ebp)        
             mov eax,16(ebp)
             lea edx,-40(ebp)
             mov   esi,0(513494)
             mov -36(ebp),ecx
             mov -40(ebp),esi
             mov 5321876(0),edx
             mov -32(ebp),-55903873
             mov   esi,0(513488)
             cmp edx,esi
             mov esi,-44(ebp)
             jnb   0x618f0a
             mov ebx,8(ebp)        
             mov ecx,ebx
             call handleStackOverfl
             mov ecx,ebx
             mov ebx,-28(ebp)
             mov eax,16(ebp)
0000618f0a
             mov eax,0(eax)
             or eax,1
             mov ecx,8(ecx)
             mov ecx,0(ecx)
             mov ecx,8(ecx)
             mov ecx,4(ecx)
             lea edx,-28(ebp)
             mov -28(ebp),eax
             mov eax,0(ecx)
             sub esp,4
             push edx
             push 0
             push ecx
             call eax              
             add esp,16            
             mov eax,4             
             mov ecx,-40(ebp)
             mov 5321876(0),ecx
             mov esp,ebp           
             pop ebp               
             ret                   

That is, 38 dynamic instructions, one of them a call, none of which are technically needed since the constructor is trivial.

This is the constructor for 'Object':

  function Object():*	/* disp_id -1*/
  {
    // local_count=1 max_scope=1 max_stack=1 code_len=3
    0       getlocal0     	
    1       pushscope     	
    2       returnvoid    	
  }

I haven't looked at the disassembly for this but I assume there's at least some boilerplate stack overflow checking and so on (TBD).
One would think that a supercall to Object() could *always* be safely eliminated...
(In reply to comment #12)
> Created an attachment (id=414861) [details]
> Remove the 'gcInterface' member from AvmCore
> 
> The 'gcInterface' member is not used either in the shell or in the Player, but
> it adds one callback object to the GC's list of callback objects.

... and that callback is actually used in the vm core for pool management, so we can't delete gcInterface that easily.

Makes it more attractive to split the single callback into multiple callbacks.
This is another sketch; there may be corner cases here I haven't thought much about (when 'super' passes arguments, for example).  This doesn't make a lot of difference (< 1% speedup) but it's consistently measurable.
Data point: the time in stock TR (ie without any allocator optimizations) is 1700ms on the same machine.
(In reply to comment #11)
> 
> newInstance also contains a check for whether 'prototype' is NULL, which is
> presumably almost never true - it's wasted work to an arbitrary number of
> decimal places.  It's particularly annoying since the indirection does not
> serve to create anything lazily, just to fetch a constant value (using a
> non-constant computation.)  
> 
> Part of the problem is that the prototype can be set by user code, so even if
> it wasn't NULL it can become NULL.  That setting is always via an accessor so
> it can be intercepted, but there are other complications: ...

The most credible fix for this problem is as follows:

- Make ClassClosure::prototype private, hide it behind a getter/setter pair
  prototypePtr / setPrototypePtr.  This causes a fair amount of pain as every
  native subclass uses the prototype field during construction.  Strike one
  for C++ which does not have getter/setter syntax.  But the pain is not deep.
  (Be sure to change even the update from the set_prototype setter to go
  through setPrototypePtr.)

- Move the handling of a NULL prototype value to the ivtable->createInstance
  function.  The generic case can always check whether the prototype needs to
  be updated from NULL to something more reasonable.  The fast case never needs
  to check provided setPrototypePtr snaps the createInstance pointer back to
  the generic case again when a NULL value is stored.

The benefit is that there is more or less provably no change in semantics: the change filters out redundant NULL checks but every non-redundant NULL check will happen exactly at the time it currently happens, and will have exactly the same effects.
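Roughly, the shape would be something like this (a sketch only - prototypePtr/setPrototypePtr are the names proposed above; genericCreateInstance, fastCreateInstance, computeDefaultPrototype and the writable createInstance slot are illustrative, and write barriers are elided):

  class ClassClosure : public ScriptObject
  {
      ScriptObject* prototype;    // now private; write-barrier details elided in this sketch
  public:
      REALLY_INLINE ScriptObject* prototypePtr() const { return prototype; }

      void setPrototypePtr(ScriptObject* p)
      {
          prototype = p;
          if (p == NULL)
          {
              // Snap the instance-creation path back to the generic case, which will
              // repair the NULL prototype the next time an instance is created.
              ivtable()->createInstance = genericCreateInstance;
          }
      }
      // ...
  };

  // The generic case does the NULL check exactly where it happens today, then
  // re-installs the fast path, which never checks.
  static ScriptObject* genericCreateInstance(ClassClosure* cls, VTable* ivtable, ScriptObject* prototype)
  {
      if (prototype == NULL)
      {
          cls->setPrototypePtr(cls->computeDefaultPrototype());   // illustrative helper
          prototype = cls->prototypePtr();
      }
      ivtable->createInstance = fastCreateInstance;
      return fastCreateInstance(cls, ivtable, prototype);
  }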
Comment on attachment 415107 [details] [diff] [review]
Don't call the Object constructor from a subclass's constructor

Testing shows that this introduces regressions, e.g. in as3/Definitions/Classes/ClassDef/Bug118272Package.abc
Fix embarrassing bug.
Attachment #414834 - Attachment is obsolete: true
Comment on attachment 414861 [details] [diff] [review]
Remove the 'gcInterface' member from AvmCore

Unsound idea.
Attachment #414861 - Attachment is obsolete: true
The jit is only able to early-bind and invoke a constructor method when we know that the class doesn't require custom logic prior to the constructor invocation.  The flag that controls this has been found to be flaky (bug 530222).

If, while exploring, we see an opportunity to restructure the object construction protocol to avoid the need for the flag, and let the jit unconditionally invoke the constructor, then we'll get a nice win for those classes that require customized ScriptObject::construct() logic.
(In reply to comment #19)
> 
> The most credible fix for this problem is as follows:

To my relative astonishment this appears to work and yields about 2% speedup on this benchmark.  Patches coming.
Attached patch Hide the 'prototype' field. (obsolete) — Splinter Review
Obviously a biggish patch for the avmglue will also be needed before this one can land, but this is good API discipline in any case.
Attached patch Some tweaks to the ZCT code (obsolete) — Splinter Review
Drive-by tuning, needs further vetting but the FreeNotNull tweak was a visible win.
In general I also question the need for the 'composite == 0' invariant in the reference counting code.  It's of dubious value now that the object index is stored in that word when the object is on the free list - no asserts seem to have fired anyway - and there are tests in the RC code that check it, yet no ill effects that I have seen have so far come from those tests failing.  In general it just inserts work everywhere.  We should revisit it.
A bit rough this one - contains reworkings of parts of earlier patches - but passes acceptance tests.  The trick is as outlined earlier: whenever the prototype property is updated we snap the createInstance pointer back to the general case, which will then compute a new prototype value as appropriate.
Attached patch Refactor GC::FreeNotNull (obsolete) — Splinter Review
This refactors GC::FreeNotNull by moving most of the logic into GCAlloc::Free and GCLargeAlloc::Free; the way it does that is to create a GCAllocBase class that the two allocator types derive from, with a virtual Free() method.  FreeNotNull dispatches by grabbing the allocator out of the block header and making a virtual call.

This yields a healthy 7% speedup on the Mac Pro.  The savings come from several sources.  FreeNotNull is now inlinable so we avoid the call overhead.  The inlined virtual call is correctly predicted pretty much always.  But most importantly, gcc did not generate good code for FreeNotNull because it had several uncommon-case branches - as a result the procedure prologue was very heavyweight, for something that was supposed to be a quick dispatch.  Now it /is/ a quick dispatch; the complexity has been absorbed into GCAlloc::Free and GCLargeAlloc::Free, where it does much less damage.  Finally, the really-uncommon cases were factored out as a separate bailout function in GC to reduce compiler confusion further.
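For reference, the shape of the refactoring is roughly this (a sketch; the block-header lookup and names below are illustrative, not the patch itself):

  class GCAllocBase
  {
  public:
      virtual ~GCAllocBase() {}
      virtual void Free(const void* item) = 0;
  };

  // GCAlloc and GCLargeAlloc both derive from GCAllocBase and implement Free(),
  // absorbing all the uncommon-case branches that used to live in FreeNotNull.

  REALLY_INLINE void GC::FreeNotNull(const void* item)
  {
      GCAssert(item != NULL);
      // The owning allocator is recorded in the block header; one well-predicted
      // virtual call dispatches to GCAlloc::Free or GCLargeAlloc::Free.
      GCAllocBase* alloc = GetBlockHeader(item)->alloc;   // illustrative lookup
      alloc->Free(item);
  }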
With all those changes we've seen a 36% reduction in running time (relative to TR) and the most-wanted list now looks like this:

MethodEnv::coerceUnboxEnter       19.2%
ScriptObject::fastCreateInstance  14.4%
GCAlloc::Alloc                    13.4%
ZCT::Reap                         10.2%
ScriptObject::~ScriptObject        8.5%
GCAlloc::Free                      8.2%
Traits::destroyInstance            7.9%
finddef_cache                      7.7%
GC::WriteBarrierRC_ctor            3.8%
GC::WriteBarrierRC_dtor            2.7%
ZCT::PinStackObjects               2.0%
GCCallback::prereap(obj)           1.3%

fastCreateInstance inlines newObject which appears to have inlined ScriptObject::ScriptObject, which in turn inlines RCObject::RCObject, which inlines logic to add the object to the ZCT - so the time spent here is not surprising, and some should be billed to the GC.

ScriptObject::~ScriptObject inlines RCObject::~RCObject, which inlines logic to remove the object from the ZCT - again, some of this time should be billed to the GC.

At this point the profile shows mutator/GC ~60/~40 though in truth it's probably closer to 50/50 when all the inlining has been accounted for.
This program is similar to the one in attachment #414742 [details], except that a pointer to each new object is stored in an instance variable, not a local variable.  Ergo each object's reference count is incremented and then decremented when the next object is stored.  The profile is similar to the first program, but the RC write barrier has taken the number two spot with 15% of the running time.  This program runs in 1345ms compared to 1090ms for the first program.
Depends on: 482787
Recently it occurred to me that all instances of ordinary classes share the same value of ScriptObject::delegate, and therefore that value could be stashed somewhere else (VTable).  The only way SO::delegate can vary from instance to instance is for instances of Object created with a Function constructor.

Saving one word per object will be interesting, eventually, but this could also reduce work at object instantiation time.
Edwin notes elsewhere about how the JIT code is distributed (this is TR): 

"With an AS3-symbol-instrumented shark run, I get 6.5% self time in Point(),
5.9% in allocate(), 1.3% in Object(), negligible everywhere else.  If I
flatten, it all attributes to allocate(), except for a few stray samples
attributed to Point but without allocate on the stack.  (i blame shark for
fuzzy noise like that)"

Comparatively that's a lot of time in allocate(), but if it has to call
finddef_cache, then call newInstance, then call the constructor, I can see it,
depending on how well those calls are done.

Ed again: "[Calls] are not superb, so the relative time in Point vs Allocate
doesn't surprise me.  Improvements to finddef, the as3 jit ABI, and general jit
code quality should uniformly help."
In reference to comment #30, can we optimize the ZCT further?

Some observations:

- It's always valid to evict members of the ZCT.  The ZCT helps to make memory available sooner, but it's just an optimization (though an important one).

- Objects must be inserted in the ZCT when they are created (with reference count 0), and must be removed when they become referenced by other objects (and their reference counts go to 1)

- Objects in the ZCT have a "ZCT index" that indicates their place in the ZCT, so that an explicit deletion of an object in the ZCT can remove it quickly.

- Every RCObject is allocated via GCAlloc::Alloc or GCLargeAlloc::Alloc, so anything that goes on in the RCObject constructor can move into the allocators if we so desire.

There is a fair amount of logic in the RCObject constructor to insert the object into the ZCT, call a function on overflow, and compute and insert the ZCT index.  The code is large and branchy in practice - it's been tuned, but it's still complex.

Two optimizations suggest themselves:

- One optimization is that the ZCT does not have to be precise; it can be a hash table indexed by the object pointer.  Collisions would be handled by simple overwriting.  Whether an object is in the ZCT or not becomes a lookup in the hash table rather than a check of the object's header.  The ZCT index field - and its maintenance - goes away.  Insertions into the ZCT and removals from the ZCT are simple pointer writes, no decision points, no overflow checks.

- Another optimization is that CodegenLIR generates better code when an object is constructed that is likely to become referenced by another object; this completely removes the ZCT insertion/removal cost.  Remember, it's always correct not to insert an object in the ZCT, so it's OK if the JIT guesses wrong from time to time, or if exceptions or similar effects invalidate a correct analysis.

In a pattern like

   construct
   ...
   putproperty

the 'construct' call would create the object without adding it to the ZCT, instead setting its reference count to 1, and putproperty would store it using a cheaper write barrier, like WriteBarrierRC_dtor (because only the existing content in the slot is affected by the barrier; the new object already has a correct reference count).

What are the dangers of these optimizations?  We risk that large object structures that would have been freed by RC will no longer be so, they will require GC.  This is so because pointers to their roots (or significant elements) will have been evicted from the ZCT by an overwrite or not entered into the ZCT by an incorrect analysis.  We can combat these dangers by using a larger hash table, a better hash function, and a better analysis in the JIT.
(In reply to comment #34)
> In reference to comment #30, can we optimize the ZCT further?

The Smalltalk "Green Book" notes that RC logic becomes much simpler if the RC field is so large that it won't overflow.  That's a trivial observation perhaps, but it applies to us: if we don't need a ZCT index because the ZCT is a hash table indexed by the pointer value, then the ZCT index field's space as well as the in-ZCT bit now in 'composite' can be used for the reference count.  Since we only need one bit to denote pinning, we can devote 31 bits to the reference count.  Since the smallest object is 8 bytes, we only need 29 bits for a reference count that's guaranteed never to overflow.  We currently have code that depends on being able to 'stick' an RCObject, but (a) I suspect that code is dodgy and (b) if we really need it there's no problem using a reference count that can never reach zero - it has the same effect.
Comment on attachment 415359 [details] [diff] [review]
Refactor GC::FreeNotNull

Moved to bug #528338.
Attachment #415359 - Attachment is obsolete: true
Comment on attachment 415162 [details] [diff] [review]
Some tweaks to the ZCT code

The FreeNotNull tweak landed with bug #528338, the removal of the local variables needs investigation on multiple systems before it's trustworthy.
Priority: -- → P3
Target Milestone: --- → flash10.2
Rebased to redux 3530.
Attachment #414835 - Attachment is obsolete: true
Attachment #421770 - Flags: review?(edwsmith)
Comment on attachment 415162 [details] [diff] [review]
Some tweaks to the ZCT code

The FreeNotNull change landed elsewhere; the rest are probably machine-specific and not worth changing.
Attachment #415162 - Attachment is obsolete: true
(In reply to comment #10)
> Created an attachment (id=414845) [details]
> Short-circuit createInstance

The method Traits::hasCustomConstruct is the canonical way to determine whether the base case constructor is overridden, and we should probably just use that.  See bug #530222 for a discussion of problems with that method, however.
Attached patch Short-circuit createInstance (obsolete) — Splinter Review
Rebased to redux 3530; more documentation.  FP builtins vetted to ensure they obey the restriction now documented in ScriptObject.h.

The condition I'm computing here appears to be different from the condition implied by hasCustomConstruct, which is why I'm not reusing that.
Attachment #414845 - Attachment is obsolete: true
Attachment #421797 - Flags: review?(stejohns)
Attachment #421797 - Flags: review?(edwsmith)
Attached patch Hide the 'prototype' field (obsolete) — Splinter Review
Rebased to redux 3530.  This is a non-functional change creating controlled access to the 'prototype' field.  The patch sets us up for the next one, where setPrototypePtr will do more work (off the hot path of object creation).
Attachment #415159 - Attachment is obsolete: true
Attachment #421800 - Flags: review?(stejohns)
Attachment #421800 - Flags: review?(edwsmith)
Attachment #421770 - Flags: review?(edwsmith) → review+
Comment on attachment 421797 [details] [diff] [review]
Short-circuit createInstance

Assuming the invariant is true that the base case is always hit or never hit, this looks right.

It might be possible to use the function pointer alone and no baseCase flag, if we didn't mind genericCreateInstance() being called repeatedly for the non-optimized case.  (fast path gets faster, slow path gets slower).  meh.
Attachment #421797 - Flags: review?(edwsmith) → review+
(In reply to comment #43)

> It might be possible to use the function pointer alone and no baseCase flag, if
> we didn't mind genericCreateInstance() being called repeatedly for the
> non-optimized case.  (fast path gets faster, slow path gets slower).  meh.

I don't understand how that would make the fast path faster.
Attachment #421800 - Flags: review?(edwsmith) → review+
Eliminating the baseCase flag doesn't make the fast path faster; I was referring to the overall change.  Moot point since the new bool uses padding space anyway.
Rebased to redux 3530 + the above three patches.

Avoids calling newInstance at all by moving the check for the validity of the prototype off the hot path.  When the prototype is set to NULL by user code the original createInstance trampoline is snapped back in and computes a proper value.
Attachment #415163 - Attachment is obsolete: true
Attachment #421812 - Flags: review?(stejohns)
Attachment #421812 - Flags: review?(edwsmith)
Whiteboard: Has patch
Comment on attachment 421812 [details] [diff] [review]
Make the jit call ivtable->createInstance directly

Looks right.
Attachment #421812 - Flags: review?(edwsmith) → review+
Attached patch Clean up newInstance (obsolete) — Splinter Review
newInstance is no longer needed by the JIT, so remove the defs.

newInstance and ivtable are small enough to inline, and there are other inline functions in ClassClosure, so introduce ClassClosure-inlines.h.
Attachment #421815 - Flags: review?(edwsmith)
Attachment #421815 - Flags: review?(edwsmith) → review+
Comment on attachment 421797 [details] [diff] [review]
Short-circuit createInstance

This is a nice change, but I suspect we could do even better, as a given class shouldn't ever be able to switch this behavior at runtime: i.e., we should (in theory) be able to decide at compile time which to use for a given class, shouldn't we?
Attachment #421797 - Flags: review?(stejohns) → review+
Attachment #421800 - Flags: review?(stejohns) → review+
Comment on attachment 421812 [details] [diff] [review]
Make the jit call ivtable->createInstance directly

r+, but IMHO having both "genericCreateInstance" and "generalCreateInstance" is... confusing. Surely better names are available?
Attachment #421812 - Flags: review?(stejohns) → review+
Comment on attachment 421770 [details] [diff] [review]
Make AvmCore::newObject REALLY_INLINE

redux changeset:   3574:8390bcf5f5ea
Attachment #421770 - Attachment is obsolete: true
Comment on attachment 421797 [details] [diff] [review]
Short-circuit createInstance

redux changeset:   3574:8390bcf5f5ea
Attachment #421797 - Attachment is obsolete: true
Comment on attachment 421800 [details] [diff] [review]
Hide the 'prototype' field

redux changeset:   3574:8390bcf5f5ea
Attachment #421800 - Attachment is obsolete: true
Comment on attachment 421812 [details] [diff] [review]
Make the jit call ivtable->createInstance directly

redux changeset:   3574:8390bcf5f5ea
Attachment #421812 - Attachment is obsolete: true
Comment on attachment 421815 [details] [diff] [review]
Clean up newInstance

redux changeset:   3574:8390bcf5f5ea
Attachment #421815 - Attachment is obsolete: true
(In reply to comment #50)
> (From update of attachment 421812 [details] [diff] [review])
> r+, but IMHO having both "genericCreateInstance" and "generalCreateInstance"
> is... confusing. Surely better names are available?

Agreed.  Renamed the latter as "slowCreateInstance" to contrast with "fastCreateInstance".  redux changeset:   3574:8390bcf5f5ea.
(In reply to comment #1)
> nextPowerOfTwo is a proper function and that function has a loop (which
> probably terminates reasonably quickly since most arguments are small, but
> still).

In particular, that loop might be made faster; it currently does one left-shift by 1 per iteration, so roughly log2(n) iterations; it could instead do a fixed handful of shift-and-or steps of increasing magnitude (Wikipedia has the pseudocode for this).  If most arguments are small, as you said, then that might be a bad trade-off... but it sounds like a pretty simple thing to try out.
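For reference, the constant-work version looks like this (a sketch; whether the current MathUtils::nextPowerOfTwo rounds an exact power of two up or keeps it, and what it does for an argument of 0, would need to be matched):

  static inline uint32_t nextPowerOfTwo(uint32_t n)
  {
      // "smear" the highest set bit into all lower positions, then add 1;
      // five shift-or steps for a 32-bit argument, no loop.
      n -= 1;
      n |= n >> 1;
      n |= n >> 2;
      n |= n >> 4;
      n |= n >> 8;
      n |= n >> 16;
      return n + 1;     // note: returns 0 for n == 0, so that case may need special handling
  }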
Summary: [Speed] Tune ScriptObject initialization → Tune ScriptObject initialization
Whiteboard: Has patch → PACMAN, Has patch
Whiteboard: PACMAN, Has patch → PACMAN
This bug is still open, and marked for PACMAN, but it's not clear which (specific) items are being targeted, nor when we'll consider it "done". I'm guessing the items in comments 34 and 35 are worthy of exploring; are there other items on the try-it-out list?
(In reply to comment #58)
> This bug is still open, and marked for PACMAN, but it's not clear which
> (specific) items are being targeted, nor when we'll consider it "done". I'm
> guessing the items in comments 34 and 35 are worthy of exploring; are there
> other items on the try-it-out list?

Useless calls to useless constructors - we want to optimize that if we can, as it accounts for 50% of the running time of the benchmark, function calls to empty functions being way expensive.

See comment #14 and comment #17 (at least).

Reference counting overhead is being addressed by some MMgc projects, probably not worth focusing on those here.

Feel free to open more targeted bugs for items in this bug, so that we can close it - 60 comments is more than enough.
Continued into https://bugzilla.mozilla.org/show_bug.cgi?id=561140

Closing this one.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reopening this as a tracking bug
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: PACMAN → PACMAN Tracking
Status: REOPENED → ASSIGNED
Depends on: 561140
Priority: P3 → --
Target Milestone: flash10.2 → Future
(In reply to comment #7)
> A significant fraction of the time in ScriptObject::ScriptObject and
> ScriptObject::~ScriptObject - say 20% - is for adding the pointer of the new
> reference to the ZCT and removing it again.

A lot of the time the ZCT operates as a LIFO: an object is born into it and more or less immediately removed.

So I'm wondering whether, in Remove(), when you're removing the last entry you could just decrement top.  That might be a tiny bit faster in itself, but more importantly the ZCT will grow a lot more slowly since the top entry will tend to be used over and over again.

(Yeah, it might be a bit fiddly since "top==limit" operates as a flag.)
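Something like this, assuming an array-based ZCT with an integer 'top' cursor and per-object index accessors (all of which are illustrative names, not the real code):

  void ZCT::Remove(RCObject* obj)
  {
      uint32_t index = obj->getZCTIndex();   // hypothetical accessor for the per-object index
      if (index == top - 1)
      {
          // LIFO case: this is the most recently added entry, so reuse its slot
          // instead of leaving a hole - the table stops growing for born-and-die
          // allocation patterns like the benchmark's.
          top--;
      }
      else
      {
          entries[index] = NULL;             // usual case: leave a hole for Reap to skip
      }
      obj->clearZCTIndex();                  // hypothetical: clear the in-ZCT state
  }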
Comment on attachment 415107 [details] [diff] [review]
Don't call the Object constructor from a subclass's constructor

This has been handled in bug #561140.
Attachment #415107 - Attachment is obsolete: true
Depends on: 657135
Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Flags: flashplayer-qrb+
Depends on: 645673
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 14 years ago → 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.