Closed Bug 519938 Opened 15 years ago Closed 13 years ago

optimize calls to prototype methods on final classes

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

Tracking

(Not tracked)

VERIFIED WONTFIX
Q3 11 - Serrano

People

(Reporter: edwsmith, Unassigned)

References

Details

(Whiteboard: PACMAN)

Attachments

(2 files, 3 obsolete files)

if the receiver is, say, String, and we don't find a binding a compile time, then we currently do a late-bound call.  this will use a cache, and do dynamic lookups on the requested property.

Since we know the type is String, and String is final, we'll *always* do the dynamic lookup, there is no need for the BindingCache.
Priority: -- → P5
Target Milestone: --- → flash10.1
OS: Mac OS X → All
Hardware: x86 → All
Target Milestone: flash10.1 → flash10.2
Priority: P5 → P3
Whiteboard: PACMAN
Component: Virtual Machine → JIT Compiler (NanoJIT)
Severity: normal → minor
Is it possible to discover whether the class is final inside AVM ? Methods have a final attribute. Whether classes have a similar one?
According to the spec the CONSTANT_ClassFinal attribute should be set on the instance_info structure in the ABC if the class is final.
Also, the Traits.final field indicates whether a type is final or not.
Are there any existing benchmarks that can be reused for profiling a fix for this bug? Else, if we are writing a benchmark what metrics is of interest here? Should we consider speed-up in method calls for non-final classes or binding cache size reduction?
You may be able to use e.g. test/performance/jsmicro/string-charCodeAt-1.js, which measures run-time speed.

This bug appears to be pretty specific.  We're just looking to avoid pointless overhead in the case of unknown methods on final classes, I think.  In the case of non-final classes an optimization based on shape trees or hidden classes would be more appropriate.
(In reply to comment #4)

Right.  When I was adding the binding cache and studying stats, I noticed that if a call cache resolves to callprop_obj_none(), and the base type is known and final, then that method will always be used since the name and base type never change.

> Should we consider speed-up in method calls for non-final classes or binding
> cache size reduction?

The potential speedup is what we're after and String is probably the most interesting case.

Another more general hypothesis is that if the base type is final, then for any binding cache (get/set/call), the cache will always resolve to the same accessor.  (right?  any second opinion?)
Attached patch RFC: Tentative patch (obsolete) — Splinter Review
Patch to avoid using binding cache for final class prototype properties (get/set/call). Haven't run through any performance tests, need feedback on correctness of patch.
Attachment #479043 - Flags: feedback?(edwsmith)
Comment on attachment 479043 [details] [diff] [review]
RFC: Tentative patch

This patch saves the memory overhead of the binding cache, but does not optimize along the lines suggested by this bug, because if we're calling an unknown property on a final class, the JIT ends up generating generic code with no optimization, that calls callprop_late(), which repeats the name lookup that we know will fail.

The resulting control flow (if optimized) should be equivalent to calling call_obj_dynamic() directly without any intervening name lookups or cache validity checks.
Attachment #479043 - Flags: feedback?(edwsmith) → feedback-
Added callprop_final method in jit-calls.h which invokes call_obj_dynamic or call_prim_dynamic depending on the final class receiver's type.
Attachment #479043 - Attachment is obsolete: true
Attachment #480038 - Flags: feedback?(edwsmith)
Fixed upload error.
Attachment #480038 - Attachment is obsolete: true
Attachment #480039 - Flags: feedback?(edwsmith)
Attachment #480038 - Flags: feedback?(edwsmith)
Looks better.  The new function, callprop_final, needs comments explaining what its for.  (despite lack of comments in many areas, we are trying to improve the code base as we go).  And the next step is to get some preliminary benchmark numbers to see if we can measure the effect.  If nothing in test/performance/asmicro already stress-tests this code path, then you might need something new.
Attachment #480039 - Flags: feedback?(edwsmith) → feedback+
Added the function callprop_prim_final and callprop_obj_final in jit-calls to perform dynamic lookup for prototype methods of final classes without using the binding cache.
Attachment #480039 - Attachment is obsolete: true
Attachment #481208 - Flags: feedback?(edwsmith)
I did not observe any significant improvement by directly performing dynamic lookup in these benchmarks. Planning to write some specific benchmarks for this code flow to see if there will be any gains.
Comment on attachment 481208 [details] [diff] [review]
Eliminates binding cache for final class prototype methods

re: edwin, delegating feedback to rreitmai
Attachment #481208 - Flags: feedback?(edwsmith) → feedback?(rreitmai)
Is this supposed to speed things up or maybe just save memory?  With the patch, we're just skipping the "PRIM_HIT" check?

    Atom callprop_prim_none(CallCache& c, Atom prim, int argc, Atom* args, MethodEnv* env)
    {
        PROF_IF ("callprop_prim_none hit", PRIM_HIT(prim, c)) {
            // cache hit since all prims are final
            // possible speedup: cached the prototype object itself, or at least
            // get the prototype object using a table lookup with tag as the index
            return call_prim_dynamic(env, prim, c.name, argc, args);
        }
        return callprop_miss(c, prim, argc, args, env);
    }


My tests with the patch show no speedup at all.  I used the following micro benchmark:

package {
	var d1:Number, d2:Number;

	function testC(s2:String):int
	{
		var c:String = 0;
		for (var i:int = 1; i < 100000000; i++)
		{
			c = s2.charAt(0);
		}
		return c;
	}
	
    d1 = getTimer();
	(testC("abc"));
    d2 = getTimer();
    trace("String.charAt elapsed time is " + (d2-d1));
  }

I compiled it specifically without the AS3 flag

java -jar asc.jar -import tamarin-redux/core/builtin.abc -config CONFIG::desktop=true test.as

VTUNE shows us going through callprop_prim_none or callprop_prim_final with the patch.
We did not get significant speed up when we ran it through asmicro/string* benchmarks (Attachment 481209 [details]). With this patch we skip the PRIM_HIT check and also usage of binding cache. Although we expected speed-up, we did not get any. Is it worth implementing hidden classes to optimize prototype functions?
Since we aren't seeing any speedup, I move to resolve this bug as "wontfix".

Implementing hidden classes is out of scope for this bug, but is still a potential way to speed up dynamic property access in general, for javascript prototype-style code.

For the short term it is more important to focus on speeding up AS3 code in general (where hidden classes probably doesn't help) and Vector in particular.
Comment on attachment 481208 [details] [diff] [review]
Eliminates binding cache for final class prototype methods

Remove self from feedback request, in light of above results.
Attachment #481208 - Flags: feedback?(rreitmai)
per comment #17, marking resolved/wontfix
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
bulk verifying resolved !fixed issues
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: