This elaborates on an idea for benchmark tracking. instead of having the test print a performance value, let the VM do it. (analogous to using a profiler instead of instrumenting code). We already do this for memory, and it makes sense to do it for performance. A simple counter that is incremented at coarse control flow boundaries would be independent of execution mechanism and optimization level. Combined with a vm-level timer instead of code-level. Tests would still need to be somewhat deterministic, but no longer need to measure themselves.
Created attachment 448087 [details] [diff] [review] add "blops" metric First prototype. keeps two counters: call_count loop_count call count is incremented once per call, in the prolog. this includes native methods. loop_count is incremented: * once per branch, if the branch direction is backwards, whether or not it is taken. * once per OP_lookupswitch, regardless of case or direction the VM reports "blops" (blocks per second) as a loose measure, computed as: (call_count + loop_count) / ms / 10.0 It is simpler for the interpreter to increment loop_count at the point of the branch rather than the loop header, since loop headers are not identified in ABC. Its simpler for the JIT to increment loop_count unconditionally just before a conditional back-branch. For switch, its simpler to just increment loop_count always. Its expensive for the jit to add increment code to the subset of edges that are backwards (code on edge? table lookup?). in testing, call_count isn't identical for jit vs interp -- i dont know why yet. I suspect inlined cast expressions: T(expr). the patch also modifies runtest.py to use "metric blops" if found. indeed, we can now run sunspider and v8 side by side and get beleivable higher-is-better scores, while ignoring all test output and only using vm-generated metrics. issues - i'm measuring about a 3% counter overhead cost on boids.as. that's higher than i'd like, but in principle we can live with it. - need to debug call_count. - looping inside native method is not accounted for - there is no tie-in with vprof or dtrace, and there probably should be (for perf and gc, arguably), or even a simple universal telemetry mechanism. (channelling Lars here...) design question - should we just count loops and calls and do all the analysis after the fact (outside the vm?). on the one hand, i LIKE having the vm generate a single # as input to various scripts. on the other hand, we'll get more and more metrics over time, and we'll be doing post-game analysis anyway.
Assignee: nobody → edwsmith
Status: NEW → ASSIGNED
Attachment #448087 - Flags: feedback?(lhansen)
Comment on attachment 448087 [details] [diff] [review] add "blops" metric I like it, and I believe in what you write. I think it's most useful to have the shell print a single metric; if there's need for something else add a command line switch to select a different format. All of this needs to be conditionalized, obviously - 3% is too much overhead for production runs in the Flash Player. For the GC, there's MMGC_POLICY_PROFILING which is on by default in the shell and off by default in the player (5% overhead).
Attachment #448087 - Flags: feedback?(lhansen) → feedback+
I think we dont want to count builtin code, for two reasons: 1. In ad-hoc testing I find I can reduce the counter overhead significantly by not counting in native methods. Probably this is because it is expensive to access AvmCore. Maybe we should use a static counter instead of one on AvmCore, but read on: 2. For native methods, we can only get the call count, not the loop count. For builtin AS3, we can get both, but if we change builtin code from C++ to AS3 or back, we will affect the metric and we probably don't want to. e.g. by making changes to AS3.push().
Created attachment 448535 [details] [diff] [review] refined as "metric steps" Changes since last patch * conditionalized * merged call_count and loop_count as "step_count" * count backedges whether taken or not * count all OP_lookupswitch (too messy to worry about which path taken) * do not count builtin code (as3 or native) * rebased to TR tip these changes reduced metric overhead in boids to within noise. open issues: - print metric to stdout? - need commandline switch to enable metric? (should all vm metrics be under one switch?) - use static mem isnstead of AvmCore? (one metric for whole process, or one per vm instance?)
Attachment #448087 - Attachment is obsolete: true
Nice, and I'm starting to wonder if it makes sense look at a more dynamic/comprehensive means of generating this data. We'll probably need something similar in future as input for jit policy decisions and having a generic mechansim that handles both use-cases would be nice. Although you could argue that the effort/engineering required for the jit policy effort is much higher and suitably involved that having this patch available now is more valuable.
(In reply to comment #5) > Although you could argue that the effort/engineering required for the jit > policy effort is much higher and suitably involved that having this patch > available now is more valuable. exactly :-) If we end up adding more counters and a general mechanism seems worthwhile, i'm all for it. not a blocker tho.
Whiteboard: has-patch → has-patch, must-fix-candidate
Target Milestone: Q3 11 - Serrano → Q4 11 - Anza
This still seems like a useful idea and measurement. Perhaps the blops measurement simply turns into a Telemetry measurement.
Retargeting to future as this has lingered fo so long. Ed, retarget if you feel it is needed for HM.
Priority: P1 → --
Target Milestone: Q4 11 - Anza → Future
You need to log in before you can comment on or make changes to this bug.