Closed Bug 568933 Opened 14 years ago Closed 6 years ago

Add coarse performance counters to VM for benchmarking

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

Milestone:

Future

People

(Reporter: edwsmith, Unassigned)

References

Details

(Whiteboard: has-patch)

Attachments

(1 file, 1 obsolete file)

add "blops" metric, 14 years ago, Edwin Smith, 118.42 KB, patch, lhansen: feedback+
refined as "metric steps", 14 years ago, Edwin Smith, 11.02 KB, patch

Edwin Smith

Reporter

Description

14 years ago

This elaborates on an idea for benchmark tracking.  Instead of having the test print a performance value, let the VM do it.  (analogous to using a profiler instead of instrumenting code).

We already do this for memory, and it makes sense to do it for performance.

A simple counter that is incremented at coarse control flow boundaries would be independent of execution mechanism and optimization level.  Combined with a vm-level timer instead of a code-level one, tests would still need to be somewhat deterministic, but would no longer need to measure themselves.

Edwin Smith

Reporter

Updated

14 years ago

Priority: -- → P1

Target Milestone: --- → flash10.2

Edwin Smith

Reporter

Comment 1

14 years ago

Attached patch: add "blops" metric (obsolete)

First prototype. keeps two counters:
call_count
loop_count

call count is incremented once per call, in the prolog. this includes native methods.

loop_count is incremented:
* once per branch, if the branch direction is backwards, whether or not it is taken.
* once per OP_lookupswitch, regardless of case or direction

the VM reports "blops" (blocks per second) as a loose measure, computed as:
(call_count + loop_count) / ms / 10.0
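
As a worked example of the formula above, here is a minimal sketch in Python. The function name `blops` and the sample numbers are made up for illustration; they are not taken from the patch.

```python
def blops(call_count, loop_count, ms):
    """Loose throughput score: (call_count + loop_count) / ms / 10.0."""
    if ms <= 0:
        raise ValueError("elapsed time must be positive")
    return (call_count + loop_count) / ms / 10.0


# Hypothetical run: 1.2M calls and 3.8M loop steps in 500 ms.
print(blops(1_200_000, 3_800_000, 500))  # prints 1000.0
```

Higher is better: the same counts in half the time double the score.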

It is simpler for the interpreter to increment loop_count at the point of the branch rather than the loop header, since loop headers are not identified in ABC. It's simpler for the JIT to increment loop_count unconditionally just before a conditional back-branch.

For switch, it's simpler to just increment loop_count always. It's expensive for the jit to add increment code to only the subset of edges that are backwards (code on edge? table lookup?).
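
The counting rules above can be sketched with a toy interpreter. This is only an illustration in Python: the instruction set (`enter`, `dec`, `brnz`, `switch`, `halt`), the `Counters` class and the `run` function are all invented here and do not correspond to real ABC opcodes or to the patch.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    call_count: int = 0
    loop_count: int = 0


def run(code, acc, counters):
    """Execute a toy program with a single integer accumulator.

    Hypothetical instructions:
      ("enter",)           method prolog
      ("dec",)             acc -= 1
      ("brnz", target)     branch to target if acc != 0
      ("switch", targets)  jump to targets[acc], clamped to the table
      ("halt",)            stop and return acc
    """
    pc = 0
    while True:
        op = code[pc]
        kind = op[0]
        if kind == "enter":
            counters.call_count += 1      # once per call, in the prolog
            pc += 1
        elif kind == "dec":
            acc -= 1
            pc += 1
        elif kind == "brnz":
            target = op[1]
            if target <= pc:              # backward direction:
                counters.loop_count += 1  # counted whether or not taken
            pc = target if acc != 0 else pc + 1
        elif kind == "switch":
            counters.loop_count += 1      # always, regardless of case
            targets = op[1]
            pc = targets[max(0, min(acc, len(targets) - 1))]
        elif kind == "halt":
            return acc
        else:
            raise ValueError(f"unknown instruction {kind!r}")


counters = Counters()
program = [("enter",), ("dec",), ("brnz", 1), ("halt",)]
run(program, 3, counters)
print(counters)  # Counters(call_count=1, loop_count=3)
```

The loop body runs three times, and the final, not-taken back-branch still counts, so loop_count matches the number of iterations.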

in testing, call_count isn't identical for jit vs interp -- i don't know why yet. I suspect inlined cast expressions: T(expr).

the patch also modifies runtest.py to use "metric blops" if found. indeed, we can now run sunspider and v8 side by side and get believable higher-is-better scores, while ignoring all test output and only using vm-generated metrics.
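
Roughly how a harness could prefer a vm-generated metric over test output, as a sketch only. It assumes the VM prints lines of the form `metric <name> <number>`; that format, and the helper names `parse_metrics` and `pick_score`, are assumptions for illustration and not the actual runtest.py code.

```python
import re

# Assumed line format: "metric <name> <number>"
METRIC_RE = re.compile(r"^metric\s+(\S+)\s+([-+]?\d+(?:\.\d+)?)$")


def parse_metrics(output):
    """Collect every metric line in the captured output as name -> float."""
    metrics = {}
    for line in output.splitlines():
        m = METRIC_RE.match(line.strip())
        if m:
            metrics[m.group(1)] = float(m.group(2))
    return metrics


def pick_score(output, preferred="blops"):
    """Return (name, value) for the preferred metric if the VM printed it,
    else the first metric found, else None."""
    metrics = parse_metrics(output)
    if preferred in metrics:
        return preferred, metrics[preferred]
    if metrics:
        name = next(iter(metrics))
        return name, metrics[name]
    return None


sample = "some test chatter\nmetric time 1234\nmetric blops 987.5\n"
print(pick_score(sample))  # ('blops', 987.5)
```

Everything that is not a metric line is ignored, so tests no longer need to measure or report anything themselves.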

issues
- i'm measuring about a 3% counter overhead cost on boids.as. that's higher than i'd like, but in principle we can live with it.
- need to debug call_count.
- looping inside native method is not accounted for
- there is no tie-in with vprof or dtrace, and there probably should be (for perf and gc, arguably), or even a simple universal telemetry mechanism. (channelling Lars here...)

design question
- should we just count loops and calls and do all the analysis after the fact (outside the vm?). on the one hand, i LIKE having the vm generate a single # as input to various scripts. on the other hand, we'll get more and more metrics over time, and we'll be doing post-game analysis anyway.

Assignee: nobody → edwsmith

Status: NEW → ASSIGNED

Attachment #448087 - Flags: feedback?(lhansen)

Lars T Hansen

Comment 2

14 years ago

Comment on attachment 448087: add "blops" metric

I like it, and I believe in what you write.  I think it's most useful to have the shell print a single metric; if there's need for something else add a command line switch to select a different format.

All of this needs to be conditionalized, obviously - 3% is too much overhead for production runs in the Flash Player.  For the GC, there's MMGC_POLICY_PROFILING which is on by default in the shell and off by default in the player (5% overhead).

Attachment #448087 - Flags: feedback?(lhansen) → feedback+

Edwin Smith

Reporter

Comment 3

14 years ago

I think we don't want to count builtin code, for two reasons:

1. In ad-hoc testing I find I can reduce the counter overhead significantly by not counting in native methods.  Probably this is because it is expensive to access AvmCore.  Maybe we should use a static counter instead of one on AvmCore, but read on:

2. For native methods, we can only get the call count, not the loop count.  For builtin AS3, we can get both, but if we change builtin code from C++ to AS3 or back, we will affect the metric and we probably don't want to.  e.g. by making changes to AS3.push().

Edwin Smith

Reporter

Comment 4

14 years ago

Attached patch: refined as "metric steps"

Changes since last patch
* conditionalized
* merged call_count and loop_count as "step_count"
* count backedges whether taken or not
* count all OP_lookupswitch (too messy to worry about which path taken)
* do not count builtin code (as3 or native)
* rebased to TR tip

these changes reduced metric overhead in boids to within noise.
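
The shape of the refined counter, as a sketch in Python rather than the VM's own code. The class `StepCounter`, its `enabled` flag and the `is_builtin` argument are hypothetical names standing in for the conditionalization and the builtin check.

```python
class StepCounter:
    """Single merged counter; a no-op unless enabled."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self.step_count = 0

    def step(self, is_builtin):
        # Called at call prologs, back-branches (taken or not) and
        # every lookupswitch. Builtin code (as3 or native) is skipped.
        if self.enabled and not is_builtin:
            self.step_count += 1


counter = StepCounter(enabled=True)
counter.step(is_builtin=False)  # user method prolog
counter.step(is_builtin=True)   # builtin method: not counted
counter.step(is_builtin=False)  # back-branch in user code
print(counter.step_count)  # 2
```

Skipping builtin code keeps the metric stable if a builtin moves between C++ and AS3.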

open issues:
- print metric to stdout?
- need commandline switch to enable metric?  (should all vm metrics be under one switch?)
- use static mem instead of AvmCore?  (one metric for whole process, or one per vm instance?)
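
To make the last question concrete, a small sketch of the two options side by side. `ToyVM` is a made-up stand-in: the class attribute plays the role of static memory, and the instance attribute plays the role of a field on AvmCore.

```python
class ToyVM:
    process_steps = 0  # "static" option: one counter for the whole process

    def __init__(self):
        self.steps = 0  # per-instance option: one counter per vm

    def step(self):
        self.steps += 1
        ToyVM.process_steps += 1


def demo():
    """Run two vm instances and report both kinds of counter."""
    a, b = ToyVM(), ToyVM()
    before = ToyVM.process_steps
    a.step()
    a.step()
    b.step()
    return a.steps, b.steps, ToyVM.process_steps - before
```

With two instances in one process, the per-instance counters stay separate while the shared counter reports only the sum, so the choice decides whether a multi-vm host gets one number or several.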

Attachment #448087 - Attachment is obsolete: true

Rick Reitmaier

Comment 5

14 years ago

Nice, and I'm starting to wonder if it makes sense to look at a more dynamic/comprehensive means of generating this data.

We'll probably need something similar in future as input for jit policy decisions, and having a generic mechanism that handles both use-cases would be nice.

Although you could argue that the effort/engineering required for the jit policy effort is much higher and suitably involved that having this patch available now is more valuable.

Edwin Smith

Reporter

Comment 6

14 years ago

(In reply to comment #5)

> Although you could argue that the effort/engineering required for the jit
> policy effort is much higher and suitably involved that having this patch
> available now is more valuable.

exactly :-)  If we end up adding more counters and a general mechanism seems worthwhile, i'm all for it.  not a blocker tho.

Chris Peyer

Updated

14 years ago

Blocks: 572860

Edwin Smith

Reporter

Updated

14 years ago

Whiteboard: has-patch

Edwin Smith

Reporter

Updated

14 years ago

Assignee: edwsmith → nobody

Trevor Baker

Updated

13 years ago

Flags: flashplayer-bug-

Whiteboard: has-patch → has-patch, must-fix-candidate

Andre Kruetzfeldt

Updated

13 years ago

Depends on: 645018

Dan Smith

Updated

13 years ago

Flags: flashplayer-qrb+

Flags: flashplayer-injection-

Target Milestone: Q3 11 - Serrano → Q4 11 - Anza

Dan Smith

Comment 7

13 years ago

This still seems like a useful idea and measurement.  Perhaps the blops measurement simply turns into a Telemetry measurement.

Dan Smith

Updated

13 years ago

Whiteboard: has-patch, must-fix-candidate → has-patch

Dan Smith

Comment 8

13 years ago

Retargeting to future as this has lingered for so long.  Ed, retarget if you feel it is needed for HM.

Priority: P1 → --

Target Milestone: Q4 11 - Anza → Future

Sylvestre Ledru [:Sylvestre]

Comment 9

6 years ago

No assignee, updating the status.

Status: ASSIGNED → NEW

Sylvestre Ledru [:Sylvestre]

Comment 13

6 years ago

Tamarin is a dead project now. Mass WONTFIX.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX

Sylvestre Ledru [:Sylvestre]

Comment 14

6 years ago

Tamarin isn't maintained anymore. WONTFIX remaining bugs.