Closed Bug 568496 Opened 14 years ago Closed 14 years ago

runtests.py should be able to compute a cumulative score for a performance test suite run

Categories

(Tamarin Graveyard :: Tools, defect, P1)

x86
macOS
defect

Tracking

(Not tracked)

VERIFIED INVALID
Q3 11 - Serrano

People

(Reporter: lhansen, Assigned: cpeyer)

References

Details

Attachments

(1 file, 2 obsolete files)

Individual results are useful but a cumulative score is useful too.  The obvious candidate is the geometric mean of the individual scores.
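The cumulative score proposed here can be sketched as follows (a hypothetical helper for illustration, not code from the patch):

```python
def geometric_mean(scores):
    """Cumulative score: the nth root of the product of n individual scores."""
    if not scores:
        raise ValueError("no scores to aggregate")
    product = 1.0
    for s in scores:
        product *= s
    return product ** (1.0 / len(scores))

# Example: three benchmark scores whose geometric mean is 200
print(geometric_mean([100.0, 200.0, 400.0]))
```

Unlike the arithmetic mean, a relative change in any one test moves the geometric mean by the same factor regardless of that test's absolute magnitude, which is why it is the usual choice for benchmark suites.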
Summary: runtest.py should be able to compute a cumulative score for a performance test suite run → runtests.py should be able to compute a cumulative score for a performance test suite run
Builds upon the initial patch - scores for different metrics are separated.  An even better method might be to show the scores for each directory, as that's how the tests are naturally grouped.  Thoughts?
Attachment #447766 - Attachment is obsolete: true
Attachment #448120 - Flags: feedback?(lhansen)
Comment on attachment 448120 [details] [diff] [review]
Update to scores patch to show geo mean of each metric

That seems fine.  Re the question about subdirectories: I don't think I care either way, but if that change takes away my ability to compute something across multiple subdirectories then no.  I had envisioned that we would run this once for each score we wanted to compute.

(The central problem remaining with the patch is that when I tested my version it always resulted in 0, infinity, or NaN (I forget which) due to loss of precision in the computation.)
Attachment #448120 - Flags: feedback?(lhansen) → feedback+
A general problem for computing an index (with any kind of average: arithmetic, geometric, etc.) is that if we use raw measurements then the tests are not all equally weighted.  (Raw scores would be raw times, for example, or raw work/time scores from v8.)

We could define an "index" file format, and make runtests.py use the index file.
* file contains a list of (reference, test) tuples.
* to compute index value, compute the geometric mean of:  (score/reference) for each test.
* add support in runtest.py to run an "index".  it should read the index definition, run the tests, and of course compute the rolled up result.

The values of the normalizers should be fixed in time:  adding or removing tests from the index, or changing the normalizers, changes the definition of the index and therefore defines a new index that is not comparable to the old one.

To compute normalizers, measure raw scores on a slow baseline.  The reason to use a slow baseline is simply to skew normalized scores into positive integer territory;  mathematically, any reference score will work as well as any other.
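Putting the proposal above together, an index file could amount to a mapping from test to its fixed reference score, with the rolled-up value being the geometric mean of score/reference ratios. The names and file layout below are purely illustrative, not the format the patch defines:

```python
import math

# Hypothetical index: reference scores measured once on a slow baseline
# and then frozen, so runs against the same index remain comparable.
INDEX = {
    "strings/concat.as": 4200.0,
    "arrays/sort.as":    1800.0,
    "math/sieve.as":      950.0,
}

def index_value(results):
    """results maps test name -> raw score for this run.

    Returns the geometric mean of (score / reference), computed in log
    space for numerical stability.
    """
    ratios = [results[test] / ref for test, ref in INDEX.items()]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# A run exactly twice as fast as the baseline on every test
# scores ~2.0 regardless of each test's absolute magnitude.
run = {test: 2 * ref for test, ref in INDEX.items()}
print(index_value(run))
```

Because every test enters the mean as a ratio to its own reference, a 10% improvement in a fast test and a 10% improvement in a slow test move the index equally, which is the equal-weighting property the raw-score average lacks.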

Caveat: from what I can tell, this is exactly how the v8 tests are organized; each one defines its own reference (hardcoded in the test) and computes a score as a ratio.  This makes "metric v8" results within a subdirectory comparable to each other, but not to tests in other directories.
Assignee: nobody → cpeyer
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Status: NEW → ASSIGNED
Chris concluded on IM that the problem with always getting 0 was a stupid bug in my code (my characterization, not his).

Any chance of landing this?  The weighting will perhaps require a longer discussion, but we can always recompute scores later if we decide to change how they're computed.  We do need to move forward on metrics, however, and not having at least basic scoring functionality is a blocker.
Priority: P3 → P1
Blocks: 572860
Large update to the cumulative scores patch.

This adds the ability to also view results normalized against the values in index.py.

In order to run using the index values use the --index {indexfile} switch:

./runtests.py -i 5 --score --index index.py

A new index file can be created using the --saveindex {indexfile} switch.  If you use both the --saveindex and --index switches, the values in the --index indexfile will be merged with the test-run results (values are overwritten if the same metrics are generated).
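As I read the merge behavior described above, combining --index input with --saveindex output amounts to a dictionary update where the fresh run wins on conflicts. The keys and structure here are invented for illustration; the real index format may differ:

```python
def merge_index(existing, new_results):
    """Merge freshly measured index data into data loaded via --index.

    Entries from the new run overwrite existing entries with the same
    (test, metric) key; all other existing entries are carried over.
    """
    merged = dict(existing)
    merged.update(new_results)
    return merged

old = {("strings/concat.as", "time"): 4200.0}
new = {("strings/concat.as", "time"): 4100.0,
       ("strings/concat.as", "memory"): 512.0}
print(merge_index(old, new))
```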

This patch also contains the csv patch (bug 572861), as some of the csv functionality is used.  There was also a fair amount of code reorganization and cleanup.
Attachment #448120 - Attachment is obsolete: true
Attachment #455702 - Flags: review?(brbaker)
Attachment #455702 - Flags: feedback?(lhansen)
Attachment #455702 - Flags: feedback?(edwsmith)
Attachment #455702 - Flags: review?(dschaffe)
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Chris I know that there is another patch coming but wanted to give some feedback.

could it be possible to drive the tests to run based on the index file? I want to have a "strings" index file which defines the baseline values AND defines which tests to run so that I can just run:
    ./runtests.py -i 5 --index strings.py
I think this was an original request from Edwin:
"* add support in runtest.py to run an "index".  it should read the index
definition, run the tests, and of course compute the rolled up result."

--saveindex: should this save the index data ONLY for what was actually run, and NOT the entire index file that was fed in via --index?

--append: should this append not only new tests but also new metrics?  If I do a time run and then a memory run, I only have one metric in the index.

Also, a cleaner error message around failing to load the specified index file would help; dumping out a stack trace is not very useful.
Attachment #455702 - Flags: review?(brbaker) → feedback+
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Is this patch still relevant, or has it been rolled into another one?  General feedback: it's going in the right direction, but I think Brent's comments should be addressed, and the patch should be split into smaller pieces if possible.  For example, don't merge in the CSV changes.
Attachment #455702 - Flags: feedback?(edwsmith) → feedback-
Patch has been rolled into bug 572860; not ideal, but the new way of aggregating the data internally forced me to use a mega-patch.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Deferring to Ed and Brent for now.
Attachment #455702 - Flags: feedback?(lhansen)
Status: RESOLVED → VERIFIED
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

removed my review
Attachment #455702 - Flags: review?(dschaffe)