runtests.py should be able to compute a cumulative score for a performance test suite run

VERIFIED INVALID

Status

Tamarin
Tools
P1
normal
VERIFIED INVALID
8 years ago
7 years ago

People

(Reporter: Lars T Hansen, Assigned: Chris Peyer)

Tracking

unspecified
Q3 11 - Serrano
x86
Mac OS X
Bug Flags:
flashplayer-qrb +

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

8 years ago
Created attachment 447766 [details] [diff] [review]
This naive code does not work (nth-root issues? overflow?)

Individual results are useful but a cumulative score is useful too.  The obvious candidate is the geometric mean of the individual scores.
(Reporter)

Updated

8 years ago
Summary: runtest.py should be able to compute a cumulative score for a performance test suite run → runtests.py should be able to compute a cumulative score for a performance test suite run
(Assignee)

Comment 1

8 years ago
Created attachment 448120 [details] [diff] [review]
Update to scores patch to show geo mean of each metric

Builds upon initial patch - scores for different metrics are separated.  An even better method might be to show the scores for each directory as thats how the tests are naturally grouped.  Thoughts?
Attachment #447766 - Attachment is obsolete: true
Attachment #448120 - Flags: feedback?(lhansen)
(Reporter)

Comment 2

8 years ago
Comment on attachment 448120 [details] [diff] [review]
Update to scores patch to show geo mean of each metric

That seems fine.  Re the question about subdirectories I don't think I care either way; if that change takes away my ability to compute something across multiple subdirectories then no.  I had envisioned that we would run this once for each score we wanted to compute.

(Central problem remaining with the patch is that when I tested my version out it always resulted in 0 or infinity or NaN (I forget which) due to loss of precision in the computation.)
Attachment #448120 - Flags: feedback?(lhansen) → feedback+

Comment 3

8 years ago
A general problem for computing an index (with any kind of average: arithmetic, geometric, etc) is that if we use raw measurements then the tests are not all equally weighted.  (raw scores would be raw times, for example, or raw work/time scores from v8).

We could define an "index" file format, and make runtests.py use the index file.
* file contains a list of (reference, test) tuples.
* to compute index value, compute the geometric mean of:  (score/reference) for each test.
* add support in runtest.py to run an "index".  it should read the index definition, run the tests, and of course compute the rolled up result.

The values of the normalizers should be fixed in time:  adding or removing tests from the index, or changing the normalizers, changes the definition of the index and therefore defines a new index that is not comparable to the old one.

To compute normalizers, measure raw scores on a slow baseline.  The reason to use a slow baseline is simply to skew normalized scores into positive integer territory;  mathematically, any reference score will work as well as any other.

caveat: from what i can tell, this is exactly how the v8 tests are organized; each one defines its own reference (hardcoded in the test) and computes a score as a ratio.  this makes "metric v8" results within a subdirectory comparable to each other, but not with other tests in other directories.

Updated

7 years ago
Assignee: nobody → cpeyer
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2

Updated

7 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 4

7 years ago
Chris concluded on IM that the problem with always getting 0 was a stupid bug in my code (my characterization, not his).

Any chance of landing this?  The weighting will require a longer discussion perhaps, we can always recompute scores later if we decide to change how they're computed.  We do need to move forward on metrics however, and not having at least basic scoring functionality is a blocker.
Priority: P3 → P1
(Assignee)

Updated

7 years ago
Blocks: 572860
(Assignee)

Comment 5

7 years ago
Created attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Large update to the cumulative scores patch.

This adds the ability to also view results normalized against the values in index.py.

In order to run using the index values use the --index {indexfile} switch:

./runtests.py -i 5 --score --index index.py

A new index file can be created using the --saveindex {indexfile} switch.  If you use both the --saveindex and the --index switches, the values in the --index indexfile will be merged in along with the testrun results (values will be overwritten if the same metrics are generated)

This patch also contains the csv patch (Bug 572861) as some of the csv functionality is used.  There is also a fair amount of code reorg and cleanup that was done.
Attachment #448120 - Attachment is obsolete: true
Attachment #455702 - Flags: review?(brbaker)
Attachment #455702 - Flags: feedback?(lhansen)
Attachment #455702 - Flags: feedback?(edwsmith)
(Assignee)

Updated

7 years ago
Attachment #455702 - Flags: review?(dschaffe)

Comment 6

7 years ago
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Chris I know that there is another patch coming but wanted to give some feedback.

could it be possible to drive the tests to run based on the index file? I want to have a "strings" index file which defines the baseline values AND defines which tests to run so that I can just run:
    ./runtests.py -i 5 --index strings.py
I think this was an original request from Edwin:
"* add support in runtest.py to run an "index".  it should read the index
definition, run the tests, and of course compute the rolled up result."

--saveindex should this ONLY save the index data for what was actually run and NOT the entire index file that was fed in via --index? 

--append, should this not only append new tests but also new metrics? If I do a time run and then a memory run I only have 1 metric in the index.

maybe a little cleaner error message around failing to load the specified index file, dumping out a stacktrace is not very useful and message could be clearer
Attachment #455702 - Flags: review?(brbaker) → feedback+

Comment 7

7 years ago
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Is this patch still relevant or has it been rolled into another one?  General feedback: going in the right direction, but i think Brent's comments should be addressed, and I think the patch should be split into smaller pieces if possible.  For example, dont merge in the CSV changes.
Attachment #455702 - Flags: feedback?(edwsmith) → feedback-
(Assignee)

Comment 8

7 years ago
patch has been rolled into 	 572860 - not ideal, but the new way of aggregating the data internally forced me to use a mega-patch.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → INVALID
(Reporter)

Comment 9

7 years ago
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

Deferring to Ed and Brent for now.
Attachment #455702 - Flags: feedback?(lhansen)
(Assignee)

Updated

7 years ago
Status: RESOLVED → VERIFIED

Comment 10

7 years ago
Comment on attachment 455702 [details] [diff] [review]
cumulative scores + normalization index

removed my review
Attachment #455702 - Flags: review?(dschaffe)
You need to log in before you can comment on or make changes to this bug.