[meta] Use test replicates in Perfherder CompareView
Categories
(Tree Management :: Perfherder, task)
Tracking
(Not tracked)
People
(Reporter: sparky, Unassigned)
References
(Depends on 2 open bugs, Blocks 2 open bugs)
Details
(Keywords: meta)
Currently, the Perfherder Compare view uses the average value of each individual run (X values for X runs) when it performs a comparison. We should switch it to use the replicates instead, because that gives us more trials. With this, we'll be able to tell whether there is a regression or improvement with high confidence using fewer task runs.
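To illustrate the difference in sample size, here is a minimal sketch with made-up numbers. It assumes a Welch-style t statistic as the confidence measure, which may not match what Perfherder actually computes, and it treats replicates from the same task as independent, which is a simplification. The names `welch_t`, `flatten`, `base_runs` and `new_runs` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (assumed confidence measure)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / standard_error


def flatten(runs):
    """Pool the replicates of every task run into one list of trials."""
    return [value for run in runs for value in run]


# Hypothetical data: 2 task runs per revision, 5 replicates per run.
base_runs = [[100.0, 102.0, 98.0, 101.0, 99.0],
             [101.0, 103.0, 99.0, 100.0, 102.0]]
new_runs = [[105.0, 107.0, 103.0, 106.0, 104.0],
            [106.0, 108.0, 104.0, 105.0, 107.0]]

# Current approach: one averaged value per task run (2 trials per side).
base_means = [mean(run) for run in base_runs]
new_means = [mean(run) for run in new_runs]
t_from_means = welch_t(base_means, new_means)

# Proposed approach: every replicate is a trial (10 trials per side).
t_from_replicates = welch_t(flatten(base_runs), flatten(new_runs))

print(len(base_means), len(flatten(base_runs)))
print(t_from_means, t_from_replicates)
```

With the same two task runs, the comparison goes from 2 trials per side to 10, which is the extra information the replicates would provide.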
Comment 1•4 years ago
:sparky, can you reply to this Jira question?
Reporter
Comment 2•4 years ago
For context, the question is related to reviewing this doc: https://docs.google.com/document/d/1y8F0VH3TnAOzqY2cssugjSKBVN28bGRILreQXX_Owrw/edit#heading=h.nsk8ut4t646y
Reporter
Comment 3•4 years ago
So I'm looking at the doc and it feels like we're doing something backwards. I wonder if we should be modifying the summaries on the harness end instead so you can just take the replicates entries and do a geomean on them? I think this would make the changes you need to make much simpler, and if we add new metrics with special summaries you won't have to make any changes.
I'm thinking about speedometer as an example. It has a single summary value calculated from X cycles, but we could also make 1 value per cycle and make it a subtest.
Comment 4•4 years ago
> I'm thinking about speedometer as an example.
I found this speedometer artifact, which seems to have 160 subtests with 25 cycles per subtest. Hopefully it's a good example.
> It has a single summary value calculated from X cycles, but we could also make 1 value per cycle and make it a subtest
I'm not sure I got the idea. Do you mean we should have 25 new subtests named 'speedometer-subtest-cycle-0', ..., 'speedometer-subtest-cycle-24', and each new subtest would aggregate a single replicate for each of the 160 original subtests?
If yes, and these subtests have the same name within the same subtests array value (I think our schema validation considers this legal) and we no longer have a summary value, Perfherder would easily ingest 25 geomeans per job & display them in Compare view.
Reporter
Comment 5•4 years ago
Not quite, but almost. What I'm thinking is that in the test harness we could make a single new subtest called speedometer, and this subtest contains 1 value for each speedometer cycle. Right now, we have a single speedometer value for all X cycles. With this method, we would have X speedometer values in a new subtest.
This way, you won't need to play with re-calculating the speedometer score; you can just blindly aggregate the subtest metrics in any way we need to.
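As a rough sketch of what the harness side could do: given subtests that each carry one replicate per cycle, compute one summary value per cycle. This assumes the per-cycle summary is a plain geomean across subtests, which is a simplification and not necessarily how the speedometer score is really derived. The function name `per_cycle_summaries` and the sample values are hypothetical.

```python
from statistics import geometric_mean


def per_cycle_summaries(subtests):
    """Return one summary value per cycle.

    Each subtest is a dict with a "replicates" list holding one value per
    cycle. The summary used here is a geomean across subtests (assumption).
    """
    lengths = {len(subtest["replicates"]) for subtest in subtests}
    if len(lengths) != 1:
        raise ValueError("all subtests need the same number of cycles")

    # zip(*...) regroups the values cycle by cycle instead of subtest by subtest.
    cycles = zip(*(subtest["replicates"] for subtest in subtests))
    return [geometric_mean(cycle) for cycle in cycles]


# Hypothetical input: 2 subtests, 3 cycles each.
subtests = [
    {"name": "subtest-a", "replicates": [2.0, 4.0, 8.0]},
    {"name": "subtest-b", "replicates": [8.0, 4.0, 2.0]},
]
summaries = per_cycle_summaries(subtests)
print(summaries)
```

The resulting list has X entries for X cycles, and those entries are what would go into the replicates of the new speedometer subtest.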
Comment 6•4 years ago
Would it look something like this?
{
  "application": {
    "name": "firefox",
    "version": "87.0a1"
  },
  "framework": {
    "name": "raptor"
  },
  "suites": [
    {
      "alertThreshold": 2.0,
      "extraOptions": [
        "nocondprof",
        "webrender"
      ],
      "lowerIsBetter": false,
      "name": "raptor-speedometer-firefox",
      "subtests": [
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "speedometer",
          "replicates": [
            108.0000000000001,
            108.0000000000002,
            108.0000000000003,
            108.0000000000004,
            108.0000000000005,
            108.0000000000006,
            108.0000000000007,
            108.0000000000008,
            108.0000000000009,
            108.0000000000010,
            108.0000000000011,
            108.0000000000012,
            108.0000000000013,
            108.0000000000014,
            108.0000000000015,
            108.0000000000016,
            108.0000000000017,
            108.0000000000018,
            108.0000000000019,
            108.0000000000020,
            108.0000000000021,
            108.0000000000022,
            108.0000000000023,
            108.0000000000024,
            108.0000000000025
          ],
          "unit": "score",
          "value": 108.0000000000013
        },
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "Angular2-TypeScript-TodoMVC",
          "replicates": [
            90.78,
            92.18,
            89.98,
            87.04,
            89.38,
            87.24,
            92.98,
            93.78,
            84.06,
            92.38,
            98.66,
            90.82,
            89.8,
            103.06,
            95.06,
            88.88,
            83.26,
            91.62,
            82.48,
            94.02,
            85.84,
            80.38,
            104.12,
            91.7,
            89.38
          ],
          "unit": "ms",
          "value": 90.78
        },
        ...
      ],
      "tags": [
        "benchmark",
        "warm"
      ],
      "type": "benchmark",
      "unit": "score",
      "value": 108.8945263882843
    }
  ]
}
Comment 7•4 years ago
I've injected the new speedometer subtest as you suggested & grouped the X speedometer values under its replicates field.
(I numbered each X speedometer value with a unique figure at the end, to hint it's a geomean applied on a single cycle.)
This subtest's value is the average of all its individual geomeans.
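A minimal sketch of how such an entry could be assembled, using only the field names from the payload above. The helper `build_summary_subtest`, its defaults, the sample values, and the choice to copy lowerIsBetter from the suite are all assumptions made for illustration, not what the harness actually does.

```python
from statistics import mean


def build_summary_subtest(per_cycle_values, lower_is_better,
                          name="speedometer", unit="score",
                          alert_threshold=2.0):
    """Build a subtest whose replicates are the per-cycle geomeans.

    The subtest value is the average of those geomeans, as described above.
    """
    return {
        "alertThreshold": alert_threshold,
        "lowerIsBetter": lower_is_better,
        "name": name,
        "replicates": list(per_cycle_values),
        "unit": unit,
        "value": mean(per_cycle_values),
    }


# Hypothetical suite and per-cycle values.
suite = {
    "name": "raptor-speedometer-firefox",
    "lowerIsBetter": False,
    "subtests": [],
}
summary = build_summary_subtest(
    [107.5, 108.0, 108.5],
    lower_is_better=suite["lowerIsBetter"],  # assumption: follow the suite
)
suite["subtests"].insert(0, summary)
print(summary["value"])
```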
Reporter
Comment 8•4 years ago
Yup, that's right, that's what I'm thinking. What do you think?
Reporter
Comment 10•2 years ago
We've resolved this for compare view on try at least. I've set this as a duplicate of the bug that added the toggle to the compare view. I've added the new meta bug for this work to the see also section. Next steps are to work with SREs to expand this to more branches. The free retrigger issue is minor for this work and shouldn't be considered a blocker. They would be good next steps though, so I've added them as dependencies to bug 1831943.