1674370 - [meta] Use test replicates in Perfherder CompareView

Reporter

Description

•

4 years ago

Currently, the Perfherder Compare view uses the average value from each of X individual runs (giving X values) when it performs a comparison. We should switch to using the replicates instead, because that gives us more trials. With this, we'll be able to tell with high confidence whether there is a regression or improvement using fewer task runs.
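To make the difference concrete, here is a minimal sketch of the two sampling strategies. It is not Perfherder's actual comparison code: the data, the helper names (`run_averages`, `all_replicates`, `welch_t`) and the use of a Welch-style t statistic are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (illustrative stand-in only)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical data: 2 task runs per revision, 3 replicates per run.
base_runs = [[100.0, 102.0, 98.0], [101.0, 99.0, 103.0]]
new_runs = [[105.0, 107.0, 103.0], [106.0, 104.0, 108.0]]


def run_averages(runs):
    """Current approach: one averaged value per task run."""
    return [mean(run) for run in runs]


def all_replicates(runs):
    """Proposed approach: every replicate counts as a trial."""
    return [value for run in runs for value in run]


t_from_averages = welch_t(run_averages(base_runs), run_averages(new_runs))
t_from_replicates = welch_t(all_replicates(base_runs), all_replicates(new_runs))

# Sample size per revision grows from 2 (runs) to 6 (runs x replicates).
print(len(run_averages(base_runs)), len(all_replicates(base_runs)))
```

Note that pooling replicates treats them as independent trials, which is itself an assumption; replicates taken within a single task run may be correlated.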

Greg Mierzwinski [:sparky]

Reporter

Updated

•

4 years ago

URL: https://jira.mozilla.com/browse/FXP-1408

Ionuț Goldan [:igoldan]

Comment 1

•

4 years ago

•

Edited

:sparky, can you reply to this Jira question?

Flags: needinfo?(gmierz2)

Greg Mierzwinski [:sparky]

Reporter

Comment 2

•

4 years ago

For context, the question is related to reviewing this doc: https://docs.google.com/document/d/1y8F0VH3TnAOzqY2cssugjSKBVN28bGRILreQXX_Owrw/edit#heading=h.nsk8ut4t646y

Greg Mierzwinski [:sparky]

Reporter

Comment 3

•

4 years ago

So I'm looking at the doc and it feels like we're doing something backwards. I wonder if we should be modifying the summaries on the harness end instead, so you can just take the replicates entries and do a geomean on them? I think this would make the changes you need to make much simpler, and if we add new metrics with special summaries you won't have to make any changes.

I'm thinking about speedometer as an example. It has a single summary value calculated from X cycles, but we could also make 1 value per cycle and make it a subtest.

Flags: needinfo?(gmierz2) → needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 4

•

4 years ago

•

Edited

> I'm thinking about speedometer as an example.

I found this speedometer artifact, which seems to have 160 subtests with 25 cycles per subtest. Hopefully it's a good example.

> It has a single summary value calculated from X cycles, but we could also make 1 value per cycle and make it a subtest

I'm not sure I got the idea. You mean we should have 25 new subtests named 'speedometer-subtest-cycle-0', ..., 'speedometer-subtest-cycle-24' & each new subtest would aggregate a single replicate for each of the 160 original subtests?

If yes & these subtests have the same name within the same subtests array value (I think our schema validation considers this legal) and we no longer have a summary value, Perfherder would easily ingest 25 geomeans per job & display them in Compare view.

Flags: needinfo?(igoldan) → needinfo?(gmierz2)

Greg Mierzwinski [:sparky]

Reporter

Comment 5

•

4 years ago

Not quite, but almost. What I'm thinking is that in the test harness we could make a single new subtest called speedometer, and this subtest contains 1 value for each speedometer cycle. Right now, we have a single speedometer value for all X cycles. With this method, we would have X speedometer values in a new subtest.

This way, you won't need to play with re-calculating the speedometer score, you can just blindly aggregate the subtest metrics in any way we need to.
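A rough sketch of what that harness-side step could look like is below. It assumes the per-cycle score is a geomean across the existing subtests (the real speedometer scoring may differ), and `build_cycle_subtest` is a hypothetical helper name, not an existing harness function.

```python
from statistics import geometric_mean, mean


def build_cycle_subtest(subtests, name="speedometer"):
    """Collapse per-subtest replicates into one score per cycle.

    Assumes every subtest lists its replicates in the same cycle order;
    zip() stops at the shortest list if the lengths differ.
    """
    cycles = zip(*(subtest["replicates"] for subtest in subtests))
    per_cycle = [geometric_mean(values) for values in cycles]
    return {
        "name": name,
        "unit": "score",
        "replicates": per_cycle,   # X values, one per cycle
        "value": mean(per_cycle),  # single summary kept alongside them
    }


# Hypothetical input: 2 subtests, 2 cycles each.
subtests = [
    {"name": "A", "replicates": [4.0, 16.0]},
    {"name": "B", "replicates": [9.0, 4.0]},
]
summary = build_cycle_subtest(subtests)
print(summary["replicates"], summary["value"])
```

With the scoring done once in the harness, the consumer only has to read `replicates` from the new subtest and never needs to know how the score was derived.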

Flags: needinfo?(gmierz2)

Ionuț Goldan [:igoldan]

Comment 6

•

4 years ago

Would it look something like this?

{
  "application": {
    "name": "firefox",
    "version": "87.0a1"
  },
  "framework": {
    "name": "raptor"
  },
  "suites": [
    {
      "alertThreshold": 2.0,
      "extraOptions": [
        "nocondprof",
        "webrender"
      ],
      "lowerIsBetter": false,
      "name": "raptor-speedometer-firefox",
      "subtests": [
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "speedometer",
          "replicates": [
            108.0000000000001,
            108.0000000000002,
            108.0000000000003,
            108.0000000000004,
            108.0000000000005,
            108.0000000000006,
            108.0000000000007,
            108.0000000000008,
            108.0000000000009,
            108.0000000000010,
            108.0000000000011,
            108.0000000000012,
            108.0000000000013,
            108.0000000000014,
            108.0000000000015,
            108.0000000000016,
            108.0000000000017,
            108.0000000000018,
            108.0000000000019,
            108.0000000000020,
            108.0000000000021,
            108.0000000000022,
            108.0000000000023,
            108.0000000000024,
            108.0000000000025
          ],
          "unit": "score",
          "value": 108.0000000000013
        },
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "Angular2-TypeScript-TodoMVC",
          "replicates": [
            90.78,
            92.18,
            89.98,
            87.04,
            89.38,
            87.24,
            92.98,
            93.78,
            84.06,
            92.38,
            98.66,
            90.82,
            89.8,
            103.06,
            95.06,
            88.88,
            83.26,
            91.62,
            82.48,
            94.02,
            85.84,
            80.38,
            104.12,
            91.7,
            89.38
          ],
          "unit": "ms",
          "value": 90.78
        },
        ...
      ],
      "tags": [
        "benchmark",
        "warm"
      ],
      "type": "benchmark",
      "unit": "score",
      "value": 108.8945263882843
    }
  ]
}

Ionuț Goldan [:igoldan]

Updated

•

4 years ago

Flags: needinfo?(gmierz2)

Ionuț Goldan [:igoldan]

Comment 7

•

4 years ago

•

Edited

I've injected the new speedometer subtest as you suggested & grouped the X speedometer values under its replicates field.
(I numbered each X speedometer value with a unique figure at the end, to hint it's a geomean applied on a single cycle.)

This subtest's value is an average of all its individual geomeans.
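On the consuming side, that relationship can be checked with a few lines. This is only a sketch: the payload below is a shortened, hypothetical version of the one in comment 6, and `find_subtest` and `value_matches_replicates` are illustrative names rather than Perfherder functions.

```python
from math import isclose
from statistics import mean


def find_subtest(payload, suite_name, subtest_name):
    """Return the first matching subtest dict, or None if absent."""
    for suite in payload["suites"]:
        if suite["name"] != suite_name:
            continue
        for subtest in suite["subtests"]:
            if subtest["name"] == subtest_name:
                return subtest
    return None


def value_matches_replicates(subtest, rel_tol=1e-9):
    """Check that 'value' is the arithmetic mean of 'replicates'."""
    return isclose(subtest["value"], mean(subtest["replicates"]), rel_tol=rel_tol)


# Shortened, hypothetical payload with made-up numbers.
payload = {
    "suites": [
        {
            "name": "raptor-speedometer-firefox",
            "subtests": [
                {
                    "name": "speedometer",
                    "replicates": [108.1, 108.2, 108.3],
                    "value": 108.2,
                }
            ],
        }
    ]
}

subtest = find_subtest(payload, "raptor-speedometer-firefox", "speedometer")
print(value_matches_replicates(subtest))
```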

Greg Mierzwinski [:sparky]

Reporter

Comment 8

•

4 years ago

Yup, that's right, that's what I'm thinking. What do you think?

Flags: needinfo?(gmierz2) → needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 9

•

4 years ago

I think this is a reasonable solution.

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Updated

•

4 years ago

Depends on: 1693047

Keywords: meta

Summary: Use test replicates in Perfherder CompareView → [meta] Use test replicates in Perfherder CompareView

Ionuț Goldan [:igoldan]

Updated

•

4 years ago

Depends on: 1693051

Ionuț Goldan [:igoldan]

Updated

•

4 years ago

Depends on: 1693053

Kimberly Sereduck :kimberlythegeek

Updated

•

3 years ago

Blocks: 1754831

Greg Mierzwinski [:sparky]

Reporter

Updated

•

3 years ago

Blocks: 1761730

Greg Mierzwinski [:sparky]

Reporter

Comment 10

•

2 years ago

•

Edited

We've resolved this for compare view on try at least. I've set this as a duplicate of the bug that added the toggle to the compare view, and I've added the new meta bug for this work to the see also section. Next steps are to work with SREs to expand this to more branches. The free retrigger issue is minor for this work, and shouldn't be considered a blocker. They would be good next steps though, so I've added them as dependencies to bug 1831943.

Status: NEW → RESOLVED

Closed: 2 years ago

Duplicate of bug: 1831945

Resolution: --- → DUPLICATE

Categories

(Tree Management :: Perfherder, task)

Tracking

(Not tracked)

People

(Reporter: sparky, Unassigned)

References

(Depends on 2 open bugs, Blocks 2 open bugs, URL)

Details

(Keywords: meta)

Crash Data

Security

(public)
