Closed Bug 1674370 Opened 4 years ago Closed 10 months ago

[meta] Use test replicates in Perfherder CompareView

Categories

(Tree Management :: Perfherder, task)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1831945

People

(Reporter: sparky, Unassigned)

References

(Depends on 2 open bugs, Blocks 2 open bugs)

Details

(Keywords: meta)

Currently, the Perfherder Compare view uses the average value from each of X individual runs (X values in total) when it performs a comparison. We should switch to using the replicates instead, because that gives us more trials. With this, we'll be able to tell with high confidence whether there is a regression or improvement using fewer task runs.
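
To make the difference in sample size concrete, here is a minimal sketch in Python. The function names and the numbers are hypothetical and are not taken from Perfherder; the sketch only contrasts one data point per task run with one data point per replicate.

```python
from statistics import mean


def per_run_averages(runs):
    """One data point per task run: the mean of that run's replicates."""
    return [mean(replicates) for replicates in runs]


def pooled_replicates(runs):
    """One data point per replicate, pooled across all task runs."""
    return [value for replicates in runs for value in replicates]


# Hypothetical data: 2 task runs with 5 replicates each.
base_runs = [
    [90.0, 92.0, 91.0, 89.0, 93.0],
    [88.0, 94.0, 90.0, 92.0, 91.0],
]

averages = per_run_averages(base_runs)
pooled = pooled_replicates(base_runs)

# The same two task runs yield 2 trials in one case and 10 in the other.
print(len(averages), len(pooled))
```

With the same number of task runs, the pooled list gives a comparison many more trials to work with, which is the motivation stated above.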

:sparky, can you reply to this Jira question?

Flags: needinfo?(gmierz2)

So I'm looking at the doc and it feels like we're doing something backwards. I wonder if we should be modifying the summaries on the harness end instead, so you can just take the replicates entries and do a geomean on them? I think this would make the changes you need to make much simpler, and if we add new metrics with special summaries you won't have to make any changes.

I'm thinking about speedometer as an example. It has a single summary value calculated from X cycles, but we could also make 1 value per cycle and make it a subtest

Flags: needinfo?(gmierz2) → needinfo?(igoldan)

> I'm thinking about speedometer as an example.

I found this speedometer artifact, which seems to have 160 subtests with 25 cycles per subtest. Hopefully it's a good example.

> It has a single summary value calculated from X cycles, but we could also make 1 value per cycle and make it a subtest

I'm not sure I got the idea. You mean we should have 25 new subtests named 'speedometer-subtest-cycle-0', ..., 'speedometer-subtest-cycle-24' & each new subtest would aggregate a single replicate for each of the 160 original subtests?

If yes & these subtests have the same name within the same subtests array value (I think our schema validation considers this legal) and we no longer have a summary value, Perfherder would easily ingest 25 geomeans per job & display them in Compare view.

Flags: needinfo?(igoldan) → needinfo?(gmierz2)

Not quite, but almost. What I'm thinking is that in the test harness we could make a single new subtest called speedometer, and this subtest would contain 1 value for each speedometer cycle. Right now, we have a single speedometer value for all X cycles; with this method, we would have X speedometer values in a new subtest.

This way, you won't need to play with re-calculating the speedometer score, you can just blindly aggregate the subtest metrics in any way we need to.
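
A rough sketch of what that could look like on the harness side, in Python. This is not the raptor implementation: `build_per_cycle_subtest` is a hypothetical helper, and the geometric mean is only a placeholder for whatever per-cycle score calculation the harness really uses, which is why it is passed in as a parameter.

```python
from statistics import geometric_mean, mean


def build_per_cycle_subtest(subtests, name="speedometer", summarize=geometric_mean):
    """Build one subtest holding one summary value per cycle.

    `subtests` is a list of dicts with a "replicates" list, one entry
    per cycle. `summarize` is an assumption: it stands in for the
    benchmark's real per-cycle score calculation.
    """
    cycle_counts = {len(s["replicates"]) for s in subtests}
    if len(cycle_counts) != 1:
        raise ValueError("all subtests must have the same number of cycles")

    # Regroup the values so that each tuple holds one cycle's worth of
    # measurements across every original subtest.
    cycles = zip(*(s["replicates"] for s in subtests))
    per_cycle = [summarize(values) for values in cycles]

    return {
        "name": name,
        "replicates": per_cycle,
        # Single summary value: the average of the per-cycle values.
        "value": mean(per_cycle),
    }


# Hypothetical input: 2 subtests with 2 cycles each.
example_subtests = [
    {"name": "a", "replicates": [1.0, 4.0]},
    {"name": "b", "replicates": [4.0, 9.0]},
]
new_subtest = build_per_cycle_subtest(example_subtests)
print(new_subtest["name"], len(new_subtest["replicates"]))
```

The consumer then only needs the `replicates` list of the new subtest and never has to know how the score was derived.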

Flags: needinfo?(gmierz2)

Would it look something like this?

{
  "application": {
    "name": "firefox",
    "version": "87.0a1"
  },
  "framework": {
    "name": "raptor"
  },
  "suites": [
    {
      "alertThreshold": 2.0,
      "extraOptions": [
        "nocondprof",
        "webrender"
      ],
      "lowerIsBetter": false,
      "name": "raptor-speedometer-firefox",
      "subtests": [
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "speedometer",
          "replicates": [
            108.0000000000001,
            108.0000000000002,
            108.0000000000003,
            108.0000000000004,
            108.0000000000005,
            108.0000000000006,
            108.0000000000007,
            108.0000000000008,
            108.0000000000009,
            108.0000000000010,
            108.0000000000011,
            108.0000000000012,
            108.0000000000013,
            108.0000000000014,
            108.0000000000015,
            108.0000000000016,
            108.0000000000017,
            108.0000000000018,
            108.0000000000019,
            108.0000000000020,
            108.0000000000021,
            108.0000000000022,
            108.0000000000023,
            108.0000000000024,
            108.0000000000025
          ],
          "unit": "score",
          "value": 108.0000000000013
        },
        {
          "alertThreshold": 2.0,
          "lowerIsBetter": true,
          "name": "Angular2-TypeScript-TodoMVC",
          "replicates": [
            90.78,
            92.18,
            89.98,
            87.04,
            89.38,
            87.24,
            92.98,
            93.78,
            84.06,
            92.38,
            98.66,
            90.82,
            89.8,
            103.06,
            95.06,
            88.88,
            83.26,
            91.62,
            82.48,
            94.02,
            85.84,
            80.38,
            104.12,
            91.7,
            89.38
          ],
          "unit": "ms",
          "value": 90.78
        },
        ...
      ],
      "tags": [
        "benchmark",
        "warm"
      ],
      "type": "benchmark",
      "unit": "score",
      "value": 108.8945263882843
    }
  ]
}
Flags: needinfo?(gmierz2)

I've injected the new speedometer subtest as you suggested & grouped the X speedometer values under its replicates field.
(I numbered each X speedometer value with a unique figure at the end, to hint it's a geomean applied on a single cycle.)

This subtest's value is an average of all its individual geomeans.
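
For illustration, a consumer could then pull the per-cycle values out of such a blob roughly as follows. This is a sketch in Python, not Perfherder's ingestion code: `collect_replicates` is a hypothetical helper, the values are made up, and falling back to the single `value` when no replicates are present is an assumption.

```python
def collect_replicates(perf_data, suite_name, subtest_name):
    """Return the replicates of one subtest, or [] if it is not present."""
    for suite in perf_data.get("suites", []):
        if suite.get("name") != suite_name:
            continue
        for subtest in suite.get("subtests", []):
            if subtest.get("name") == subtest_name:
                # Assumption: fall back to the single summary value when
                # the subtest carries no replicates.
                return list(subtest.get("replicates") or [subtest["value"]])
    return []


# Trimmed-down blob in the shape shown above, with made-up values.
perf_data = {
    "framework": {"name": "raptor"},
    "suites": [
        {
            "name": "raptor-speedometer-firefox",
            "subtests": [
                {
                    "name": "speedometer",
                    "replicates": [108.1, 107.9, 108.3],
                    "unit": "score",
                    "value": 108.1,
                },
            ],
        }
    ],
}

values = collect_replicates(perf_data, "raptor-speedometer-firefox", "speedometer")
print(len(values))
```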

Yup, that's right, that's what I'm thinking. What do you think?

Flags: needinfo?(gmierz2) → needinfo?(igoldan)

I think this is a reasonable solution.

Flags: needinfo?(igoldan)
Depends on: 1693047
Keywords: meta
Summary: Use test replicates in Perfherder CompareView → [meta] Use test replicates in Perfherder CompareView
Depends on: 1693051
Depends on: 1693053
Blocks: 1761730

We've resolved this for compare view on try at least. I've set this as a duplicate of the bug that added the toggle to the compare view, and I've added the new meta bug for this work to the See Also section. Next steps are to work with SREs to expand this to more branches. The free retrigger issue is minor for this work and shouldn't be considered a blocker. These would be good next steps though, so I've added them as dependencies to bug 1831943.

Status: NEW → RESOLVED
Closed: 10 months ago
Duplicate of bug: 1831945
Resolution: --- → DUPLICATE
See Also: → 1747357
See Also: → 1831943