Closed Bug 1604541 Opened 4 years ago Closed 4 years ago

Consider dropping or reducing the frequency of raptor performance tests on opt builds

Tracking

(firefox75 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla75

Tracking Flags:

           Tracking  Status
firefox75  ---       fixed

People

(Reporter: acreskey, Assigned: alexandrui)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(1 file)

Bug 1604541 Stop raptor android performance tests on opt builds r?sparky,#perftest, 4 years ago, Alexandru Ionescu (needinfo me) [:alexandrui], 47 bytes, text/x-phabricator-request

Andrew Creskey [:acreskey]

Reporter

Description

4 years ago

Currently we run raptor performance tests on opt as well as PGO builds (desktop and Android).

With varying frequencies, these tests are run on commits to:
- mozilla-central
- autoland
- beta (here only on shippable, I believe)
- try pushes
(Am I missing any?)

Since we do not ship opt binaries, and since their performance characteristics differ from those of PGO builds, these tests are much less valuable than the PGO ones.

This bug is to discuss the possibility of dropping or reducing the frequency at which we run these opt perf tests.

Note: this does not affect macOS, where we do not yet have PGO builds.

Joel Maher ( :jmaher ) (UTC -8)

Updated

4 years ago

Blocks: 1573872

Joel Maher ( :jmaher ) (UTC -8)

Comment 1

4 years ago

One concern is developers who want to push to try and run tests: if we do not run these by default on m-c opt jobs, then the jobs need --full to be scheduled. The end result is that running perf jobs would require pgo/shippable builds on try, which have a longer build time when not using artifact builds; that is why some people prefer opt builds.

For android, this might already be the case; if it isn't the case then that seems like a good balance.

For desktop, there might be pushback; I assume the perf team (tooling and development) are probably >50% of the users pushing to try.

If any changes are made, we should ensure that the wiki/mana/in-tree documentation is up to date and easy to find. Likewise, when filing a perf bug, make sure to include a comment that indicates the requirement for --full.

Reducing the load on both desktop and android would be a big win for cpu time and something like this is a small change for developers (and a slight delay in try results when investigating perf regressions). Most likely we could run more tests, experiment more, or reduce our budgets in 2020/2021.

:davehunt, can you help shed light on whether --full is required when pushing android raptor jobs:
a) ./mach try fuzzy -q 'android raptor'
b) ./mach try fuzzy -q 'android raptor' --full

Flags: needinfo?(dave.hunt)

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

4 years ago

oh, from slack, davehunt helped me out:
https://bugzilla.mozilla.org/show_bug.cgi?id=1565644#c1

It appears we already require --full to run Android raptor tests (option b above), so removing opt for Android on m-c/try would mean no change in workflow (though possibly a change in turnaround time).

It would still be useful to ascertain whether anyone looks at opt vs pgo results.

Flags: needinfo?(dave.hunt)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 3

4 years ago

So what's actually left over here? Do we still need some action? If yes, which platforms are affected? Joel, do you still have the overview?

Priority: -- → P3

Joel Maher ( :jmaher ) (UTC -8)

Comment 4

4 years ago

As this bug has been narrowed to focus on Android and not desktop, here is what we run:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=raptor%2Candroid&revision=81f420f057e45d76c2ea5a9533588341154c92fb

On the Moto G5 we still run 11 tp6m jobs on opt, which appear to duplicate jobs in the larger set run on pgo.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 5

4 years ago

Doesn't that also apply to Pixel 2? For me it looks identical.

Dave, what's your take on that? Shall we drop the tests for opt in favor of the pgo ones which we actually pick for releases?

Flags: needinfo?(dave.hunt)

OS: Unspecified → Android

Hardware: Unspecified → All

Joel Maher ( :jmaher ) (UTC -8)

Comment 6

4 years ago

Yes, we have the same duplication for p2-aarch64; my fault for not seeing that at first.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 7

4 years ago

I thought we had already stopped running opt Android builds. I see from the link in comment 4 that we're still running speedometer, warm page load, and one cold page load job. I don't know why we'd be running these as a subset. Let's stop running all opt Android perf tests unless there is an objection or concern.

Flags: needinfo?(dave.hunt)

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 8

4 years ago

Alright. Florin, can someone from your team take care of that please? Thanks

Flags: needinfo?(fstrugariu)

Florin Strugariu [:Bebe]

Updated

4 years ago

Assignee: nobody → aionescu

Status: NEW → ASSIGNED

Flags: needinfo?(fstrugariu)

Florin Strugariu [:Bebe]

Updated

4 years ago

Priority: P3 → P1

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 9

4 years ago

Oh yeah, here we go again with changes on taskcluster ymls. :)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Updated

4 years ago

Whiteboard: [ci-costs-2020:todo]

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 10

4 years ago

(In reply to Dave Hunt [:davehunt] [he/him] ⌚BST from comment #7)

I thought we had already stopped running opt Android builds. I see from the link in comment 4 that we're still running speedometer, warm page load, and one cold page load job. I don't know why we'd be running these as a subset. Let's stop running all opt Android perf tests unless there is an objection or concern.

Should I stop running all opt Android perf tests on all projects? This is what I understand from the description of the bug.

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 11

4 years ago

We should continue to support running them on try with --full.

Flags: needinfo?(dave.hunt)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 12

4 years ago

I think we're close to the point where filtering these test definitions with regular expressions is no longer sustainable. Let me explain.
Taskgraph doesn't allow a value to match more than one key. The relationship must be 1:1, with exactly one match for every test signature. The regular expressions can easily become exhaustive and hard to fine-tune. For the case below, I strongly believe that we shouldn't go further and also include the g5. We only have 2 devices for the moment, but what happens when we add more? Controlling the matching any further will get close to insanity.

raptor-tp6m-16-geckoview-cold:
    run-on-projects:
        by-test-platform:
            android-hw-p2-.*api-16/.*: []
            android-hw-.*/opt.*: []

> ./mach taskgraph full
Exception: Multiple matching values for test-platform u'android-hw-p2-8-0-arm7-api-16/opt' found while determining item `run-on-projects` in `raptor-tp6m-16-geckoview-cold`
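
The failure above can be reproduced with a minimal sketch of keyed-by matching. This is not the in-tree taskgraph code: `match_keys` and `resolve` are hypothetical helpers, and the sketch assumes that each key is treated as a regular expression that must match the whole test-platform string, with `default` used only when nothing else matches. The g5 platform name is illustrative.

```python
import re


def match_keys(alternatives, target):
    """Return the values of all keys in `alternatives` that match `target`."""
    if target in alternatives:
        # A literal key takes precedence over any pattern.
        return [alternatives[target]]
    # Treat every other key as a regex that must match the whole string.
    matches = [
        value
        for key, value in alternatives.items()
        if key != "default" and re.fullmatch(key, target)
    ]
    if matches:
        return matches
    # `default` applies only when nothing else matched.
    return [alternatives["default"]] if "default" in alternatives else []


def resolve(alternatives, target):
    """Return the single matching value, or raise if the match is ambiguous."""
    matches = match_keys(alternatives, target)
    if len(matches) > 1:
        raise ValueError("Multiple matching values for %r" % target)
    if not matches:
        raise KeyError("No match and no default for %r" % target)
    return matches[0]


# The by-test-platform mapping quoted above.
cold = {
    "android-hw-p2-.*api-16/.*": [],
    "android-hw-.*/opt.*": [],
}

# Only the second pattern matches a g5 opt platform (illustrative name).
g5_opt = resolve(cold, "android-hw-g5-7-0-arm7-api-16/opt")

# Both patterns match the p2 api-16 opt platform, so resolution fails.
try:
    resolve(cold, "android-hw-p2-8-0-arm7-api-16/opt")
    ambiguous = False
except ValueError:
    ambiguous = True
```

Under these assumptions, any new pattern has to be checked against every existing one for overlap, which is the maintenance cost described in this comment.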

What I am thinking is to include pgo and opt in keyed-by. I'm not sure it is the best solution, but it is the best we've got so far.

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 13

4 years ago

Here's what we have for job-defaults in raptor-gve.yml:

run-on-projects:
    by-test-name:
        raptor-tp6m-.*-cold:
            by-test-platform:
                android-hw-.*/opt: []
                android-hw-p2-.*api-16/pgo: []
                android-hw-p2-.*aarch64.*/pgo: ['trunk', 'mozilla-beta']
                default: ['trunk', 'mozilla-beta']
        default:
            by-test-platform:
                android-hw-p2-.*api-16/.*: []
                default: ['mozilla-central']

We should be able to:
- Simplify android-hw-p2-.*api-16/pgo to not call out pgo (like we have for default).
- Remove the entry for android-hw-p2-.*aarch64.*/pgo as this matches this group's default.
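
As a rough check of the second point, the sketch below compares the raptor-tp6m-.*-cold mapping with and without the aarch64 entry. It assumes the same whole-string regex matching with a `default` fallback; `resolve` is a hypothetical helper, not the taskgraph implementation, and all platform names except the p2 api-16 opt one are illustrative.

```python
import re


def resolve(alternatives, target):
    """Pick the value whose key fully matches `target`, else `default`."""
    matches = [
        value
        for key, value in alternatives.items()
        if key != "default" and re.fullmatch(key, target)
    ]
    if len(matches) > 1:
        raise ValueError("Multiple matching values for %r" % target)
    return matches[0] if matches else alternatives["default"]


# by-test-platform entries for raptor-tp6m-.*-cold, as quoted above.
current = {
    "android-hw-.*/opt": [],
    "android-hw-p2-.*api-16/pgo": [],
    "android-hw-p2-.*aarch64.*/pgo": ["trunk", "mozilla-beta"],
    "default": ["trunk", "mozilla-beta"],
}

# The same mapping without the aarch64 entry.
trimmed = {
    key: value
    for key, value in current.items()
    if key != "android-hw-p2-.*aarch64.*/pgo"
}

# Sample platforms; only the first name appears in this bug.
platforms = [
    "android-hw-p2-8-0-arm7-api-16/opt",
    "android-hw-p2-8-0-arm7-api-16/pgo",
    "android-hw-p2-8-0-android-aarch64/pgo",
    "android-hw-g5-7-0-arm7-api-16/opt",
    "android-hw-g5-7-0-arm7-api-16/pgo",
]

# True when dropping the entry changes nothing for these platforms.
same = all(resolve(current, p) == resolve(trimmed, p) for p in platforms)
```

For these sample platforms the two mappings resolve identically, because the aarch64 pgo platform simply falls through to `default`.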

Here's what we have for raptor-tp6m-16-geckoview-cold (the test you've quoted in comment 12):

run-on-projects:
    by-test-platform:
        android-hw-(?!p2-.*api-16).*/opt: ['mozilla-central']
        android-hw-p2-.*api-16/.*: []
        android-hw-p2-.*aarch64.*/pgo: ['trunk', 'mozilla-beta']
        default: ['trunk', 'mozilla-beta']

This appears to only override the defaults in order to explicitly run opt builds against mozilla-central. As we're talking about disabling these, we should be able to just remove the run-on-projects for this test. It looks to me like all instances of run-on-platforms could be removed to allow the defaults to be inherited, with the exception being raptor-unity-webgl-geckoview, which is disabled on Moto G5. Am I missing something?

Flags: needinfo?(dave.hunt)

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 14

4 years ago

Attached file: Bug 1604541 Stop raptor android performance tests on opt builds r?sparky,#perftest

Add by-build-type [opt/pgo/debug/...] key under run-on-projects to avoid super-complicating the regular expressions in raptor YMLs that are already complicated.
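
A rough sketch of how a by-build-type level might sidestep the overlapping patterns. The nested layout of `run_on_projects`, the helper names, and deriving the build type from the text after the final "/" are all assumptions made for illustration; the actual patch may be structured differently. The g5 platform name is also illustrative.

```python
import re


def resolve_keyed_by(value, attributes):
    """Resolve nested `by-<attribute>` mappings down to a plain value."""
    while isinstance(value, dict) and len(value) == 1:
        (key, alternatives), = value.items()
        if not key.startswith("by-"):
            break
        target = attributes[key[len("by-"):]]
        matches = [
            v
            for k, v in alternatives.items()
            if k != "default" and re.fullmatch(k, target)
        ]
        if len(matches) > 1:
            raise ValueError("Multiple matching values for %r" % target)
        value = matches[0] if matches else alternatives["default"]
    return value


# Hypothetical layout: the build type is decided first, so the platform
# patterns no longer need to encode opt/pgo themselves.
run_on_projects = {
    "by-build-type": {
        "opt": [],
        "default": {
            "by-test-platform": {
                "android-hw-p2-.*api-16/.*": [],
                "default": ["trunk", "mozilla-beta"],
            }
        },
    }
}


def projects_for(test_platform):
    # Assumption: the build type is the part after the final "/".
    build_type = test_platform.rsplit("/", 1)[1]
    attributes = {"test-platform": test_platform, "build-type": build_type}
    return resolve_keyed_by(run_on_projects, attributes)


p2_opt = projects_for("android-hw-p2-8-0-arm7-api-16/opt")
p2_pgo = projects_for("android-hw-p2-8-0-arm7-api-16/pgo")
g5_pgo = projects_for("android-hw-g5-7-0-arm7-api-16/pgo")
```

With the build type handled at its own level, the p2 api-16 opt platform that was ambiguous in comment 12 resolves to a single value in this sketch.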

Alexandru Ionescu (needinfo me) [:alexandrui]

Assignee

Comment 15

4 years ago

I just pushed a tentative patch that allows by-build-type under run-on-projects so that we can filter them out. The code above is just for raptor-gve. There's more to come for the other mobile apps.

Phabricator Automation

Updated

4 years ago

Attachment #9130121 - Attachment description: Bug 1604541 Consider dropping or reducing the frequency of raptor performance tests on opt builds → Bug 1604541 Consider dropping or reducing the frequency of raptor performance tests on opt builds r?sparky,#perftest

Phabricator Automation

Updated

4 years ago

Attachment #9130121 - Attachment description: Bug 1604541 Consider dropping or reducing the frequency of raptor performance tests on opt builds r?sparky,#perftest → Bug 1604541 Stop raptor android performance tests on opt builds r?sparky,#perftest

Pulsebot

Comment 16

4 years ago

Pushed by aionescu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/80d05dc950b1
Stop raptor android performance tests on opt builds r=sparky,perftest-reviewers

Dorel Luca [:dluca]

Comment 17

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/80d05dc950b1

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox75: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla75

Joel Maher ( :jmaher ) (UTC -8)

Updated

4 years ago

Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]
