Closed Bug 1656465 Opened 4 years ago Closed 4 years ago

Consolidate all `push-interval-*` optimization strategies into one of the two backstops

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: aryx, Assigned: ahal)

Details

Attachments

(10 files)

Bug 1656465 - Run fuzzing builds on backstop pushes. r=ahal 4 years ago Marco Castelluccio [:marco] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [ci] Move fixtures from test_mach_try_auto.py to conftest to share with new tests, r?#taskgraph-reviewers 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [ci] Add some taskgraph integration tests to ensure fuzzing builds run where we expect them to, r?#taskgraph-reviewers 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - Move all 'push-interval-25' optimizations to 'push-interval-20', r?jmaher 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [taskgraph] Consider all pushes to release branches as 'backstops', r?marco 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [taskgraph.optimize] Allow 'split_bugbug_arg' to work with arbitrary number of substrategies, r?marco 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [taskgraph.optimize] Refactor "optimized-backstop" pushes into "expanded" pushes, r?marco 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [taskgraph.optimize] Rename 'push-interval-{10,20}' strategies to 'expanded' and 'backstop' respectively, r?marco 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1656465 - [taskgraph.optimize] Implement a 'Not' composite strategy, r?marco 4 years ago Andrew Halberstadt [:ahal] 47 bytes, text/x-phabricator-request		Details \| Review

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Description

•

4 years ago

Today, Windows debug fuzzing and AArch symbols tasks were broken on central.

During the investigation, it got noticed the fuzzing builds didn't run for the merge candidate because they exactly run every 10th job while test task in general run every 10th push or every 2h, whatever applies first.

This means code which breaks builds can be merged to central.

Quoting decoder: "we need to ensure that merges for fuzzing builds are always green. that was the initial requirement for adding those builds to the 10th push list".

Marco Castelluccio [:marco]

Comment 1

•

4 years ago

There are two options:

we stop scheduling everything every 2 hours, and only rely on push interval;
we drop the "push-interval-10", "push-interval-20" and "push-interval-25" strategies and use alternative strategies that take into account the time interval too.

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

4 years ago

I think option 2 is best. Having half backstop and full backstop, where half backstop is guaranteed green, and full backstop is more tier-2 jobs, makes the most sense. If we do this, the (half|full) backstop will have perf, so we would run shippable builds there (which then makes backfilling a regression require shippable builds which will take hours) which will duplicate cost or add some confusion.

If we did this we would need some time based interval for half|full, maybe half=2hours|10pushes, full=4hours|20pushes ?

Marco Castelluccio [:marco]

Comment 3

•

4 years ago

Attached file Bug 1656465 - Run fuzzing builds on backstop pushes. r=ahal — Details

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → mcastelluccio

Status: NEW → ASSIGNED

Marco Castelluccio [:marco]

Comment 4

•

4 years ago

I've submitted a patch to solve the problem for fuzzing specifically, but the generic problem still applies.

Yet another option would be to have a "merge candidate" indicator on Treeherder for pushes for which we are sure everything was run and there are no failures.

Assignee: mcastelluccio → nobody

Status: ASSIGNED → NEW

Keywords: leave-open

Marco Castelluccio [:marco]

Comment 5

•

4 years ago

(In reply to Marco Castelluccio [:marco] from comment #4)

Yet another option would be to have a "merge candidate" indicator on Treeherder for pushes for which we are sure everything was run and there are no failures.

I filed bug 1657921 for that.

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → mcastelluccio

Status: NEW → ASSIGNED

Andrew Halberstadt [:ahal]

Assignee

Comment 6

•

4 years ago

My understanding is that this happens to any task that has a push-interval-* optimization strategy. After some discussion on Matrix, I think we'd like to deprecate all of those strategies and force all the tasks that use them to use either the optimized-backstop (every 10th push or 2 hours) or the full-backstop (every 20th push or 2 hours). Other intervals should be disallowed.

Adjusting the title here to reflect, please tweak if I got something wrong.

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Summary: tier 1 backstop pushes need to run on same pushes like full test set → Consolidate all `push-interval-*` optimization strategies into one of the two backstops

Marco Castelluccio [:marco]

Comment 7

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #6)

My understanding is that this happens to any task that has a push-interval-* optimization strategy. After some discussion on Matrix, I think we'd like to deprecate all of those strategies and force all the tasks that use them to use either the optimized-backstop (every 10th push or 2 hours) or the full-backstop (every 20th push or 2 hours). Other intervals should be disallowed.

Adjusting the title here to reflect, please tweak if I got something wrong.

Agreed! Option 2 from comment 1.

I only submitted a patch to fix fuzzing, as it's probably the most important one (fuzzing needs green mozilla-central builds), but I won't be able to fix the rest of the builds any time soon, so I'm unassigning myself.

Assignee: mcastelluccio → nobody

Status: ASSIGNED → NEW

Pulsebot

Comment 8

•

4 years ago

Pushed by mcastelluccio@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cd425862bc50 Run fuzzing builds on backstop pushes. r=ahal DONTBUILD

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → mcastelluccio

Status: NEW → ASSIGNED

Atila Butkovits

Comment 9

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cd425862bc50

Marco Castelluccio [:marco]

Updated

•

4 years ago

Assignee: mcastelluccio → nobody

Status: ASSIGNED → NEW

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Assignee: nobody → ahal

Status: NEW → ASSIGNED

Andrew Halberstadt [:ahal]

Assignee

Comment 10

•

4 years ago

Attached file Bug 1656465 - [ci] Move fixtures from test_mach_try_auto.py to conftest to share with new tests, r?#taskgraph-reviewers — Details

Under this model, each test file should correspond to a static set of parameters. Each
set of parameters will only ever generate the taskgraph a single time (so adding a new
file will have a perf hit, but adding new test functions to an existing file will not).

Andrew Halberstadt [:ahal]

Assignee

Comment 11

•

4 years ago

Attached file Bug 1656465 - [ci] Add some taskgraph integration tests to ensure fuzzing builds run where we expect them to, r?#taskgraph-reviewers — Details

Each file represents a new taskgraph generation in the tests. But now that these exist, we
can add new assertions to the existing files without worry.

Depends on D89054

Andrew Halberstadt [:ahal]

Assignee

Comment 12

•

4 years ago

Attached file Bug 1656465 - Move all 'push-interval-25' optimizations to 'push-interval-20', r?jmaher — Details

We want to try to align 'push-interval' tasks to the 'backstop'. This way
we have greater confidence in our backstop pushes, and it will allow us to
simplify a lot of our backstop logic.

Depends on D89055

Pulsebot

Comment 13

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a06084e5fb5f [ci] Move fixtures from test_mach_try_auto.py to conftest to share with new tests, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/4f3292b37d8a [ci] Add some taskgraph integration tests to ensure fuzzing builds run where we expect them to, r=taskgraph-reviewers,aki https://hg.mozilla.org/integration/autoland/rev/2e30486edbb3 Move all 'push-interval-25' optimizations to 'push-interval-20', r=jmaher

Noemi Erli[:noemi_erli]

Comment 14

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a06084e5fb5f
https://hg.mozilla.org/mozilla-central/rev/4f3292b37d8a
https://hg.mozilla.org/mozilla-central/rev/2e30486edbb3

Andrew Halberstadt [:ahal]

Assignee

Comment 15

•

4 years ago

Attached file Bug 1656465 - [taskgraph] Consider all pushes to release branches as 'backstops', r?marco — Details

The intent of a "backstop" push, is to run everything so we can be absolutely sure that
the push in question does not cause any regressions.

Previously, backstops were thought to be only something that ran on autoland.
This was because the other branches already ran everything so the concept of
a "backstop" didn't make much sense.

But going by the above definition, it would make more sense to say that every
non autoland (or try) push is a backstop. Since the intent there is to run
everything to avoid regressions.

This change will allow us to simplify our optimization algorithms.

Andrew Halberstadt [:ahal]

Assignee

Comment 16

•

4 years ago

Attached file Bug 1656465 - [taskgraph.optimize] Allow 'split_bugbug_arg' to work with arbitrary number of substrategies, r?marco — Details

This still assumes that the bugbug-based strategy is last however.

Depends on D89499

Andrew Halberstadt [:ahal]

Assignee

Comment 17

•

4 years ago

Attached file Bug 1656465 - [taskgraph.optimize] Refactor "optimized-backstop" pushes into "expanded" pushes, r?marco — Details

This is a nomenclature change + refactoring. Now there is only a single
"backstop" push. Which is currently set to every 20th push on autoland (or
every push on non-autoland branches).

Now there is also a concept of an "expanded" push. These are pushes that run
more stuff than usual, but not as much as a backstop normally would. These are
currently set to run at half the interval of a backstop.

Concretely, here are the strategy changes:

Introduced the 'expanded' strategy which has 'backstop' baked in
Renamed 'optimized-backstop' -> 'test-expanded'
Baked 'expanded' strategy into the 'test-expanded' strategy (therefore 'backstop' is also baked in)
Baked 'test-expanded' strategy into the 'test' strategy

Depends on D89500

Andrew Halberstadt [:ahal]

Assignee

Comment 18

•

4 years ago

Attached file Bug 1656465 - [taskgraph.optimize] Rename 'push-interval-{10,20}' strategies to 'expanded' and 'backstop' respectively, r?marco — Details

This removes the last uses of the 'push-interval-10' and 'push-interval-20' strategies.
They are being removed because they are dangerous in that its easy to accidentally not run
tasks when they should.

Instead, task authors should decide whether they want their tasks to run on
"backstop" pushes (run everything) or "expanded" pushes (run more than usual,
but still not as much as a backstop). Note that using "expanded" means the task
will also run on backstop pushes. It'll just additionally run on "expanded"
pushes.

In practice 'backstop' pushes will be every 20th push and 'expanded' pushes
will be every 10th push. Though this may vary due to the time component in
backstops.

Depends on D89501

Andrew Halberstadt [:ahal]

Assignee

Comment 19

•

4 years ago

Attached file Bug 1656465 - Drop the 'remove_on_projects' feature from the Backstop optimization, r?marco — Details

In the past, the 'backstop' optimization was applied to tasks by default across
all projects, even though it only really made sense on autoland. To choose what
would happen on non-autoland branches, we invented this 'remove_on_projects'
concept.

These days, we only apply the backstop optimization in the first place for
autoland. So 'remove_on_projects' is no longer necessary.

Depends on D88149

Phabricator Automation

Updated

•

4 years ago

Attachment #9174503 - Attachment description: Bug 1656465 - [taskgraph] Consider all pushes to release branches as 'backstops', marco → Bug 1656465 - [taskgraph] Consider all pushes to release branches as 'backstops', r?marco

Andrew Halberstadt [:ahal]

Assignee

Comment 20

•

4 years ago

Attached file Bug 1656465 - [taskgraph.optimize] Implement a 'Not' composite strategy, r?marco — Details

It turns out that 'Not' is needed to negate "backstops". E.g, we normally
we want to use a pattern like so:

All("skip-unless-backstop", "test")

Since 'skip-unless-backstop' returns False on backstop pushes, it disables
the test strategy there.

However, suppose we wanted to run a special optimization, only on backstop
pushes. I.e, the opposite of the above example. Then we need to use:

All(Not("skip-unless-backstop"), "test-backstop")

Depends on D89500

Pulsebot

Comment 21

•

4 years ago

Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/64d8b17b920e [taskgraph] Consider all pushes to release branches as 'backstops', r=marco https://hg.mozilla.org/integration/autoland/rev/fe9bb34b3bae [taskgraph.optimize] Allow 'split_bugbug_arg' to work with arbitrary number of substrategies, r=marco https://hg.mozilla.org/integration/autoland/rev/cf06909a30d8 [taskgraph.optimize] Implement a 'Not' composite strategy, r=marco https://hg.mozilla.org/integration/autoland/rev/1020ae833cdb [taskgraph.optimize] Refactor "optimized-backstop" pushes into "expanded" pushes, r=marco https://hg.mozilla.org/integration/autoland/rev/e4f78c4b8a81 [taskgraph.optimize] Rename 'push-interval-{10,20}' strategies to 'expanded' and 'backstop' respectively, r=marco https://hg.mozilla.org/integration/autoland/rev/d8609e8ba6b0 Drop the 'remove_on_projects' feature from the Backstop optimization, r=marco

Cristina Coroiu [:ccoroiu]

Comment 22

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/64d8b17b920e
https://hg.mozilla.org/mozilla-central/rev/fe9bb34b3bae
https://hg.mozilla.org/mozilla-central/rev/cf06909a30d8
https://hg.mozilla.org/mozilla-central/rev/1020ae833cdb
https://hg.mozilla.org/mozilla-central/rev/e4f78c4b8a81
https://hg.mozilla.org/mozilla-central/rev/d8609e8ba6b0

Andrew Halberstadt [:ahal]

Assignee

Updated

•

4 years ago

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Keywords: leave-open

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.