1631990 - Additional filters for mach try task blacklist to filter tasks unless --full is specified

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Description

•

4 years ago

I am proposing to insert additional filters into the blacklist that is applied to mach try pushes, as implemented in Bug 1630350 and Bug 1627340.

For the most part, these tasks must intentionally be chosen for mach try fuzzy.
However, if a user pushes with mach try chooser + select all, or mach try syntax + -b do, these tasks are inadvertently scheduled quite easily.

Since in Bug 1630350 it was intentionally made such that mach try syntax does not have a --full option to bypass filtering, this means users intending to run tasks filtered by the blacklist would be directed to use the alternatives.

Proposed list:

build.*aarch64.*
build*.gcp.*
build-linux/opt
.*-rusttests
.*valgrind.*

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 1

•

4 years ago

:jmaher, :gbrown - I'd like to hear your thoughts about adding the above list (and possibly more) to the task filter blacklist to be applied by default to the mach selectors.

I was particularly annoyed that selecting syntax -b do -p all would schedule builds for all those items, including the ones we have restricted from running like linux32.

I suspect if I run -u all I would see a lot of unittests get scheduled, which may be fine additions to the list.

Flags: needinfo?(jmaher)

Flags: needinfo?(gbrown)

Joel Maher ( :jmaher ) (UTC -8)

Comment 2

•

4 years ago

I think this is a great idea. A lot of these are things that we are already looking at restricting to every 10th push or as tier-2, so this is right in line. Also the plain builds might be good to add, same with mingw32 :)

Flags: needinfo?(jmaher)

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 3

•

4 years ago

Attached file Bug 1631990 - additional filters on mach try selectors — Details

Changes:

Add more filters to be applied by default to all mach try selectors.

In order to access tasks hidden by the filters, please use --full with fuzzy/chooser selectors.

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → egao

Status: NEW → ASSIGNED

Phabricator Automation

Updated

•

4 years ago

Attachment #9142621 - Attachment description: Bug 1631990 - additional filters for mach try syntax → Bug 1631990 - additional filters on mach try selectors

Geoff Brown [:gbrown]

Comment 4

•

4 years ago

From a task efficiency / cost reduction perspective, this looks great.

On the other hand, a reasonable use of syntax -all is "I'm trying to land a tricky change that has unknown side effects and I want to see if my change breaks anything that runs on autoland or central". Anything that's left out is a potential surprise that will require a backout...I wonder if this change is going too far. But I'm not sure: What are the chances of breaking aarch64 builds, rust tests, valgrind, etc? Hard to say, I'm sure.

Flags: needinfo?(gbrown)

Andrew Halberstadt [:ahal]

Comment 5

•

4 years ago

•

Edited

We should keep in mind that every task we add to this list makes developers lives harder because it increases the chances of them being backed out over a failure that they had no chance of detecting on try. I've fielded many complaints about this happening and developers always find it extremely frustrating.

I think there's a risk that if we add too much stuff to this list, it will actually have the opposite effect.. we could see it increase cost. This is because developers grow frustrated and will use --full more often to select the tasks that caused them to get backed out last time. As a result, they'll end up over-selecting.

Historically, we've tried to limit this list to platforms that are resource constrained (like android-hw) or things that are very expensive to run (like ccov). But over time it has started to grow larger. I question whether all of the things in the list currently should be, and think in general we should keep the list as small as possible. My personal opinion is that it is already too big.

All of this isn't to say that maybe some of your suggestions actually do belong on the list. I haven't thought about your particular examples.

Andrew Halberstadt [:ahal]

Comment 6

•

4 years ago

Philosophically speaking, the goal of autoland/central is to catch regressions. The goal of try is to help developers not get backed out. So if we think these tasks aren't providing good value, the conversation should be around removing them from autoland/central. The tasks available on try should just mirror those as closely as possible.

Tom Prince [:tomprince]

Comment 7

•

4 years ago

•

Edited

In my opinion being backed out is not a bad thing, and that people should expect to be backed out some percentage of the time, and that if they are running a bunch of tests to ensure that they are not being backed out, then they are likely running too many tests. I'm fairly certain that some of the optimizations and things that have been added to try syntax where added because people where running -t all more than necessary.

Chris AtLee [:catlee]

Comment 8

•

4 years ago

What about pointing people at a defined preset suite like mach try --preset sample-suites instead of using -t all if they want nearly complete testing before landing on autoland?

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 9

•

4 years ago

•

Edited

(In reply to Andrew Halberstadt [:ahal] from comment #5)

We should keep in mind that every task we add to this list makes developers lives harder because it increases the chances of them being backed out over a failure that they had no chance of detecting on try. I've fielded many complaints about this happening and developers always find it extremely frustrating.

I understand their frustration, but I would like to point out that --full flag exists to precisely permit developers to access the unfiltered list.
The problem I'm trying to address by adding to the blacklist is to prevent the situation where a user select all with mach try chooser or mach try fuzzy. Even worse, for mach try syntax the user will in many cases select all for a lot of things because figuring out what syntax works is too long.

Things like build-.*gcp or valgrind are low-value tasks whose consumer isn't even well defined yet, let alone exist in some cases.

I'm adding predominantly build tasks here, because this loops back to the topic discussed in the Smart Scheduling meeting this (2020/04/23) morning, that most of the build tasks aren't even clear why they run, who consumes them, or have any tests using the build as dependency.

Those are the situations I'm trying to catch with this extended blacklist. They aren't made permanently unavailable; just removed from the regular view unless developers explicitly know that they want to schedule such tasks.

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Updated

•

4 years ago

Summary: Additional filters for the mach try task blacklist → Additional filters for mach try task blacklist to filter tasks unless --full is specified

Joel Maher ( :jmaher ) (UTC -8)

Comment 10

•

4 years ago

most if not all the filters here are for jobs that run on limited hardware, only run on m-c as tier-2, or could be tier-2/low risk.

it does get risky when we have a lot of stuff in a blacklist in one part of code, and other filters elsewhere. Ideally all of these tasks would be tier2 or tier3 and we would only schedule tier1 by default without extra flags.

Andrew Halberstadt [:ahal]

Comment 11

•

4 years ago

(In reply to Edwin Takahashi (:egao) from comment #9)

I understand their frustration, but I would like to point out that --full flag exists to precisely permit developers to access the unfiltered list.

True, but the hope is that using --full is a rare edge case that is only there for power users who know what they are doing. The risk comes with devs developing a habit of using it even when not necessary.

The problem I'm trying to address by adding to the blacklist is to prevent the situation where a user select all with mach try chooser or mach try fuzzy. Even worse, for mach try syntax the user will in many cases select all for a lot of things because figuring out what syntax works is too long.

Yes, this is a problem, and yes your solution would help improve it. Another way to solve this is to remove the all alias to ./mach try syntax and put a hard cap on the number of tasks in ./mach try fuzzy. This will also annoy developers, but I think it might be more palatable for them.

Things like build-.*gcp or valgrind are low-value tasks whose consumer isn't even well defined yet, let alone exist in some cases.

Yes those might be good candidates for the list.

I'm adding predominantly build tasks here, because this loops back to the topic discussed in the Smart Scheduling meeting this (2020/04/23) morning, that most of the build tasks aren't even clear why they run, who consumes them, or have any tests using the build as dependency.

A task that is unclear / has no ownership will still cause a backout just the same as well maintained one. If it is truly unmaintained, it shouldn't be running on mozilla-central in the first place.

Those are the situations I'm trying to catch with this extended blacklist. They aren't made permanently unavailable; just removed from the regular view unless developers explicitly know that they want to schedule such tasks.

Understood. I'm not saying the list should be empty (valgrind seems like a prime candidate for it for instance). I'm just pointing out the negative side of diverging try from mozilla-central. I don't know how to weigh the cost in CI against the cost in developer salary / morale, we don't have data for it.

Andrew Halberstadt [:ahal]

Comment 12

•

4 years ago

(In reply to Tom Prince [:tomprince] from comment #7)

In my opinion being backed out is not a bad thing, and that people should expect to be backed out some percentage of the time, and that if they are running a bunch of tests to ensure that they are not being backed out, then they are likely running too many tests. I'm fairly certain that some of the optimizations and things that have been added to try syntax where added because people where running -t all more than necessary.

I actually strongly agree with this. But when you get backed out and then later find that the task you got backed out from can't even be run on try by default, it can be very frustrating. It's psychological more than anything.

Alternate solutions I'd prefer to this blacklist include removing all from try syntax and putting a hard cap on try_task_config.json tasks. Basically either force people to be thoughtful, or if not, then to use ./mach try auto.

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 13

•

4 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #12)

Alternate solutions I'd prefer to this blacklist include removing all from try syntax and putting a hard cap on try_task_config.json tasks. Basically either force people to be thoughtful, or if not, then to use ./mach try auto.

That's a really good alternate solution.

I'll spend some time reviewing what each of the existing blacklists catch, consider any of them for removal and implement your suggestions.

Andrew Halberstadt [:ahal]

Comment 14

•

4 years ago

It's worth pointing out that in the past, we've sort of hesitated to implement a cap because we didn't have a good answer to the question to "How do I test a large / risky patch?". Once ./mach try auto is a bit better, we'll have that answer.

James Graham [:jgraham]

Comment 15

•

4 years ago

A cap on the number of tasks less than "one suite on all configurations" would be a problem for me/wptsync.

Do we have data on how often these rare tasks are used on try and how often they don't pass?

Edwin Takahashi (:egao | infrequent contributor)

Assignee

Comment 16

•

4 years ago

I'm going to hold off on the suggestions made by :ahal for the time being, as the alternatives aren't ready yet if those restrictions are put in place.

However, I will do the following in this bug:

review the filter list to ensure we're filtering what we want
incorporate build tasks into the list only

In a follow-up bug, once mach try auto is production ready:

investigate removing -u all -t all, or some combination of that
investigate placing an upper limit on try fuzzy or try chooser tasks

Since mach try auto isn't ready yet, the last two items are just things for me to think about.

(In reply to James Graham [:jgraham] from comment #15)

A cap on the number of tasks less than "one suite on all configurations" would be a problem for me/wptsync.

Do we have data on how often these rare tasks are used on try and how often they don't pass?

If I maintain the ability to select -p all but remove the ability to select -t all, for example, then that should retain your ability to run all web-platform-tests upon sync from upstream.

Either way, no action will be taken for some time.

James Graham [:jgraham]

Comment 17

•

4 years ago

wptsync uses mach try fuzzy to select the tasks, not try syntax. My concern is if we say you can only have 400 task total in a try push, or something, then we might hit that limit during wptsync where we really do need to run on all platforms that are running on central.

Pulsebot

Comment 18

•

4 years ago

Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f5b0f10cff51
additional filters on mach try selectors r=jmaher

Cristian Brindusan [:cbrindusan]

Comment 19

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f5b0f10cff51

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox77: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla77

BMO Automation

Updated

•

2 years ago

Product: Firefox Build System → Developer Infrastructure