Closed Bug 1631990 Opened 4 months ago Closed 3 months ago

Additional filters for mach try task blacklist to filter tasks unless --full is specified

Categories

(Firefox Build System :: Try, enhancement)

enhancement

Tracking

(firefox77 fixed)

RESOLVED FIXED
mozilla77
Tracking Status
firefox77 --- fixed

People

(Reporter: egao, Assigned: egao)

References

Details

Attachments

(1 file)

I am proposing to insert additional filters into the blacklist that is applied to mach try pushes, as implemented in Bug 1630350 and Bug 1627340.

For the most part, these tasks must intentionally be chosen for mach try fuzzy.
However, if a user pushes with mach try chooser + select all, or mach try syntax + -b do, these tasks are inadvertently scheduled quite easily.

Since in Bug 1630350 it was intentionally made such that mach try syntax does not have a --full option to bypass filtering, this means users intending to run tasks filtered by the blacklist would be directed to use the alternatives.

Proposed list:

  • build.*aarch64.*
  • build*.gcp.*
  • build-linux/opt
  • .*-rusttests
  • .*valgrind.*

:jmaher, :gbrown - I'd like to hear your thoughts about adding the above list (and possibly more) to the task filter blacklist to be applied by default to the mach selectors.

I was particularly annoyed that selecting syntax -b do -p all would schedule builds for all those items, including the ones we have restricted from running like linux32.

I suspect if I run -u all I would see a lot of unittests get scheduled, which may be fine additions to the list.

Flags: needinfo?(jmaher)
Flags: needinfo?(gbrown)

I think this is a great idea. A lot of these are things that we are already looking at restricting to every 10th push or as tier-2, so this is right in line. Also the plain builds might be good to add, same with mingw32 :)

Flags: needinfo?(jmaher)

Changes:

Add more filters to be applied by default to all mach try selectors.

In order to access tasks hidden by the filters, please use --full with fuzzy/chooser selectors.

Assignee: nobody → egao
Status: NEW → ASSIGNED
Attachment #9142621 - Attachment description: Bug 1631990 - additional filters for mach try syntax → Bug 1631990 - additional filters on mach try selectors

From a task efficiency / cost reduction perspective, this looks great.

On the other hand, a reasonable use of syntax -all is "I'm trying to land a tricky change that has unknown side effects and I want to see if my change breaks anything that runs on autoland or central". Anything that's left out is a potential surprise that will require a backout...I wonder if this change is going too far. But I'm not sure: What are the chances of breaking aarch64 builds, rust tests, valgrind, etc? Hard to say, I'm sure.

Flags: needinfo?(gbrown)

We should keep in mind that every task we add to this list makes developers lives harder because it increases the chances of them being backed out over a failure that they had no chance of detecting on try. I've fielded many complaints about this happening and developers always find it extremely frustrating.

I think there's a risk that if we add too much stuff to this list, it will actually have the opposite effect.. we could see it increase cost. This is because developers grow frustrated and will use --full more often to select the tasks that caused them to get backed out last time. As a result, they'll end up over-selecting.

Historically, we've tried to limit this list to platforms that are resource constrained (like android-hw) or things that are very expensive to run (like ccov). But over time it has started to grow larger. I question whether all of the things in the list currently should be, and think in general we should keep the list as small as possible. My personal opinion is that it is already too big.

All of this isn't to say that maybe some of your suggestions actually do belong on the list. I haven't thought about your particular examples.

Philosophically speaking, the goal of autoland/central is to catch regressions. The goal of try is to help developers not get backed out. So if we think these tasks aren't providing good value, the conversation should be around removing them from autoland/central. The tasks available on try should just mirror those as closely as possible.

In my opinion being backed out is not a bad thing, and that people should expect to be backed out some percentage of the time, and that if they are running a bunch of tests to ensure that they are not being backed out, then they are likely running too many tests. I'm fairly certain that some of the optimizations and things that have been added to try syntax where added because people where running -t all more than necessary.

What about pointing people at a defined preset suite like mach try --preset sample-suites instead of using -t all if they want nearly complete testing before landing on autoland?

(In reply to Andrew Halberstadt [:ahal] from comment #5)

We should keep in mind that every task we add to this list makes developers lives harder because it increases the chances of them being backed out over a failure that they had no chance of detecting on try. I've fielded many complaints about this happening and developers always find it extremely frustrating.

I understand their frustration, but I would like to point out that --full flag exists to precisely permit developers to access the unfiltered list.
The problem I'm trying to address by adding to the blacklist is to prevent the situation where a user select all with mach try chooser or mach try fuzzy. Even worse, for mach try syntax the user will in many cases select all for a lot of things because figuring out what syntax works is too long.

Things like build-.*gcp or valgrind are low-value tasks whose consumer isn't even well defined yet, let alone exist in some cases.

I'm adding predominantly build tasks here, because this loops back to the topic discussed in the Smart Scheduling meeting this (2020/04/23) morning, that most of the build tasks aren't even clear why they run, who consumes them, or have any tests using the build as dependency.

Those are the situations I'm trying to catch with this extended blacklist. They aren't made permanently unavailable; just removed from the regular view unless developers explicitly know that they want to schedule such tasks.

Summary: Additional filters for the mach try task blacklist → Additional filters for mach try task blacklist to filter tasks unless --full is specified

most if not all the filters here are for jobs that run on limited hardware, only run on m-c as tier-2, or could be tier-2/low risk.

it does get risky when we have a lot of stuff in a blacklist in one part of code, and other filters elsewhere. Ideally all of these tasks would be tier2 or tier3 and we would only schedule tier1 by default without extra flags.

(In reply to Edwin Takahashi (:egao) from comment #9)

I understand their frustration, but I would like to point out that --full flag exists to precisely permit developers to access the unfiltered list.

True, but the hope is that using --full is a rare edge case that is only there for power users who know what they are doing. The risk comes with devs developing a habit of using it even when not necessary.

The problem I'm trying to address by adding to the blacklist is to prevent the situation where a user select all with mach try chooser or mach try fuzzy. Even worse, for mach try syntax the user will in many cases select all for a lot of things because figuring out what syntax works is too long.

Yes, this is a problem, and yes your solution would help improve it. Another way to solve this is to remove the all alias to ./mach try syntax and put a hard cap on the number of tasks in ./mach try fuzzy. This will also annoy developers, but I think it might be more palatable for them.

Things like build-.*gcp or valgrind are low-value tasks whose consumer isn't even well defined yet, let alone exist in some cases.

Yes those might be good candidates for the list.

I'm adding predominantly build tasks here, because this loops back to the topic discussed in the Smart Scheduling meeting this (2020/04/23) morning, that most of the build tasks aren't even clear why they run, who consumes them, or have any tests using the build as dependency.

A task that is unclear / has no ownership will still cause a backout just the same as well maintained one. If it is truly unmaintained, it shouldn't be running on mozilla-central in the first place.

Those are the situations I'm trying to catch with this extended blacklist. They aren't made permanently unavailable; just removed from the regular view unless developers explicitly know that they want to schedule such tasks.

Understood. I'm not saying the list should be empty (valgrind seems like a prime candidate for it for instance). I'm just pointing out the negative side of diverging try from mozilla-central. I don't know how to weigh the cost in CI against the cost in developer salary / morale, we don't have data for it.

(In reply to Tom Prince [:tomprince] from comment #7)

In my opinion being backed out is not a bad thing, and that people should expect to be backed out some percentage of the time, and that if they are running a bunch of tests to ensure that they are not being backed out, then they are likely running too many tests. I'm fairly certain that some of the optimizations and things that have been added to try syntax where added because people where running -t all more than necessary.

I actually strongly agree with this. But when you get backed out and then later find that the task you got backed out from can't even be run on try by default, it can be very frustrating. It's psychological more than anything.

Alternate solutions I'd prefer to this blacklist include removing all from try syntax and putting a hard cap on try_task_config.json tasks. Basically either force people to be thoughtful, or if not, then to use ./mach try auto.

(In reply to Andrew Halberstadt [:ahal] from comment #12)

Alternate solutions I'd prefer to this blacklist include removing all from try syntax and putting a hard cap on try_task_config.json tasks. Basically either force people to be thoughtful, or if not, then to use ./mach try auto.

That's a really good alternate solution.

I'll spend some time reviewing what each of the existing blacklists catch, consider any of them for removal and implement your suggestions.

It's worth pointing out that in the past, we've sort of hesitated to implement a cap because we didn't have a good answer to the question to "How do I test a large / risky patch?". Once ./mach try auto is a bit better, we'll have that answer.

A cap on the number of tasks less than "one suite on all configurations" would be a problem for me/wptsync.

Do we have data on how often these rare tasks are used on try and how often they don't pass?

I'm going to hold off on the suggestions made by :ahal for the time being, as the alternatives aren't ready yet if those restrictions are put in place.

However, I will do the following in this bug:

  • review the filter list to ensure we're filtering what we want
  • incorporate build tasks into the list only

In a follow-up bug, once mach try auto is production ready:

  • investigate removing -u all -t all, or some combination of that
  • investigate placing an upper limit on try fuzzy or try chooser tasks

Since mach try auto isn't ready yet, the last two items are just things for me to think about.

(In reply to James Graham [:jgraham] from comment #15)

A cap on the number of tasks less than "one suite on all configurations" would be a problem for me/wptsync.

Do we have data on how often these rare tasks are used on try and how often they don't pass?

If I maintain the ability to select -p all but remove the ability to select -t all, for example, then that should retain your ability to run all web-platform-tests upon sync from upstream.

Either way, no action will be taken for some time.

wptsync uses mach try fuzzy to select the tasks, not try syntax. My concern is if we say you can only have 400 task total in a try push, or something, then we might hit that limit during wptsync where we really do need to run on all platforms that are running on central.

Pushed by egao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f5b0f10cff51
additional filters on mach try selectors r=jmaher
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77
You need to log in before you can comment on or make changes to this bug.