Taskcluster is needlessly scheduling a lot of extra jobs on Beta since the 54 uplift

NEW
Unassigned

Status

2 years ago
8 months ago

People

(Reporter: RyanVM, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
There's been some random IRC chatter about this, so figured it would be good to get a bug filed. For some reason, TC seems to be scheduling a LOT of jobs on every push that it shouldn't be (SM-tc, clang toolchain jobs, desktop jobs when only /mobile is being touched, etc). Something seems seriously broken with how we're making scheduling decisions here and it's wasting a lot of resources along the way.

Greg, can you help find someone to investigate?
Flags: needinfo?(garndt)

Comment 1

2 years ago
Could you point me at some jobs that ran that shouldn't have, and the requirements for running those (meaning, are they only supposed to run when something in a directory is touched, or should they not run at all on certain branches)? That'll help separate what is being scheduled only because of configuration changes from what might actually be broken.
Flags: needinfo?(garndt)

Updated

2 years ago
Component: Scheduler → Task Configuration

Comment 2

2 years ago
Something noticed on IRC is that the configuration seems to be doing what it says it's doing, but that's different from what's intended.

Firefox desktop jobs have no "when" clause and are set to run on all projects.
Toolchain and SpiderMonkey tasks do specify some optimizations to be performed, but beta has "optimize_target_tasks" set to false, so no tasks will be optimized out of the task graph.
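
To illustrate the effect of that setting, here is a minimal sketch in Python. It is not the in-tree taskgraph code: the function names, the task labels, and the `skippable` set are all assumptions made up for the example. It only shows that when "optimize_target_tasks" is false, every target task is protected from optimization, even if its own optimization rule says it could be skipped.

```python
# Hypothetical sketch, not the real taskgraph implementation.

def protected_labels(target_tasks, optimize_target_tasks):
    """Return the labels that must never be optimized away."""
    if optimize_target_tasks:
        return set()
    # With the flag off, every target task is protected.
    return set(target_tasks)


def optimize(target_tasks, skippable, optimize_target_tasks):
    """Return the labels that would still be scheduled.

    `skippable` stands in for "this task's own optimization rule says
    it is not needed for this push" (an assumption for the example).
    """
    protected = protected_labels(target_tasks, optimize_target_tasks)
    return {
        label for label in target_tasks
        if label in protected or label not in skippable
    }


# Made-up labels for illustration only.
target = {"build-linux64/opt", "toolchain-linux64-clang", "spidermonkey-sm-plain"}
skippable = {"toolchain-linux64-clang", "spidermonkey-sm-plain"}

on_beta = optimize(target, skippable, optimize_target_tasks=False)
on_central = optimize(target, skippable, optimize_target_tasks=True)
```

Under these assumptions, `on_beta` keeps all three tasks while `on_central` keeps only the build, which matches the behavior described on IRC.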

There is also a need for a feature not yet implemented to optimize out jobs if a push only touches a certain directory.  In this case, optimize out the Firefox desktop jobs if only mobile/ was touched.
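
A rough sketch of what such a feature could look like, in Python. Nothing like this exists yet per the above, so everything here is hypothetical: the helper names, the `product` field on tasks, and the idea of passing the changed files in as a plain list are assumptions for illustration.

```python
# Hypothetical sketch of "skip desktop jobs when only mobile/ changed".

def only_touches(changed_files, prefix):
    """True if the push changed at least one file and all are under prefix."""
    return bool(changed_files) and all(
        path.startswith(prefix) for path in changed_files
    )


def drop_desktop_jobs(tasks, changed_files):
    """Filter out desktop jobs for pushes that only touch mobile/.

    `tasks` is a list of dicts with a made-up "product" key.
    """
    if only_touches(changed_files, "mobile/"):
        return [t for t in tasks if t["product"] != "firefox-desktop"]
    return list(tasks)


tasks = [
    {"label": "build-linux64/opt", "product": "firefox-desktop"},
    {"label": "build-android-api-15/opt", "product": "fennec"},
]

mobile_only = drop_desktop_jobs(tasks, ["mobile/android/base/Foo.java"])
mixed = drop_desktop_jobs(tasks, ["mobile/android/base/Foo.java", "dom/base/nsDocument.cpp"])
```

An empty list of changed files deliberately does not count as "only mobile/", so nothing is dropped when the changed files are unknown.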

Comment 3

2 years ago
I /think/ the "run everything on every push [for beta, release, etc]" mentality stems from a time when we didn't have as much confidence in our automation. Furthermore, I'm willing to wager that approach was cargo-culted from buildbot to taskcluster (read: not much thought went into changing it, since we were aiming for feature parity with buildbot).

Our automation has come a long way.

I hypothesize that if we were to approach this problem again today, we may reach the conclusion that our automation is more robust and is capable of performing intelligent scheduling. If that's true, then we should revert that "optimize_target_tasks" setting on the beta repo to match what we do on central, autoland, etc.

Comment 4

As mentioned in bug 1360609, this would make all builds essentially have to wait for gcc and clang to finish building once things are hooked up per bug 1313111...

(In reply to Greg Arndt [:garndt] from comment #2)
> but beta has "optimize_target_tasks" set to false so no tasks will
> be optimized out of the task graph.

Interestingly enough... docker images are still optimized away.
Blocks: 1313111

Comment 5

(In reply to Mike Hommey [:glandium] from comment #4)
> Interestingly enough... docker images are still optimized away.

And that seems to happen because they have an empty "run_on_projects".

Comment 6

(In reply to Mike Hommey [:glandium] from comment #5)
> (In reply to Mike Hommey [:glandium] from comment #4)
> > Interestingly enough... docker images are still optimized away.
> 
> And that seems to happen because they have an empty "run_on_projects".

That's why they're not in do_not_optimize, but that's not why they're considered at all, since adding an empty run_on_projects on toolchain jobs doesn't make them go through optimization...

Comment 7

(In reply to Mike Hommey [:glandium] from comment #6)
> That's why they're not in do_not_optimize, but that's not why they're
> considered at all, since adding an empty run_on_projects on toolchain jobs
> doesn't make them go through optimization...

Having something depend on them does that. So, it seems we can reasonably address the issue for toolchains in bug 1360609 in a way that won't prevent bug 1313111 from working.
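
For anyone following along, here is a small Python sketch of the distinction being drawn. It is a simplification with made-up names and labels, not the actual taskgraph code. It assumes "optimize_target_tasks" is false, so target tasks are protected, while tasks that are only pulled in as dependencies remain eligible for optimization.

```python
# Hypothetical sketch: which tasks are protected vs. eligible for optimization.

def split_for_optimization(target_tasks, dependencies):
    """Return (protected, eligible) label sets.

    `dependencies` maps a label to the labels it depends on.
    Target tasks are protected; anything reached only through a
    dependency edge is eligible to be optimized away.
    """
    protected = set(target_tasks)
    needed = set()
    stack = list(target_tasks)
    while stack:
        label = stack.pop()
        if label in needed:
            continue
        needed.add(label)
        stack.extend(dependencies.get(label, ()))
    return protected, needed - protected


# Made-up labels for illustration only.
deps = {
    "build-linux64/opt": ["toolchain-linux64-clang", "docker-image-desktop-build"],
}

# Toolchain is itself a target task (it runs on all projects): protected.
protected_a, eligible_a = split_for_optimization(
    {"build-linux64/opt", "toolchain-linux64-clang"}, deps
)

# Toolchain is not a target task and is only reached as a dependency: eligible.
protected_b, eligible_b = split_for_optimization({"build-linux64/opt"}, deps)
```

In the first case only the docker image is eligible; in the second, both the docker image and the toolchain are, which is the situation an empty run_on_projects plus a dependent build would create.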

I think we have good reason to stay cautious with optimizations like "only run this job when changes to $path happen", as used for SM jobs and such, because it is a fact that some of those jobs don't run nearly often enough.
No longer blocks: 1313111
1 failures in 814 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-beta: 1

Platform breakdown:
* toolchains: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1359838&startday=2017-06-12&endday=2017-06-18&tree=all
1 failures in 892 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-beta: 1

Platform breakdown:
* toolchains: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1359838&startday=2017-06-19&endday=2017-06-25&tree=all

Updated

8 months ago
Product: TaskCluster → Firefox Build System